Make VIE faster by jishnujayakumar · Pull Request #16 · IRVLUTD/HRT1

jishnujayakumar · 2026-05-07T18:54:39Z

No description provided.

…(default off)

The matplotlib overlay path in propagate_masks_and_save was creating a fresh figure per frame, opening the source image, redrawing every prior centroid (O(N^2) total work over N frames), and calling savefig, which dwarfed the actual mask cost. The binary mask PNGs (the output downstream BundleSDF actually consumes) are now the only output produced by default; the overlay is opt-in via --save_traj_overlay. When opted in, the figure is reused across frames and the centroid trail is appended incrementally rather than replotted from scratch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
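To make the cost difference concrete, here is a minimal, library-free sketch. The `IncrementalTrail` class and `redraw_cost_from_scratch` function are hypothetical, not robokit code. The sketch assumes that replotting means redrawing every centroid seen so far on each frame, so N frames cost 1 + 2 + ... + N point draws, while the incremental path draws one new segment per frame.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def redraw_cost_from_scratch(num_frames: int) -> int:
    """Points drawn when every frame replots the whole trail (1 + 2 + ... + N)."""
    return sum(frame_idx + 1 for frame_idx in range(num_frames))


class IncrementalTrail:
    """Hypothetical trail buffer: each frame appends one centroid and draws one segment."""

    def __init__(self) -> None:
        self.points: List[Point] = []
        self.draw_calls = 0

    def append(self, centroid: Point) -> List[Point]:
        """Store the new centroid and return only the segment that needs drawing."""
        self.points.append(centroid)
        self.draw_calls += 1
        # Segment from the previous centroid (if any) to the new one.
        return self.points[-2:]


trail = IncrementalTrail()
for t in range(100):
    trail.append((float(t), float(t) * 0.5))

print(redraw_cost_from_scratch(100), trail.draw_calls)  # 5050 100
```

The same reasoning applies to reusing one figure instead of creating a new one per frame: the per-frame setup cost is paid once.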

Two changes with no quality loss:

1. The Nelder-Mead translation refinement was running with xatol=1e-8 (very tight), no maxiter (unbounded), and disp=True (per-call console I/O). It typically converges to within 1e-5 (metric units) in well under 50 iterations; the tighter tolerance bought nothing visible in the depth-aligned mesh and dominated runtime. Defaults are now xatol=1e-5, maxiter=50, disp=False, all overridable via --opt_xatol / --opt_maxiter / --opt_disp.
2. The per-batch-item regression_img + side_img pyrender passes and the {img_fn}_all.jpg overlay write are pure debug visualizations. They are now gated behind --save_debug_renders (default off). cam_view itself still renders, since the depth_pc target_mask is derived from it. Mesh outputs (model/, 3dhand/, scene/) and the optimized translation are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
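A minimal sketch of how the three flags could map onto the Nelder-Mead options, assuming `--opt_disp` is a boolean switch (the commit does not say how it is parsed). `xatol`, `maxiter`, and `disp` are real option names for `scipy.optimize.minimize(method="Nelder-Mead")`; the helper functions here are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the commit message; defaults are the new values.
    parser = argparse.ArgumentParser()
    parser.add_argument("--opt_xatol", type=float, default=1e-5)
    parser.add_argument("--opt_maxiter", type=int, default=50)
    parser.add_argument("--opt_disp", action="store_true")  # assumption: off unless passed
    return parser


def nelder_mead_options(args: argparse.Namespace) -> dict:
    """Options dict in the shape scipy's Nelder-Mead `options=` argument accepts."""
    return {"xatol": args.opt_xatol, "maxiter": args.opt_maxiter, "disp": args.opt_disp}


defaults = nelder_mead_options(build_parser().parse_args([]))
tight = nelder_mead_options(
    build_parser().parse_args(["--opt_xatol", "1e-8", "--opt_maxiter", "500"])
)
print(defaults)  # {'xatol': 1e-05, 'maxiter': 50, 'disp': False}
```

The resulting dict would be passed as `options=` to `minimize(fun, x0, method="Nelder-Mead", ...)`; restoring the old behavior is then only a matter of command-line flags.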

Picks up Phase 1 grasp-transfer wins:
- Hoist AdamGraspTransfer out of the per-frame loop
- Skip redundant target_handmodel reload in reset()
- Expose --max_iter (default 100, was 300)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two Phase 1 wins for the BundleSDF leg:

1. Bumps the BundleSDF submodule to jishnu/fasten-vie@298918c, which adds -k (keep) and skip-rebuild handling to docker/start_docker.sh. Repeat launches reuse the running container instead of doing down + up --build each time.
2. Adds --n_step to run_bundlesdf.py, plumbed through BundleSDFProcessor into cfg_nerf['n_step']. The default is unchanged (config.yml's value, currently 10), so this is a no-op at default; lower values trade reconstruction quality for NeRF training speed and are intended for Phase 3 tuning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
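The "no-op at default" behavior can be sketched as an override that only fires when `--n_step` is given. `apply_n_step` is a hypothetical helper, and rejecting non-positive values is an added assumption, not something the commit describes.

```python
from typing import Any, Dict, Optional


def apply_n_step(cfg_nerf: Dict[str, Any], n_step: Optional[int]) -> Dict[str, Any]:
    """Return a copy of cfg_nerf with 'n_step' overridden only when a value was given."""
    cfg = dict(cfg_nerf)  # shallow copy so the loaded config is not mutated
    if n_step is not None:
        if n_step <= 0:
            raise ValueError("n_step must be a positive integer")
        cfg["n_step"] = n_step
    return cfg


loaded = {"n_step": 10, "continual": True}  # stand-in for config.yml's values
print(apply_n_step(loaded, None)["n_step"])  # 10 (flag omitted: no-op)
print(apply_n_step(loaded, 5)["n_step"])     # 5
```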

…tation

The per-bbox loop in run_gdino_samv2 was calling propagate_masks_and_save once per detected bbox. Each call ran SAM2's init_state(video_path=video_dir), which scans and caches every video frame: N times the I/O for N bboxes, even though SAM2 supports tracking multiple objects simultaneously.

New propagate_masks_and_save_multi(video_dir, bboxes, ...) calls init_state once, registers every bbox as its own object id on frame 0, and runs a single propagate_in_video loop that yields all object masks per frame.

Filename rule: when len(bboxes) == 1, the saved mask path is unchanged; when N > 1, each file is prefixed obj{i}_<frame>.png so masks no longer overwrite each other (the prior code silently dropped all but the last bbox's masks).

A timing log line (init / propagate+save / total) prints at the end so users can see the speedup directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
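The filename rule can be expressed as a small pure function. This is a sketch: `mask_filename` is hypothetical, and it assumes the unchanged single-object path is `<frame>.png` and that frame names are zero-padded stems such as `000042`.

```python
def mask_filename(frame_name: str, obj_idx: int, num_objects: int) -> str:
    """Hypothetical helper: keep the single-object name, prefix obj{i}_ when N > 1."""
    if num_objects < 1:
        raise ValueError("need at least one object")
    if num_objects == 1:
        return f"{frame_name}.png"
    return f"obj{obj_idx}_{frame_name}.png"


# Three bboxes tracked on the same frame no longer collide on disk.
names = [mask_filename("000042", i, 3) for i in range(3)]
print(names)  # ['obj0_000042.png', 'obj1_000042.png', 'obj2_000042.png']
print(mask_filename("000042", 0, 1))  # 000042.png
```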

…efaults

Three combined changes for real-time hand extraction:

1. Warm-start: cache the optimized translation_new per hand side (left=0, right=1) on HandInfoExtractor and feed it as x0 to the next frame's minimize() instead of mean(depth_pc.points). Hand poses change smoothly between frames, so this seeds the optimizer near the answer.
2. Aggressive defaults: xatol 1e-5 -> 1e-4 (≈0.1 mm), maxiter 50 -> 30. With warm-starting these are typically enough for sub-mm convergence; revert via --opt_xatol / --opt_maxiter if quality regresses.
3. Per-frame timing summary printed at the end of the run (avg ms/frame, total), so the speedup is observable without external profiling.

Disable warm-starting with --no_warm_start to A/B against the cold-start path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
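A sketch of the per-side warm-start cache under stated assumptions: translations are plain 3-element lists, the cold-start seed is the point-cloud mean, and `TranslationWarmStart` is a hypothetical name (the commit stores the cache on `HandInfoExtractor`). Setting `enabled=False` plays the role of `--no_warm_start`.

```python
from typing import Dict, List, Optional, Sequence

LEFT, RIGHT = 0, 1  # hand-side keys, as in the commit message


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Cold-start seed: mean of the depth points (stand-in for mean(depth_pc.points))."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]


class TranslationWarmStart:
    """Hypothetical per-side cache of the last optimized translation."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled  # False mirrors --no_warm_start
        self._last: Dict[int, List[float]] = {}

    def initial_guess(self, side: int, points: Sequence[Sequence[float]]) -> List[float]:
        cached: Optional[List[float]] = self._last.get(side)
        if self.enabled and cached is not None:
            return list(cached)
        return centroid(points)

    def update(self, side: int, translation: Sequence[float]) -> None:
        self._last[side] = list(translation)


ws = TranslationWarmStart()
cloud = [[0.0, 0.0, 1.0], [0.2, 0.0, 1.2]]
print(ws.initial_guess(RIGHT, cloud))  # cold start: centroid of the cloud
ws.update(RIGHT, [0.11, 0.01, 1.09])   # result of this frame's minimize()
print(ws.initial_guess(RIGHT, cloud))  # [0.11, 0.01, 1.09]
print(ws.initial_guess(LEFT, cloud))   # left hand has no cache yet, so it cold-starts
```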

…sive defaults)

Picks up jishnu/fasten-vie@bdca654 with:
- AdamGraspTransfer warm-start from the prior frame's q_current
- num_particles 32 -> 16, max_iter 100 -> 50 defaults
- Per-frame timing summary

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…iming

Phase 2/3 BundleSDF wins outside Docker:

1. Pre-cache: when not using a live segmenter, all masks are loaded into RAM in one pass before the frame loop instead of being read from disk inside the hot loop. Mask files are tens of KB each, so this adds at most a few MB of memory for hundreds of frames and removes per-frame disk seeks.
2. Aggressive default: --n_step now defaults to 5 (was None, i.e. config.yml's 10). NeRF training was running 10 iterations on every keyframe trigger, and with continual=true this fires repeatedly. 5 is usually enough for tracking-accurate poses; raise it back to 10 if reconstruction quality regresses.
3. Per-frame timing summary printed at the end of process(), so the speedup is visible without external profiling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
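A minimal sketch of the one-pass pre-cache, assuming masks are PNG files named by frame stem in a single directory. It keeps raw bytes and leaves decoding to the consumer; whether the real BundleSDF code stores bytes or decoded arrays is not stated, so treat that choice as an assumption. Both helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


def precache_masks(mask_dir: Path, pattern: str = "*.png") -> Dict[str, bytes]:
    """Read every mask file once, keyed by file stem, before the frame loop starts."""
    return {p.stem: p.read_bytes() for p in sorted(mask_dir.glob(pattern))}


def cached_megabytes(cache: Dict[str, bytes]) -> float:
    """Rough RAM footprint of the cache (payload bytes only)."""
    return sum(len(blob) for blob in cache.values()) / 1e6


# Demo on a throwaway directory with three fake "mask" files.
with tempfile.TemporaryDirectory() as tmp:
    mask_dir = Path(tmp)
    for i in range(3):
        (mask_dir / f"{i:06d}.png").write_bytes(b"\x89PNG" + bytes(i))
    masks = precache_masks(mask_dir)

# The hot loop now does dict lookups instead of disk reads.
for frame_id in sorted(masks):
    mask_bytes = masks[frame_id]  # decode here (e.g. with PIL or cv2) if needed
print(sorted(masks), cached_megabytes(masks))
```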

Runs gdino+samv2, hamer, rfp-grasp-transfer, and bundlesdf on a given task dir, activating the appropriate conda env per module and logging to a timestamped file under the task dir. Each module's own timing instrumentation ([samv2] / [hamer] / [grasp-transfer] / [bundlesdf]) lands in the log together with a wall-clock '[bench] N/4 ... OK in Xs' summary per step. Skips any module whose inputs are missing (no /rgb, no /depth, no MANO models, no docker), so partial runs are still useful.

Usage:
  ./bench_vie.sh /path/to/task_data_root [text_prompt]

A/B against main:
  git checkout main && ./bench_vie.sh ...
  git checkout jishnu/fasten-vie && ./bench_vie.sh ...
  diff /path/to/task/bench_*.log

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds Python 3.10 / PyTorch / CUDA / MIT license / IRVL UTD lab badges plus upstream attribution for GroundingDINO, SAM 2, HaMeR, and BundleSDF at the top of vie/README.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…er module

Adds a per-module benchmark section and TOC entries:
- GDINO+SAMv2: measured 7.86x speedup on robokit/perception.py::propagate_masks_and_save (139.70s -> 17.77s, 1995.6 -> 253.8 ms/frame) on task_39_seasoning_on_omlette_v1 with a single bbox on an RTX 5070 Laptop GPU. Measured with a SAM2-only mini-bench that bypasses GDINO; the win comes from gating per-frame matplotlib overlay generation behind --save_traj_overlay (off by default).
- HaMeR / rfp-grasp-transfer / BundleSDF: described with the underlying mechanism (warm-starts, hoisting, gated debug renders, persistent docker, smaller particle batches), the per-module timing log to look for, and an honest note that they were not measured on the dev machine because of missing MANO models, the Blackwell-incompatible torch in robokit-py3.10, and missing Docker.

Points users to scripts/bench_vie.sh for an end-to-end A/B on a working rig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…th measured numbers

Replaces the "not measured here" sections with real A/B numbers from isolated micro-benches that bypass the environment walls (no MANO, no Blackwell torch needed):
- HaMeR: 209.57s -> 32.27s (70 frames, 138 calls), 6.49x speedup. The bench drives off the existing pred_vertices/pred_cam_t in out/hamer/model/*.npz, so MANO and the HaMeR forward pass aren't required; only the scipy minimize stage differs between branches.
- rfp-grasp-transfer: ~1240 ms/frame -> ~870 ms/frame, ~1.5x. Synthetic smooth-walking q on CPU (the robokit env's torch 2.3.1+cu118 lacks sm_120). This is smaller than the survey's "expected 4-8x"; discusses why (thermal noise, fixed reset() cost on CPU) and notes that the GPU speedup should be larger.

Also documents the Phase 1 reset() reload-skip that benchmarking caught as a regression and that was reverted in 2071aab: a real "the obvious optimization is the wrong one" finding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two stacking changes for Phase 4:

1. HaMeR fp16 autocast on the transformer forward pass: torch.amp.autocast(dtype=fp16) wrapped around self.model(batch). Active on CUDA only (no-op on CPU); enabled by default. Expected ~1.5-2x speedup on the model forward pass with no observable mesh-quality regression. Disable via --no_fp16 to fall back to fp32. Note: this change is code-only on the dev rig (no MANO models, and the robokit-py3.10 conda env's torch lacks Blackwell sm_120 kernels), so the speedup is not measured here. It will take effect on a working rig where extract_hand_bboxes_and_meshes.py actually runs.
2. Bump the rfp-grasp-transfer submodule to jishnu/fasten-vie@f8badfb (Phase 4 correspondence cache + jittered particle init).

Investigated and rejected: cKDTree as a drop-in for sklearn.neighbors.KDTree in hamer/mesh_to_sdf/rgbd2pc.py. The bench ran for 17+ minutes (vs 4 minutes for sklearn) before being killed; cKDTree is slower for this query pattern (777 verts against ~300k depth points, 138 queries per scipy minimize call). Sticking with sklearn KDTree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
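The CUDA-only, flag-gated autocast could look like the following sketch. `torch.autocast(device_type="cuda", dtype=torch.float16)` is the real PyTorch API; `hamer_forward_context` and its `use_fp16` parameter (mirroring `--no_fp16`) are hypothetical. Importing torch inside the CUDA branch just keeps the example runnable on a machine without torch.

```python
import contextlib
from typing import ContextManager


def hamer_forward_context(device_type: str, use_fp16: bool = True) -> ContextManager:
    """fp16 autocast on CUDA only; a no-op context everywhere else (or with --no_fp16)."""
    if use_fp16 and device_type == "cuda":
        import torch  # imported lazily so CPU-only callers never need it here

        return torch.autocast(device_type="cuda", dtype=torch.float16)
    return contextlib.nullcontext()


# Usage inside the extractor (names hypothetical):
#   with hamer_forward_context(batch_device.type, use_fp16=not args.no_fp16):
#       out = self.model(batch)
with hamer_forward_context("cpu"):
    pass  # CPU path: plain fp32, identical to running without the wrapper
```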

…ee rejected

Adds Phase 4 sections to the existing HaMeR and rfp-grasp-transfer benchmark blocks:
- HaMeR: documents fp16 autocast on the transformer forward pass (default on, --no_fp16 disables) and explicitly notes that cKDTree as a drop-in for sklearn.neighbors.KDTree was investigated and rejected (slower for the 777 verts × 300k depth points × 138 queries-per-minimize pattern).
- rfp-grasp-transfer: documents the correspondence-cache and jittered-particle-init wins, plus the CPU re-bench (within Phase 3 noise; the gains are in convergence quality, not raw wall time on CPU).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Standing this up on a fresh Blackwell laptop revealed five real walls that the original setup_vie.sh + requirements.txt didn't anticipate. This commit captures every workaround so a clean re-install just works. Specifically:

- setup_vie.sh
  - pin transformers==4.47.1 (>=5 dropped BertModel.get_head_mask, which the pinned old GroundingDINO needs)
  - pin setuptools<70 before installing mmcv (legacy mmcv setup.py imports pkg_resources, which newer setuptools dropped)
  - install mmcv==1.5.0 explicitly with --no-build-isolation (HaMeR's pinned mmcv==1.3.9 fails to build on Python 3.10 toolchains, and mmpose 0.24 only accepts mmcv in [1.3.8, 1.5.0])
  - install hamer with --no-deps so its strict mmcv pin doesn't undo the above
  - apply an in-place patch to groundingdino's ms_deform_attn.py so it falls back to the pure-PyTorch implementation when the _C CUDA extension isn't built (the common case: the pip wheel ships no _C, and source builds need a matching CUDA toolchain)
  - re-pin numpy<2 after HaMeR's editable install (HaMeR drags in numpy>=2, which breaks matplotlib and many C extensions)
  - print a clear MANO + Blackwell-torch reminder at the end
- requirements.txt
  - pin transformers==4.47.1
  - pin setuptools<70 (build-time)
  - add the deps that hamer needs but its setup.py doesn't list cleanly (yacs, smplx, einops, jaxtyping, iopath, fvcore, omegaconf, hydra-core, pytorch_lightning, torchmetrics, timm, huggingface_hub, tokenizers, safetensors)
- hamer/setup.py
  - relax mmcv==1.3.9 to mmcv>=1.3.8,<=1.5.0, with a comment explaining why
- robokit/perception.py
  - emit a clear, actionable warning at import time if groundingdino._C is missing AND ms_deform_attn.py hasn't been patched (so users know to re-run setup_vie.sh)
- README.md
  - new "Install Gotchas" section documenting all five workarounds, so users debugging a fresh install can map a symptom to a fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chumpy 0.70 (used by smplx to unpickle MANO .pkl files) does `from numpy import bool, int, float, complex, object, unicode, str, nan, inf`, which fails on newer numpy releases: these aliases for Python builtins were deprecated in numpy 1.20 and later removed from the numpy namespace. Patch chumpy's __init__.py in place to set the aliases on numpy before the legacy import line, so MANO loading succeeds without pinning an old numpy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
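The commit patches chumpy's `__init__.py` in place; the sketch below shows the same idea as a standalone runtime shim instead, which would have to run before `import chumpy`. It only fills in aliases that are missing, so any name numpy still (or again) defines is left untouched. `restore_legacy_aliases` is a hypothetical helper, demonstrated on a stand-in module rather than on numpy itself; `nan` and `inf` are omitted because numpy still provides them.

```python
import builtins
import types
from typing import Dict, List

# Names chumpy imports from numpy, mapped to the builtins they used to alias.
LEGACY_ALIASES: Dict[str, type] = {
    "bool": builtins.bool,
    "int": builtins.int,
    "float": builtins.float,
    "complex": builtins.complex,
    "object": builtins.object,
    "unicode": builtins.str,
    "str": builtins.str,
}


def restore_legacy_aliases(module: types.ModuleType) -> List[str]:
    """Set each alias that `module` lacks; return the names that were added."""
    added = []
    for name, value in LEGACY_ALIASES.items():
        if not hasattr(module, name):
            setattr(module, name, value)
            added.append(name)
    return added


fake_numpy = types.ModuleType("fake_numpy")  # stand-in for `import numpy`
print(restore_legacy_aliases(fake_numpy))  # ['bool', 'int', 'float', 'complex', 'object', 'unicode', 'str']
print(restore_legacy_aliases(fake_numpy))  # [] (second call is a no-op)
```

In practice the call would be `restore_legacy_aliases(numpy)` immediately before smplx/chumpy are imported.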

…ight GPUs

The full HaMeR pipeline loads ViTDet-Huge (~2.5GB) + ViTPose (~1.2GB) + the HaMeR transformer (~4GB) + BERT simultaneously, which OOMs on 8GB cards (e.g. RTX 5070 Laptop). The detector module already supports a 'regnety' alternative that's ~10x smaller; this change wires it through to the CLI as --body_detector. The default stays 'vitdet' (no behavior change for users with plenty of VRAM).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…en no viz)

…4x speedup)

…fer, 2.78x at F=4)

…amGraspTransfer)

Adds two new sub-sections under the rfp-grasp-transfer benchmark block.

Phase 5: deepcopy snapshot in reset()
Profiling found that the URDF reload in reset() was ~98% of the remaining per-frame cost, with high variance (150-500 ms). It is replaced with copy.deepcopy of a snapshot taken in __init__. After: 267-277 ms, rock-solid, 1.67 it/s.

Phase 6: BatchedAdamGraspTransfer (frame batching)
Processes N frames in a single Adam call. Measured on an RTX 5070 Laptop GPU:

  F     ms/frame   wall   speedup
  1     1486       119s   1.00x
  4      301        43s   2.78x   <- sweet spot
  8      425        47s   2.51x
  16     331        47s   2.52x

Verified: all 70 frames produced PLYs in both paths (138 PLYs rather than 140, because two frames have only the left hand in the original HaMeR output).

Combined, Phases 1-6 give ~5x wall-clock vs main on this rig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
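The Phase 5 pattern, sketched with hypothetical names: the expensive load runs once in `__init__`, and `reset()` hands out a `copy.deepcopy` of the snapshot so per-frame mutations never leak back into it. The dict stands in for the loaded URDF/hand model; the real object and its loader are not shown in the PR.

```python
import copy
from typing import Any, Dict


class GraspStateSketch:
    """Hypothetical stand-in: snapshot the freshly loaded state once, deepcopy it on reset()."""

    load_count = 0

    def __init__(self) -> None:
        self._snapshot = self._load_from_disk()  # expensive path runs once

    @classmethod
    def _load_from_disk(cls) -> Dict[str, Any]:
        cls.load_count += 1  # stands in for parsing the URDF
        return {"joints": [0.0] * 4, "links": {"palm": [0.0, 0.0, 0.0]}}

    def reset(self) -> Dict[str, Any]:
        # deepcopy so nested lists are copied too, not shared with the snapshot
        self.state = copy.deepcopy(self._snapshot)
        return self.state


g = GraspStateSketch()
for _ in range(70):
    s = g.reset()
    s["joints"][0] += 1.0  # the optimizer mutates its working copy
print(GraspStateSketch.load_count, g._snapshot["joints"][0])  # 1 0.0
```

A shallow `copy.copy` would not be enough here: the nested `joints` list would be shared, and the first frame's mutation would corrupt every later reset.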

Profiling showed that 96% of per-frame time in the GSAM pipeline is the SAM2 forward pass itself; the only meaningful lever left is model size. Adds --sam2_size {large|base_plus|small|tiny} as a CLI flag (default 'large', so no behavior change for existing users), with auto-download of the checkpoint on first use.

Measured on an RTX 5070 Laptop GPU, task_39 (70 frames):

  size        propagate+save   total    wall     propagate speedup
  large       13.69s           26.34s   85.90s   baseline
  base_plus    7.17s           17.33s   67.13s   1.91x

Per-frame steady state: 196 ms (large) -> 102 ms (base_plus). Quality drop on clean foreground objects has been minimal in our spot checks; small or textured objects may need 'large'. The smaller variants ('small', 'tiny') are wired in but not benchmarked here. The init_state cost is roughly model-independent (~10s, dominated by JPEG decode of all frames), so the wall-clock ratio is smaller than the propagate ratio.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
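The gap between the 1.91x propagate gain and the smaller end-to-end gain follows directly from the figures in this commit message. This worked check uses only those numbers; "non-propagate time" is simply total minus propagate+save, which is mostly init_state.

```python
# Seconds on task_39 (70 frames), copied from the measurements above.
large = {"propagate": 13.69, "total": 26.34}
base_plus = {"propagate": 7.17, "total": 17.33}

propagate_speedup = large["propagate"] / base_plus["propagate"]
total_speedup = large["total"] / base_plus["total"]

# Time outside propagate+save (mostly init_state) barely depends on model size.
fixed_large = large["total"] - large["propagate"]
fixed_base = base_plus["total"] - base_plus["propagate"]

print(f"propagate {propagate_speedup:.2f}x, total {total_speedup:.2f}x")
# propagate 1.91x, total 1.52x
print(f"non-propagate time: {fixed_large:.2f}s vs {fixed_base:.2f}s")
# non-propagate time: 12.65s vs 10.16s
```

Because the non-propagate share only drops from about 12.7s to 10.2s, the total improves by about 1.52x rather than 1.91x.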

SAM2's load_video_frames_from_jpg_images does a sequential PIL JPEG decode + resize per frame. On a 70-frame clip this was ~10s of upfront cost, dominated by single-threaded JPEG decode. NVIDIA DALI with nvJPEG GPU decoding processes all frames in a single pipeline run and returns the same (N, 3, H, H) ImageNet-normalized tensor SAM2 expects.

Implementation: at import time, perception.py tries to install DALI as a monkey-patch for sam2.utils.misc.load_video_frames_from_jpg_images. If DALI isn't installed, the original PIL loader stays in place (zero behavior change).

Measured on an RTX 5070 Laptop GPU, task_39 (70 frames, --sam2_size base_plus):

  no DALI:    init=10.15s  propagate=7.17s  total=17.33s  wall=67.13s
  with DALI:  init=0.93s   propagate=6.58s  total=7.51s   wall=17.55s

init_state: 10.9x faster. Full wall-clock: 3.8x over Phase 7 alone (4.9x over the Phase 1 baseline).

Caveats:
- async_loading_frames=True still uses the original loader (DALI's eager pipeline doesn't fit the lazy-frame use case).
- batch_size = num_frames; very long videos (1000+ frames) may need a chunked DALI pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
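The "install if available, otherwise keep the original" monkey-patch can be sketched generically. Everything here is hypothetical: a stub module stands in for `sam2.utils.misc`, a deliberately missing import stands in for DALI not being installed, and the `async_loading` branch mirrors the caveat that the lazy path keeps the original loader.

```python
import types
from typing import Callable


def install_optional_loader(target: types.ModuleType, attr: str,
                            make_fast_loader: Callable[[Callable], Callable]) -> bool:
    """Swap target.attr for a faster loader if its backend imports; else leave it alone.

    make_fast_loader receives the original function (kept for fallback paths)
    and may raise ImportError when the optional backend is missing.
    """
    original = getattr(target, attr)
    try:
        fast = make_fast_loader(original)
    except ImportError:
        return False  # backend missing: original loader stays in place
    setattr(target, attr, fast)
    return True


misc = types.ModuleType("misc_stub")  # stand-in for sam2.utils.misc
misc.load_frames = lambda paths: ["slow:" + p for p in paths]


def needs_missing_backend(original: Callable) -> Callable:
    import hypothetical_gpu_jpeg_backend  # noqa: F401  (not installed, so this raises)
    return original


def batch_decoder(original: Callable) -> Callable:
    def fast(paths, async_loading=False):
        if async_loading:
            return original(paths)  # lazy path keeps the original loader
        return ["fast:" + p for p in paths]
    return fast


print(install_optional_loader(misc, "load_frames", needs_missing_backend))  # False
print(misc.load_frames(["a.jpg"]))  # ['slow:a.jpg']
print(install_optional_loader(misc, "load_frames", batch_decoder))  # True
print(misc.load_frames(["a.jpg"]))  # ['fast:a.jpg']
```

Patching the module attribute only affects code that looks the function up through the module at call time; a caller that already ran `from sam2.utils.misc import load_video_frames_from_jpg_images` keeps its old reference, which is one reason to install the patch at import time.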

…noise

Replaces the pile of ad-hoc logging.info / print calls, plus the ~12 lines of upstream library deprecation/registry warnings that fired on every run, with:
- A new robokit/log.py module: rich-based Console and RichHandler, plus helper functions section() / step() / note() / warn() / error() / success() / progress() / summary() / fmt_duration() / fmt_rate(). Reusable across all vie entry points.
- run_gdino_samv2.py:
  - top of file: warnings.filterwarnings("ignore"), TRANSFORMERS_VERBOSITY and PYTHONWARNINGS env vars, absl.set_verbosity(WARNING), and root-logger level WARNING. Kills upstream chatter.
  - main() restructured into Configuration / Loading / Detection / Tracking sections with timed step lines and a final colored summary panel.
  - Wraps GDINO + SAM2 init in a stdout-redirect context manager to swallow BERT's `final text_encoder_type` print and similar one-shot stdout output.
- perception.py:
  - Demoted noisy print() calls in load_model_hf + _load_predictor to logger.debug.
  - Removed the redundant defensive _C warning (it was firing even when the GDINO patch was already applied, due to a logic bug; the patch's own one-line warning suffices).
  - Demoted the [samv2] timing log line to debug, since callers now render their own rich summary with these numbers.

Visual outcome:
• Cyan section rules, green ✓ for steps, dim grey notes for config echoes.
• Bordered cyan summary panel at the end with objects/frames/detection/propagation/total wall/fps/output path.
• Bold green "✓ Done." banner on success.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… waits

Previously each long-running stage (GroundingDINO load, SAM2 load + checkpoint download, GDINO inference, propagate init_state) ran silently and only emitted a "✓ X done in Yms" line *after* completing. On a fresh run that's ~15s of staring at nothing.

Adds vlog.working(msg), a context manager that:
- shows a live spinner with the message during the op
- replaces the spinner line in place with "✓ msg (Xs)" on success
- replaces it with "✗ msg (failed after Xs)" and re-raises on exception

Wires it into run_gdino_samv2.py for each long stage so users see exactly what's happening at all times. The SAM2 propagate stage already has its own tqdm bar, so it just prints a "propagating ..." note before it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
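A spinner-free sketch of the `working(msg)` contract described above, using only the standard library. It is not the robokit implementation (which uses rich); it only shows the success line, the failure line, and the re-raise. Writing to an explicit `stream` keeps it testable; note that the default argument captures `sys.stdout` once, at definition time.

```python
import io
import sys
import time
from contextlib import contextmanager
from typing import Iterator, TextIO


@contextmanager
def working(msg: str, stream: TextIO = sys.stdout) -> Iterator[None]:
    """Print a pending line, then overwrite it with a success or failure line."""
    start = time.perf_counter()
    stream.write(f"... {msg}\r")  # a real spinner would animate this line
    stream.flush()
    try:
        yield
    except BaseException:
        elapsed = time.perf_counter() - start
        stream.write(f"✗ {msg} (failed after {elapsed:.1f}s)\n")
        raise  # never swallow the caller's exception
    stream.write(f"✓ {msg} ({time.perf_counter() - start:.1f}s)\n")


buf = io.StringIO()
with working("Loading SAM2", stream=buf):
    time.sleep(0.01)
try:
    with working("GDINO inference", stream=buf):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the context manager re-raised it after printing the ✗ line
log = buf.getvalue()
```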

The heavy ML imports (torch / GroundingDINO / SAM2 / robokit.perception) take ~5-8s on a fresh process and happen *before* main() runs, so previously the user saw a blank terminal during that whole window. Restructured the script: only the lightweight imports (os, sys, time, vlog) happen at the very top. Once vlog is available we immediately print the section header + start a spinner labeled "Importing ML stack ...", then do the heavy imports inside that spinner's context. Spinner replaces in-place with the usual ✓ confirmation when imports finish. Also moved the post-import flag-definition + logger-quieting steps inside the spinner block (they're trivial after imports anyway), and renamed the in-main "GDINO + SAMv2" header to "Configuration" since the top-level banner already announces what script we're in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Profiled with `python -X importtime` and found four heavy modules imported at the top of perception.py that the GDINO+SAM2-video hot path doesn't touch:
- pyrender: drags in pyglet (with GL context init), tkinter, freetype, and the imageio plugin registry. ~5s. Used only by the DepthPC visualization helpers (vis=True paths).
- mobile_sam: pulls in the MobileSAM encoders + SamPredictor. ~2s. Used only by SegmentAnythingPredictor (mobile-SAM, not SAM2-video).
- matplotlib + cm: ~1s. Used only by the opt-in trajectory-overlay path.
- sklearn.neighbors.KDTree: ~1s. Used only inside DepthPC.

Moved each import to its actual use site (inside a class __init__, method, or gated branch). The imports still happen lazily on first use, so there is no behavior change for existing callers, but the GDINO+SAM2-video script no longer pays the cost.

Measured on an RTX 5070 Laptop GPU, --sam2_size base_plus:

  before: Import ML stack 13.00s | GDINO load 12.37s | SAM2 load 13.23s | total 43s
  after:  Import ML stack  2.63s | GDINO load  4.33s | SAM2 load  0.72s | total  8s

~5x less wall-clock wait before propagation actually starts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ransfer

Adds a single CLI flag name (--save_viz) to all three entry-point scripts that switches on the per-script viz/debug output. The existing per-script flags (--save_traj_overlay, --save_debug_renders, --debug_plots) keep working as before for fine-grained control; --save_viz is the convenience alias users asked for, so they don't have to remember three different names:

  run_gdino_samv2.py     : --save_viz -> save_traj_overlay
  hamer/extract_*.py     : --save_viz -> save_debug_renders
  rfp/transfer_from_*.py : --save_viz -> debug_plots

Bumps the rfp-grasp-transfer submodule pointer to pick up the same flag + the batched-path Plotly HTML output that pairs with it.

Also fixes a latent bug exposed when actually exercising the overlay path after the lazy-import refactor: SAM2VideoPredictor.show_mask referenced a module-level `plt` that no longer exists. It now imports plt locally.

Verified end-to-end:
- GSAM --save_viz: 70 trajectory-overlay PNGs written.
- rfp --save_viz (per-frame and batched): 138 Plotly HTMLs written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
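One way to wire the alias, sketched for the GSAM script only: `--save_viz` is OR-ed into the script-specific flag, so it can switch viz on but never off. The parser function is hypothetical; the other two scripts would map the alias onto `save_debug_renders` and `debug_plots` the same way.

```python
import argparse
from typing import List


def parse_gsam_args(argv: List[str]) -> argparse.Namespace:
    """Hypothetical parser for run_gdino_samv2.py-style flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--save_traj_overlay", action="store_true")
    parser.add_argument("--save_viz", action="store_true",
                        help="convenience alias: turn on this script's viz output")
    args = parser.parse_args(argv)
    # The alias only ever switches the specific flag on; it never turns it off.
    args.save_traj_overlay = args.save_traj_overlay or args.save_viz
    return args


print(parse_gsam_args([]).save_traj_overlay)              # False
print(parse_gsam_args(["--save_viz"]).save_traj_overlay)  # True
```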

…ch UX

Adds a new "✨ What's new on jishnu/fasten-vie" section near the top summarizing the speed gains, the speed-vs-quality flag table, and the unified --save_viz flag that's now consistent across run_gdino_samv2, extract_hand_bboxes_and_meshes, and transfer_from_hamer.

Updates the per-step examples (Steps 3, 4, 5) to:
- drop --debug_plots from the rfp example (no longer required for normal operation; only needed when you want plotly viz)
- mention --save_viz, --sam2_size, --frame_batch_size, --no_warm_start, --body_detector inline as opt-in knobs
- clarify that the extra_plots / transfer_extra_plots dirs are populated only when --save_viz is on; downstream pipeline data (binary masks, MANO npzs, scene PLYs, gripper PLYs, BundleSDF poses) is always written

Also notes the new colored-banner / spinner / summary-panel UX backed by vie/robokit/log.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Renames every "jishnu/fasten-vie" reference to just "fasten-vie", and the "What's new on jishnu/fasten-vie" section header to "Performance & UX improvements". The branch is local convention; the README shouldn't read like one person's effort. The two remaining `jishnujayakumar/{robokit,BundleSDF}` URLs in the Acknowledgments are upstream-fork repo links that the project actually depends on at the package level, not author attribution — left intact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three top-level imports in the HaMeR module that the hot path doesn't exercise on every frame:
- open3d: only used in save_point_cloud + save_point_cloud_as_ply
- pyrender: only used in compute_sdf_cost(vis=True), the opt-in viewer
- sklearn KMeans: only used in RGBD2PC.__init__(use_kmeans=True), opt-in

Moved each to its use site. Behavior is unchanged; startup is faster for the extract script, especially when --help / sanity-check paths run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ache

extract_hand_bboxes_and_meshes.py end-to-end optimization on task_1_21s: 4 490 → 412 ms/frame (12m 25s → 1m 8s wall), no quality regression vs the original.

Pipeline changes:
- torch Adam minimize on GPU replaces scipy Nelder-Mead (batched across both hands within a frame, patience-based early stop, optimizer state pre-allocated)
- cv2 convex-hull hand mask (~5 ms) replaces pyrender (~370 ms); the pyrender Renderer is lazy-loaded and only constructed under --save_viz
- Processing/viz mode split: scene + 3dhand PLYs are viz-only, gated via --save_scene_pcd / --save_3dhand_pcd / --save_viz; the rfp-ready default writes only model/*.npz (everything rfp consumes)
- --detector_stride N caches ViTDet detections across N frames, with auto-redetect on hand-keypoint loss
- Binary PLY writes everywhere (transparent to trimesh.load_mesh)
- cam_K.txt cached on the extractor (was np.loadtxt'd every frame)
- Async NPZ writes via ThreadPool
- --frame_batch_size N: cross-frame batched torch Adam over all hands x K frames (K=2: +24% speed, median 11 mm drift; K>2 not recommended)
- Lazy ML-stack import via _ensure_heavy_imports: module load 19s -> 1.5s
- ViTDet + ViTPose pickled to ~/.cache/hamer/ on first run; a cache hit saves ~18s on subsequent runs (the HaMeR LightningModule holds a ctypes pointer in smplx's MANO wrapper, so it stays on the fresh-load path)
- Per-step rich UI: 4 import spinners (HaMeR, detectron2, pyrender, mmpose) + 3 model-load spinners; vlog.working survives stdout redirects
- Fixed a latent _save_meshes return-arity bug: it returned 4 values but the caller unpacked 3, so every frame silently hit a ValueError even though the writes still completed

mesh_to_sdf/rgbd2pc.py:
- Otsu-on-z 1D threshold replaces sklearn KMeans for the depth-pc cluster filter (~124 -> 5 ms/frame; partitions are ~98.5% identical on tabletop scenes, where the foreground/background split runs along the depth axis)
- Vectorized RGB-to-point projection (was a Python loop over ~300k points)
- Binary PLY writes

robokit/log.py:
- Snapshot sys.stdout at Console construction so vlog.working / vlog.progress spinners survive the contextlib.redirect_stdout used for third-party noise suppression (otherwise "Loading models" went blank for ~45s while the spinner output was being captured into the silenced buffer)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
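A pure-Python sketch of a 1D Otsu threshold on depth, the technique named in the rgbd2pc.py bullet. The bin count, the function names, and the assumption that the foreground is the nearer cluster are illustrative; the actual implementation is not shown in the PR. In 1D, maximizing Otsu's between-class variance is equivalent to minimizing within-class variance, i.e. the two-cluster k-means objective restricted to a single cut, which is why it can stand in for KMeans when the split runs along depth.

```python
from typing import List, Sequence


def otsu_threshold(values: Sequence[float], bins: int = 256) -> float:
    """1D Otsu: pick the histogram cut that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    total = len(values)
    total_sum = sum(h * c for h, c in zip(hist, centers))
    w0, sum0 = 0, 0.0  # count and weighted sum of the "near" class
    best_var, best_cut = -1.0, lo
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
        if between > best_var:
            best_var, best_cut = between, lo + (i + 1) * width
    return best_cut


def split_foreground(z: Sequence[float]) -> List[bool]:
    """True for points nearer than the Otsu cut (assumes the hand/object is in front)."""
    cut = otsu_threshold(z)
    return [v < cut for v in z]


z = [0.50, 0.52, 0.48, 0.51, 1.50, 1.48, 1.52, 1.49]  # metres: hand vs. table
print(split_foreground(z))  # [True, True, True, True, False, False, False, False]
```

The histogram makes the cost linear in the number of points plus a fixed pass over the bins, with no iterative re-assignment as in KMeans; a production version would typically vectorize the binning.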

…ng/viz split

Brings the README in sync with d39693b. Adds:
- Default (processing-mode) + viz-mode run commands for HaMeR
- New flags: --frame_batch_size, --detector_stride, --torch_steps/min_steps/tol/lr, --mask_backend, --minimize_backend, --save_scene_pcd, --save_3dhand_pcd, --parallel_load, --no_model_cache
- Output gating: model/ is always written; 3dhand/, scene/, extra_plots/ are opt-in via --save_viz or the per-stage flags
- Model cache section (~/.cache/hamer/, ~5 GB, ~18 s saved on warm runs)
- Phase 5+ benchmark table on RTX A5000 (4 490 -> 412 ms/frame on task_1_21s)
- Updated top-of-README speed-gains summary to reflect the end-to-end 10.9x
- Updated performance-flags table to point at the new defaults + correct fallbacks (--minimize_backend scipy, --mask_backend pyrender)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jishnujayakumar and others added 30 commits May 7, 2026 13:20

rfp-grasp-transfer: bump submodule for reset() reload revert (perf fix)

b74ba3e

rfp-grasp-transfer: bump submodule for Phase 5 (skip source plotly wh…

bcb5649

…en no viz)

rfp-grasp-transfer: bump submodule for FASTEN_PROFILE per-stage timing

bb675df

rfp-grasp-transfer: bump submodule for Phase 5 deepcopy snapshot (~2.…

0b604b1

…4x speedup)

rfp-grasp-transfer: bump submodule for Phase 6 (BatchedAdamGraspTrans…

8d28d7b

…fer, 2.78x at F=4)

rfp-grasp-transfer: bump submodule for industry-grade rich logging

62466bc

jishnujayakumar and others added 6 commits May 8, 2026 11:39

jishnujayakumar wants to merge 36 commits into main from jishnu/fasten-vie

jishnujayakumar commented May 7, 2026


1 participant
