Make VIE faster #16
Open
jishnujayakumar wants to merge 36 commits into
…(default off)
The matplotlib overlay path in propagate_masks_and_save was creating a fresh figure per frame, opening the source image, redrawing every prior centroid each frame (O(N^2) over the run), and calling savefig, which dwarfed the actual mask cost. The binary mask PNGs (the output that downstream BundleSDF actually consumes) are now the only thing produced by default; the overlay is opt-in via --save_traj_overlay. When opted in, the figure is reused across frames and the centroid trail is appended incrementally rather than replotted from scratch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
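A minimal sketch of the incremental-trail idea in plain Python, with no plotting library. `mask_centroid` and `IncrementalTrail` are hypothetical names, not the code in robokit/perception.py. Each frame appends one centroid and returns only the newest segment, so a figure created once could draw a constant amount per frame instead of replotting every prior centroid.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def mask_centroid(mask: Sequence[Sequence[int]]) -> Optional[Point]:
    """Return the (x, y) centroid of the nonzero pixels, or None if the mask is empty."""
    sx = sy = n = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sx += x
                sy += y
                n += 1
    return (sx / n, sy / n) if n else None


class IncrementalTrail:
    """Centroid history for one tracked object (hypothetical helper).

    append() returns only the segment that is new this frame, so the caller can
    add one line to a reused figure rather than redrawing the whole trail.
    """

    def __init__(self) -> None:
        self.points: List[Point] = []

    def append(self, pt: Optional[Point]) -> Optional[Tuple[Point, Point]]:
        if pt is None:  # object lost this frame: keep the trail, draw nothing new
            return None
        self.points.append(pt)
        if len(self.points) < 2:
            return None
        return self.points[-2], self.points[-1]
```

With matplotlib, the returned segment would go to a single `ax.plot(...)` call on a figure created outside the frame loop, followed by `fig.savefig(...)`.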
Two no-quality-loss changes:
1. The Nelder-Mead translation refinement was running with xatol=1e-8 (very
tight), no maxiter (unbounded), and disp=True (per-call console I/O). It
typically converges in well under 50 iterations to within 1e-5 in metric
units; the tighter tolerance was buying nothing visible in the depth-aligned
mesh and dominated runtime. Defaults are now xatol=1e-5, maxiter=50,
disp=False, all overrideable via --opt_xatol / --opt_maxiter / --opt_disp.
2. The per-batch-item regression_img + side_img pyrender passes and the
{img_fn}_all.jpg overlay write are pure debug visualizations. They are now
gated behind --save_debug_renders (default off). cam_view itself still
renders since the depth_pc target_mask is derived from it.
Mesh outputs (model/, 3dhand/, scene/) and the optimized translation are
unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
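The flag-to-optimizer plumbing described above can be sketched as follows. The flag names and defaults come from this commit; the argument types, the `store_true` semantics, and the helper names `build_parser` / `nelder_mead_options` are assumptions for illustration.

```python
import argparse
from typing import Dict, Union


def build_parser() -> argparse.ArgumentParser:
    """CLI knobs named in the commit; types and store_true semantics are assumptions."""
    p = argparse.ArgumentParser(description="HaMeR translation refinement (sketch)")
    p.add_argument("--opt_xatol", type=float, default=1e-5,
                   help="Nelder-Mead absolute tolerance on x (metric units)")
    p.add_argument("--opt_maxiter", type=int, default=50,
                   help="cap on Nelder-Mead iterations")
    p.add_argument("--opt_disp", action="store_true",
                   help="print SciPy convergence messages")
    p.add_argument("--save_debug_renders", action="store_true",
                   help="also write the regression/side renders and {img_fn}_all.jpg")
    return p


def nelder_mead_options(args: argparse.Namespace) -> Dict[str, Union[float, int, bool]]:
    """Options dict in the shape scipy.optimize.minimize(method="Nelder-Mead") accepts."""
    return {"xatol": args.opt_xatol, "maxiter": args.opt_maxiter, "disp": args.opt_disp}
```

The dict would then be passed as `scipy.optimize.minimize(cost_fn, x0, method="Nelder-Mead", options=nelder_mead_options(args))`, where `cost_fn` and `x0` stand in for the real objective and starting point.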
Picks up Phase 1 grasp-transfer wins:
- Hoist AdamGraspTransfer out of per-frame loop
- Skip redundant target_handmodel reload in reset()
- Expose --max_iter (default 100, was 300)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Phase 1 wins for the BundleSDF leg:
1. Bumps the BundleSDF submodule to jishnu/fasten-vie@298918c, which adds -k (keep) and skip-rebuild handling to docker/start_docker.sh. Repeat launches reuse the running container instead of doing down + up --build each time.
2. Adds --n_step to run_bundlesdf.py, plumbed through BundleSDFProcessor into cfg_nerf['n_step']. Default unchanged (config.yml's value, currently 10), so this is a no-op at default; lower values trade reconstruction quality for NeRF training speed and are intended for Phase 3 tuning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
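Why a `--n_step` default of None is a no-op can be shown with a small override helper. `apply_n_step` is hypothetical (not the BundleSDFProcessor code), and the `>= 1` validation is an assumption.

```python
from typing import Any, Dict, Optional


def apply_n_step(cfg_nerf: Dict[str, Any], n_step: Optional[int]) -> Dict[str, Any]:
    """Return a copy of cfg_nerf with n_step overridden only when the flag was given.

    None means "keep config.yml's value", which is what makes the flag a no-op at default.
    """
    cfg = dict(cfg_nerf)  # never mutate the loaded config in place
    if n_step is not None:
        if n_step < 1:
            raise ValueError(f"--n_step must be >= 1, got {n_step}")
        cfg["n_step"] = n_step
    return cfg
```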
…tation
The per-bbox loop in run_gdino_samv2 was calling propagate_masks_and_save
once per detected bbox. Each call ran SAM2's init_state(video_path=video_dir),
which scans + caches every video frame — N times the I/O for N bboxes, even
though SAM2 supports tracking multiple objects simultaneously.
New propagate_masks_and_save_multi(video_dir, bboxes, ...) calls init_state
once, registers every bbox as its own object id on frame 0, and runs a single
propagate_in_video loop that yields all object masks per frame.
Filename rule: when len(bboxes) == 1 the saved mask path is unchanged; when
N > 1 each file is prefixed obj{i}_<frame>.png so masks no longer overwrite
each other (the prior code silently dropped all but the last bbox's masks).
A timing log line (init / propagate+save / total) prints at the end so users
can see the speedup directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
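A sketch of the filename rule, with the single-pass SAM2 loop outlined in comments. The SAM2 method names in the comments are from the public SAM2 video predictor; `mask_filename` and `mask_paths_for_frame` are hypothetical helpers, and the frame-stem format (e.g. a zero-padded index) is an assumption.

```python
import os
from typing import List

# Shape of the single-pass loop (SAM2VideoPredictor methods):
#   state = predictor.init_state(video_path=video_dir)        # frames scanned once
#   for i, box in enumerate(bboxes):
#       predictor.add_new_points_or_box(state, frame_idx=0, obj_id=i, box=box)
#   for frame_idx, obj_ids, masks in predictor.propagate_in_video(state):
#       ...  # one mask per obj_id for this frame, saved with the names below


def mask_filename(frame_stem: str, obj_idx: int, num_objs: int) -> str:
    """One bbox keeps the legacy name; N > 1 prefixes obj{i}_ so masks don't overwrite each other."""
    if num_objs == 1:
        return f"{frame_stem}.png"
    return f"obj{obj_idx}_{frame_stem}.png"


def mask_paths_for_frame(out_dir: str, frame_stem: str, num_objs: int) -> List[str]:
    return [os.path.join(out_dir, mask_filename(frame_stem, i, num_objs)) for i in range(num_objs)]
```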
…efaults
Three combined changes for real-time hand extraction:
1. Warm-start: cache the optimized translation_new per hand side (left=0, right=1) on HandInfoExtractor and feed it as x0 to the next frame's minimize() instead of mean(depth_pc.points). Hand poses change smoothly between frames, so this seeds the optimizer near the answer.
2. Aggressive defaults: xatol 1e-5 -> 1e-4 (≈0.1 mm), maxiter 50 -> 30. With warm-starting these are typically enough for sub-mm convergence; revert via --opt_xatol / --opt_maxiter if quality regresses.
3. Per-frame timing summary printed at the end of the run (avg ms/frame, total) so the speedup is observable without external profiling.
Disable warm-starting with --no_warm_start to A/B against the cold-start path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
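The per-side warm start can be sketched as a small cache. `TranslationWarmStart` is a hypothetical stand-in for the state kept on HandInfoExtractor; the side keys follow the commit (left=0, right=1), and `enabled=False` mirrors `--no_warm_start`.

```python
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]

LEFT, RIGHT = 0, 1  # hand-side keys, matching the commit's left=0 / right=1


def mean_point(points: Sequence[Vec3]) -> Vec3:
    """Cold-start seed: the mean of the depth point cloud."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


class TranslationWarmStart:
    """Per-side cache of the last optimized translation (hypothetical helper)."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled  # False mirrors --no_warm_start
        self._last: Dict[int, Vec3] = {}

    def x0(self, side: int, depth_points: Sequence[Vec3]) -> Vec3:
        """Seed for minimize(): the previous frame's answer if cached, else the depth-cloud mean."""
        if self.enabled and side in self._last:
            return self._last[side]
        return mean_point(depth_points)

    def update(self, side: int, translation: Vec3) -> None:
        if self.enabled:
            self._last[side] = translation
```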
…sive defaults)
Picks up jishnu/fasten-vie@bdca654 with:
- AdamGraspTransfer warm-start from prior frame's q_current
- num_particles 32 -> 16, max_iter 100 -> 50 defaults
- Per-frame timing summary
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iming
Phase 2/3 BundleSDF wins outside Docker:
1. Pre-cache: when not using a live segmenter, all masks are loaded into RAM in one pass before the frame loop instead of being read from disk inside the hot loop. Mask files are tens of KB each, so this adds at most a few MB of memory for hundreds of frames and removes per-frame disk seeks.
2. Aggressive default: --n_step now defaults to 5 (was None, i.e. config.yml's 10). NeRF training was running 10 iterations on every keyframe trigger; with continual=true this fires repeatedly. 5 is usually enough for tracking-accurate poses; raise it back to 10 if reconstruction quality regresses.
3. Per-frame timing summary printed at the end of process(), so the speedup is visible without external profiling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
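A sketch of the two mechanics above, mask pre-caching and a per-frame timing summary. `precache` and `FrameTimer` are hypothetical names; the real loader reads mask images from disk, and the summary wording here is assumed.

```python
import time
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")


def precache(keys: Iterable[str], load: Callable[[str], T]) -> Dict[str, T]:
    """Read every mask once, up front, so the frame loop only does dict lookups."""
    return {k: load(k) for k in keys}


class FrameTimer:
    """Accumulates per-frame wall time and formats an avg ms/frame + total summary."""

    def __init__(self) -> None:
        self.durations: List[float] = []

    def time(self, fn: Callable[[], T]) -> T:
        t0 = time.perf_counter()
        try:
            return fn()
        finally:
            self.durations.append(time.perf_counter() - t0)

    def summary(self, tag: str) -> str:
        n = len(self.durations)
        total = sum(self.durations)
        avg_ms = 1000.0 * total / n if n else 0.0
        return f"[{tag}] {n} frames, avg {avg_ms:.1f} ms/frame, total {total:.2f}s"
```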
Runs gdino+samv2, hamer, rfp-grasp-transfer, and bundlesdf on a given task dir, activating the appropriate conda env per module and logging to a timestamped file under the task dir. Each module's own timing instrumentation ([samv2] / [hamer] / [grasp-transfer] / [bundlesdf]) lands in the log together with a wall-clock '[bench] N/4 ... OK in Xs' summary per step. Skips any module whose inputs are missing (no /rgb, no /depth, no MANO models, no docker) so partial runs are useful.
Usage:
    ./bench_vie.sh /path/to/task_data_root [text_prompt]
    git checkout main && ./bench_vie.sh ...
    git checkout jishnu/fasten-vie && ./bench_vie.sh ...
    diff /path/to/task/bench_*.log
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
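bench_vie.sh itself is a shell script; the skip-or-time logic of one step can be sketched in Python as below. `run_step` is hypothetical, the "OK in Xs" line mirrors the format quoted above, and the SKIPPED line format is an assumption.

```python
import os
import time
from typing import Callable, List, Sequence


def run_step(idx: int, total: int, name: str, required: Sequence[str],
             fn: Callable[[], None], log: List[str]) -> bool:
    """Run one pipeline module, or skip it when any required input path is missing."""
    missing = [p for p in required if not os.path.exists(p)]
    if missing:
        log.append(f"[bench] {idx}/{total} {name} SKIPPED (missing: {', '.join(missing)})")
        return False
    t0 = time.perf_counter()
    fn()
    log.append(f"[bench] {idx}/{total} {name} OK in {time.perf_counter() - t0:.1f}s")
    return True
```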
Adds Python 3.10 / PyTorch / CUDA / MIT license / IRVL UTD lab badges plus upstream attribution for GroundingDINO, SAM 2, HaMeR, and BundleSDF at the top of vie/README.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er module
Adds a per-module benchmark section and TOC entries:
- GDINO+SAMv2: measured 7.86x speedup on robokit/perception.py::propagate_masks_and_save (139.70s -> 17.77s, 1995.6 -> 253.8 ms/frame) on task_39_seasoning_on_omlette_v1 with a single bbox on an RTX 5070 Laptop. Measured with a SAM2-only mini-bench that bypasses GDINO; the win comes from gating per-frame matplotlib overlay generation behind --save_traj_overlay (off by default).
- HaMeR / rfp-grasp-transfer / BundleSDF: described with the underlying mechanism (warm-starts, hoisting, gated debug renders, persistent docker, smaller particle batches), the per-module timing log to look for, and an honest note that they were not measured on the dev machine because of missing MANO models, the Blackwell-incompatible torch in robokit-py3.10, and missing Docker. Points users to scripts/bench_vie.sh for end-to-end A/B on a working rig.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…th measured numbers
Replaces the "not measured here" sections with real A/B numbers from isolated micro-benches that bypass the environment walls (no MANO, no Blackwell torch needed):
- HaMeR: 209.57s -> 32.27s (70 frames, 138 calls), 6.49x speedup. The bench drives off existing pred_vertices/pred_cam_t in out/hamer/model/*.npz, so MANO and the HaMeR forward pass aren't required; only the scipy minimize stage differs between branches.
- rfp-grasp-transfer: ~1240 ms/frame -> ~870 ms/frame, ~1.5x. Synthetic smooth-walking q on CPU (the robokit env's torch 2.3.1+cu118 lacks sm_120). Smaller than the survey's "expected 4-8x"; the section discusses why (thermal noise, fixed reset() cost on CPU) and notes that the GPU speedup should be larger.
Also documents the Phase 1 reset() reload-skip that benchmarking caught as a regression and that was reverted in 2071aab: a real "the obvious optimization is the wrong one" finding.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two stacking changes for Phase 4.
1. HaMeR fp16 autocast on the transformer forward pass: torch.amp.autocast(dtype=fp16) wrapped around self.model(batch). Active on CUDA only (no-op on CPU); enabled by default. Expected ~1.5-2x speedup on the model forward with no observable mesh-quality regression. Disable via --no_fp16 to fall back to fp32. Note: this change is code-only on the dev rig (no MANO models, and the robokit-py3.10 conda env's torch lacks Blackwell sm_120 kernels), so the speedup is not measured here. It will only show up on a working rig where extract_hand_bboxes_and_meshes.py actually runs.
2. Bump rfp-grasp-transfer submodule to jishnu/fasten-vie@f8badfb (Phase 4 correspondence cache + jittered particle init).
Investigated and rejected: cKDTree as a drop-in for sklearn.neighbors.KDTree in hamer/mesh_to_sdf/rgbd2pc.py. The cKDTree bench ran for 17+ minutes before being killed, versus 4 minutes for sklearn; cKDTree is slower for this query pattern (777 verts against ~300k depth points, 138 queries per scipy minimize call). Sticking with sklearn KDTree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
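The CUDA-only, opt-out gating can be isolated from torch itself. `autocast_kwargs` is a hypothetical helper; `torch.autocast(device_type, dtype, enabled)` is the real PyTorch context manager, shown only in comments so the sketch runs without torch installed.

```python
from typing import Any, Dict


def autocast_kwargs(device_type: str, no_fp16: bool) -> Dict[str, Any]:
    """Keyword arguments for torch.autocast around the HaMeR forward pass (sketch).

    fp16 is enabled only on CUDA and only when --no_fp16 was not passed; everywhere
    else the context manager is built with enabled=False and does nothing.
    """
    return {"device_type": device_type, "enabled": device_type == "cuda" and not no_fp16}


# Intended use, assuming torch is installed and `model`, `batch`, `args`, `device` exist:
#   import torch
#   with torch.autocast(dtype=torch.float16, **autocast_kwargs(device.type, args.no_fp16)):
#       out = model(batch)
```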
…ee rejected
Adds Phase 4 sections to the existing HaMeR and rfp-grasp-transfer benchmark blocks:
- HaMeR: documents fp16 autocast on the transformer fwd (default on, --no_fp16 disables) and explicitly notes that cKDTree-as-drop-in for sklearn.neighbors.KDTree was investigated and rejected (slower for the 777 verts × 300k depth points × 138 queries-per-minimize pattern).
- rfp-grasp-transfer: documents the correspondence cache and jittered-particle-init wins, plus the CPU re-bench (within Phase 3 noise; gains are convergence-quality, not raw wall-time on CPU).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standing this up on a fresh Blackwell laptop revealed five real walls that
the original setup_vie.sh + requirements.txt didn't anticipate. This commit
captures every workaround so a clean re-install just works.
Specifically:
- setup_vie.sh
  - pin transformers==4.47.1 (>=5 dropped BertModel.get_head_mask, which the
    pinned old GroundingDINO needs)
  - pin setuptools<70 before installing mmcv (legacy mmcv setup.py imports
    pkg_resources, which newer setuptools dropped)
  - install mmcv==1.5.0 explicitly with --no-build-isolation (HaMeR's pinned
    mmcv==1.3.9 fails to build on Python 3.10 toolchains, and mmpose 0.24
    only accepts mmcv in [1.3.8, 1.5.0])
  - install hamer with --no-deps so its strict mmcv pin doesn't undo the above
  - apply an in-place patch to groundingdino's ms_deform_attn.py so it falls
    back to the pure-PyTorch implementation when the _C CUDA extension isn't
    built (the common case: the pip wheel ships no _C, and source builds need
    a matching CUDA toolchain)
  - re-pin numpy<2 after HaMeR's editable install (HaMeR drags in numpy>=2,
    which breaks matplotlib and many C extensions)
  - print a clear MANO + Blackwell-torch reminder at the end
- requirements.txt
  - pin transformers==4.47.1
  - pin setuptools<70 (build-time)
  - add the deps that hamer needs but its setup.py doesn't list cleanly
    (yacs, smplx, einops, jaxtyping, iopath, fvcore, omegaconf, hydra-core,
    pytorch_lightning, torchmetrics, timm, huggingface_hub, tokenizers,
    safetensors)
- hamer/setup.py
  - relax mmcv==1.3.9 to mmcv>=1.3.8,<=1.5.0 with a comment explaining why
- robokit/perception.py
  - emit a clear, actionable warning at import time if groundingdino._C is
    missing AND ms_deform_attn.py hasn't been patched (so users know to
    re-run setup_vie.sh)
- README.md
  - new "Install Gotchas" section documenting all five workarounds so users
    debugging a fresh install can map a symptom to a fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
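The "use the compiled kernel if it was built, else fall back to pure PyTorch" decision can be sketched without importing GroundingDINO. This is not the actual ms_deform_attn.py patch; `extension_available` and `pick_deform_attn_backend` are hypothetical names, and only `importlib.util.find_spec` and `warnings.warn` are real APIs here.

```python
import importlib.util
import warnings


def extension_available(module_name: str) -> bool:
    """True if module_name can be located, without executing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages; a missing parent raises ModuleNotFoundError.
        return False


def pick_deform_attn_backend(ext_name: str = "groundingdino._C") -> str:
    """Choose the compiled kernel when present, else the pure-PyTorch path (sketch)."""
    if extension_available(ext_name):
        return "cuda_ext"
    warnings.warn(f"{ext_name} not built; falling back to the pure-PyTorch "
                  "multi-scale deformable attention (slower)")
    return "pytorch"
```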
chumpy 0.70 (used by smplx to unpickle MANO .pkl files) does from numpy import bool, int, float, complex, object, unicode, str, nan, inf, which fails on numpy 1.24+, where these aliases of Python builtins (deprecated since numpy 1.20) were removed from the numpy namespace. Patch chumpy's __init__.py in place to set the aliases on numpy before the legacy import line, so MANO loading succeeds without needing an outdated numpy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
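A sketch of an idempotent in-place patch: insert a small shim right before chumpy's legacy import line. The marker comment and `patch_chumpy_source` are hypothetical, and the sketch assumes the import sits at top level (an indented import would need the shim indented to match).

```python
MARKER = "# vie-patch: numpy alias shim"
LEGACY_IMPORT = "from numpy import bool, int, float, complex, object, unicode, str, nan, inf"
SHIM = (
    f"{MARKER}\n"
    "import numpy as _np\n"
    "for _n, _t in (('bool', bool), ('int', int), ('float', float), ('complex', complex),\n"
    "               ('object', object), ('unicode', str), ('str', str)):\n"
    "    if not hasattr(_np, _n):\n"
    "        setattr(_np, _n, _t)\n"
)


def patch_chumpy_source(src: str) -> str:
    """Insert SHIM on the line before chumpy's legacy numpy import; idempotent."""
    if MARKER in src:
        return src  # already patched
    idx = src.find(LEGACY_IMPORT)
    if idx == -1:
        return src  # nothing to patch (different chumpy version)
    line_start = src.rfind("\n", 0, idx) + 1
    return src[:line_start] + SHIM + src[line_start:]
```

In use, the source would be read from the file reported by `importlib.util.find_spec("chumpy").origin`, passed through the function, and written back.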
…ight GPUs The full HaMeR pipeline loads ViTDet-Huge (~2.5GB) + ViTPose (~1.2GB) + the HaMeR transformer (~4GB) + BERT simultaneously, which OOMs on 8GB cards (e.g. RTX 5070 Laptop). The detector module already supports a 'regnety' alternative that's ~10x smaller; this change wires it through to the CLI as --body_detector. Default stays 'vitdet' (no behavior change for users with plenty of VRAM). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fer, 2.78x at F=4)
…amGraspTransfer)
Adds two new sub-sections under the rfp-grasp-transfer benchmark block:
Phase 5: deepcopy snapshot in reset()
Profiling found URDF reload in reset() was ~98% of remaining per-frame
cost with high variance (150-500 ms). Replaced with copy.deepcopy of a
snapshot taken at __init__. After: 267-277 ms rock-solid, 1.67 it/s.
Phase 6: BatchedAdamGraspTransfer (frame batching)
Process N frames in a single Adam call. Measured on RTX 5070 Laptop:
F      ms/frame   wall   speedup (wall)
F=1    1486       119s   1.00x
F=4     301        43s   2.78x   <- sweet spot
F=8     425        47s   2.51x
F=16    331        47s   2.52x
Verified: all 70 frames produced PLYs in both paths (138 PLYs rather than
140, because two frames have only a left hand in the original HaMeR output).
Combined Phases 1-6 = ~5x wall-clock vs main on this rig.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
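Both mechanisms can be sketched in a few lines: reset() restoring from a deep copy of a snapshot taken once at construction, and grouping frames into batches of F for one batched optimizer call. `SnapshotReset`, `loader`, and `frame_batches` are hypothetical; the sketch assumes the loaded object is deep-copyable (objects holding raw ctypes pointers, for example, are not).

```python
import copy
from typing import Any, Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class SnapshotReset:
    """Load an expensive object once, then reset() by deep-copying a pristine snapshot."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._snapshot = loader()                     # the only expensive load (e.g. parse a URDF)
        self.model = copy.deepcopy(self._snapshot)

    def reset(self) -> None:
        self.model = copy.deepcopy(self._snapshot)    # per-frame cost: an in-memory copy, no reload


def frame_batches(frames: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of F frames for one batched call; the last group may be short."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(frames), batch_size):
        yield list(frames[start:start + batch_size])
```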
Profiling showed 96% of per-frame time in the GSAM pipeline is the SAM2
forward pass itself; the only meaningful lever left is the model size.
Adding --sam2_size {large|base_plus|small|tiny} as a CLI flag (default
'large' = no behavior change for existing users) with auto-download of
the checkpoint on first use.
Measured on RTX 5070 Laptop, task_39 (70 frames):
sam2_size   propagate+save   total    wall     note
large       13.69s           26.34s   85.90s   baseline
base_plus    7.17s           17.33s   67.13s   1.91x propagate
Per-frame steady-state: 196ms (large) -> 102ms (base_plus). Quality drop on
clean foreground objects has been minimal in our spot checks; small/textured
objects may need 'large'. Smaller variants ('small', 'tiny') wired but
unbenched here.
The init_state cost is roughly model-independent (~10s, dominated by JPEG
decode of all frames) so wall-clock ratio is smaller than propagate ratio.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
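A sketch of resolving `--sam2_size` to a local checkpoint and downloading only on first use. The file names below follow the SAM 2.1 release naming and are an assumption, as is `resolve_checkpoint`; the fetcher is injected so the sketch needs no network.

```python
import os
from typing import Callable, Dict

# Assumed checkpoint file names (SAM 2.1 naming); the mapping in perception.py may differ.
SAM2_CHECKPOINTS: Dict[str, str] = {
    "large": "sam2.1_hiera_large.pt",
    "base_plus": "sam2.1_hiera_base_plus.pt",
    "small": "sam2.1_hiera_small.pt",
    "tiny": "sam2.1_hiera_tiny.pt",
}


def resolve_checkpoint(size: str, ckpt_dir: str, download: Callable[[str, str], None]) -> str:
    """Return a local checkpoint path for --sam2_size, fetching it only if it is not on disk."""
    if size not in SAM2_CHECKPOINTS:
        raise ValueError(f"--sam2_size must be one of {sorted(SAM2_CHECKPOINTS)}, got {size!r}")
    path = os.path.join(ckpt_dir, SAM2_CHECKPOINTS[size])
    if not os.path.isfile(path):
        os.makedirs(ckpt_dir, exist_ok=True)
        download(SAM2_CHECKPOINTS[size], path)  # caller-supplied fetcher
    return path
```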
SAM2's load_video_frames_from_jpg_images does sequential PIL JPEG decode + resize per frame. On a 70-frame clip this was ~10s of upfront cost, dominated by single-threaded JPEG decode. NVIDIA DALI with nvJPEG GPU decoding processes all frames through a single pipeline run, returning the same (N, 3, image_size, image_size) ImageNet-normalized tensor SAM2 expects.
Implementation: at import time, perception.py tries to install DALI as a monkey-patch for sam2.utils.misc.load_video_frames_from_jpg_images. If DALI isn't installed, the original PIL loader stays in place (zero behavior change).
Measured on RTX 5070 Laptop, task_39 (70 frames, --sam2_size base_plus):
  no DALI:   init=10.15s  propagate=7.17s  total=17.33s  wall=67.13s
  with DALI: init=0.93s   propagate=6.58s  total=7.51s   wall=17.55s
init_state: 10.9x faster. Full wall-clock: 3.8x over Phase 7 alone (4.9x over Phase 1 baseline).
Caveats:
- async_loading_frames=True still uses the original loader (DALI's eager pipeline doesn't fit the lazy-frame use case).
- batch_size = num_frames; very long videos (1000+ frames) may need a chunked DALI pipeline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
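A sketch of the "patch only if the optional dependency is importable" pattern, including the fallback for `async_loading_frames=True`. `install_fast_loader` and `make_wrapper` are hypothetical; they operate on any module object, so the sketch needs neither SAM2 nor DALI. Replacing a module attribute only affects callers that look the name up through that module at call time, not code that bound it earlier with `from module import name`.

```python
import importlib.util
import types
from typing import Callable


def install_fast_loader(module: types.ModuleType, attr: str,
                        make_fast: Callable[[Callable], Callable],
                        requires: str = "nvidia.dali") -> bool:
    """Swap module.attr for a faster implementation if `requires` is importable.

    Returns True if patched; otherwise leaves the original in place (zero behavior change).
    """
    try:
        available = importlib.util.find_spec(requires) is not None
    except (ImportError, ValueError):
        available = False
    if not available:
        return False
    setattr(module, attr, make_fast(getattr(module, attr)))
    return True


def make_wrapper(fast_impl: Callable[..., object]) -> Callable[[Callable], Callable]:
    """Build a factory whose loader delegates lazy (async) loading to the original."""
    def factory(original: Callable) -> Callable:
        def loader(*args, async_loading_frames: bool = False, **kwargs):
            if async_loading_frames:  # eager batch pipeline doesn't fit lazy loading
                return original(*args, async_loading_frames=True, **kwargs)
            return fast_impl(*args, **kwargs)
        return loader
    return factory
```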
…noise
Replaces the pile of ad-hoc logging.info / print calls + the ~12 lines of
upstream library deprecation/registry warnings that fired on every run with:
- A new robokit/log.py module: rich-based Console, RichHandler, plus helper
functions section() / step() / note() / warn() / error() / success() /
progress() / summary() / fmt_duration() / fmt_rate(). Reusable across all
vie entry points.
- run_gdino_samv2.py:
- top-of-file: warnings.filterwarnings("ignore"), TRANSFORMERS_VERBOSITY
and PYTHONWARNINGS env vars, absl.set_verbosity(WARNING), and root-logger
level WARNING. Kills upstream chatter.
- main() restructured into Configuration / Loading / Detection / Tracking
sections with timed step lines and a final colored summary panel.
- Wraps GDINO + SAM2 init in a stdout-redirect context manager to swallow
BERT's `final text_encoder_type` print and similar one-shot stdouts.
- perception.py:
- Demoted noisy print() calls in load_model_hf + _load_predictor to
logger.debug.
- Removed the redundant defensive _C warning (was firing even when the
GDINO patch was already applied due to a logic bug; the patch's own
one-line warning suffices).
- Demoted the [samv2] timing log line to debug since callers now render
their own rich summary with these numbers.
Visual outcome:
• Cyan section rules, green ✓ for steps, dim grey notes for config echoes.
• Bordered cyan summary panel at end with objects/frames/detection/
propagation/total wall/fps/output path.
• Bold green "✓ Done." banner on success.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
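Two of the mechanics above, swallowing one-shot library prints with `contextlib.redirect_stdout` and keeping status output visible while that redirect is active, can be sketched with the standard library. `StatusPrinter` and `swallow_stdout` are hypothetical names; rich's `Console(file=...)` offers a comparable way to pin output to a specific stream.

```python
import contextlib
import io
import sys
from typing import Iterator, TextIO


class StatusPrinter:
    """Writes to whatever sys.stdout was at construction time (hypothetical).

    Holding that reference lets status lines survive a later
    contextlib.redirect_stdout(...) used to silence third-party prints.
    """

    def __init__(self) -> None:
        self._stream: TextIO = sys.stdout

    def line(self, msg: str) -> None:
        self._stream.write(msg + "\n")
        self._stream.flush()


@contextlib.contextmanager
def swallow_stdout() -> Iterator[io.StringIO]:
    """Capture one-shot prints (e.g. a model's init banner) into a buffer that is then ignored."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        yield buf
```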
… waits
Previously each long-running stage (GroundingDINO load, SAM2 load + ckpt download, GDINO inference, propagate init_state) ran silently and only emitted a "✓ X done in Yms" line *after* completing. On a fresh run that's ~15s of staring at nothing.
Adds vlog.working(msg), a context manager that:
- shows a live spinner with the message during the op
- replaces the spinner line in-place with "✓ msg (Xs)" on success
- replaces it with "✗ msg (failed after Xs)" and re-raises on exception
Wires it into run_gdino_samv2.py for each long stage so users see exactly what's happening at all times. The SAM2 propagate stage already has its own tqdm bar, so we just print a "propagating ..." note before it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
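The success/failure contract of a `working(msg)`-style context manager can be sketched without the live spinner. This `working` is a hypothetical stand-in for vlog.working: it prints plain lines instead of redrawing one line in place (rich's `Console.status` is one way to get the spinner), and the `stream` parameter is an assumption added for testability.

```python
import sys
import time
from contextlib import contextmanager
from typing import Iterator, Optional, TextIO


@contextmanager
def working(msg: str, stream: Optional[TextIO] = None) -> Iterator[None]:
    """Announce a long step, then report ✓ with elapsed time, or ✗ and re-raise on failure."""
    out = stream if stream is not None else sys.stdout
    out.write(f"... {msg}\n")
    t0 = time.perf_counter()
    try:
        yield
    except Exception:
        out.write(f"✗ {msg} (failed after {time.perf_counter() - t0:.1f}s)\n")
        raise
    out.write(f"✓ {msg} ({time.perf_counter() - t0:.1f}s)\n")
```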
The heavy ML imports (torch / GroundingDINO / SAM2 / robokit.perception) take ~5-8s on a fresh process and happen *before* main() runs, so previously the user saw a blank terminal during that whole window.
Restructured the script: only the lightweight imports (os, sys, time, vlog) happen at the very top. Once vlog is available we immediately print the section header + start a spinner labeled "Importing ML stack ...", then do the heavy imports inside that spinner's context. The spinner is replaced in-place with the usual ✓ confirmation when imports finish.
Also moved the post-import flag-definition + logger-quieting steps inside the spinner block (they're trivial after imports anyway), and renamed the in-main "GDINO + SAMv2" header to "Configuration" since the top-level banner already announces what script we're in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Profiled `python -X importtime` and found four heavy modules being imported
at top of perception.py that the GDINO+SAM2-video hot path doesn't touch:
pyrender -> drags in pyglet (with GL context init), tkinter, freetype,
imageio plugin registry. ~5s. Used only by DepthPC
visualization helpers (vis=True paths).
mobile_sam -> pulls in MobileSAM encoders + SamPredictor. ~2s. Used only
by SegmentAnythingPredictor (mobile-Sam, not SAM2-video).
matplotlib + cm -> ~1s. Used only by the opt-in trajectory-overlay path.
sklearn.neighbors.KDTree -> ~1s. Used only inside DepthPC.
Moved each import to its actual use site (inside class __init__ / method /
gated branch). The imports still happen lazily on first use — no behavior
change for existing callers — but the GDINO+SAM2-video script no longer
pays the cost.
Measured on RTX 5070 Laptop, --sam2_size base_plus:
before: Import ML stack 13.00s | GDINO load 12.37s | SAM2 load 13.23s | total 43s
after: Import ML stack 2.63s | GDINO load 4.33s | SAM2 load 0.72s | total 8s
~5x less wall-clock wait before propagation actually starts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
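The move-imports-to-the-use-site pattern can be sketched with `importlib.import_module`. `ensure_imports`, `IMPORT_MS`, and `DepthCloudSketch` are hypothetical; the standard-library modules in the test stand in for pyrender, mobile_sam, matplotlib, or sklearn. Python already caches modules in `sys.modules`, so a repeated import is cheap; the saving comes from never paying the first import on paths that don't need it (`python -X importtime` shows where that cost goes).

```python
import importlib
import time
from types import ModuleType
from typing import Dict, Iterable

_LOADED: Dict[str, ModuleType] = {}
IMPORT_MS: Dict[str, float] = {}


def ensure_imports(names: Iterable[str]) -> Dict[str, ModuleType]:
    """Import each named module on first request only, recording how long it took.

    Call this at the use site (inside a method or an opt-in branch), so importing
    the enclosing module never pays for the heavy dependency.
    """
    wanted = tuple(names)
    for name in wanted:
        if name not in _LOADED:
            t0 = time.perf_counter()
            _LOADED[name] = importlib.import_module(name)
            IMPORT_MS[name] = 1000.0 * (time.perf_counter() - t0)
    return {name: _LOADED[name] for name in wanted}


class DepthCloudSketch:
    """Hypothetical class whose heavy dependency is imported in __init__, not at module top."""

    def __init__(self, module_name: str = "json") -> None:
        self._backend = ensure_imports([module_name])[module_name]
```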
…ransfer
Adds a single CLI flag name (--save_viz) to all three entry-point scripts that switches on the per-script viz/debug output. Existing per-script flags (--save_traj_overlay, --save_debug_renders, --debug_plots) keep working as before for fine-grained control; --save_viz is the convenience alias users asked for so they don't have to remember three different names.
  run_gdino_samv2.py     : --save_viz -> save_traj_overlay
  hamer/extract_*.py     : --save_viz -> save_debug_renders
  rfp/transfer_from_*.py : --save_viz -> debug_plots
Bumps the rfp-grasp-transfer submodule pointer to pick up the same flag + the batched-path Plotly HTML output that pairs with it.
Also fixes a latent bug exposed when actually exercising the overlay path after the lazy-import refactor: SAM2VideoPredictor.show_mask referenced a module-level `plt` that no longer exists. Now imports plt locally.
Verified end-to-end:
  GSAM --save_viz: 70 trajectory-overlay PNGs written.
  rfp --save_viz (per-frame and batched): 138 Plotly HTMLs written.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
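One way to wire the alias with argparse is sketched below for run_gdino_samv2.py: `--save_viz` can only switch the specific flag on, never force it off. `parse_args` here is a hypothetical minimal parser, not the script's full argument set.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="run_gdino_samv2 (sketch)")
    p.add_argument("--save_traj_overlay", action="store_true",
                   help="write per-frame trajectory overlay PNGs")
    p.add_argument("--save_viz", action="store_true",
                   help="convenience alias: turn on this script's viz/debug output")
    args = p.parse_args(argv)
    # The alias ORs into the specific flag, so either spelling enables the overlay.
    args.save_traj_overlay = args.save_traj_overlay or args.save_viz
    return args
```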
…ch UX
Adds a new "✨ What's new on jishnu/fasten-vie" section near the top
summarizing the speed gains, the speed-vs-quality flag table, and the
unified --save_viz flag that's now consistent across run_gdino_samv2,
extract_hand_bboxes_and_meshes, and transfer_from_hamer.
Updates the per-step examples (Steps 3, 4, 5) to:
- drop --debug_plots from the rfp example (no longer required for normal
operation; only when you want plotly viz)
- mention --save_viz, --sam2_size, --frame_batch_size, --no_warm_start,
--body_detector inline as opt-in knobs
- clarify that the extra_plots / transfer_extra_plots dirs are populated
only when --save_viz is on; downstream pipeline data (binary masks,
MANO npzs, scene PLYs, gripper PLYs, BundleSDF poses) is always written
Also notes the new colored-banner / spinner / summary-panel UX backed by
vie/robokit/log.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Renames every "jishnu/fasten-vie" reference to just "fasten-vie", and the
"What's new on jishnu/fasten-vie" section header to "Performance & UX
improvements". The branch is local convention; the README shouldn't read
like one person's effort.
The two remaining `jishnujayakumar/{robokit,BundleSDF}` URLs in the
Acknowledgments are upstream-fork repo links that the project actually
depends on at the package level, not author attribution — left intact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three top-level imports in the HaMeR module that the hot path doesn't exercise on every frame:
  open3d         -> only used in save_point_cloud + save_point_cloud_as_ply
  pyrender       -> only used in compute_sdf_cost(vis=True), opt-in viewer
  sklearn KMeans -> only used in RGBD2PC.__init__(use_kmeans=True), opt-in
Moved each to its use site. Behavior unchanged; startup is faster on the extract script, especially when --help / sanity-check paths run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ache
extract_hand_bboxes_and_meshes.py end-to-end optimization on task_1_21s: 4 490 → 412 ms/frame (12m 25s → 1m 8s wall), no quality regression vs the original.
Pipeline changes:
- torch Adam minimize on GPU replaces scipy Nelder-Mead (batched across both hands within a frame, patience early-stop, optimizer state pre-allocated)
- cv2 convex-hull hand mask (~5 ms) replaces pyrender (~370 ms); the pyrender Renderer is lazy-loaded and only constructed under --save_viz
- Processing/viz mode split: scene + 3dhand PLYs are viz-only, gated via --save_scene_pcd / --save_3dhand_pcd / --save_viz; the rfp-ready default writes only model/*.npz (everything rfp consumes)
- --detector_stride N caches ViTDet results across N frames, with auto-redetect on hand-keypoint loss
- Binary PLY writes everywhere (transparent to trimesh.load_mesh)
- cam_K.txt cached on the extractor (was np.loadtxt'd every frame)
- Async NPZ writes via ThreadPool
- --frame_batch_size K: cross-frame batched torch Adam over all hands x K frames (K=2: +24% speed, median 11 mm drift; K>2 not recommended)
- Lazy ML-stack import via _ensure_heavy_imports: module load 19s -> 1.5s
- ViTDet + ViTPose pickled to ~/.cache/hamer/ on first run; a cache hit saves ~18s on subsequent runs (the HaMeR LightningModule has a ctypes pointer in smplx's MANO wrapper, so it stays on the fresh-load path)
- Per-step rich UI: 4 import spinners (HaMeR, detectron2, pyrender, mmpose) + 3 model-load spinners; vlog.working survives stdout redirects
- Latent _save_meshes return-arity bug fixed: it returned 4 values while the caller unpacked 3, so every frame raised a ValueError that went unreported while the writes still completed
mesh_to_sdf/rgbd2pc.py:
- Otsu-on-z 1D threshold replaces sklearn KMeans for the depth-pc cluster filter (~124 -> 5 ms/frame; partitions ~98.5% identical on tabletop scenes where the foreground/background split is along the depth axis)
- Vectorized RGB-to-point projection (was a Python loop over ~300k points)
- Binary PLY writes
robokit/log.py:
- Snapshot sys.stdout at Console construction so vlog.working / vlog.progress spinners survive contextlib.redirect_stdout used for third-party noise suppression (otherwise "Loading models" went blank for ~45s while the spinner output was being captured into the silenced buffer)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
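A pure-Python sketch of Otsu's method on the z coordinate, splitting a depth cloud into near and far clusters. `otsu_threshold` and `split_by_depth` are hypothetical names, and treating the nearer cluster as foreground is an assumption about tabletop scenes. The ~5 ms/frame figure above would come from a vectorized version (e.g. NumPy histogram plus cumulative sums), not from this loop.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def otsu_threshold(values: Sequence[float], bins: int = 256) -> float:
    """Otsu's threshold on a 1-D sample: the bin edge maximizing between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    w0, sum0 = 0, 0.0
    best_var, best_cut = -1.0, lo
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # w0 * w1 * (mu0 - mu1)^2 is proportional to the between-class variance.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_cut = var_between, lo + (i + 1) * width
    return best_cut


def split_by_depth(points: Sequence[Point]) -> Tuple[List[Point], List[Point]]:
    """Partition points into (near, far) by thresholding z; near is assumed to be foreground."""
    t = otsu_threshold([p[2] for p in points])
    return [p for p in points if p[2] < t], [p for p in points if p[2] >= t]
```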
…ng/viz split
Brings the README in sync with d39693b. Adds:
- Default (processing-mode) + viz-mode run commands for HaMeR
- New flags: --frame_batch_size, --detector_stride, --torch_steps/min_steps/tol/lr, --mask_backend, --minimize_backend, --save_scene_pcd, --save_3dhand_pcd, --parallel_load, --no_model_cache
- Output gating: model/ always written; 3dhand/, scene/, extra_plots/ are opt-in via --save_viz or the per-stage flags
- Model cache section (~/.cache/hamer/, ~5 GB, ~18 s saved on warm runs)
- Phase 5+ benchmark table on RTX A5000 (4 490 -> 412 ms/frame on task_1_21s)
- Updated top-of-README speed-gains summary to reflect the end-to-end 10.9x
- Updated performance-flags table to point at the new defaults + correct fallbacks (--minimize_backend scipy, --mask_backend pyrender)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No description provided.