Claude/pensive hoover 06d545#360
Open
lin285170 wants to merge 39 commits into
… serving docs.

- Refactor generate.py: build_parser, parse_args, args_from_job_dict for programmatic jobs
- Add generate_job.py for torchrun entrypoint
- Add serve/ package (FastAPI, Redis queue, multi-node launcher, worker)
- Add run_api_server.py, requirements_serve.txt, pyproject optional serve deps
- Add docker-compose.yml, docker/Dockerfiles, compose env example, .dockerignore
- Document deployment in DEPLOY_SERVE.md; extend .gitignore for placeholder ckpt

Co-authored-by: Cursor <cursoragent@cursor.com>
…nfig flexibility

- Extract shared pipeline logic (T5 encoding, VAE init, scheduler creation, model config, dual-expert switching, seed handling, distributed helpers) into WanPipelineBase in pipeline_base.py
- Refactor WanT2V, WanI2V, WanTI2V, WanS2V, WanAnimate to inherit from WanPipelineBase, reducing significant code duplication
- Remove hardcoded /home/HPCBase paths from serve/config.py and serve/launcher.py; conda_env and conda_exe now default to empty, LD_LIBRARY_PATH and OMP_NUM_THREADS are configurable via environment variables WAN_REMOTE_LD_LIBRARY_PATH and WAN_REMOTE_OMP_NUM_THREADS
- Add conda_env/conda_exe to Settings.from_env() for environment-based configuration
…f SSH

Two-node architecture: master (redis+api+worker0) and worker (worker1). Each node runs torchrun locally; NCCL/TCP rendezvous connects them. Master signals worker1 via Redis pub/sub to coordinate job dispatch.
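
A minimal sketch of how the master could signal worker1 over Redis pub/sub. The channel name, payload fields, and function names are assumptions for illustration and are not taken from the PR. The client is passed in, so any object with a redis-py style `publish(channel, message)` method works.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical channel name; the PR's actual channel is not shown.
JOB_CHANNEL = "wan:job_signal"


def dispatch_job(client: Any, job_id: str, master_addr: str, master_port: int) -> int:
    """Publish a job signal so the remote worker starts its local torchrun.

    `client` is assumed to expose publish(channel, message), as redis-py's
    Redis client does. Returns whatever publish() returns (for redis-py,
    the number of subscribers that received the message).
    """
    payload = json.dumps(
        {"job_id": job_id, "master_addr": master_addr, "master_port": master_port}
    )
    return client.publish(JOB_CHANNEL, payload)


def parse_signal(message: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Decode one pub/sub message dict; ignore subscribe confirmations."""
    if message.get("type") != "message":
        return None
    # json.loads accepts both str and bytes payloads.
    return json.loads(message["data"])
```

On the worker side, a loop over `pubsub.listen()` from redis-py would pass each message to `parse_signal` and launch torchrun for every non-None result.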
- DEPLOY.md: step-by-step dual-node deployment instructions
- tests/test_api.py: integration tests for all API endpoints (auth, task CRUD, file download)
- tests/test_serve.py: unit tests for config, job_build, worker routing, store signals, launcher, schemas
… s2v)

- schemas: add ModelEnum for valid model IDs, video/mask fields for animate
- api: per-model input validation (i2v requires image, animate requires video, s2v requires audio or enable_tts)
- job_build: fill default size per model when not provided
- tests: 37 unit tests covering all model types and validation rules
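
The per-model validation rules listed above could look roughly like this. It is a sketch only: the short model keys ("i2v", "animate", "s2v") and the function name are hypothetical stand-ins, and the PR's ModelEnum values and error format may differ.

```python
from typing import Any, Dict, List


def validate_inputs(model: str, params: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors for one request (empty if valid).

    Model keys here are illustrative; only the three rules named in the
    commit message are encoded.
    """
    errors: List[str] = []
    if model == "i2v" and not params.get("image"):
        errors.append("i2v requires image")
    if model == "animate" and not params.get("video"):
        errors.append("animate requires video")
    if model == "s2v" and not (params.get("audio") or params.get("enable_tts")):
        errors.append("s2v requires audio or enable_tts")
    return errors
```

An API handler would typically turn a non-empty result into a 400 response before the job reaches the queue.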
…ployment

job_build now auto-appends model-specific subdirectory to global ckpt_dir:
  wan2.2-t2v-a14b → /ckpt/Wan2.2-T2V-A14B
  wan2.2-i2v-a14b → /ckpt/Wan2.2-I2V-A14B
  etc.
parameters.ckpt_dir still overrides for custom paths. Just specify the model in the request and the system finds the right weights automatically.

Updated docker-compose, .env example, DEPLOY.md for multi-model layout.
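
A sketch of the checkpoint resolution described above, assuming a simple lookup table. Only the two mappings shown in the commit message are included; the function name, the table name, and the behavior for unknown models are assumptions.

```python
import posixpath
from typing import Optional

# Only the two mappings given in the commit message; the rest are elided there.
MODEL_SUBDIRS = {
    "wan2.2-t2v-a14b": "Wan2.2-T2V-A14B",
    "wan2.2-i2v-a14b": "Wan2.2-I2V-A14B",
}


def resolve_ckpt_dir(model: str, global_ckpt_dir: str,
                     override: Optional[str] = None) -> str:
    """Pick the checkpoint directory for a request.

    An explicit per-request override (parameters.ckpt_dir in the PR) wins;
    otherwise the model-specific subdirectory is appended to the global dir.
    """
    if override:
        return override
    try:
        subdir = MODEL_SUBDIRS[model]
    except KeyError:
        # Assumed behavior: fail loudly rather than guess a directory.
        raise ValueError(f"no checkpoint subdirectory known for model {model!r}")
    return posixpath.join(global_ckpt_dir, subdir)
```

For example, `resolve_ckpt_dir("wan2.2-t2v-a14b", "/ckpt")` yields `/ckpt/Wan2.2-T2V-A14B`, while passing `override="/custom/weights"` returns that path unchanged.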
Critical fix: generate.py has --src_root_path but no --video arg.
VideoInput.video is now mapped to src_root_path in job_build.py,
preventing ValueError("Unknown job field: video") in generate_job.py.
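
The mapping can be pictured as a rename table applied while building the job dict. This is an illustrative sketch: `FIELD_RENAMES` and `to_job_fields` are hypothetical names, and the real job_build.py may handle more fields.

```python
from typing import Any, Dict

# API field name -> generate.py argument name. Only the rename named in
# this commit is listed.
FIELD_RENAMES = {"video": "src_root_path"}


def to_job_fields(request_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Translate API request fields into job fields generate.py accepts.

    Unset (None) fields are dropped; renamed fields use the CLI-side name.
    """
    job: Dict[str, Any] = {}
    for key, value in request_fields.items():
        if value is None:
            continue
        job[FIELD_RENAMES.get(key, key)] = value
    return job
```

With this in place, a request carrying `video` produces a job with `src_root_path`, so the job never contains a field that generate_job.py would reject as unknown.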
Also fixes test_api.py:
- s2v test now includes required image field
- added per-model validation tests (i2v requires image, etc.)
- added animate and ti2v creation tests
- removed stale test_empty_prompt (now returns 400)
- fixed size format (use * instead of x)
…t issue

Docker bridge network isolates containers, preventing torchrun from binding and exposing port 29500 for NCCL rendezvous. This causes the "client socket timed out" error seen in production.

worker0 and worker1 now use network_mode: host (PyTorch's recommended approach for distributed training). Redis and API stay on bridge network. worker0 connects to Redis via localhost (WAN_REDIS_URL_LOCAL). WAN_MASTER_ADDR must use the host's real IP (not 0.0.0.0).
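
The WAN_MASTER_ADDR constraint could be enforced at startup with a check like the following. This is a sketch, not code from the PR: the function name is hypothetical, and it only rejects the unspecified address (0.0.0.0 or ::), leaving hostnames for the resolver.

```python
import ipaddress


def check_master_addr(addr: str) -> str:
    """Reject wildcard bind addresses as a rendezvous target.

    0.0.0.0 is valid for listening but gives other nodes nothing to
    connect to, so it cannot serve as the master address.
    """
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not an IP literal; assume it is a hostname and pass it through.
        return addr
    if ip.is_unspecified:
        raise ValueError(
            f"WAN_MASTER_ADDR={addr} is a wildcard address; use the host's real IP"
        )
    return addr
```

A launcher might call this on `os.environ["WAN_MASTER_ADDR"]` before starting torchrun, so a misconfiguration fails immediately instead of surfacing later as a socket timeout.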
…del_S2V)

The class was renamed in an earlier commit but __init__.py still imported the old name, causing ImportError when running generate_job.py.
The class in audio_encoder.py is named AudioEncoder, not Wav2Vec2Encoder. This ImportError prevented the s2v pipeline from loading.
animate.py imports MotionEncoder, but motion_encoder.py only defines Generator. Upstream has the same mismatch: the MotionEncoder wrapper class was missing. The wrapper loads the Generator from a checkpoint and exposes get_motion().
No description provided.