Claude/pensive hoover 06d545 by lin285170 · Pull Request #360 · Wan-Video/Wan2.2

lin285170 · 2026-05-14T04:05:07Z

No description provided.

… serving docs.
- Refactor generate.py: build_parser, parse_args, args_from_job_dict for programmatic jobs
- Add generate_job.py for torchrun entrypoint
- Add serve/ package (FastAPI, Redis queue, multi-node launcher, worker)
- Add run_api_server.py, requirements_serve.txt, pyproject optional serve deps
- Add docker-compose.yml, docker/Dockerfiles, compose env example, .dockerignore
- Document deployment in DEPLOY_SERVE.md; extend .gitignore for placeholder ckpt

Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

…nfig flexibility
- Extract shared pipeline logic (T5 encoding, VAE init, scheduler creation, model config, dual-expert switching, seed handling, distributed helpers) into WanPipelineBase in pipeline_base.py
- Refactor WanT2V, WanI2V, WanTI2V, WanS2V, WanAnimate to inherit from WanPipelineBase, significantly reducing code duplication
- Remove hardcoded /home/HPCBase paths from serve/config.py and serve/launcher.py; conda_env and conda_exe now default to empty, and LD_LIBRARY_PATH and OMP_NUM_THREADS are configurable via the environment variables WAN_REMOTE_LD_LIBRARY_PATH and WAN_REMOTE_OMP_NUM_THREADS
- Add conda_env/conda_exe to Settings.from_env() for environment-based configuration
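
The environment-based configuration described above could look roughly like the following. This is a minimal sketch, not the contents of serve/config.py: the field names and the `WAN_CONDA_ENV` / `WAN_CONDA_EXE` variable names are assumptions made for illustration; only `WAN_REMOTE_LD_LIBRARY_PATH`, `WAN_REMOTE_OMP_NUM_THREADS`, and the empty defaults for conda_env/conda_exe come from the commit message.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Hypothetical subset of fields; the real Settings class may differ.
    conda_env: str = ""
    conda_exe: str = ""
    remote_ld_library_path: str = ""
    remote_omp_num_threads: Optional[int] = None

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Accept an explicit mapping so the loader can be exercised without
        # touching the real process environment.
        env = os.environ if env is None else env
        threads = env.get("WAN_REMOTE_OMP_NUM_THREADS", "").strip()
        return cls(
            conda_env=env.get("WAN_CONDA_ENV", ""),  # assumed variable name
            conda_exe=env.get("WAN_CONDA_EXE", ""),  # assumed variable name
            remote_ld_library_path=env.get("WAN_REMOTE_LD_LIBRARY_PATH", ""),
            # Unset or blank means "do not override OMP_NUM_THREADS".
            remote_omp_num_threads=int(threads) if threads else None,
        )
```

With nothing set, every field falls back to an empty value, which matches the stated goal of removing machine-specific paths from the defaults.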

…f SSH

Two-node architecture: master (redis+api+worker0) and worker (worker1). Each node runs torchrun locally; NCCL/TCP rendezvous connects them. Master signals worker1 via Redis pub/sub to coordinate job dispatch.
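
This commit uses pub/sub for the master-to-worker1 signal; a later commit in this PR (40807dd) replaces it with a Redis list. Pub/sub messages are dropped when no subscriber is listening, whereas a list entry waits until it is popped. The list-based form could be sketched as below. The key name, payload shape, and function names are hypothetical; `client` is assumed to be a redis-py style object exposing `lpush` and `brpop`.

```python
import json
from typing import Any, Optional

SIGNAL_KEY = "wan:signals:worker1"  # hypothetical key name


def send_signal(client: Any, job_id: str, key: str = SIGNAL_KEY) -> None:
    """Master side: enqueue a dispatch signal for the remote worker."""
    client.lpush(key, json.dumps({"job_id": job_id}))


def wait_signal(client: Any, timeout: int = 5, key: str = SIGNAL_KEY) -> Optional[dict]:
    """Worker side: block for up to `timeout` seconds waiting for a signal."""
    item = client.brpop(key, timeout=timeout)
    if item is None:
        return None  # timed out, nothing queued
    _key, raw = item  # brpop returns a (key, value) pair
    return json.loads(raw)
```

Pushing on the left and popping on the right delivers signals in the order they were sent.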

- DEPLOY.md: step-by-step dual-node deployment instructions
- tests/test_api.py: integration tests for all API endpoints (auth, task CRUD, file download)
- tests/test_serve.py: unit tests for config, job_build, worker routing, store signals, launcher, schemas

… s2v)
- schemas: add ModelEnum for valid model IDs, video/mask fields for animate
- api: per-model input validation (i2v requires image, animate requires video, s2v requires audio or enable_tts)
- job_build: fill default size per model when not provided
- tests: 37 unit tests covering all model types and validation rules
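
The per-model rules listed above could be expressed along these lines. This is an illustrative sketch rather than the code in the api module: the function name, the short task keys, and the error strings are assumptions. The image requirement for s2v is taken from a later commit note in this PR that describes the image field as required in the s2v test.

```python
from typing import Any, Mapping, Optional


def validate_inputs(task: str, inputs: Mapping[str, Any]) -> Optional[str]:
    """Return an error message, or None when the inputs are acceptable.

    `task` is a hypothetical short key such as "t2v", "i2v", "s2v", "animate".
    """
    if task == "i2v" and not inputs.get("image"):
        return "i2v requires image"
    if task == "animate" and not inputs.get("video"):
        return "animate requires video"
    if task == "s2v":
        if not inputs.get("image"):
            return "s2v requires image"
        if not (inputs.get("audio") or inputs.get("enable_tts")):
            return "s2v requires audio or enable_tts"
    return None
```

An API handler would typically turn a non-None result into a 400 response before the job is queued.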

…ployment

job_build now auto-appends a model-specific subdirectory to the global ckpt_dir:

wan2.2-t2v-a14b → /ckpt/Wan2.2-T2V-A14B
wan2.2-i2v-a14b → /ckpt/Wan2.2-I2V-A14B
etc.

parameters.ckpt_dir still overrides this for custom paths. Specify the model in the request and the system finds the right weights automatically. Updated docker-compose, the .env example, and DEPLOY.md for the multi-model layout.
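
The lookup described in this commit can be sketched as follows. The function name and the mapping constant are hypothetical; only the two model IDs and their subdirectories shown in the commit message are included, since the remaining entries are elided there.

```python
import posixpath
from typing import Any, Mapping, Optional

# Only the two mappings quoted in the commit message; the others are elided.
MODEL_SUBDIRS = {
    "wan2.2-t2v-a14b": "Wan2.2-T2V-A14B",
    "wan2.2-i2v-a14b": "Wan2.2-I2V-A14B",
}


def resolve_ckpt_dir(
    model: str,
    global_ckpt_dir: str,
    parameters: Optional[Mapping[str, Any]] = None,
) -> str:
    """Pick the checkpoint directory for a job."""
    override = (parameters or {}).get("ckpt_dir")
    if override:
        # An explicit parameters.ckpt_dir wins over the automatic layout.
        return override
    # Raises KeyError for a model ID that has no known subdirectory.
    return posixpath.join(global_ckpt_dir, MODEL_SUBDIRS[model])
```

For example, `resolve_ckpt_dir("wan2.2-t2v-a14b", "/ckpt")` yields `/ckpt/Wan2.2-T2V-A14B`, while passing `{"ckpt_dir": "/custom"}` as parameters returns `/custom` unchanged.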

Critical fix: generate.py has --src_root_path but no --video arg. VideoInput.video is now mapped to src_root_path in job_build.py, preventing ValueError("Unknown job field: video") in generate_job.py.

Also fixes test_api.py:
- s2v test now includes required image field
- added per-model validation tests (i2v requires image, etc.)
- added animate and ti2v creation tests
- removed stale test_empty_prompt (now returns 400)
- fixed size format (use * instead of x)
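
The renaming step behind this fix can be illustrated with a small sketch. The helper name, the alias table, and the `known_fields` parameter are assumptions for illustration; the real job_build.py and generate_job.py may structure this differently. Only the video to src_root_path mapping and the error text come from the commit message.

```python
from typing import Any, Dict, Iterable, Mapping

# API field name -> generate.py argument name.
FIELD_ALIASES = {"video": "src_root_path"}


def to_job_fields(
    request_fields: Mapping[str, Any],
    known_fields: Iterable[str],
) -> Dict[str, Any]:
    """Rename API fields to generate.py arguments and reject unknown ones."""
    known = set(known_fields)
    job: Dict[str, Any] = {}
    for name, value in request_fields.items():
        target = FIELD_ALIASES.get(name, name)
        if target not in known:
            raise ValueError(f"Unknown job field: {name}")
        job[target] = value
    return job
```

Without the alias entry, a request carrying `video` would hit the `ValueError` branch, which is the failure the commit describes.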

…t issue

The Docker bridge network isolates containers, which prevents torchrun from binding and exposing port 29500 for NCCL rendezvous. This causes the "client socket timed out" error seen in production.

worker0 and worker1 now use network_mode: host (PyTorch's recommended approach for distributed training). Redis and the API stay on the bridge network. worker0 connects to Redis via localhost (WAN_REDIS_URL_LOCAL). WAN_MASTER_ADDR must be the host's real IP (not 0.0.0.0).

lin285170 and others added 30 commits May 8, 2026 17:04

docs: merge deployment guide into README.md, stub DEPLOY_SERVE.md

b5741c9

Co-authored-by: Cursor <cursoragent@cursor.com>

docs: expand dual-node 2x4 GPU deployment steps in README

5550d73

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(serve): import Header in api.py for _auth_dep

fc0a3fd

Co-authored-by: Cursor <cursoragent@cursor.com>

1

d01fbbc

1

5f84204

Merge branch 'main' of https://github.com/lin285170/Wan2.2 into main

291199f

fix: rename WanModel_S2V to WanS2VModel to resolve ImportError

46c717c

docs: update DEPLOY.md with all 5 model request examples and validati…

60afa8b

…on rules

fix: update s2v __init__.py to import WanS2VModel (renamed from WanMo…

88eeda6

…del_S2V) The class was renamed in an earlier commit but __init__.py still imported the old name, causing ImportError when running generate_job.py.

fix: use AudioEncoder instead of Wav2Vec2Encoder in speech2video.py

56bd65f

The class in audio_encoder.py is named AudioEncoder, not Wav2Vec2Encoder. This ImportError prevented the s2v pipeline from loading.

fix: add MotionEncoder wrapper class for animate pipeline

61d2f79

Generator exists in motion_encoder.py but animate.py imports MotionEncoder. Upstream also has this mismatch — the MotionEncoder wrapper class was missing. It loads the Generator from checkpoint and exposes get_motion().

fix: add XLMRobertaEncoder wrapper class for animate pipeline

5d0b143

fix: set shm_size 16g on worker containers for NCCL shared memory

4ccd0b6

fix: add NCCL env vars for cross-node communication debugging

50be3a1

fix: reduce cluster lock TTL to 600s and clear stale lock on startup

192dfb4

fix: use Redis list instead of pub/sub for worker signals

40807dd

fix: extract timesteps from UniPC scheduler in _create_scheduler

3243eba

fix: auto-enable offload_model and t5_cpu to prevent OOM

1d8f5c4

fix: auto-enable dit_fsdp for multi-GPU and skip offload under FSDP

694aa86

fix: bind-mount host data dir instead of Docker volume for outputs

f3dc34b

perf: DPM++ 20 steps, batch CFG, convert_model_dtype for faster gener…

7b4699a

…ation

fix: unsqueeze timestep before expand for batch CFG

44355d7

lin285170 added 9 commits May 15, 2026 15:37

fix: revert batch CFG to avoid OOM on A100 40GB

8c8beae

fix: auto-enable sequence parallel for high-resolution to avoid OOM

e9c3cbd

feat: add 1920*1080 and 1080*1920 resolution support

0a5efec

perf: use FSDP NO_SHARD when SP is active to reduce communication ove…

9ead7c0

…rhead

fix: high-res uses SP + FSDP FULL_SHARD, revert NO_SHARD that caused OOM

2e83630

feat: add WebUI for video generation

cfd1b8b

feat: fix model input params, add file upload and video input to WebUI

dd6143d

fix: reorder WebUI route before static mount to fix 404

4f30e4a

fix: upload to shared job_dir, ASCII filenames, webp→jpg conversion

382c492

lin285170 wants to merge 39 commits into Wan-Video:main from lin285170:claude/pensive-hoover-06d545
