Fix: rezero buffers variable action space by SteamMachinist · Pull Request #480 · opendilab/LightZero

SteamMachinist · 2026-04-28T17:11:05Z

No description provided.

puyuan1996 · 2026-05-30T04:17:30Z

Thanks for the PR. I agree with the motivation to handle variable action spaces correctly, but I think this change may break the existing child_visit_segment contract.

For varied_action_space, the current non-reanalyzed target path treats child_visit[current_index] as a compact distribution aligned with legal_actions, and expands it back to the full action space later. This PR writes the expanded full-action policy back into child_visit_segment. As a result, when the same segment is later used by the non-reanalyzed path, it will be expanded again incorrectly.

For example, with action_space_size=5, legal_actions=[1, 3], and compact distribution [0.25, 0.75], this PR stores [0, 0.25, 0, 0.75, 0]. Later _compute_target_policy_non_reanalyzed() will read distributions[0] and distributions[1] for legal actions 1 and 3, producing an incorrect target.
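The double expansion in this example can be reproduced with a small standalone sketch. `expand_compact` is a hypothetical stand-in for the expansion step of the non-reanalyzed path, not LightZero's actual code; it only mirrors the indexing described above (entry `i` of the stored distribution is assigned to `legal_actions[i]`).

```python
def expand_compact(distribution, legal_actions, action_space_size):
    """Scatter a distribution aligned with legal_actions into the full action space.

    Hypothetical helper for illustration only; assumes distribution[i]
    belongs to legal_actions[i].
    """
    full = [0.0] * action_space_size
    for i, action in enumerate(legal_actions):
        full[action] = distribution[i]
    return full


action_space_size = 5
legal_actions = [1, 3]
compact = [0.25, 0.75]

# Expected behavior: the compact distribution is expanded exactly once.
correct_target = expand_compact(compact, legal_actions, action_space_size)

# Behavior under discussion: the full-size policy was written back into the
# segment, so the later expansion reads entries 0 and 1 of the full vector.
stored_full = list(correct_target)
double_expanded = expand_compact(stored_full, legal_actions, action_space_size)

print(correct_target)   # [0.0, 0.25, 0.0, 0.75, 0.0]
print(double_expanded)  # [0.0, 0.0, 0.0, 0.25, 0.0]
```

In this sketch the double-expanded target no longer sums to 1 and has lost the mass on action 1, which is the kind of incorrect target the comment refers to.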

I think we should either:

- keep child_visit_segment compact and only return full-size target_policies, or
- change all consumers of child_visit_segment to treat it as full-size, update the docs/comments, and add regression tests.
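A minimal sketch of the first option, assuming the segment stores one compact distribution per step alongside that step's legal actions. The function name `reanalyze_targets` and its arguments are hypothetical and do not correspond to the actual buffer API; the point is only that the full-size targets are built in a new list while the stored segment is left untouched.

```python
import copy


def reanalyze_targets(child_visit_segment, legal_actions_segment, action_space_size):
    """Return full-size target policies without mutating the compact segment.

    Hypothetical helper: child_visit_segment[t][i] is assumed to be the
    visit probability of legal_actions_segment[t][i].
    """
    target_policies = []
    for compact, legal_actions in zip(child_visit_segment, legal_actions_segment):
        full = [0.0] * action_space_size
        for i, action in enumerate(legal_actions):
            full[action] = compact[i]
        target_policies.append(full)
    return target_policies


segment = [[0.25, 0.75], [1.0]]
legal = [[1, 3], [2]]
snapshot = copy.deepcopy(segment)

targets = reanalyze_targets(segment, legal, action_space_size=5)

# The stored segment stays compact, so a later expansion is still valid.
assert segment == snapshot
```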

Could you add a unit test covering ReZero + varied_action_space, especially the path where a reanalyzed segment is later sampled through the non-reanalyzed target computation?
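The shape of such a regression test might look like the sketch below. Everything in it is a toy stand-in under stated assumptions: `ToySegment`, `reanalyze`, and `non_reanalyzed_target` are hypothetical and only model the contract discussed above (compact storage, expansion on read); a real test would exercise the actual ReZero game buffers instead.

```python
def expand(compact, legal_actions, action_space_size):
    """Expand a compact distribution into the full action space."""
    full = [0.0] * action_space_size
    for i, action in enumerate(legal_actions):
        full[action] = compact[i]
    return full


class ToySegment:
    """Toy stand-in for a game segment with per-step legal actions."""

    def __init__(self, child_visit_segment, legal_actions):
        self.child_visit_segment = child_visit_segment
        self.legal_actions = legal_actions


def reanalyze(segment, index, new_compact_policy, action_space_size):
    """Stand-in reanalyze path: stores the compact policy, returns the full one."""
    segment.child_visit_segment[index] = list(new_compact_policy)
    return expand(new_compact_policy, segment.legal_actions[index], action_space_size)


def non_reanalyzed_target(segment, index, action_space_size):
    """Stand-in non-reanalyzed path: expands whatever the segment stores."""
    return expand(
        segment.child_visit_segment[index],
        segment.legal_actions[index],
        action_space_size,
    )


def test_reanalyzed_segment_then_non_reanalyzed_sampling():
    segment = ToySegment([[0.5, 0.5]], [[1, 3]])
    from_reanalyze = reanalyze(segment, 0, [0.25, 0.75], action_space_size=5)
    from_plain = non_reanalyzed_target(segment, 0, action_space_size=5)
    # Both paths must agree on the same full-size, normalized target.
    assert from_reanalyze == from_plain == [0.0, 0.25, 0.0, 0.75, 0.0]
    assert abs(sum(from_plain) - 1.0) < 1e-9


test_reanalyzed_segment_then_non_reanalyzed_sampling()
```

If the reanalyze stand-in stored the full-size vector instead, the equality assertion would fail, which is the regression the test is meant to catch.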

SteamMachinist and others added 3 commits April 18, 2026 15:44

fix: RepresentationNetworkMLP param name observation_dim -> observation_shape to work with *_model_mlp.py

ed4f7c6

fix: rezero mz and ez game buffers for variable action space

b459cc6

Merge branch 'opendilab:main' into fix/rezero-buffers-variable-action-space

b5c37aa

SteamMachinist wants to merge 3 commits into opendilab:main from SteamMachinist:fix/rezero-buffers-variable-action-space
