
Refactor algo in one place #1482

Closed
copybara-service[bot] wants to merge 0 commits into main from test_908254319

@copybara-service

Refactor algo in one place
Move all loss functions and advantage estimators into algo_core, so that agentic RL and non-agentic RL share the same algorithms.

  1. Combine the agentic GRPO and GRPO loss functions and advantage estimators.
  2. Use np.array for group advantages for faster computation; details in cl/845477002
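
For reference, a minimal sketch of what a NumPy-based group advantage computation might look like. This is not the code from this change: the function name `group_relative_advantages`, its parameters, and the assumption that rewards for the same prompt are stored contiguously are all hypothetical, and the normalization shown is the commonly used GRPO form (reward minus group mean, divided by group standard deviation).

```python
import numpy as np


def group_relative_advantages(rewards, group_size, eps=1e-6):
    """Hypothetical helper: GRPO-style advantages computed per group.

    Assumes `rewards` is a flat sequence in which each consecutive run of
    `group_size` entries belongs to the same prompt (one group).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    if rewards.ndim != 1 or rewards.size % group_size != 0:
        raise ValueError("rewards must be 1-D with length divisible by group_size")

    # One row per group, so the statistics are computed in a single
    # vectorized pass instead of a Python loop over groups.
    grouped = rewards.reshape(-1, group_size)
    mean = grouped.mean(axis=1, keepdims=True)
    std = grouped.std(axis=1, keepdims=True)

    # eps keeps the division finite when every reward in a group is equal.
    advantages = (grouped - mean) / (std + eps)
    return advantages.reshape(-1)


# Two groups of four samples each.
example = group_relative_advantages([1.0, 0.0, 1.0, 0.0, 2.0, 2.0, 2.0, 2.0], group_size=4)
```

Reshaping to `(num_groups, group_size)` lets the mean and standard deviation be reduced along one axis for all groups at once, which is the usual reason an array-based version is faster than per-group Python lists.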
