[Flux] fix incorrect seed setting for dp shard #844

CarlosGomes98 · 2025-11-18T17:59:42Z

This fixes a bug in the reference. For Flux, it is important that each DP rank has a different seed so that it samples different noise for each data sample. Failing to do this results in slower convergence.

The torchtitan code set seeds so that they differed across DP ranks but were identical across FSDP (DP shard) ranks. This was reported and fixed upstream; this PR pulls those changes from the torchtitan repository into the reference.
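The difference between the two seeding schemes can be sketched in plain Python. This is a hypothetical illustration, not the torchtitan code: the function names, the flat `dp_rank` numbering, and the assumption that the shard (FSDP) dimension is innermost are all ours. The buggy scheme offsets the seed only by the replicate index, so every FSDP rank in a group draws identical noise; the fixed scheme offsets by the full data-parallel index.

```python
import random


def dp_coords(dp_rank: int, dp_shard: int) -> tuple[int, int]:
    """Split a flat DP rank into (replicate_idx, shard_idx).

    Assumption (hypothetical layout): the shard/FSDP dimension is innermost.
    """
    return dp_rank // dp_shard, dp_rank % dp_shard


def buggy_seed(base_seed: int, dp_rank: int, dp_shard: int) -> int:
    # Offset only by the replicate index: all FSDP ranks in a group collide.
    replicate_idx, _ = dp_coords(dp_rank, dp_shard)
    return base_seed + replicate_idx


def fixed_seed(base_seed: int, dp_rank: int, dp_shard: int) -> int:
    # Offset by the full DP index (replicate * shard + shard_idx), so every
    # rank that sees a different data sample also gets a different seed.
    replicate_idx, shard_idx = dp_coords(dp_rank, dp_shard)
    return base_seed + replicate_idx * dp_shard + shard_idx


def sample_noise(seed: int, n: int = 4) -> list[float]:
    # Stand-in for the per-sample diffusion noise drawn on each rank.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical mesh: 2 replicate groups x 4 FSDP shards = 8 DP ranks.
    print([buggy_seed(42, r, 4) for r in range(8)])  # [42, 42, 42, 42, 43, 43, 43, 43]
    print([fixed_seed(42, r, 4) for r in range(8)])  # [42, 43, 44, 45, 46, 47, 48, 49]
```

Under the buggy scheme, ranks 0 to 3 produce the same `sample_noise` output even though they process different data samples, which is the correlation this PR removes.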

This affects the RCPs (reference convergence points), which will need to be recalculated. For GBS 1k and larger, the change in convergence is small (3-4%). For GBS 512, however, the difference is much larger. That RCP was computed on fewer nodes, and because FSDP was used within each node, it is more affected by this change: the fix speeds up its convergence by 14%.

github-actions · 2025-11-18T17:59:52Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

CarlosGomes98 · 2025-11-19T14:45:46Z

relevant rcp update: mlcommons/logging#443

ShriyaRishab · 2025-12-04T18:04:13Z

Approved in 12/4 WG

fix incorrect seed setting for dp shard

c3a5fd3

CarlosGomes98 requested a review from a team as a code owner November 18, 2025 17:59

CarlosGomes98 mentioned this pull request Nov 19, 2025

update flux rcp mlcommons/logging#443

Merged

ShriyaRishab approved these changes Dec 4, 2025


ShriyaRishab merged commit 803adc1 into mlcommons:master Dec 4, 2025
1 check passed

github-actions bot locked and limited conversation to collaborators Dec 4, 2025
