
Conversation

@hanaol (Collaborator) commented Jan 16, 2026

Handles logging/saving the performance metric across multiple ranks.

for ind, err in zip(index, nmae.tolist(), strict=False):
    f.write(f"{ind},{err}\n")
# write final CSV with header
with open(final_csv, "w") as f_out:
Collaborator:

Currently, all ranks will try to write to this file, which can lead to conflicting writes. Let's add a torch.distributed.barrier() to synchronize the processes and only let rank == 0 write final_csv.
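A minimal sketch of the suggested pattern, assuming torch.distributed is initialized and reusing the final_csv / self.tmp_dir / self.global_rank names from the PR; the header column names and the glob pattern are assumptions for illustration:

import torch.distributed as dist

# make sure every rank has finished writing its per-rank tmp CSVs
if dist.is_available() and dist.is_initialized():
    dist.barrier()

# only rank 0 merges the temporary files into the final CSV
if self.global_rank == 0:
    with open(final_csv, "w") as f_out:
        f_out.write("index,nmae\n")  # header (column names assumed)
        for tmp_csv in sorted(self.tmp_dir.glob("metrics_batch_*_*.csv")):
            with open(tmp_csv) as f_in:
                f_out.writelines(f_in)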

if self.log_dir is not None:
    self.log_dir = Path(self.log_dir)
    self.log_dir.mkdir(exist_ok=True, parents=True)
self.tmp_dir = Path(self.out_dir) / "tmp"
Collaborator:

self.out_dir could be None here. Should this be in the previous block?
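One possible fix, sketched under the assumption that tmp_dir should only be created when an output directory is configured; the None guard and the mkdir call are assumptions, while the attribute names come from the PR:

from pathlib import Path

if self.out_dir is not None:
    self.tmp_dir = Path(self.out_dir) / "tmp"
    self.tmp_dir.mkdir(exist_ok=True, parents=True)
else:
    self.tmp_dir = None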

nmae = nmae.unsqueeze(0)
tmp_csv = self.tmp_dir / f"metrics_batch_{self.global_rank}_{batch_idx}.csv"
with open(tmp_csv, "w") as f:
    for i, n in zip(indices, nmae, strict=False):
Collaborator:

Strongly prefer strict=True unless there is a reason we do not expect the dimensions to match.
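For illustration, the loop with the suggested change; the write line is an assumption modeled on the analogous loop earlier in the diff:

# strict=True raises ValueError on a length mismatch instead of
# silently truncating to the shorter iterable
for i, n in zip(indices, nmae, strict=True):
    f.write(f"{i},{float(n)}\n")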

for tmp_csv in all_tmp_csvs:
    with open(tmp_csv) as f_in:
        for line in f_in:
            f_out.write(line)
Collaborator:

Would be nice to clean up the tmp_dir files once this is done.
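A possible cleanup step, sketched under the assumption that it runs on rank 0 after the final CSV has been written; the use of shutil.rmtree is a suggestion, not part of the PR:

import shutil

# remove the per-rank temporary CSVs (and the tmp dir itself) after merging
if self.global_rank == 0:
    shutil.rmtree(self.tmp_dir, ignore_errors=True)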
