@bittremieux
Collaborator

Useful for timing estimates in the progress bar.

@bittremieux bittremieux requested a review from wfondrie July 24, 2024 07:39
@codecov

codecov bot commented Jul 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.50%. Comparing base (486221c) to head (e188d80).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #60   +/-   ##
=======================================
  Coverage   97.49%   97.50%           
=======================================
  Files          24       24           
  Lines         957      960    +3     
=======================================
+ Hits          933      936    +3     
  Misses         24       24           


@bittremieux
Collaborator Author

Hmm, using this a bit more, it actually gives the following warning:

WARNING: UserWarning: Your IterableDataset has __len__ defined. In combination with multi-process data loading (when num_workers > 1), __len__ could be inaccurate if each worker is not configured independently to avoid having duplicate data.

So maybe not an ideal tweak in the end. Is the SpectrumDataset compatible with getting accurate timing estimates from the PyTorch Lightning progress bar (for which the number of batches is needed)?
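For reference, the warning is about the case where every worker iterates the full dataset, so items are duplicated and the reported length no longer matches what is actually yielded. One common way around that is to give each worker a disjoint shard and derive the batch count from the shard sizes. Below is a minimal sketch in plain Python, not how SpectrumDataset currently works: `shard_indices` and `num_batches` are hypothetical helper names, and the worker id and worker count are passed in explicitly (inside a PyTorch `IterableDataset.__iter__` they would come from `torch.utils.data.get_worker_info()`). It also assumes each worker batches its own shard independently.

```python
import math


def shard_indices(n_items, worker_id, num_workers):
    """Return the item indices assigned to one worker (round-robin split).

    Hypothetical helper: every index lands in exactly one shard,
    so no item is yielded twice across workers.
    """
    return range(worker_id, n_items, num_workers)


def num_batches(n_items, batch_size, num_workers=1, drop_last=False):
    """Count batches when each worker batches its own disjoint shard."""
    workers = max(num_workers, 1)  # treat 0 workers (main process) as 1
    total = 0
    for worker_id in range(workers):
        shard_len = len(shard_indices(n_items, worker_id, workers))
        if drop_last:
            total += shard_len // batch_size
        else:
            total += math.ceil(shard_len / batch_size)
    return total
```

Note that with sharding the number of batches can exceed `ceil(n_items / batch_size)`, because each worker may end with its own partial batch, so a progress bar total based on the dataset length alone would still only be an estimate.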

@bittremieux
Collaborator Author

Addition: because SpectrumDataset is an IterableDataset, it also doesn't support shuffling, which we might want during training.
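If we do want shuffling on top of an iterable stream, one option is an approximate shuffle with a fixed-size buffer. A minimal sketch, assuming a hypothetical `buffered_shuffle` generator that is not part of the current code; the shuffle is only local, so its quality depends on how large the buffer is relative to the dataset.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items from an iterable in approximately shuffled order.

    Hypothetical helper: holds at most `buffer_size` items in memory.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer
```

For example, `list(buffered_shuffle(range(100), 10, seed=0))` returns all 100 values exactly once in a locally shuffled order. A buffer at least as large as the dataset gives a full shuffle, at the cost of holding everything in memory.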
