
@crypto-a (Contributor) commented Sep 8, 2025

Fixes issues #2219 and #2385, and the first part of #2489.

This commit adds new test cases and the implementation changes needed to support padding_idx=None in the aten_embedding_bag operator. This aligns the ONNX Script operator with PyTorch's native behavior and expands test coverage for this option.

Key Changes:

  • core.py: The aten_embedding_bag_padding_idx function has been updated to handle padding_idx=None. When no padding index is specified, it routes the operation to the standard aten_embedding_bag implementation.
  • extra_opinfo.py: Two new OpInfo definitions, test_embedding_bag_with_padding_idx_none and test_embedding_bag_with_padding_idx_int, have been added to the OP_DB list. They provide input samples that exercise both the new (None) and the existing (integer) padding_idx paths.
  • ops_test_data.py: The TESTED_TORCHLIB_OPS tuple has been updated to include the new tests, ensuring they are discovered and executed by the test runner.
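For reference, here is a minimal pure-Python sketch of what embedding_bag computes and how a padding_idx=None call can be routed to the plain path. It is a reference model under stated assumptions, not the torchlib implementation: the names embedding_bag_ref and embedding_bag_padding_idx_ref are hypothetical, mode uses PyTorch's integer encoding (0 = sum, 1 = mean, 2 = max), per_sample_weights is omitted, padding_idx is assumed non-negative, and, following the PyTorch EmbeddingBag documentation, rows at padding_idx are excluded from the reduction while empty bags produce zeros.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]

# Assumption: PyTorch's integer encoding of `mode` (0 = sum, 1 = mean, 2 = max).
MODE_SUM, MODE_MEAN, MODE_MAX = 0, 1, 2


def _bags(indices: Sequence[int], offsets: Sequence[int], include_last_offset: bool) -> List[Sequence[int]]:
    """Split the flat `indices` into bags; `offsets` holds each bag's start position."""
    # With include_last_offset=True the final offset already marks the end of the last bag.
    bounds = list(offsets) if include_last_offset else list(offsets) + [len(indices)]
    return [indices[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]


def _reduce(rows: List[Sequence[float]], dim: int, mode: int) -> List[float]:
    """Reduce the rows selected for one bag; an empty bag yields zeros."""
    if not rows:
        return [0.0] * dim
    columns = list(zip(*rows))
    if mode == MODE_SUM:
        return [sum(col) for col in columns]
    if mode == MODE_MEAN:
        return [sum(col) / len(rows) for col in columns]
    if mode == MODE_MAX:
        return [max(col) for col in columns]
    raise ValueError(f"unsupported mode: {mode}")


def embedding_bag_ref(weight: Matrix, indices: Sequence[int], offsets: Sequence[int],
                      mode: int = MODE_SUM, include_last_offset: bool = False) -> List[List[float]]:
    """Hypothetical plain embedding_bag: every index in a bag contributes to the reduction."""
    dim = len(weight[0])
    return [_reduce([weight[i] for i in bag], dim, mode)
            for bag in _bags(indices, offsets, include_last_offset)]


def embedding_bag_padding_idx_ref(weight: Matrix, indices: Sequence[int], offsets: Sequence[int],
                                  mode: int = MODE_SUM, include_last_offset: bool = False,
                                  padding_idx: Optional[int] = None) -> List[List[float]]:
    """Hypothetical padding_idx variant (assumes a non-negative padding_idx when given)."""
    if padding_idx is None:
        # No padding requested: route to the plain operator unchanged.
        return embedding_bag_ref(weight, indices, offsets, mode, include_last_offset)
    dim = len(weight[0])
    # Rows at padding_idx are skipped, so they also drop out of the mean's divisor.
    return [_reduce([weight[i] for i in bag if i != padding_idx], dim, mode)
            for bag in _bags(indices, offsets, include_last_offset)]


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
indices, offsets = [0, 1, 2, 0], [0, 2]  # two bags: [0, 1] and [2, 0]
print(embedding_bag_padding_idx_ref(weight, indices, offsets, MODE_MEAN, padding_idx=0))
# [[3.0, 4.0], [5.0, 6.0]]
```

Keeping the None case as a pure delegation means the padding-free path cannot drift numerically from the plain operator; only calls that actually specify a padding index take the filtering branch.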

@crypto-a (Contributor Author) commented Sep 8, 2025

@microsoft-github-policy-service agree

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 5.88235% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.03%. Comparing base (fb6b40f) to head (d3f7a58).
⚠️ Report is 7 commits behind head on main.

| File with missing lines | Patch % | Lines |
|---|---|---|
| onnxscript/function_libs/torch_lib/ops/core.py | 5.88% | 31 missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2549      +/-   ##
==========================================
- Coverage   70.07%   70.03%   -0.04%     
==========================================
  Files         226      226              
  Lines       27276    27285       +9     
  Branches     2754     2756       +2     
==========================================
- Hits        19113    19110       -3     
- Misses       7213     7224      +11     
- Partials      950      951       +1     


@crypto-a (Contributor Author)

@justinchuby I can handle the linting issues, but I'm not sure what is causing the other CI failures. Could you help?

@crypto-a (Contributor Author)

@justinchuby I've updated the PR and fixed the issues in the code. Can you review?

@jsmonson

@crypto-a I would love to see this PR go in! Are you still planning on contributing it?

@justinchuby (Collaborator)

Changes LGTM. CI is reporting

TypeError: aten_embedding_bag() got an unexpected keyword argument 'padding_idx'

@justinchuby justinchuby added this to the 0.5.6 milestone Oct 29, 2025
@justinchuby justinchuby removed this from the 0.5.6 milestone Dec 12, 2025
@justinchuby (Collaborator)

@crypto-a could you take a look at the CI errors and rebase from main? Thanks

@crypto-a (Contributor Author)

Will take a look at it

@crypto-a (Contributor Author)

CI is failing because my refactored, masking-based approach produces different numerical results from the original algorithm in the mean and max aggregation modes, which causes test failures. I'm working on fixing the implementation.
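The comment does not say exactly where the masked version diverged, but mean and max are the modes where a naive mask is easy to get wrong. The sketch below (pure Python; the helper name masked_reduce and the 0/1/2 mode encoding are assumptions) shows one masking formulation that stays consistent with simply skipping padded rows: for sum, the mask only zeroes padded rows; for mean, the divisor must be the count of kept rows rather than the bag length; for max, padded rows must be filled with -inf rather than 0, and a bag whose rows are all masked is reset to zeros, matching PyTorch's documented empty-bag behavior.

```python
import math
from typing import List, Sequence

MODE_SUM, MODE_MEAN, MODE_MAX = 0, 1, 2  # assumed PyTorch-style encoding of `mode`


def masked_reduce(rows: List[List[float]], keep: Sequence[bool], mode: int, dim: int) -> List[float]:
    """Hypothetical helper: reduce one bag with a keep-mask (False = padding row)."""
    count = sum(1 for k in keep if k)  # number of non-padding rows in the bag
    if count == 0:
        # Empty or all-padding bag: the result is a zero vector in every mode.
        return [0.0] * dim
    if mode == MODE_MAX:
        # Masked rows become -inf so they never win; a 0 fill would beat all-negative rows.
        filled = [row if k else [-math.inf] * dim for row, k in zip(rows, keep)]
        return [max(col) for col in zip(*filled)]
    # Sum and mean: multiply each value by its row's mask so padding rows add 0.
    totals = [sum(v * (1.0 if k else 0.0) for v, k in zip(col, keep)) for col in zip(*rows)]
    if mode == MODE_SUM:
        return totals
    if mode == MODE_MEAN:
        # Divide by the kept-row count; dividing by len(rows) would shrink the average.
        return [t / count for t in totals]
    raise ValueError(f"unsupported mode: {mode}")


rows = [[-1.0, -2.0], [9.0, 9.0], [-3.0, -4.0]]
keep = [True, False, True]  # the middle row is padding
print(masked_reduce(rows, keep, MODE_MAX, 2))   # [-1.0, -2.0]
print(masked_reduce(rows, keep, MODE_MEAN, 2))  # [-2.0, -3.0]
```

With a 0 fill, the max above would have come out as [0.0, 0.0], and dividing the masked sum by len(rows) would have given -4/3 instead of -2.0 in the first column.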

@crypto-a (Contributor Author)

@justinchuby, I've fixed the CI issues. I removed the test cases I had added, since it turns out the scenarios my code needed were already covered by the existing OpInfo tests. All tests now pass on my end. Could you please rerun CI? Thanks!

@crypto-a (Contributor Author) commented Dec 16, 2025

Tests are passing, the branch is up to date, and it's ready to be merged.

# Modify this section ##########################################################


def _embedding_bag_input_wrangler(
Collaborator

Final round of reviews: Is this necessary? I think it should be able to accept a None input.

sparse: bool = False,
per_sample_weights: Optional[TFloat] = None,
include_last_offset: bool = False,
padding_idx: Optional[int] = None,
Collaborator

I checked https://github.com/pytorch/pytorch/blob/8656dea039bd0a31952a6a9792566e70b07429dc/aten/src/ATen/native/native_functions.yaml#L2372-L2378 and found that embedding_bag does not take padding_idx as a parameter. I think you only need to update aten_embedding_bag_padding_idx.

@justinchuby justinchuby merged commit 134449b into microsoft:main Dec 19, 2025
32 checks passed