Skip to content

Conversation

@WaVEV
Copy link
Collaborator

@WaVEV WaVEV commented Nov 20, 2025

Generated query:

[{'$group': {'_id': None,
             'count': {'$sum': {'$cond': {'else': 1,
                                          'if': {'$in': [{'$type': '$_id'},
                                                         ['missing', 'null']]},
                                          'then': None}}}}},
 {'$addFields': {'__now': '$$NOW'}},
 {'$unionWith': {'pipeline': [{'$documents': [{}]}]}},
 {'$project': {'count': {'$ifNull': ['$count', {'$literal': 0}]}}}]```

@aclark4life
Copy link
Collaborator

Awesome, thank you and thanks @asya999

Copy link
Collaborator

@aclark4life aclark4life left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test, particular one that would test something that breaks in the Django admin with QE (even though QE is not merged yet)?

}
)
pipeline.append({"$group": group})
# It may be a bug, $$NOW has to be called to be reachable in the rest of the pipeline.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add link to the SERVER ticket and add an explanatory comment about the purpose of the $unionWith clause.

I think we can also mention it in the release notes as a performance improvement since the team said $facet prevents the use of optimizations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, this cannot be implemented as it is.
To handle the global aggregation and the default values (for example:

SELECT coalesce(avg(age), 35) FROM people WHERE age > 100;

we need to wrap the aggregation so MongoDB always returns a single document.
This is done by wrapping the pipeline inside a lookup. Example:

db.z.aggregate([
    { $collStats: {} },
    { $lookup: { from: "z", as: "wrap", pipeline: [...pipeline stages] } },
    { $replaceWith: {"$cond": [{"$eq": ["$wrap", []]}, {}, {$first: "$wrap"}]}}
])

It’s not very intuitive use of $collStats, but it works 😄. It replaces the need for $unionWith.
$collStats is used as a pivot so we can apply the required operator after the group result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well played (I think?) 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 The idea comes from the team

@Jibola
Copy link
Contributor

Jibola commented Nov 21, 2025

Can you provide, in this PR, an example of what the removal of $facet makes the new query signatures look like?

@WaVEV WaVEV force-pushed the INTPYTHON-833-Refactor-Remove-$facet-usage-in-aggregation-pipeline branch from 3e2c32d to 273957f Compare November 25, 2025 03:41
@timgraham
Copy link
Collaborator

I updated the test on the Django fork so it won't fail.

@WaVEV WaVEV force-pushed the INTPYTHON-833-Refactor-Remove-$facet-usage-in-aggregation-pipeline branch from 273957f to c082f0d Compare November 26, 2025 03:56
@WaVEV WaVEV changed the title Remove $facet in top level group stages INTPYTHON-833 Remove $facet in top level group stages Nov 26, 2025
Copy link
Collaborator

@timgraham timgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a test with assertAggregateQuery, perhaps in tests/aggregation_.

# If distinct=True or resolve_inner_expression=False, sum the size of the
# set.
lhs_mql = process_lhs(self, compiler, connection, as_expr=True)
lhs_mql = {"$ifNull": [lhs_mql, []]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment about what fails without this?

@WaVEV WaVEV force-pushed the INTPYTHON-833-Refactor-Remove-$facet-usage-in-aggregation-pipeline branch from 9830b06 to 4dea72d Compare December 2, 2025 03:46
if self.wrap_for_global_aggregation:
# Use $collStats as a pivot to guarantee a single input document
pipeline = [
{"$collStats": {}},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on Zoom, it sounds like $collStats causes extra filesystem access that could be avoided by using $document. If it's straightforward, use that for now and I'll add a condition to use $collStats only for encrypted connections in that PR.

Comment on lines 96 to 100
if self.wrap_for_global_aggregation:
# Add an empty extra document to handle default values on empty results
pipeline.append({"$unionWith": {"pipeline": [{"$documents": [{}]}]}})
if self.having:
pipeline.append({"$match": self.having})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this make the test pass, but I think this should be tested in deep, need to add a test that having filter all the groups.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current django tests cover the case and it is right.

@WaVEV WaVEV force-pushed the INTPYTHON-833-Refactor-Remove-$facet-usage-in-aggregation-pipeline branch from f65c7f1 to b8900af Compare December 5, 2025 05:37
@timgraham
Copy link
Collaborator

There is still some problem on evergreen / MongoDB 8:

ERROR: test_aggregation_default_using_duration_from_database (aggregation.tests.AggregateTestCase)
...
[2025/12/05 01:14:54.701]   File "/data/mci/6fa58471acf0c3d789a3318d07c79185/src/venv/lib/python3.10/site-packages/pymongo/helpers_shared.py", line 284, in _check_command_response
[2025/12/05 01:14:54.701]     raise OperationFailure(errmsg, code, response, max_wire_version)
[2025/12/05 01:14:54.701] pymongo.errors.OperationFailure: PlanExecutor error during aggregation :: caused by :: Builtin variable '$$NOW' is not available, full error: {'ok': 0.0, 'errmsg': "PlanExecutor error during aggregation :: caused by :: Builtin variable '$$NOW' is not available", 'code': 51144, 'codeName': 'Location51144'}

Do we need that workaround you had before?

Comment on lines +637 to +650
def test_overlap_empty_values(self):
qs = NullableIntegerArrayModel.objects.filter(order__lt=-30)
self.assertCountEqual(
NullableIntegerArrayModel.objects.filter(
field__overlap=qs.values_list("field"),
),
[],
)
self.assertCountEqual(
NullableIntegerArrayModel.objects.filter(
field__overlap=qs.values("field"),
),
[],
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is additional test coverage, not regression tests. Additional test coverage be separate commits that precede refactoring to make it clear that a refactor does not change behavior.

"tmp_name": {"$addToSet": expr.as_mql(compiler, connection, as_expr=True)},
}
},
{"$unionWith": {"pipeline": [{"$documents": [{"tmp_name": []}]}]}},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this won't work with QE (but maybe the query isn't supported for some other reason).

A problem with our current testing setup is that we have to replicate all the tests we want to do with QE under the "encryption_" test app.

So if we have two different code paths in this method and under if self.wrap_for_global_aggregation: in query.py, we won't have very comprehensive testing.

Something to think about.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 probably use some of the wrap ideas will work in order to provide support on QE, but I didn't explore it in deep. The current version with $facet isn't working either.

@WaVEV
Copy link
Collaborator Author

WaVEV commented Dec 9, 2025

There is still some problem on evergreen / MongoDB 8:

ERROR: test_aggregation_default_using_duration_from_database (aggregation.tests.AggregateTestCase)
...
[2025/12/05 01:14:54.701]   File "/data/mci/6fa58471acf0c3d789a3318d07c79185/src/venv/lib/python3.10/site-packages/pymongo/helpers_shared.py", line 284, in _check_command_response
[2025/12/05 01:14:54.701]     raise OperationFailure(errmsg, code, response, max_wire_version)
[2025/12/05 01:14:54.701] pymongo.errors.OperationFailure: PlanExecutor error during aggregation :: caused by :: Builtin variable '$$NOW' is not available, full error: {'ok': 0.0, 'errmsg': "PlanExecutor error during aggregation :: caused by :: Builtin variable '$$NOW' is not available", 'code': 51144, 'codeName': 'Location51144'}

Do we need that workaround you had before?

mmh, well, the ticket is closed. So, maybe the image wasn't the appropriate? will try different images

@timgraham
Copy link
Collaborator

Do we need that workaround you had before?

mmh, well, the ticket is closed. So, maybe the image wasn't the appropriate? will try different images

Generally, I don't know if we can rely on users having the latest micro version, however, since the test is very obscure perhaps keeping CI green is enough for this case.

@WaVEV
Copy link
Collaborator Author

WaVEV commented Dec 9, 2025

Do we need that workaround you had before?

mmh, well, the ticket is closed. So, maybe the image wasn't the appropriate? will try different images

Generally, I don't know if we can rely on users having the latest micro version, however, since the test is very obscure perhaps keeping CI green is enough for this case.

Right, so the workaround will be.

@WaVEV WaVEV force-pushed the INTPYTHON-833-Refactor-Remove-$facet-usage-in-aggregation-pipeline branch 2 times, most recently from 702a518 to 598ef67 Compare December 9, 2025 16:38
# $addToSet. The use of {_id: null} results in a
# single grouped array. However, because arrays from
# multiple documents are aggregated, the result is a
# list of lists.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly:
-> The project creates a list of documents based on sub-filer.
-> We unwind that list
-> We concatenate the list we unwound into one supreme list?
-> If list is empty we make it a bigger list
-> project only $tmp_name

can we rename tmp_name to something else to make it easier to follow? Maybe something like subquery_results?

Comment on lines +241 to +243
pipeline.append({"$group": {"_id": None, **group}})
# If there are no ids and no having clause, apply a global aggregation
self.wrap_for_global_aggregation = not bool(self.having)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I get an example of a query that would lead us here? It's harder to visualize all the way down. Not to say we shouldn't do this, but I think this necessitates a COLLSCAN.

Copy link
Collaborator Author

@WaVEV WaVEV Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, when the query does not have having

agg = Sum(
    Case(When(friends__age=40, then=F("friends__age"))),
    filter=Q(friends__name__startswith="test"),
)
self.assertEqual(Author.objects.aggregate(age=agg)["age"], 80)

Extracted from django test.

@WaVEV WaVEV force-pushed the INTPYTHON-833-Refactor-Remove-$facet-usage-in-aggregation-pipeline branch from 598ef67 to 4929c30 Compare December 9, 2025 20:37
@WaVEV WaVEV force-pushed the INTPYTHON-833-Refactor-Remove-$facet-usage-in-aggregation-pipeline branch from c08dc58 to 4921adc Compare December 10, 2025 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants