-
Notifications
You must be signed in to change notification settings - Fork 31
INTPYTHON-833 Remove $facet in top level group stages #449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
INTPYTHON-833 Remove $facet in top level group stages #449
Conversation
|
Awesome, thank you and thanks @asya999 |
aclark4life
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test, particular one that would test something that breaks in the Django admin with QE (even though QE is not merged yet)?
django_mongodb_backend/compiler.py
Outdated
| } | ||
| ) | ||
| pipeline.append({"$group": group}) | ||
| # It may be a bug, $$NOW has to be called to be reachable in the rest of the pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add link to the SERVER ticket and add an explanatory comment about the purpose of the $unionWith clause.
I think we can also mention it in the release notes as a performance improvement since the team said $facet prevents the use of optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, this cannot be implemented as it is.
To handle the global aggregation and the default values (for example:
SELECT coalesce(avg(age), 35) FROM people WHERE age > 100;we need to wrap the aggregation so MongoDB always returns a single document.
This is done by wrapping the pipeline inside a lookup. Example:
db.z.aggregate([
{ $collStats: {} },
{ $lookup: { from: "z", as: "wrap", pipeline: [...pipeline stages] } },
{ $replaceWith: {"$cond": [{"$eq": ["$wrap", []]}, {}, {$first: "$wrap"}]}}
])
It’s not very intuitive use of $collStats, but it works 😄. It replaces the need for $unionWith.
$collStats is used as a pivot so we can apply the required operator after the group result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well played (I think?) 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆 The idea comes from the team
|
Can you provide, in this PR, an example of what the removal of $facet makes the new query signatures look like? |
3e2c32d to
273957f
Compare
|
I updated the test on the Django fork so it won't fail. |
273957f to
c082f0d
Compare
timgraham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a test with assertAggregateQuery, perhaps in tests/aggregation_.
django_mongodb_backend/aggregates.py
Outdated
| # If distinct=True or resolve_inner_expression=False, sum the size of the | ||
| # set. | ||
| lhs_mql = process_lhs(self, compiler, connection, as_expr=True) | ||
| lhs_mql = {"$ifNull": [lhs_mql, []]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment about what fails without this?
9830b06 to
4dea72d
Compare
django_mongodb_backend/query.py
Outdated
| if self.wrap_for_global_aggregation: | ||
| # Use $collStats as a pivot to guarantee a single input document | ||
| pipeline = [ | ||
| {"$collStats": {}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on Zoom, it sounds like $collStats causes extra filesystem access that could be avoided by using $document. If it's straightforward, use that for now and I'll add a condition to use $collStats only for encrypted connections in that PR.
django_mongodb_backend/query.py
Outdated
| if self.wrap_for_global_aggregation: | ||
| # Add an empty extra document to handle default values on empty results | ||
| pipeline.append({"$unionWith": {"pipeline": [{"$documents": [{}]}]}}) | ||
| if self.having: | ||
| pipeline.append({"$match": self.having}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this make the test pass, but I think this should be tested in deep, need to add a test that having filter all the groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current django tests cover the case and it is right.
f65c7f1 to
b8900af
Compare
|
There is still some problem on evergreen / MongoDB 8: Do we need that workaround you had before? |
| def test_overlap_empty_values(self): | ||
| qs = NullableIntegerArrayModel.objects.filter(order__lt=-30) | ||
| self.assertCountEqual( | ||
| NullableIntegerArrayModel.objects.filter( | ||
| field__overlap=qs.values_list("field"), | ||
| ), | ||
| [], | ||
| ) | ||
| self.assertCountEqual( | ||
| NullableIntegerArrayModel.objects.filter( | ||
| field__overlap=qs.values("field"), | ||
| ), | ||
| [], | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is additional test coverage, not regression tests. Additional test coverage be separate commits that precede refactoring to make it clear that a refactor does not change behavior.
django_mongodb_backend/lookups.py
Outdated
| "tmp_name": {"$addToSet": expr.as_mql(compiler, connection, as_expr=True)}, | ||
| } | ||
| }, | ||
| {"$unionWith": {"pipeline": [{"$documents": [{"tmp_name": []}]}]}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this won't work with QE (but maybe the query isn't supported for some other reason).
A problem with our current testing setup is that we have to replicate all the tests we want to do with QE under the "encryption_" test app.
So if we have two different code paths in this method and under if self.wrap_for_global_aggregation: in query.py, we won't have very comprehensive testing.
Something to think about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 probably use some of the wrap ideas will work in order to provide support on QE, but I didn't explore it in deep. The current version with $facet isn't working either.
mmh, well, the ticket is closed. So, maybe the image wasn't the appropriate? will try different images |
Generally, I don't know if we can rely on users having the latest micro version, however, since the test is very obscure perhaps keeping CI green is enough for this case. |
Right, so the workaround will be. |
702a518 to
598ef67
Compare
| # $addToSet. The use of {_id: null} results in a | ||
| # single grouped array. However, because arrays from | ||
| # multiple documents are aggregated, the result is a | ||
| # list of lists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly:
-> The project creates a list of documents based on sub-filer.
-> We unwind that list
-> We concatenate the list we unwound into one supreme list?
-> If list is empty we make it a bigger list
-> project only $tmp_name
can we rename tmp_name to something else to make it easier to follow? Maybe something like subquery_results?
| pipeline.append({"$group": {"_id": None, **group}}) | ||
| # If there are no ids and no having clause, apply a global aggregation | ||
| self.wrap_for_global_aggregation = not bool(self.having) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I get an example of a query that would lead us here? It's harder to visualize all the way down. Not to say we shouldn't do this, but I think this necessitates a COLLSCAN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, when the query does not have having
agg = Sum(
Case(When(friends__age=40, then=F("friends__age"))),
filter=Q(friends__name__startswith="test"),
)
self.assertEqual(Author.objects.aggregate(age=agg)["age"], 80)Extracted from django test.
598ef67 to
4929c30
Compare
c08dc58 to
4921adc
Compare
Generated query:
[{'$group': {'_id': None, 'count': {'$sum': {'$cond': {'else': 1, 'if': {'$in': [{'$type': '$_id'}, ['missing', 'null']]}, 'then': None}}}}}, {'$addFields': {'__now': '$$NOW'}}, {'$unionWith': {'pipeline': [{'$documents': [{}]}]}}, {'$project': {'count': {'$ifNull': ['$count', {'$literal': 0}]}}}]```