Skip to content

Conversation

@theory
Copy link
Collaborator

@theory theory commented Dec 31, 2025

Add tests for Postgres aggregate function pushdown. Eventually it will cover all of the aggregate functions; for now it tests avg() and array_agg() with min() started but commented-out, because Postgres doesn't push it down, instead sending a an ORDER BY and LIMIT 1, which works great for some databases, but as of now pg_clickhouse doesn't push down that LIMIT 1. And it would be better to let it execute its own plan. More investigation on that in future commits.

Meanwhile, fix a few bugs discovered while testing array_agg() and a failing ClickBench query.

First, ClickHouse UUID arrays fetched via the http engine failed to parse into Postgres Arrays because ClickHouse single-quotes each UUID and Postgres does not. Update the parser to replace those single quotes with spaces, which Postgres happily ignores.

Next, dates were improperly displayed as 10529827-09-17 instead of 2025-12-05 when loaded in arrays (as by date_agg()). Tom Lane suggested using timestamptz_date instead of the time_t_to_timestamptz(val) call used previously, but it kept returning the wrong values: off by a day in the aggregation tests and 2000-01-01 in the binary insert tests.

Turns out there were two issues: 1). timestamp_date is the right function, not timestamptz_date; and 2) there was some conversion code just for Dates after selection! Switching to timestamp_date and eliminating the conversion code fixed those issues.

Finally, dates from ClickBench weren't showing up at all. I figured out that the dates I was looking at were all 1970-01-01, and that the binary conversion code handled a "special case" where an epoch value of 0 was null. It's not!

So add tests for epoch zero Date and DateTime values and remove those "special cases".

This revealed an issue where a DateTime64 epoch 0 returned 2000-01-01 on macOS, though not on Linux. Switching to time_t_to_timestamptz fixes the issue, but requires separate handling of sub-second precision, done with integers rather than floats, fixing imprecision in the output of some timestamps.

While at it, add support for Date32, which behaves exactly like Date as far as Postgres is concerned.

@theory theory self-assigned this Dec 31, 2025
@theory theory force-pushed the dates-and-aggs branch 3 times, most recently from 91ba407 to cd16e4d Compare December 31, 2025 22:58
@theory theory changed the title Test aggregates, fix http UUIDs Fix binary date/time conversion; add Date32 Dec 31, 2025
@theory theory marked this pull request as ready for review December 31, 2025 22:58
@theory theory requested a review from serprex December 31, 2025 22:59
@theory theory added bug Something isn't working data types Improve data type support aggregates Improve aggregate pushdown engines Improve binary and/or http engine support benchmarks Benchmark and standard test improvements labels Dec 31, 2025
@theory theory force-pushed the dates-and-aggs branch 5 times, most recently from 70a0a2b to 573a736 Compare January 1, 2026 16:31
@theory theory linked an issue Jan 1, 2026 that may be closed by this pull request
Add tests for Postgres aggregate function pushdown. Eventually it will
cover all of the aggregate functions; for now it tests `avg()` and
`array_agg()`, with `min()` started but commented-out, because Postgres
doesn't push it down, instead sending a an `ORDER BY` and `LIMIT 1`,
which works great for some databases, but as of now pg_clickhouse
doesn't push down that `LIMIT 1`. And it would be better to let it
execute its own plan. More investigation on that in future commits.

Meanwhile, fix a few bugs discovered while testing `array_agg()` and a
failing ClickBench query.

First, ClickHouse UUID arrays fetched via the http engine failed to
parse into Postgres Arrays because ClickHouse single-quotes each UUID
and Postgres does not. Update the parser to replace those single quotes
with spaces, which Postgres happily ignores.

Next, dates were improperly displayed as `10529827-09-17` instead of
`2025-12-05` when loaded in arrays (as by `date_agg()`). Tome Lane
suggested using `timestamptz_date` instead of the
`time_t_to_timestamptz(val)` call used previously, but it kept returning
the wrong values: off by a day in the aggregation tests and `2000-01-01`
in the binary insert tests.

Turns out there were two issues: 1). `timestamp_date` is the right
function, not `timestamptz_date`; and 2) there was some conversion code
just for Dates after selection! Switching to `timestamp_date` and
eliminating the conversion code fixed those issues.

Finally, dates from ClickBench weren't showing up at all. I figured out
that the dates I was looking at were all `1970-01-01`, and that the
binary conversion code handled a "special case" where an epoch value of
`0` was null. It's not!

So add tests for epoch zero `Date` and `DateTime` values and remove
those "special cases".

This revealed an issue where a DateTime64 epoch `0` returned
`2000-01-01` on macOS, though not on Linux. Switching to
`time_t_to_timestamptz` fixes the issue, but requires separate handling
of sub-second precision, done with integers rather than floats, fixing
imprecision in the output of some timestamps.

While at it, add support for `Date32`, which behaves exactly like `Date`
as far as Postgres is concerned.
@theory theory merged commit 2bc4c63 into main Jan 2, 2026
34 checks passed
@theory theory deleted the dates-and-aggs branch January 2, 2026 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

aggregates Improve aggregate pushdown benchmarks Benchmark and standard test improvements bug Something isn't working data types Improve data type support engines Improve binary and/or http engine support

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Binary Engine DateTime64 Parsing Loses Precision

3 participants