feat: serde-based analytics exporter #30
Conversation
I think I'm missing some context: why are we doing this? We should keep our code as simple as possible, and having multiple feature-gated options to export to parquet goes in the opposite direction.
As it turned out, When migrating to using
With the changes in this PR, the crate offers both options for easier migration.
Isn't the fork changing only 3 lines of code and only exists because we use
In general, I'm against these kinds of changes unless there is a very good reason to make them.
Yes. Supporting other types (e.g.
I'm not talking about
So the blocker for using native Do we really need the struct on which we derive |
Not strictly speaking a blocker, since we have a fork with
Not unless we create another layer to borrow Anyhow, since there's no appetite for replacing |
Description
Refactors `parquet`-based analytics exporters to add more configuration options, as well as add a `serde`-based exporter (using the `serde_arrow` crate).

There are some minor API changes in how these serializers are instantiated and configured, but the default configuration hasn't changed.

The crate now offers two features (both disabled by default):

- `parquet-native`: Serializes the data using the native `parquet` `RecordWriter<T>` implementation (often derived using the `ParquetRecordWriter` macro).
- `parquet-serde`: Generates the `parquet` schema using `serde_arrow` and serializes the data using `serde`. Note that this requires both `Serialize` and `DeserializeOwned` to be implemented on the exported data, which breaks a common use case of using `&'static str` in the data exports. I suggest using something like `ArcStr` to cover all cases of exporting strings: owned, shared and static.

For easier migration from the native to the serde serializer, and for use in integration testing, both serializers now offer a `schema()` function that returns the schema generated for the exported type. There's also the `schema_from_str()` function that parses a string schema for verification. See this crate's `parquet_schema` integration test for an example.

Note that this crate is no longer using the forked versions of `parquet` and `parquet_derive`. When using the native serializer, the version of `parquet` used in the consumer should match the one used in this crate, while `parquet_derive` can be of any version, e.g.:

How Has This Been Tested?

Existing tests, with a few new ones to cover serialization and schema matching between multiple implementations of `parquet` serializers.

Due Diligence
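A note on the `ArcStr` suggestion in the description: the sketch below uses the standard library's `Arc<str>` as a stand-in to show how one field type can hold strings that start out static, owned, or shared. It is not this crate's code. The `Event` struct, its field names, and the `sample_events` helper are hypothetical, and `Arc<str>` differs from the `arcstr` crate's `ArcStr` in that it copies a string literal into a new allocation rather than referencing it in place.

```rust
use std::sync::Arc;

/// Hypothetical exported record; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    region: Arc<str>,
    message: Arc<str>,
}

/// Builds two records that share one `region` allocation.
fn sample_events() -> (Event, Event) {
    // Static case: the literal is copied once into a reference-counted buffer.
    let region: Arc<str> = Arc::from("eu-central-1");

    // Owned case: a `String` built at runtime is converted into `Arc<str>`.
    let message: Arc<str> = Arc::from(format!("session {} closed", 42));

    // Shared case: cloning the `Arc` only bumps the reference count.
    let first = Event {
        region: Arc::clone(&region),
        message,
    };
    let second = Event {
        region: Arc::clone(&region),
        message: Arc::from("ok"),
    };
    (first, second)
}

fn main() {
    let (first, second) = sample_events();
    // Both records point at the same underlying string data.
    assert!(Arc::ptr_eq(&first.region, &second.region));
    println!("{first:?}\n{second:?}");
}
```

Because the field owns its data, the struct carries no borrowed lifetime, which is the property an owned-deserialization bound needs and a `&'static str` field lacks.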