Skip to content

Conversation

@jjatria
Copy link
Collaborator

@jjatria jjatria commented Apr 24, 2025

Not to be merged here. This PR is only for internal code review. My intention is to submit this upstream after we've had a chance to look. The text below is what I expect to put on the upstream ticket.


I'm unaware of the historical reasons behind this, but the Perl codebase is a little unconventional in some aspects. In particular, files like lib/Avro/Schema.pm have multiple packages in them.

I'm not sure what the experience is for other developers, but while working on some of the Perl tickets that have recently been merged, this was a constant source of frustration. Not only did it mean that a package definition may live in an unexpected file, but it also meant that eg. looking for a the implementation of a specific package's method would result in several false positives before hitting the one I wanted. It was just generally awkward.

More importantly, the current structure encourages some questionable engineering choices. For example, the is_data_valid method in lib/Avro/Schema.pm is shared by all the different schema types, which means that eg. calling it for a boolean type means the code needs to check for 7 other types before it gets to the code that is relevant to booleans. And not only that, but the validation rules in that method are different from those in eg. Avro::BinaryEncoder because the encoder does not call it: the int type for example is checked with a regular expression and a range check in the encoder, but with a pack roundtrip in is_data_valid.

This change breaks each package into separate files and then uses that structure to solve some of the issues described above. The goal is that this will do away with some bugs (like the possibility of the checks in the encoder and the schema drifting apart) and make it easier to make additional improvements later down the line.

jjatria added 2 commits April 23, 2025 22:30
This change introduces no behaviour changes, but it restructures
the code primarily so that each package lives in its own file.

While Perl allows one to have multiple packages in a single file,
or indeed to spread the definition of a single package over multiple
files, most modern distributions have settled on a model where each
file in principle maps to a single package. This not only simplifies
matters for new contributors (who can easily predict where a class
definition will be) but also eases the integration with some
development tools, like test coverage generators or plain old grep.

This change primarily splits the multiple packages in
lib/Avro/Schema.pm into separate files, specially the primitive classes
in the Avro::Schema::Primitive namespace, and moves the error classes
into their own files.

One major consequence of this change is that the 'is_data_valid' method
which was shared by all primitives and multi-dispatched based on the
schema type is now split for each schema, making maintenance simpler
and avoiding unnecessary checks (calling 'is_data_valid' for a boolean
type would first check the type against 7 other types before getting to
the code that was relevant to booleans).

Wherever possible, the code code in these new files has been moved
as-is to facilitate comparisons. This new structure should make some
later refactors simpler.
This commit makes use of the new structure introduced in the
previous commit ("Refactor module structure and split primitives")
primarily to simplify the checks in Avro::BinaryEncoder, which now
validate all data using the individual schema's 'is_data_valid'.

Before this change, there were data validation rules in both the
encoder and in the underlying schemas, and they didn't always line
up (for example, the encoder checked integers using a regular
expression and a range check, while the schema checked using a
'pack'-based roundtrip), opening the door for cases where the
encoder would accept data that the schema would reject.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants