
@robertbastian commented Dec 9, 2025

The data that is currently generated in `parley_data` is the same data that ICU4X ships with. However, using `try_new_unstable` constructors with a custom data provider can be less efficient than enabling the `compiled_data` feature: these constructors do runtime lookups and branching, whereas most `compiled_data` constructors are `const`.
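To illustrate the cost difference described above, here is a minimal sketch in plain Rust that models the two construction paths without depending on ICU4X. All names (`Provider`, `MapProvider`, `Segmenter`, the `"segmenter/line"` key, and the byte table) are hypothetical stand-ins, not ICU4X or Parley APIs. Real ICU4X providers and baked data are considerably more involved.

```rust
use std::collections::HashMap;

/// Hypothetical error type for a provider that lacks the requested data.
#[derive(Debug, PartialEq)]
pub struct DataError(pub String);

/// Hypothetical stand-in for a data provider: data is looked up by key at runtime.
pub trait Provider {
    fn load(&self, key: &str) -> Result<&'static [u8], DataError>;
}

/// A provider backed by a map, e.g. filled from a blob loaded at startup.
pub struct MapProvider {
    pub tables: HashMap<&'static str, &'static [u8]>,
}

impl Provider for MapProvider {
    fn load(&self, key: &str) -> Result<&'static [u8], DataError> {
        // Runtime hash lookup plus a branch on whether the key was found.
        self.tables
            .get(key)
            .copied()
            .ok_or_else(|| DataError(format!("missing data for key {key}")))
    }
}

/// Hypothetical "baked" table, compiled directly into the binary.
const BAKED_LINE_BREAK_DATA: &[u8] = &[0, 1, 2, 3];

pub struct Segmenter {
    data: &'static [u8],
}

impl Segmenter {
    /// Analogous to a `compiled_data` constructor: infallible and `const`.
    pub const fn new() -> Self {
        Segmenter { data: BAKED_LINE_BREAK_DATA }
    }

    /// Analogous to a `try_new_unstable`-style constructor: the data is
    /// fetched through a provider at runtime and may be missing.
    pub fn try_new_with_provider(provider: &dyn Provider) -> Result<Self, DataError> {
        let data = provider.load("segmenter/line")?;
        Ok(Segmenter { data })
    }

    pub fn data(&self) -> &[u8] {
        self.data
    }
}

/// Built at compile time: no lookup and no error handling at runtime.
pub static LINE_SEGMENTER: Segmenter = Segmenter::new();
```

Because `Segmenter::new` is a `const fn`, its result can initialise a `static`, so the baked path does no work at runtime, while the provider path pays for a hash lookup and an error branch on every construction.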

Benchmarks look neutral:

```
Default Style - arabic 20 characters               [   9.9 us ...   9.7 us ]      -1.46%*
Default Style - latin 20 characters                [   4.5 us ...   4.3 us ]      -4.27%*
Default Style - japanese 20 characters             [   9.1 us ...   8.9 us ]      -2.30%*
Default Style - arabic 1 paragraph                 [  55.5 us ...  55.6 us ]      +0.13%
Default Style - latin 1 paragraph                  [  18.2 us ...  17.9 us ]      -1.49%*
Default Style - japanese 1 paragraph               [  76.8 us ...  76.9 us ]      +0.16%
Default Style - arabic 4 paragraph                 [ 234.0 us ... 235.1 us ]      +0.48%
Default Style - latin 4 paragraph                  [  69.0 us ...  68.2 us ]      -1.05%*
Default Style - japanese 4 paragraph               [ 131.9 us ... 136.0 us ]      +3.11%
Styled - arabic 20 characters                      [  11.3 us ...  11.3 us ]      -0.43%
Styled - latin 20 characters                       [   6.3 us ...   6.3 us ]      -0.99%
Styled - japanese 20 characters                    [   9.9 us ...   9.7 us ]      -1.80%*
Styled - arabic 1 paragraph                        [  59.4 us ...  58.5 us ]      -1.40%
Styled - latin 1 paragraph                         [  23.7 us ...  23.3 us ]      -1.82%*
Styled - japanese 1 paragraph                      [  86.6 us ...  87.5 us ]      +1.05%*
Styled - arabic 4 paragraph                        [ 251.7 us ... 252.5 us ]      +0.32%
Styled - latin 4 paragraph                         [  90.4 us ...  89.1 us ]      -1.45%*
Styled - japanese 4 paragraph                      [ 123.7 us ... 124.0 us ]      +0.24%
```

@robertbastian force-pushed the baked-data branch 2 times, most recently from 160ed2e to d637c00, December 22, 2025 22:56
@robertbastian marked this pull request as ready for review December 22, 2025 23:01

@taj-p Dec 23, 2025

I like how much simpler this PR makes Parley and Parley Data, but I worry about how it affects future work, which you may be able to help guide.

Removing the compartmentalisation provided by `AnalysisDataSources` would, IIUC, make it harder for us to enable a bring-your-own (BYO) data mechanism. In the short term, we want to support complex scripts (and let consumers pass that data in).

For context, we want to enable a workflow where, on the web, we can ship the binary to clients separately from the ICU data. This lets us evolve the binary (which changes far more often than the ICU data) without clients having to re-download the same ICU data with every binary version.

Separately, this enables a more "pay for what you use" approach, with the application layer deciding which ICU data to provide for a given application state (which may evolve during a client's session).
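The bring-your-own workflow described in this comment can be sketched as a runtime registry that the application layer fills as data arrives and trims when it is no longer needed. This is a hypothetical illustration only: `ScriptDataRegistry`, `ComplexScript`, `SegmentationMode`, and the fallback behaviour are assumptions, not Parley's `AnalysisDataSources` API or ICU4X types. The script list assumes the data in question is the extra segmentation data that complex scripts such as Thai require.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers for scripts whose segmentation needs extra data.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ComplexScript {
    Thai,
    Lao,
    Khmer,
    Myanmar,
}

/// Hypothetical registry the application fills at runtime, e.g. after
/// fetching a data blob separately from the main binary.
#[derive(Default)]
pub struct ScriptDataRegistry {
    blobs: HashMap<ComplexScript, Vec<u8>>,
}

impl ScriptDataRegistry {
    /// Adds (or replaces) the data for one script. Can be called at any time,
    /// so the set of supported scripts can grow during a session.
    pub fn register(&mut self, script: ComplexScript, blob: Vec<u8>) {
        self.blobs.insert(script, blob);
    }

    /// Drops data the application no longer needs; returns whether any was held.
    pub fn unregister(&mut self, script: ComplexScript) -> bool {
        self.blobs.remove(&script).is_some()
    }

    pub fn get(&self, script: ComplexScript) -> Option<&[u8]> {
        self.blobs.get(&script).map(Vec::as_slice)
    }
}

/// How text in a script would be handled given what is currently loaded.
#[derive(Debug, PartialEq, Eq)]
pub enum SegmentationMode {
    /// Script-specific data is available (its size is recorded for illustration).
    WithData { bytes: usize },
    /// Hypothetical fallback when no data has been provided for the script.
    DataFreeFallback,
}

pub fn segmentation_mode(registry: &ScriptDataRegistry, script: ComplexScript) -> SegmentationMode {
    match registry.get(script) {
        Some(blob) => SegmentationMode::WithData { bytes: blob.len() },
        None => SegmentationMode::DataFreeFallback,
    }
}
```

The point of the sketch is the boundary: the binary only knows how to consume data, while what is loaded, and when, is decided by the application.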

Contributor

Looks like this change reduces the size of the Vello Editor example from 9.7 MB to 9.57 MB 🎉
