Merged
28 commits
ed48865
feat: implement T-Digest
tisonkun Dec 16, 2025
26ee955
impl merge and compress
tisonkun Dec 16, 2025
88ac87e
impl get_rank
tisonkun Dec 16, 2025
09afcc9
impl merge and add tests
tisonkun Dec 16, 2025
a3271d7
demo iter
tisonkun Dec 16, 2025
81ba5af
impl ser
tisonkun Dec 16, 2025
a213242
impl de
tisonkun Dec 16, 2025
5cc6e21
fine tune deserialize tags
tisonkun Dec 16, 2025
d90491d
define code in one place
tisonkun Dec 16, 2025
9497d24
centralize compare logics
tisonkun Dec 16, 2025
53b74ee
finish serde
tisonkun Dec 16, 2025
7175d98
enable freeze TDigestMut
tisonkun Dec 16, 2025
b37f08b
add serde compat test files
tisonkun Dec 16, 2025
88837bf
support deserialize_compat
tisonkun Dec 17, 2025
fe04e84
impl cdf and pmf
tisonkun Dec 17, 2025
c2e322a
fine tune docs
tisonkun Dec 17, 2025
8d7ed90
naming and let to do the reserve
tisonkun Dec 17, 2025
11fee5f
further tidy
tisonkun Dec 17, 2025
bebd87c
best effort avoid NaN
tisonkun Dec 17, 2025
243dc28
fixup! best effort avoid NaN
tisonkun Dec 17, 2025
2a4ad3d
concrete tag
tisonkun Dec 17, 2025
ab73d58
Merge branch 'main' into tdigests
tisonkun Dec 17, 2025
ddbe0e2
filter invalid inputs
tisonkun Dec 17, 2025
2f61d4f
weight nonzero and should not overflow
tisonkun Dec 17, 2025
b35cdb2
other_mean - self_mean may produce inf
tisonkun Dec 18, 2025
caffa5a
Merge branch 'main' into tdigests
tisonkun Dec 19, 2025
743ede9
no need for checking in sk files now
tisonkun Dec 19, 2025
1f0ce3e
reuse test data loading logics
tisonkun Dec 19, 2025
131 changes: 131 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions Cargo.toml
@@ -35,8 +35,12 @@ all-features = true
rustdoc-args = ["--cfg", "docsrs"]

[dependencies]
byteorder = { version = "1.5.0" }
mur3 = { version = "0.1.0" }

[dev-dependencies]
googletest = { version = "0.14.2" }

[lints.rust]
unknown_lints = "deny"
unsafe_code = "deny"
1 change: 1 addition & 0 deletions src/lib.rs
@@ -28,3 +28,4 @@

pub mod error;
pub mod hll;
pub mod tdigest;
55 changes: 55 additions & 0 deletions src/tdigest/mod.rs
@@ -0,0 +1,55 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! T-Digest implementation for estimating quantiles and ranks.
//!
//! The implementation in this library is based on the MergingDigest described in
//! [Computing Extremely Accurate Quantiles Using t-Digests][paper] by Ted Dunning and Otmar Ertl.
//!
//! The implementation in this library has a few differences from the reference implementation
//! associated with that paper:
//!
//! * Merge does not modify the input.
//! * Deserialization is similar to other sketches in this library, although reading the
//!   reference implementation format is also supported.
//!
//! Unlike all other algorithms in the library, t-digest is empirical: it has no mathematical
//! basis for estimating its error, and its results depend on the input data. However, for
//! many common data distributions, it can produce excellent results. t-digest also operates
//! only on numeric data. Unlike the quantiles family of algorithms in the library, which return
//! quantile approximations from the input domain, t-digest interpolates values, so it will hold
//! and return data points not seen in the input.
//!
//! The closest alternative to t-digest in this library is REQ sketch. It prioritizes one chosen
//! side of the rank domain: either low rank accuracy or high rank accuracy. t-digest (in this
//! implementation) prioritizes both ends of the rank domain and has lower accuracy towards the
//! middle of the rank domain (median).
//!
//! Measurements show that t-digest is slightly biased (tends to underestimate low ranks and
//! overestimate high ranks), while still doing very well close to the extremes. The effect seems
//! to be more pronounced with more input values.
//!
//! For more information on the performance characteristics, see the
//! [DataSketches page on t-digest](https://datasketches.apache.org/docs/tdigest/tdigest.html).
//!
//! [paper]: https://arxiv.org/abs/1902.04023

mod serialization;

mod sketch;
pub use self::sketch::TDigest;
pub use self::sketch::TDigestMut;
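
The module docs above note that t-digest interpolates and can return values never seen in the input. The following is a minimal, standalone sketch of that idea only. It is not the code in this PR: `Centroid` and `toy_quantile` are hypothetical names, and the interpolation rule (each centroid sits at the midpoint of its weight span, with clamping at both ends) is a simplifying assumption.

```rust
/// Hypothetical toy centroid: the mean of a cluster of inputs and the
/// number of inputs it summarizes. Not the type used in this PR.
#[derive(Debug, Clone, Copy)]
struct Centroid {
    mean: f64,
    weight: u64,
}

/// Estimate the value at normalized rank `q` in [0, 1].
///
/// Assumptions for this sketch: `centroids` is sorted by mean, every weight
/// is nonzero, and each centroid is placed at the midpoint of its weight
/// span. Values between two midpoints are linearly interpolated; values
/// outside the first and last midpoint are clamped to the nearest mean.
fn toy_quantile(centroids: &[Centroid], q: f64) -> Option<f64> {
    // Rejects empty input, out-of-range ranks, and NaN.
    if centroids.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let total: u64 = centroids.iter().map(|c| c.weight).sum();
    let target = q * total as f64;

    let mut cumulative = 0.0; // weight of all centroids before the current one
    let mut prev: Option<(f64, f64)> = None; // (midpoint position, mean)
    for c in centroids {
        let center = cumulative + c.weight as f64 / 2.0;
        if target <= center {
            return Some(match prev {
                // Before the first midpoint: clamp to the first mean.
                None => c.mean,
                // Between two midpoints: linear interpolation of the means.
                Some((prev_center, prev_mean)) => {
                    prev_mean
                        + (c.mean - prev_mean) * (target - prev_center) / (center - prev_center)
                }
            });
        }
        prev = Some((center, c.mean));
        cumulative += c.weight as f64;
    }
    // Past the last midpoint: clamp to the last mean.
    centroids.last().map(|c| c.mean)
}
```

With two centroids of weight 2 and means 1.0 and 3.0 (for example, summarizing the inputs 0.5, 1.5, 2.5, 3.5), the estimated median is 2.0, a value that appears in neither the centroids nor the input.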
28 changes: 28 additions & 0 deletions src/tdigest/serialization.rs
@@ -0,0 +1,28 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

pub(super) const PREAMBLE_LONGS_EMPTY_OR_SINGLE: u8 = 1;
pub(super) const PREAMBLE_LONGS_MULTIPLE: u8 = 2;
pub(super) const SERIAL_VERSION: u8 = 1;
pub(super) const TDIGEST_FAMILY_ID: u8 = 20;
pub(super) const FLAGS_IS_EMPTY: u8 = 1 << 0;
pub(super) const FLAGS_IS_SINGLE_VALUE: u8 = 1 << 1;
pub(super) const FLAGS_REVERSE_MERGE: u8 = 1 << 2;
/// Marks the reference implementation format that uses double (f64) precision.
pub(super) const COMPAT_DOUBLE: u32 = 1;
/// Marks the reference implementation format that uses float (f32) precision.
pub(super) const COMPAT_FLOAT: u32 = 2;
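
As a rough illustration of how the flag and preamble constants above might be used together, here is a standalone sketch. The constant values are copied from the diff, but `pack_flags` and `preamble_longs` are hypothetical helpers, and the rule tying the preamble length to the empty and single-value flags is an assumption inferred from the constant names, not something shown in this file.

```rust
// Values copied from the diff above; `pub(super)` is dropped so the sketch
// compiles on its own.
const PREAMBLE_LONGS_EMPTY_OR_SINGLE: u8 = 1;
const PREAMBLE_LONGS_MULTIPLE: u8 = 2;
const FLAGS_IS_EMPTY: u8 = 1 << 0;
const FLAGS_IS_SINGLE_VALUE: u8 = 1 << 1;
const FLAGS_REVERSE_MERGE: u8 = 1 << 2;

/// Hypothetical helper: pack three booleans into a single flags byte,
/// one bit per flag.
fn pack_flags(is_empty: bool, is_single_value: bool, reverse_merge: bool) -> u8 {
    let mut flags = 0u8;
    if is_empty {
        flags |= FLAGS_IS_EMPTY;
    }
    if is_single_value {
        flags |= FLAGS_IS_SINGLE_VALUE;
    }
    if reverse_merge {
        flags |= FLAGS_REVERSE_MERGE;
    }
    flags
}

/// Hypothetical helper: choose the preamble length from the flags byte.
/// Assumption: empty and single-value sketches use the short preamble,
/// everything else uses the long one.
fn preamble_longs(flags: u8) -> u8 {
    if flags & (FLAGS_IS_EMPTY | FLAGS_IS_SINGLE_VALUE) != 0 {
        PREAMBLE_LONGS_EMPTY_OR_SINGLE
    } else {
        PREAMBLE_LONGS_MULTIPLE
    }
}
```

Keeping each flag as a distinct bit means a reader can test any one of them with a single mask, independent of the others.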