Skip to content

patrhez/hyperloglog

Repository files navigation

HyperLogLog

This is a toy program used to verify my understanding of the HyperLogLog algorithm.

The implementation is almost as the original paper of HyperLogLog. However, in order to get better results, a few changes are made to the original algorithm. For example, I compute the j(the index of the register) using the rightmost b bits of the hash value instead of the leftmost b bits. The reason is that when I was debugging, I noticed that the non-carefully selected hash function may cause the first b bits of the hash to be usually only 1 to 2 bits, which has a large impact on the result.

The functions used to generate the test data were written by ChatGPT, readers may choose other methods to get the test dataset.

I also compared this program with Redis' HyperLogLog using the same test data, see the hyperloglog_test.go.

References

  • Flajolet, Philippe, et al. "Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm." Discrete mathematics & theoretical computer science Proceedings (2007).
  • Heule, Stefan, Marc Nunkesser, and Alexander Hall. "Hyperloglog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm." Proceedings of the 16th International Conference on Extending Database Technology. 2013.

Benchmark

Murmur3 is chosen for this test.

Benchmark on Intel mackbook pro:

go test -bench=. -benchtime=30s
count: 987141
relative error of my implementation: 0.002860
redis count: 979807
relative error of Redis' implementation: 0.010194
goos: darwin
goarch: amd64
pkg: github.com/hepengzheng/gohll
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Benchmark_MyHLL-12       	   72052	    498500 ns/op
Benchmark_RedisHLL-12    	  475570	     77690 ns/op
PASS
ok  	github.com/hepengzheng/gohll	145.780s

Benchmark on M3 mackbook pro:

$ go test -bench=. -benchtime=30s
count: 987141
relative error of my implementation: 0.002860
redis count: 979807
relative error of Redis' implementation: 0.010194
goos: darwin
goarch: arm64
pkg: github.com/hepengzheng/gohll
Benchmark_MyHLL-8      	  119472	    298700 ns/op
Benchmark_RedisHLL-8   	 1579178	     23010 ns/op
PASS
ok  	github.com/hepengzheng/gohll	129.392s

About

A toy program used to verify my understanding of the HyperLogLog algorithm

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages