Voronoi multi-precision using 64bit limbs on a 64bit compiler, GMP integration #42

bubnikv · 2020-05-25T16:12:11Z

On a 64-bit compiler, a 64-bit integer is used as the limb type of the multi-precision integer. This makes the Voronoi generator about 25% faster than the original code.
If GMP is available, its hand-crafted vectorized fixed-point operators are used. This makes the Voronoi generator around 50% faster than the original code.

A sqr() operator was introduced instead of x * x, as the sqr operator may be optimized by GMP better than mul.
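To illustrate why a dedicated squaring routine can beat a general multiplication, here is a minimal sketch. It is not the code from this PR: `u128` and `sqr_wide` are hypothetical names, and the operand is only two 32-bit limbs wide. A general two-limb multiplication needs four limb products, while squaring needs three, because the cross product appears twice and can be computed once.

```cpp
#include <cstdint>

// Generic fallback: for built-in types squaring is an ordinary multiplication.
template <typename T>
T sqr(const T& x) { return x * x; }

// Hypothetical 128-bit result (not from the PR), stored as two 64-bit halves.
struct u128 { std::uint64_t hi, lo; };

// Squares a 64-bit value viewed as two 32-bit limbs, using three limb
// products instead of the four a general multiplication would need.
u128 sqr_wide(std::uint64_t x) {
    const std::uint64_t lo = x & 0xFFFFFFFFull;
    const std::uint64_t hi = x >> 32;
    const std::uint64_t lo_sq = lo * lo;
    const std::uint64_t hi_sq = hi * hi;
    const std::uint64_t cross = lo * hi;  // computed once, counted twice

    // 2 * cross * 2^32 == cross * 2^33, split across the two output halves.
    u128 r;
    r.lo = lo_sq + (cross << 33);
    const std::uint64_t carry = (r.lo < lo_sq) ? 1 : 0;  // wrap-around of the low half
    r.hi = hi_sq + (cross >> 31) + carry;
    return r;
}
```

The saving grows with the number of limbs, which is presumably why a multi-precision library can treat squaring as a separate, faster case.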

The unit tests seem to pass all right. I was not quite sure about voronoi_ctype_traits:

#ifdef BOOST_VORONOI_64_T
  // using uint64 
  typedef extended_int<33> big_int_type;
#else
  // using uint32
  typedef extended_int<64> big_int_type;
#endif

I am using 33 limbs on a 64-bit compiler instead of 32. The extended_int multiplication, addition and subtraction operators drop the topmost limb when it is zero, which leaves room for one more limb. With 32-bit limbs the topmost limb is released, whereas with the same operands the topmost 64-bit limb may be non-zero and therefore would not be released. That is why I increased the number of 64-bit limbs to 33. Maybe @asydorchuk will consider this one extra limb unnecessary?
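One way to read the limb-count argument is as plain arithmetic. The sketch below is only an illustration: `limbs_needed` and `capacity_bits` are hypothetical helpers, not part of extended_int, and 2016 bits is an arbitrary example of a value whose top 32 bits are zero.

```cpp
#include <cstddef>

// Number of limbs needed to hold a value with `bits` significant bits (bits > 0).
constexpr std::size_t limbs_needed(std::size_t bits, std::size_t limb_bits) {
    return (bits + limb_bits - 1) / limb_bits;
}

// Total capacity in bits of `limbs` limbs.
constexpr std::size_t capacity_bits(std::size_t limbs, std::size_t limb_bits) {
    return limbs * limb_bits;
}

// 64 limbs of 32 bits and 32 limbs of 64 bits hold the same number of bits.
static_assert(capacity_bits(64, 32) == 2048, "32-bit configuration");
static_assert(capacity_bits(32, 64) == 2048, "64-bit configuration, 32 limbs");
static_assert(capacity_bits(33, 64) == 2112, "64-bit configuration, 33 limbs");

// A 2016-bit value frees one 32-bit limb, but still occupies all 32 64-bit limbs.
static_assert(limbs_needed(2016, 32) == 63, "top 32-bit limb is zero and dropped");
static_assert(limbs_needed(2016, 64) == 32, "top 64-bit limb is half full and kept");
```

Under this reading, the 33rd limb restores the headroom that the 32-bit layout gets from dropping a zero top limb.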

bubnikv · 2020-05-30T05:59:25Z

It looks like my overly optimistic comments on performance increase were due to some bug that I have fixed now. The performance gains are there, but less pronounced, likely around 30% for GMP. That was measured on a single thread on a modern AMD chip with a lot of cache.

Switching to uint64 for the limbs does not make sense on Visual Studio with the current implementation, because there the compiler does not support a 128-bit integer type natively and Boost.Multiprecision is used instead.
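Multiplying two 64-bit limbs produces a 128-bit result, which is why a wider native type matters here. The following sketch shows one portable way to obtain that product from four 32-bit partial products. It is not what this PR does and is shown only to make the cost visible; `wide_product` and `mul_64x64` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical 128-bit product (not from the PR), stored as two 64-bit halves.
struct wide_product { std::uint64_t hi, lo; };

// Computes the full 128-bit product a * b without a native 128-bit type.
wide_product mul_64x64(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t mask = 0xFFFFFFFFull;
    const std::uint64_t a_lo = a & mask, a_hi = a >> 32;
    const std::uint64_t b_lo = b & mask, b_hi = b >> 32;

    // Four 32x32 -> 64 partial products.
    const std::uint64_t p0 = a_lo * b_lo;
    const std::uint64_t p1 = a_lo * b_hi;
    const std::uint64_t p2 = a_hi * b_lo;
    const std::uint64_t p3 = a_hi * b_hi;

    // Sum of three values below 2^32 each, so it cannot overflow 64 bits.
    const std::uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);

    wide_product r;
    r.lo = (p0 & mask) | (mid << 32);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

Each 64-bit limb product then costs four multiplications plus carry handling, so the expected gain over 32-bit limbs largely disappears without hardware or compiler support for the wide product.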

On my old Intel Haswell laptop with 6MB cache, compiled with Visual Studio, the GMP gains are in the range of 23% when running Voronoi on a single thread, but only around 7% when running on all hardware threads, as the Voronoi generator is not quite cache friendly.

bubnikv added 4 commits May 22, 2020 13:56

Voronoi multi-precision using 64bit limbs on a 64bit compiler. sqr operator instead of x * x, as the sqr operator may be optimized for complex (multi-precision) types. Conditional GMP integration for integer multi-precision expressions.

0dfa83f

Fixed some errors.

9935800

Crude fix of boostorg#41 by an eval2ext() function, which is a variant of eval2(), which avoids overflow by normalization.

626bca4

Fix of prusa3d/PrusaSlicer#4299 (Visual Studio compiler does not support int128 types). Fixed regression - compilation on 32bit systems.

679b551
