Measuring cycles and optimized bzero() #1

@ghost

Description

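The intrinsic function _rdtsc() doesn't serialize the processor: RDTSC can execute out of order relative to the surrounding instructions, so you'll get even more 'unstable' readings from the timestamp counter. It is prudent to write your own version that issues CPUID (a serializing instruction) first, as in: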

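#include <stdint.h>

/* Read the timestamp counter after serializing with CPUID. */
static inline uint64_t __rdtsc(void)
{
  uint32_t a, d;

  /* xorl selects CPUID leaf 0; CPUID overwrites EAX, EBX, ECX and EDX. */
  __asm__ __volatile__ ( "xorl %%eax,%%eax\n\t"
                         "cpuid\n\t"
                         "rdtsc"
                         : "=a" (a), "=d" (d) : :
#ifdef __x86_64
                         "rbx", "rcx"
#else
                         "ebx", "ecx"
#endif
  );

  /* RDTSC returns the low half in EAX and the high half in EDX. */
  return ((uint64_t)d << 32) + a;
}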
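As a rough sketch of how such a serialized read might be used to measure cycles, the block below times a function several times and keeps the smallest reading. The names serialized_rdtsc, measure_cycles and sample_work are hypothetical and not from the original post; the reader is renamed here only to avoid clashing with the compiler's own __rdtsc intrinsic. It assumes GCC or Clang on x86, and the reported count includes the overhead of CPUID itself.

```c
#include <stdint.h>
#include <stddef.h>

/* Serialized timestamp read: CPUID completes earlier instructions first. */
static inline uint64_t serialized_rdtsc(void)
{
  uint32_t a, d;

  __asm__ __volatile__ ( "xorl %%eax,%%eax\n\t"
                         "cpuid\n\t"
                         "rdtsc"
                         : "=a" (a), "=d" (d) : :
#ifdef __x86_64__
                         "rbx", "rcx"
#else
                         "ebx", "ecx"
#endif
  );
  return ((uint64_t)d << 32) | a;
}

/* Hypothetical helper: smallest cycle count seen for fn(arg) over `runs`
   attempts. Taking the minimum filters out interrupts and cache misses. */
static uint64_t measure_cycles(void (*fn)(void *), void *arg, int runs)
{
  uint64_t best = UINT64_MAX;

  for (int i = 0; i < runs; i++) {
    uint64_t t0 = serialized_rdtsc();
    fn(arg);
    uint64_t t1 = serialized_rdtsc();
    if (t1 - t0 < best)
      best = t1 - t0;
  }
  return best;
}

/* Illustrative workload only: accumulate 0..999 into a volatile counter. */
static void sample_work(void *arg)
{
  volatile unsigned *sink = arg;

  for (unsigned i = 0; i < 1000; i++)
    *sink += i;
}
```

Calling measure_cycles(sample_work, &counter, 16) returns the best of 16 runs. The absolute number is only meaningful when compared against other measurements taken the same way on the same machine.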

In newer processors (those with Enhanced REP MOVSB/STOSB, which Intel introduced with Ivy Bridge), a single REP STOSB is faster than the combination of REP STOSD and REP STOSB, and often faster than using SIMD as well. So your bzero() routine can be a single macro:

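/* Zero cnt bytes at ptr with REP STOSB; needs <stddef.h> for size_t. */
#define bzero(ptr, cnt) do { \
    void *p_ = (ptr); size_t n_ = (cnt); \
    __asm__ __volatile__ ( \
      "rep; stosb"  /* store AL (0) at [RDI], RCX times */ \
      : "+D" (p_), "+c" (n_) : "a" (0) : "memory" \
    ); \
  } while (0)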
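For illustration, the same idea can be written as an inline function, with a check for the CPU feature bit that advertises fast string stores. This is a minimal sketch assuming GCC or Clang on x86; zero_bytes and has_erms are hypothetical names, not part of the original post. The ERMS flag is reported in CPUID leaf 7, subleaf 0, EBX bit 9.

```c
#include <stddef.h>
#include <cpuid.h>   /* GCC/Clang CPUID helpers */

/* Zero n bytes at dst with a single REP STOSB.
   RDI and RCX are modified by the instruction, so both are declared as
   read-write operands, and "memory" tells the compiler the buffer changed. */
static inline void zero_bytes(void *dst, size_t n)
{
  __asm__ __volatile__ ( "rep; stosb"
                         : "+D" (dst), "+c" (n)
                         : "a" (0)
                         : "memory" );
}

/* Returns 1 if the CPU reports Enhanced REP MOVSB/STOSB, else 0. */
static int has_erms(void)
{
  unsigned a, b, c, d;

  if (__get_cpuid_max(0, NULL) < 7)
    return 0;                      /* leaf 7 not available */
  __cpuid_count(7, 0, a, b, c, d);
  return (b >> 9) & 1;
}
```

A caller could use has_erms() once at startup to choose between zero_bytes() and another implementation. Whether REP STOSB actually wins for a given size is something to confirm by measurement on the target machine.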

[]s
Fred
