The intrinsic function _rdtsc() doesn't serialize the processor, so you'll get even more 'unstable' readings from the time-stamp counter. It is prudent to use your own, as in:
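#include <stdint.h>

/* Named serialized_rdtsc() so it cannot clash with the compiler's own
   __rdtsc() intrinsic, which is already defined when <x86intrin.h> is included. */
static inline uint64_t serialized_rdtsc(void)
{
  uint32_t a, d;

  /* CPUID (leaf 0) serializes the pipeline before RDTSC reads the counter. */
  __asm__ __volatile__ ( "xorl %%eax,%%eax\n\t"
                         "cpuid\n\t"
                         "rdtsc"
                         : "=a" (a), "=d" (d) : :
#ifdef __x86_64
                         "rbx", "rcx"
#else
                         "ebx", "ecx"
#endif
                       );
  return ((uint64_t)d << 32) + a;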
}

In newer processors (those with Enhanced REP MOVSB/STOSB, introduced with Ivy Bridge if I'm not mistaken), a single REP STOSB is faster than the combination of REP STOSD and REP STOSB, and can even be faster than using SIMD. So your bzero() routine can be a single macro:
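/* Needs <stddef.h> for size_t. STOSB stores AL; MOVSB would copy bytes instead.
   RDI and RCX are modified by the instruction, so they are read-write operands. */
#define bzero(ptr, cnt) \
  do { \
    void *p_ = (ptr); size_t n_ = (cnt); \
    __asm__ __volatile__ ( \
      "rep stosb" \
      : "+D" (p_), "+c" (n_) : "a" (0) : "memory" \
    ); \
  } while (0)

Regards,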
Fred
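
The two suggestions above can be combined into a small measurement helper. What follows is only a minimal sketch, assuming GCC or Clang on x86 or x86-64; the names has_erms, tsc_now, zero_bytes and cycles_to_zero are illustrative and do not come from the original post or from the project. It checks the ERMS feature bit (CPUID leaf 7, sub-leaf 0, EBX bit 9), zeroes a buffer with REP STOSB, and brackets the call with serialized time-stamp reads.

```c
#include <cpuid.h>   /* GCC/Clang helper header: __get_cpuid_count() */
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the CPU reports Enhanced REP MOVSB/STOSB, otherwise 0. */
static int has_erms(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;                 /* CPUID leaf 7 is not available */
    return (int)((ebx >> 9) & 1); /* EBX bit 9 is the ERMS flag */
}

/* Serialized counter read: CPUID drains the pipeline, then RDTSC runs. */
static uint64_t tsc_now(void)
{
    uint32_t a, d;

    __asm__ __volatile__ ("xorl %%eax,%%eax\n\t"
                          "cpuid\n\t"
                          "rdtsc"
                          : "=a" (a), "=d" (d)
                          :
#if defined(__x86_64__)
                          : "rbx", "rcx", "memory"
#else
                          : "ebx", "ecx", "memory"
#endif
                         );
    return ((uint64_t)d << 32) | a;
}

/* Zero cnt bytes at ptr with one REP STOSB (AL = 0, RDI = ptr, RCX = cnt). */
static void zero_bytes(void *ptr, size_t cnt)
{
    __asm__ __volatile__ ("rep stosb"
                          : "+D" (ptr), "+c" (cnt)
                          : "a" (0)
                          : "memory");
}

/* Counter ticks spent zeroing the buffer, including CPUID overhead. */
static uint64_t cycles_to_zero(void *buf, size_t len)
{
    uint64_t start = tsc_now();

    zero_bytes(buf, len);
    return tsc_now() - start;
}
```

A single reading still includes the cost of the CPUID instructions and any interrupts that happen to land in the window, so in practice one would probably repeat the call many times and keep the minimum. Calling has_erms() first gives a way to fall back to another fill routine on CPUs that do not report the feature.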