Description
An Invalid floating point operation exception may be raised when reallocating a buffer between 32 and 64 bytes. The reallocation can be explicit, e.g. a call to ReallocMem(), but could also be implicit, e.g. when concatenating strings, making it much harder to spot.
This happens because FastMM copies memory blocks between 32 and 64 bytes through the FPU registers, assuming that every register it is about to use is free. That assumption breaks if faulty code ran before the reallocation and left values on the FPU stack.
If the FPU invalid-operation exception is unmasked, which is the case by default in Delphi, an exception is raised (an x87 stack overflow is reported as an invalid operation, hence the exception name).
However, if that exception is masked, the error is ignored and the register contains a magic "invalid" value instead of the data copied to it. When the register content is stored to the destination memory location, a number of the bytes are incorrect, resulting in memory corruption.
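Which of the two failure modes applies depends on bit 0 of the x87 control word, the invalid-operation mask. A minimal sketch of how this could be checked at run time follows. The function name InvalidOpMasked and the constant IM_BIT are made up for illustration; Get8087CW is the standard RTL call.

```delphi
// Hypothetical helper, not part of FastMM.
// Returns True when the x87 invalid-operation exception is masked,
// i.e. when a stack overflow would be ignored instead of raised.
function InvalidOpMasked: Boolean;
const
  IM_BIT = $0001; // bit 0 of the x87 control word: invalid-operation mask
begin
  Result := (Get8087CW and IM_BIT) <> 0;
end;
```

After Set8087CW($133f) this returns True, because the low six bits of $133f mask every x87 exception. In that state the failure is silent corruption rather than an exception.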
In the POC code below, the Delphi val() function fails and leaves a value on the FPU stack. The subsequent ReallocMem() copies the buffer to its new location using all 8 FPU registers, one of which is still in use, so an exception is raised. If you uncomment the Set8087CW($133f) call, all FPU exceptions are masked, the error is silently ignored, and memory corruption occurs, which can be seen by inspecting the buffer.
procedure TForm1.Button1Click(Sender: TObject);
const
  BSMALL = 64;
  BBIG = 256;
type
  TSmall = array[0..BSMALL-1] of byte;
var
  f: double;
  i: integer;
  p: pointer;
begin
  GetMem(p, BSMALL);
  for i := 0 to BSMALL-1 do
    TSmall(p^)[i] := i;
  // Set8087CW($133f);
  val('565E394529', f, i);
  ReallocMem(p, BBIG);
end;
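Rather than inspecting the buffer by hand in the debugger, the masked case can be checked programmatically. The console program below is a rough sketch that repeats the steps of the POC with the exceptions masked and reports the first byte that no longer matches the fill pattern. FirstMismatch and the program name are hypothetical; the result depends on the FastMM version in use, so no expected output is given.

```delphi
program CheckRealloc;
{$APPTYPE CONSOLE}

uses
  SysUtils; // for PByteArray

// Hypothetical helper: returns the index of the first byte that differs
// from the pattern 0, 1, 2, ... or -1 if the first Count bytes are intact.
function FirstMismatch(p: Pointer; Count: Integer): Integer;
var
  i: Integer;
begin
  for i := 0 to Count - 1 do
    if PByteArray(p)^[i] <> Byte(i) then
    begin
      Result := i;
      Exit;
    end;
  Result := -1;
end;

const
  BSMALL = 64;
  BBIG = 256;
var
  p: Pointer;
  f: Double;
  code, i: Integer;
begin
  GetMem(p, BSMALL);
  try
    // Fill the buffer with a known pattern.
    for i := 0 to BSMALL - 1 do
      PByteArray(p)^[i] := Byte(i);
    Set8087CW($133F);            // mask all FPU exceptions
    Val('565E394529', f, code);  // same failing conversion as the POC
    ReallocMem(p, BBIG);         // copy that may go through the FPU registers
    WriteLn('First mismatching byte: ', FirstMismatch(p, BSMALL));
  finally
    FreeMem(p);
  end;
end.
```

A result of -1 means the first 64 bytes survived the reallocation. Any other value is the offset of the first corrupted byte.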
My suggestion would be to ffree as many registers as needed, starting at the bottom of the stack. This way the required registers are free, with the caveat that additional garbage remains, but the situation is not made any worse and it reduces the number of calls to the minimum.
Other solutions were considered.
- Calling fninit is not a good solution because it resets the FPU control word and status word as well.
- Calling emms is 2-4 times slower than freeing the registers.
- Calling femms is only supported on AMD.
- Popping from the stack may cause an underflow exception, or involves inspecting the status word and/or tag word to examine the FPU state, which ends up being slower than just freeing the registers.
Here is an example of a fixed Move36 function, with unrelated code removed for simplicity.
procedure Move36(const ASource; var ADest; ACount: Integer);
asm
  // Remove the bottom 4 values from the stack to make room for our 4 values
  ffree st(7)
  ffree st(6)
  ffree st(5)
  ffree st(4)
  fild qword ptr [eax]
  fild qword ptr [eax + 8]
  fild qword ptr [eax + 16]
  fild qword ptr [eax + 24]
  mov ecx, [eax + 32]
  mov [edx + 32], ecx
  fistp qword ptr [edx + 24]
  fistp qword ptr [edx + 16]
  fistp qword ptr [edx + 8]
  fistp qword ptr [edx]
end;
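The same idea should carry over to the larger moves. As a sketch only, here is how it might look for a 68-byte move that uses all 8 registers, which is the case the POC hits. This assumes the routine follows the same load/store pattern as Move36; the name Move68Sketch is made up, and the code is 32-bit only because it relies on the register calling convention (ASource in eax, ADest in edx).

```delphi
// Hypothetical 68-byte variant, modelled on the Move36 fix above.
procedure Move68Sketch(const ASource; var ADest; ACount: Integer);
asm
  // All 8 registers are needed, so free every one of them
  ffree st(7)
  ffree st(6)
  ffree st(5)
  ffree st(4)
  ffree st(3)
  ffree st(2)
  ffree st(1)
  ffree st(0)
  // Load 64 bytes as eight 64-bit integers
  fild qword ptr [eax]
  fild qword ptr [eax + 8]
  fild qword ptr [eax + 16]
  fild qword ptr [eax + 24]
  fild qword ptr [eax + 32]
  fild qword ptr [eax + 40]
  fild qword ptr [eax + 48]
  fild qword ptr [eax + 56]
  // Copy the remaining 4 bytes through a general-purpose register
  mov ecx, [eax + 64]
  mov [edx + 64], ecx
  // Store in reverse order, popping the stack back to empty
  fistp qword ptr [edx + 56]
  fistp qword ptr [edx + 48]
  fistp qword ptr [edx + 40]
  fistp qword ptr [edx + 32]
  fistp qword ptr [edx + 24]
  fistp qword ptr [edx + 16]
  fistp qword ptr [edx + 8]
  fistp qword ptr [edx]
end;
```

Here every value the faulty code left on the stack is discarded, so the stack is empty when the routine returns. With Move36, the top four entries survive as leftover garbage.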