22 Jul, 2010
Some time ago when we reformatted the glog, we said we would be posting some more technical stuff for your enjoyment. So I thought I’d describe some of the work I’ve been doing to speed up the PSP version of Game Maker.
As some of you may know, Game Maker uses doubles internally for all it’s numbers. This allows a very simple end-user experience as when you’re learning you don’t want to bother with knowing about integers or floating point numbers; you just want a number. And while the PC can handle these very quickly, the PSP doesn’t have hardware support for them, meaning they are processed by the CPU in software and are incredibly slow. In the early days of the PSP port, we spent a long time converting internal code to use INTs and FLOATs rather than doubles, but we still have to use doubles when dealing with GML as that’s what the end user uses. This means using library functions to convert from a DOUBLE format into a FLOAT or an INT. These are slow… really slow. In our latest profile they came out about 3rd top of the function time list, with only the kernel and audio mixing beating it. So before we change these, we have to understand the double number format.

IEEE 754 Double format
So this is how doubles are stored in memory and we have to try and extract the integer part of it quicker than the library function. First, what does each part do? The sign bit simply tells us if it’s a positive or negative number, and the fraction is the actual numerical data. The interesting part is the exponent. This tells us where the decimal point goes. Now, we don’t have a decimal, we have a binary one so we have a binary point. So in normal binary %11 would give us 3. But what if there was a binary point inthe middle? (%1.1) Well this would give us 1.5 while %11.1 would give us 3.5 etc. Now this takes a little thinking to get your head around, and I’m not going to go into it too much, but suffice to say that the exponent tells us where this point goes, and if we’re casting to an INT, how much data we can ditch. Since we’re only interested in whole numbers, we can remove anything after the point. The exponent is also offset by 1023, which allows for large numbers, and small fractions meaning 1.0 is actually 1, with an exponent of 1023 (i.e. the decimal point hasn’t moved at all). Or rather… it WOULD be 1, except floating point numbers use a little trick to get more accuracy; they don’t store the leading bit!
Now, it all gets (even more) technical here, but suffice to say that when you store a number in the fraction part, you have to shift the number all the way to the left. so %000000001001001101 would actually be shifted up to %1001001101000…with enough zeros to make 52bits. This is called normalising the number. Now, I also said the FPU removes the leading bit, as can assume it’s there, so our number actually becomes %001101000..(to 52bits). Yes, this is odd… but it’s what the FPU wants, so that’s what we do.
Now,thats the basic idea… if you look at a HEX dump of a double storing 1.0, it’ll actually be 0x3ff0000000000000. This is an exponent of 1023 (0x3ff), and a fraction of 0 (as the 1 is removed). yes… very odd.
Still, with all this knowledge, we can now try and speed up the casting of DOUBLE to an INT. The first thing to do would be to write it in C so that it’s easy to follow and debug. So here we go… a simple C function to convert a DOUBLE, to an INT.
int Double2Int( double _d )
{
uint32* pD = (uint32*)&_d;
uint32 s = pD[1]&0x80000000;
int exp = ((pD[1]>>20)&0x7ff)-1023; // get exponent
if( exp<0 ) return 0; // ONLY a fraction
int64 t0 = *((int64*)pD)&0x000fffffffffffffL;
t0|=0x0010000000000000L;
int shift = 52-exp;
t0 = t0>>shift;
if( s!=0 ) t0=-t0;
return (int)t0;
}
We access the double as 2 integers for ease of use by taking the base address of the double, and then extract the sign. Next we grab the exponent and work out if it’s a positive exponent and therefore has an integer part, or negate exponent and is all fraction. If its negative, then we have a fraction only (like 0.001234) and casting it to an INT would return 0, so that’s what we do.
Next we get the 52bits of the fraction and OR in the leading bit (remember the FPU drops the leading bit for better accuracy). Since the exponent is the shift of the decimal point, we do a 52-exponent to get a real shift value, and after this, it’s just a matter of shifting down the bits to remove the fractional component and getting the whole number part. As we now have a whole number all that’s left to do is check the sign and negate it if need be; and that’s it!
Now while this is okay, it still uses 64bit numbers to make it simple to debug, but as we want to make it as fast as possible on the PSP, we’ll write it directly in MIPS assembler. Don’t worry if you can’t follow this, but it’s here to show the lengths we’ll go to in order to make the PSP version as fast as possible.
asm inline int32 Double2Int( double _d )
{
// Get exponent
sra t0,a1,20
andi t0,t0,0x7ff
addi t0,t0,-1023 // rebase exponent
bltz t0, Exit // if ONLY a fraction, then return 0 (in v0)
or v0,zero,zero
// get upper 20 bits (20_32 format, 2 registers)
lui t1,0xf
ori t1,t1,0xffff
and v0,a1,t1 // Get rid of sign and exponent
lui t1,0x0010 // add in 'explicit' 1
or v0,v0,t1
// Get 32bits of data - that theres ANY chance of keeping.
sll v0,v0,11
srl t1,a0,21
or v0,v0,t1
ori t2,zero,52 // 52 - exp to get shift 'right' value
sub t0,t2,t0
addi t0,t0,-21 // take off 20 bits from the lower 32bits we've killed already.
srl v0,v0,t0 // + 1 for 'implicit' 1 at the top.
srl a1,a1,31 // Get 'sign' into bit 0 (while nuking the rest of the bits)
beq a1, zero, Exit // if NO sign, then return number, otherwise, we have to negate it.
lui t0,0x8000 // Little fudge to allow -2147483648 (cant do 0- -2147483648)
beq v0,t0,Exit
nop
jr ra
sub v0,zero,v0 // negate...
Exit:
jr ra
nop
}
Now this MIPS assembler version does the same as the C version, but only uses 32bit registers and has no memory accesses. We also have a little fudge in this version to allow for -2147483648. This is because you can’t have +2147483648 inside an INT (not enough bits), but you CAN have -2147483648. So we simply check for this number and force that as the return value.
This speeds up the whole process massively, to the point that the DOUBLE to INT cast no longer even appears inside the profile. Now there are several other functions to convert. DOUBLE to UNSIGNED INT, INT to DOUBLE, UNSIGNED INT to DOUBLE, FLOAT to DOUBLE and DOUBLE to FLOAT. Once we’ve converted these over we get a pretty sizable boost in performance, so it’s well worth doing.
Hope you followed all this… and enjoyed it. If you did, we’ll try to post more of this stuff down the line.