Steinar H. Gunderson

Wed, 27 Jul 2011 - The micro-optimization corner: unsigned to float

When dealing with performance on a low level, looking at the assembler code the compiler spits out (few of us ever program in pure assembler anymore) can often be both instructive and surprising. A while ago, as I was trying to figure out why a given loop was slow, I crashed into a problem I hadn't thought of before: How do you convert an unsigned integer to float?

To keep things reasonably simple, let's assume 32-bit x86, that the integer is 32 bits long, and that x87 and SSE (including SSE2) are available. How many different efficient ways can you come up with? Keep in mind that there is no direct conversion from unsigned to float on x86, only from signed to float.
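To make the problem concrete, here is a minimal C sketch of what goes wrong if you only have a signed conversion, plus one well-known workaround (widening to a 64-bit signed integer first). The function names are made up for illustration, and the first function relies on the out-of-range unsigned-to-signed cast wrapping around, which is implementation-defined in C but is what the usual x86 compilers do.

```c
#include <stdint.h>

/* Hypothetical helper: treat the 32 bits as signed, which is all a
 * signed-only conversion instruction can see. For x >= 0x80000000 the
 * cast to int32_t is implementation-defined; common x86 compilers wrap,
 * so the result comes out negative. */
static float via_signed32(uint32_t x)
{
    return (float)(int32_t)x;
}

/* Hypothetical helper: widen to 64 bits first. Every uint32_t value fits
 * in an int64_t, so the signed conversion now yields the right value. */
static float via_signed64(uint32_t x)
{
    return (float)(int64_t)x;
}
```

For small values both agree; for 0x80000000 the first yields -2147483648.0f and the second 2147483648.0f.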

These are the ones I've seen so far (not counting “insert a library call that probably uses one of these variants”):

Finally, there's the option I wanted the compiler to take: If you know the number is below 0x80000000, it doesn't matter if you treat it as signed or unsigned, so just do a signed load. In my case, I had just ANDed it with 0x1ffff, so the compiler had range information available (GCC tries to track the possible range of all expressions, based on their types and what operations they've been through). All that was needed was to file a small GCC bug, which was quickly and efficiently dealt with by the GCC maintainers. Yay!
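As a rough sketch of that situation in C (the function name is hypothetical, and the explicit cast only spells out by hand what I wanted the compiler to deduce from the range information):

```c
#include <stdint.h>

/* Hypothetical helper: after the AND, the value is at most 0x1ffff, far
 * below 0x80000000, so interpreting it as signed cannot change it and a
 * plain signed conversion is safe. */
static float masked_to_float(uint32_t x)
{
    uint32_t masked = x & 0x1ffff;   /* at most 17 significant bits */
    return (float)(int32_t)masked;   /* signed conversion, same result */
}
```

Since 17 bits fit comfortably in a float's mantissa, the conversion is also exact here.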

Now, the horrors needed to convert the other way (float to unsigned) will be left for another day.

Update: I found another code sequence in Clang, used for 32-bit SSE code. It first loads the integer into the register as if it were floating-point data, then bitwise ORs in 0x4330000000000000 (making the register, read as a double, contain exactly 2^52 + x, since the exponent is now 52 and the mantissa has 52 explicit bits, the lower 32 being the number itself), subtracts the double with bit pattern 0x4330000000000000, i.e. 2^52, again (leaving exactly x in the register), and finally converts down to single precision. I'm honestly not sure which one of GCC's and Clang's sequences is faster; Clang's has fewer instructions, but there might be a penalty from treating an integer as a float like that, and the extra conversion will also of course take a few cycles. I haven't measured.
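The same trick can be written out in portable C, as a sketch rather than a reproduction of what Clang emits. The function name is made up, and memcpy stands in for the register reinterpretation; it assumes IEEE 754 doubles, which holds on x86.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper illustrating the 2^52 trick. */
static float u32_to_float_magic(uint32_t x)
{
    /* Bit pattern of the double 2^52: exponent 52, mantissa all zero. */
    const uint64_t magic_bits = 0x4330000000000000ULL;

    /* OR the integer into the low mantissa bits: the double is 2^52 + x. */
    uint64_t bits = magic_bits | (uint64_t)x;

    double biased, magic;
    memcpy(&biased, &bits, sizeof biased);
    memcpy(&magic, &magic_bits, sizeof magic);

    /* The subtraction is exact and leaves x as a double; the final cast
     * rounds it down to single precision. */
    return (float)(biased - magic);
}
```

Values with more than 24 significant bits get rounded in the last step, just as with any other unsigned-to-float conversion.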
