When dealing with performance on a low level, looking at the assembler code the compiler spits out (few of us ever program in pure assembler anymore) can often be both instructive and surprising. A while ago, as I was trying to figure out why a given loop was slow, I crashed into a problem I hadn't thought of before: How do you convert an unsigned integer to float?

To keep things reasonably simple, let's assume 32-bit x86, that the integer is 32 bits long, and that x87 and SSE (including SSE2) are available. How many different efficient ways can you come up with? Keep in mind that there is no direct conversion from unsigned to float on x86, only from signed to float.

These are the ones I've seen so far (not counting “insert a library call that probably uses one of these variants”):

- The simplest way is probably a *sign extension*. You may not have a u32 load, but you do have an s64 load, and you can extend from u32 → s64. On 32-bit, this means going through memory; on 64-bit, you can extend directly in the registers by just doing “mov eax, eax” or similar. This is the variant preferred by GCC when dealing with x87, but on a 32-bit platform you can't do a 64-bit integer load into SSE registers, so you'll have to look for something different.
- Then, you have the method of *combining multiple loads*. Floating-point numbers might be magical beasts, but they're still binary, so you rarely lose accuracy by shifting stuff up and down. So, you split the number into, say, the highest 16 bits and the lowest 16 bits, shift the high part down, load that as a float, multiply it back up again, convert the low part to a float, and then add the two. It's not very elegant, but it's what GCC prefers for SSE on 32-bit x86.
- Also, you can try to *post-correct*. Just load the number as if it were signed, and then do an integer test on it, adding 2^32 (converted to a float) if it's at or above 0x80000000. This is the approach preferred by MSVC9 if you are on x87, working in doubles (so you don't lose intermediate precision). However, it incurs a branch, and not necessarily an easily predictable one. Also, it's not very useful for single-precision SSE, unless you want to lose precision in the lower bits (e.g. -1 becomes 0.0) or want to convert via double. (There's some more magic in the MSVC code sequence if you don't specify `/fp:fast` or similar, but I haven't looked into that.)
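The three techniques above can be sketched in C (function names are mine; the compilers of course emit these as instruction sequences, not function calls):

```c
#include <assert.h>
#include <stdint.h>

/* Method 1: widen to a signed 64-bit integer first; the s64 -> float
   conversion handles any u32 value, since u32 always fits in s64. */
static float via_sign_extension(uint32_t x) {
    return (float)(int64_t)x;
}

/* Method 2: split into two 16-bit halves, convert each (both fit
   comfortably in a signed 32-bit int), then recombine. The *2^16
   is exact, so only the final add rounds. */
static float via_split_loads(uint32_t x) {
    float hi = (float)(int32_t)(x >> 16);
    float lo = (float)(int32_t)(x & 0xFFFF);
    return hi * 65536.0f + lo;
}

/* Method 3: convert as signed, then post-correct by adding 2^32 if
   the sign bit was set; done in double to keep intermediate precision. */
static float via_post_correct(uint32_t x) {
    double d = (double)(int32_t)x;
    if ((int32_t)x < 0)
        d += 4294967296.0;  /* 2^32 */
    return (float)d;
}
```

All three give the correctly rounded single-precision result for any 32-bit input; they differ only in the instructions they compile down to.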

Finally, there's the option I *wanted* the compiler to take:
If you know the number is below 0x80000000, it doesn't matter
if you treat it as signed or unsigned, so *just do a signed load*.
In my case, I had just ANDed it with 0x1ffff, so the compiler
had range information available (GCC tries to track the possible
range of all expressions, based on their types and what operations
they've been through). All that was needed was to file
a small GCC bug,
which was quickly and efficiently dealt with by the GCC maintainers.
Yay!
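The range-based case can be sketched in a couple of lines of C (the function name is mine; the point is that the mask hands the compiler the range information):

```c
#include <assert.h>
#include <stdint.h>

/* If the value is known to be below 0x80000000, the signed and
   unsigned interpretations agree, so a plain signed conversion
   (a single cvtsi2ss on SSE) suffices. Here the AND mask
   guarantees the range. */
static float convert_masked(uint32_t x) {
    uint32_t m = x & 0x1ffff;     /* range is now [0, 0x1ffff] */
    return (float)(int32_t)m;     /* sign bit can never be set */
}
```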

Now, the horrors needed to convert the *other* way (float to
unsigned) will be left for another day.

*Update:* I found another code sequence in Clang, used for
32-bit SSE code. It first loads the integer as if it were
a float, then bitwise ORs in 0x4330000000000000 (making the
register contain exactly 2^52 + x, since the exponent is
now 52 and the mantissa has 52 explicit bits, the lower 32 being
the number itself), subtracts 0x4330000000000000 again
(leaving exactly x in the register), and finally converts
down to single precision. I'm not honestly sure which one
of GCC's and Clang's sequences is faster; Clang's has fewer
instructions, but there might be a penalty from treating
an integer as a float like that, and the extra conversion
will also of course take a few cycles. I haven't measured.
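Clang's sequence can be expressed in portable C as follows (function name mine; the compiler does the bit manipulation directly in an SSE register rather than via `memcpy`, but the arithmetic is the same):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the double bit pattern 0x4330000000000000 | x. The exponent
   field 0x433 encodes 2^52, and the low 32 mantissa bits hold x, so
   the double's value is exactly 2^52 + x. Subtracting 2^52 (exact in
   double) leaves x, which is then rounded down to single precision. */
static float via_double_bits(uint32_t x) {
    uint64_t bits = 0x4330000000000000ULL | (uint64_t)x;
    double d;
    memcpy(&d, &bits, sizeof d);             /* reinterpret the bits */
    return (float)(d - 4503599627370496.0);  /* subtract 2^52 */
}
```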