5F3759DF: An explanation of the world's most infamous magic number
March 17, 2021
Here's a famous function in the Quake III source code. Can you guess what it does?
```c
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
```
This function calculates an approximation of the inverse of the square root of a number, $\frac{1}{\sqrt{x}}$.
Preliminary Explanation
Why?
Quake III needs to perform a lot of calculations to simulate lighting. Each face of a polygon in the scene is represented by a 3D vector pointing perpendicular to it. Then, these vectors are used to calculate the reflections of light off of the face.
Each face could be represented by a vector of any length; the important information is the direction of the vector, not its length. However, the calculations become much simpler if all of the vectors have length 1. Thus, we need to normalize these vectors. We can do this by dividing the vector by its length (its norm, $\|\mathbf{v}\|$). Let $\mathbf{v}$ denote the vector before normalization and $\hat{\mathbf{v}}$ denote the normalized vector:

$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$
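Using the Pythagorean theorem:

$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\sqrt{v_x^2 + v_y^2 + v_z^2}}$$

We can then break this down into each component ($x$, $y$, and $z$) of the 3-dimensional vector:

$$\hat{v}_x = v_x \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}, \qquad \hat{v}_y = v_y \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}, \qquad \hat{v}_z = v_z \cdot \frac{1}{\sqrt{v_x^2 + v_y^2 + v_z^2}}$$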
The additions and multiplications are easy enough to execute, but both the division and the square root would be very slow on a Quake III-era CPU. Thus, we need a faster method to return an approximate value of $\frac{1}{\sqrt{x}}$, where $x = v_x^2 + v_y^2 + v_z^2$.
IEEE 754: How Computers Store Fractions
It's pretty straightforward to store an integer in binary by just converting it into base 2, for example $13_{10} = 1101_2 = 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0$.
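But how do computers store fractions? One simple approach would be to put a binary point at a fixed position in the number. For example, with 8 bits split into 4 integer bits and 4 fractional bits, $0101.1000_2 = 4 + 1 + \frac{1}{2} = 5.5$.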
This seems to work well, but we've drastically reduced the size of the numbers we can store. We only have half as many bits to store our integer part, which limits us to a relatively small range of numbers, and we only have half as many bits for our fractional part, which doesn't give us a ton of precision.
Instead, we can borrow the idea of scientific notation, which represents numbers with a mantissa and an exponent, like $6.02 \times 10^{23}$. With this, we can store very large and very small values with a consistently low relative error. This is called "floating point": unlike the prior idea, the point can effectively "float around" in the bit representation to give us precision at both very small and very large values. Thus, if $x$ is the number we want to store, we write it in terms of an exponent $E$ and a mantissa $M$:

$$x = M \cdot 2^E, \qquad 1 \le M < 2$$
At the time Quake III was released, most CPUs were 32-bit, and the game used 32-bit (single-precision) floating-point numbers. The 32 bits are allocated as follows:
1 bit | 8 bits | 23 bits |
---|---|---|
sign | exponent | mantissa |
The "sign" bit represents whether the number is positive or negative. Since we know the argument to the square root function should always be positive, we can assume it to always be zero. There are also special cases for values like NaN, infinity, or very small (subnormal) numbers, which we can also ignore for now.
Note that the exponent could be positive or negative. To accommodate this, it is stored with an offset (bias) of 127. For example, to store $0.15625 = 1.25 \times 2^{-3}$, we would take the exponent $-3$ and add $127$ to it to get $124$.
In decimal scientific notation, the first digit of the mantissa must be between 1 and 9. In this "binary scientific notation", the first digit must be between 1 and 1. (You can see this in our example: $0.15625 = 1.01_2 \times 2^{-3}$.) Therefore, we don't actually have to store it; only the 23 bits after the binary point are kept.
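To make the layout concrete, here is a small sketch that pulls the three fields out of a `float`. It assumes `float` is a 32-bit IEEE 754 single-precision number (true on mainstream platforms) and copies the bits with `memcpy` rather than a pointer cast; the `float_fields` type and `decode_float` name are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three IEEE 754 single-precision fields, as raw bit patterns. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits after the implicit leading 1 */
} float_fields;

/* Assumes float is 32-bit IEEE 754. */
float_fields decode_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the raw bits, no value conversion */

    float_fields out;
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For `0.15625f`, this sketch yields a sign of 0, an exponent field of 124 ($-3 + 127$), and a mantissa field of `0x200000`, which is $0.25 \cdot 2^{23}$ (the `.01` after the implicit leading 1).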
Interpreting floats as ints
An integer is just represented as a base 2 number. So, what happens if we take a number, find the bits of its floating-point representation, and then interpret those bits as a base 2 integer number?
Recall that our float is stored as:
1 bit | 8 bits | 23 bits |
---|---|---|
sign | exponent representation | mantissa representation |
(Note that the least significant bit is on the right.)
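Observe that the mantissa bits start in the ones place of the integer. However, remember that only the fractional part $M - 1$ of the mantissa is stored (the leading 1 is implicit), shifted left by 23 places, so as an integer those bits are worth $(M - 1) \cdot 2^{23}$.

The exponent bits, $E + 127$, start in the $2^{23}$ place, so as an integer they are worth $(E + 127) \cdot 2^{23}$.

Essentially, we have the sum of the quantity $(M - 1) \cdot 2^{23}$ and the quantity $(E + 127) \cdot 2^{23}$.

Let's define the function $I(x)$ that represents this bizarre operation of taking a floating-point representation and interpreting it as an integer:

$$I(x) = 2^{23}(E + 127 + M - 1)$$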
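A quick numerical check of this formula, sketched under the assumption that `float` is 32-bit IEEE 754 and $x$ is positive and normal (not subnormal). The function names are hypothetical. One side reads the bits directly; the other recovers $M$ and $E$ with the standard `frexpf`, which returns $m \in [0.5, 1)$ with $x = m \cdot 2^e$, so $M = 2m$ and $E = e - 1$.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* I(x) by direct bit reinterpretation (assumes 32-bit IEEE 754 float). */
uint32_t float_as_int(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    return i;
}

/* I(x) from the formula 2^23 * (E + 127 + M - 1), for positive normal x. */
double float_as_int_formula(float x)
{
    int e;
    float m = frexpf(x, &e);   /* x = m * 2^e with 0.5 <= m < 1 */
    double M = 2.0 * m;        /* rescale so that 1 <= M < 2 */
    int E = e - 1;
    return 8388608.0 * (E + 127 + M - 1.0);  /* 8388608 = 2^23 */
}
```

For `0.15625f` both give `0x3E200000` ($2^{23} \cdot 124.25$), and for `3.5f` both give `0x40600000`.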
An observation about logarithms
Let's go back to our floating-point representation of $x$:

$$x = M \cdot 2^E$$

Now, what happens if we take the logarithm? Let's take the log base 2, since we're working with binary. Evaluating $\log_2(x)$ directly would be pretty slow, but let's proceed symbolically to see if we get anywhere useful:

$$\log_2(x) = \log_2\left(M \cdot 2^E\right) = \log_2(M) + E$$

Calculating $\log_2(M)$ would be hard. Instead, we can make do with an approximation. Remember that since $M$ is the mantissa, it is at least 1 but less than 2.

There's a convenient approximation we can use. Here's a graph in Desmos:

The red curve is $\log_2(M)$, the function we want to approximate. The purple curve is the line $M - 1$, which is already a pretty good approximation. However, the blue curve is even better: $M - 1 + \sigma$, where $\sigma$ is a small constant tuned for the best accuracy over $1 \le M < 2$.

Let's continue, using $\log_2(M) \approx M - 1 + \sigma$:

$$\log_2(x) \approx E + M - 1 + \sigma$$
But what does this have to do with $I(x)$?
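Recall that

$$I(x) = 2^{23}(E + 127 + M - 1)$$

We proceed by grouping terms:

$$I(x) = 2^{23}\big((E + M - 1) + 127\big)$$

Observe the $E + M - 1$ part. Earlier, we found that $\log_2(x) \approx E + M - 1 + \sigma$, so $E + M - 1 \approx \log_2(x) - \sigma$. Therefore, we can substitute this in:

$$I(x) \approx 2^{23}\big(\log_2(x) - \sigma + 127\big)$$

We can solve for $\log_2(x)$ as follows:

$$\log_2(x) \approx \frac{I(x)}{2^{23}} - 127 + \sigma$$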
And there it is: we have a much faster way to compute the logarithm.
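Here is a minimal sketch of that fast logarithm, assuming a 32-bit IEEE 754 `float` and a positive, normal input. The value of `SIGMA` is an assumption: 0.0450466 is the value that reproduces the constant used in the Quake III code, not the only reasonable choice. Because $M - 1 + \sigma$ stays within a few hundredths of $\log_2(M)$ on $[1, 2)$, the result is only accurate to about that much.

```c
#include <stdint.h>
#include <string.h>

/* Tuning constant in log2(M) ~ M - 1 + sigma (illustrative choice). */
#define SIGMA 0.0450466

/* Approximate log2(x) as I(x) / 2^23 - 127 + sigma, for positive normal x. */
double fast_log2(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);              /* i = I(x) */
    return i / 8388608.0 - 127.0 + SIGMA;  /* 8388608 = 2^23 */
}
```

For instance, `fast_log2(10.0f)` gives about 3.295, compared to $\log_2(10) \approx 3.322$, using one integer-to-float conversion instead of a real logarithm.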
Rewrite the problem
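Remember, we're trying to find the quantity $y = \frac{1}{\sqrt{x}}$.

We can use the properties of logarithms to find:

$$\log_2(y) = \log_2\left(x^{-1/2}\right) = -\frac{1}{2}\log_2(x)$$

Now, let's try substituting in our fast logarithm on both sides:

$$\frac{I(y)}{2^{23}} - 127 + \sigma \approx -\frac{1}{2}\left(\frac{I(x)}{2^{23}} - 127 + \sigma\right)$$

We can do some manipulation to solve for our result:

$$I(y) \approx \frac{3}{2} \cdot 2^{23}(127 - \sigma) - \frac{1}{2} I(x)$$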
The magic number
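Let's look at this term:

$$\frac{3}{2} \cdot 2^{23}(127 - \sigma)$$

We know everything here ahead of time. Why don't we go through and calculate it? Plugging in $\sigma \approx 0.0450466$ (the value that reproduces the constant in the code):

$$\frac{3}{2} \cdot 2^{23}(127 - 0.0450466) \approx 1597463007 = \text{0x5F3759DF}$$

So that's where the magic number comes from.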
Anyway, we now have:

$$I(y) \approx \text{0x5F3759DF} - \frac{1}{2} I(x)$$
This is the gist of how the function works. Let's step through the code now.
The Code
Let's go through this code, line by line, to see how it matches up with our mathematical approximation.
```c
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
```
A quick note: the argument to the function is called `number`, but I'll be calling it $x$ for simplicity's sake.
Evil Floating Point Bit Level Hacking
Let's look at the first part of the function.
```c
float Q_rsqrt( float number )
{
    long i;
    float y;
    ...
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    ...
}
```
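We start by declaring `i`, a `long`, which was a 32-bit integer on the platforms Quake III targeted (on most modern 64-bit Linux and macOS systems, `long` is 64 bits, so the trick would need an explicitly 32-bit type there). Then, we declare `y`, a `float` (floating-point number). We store the value of the argument (`number`, or $x$) into `y`. Simple enough.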
The next line, however, is where things get ugly. Starting from the right, let's go step-by-step.
`y` is, of course, our floating-point number.

`&y` takes the address of `y`: the location in computer memory at which `y` is stored. `&y` is a pointer to a floating-point number.
`( long * )` is a cast: it converts a value from one type to another. Here, we're converting `&y` from "pointer to a floating-point number" to "pointer to a 32-bit integer". This doesn't modify the bits in `y` at all; it only changes the type of the pointer. It tells the compiler to treat the value at this address as an integer rather than a float.
`i = * [...]` dereferences the pointer. It sets `i` equal to the value at that pointer. Since the pointer is considered a pointer to a 32-bit integer, and `i` is declared as a 32-bit integer, this just sets `i` equal to the bits at that location in memory.

Effectively, this part takes the bits in the floating-point representation of the argument (`number`) and interprets them as a 32-bit integer instead.
Does that sound familiar? It's our $I$ function we defined earlier! These lines could be written as

$$y = x, \qquad i = I(y)$$

Or, more concisely:

$$i = I(x)$$
What's with all this memory trickery?
You might ask, why can't we just do this?

```c
i = (long) y;
```

After all, we just want `y` as an integer, right?
However, this expression will actually convert the value that `y` represents into an integer. For instance, if $y = 3.5$, then this code will set $i = 3$, discarding the fractional part.
This isn't what we want. We don't want any conversion: we are literally taking the bit representation of `y` and interpreting it as an integer instead. Thus, we convert the pointer instead, which doesn't modify the bits stored in memory.
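The difference between the two is easy to see side by side. This is a sketch with hypothetical function names; it assumes a 32-bit IEEE 754 `float` and uses `int32_t`/`uint32_t` so the widths are explicit.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion, like (long) y: the fractional part is discarded. */
int32_t convert_value(float y)
{
    return (int32_t)y;
}

/* Bit reinterpretation, which is what the Quake code wants:
   the same 32 bits, just read as an integer. */
uint32_t reinterpret_bits(float y)
{
    uint32_t i;
    memcpy(&i, &y, sizeof i);
    return i;
}
```

For `3.5f`, `convert_value` returns 3, while `reinterpret_bits` returns `0x40600000` (1080033280): sign 0, exponent field 128, mantissa bits for $0.75$.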
You might think that the code looks ugly. That's because it is. Reading a `float` object through a `long *` violates C's strict aliasing rules, so it is "undefined behavior": the C standard does not guarantee what will happen. We're basically tricking the compiler here. This is definitely considered bad practice, which is why it's "Evil Floating Point Bit Level Hacking". Evil because it relies on undefined behavior, Floating Point because we're working with a floating-point representation, Bit Level because we're directly using the bit representation as an integer, and Hacking because this is most definitely not the way casting is intended to be used.
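Besides `memcpy`, C offers another well-defined way to reinterpret bits: writing one member of a union and reading another, which C11 explicitly describes as reinterpreting the stored bytes. (C++ does not allow this, so `memcpy` is the more portable choice across both languages.) A minimal sketch, assuming a 32-bit IEEE 754 `float`; the function name is hypothetical.

```c
#include <stdint.h>

/* Reinterpret a float's bits through a union (well-defined in C, not C++). */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;       /* write the float member */
    return pun.u;    /* read the same 4 bytes as an unsigned integer */
}
```

Either approach produces the same bits the original pointer cast was after, without relying on undefined behavior.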
The "WTF" Line
```c
float Q_rsqrt( float number )
{
    ...
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    ...
}
```
If you aren't familiar with bitwise operations, the expression `( i >> 1 )` might seem strange to you.
Think about doing a division problem like $3{,}456{,}780 \div 7$. It would be pretty tough, right? You'd probably need to write out the entire long division problem.

On the other hand, think about finding $3{,}456{,}780 \div 10$. You can pretty quickly tell that the result is $345{,}678$, right? All you had to do was shift all the digits one place to the right.
We can use the same trick in binary. To divide a non-negative integer by 2, we just need to shift each bit one place to the right (the lowest bit falls off, so odd numbers round down). That is what `>> 1` does: it's just a much faster way to divide by two.
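Here is the "WTF" line on its own as a small sketch. It takes $i = I(x)$ and returns the integer whose bits become the first guess. The name `wtf_line` is hypothetical, and it uses `uint32_t` instead of the original `long`; for positive floats, $I(x) < 2^{31}$, so the shift behaves the same as it does on the signed value.

```c
#include <stdint.h>

/* i holds I(x) for a positive float x; i >> 1 is i / 2 rounded down. */
uint32_t wtf_line(uint32_t i)
{
    return 0x5F3759DFu - (i >> 1);
}
```

For $x = 1$, $I(x)$ is `0x3F800000`, halving gives `0x1FC00000`, and subtracting from the magic number gives `0x3F7759DF`.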
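Remember, in the last step, we set $i = I(x)$. Thus, `( i >> 1 )` is $\frac{1}{2} I(x)$, rounded down to an integer.

We then subtract this from our magic number:

$$i = \text{0x5F3759DF} - \frac{1}{2} I(x)$$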
Finishing up
```c
float Q_rsqrt( float number )
{
    ...
    y  = * ( float * ) &i;
    ...
}
```
This looks very similar to the `i = * ( long * ) &y;` line we looked at earlier, and that's because it is. However, instead of interpreting the floating-point representation `y` as an integer, we're interpreting the integer `i` as a floating-point representation. You can think of this as our $I^{-1}$ function.
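This step performs $y = I^{-1}(i)$. Since the last step set $i = \text{0x5F3759DF} - \frac{1}{2} I(x)$, this effectively sets:

$$y = I^{-1}\left(\text{0x5F3759DF} - \frac{1}{2} I(x)\right) \approx \frac{1}{\sqrt{x}}$$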
There we go!
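Putting the three steps together without any Newton refinement gives the raw first guess. This is a sketch assuming a 32-bit IEEE 754 `float` and a positive, normal input; `rsqrt_guess` is a hypothetical name, and `memcpy` stands in for the pointer casts.

```c
#include <stdint.h>
#include <string.h>

/* y ~ I^{-1}(0x5F3759DF - I(x)/2), with no Newton step. */
float rsqrt_guess(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);      /* i = I(x) */
    i = 0x5F3759DFu - (i >> 1);    /* the "WTF" line */
    float y;
    memcpy(&y, &i, sizeof y);      /* y = I^{-1}(i) */
    return y;
}
```

For $x = 1$ this returns about 0.966, a few percent below the true value of 1. That's close, but the next step tightens it considerably.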
But wait, there's more: Newton's method
```c
float Q_rsqrt( float number )
{
    ...
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    ...
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
    ...
}
```
What does this do?
This line performs one iteration of "Newton's method", a technique for refining an approximation of a root of a function.
Let's define $f(t) = \frac{1}{t^2} - x$. Notice that when $t = \frac{1}{\sqrt{x}}$, $f(t) = 0$. Therefore, we can use a root-finding algorithm to try to find this root of $f$, and we'll get back a better approximation for $\frac{1}{\sqrt{x}}$.
Here's a graph that shows how Newton's method works:
Note: since we're working with both arguments to $I$ and arguments to $f$, I've decided to stick with using $t$ for the latter. Generally, when working with Newton's method, this value would be called $x$.
We have our function in red, and an initial guess in dotted black, and we're trying to find the point at which the function crosses the t-axis. We can draw a line tangent to the function at our initial guess (in green) and then find the intersection of that line with the t-axis to get an even better guess. We can keep doing this until we're happy with the precision.
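So, we have our initial guess given by our $I^{-1}$ expression. Let's call this guess $t_0$. How do we draw a tangent line?

Remember, the point-slope form for a line is $y - y_0 = m(t - t_0)$. Thus, we can just plug in our initial point $(t_0, f(t_0))$ to get $y - f(t_0) = m(t - t_0)$.

To get the slope $m$, we need to take the derivative of the function $f$. Let's do this later; just call it $f'(t_0)$ for now.

We're trying to find the point where this tangent line crosses the t-axis, that is, where $y = 0$. Substitute $0$ for $y$ and call the resulting $t$ our new guess $t_1$:

$$0 - f(t_0) = f'(t_0)(t_1 - t_0) \quad\Longrightarrow\quad t_1 = t_0 - \frac{f(t_0)}{f'(t_0)}$$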
And we've arrived at the formula for Newton's method.
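Let's substitute in our $f$ and $t_0$. First, we need to find the derivative:

$$f'(t) = \frac{d}{dt}\left(t^{-2} - x\right) = -2t^{-3}$$

Now, proceed:

$$t_1 = t_0 - \frac{t_0^{-2} - x}{-2t_0^{-3}} = t_0 + \frac{t_0 - x t_0^3}{2} = t_0\left(\frac{3}{2} - \frac{x}{2} t_0^2\right)$$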
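As a sanity check on the algebra, this sketch computes one Newton step two ways: the generic $t - f(t)/f'(t)$ with $f(t) = t^{-2} - x$, and the simplified form. Both function names are hypothetical, and it uses `double` so the comparison isn't muddied by single-precision rounding.

```c
/* Generic Newton step t - f(t)/f'(t) for f(t) = 1/t^2 - x. */
double newton_generic(double t, double x)
{
    double f = 1.0 / (t * t) - x;         /* f(t)  = t^-2 - x */
    double fprime = -2.0 / (t * t * t);   /* f'(t) = -2 t^-3  */
    return t - f / fprime;
}

/* Simplified form t * (3/2 - (x/2) t^2), as in the Quake code. */
double newton_simplified(double t, double x)
{
    return t * (1.5 - 0.5 * x * t * t);
}
```

For $t_0 = 0.9$ and $x = 1$, both forms give $0.9855$, which is closer to the true root $1$ than the starting guess.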
Let's look at that line of code again:
```c
float Q_rsqrt( float number )
{
    ...
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    ...
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
    ...
}
```
Earlier in the code, the line `x2 = number * 0.5F;` computes $x_2 = \frac{x}{2}$. This is just so that $\frac{x}{2}$ doesn't have to be recalculated on each iteration (even though the second iteration has since been commented out). Note that we multiply by 0.5 instead of dividing by 2 because multiplication is faster than division. Also, we can't use the bit-shifting trick here since $x$ is a floating-point number, not an integer.
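The Newton iteration line computes

$$y \leftarrow y \cdot \left(1.5 - x_2 \cdot y \cdot y\right)$$

which is equivalent to

$$t_1 = t_0\left(\frac{3}{2} - \frac{x}{2} t_0^2\right)$$

which is exactly the expression for Newton's method we found earlier.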
This line of code can be repeated to get better and better approximations. However, it appears the authors of Quake III decided that one iteration was accurate enough, since the second one is commented out.
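To see how much each iteration buys, here is a portable sketch of the whole function with the number of Newton steps as a parameter. It is a rewrite under stated assumptions, not the original Quake III code: it assumes a 32-bit IEEE 754 `float`, replaces `long` with `uint32_t`, and replaces the pointer casts with `memcpy`. The name `rsqrt_fast` and the `iterations` parameter are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Portable variant of Q_rsqrt with a configurable number of Newton steps.
   Assumes number is positive and normal, and float is 32-bit IEEE 754. */
float rsqrt_fast(float number, int iterations)
{
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);       /* i = I(x), without undefined behavior */
    i = 0x5F3759DFu - (i >> 1);     /* magic-number initial guess */
    memcpy(&y, &i, sizeof y);       /* y = I^{-1}(i) */

    for (int k = 0; k < iterations; k++)
        y = y * (threehalfs - (x2 * y * y));  /* one Newton step */
    return y;
}
```

For $x = 1$, zero iterations give about 0.966 and one iteration gives about 0.998. Because Newton's method roughly squares the relative error each step, a second iteration lands within a few millionths, which suggests why one step was judged good enough for lighting.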
The End
```c
float Q_rsqrt( float number )
{
    ...
    return y;
}
```
I'll end with a quote from a relevant xkcd:
Some engineer out there has solved P=NP and it's locked up in an electric eggbeater calibration routine. For every 0x5f375a86 we learn about, there are thousands we never see.
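(It looks like the constant Randall mentioned, 0x5f375a86, is not the Quake III constant 0x5f3759df; in terms of our derivation, it corresponds to a slightly different value of $\sigma$.)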