April 2014
Mo Tu We Th Fr Sa Su
<< >>


Fast Inverse Square Root

Posted on Nov 22 2008

The invSqrt method is defined as the inverse of the square root of a value :

function invSqrt( v : Float ) {
    return 1.0 / Math.sqrt(v);

It's used in a lot of different graphics and sound applications. For instance, if you want to calculate the unit-vector between two points, you can do the following :

function distanceVector( pa : Point, pb : Point, v : Point ) {
    var dx = pb.x - pa.x;
    var dy = pb.y - pa.y;
    var dist = Math.sqrt(dx*dx + dy*dy);
    v.x = dx / dist;
    v.y = dy / dist;

In that case, we are doing two float divides, which are expensive operations, so we can optimize the function by using two multiply and a single division :

function distanceVector( pa : Point, pb : Point, v : Point ) {
    var dx = pb.x - pa.x;
    var dy = pb.y - pa.y;
    var idist = invSqrt(dx*dx + dy*dy);
    v.x = dx * idist;
    v.y = dy * idist;

Of course, calculating the inverse square root is quite an expensive operation as well. See for instance the following benchmarks results for variable Number operations on Flash Player 10 :

+        1.3
*        1.5
/        10.9
sqrt     76.7
invSqrt  92.2

Quake Fast Inverse Square Root

A lot of research has been done to optimize the invSqrt calculus. One of the most famous was discovered in Quake source code and is commonly refered as "fast inverse square root". It's written in C language as the following function :

float InvSqrt( float x ) {
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;

I'll not enter into technical details on how and why this is working, but you will notice two operations that are converting float-to-int and int-to-float using pointer calculus. This is used to get the internal representation of the float-bits as an integer, and vice-versa.

In Flash, you can do that by using a ByteArray :

  • write a Number by using the writeFloat() method
  • reset the position to 0
  • use readInt() to read the float bits

Sadly, calling these functions is very slow, so using a ByteArray to implement invSqrt is actually slower than using 1.0 / Math.sqrt...

Haxe Optimizations

When you need to do a lot of Math calculus in a loop, Haxe can really be helpful since it allows you to store the Math class into a local variable outside of the loop :

var m = Math;
for( p in array )
     p.invDist = 1.0 / m.sqrt(p.x*p.x + p.y*p.y);

By doing this, you are caching the Math class into a register, which increases performances by about 60% already !

But that's not all. Thanks to the flash.Memory API and Flash Player 10 Alchemy Opcodes (see previous post about it) we can implement the Quake invSqrt :

public function prepare() {
    var b = new flash.utils.ByteArray();
    b.length = 1024;;

public function invSqrt( x : Float ) : Float {
    var half = 0.5 * x;
    var i = flash.Memory.getI32(0);
    i = 0x5f3759df - (i>>1);
    x = flash.Memory.getFloat(0);
    x = x * (1.5 - half*x*x);
    return x;

Here are the benchmarks results :

  • classic invSqrt : 92.21
  • optimized invSqrt (Math in a local variable) : 56.71 (+62%)
  • Memory invSqrt (using Quake algorithm) : 11.72 (+686% , or around 8 times faster !)

You'll notice that by using this, the inverse-square-root takes around the same time as as Float division ! Again very good news for haxe3D, physaXe and all other Haxe Flash applications that will use this trick !

  • Ian Liu
    Nov 22, 2008 at 16:42

    Wow, nice trick. I will try to understand the math behind it.

  • laurent
    Nov 22, 2008 at 20:00

    love this trick. Magic.
    here's the magic explaination:

    ..better do n = d + d than n = d * 2, wow! :)
    how about substraction timing ??

  • Nov 22, 2008 at 20:18

    The paper linked from there,, suggests that there are better "magic" numbers; the one that author has found was 0x5f375a86

  • Ian Liu
    Nov 23, 2008 at 03:00

    Ha! Thanks for the links.
    Why they don't teach us these things on the college? =P

  • Tim Knip
    Nov 24, 2008 at 00:44

    Wow, that's cool.... I'm one of the core-devs of Papervision3D, so alas can't use it... Couldn't resist checking how Alchemy itself performs on it. The swc is here:

    As to be expected its very slow... So slow in fact that its unusable (like a 1000 times slower then 1/Math.sqrt()).

    Could you maybe review invsqrt.c to see if something's wrong with the code itself? (apart from some weird double-to-float casts it seems ok to me).

    Keep up the good work, its wild!

  • Tim Knip
    Nov 24, 2008 at 00:47

    Oops, that was a bad link :

    :-) Think its in the category 'how *not* use Alchemy.

  • Nov 24, 2008 at 04:33

    [Sorry, double post, my brain is missing]

    Kind of reminds me of the optimizations for Sin/Cos that can be seen at and only seem easy to implement in Haxe (because of it's inline functions). I'm so close to biting the bullet now and making the jump ... very soon.

    Keep up the genius work!

  • Tim
    Nov 24, 2008 at 07:11

    "I'm one of the core-devs of Papervision3D, so alas can't use it"

    Ahum. Statement withdrawn.
    Why don't I use it?

    Sorry for spamming your comments (got carried away by my old time fav code snip).

  • Nov 24, 2008 at 19:10

    Very impressive stuff. Without really understanding the quake algorithm, I'll just say I'm thrilled by the benchmarks.

  • Mike
    Nov 27, 2008 at 06:28

    You can also check out the following for more optimisation fun (specifically the "faster math functions 2003"):

  • Dec 04, 2008 at 23:55

    Very impressive. Makes me eager to try and port this on Haxe and compare performance gap :)

    Good luck with your experiments.

  • Jan 26, 2009 at 17:40

    Any plans to incorporate this into the haxe std lib?

  • Nihil
    Jun 14, 2009 at 16:17

    1 / invsqrt(x) using this quake algorithm faster than Math.sqrt()?

  • Gustavs
    Jun 16, 2009 at 10:53

    Nihil: yes, see the measurements at the top of the post.

  • Dec 19, 2009 at 12:57

    Those Quake guys are my heroes.

  • Dec 23, 2009 at 04:39

    Hah, now 0x5f375a86 got a mention here :

    I learn so much crap from xkcd. Does he know everything or what?

  • Volodymyr
    Feb 14, 2011 at 18:46

    That's cool !

    I knew about (i>>1) for i *= 2 operation. But what about float points? Does somebody know format of Flash floating point variables?

    Where is mantissa? And where is the power?

Name : Email : Website : Message :