# Fast Inverse Square Root

The `invSqrt`

method is defined as the inverse of the square root of a value :

function invSqrt( v : Float ) { return 1.0 / Math.sqrt(v); }

It's used in a lot of different graphics and sound applications. For instance, if you want to calculate the unit-vector between two points, you can do the following :

function distanceVector( pa : Point, pb : Point, v : Point ) { var dx = pb.x - pa.x; var dy = pb.y - pa.y; var dist = Math.sqrt(dx*dx + dy*dy); v.x = dx / dist; v.y = dy / dist; }

In that case, we are doing two float divides, which are expensive operations, so we can optimize the function by using two *multiply* and a single division :

function distanceVector( pa : Point, pb : Point, v : Point ) { var dx = pb.x - pa.x; var dy = pb.y - pa.y; var idist = invSqrt(dx*dx + dy*dy); v.x = dx * idist; v.y = dy * idist; }

Of course, calculating the inverse square root is quite an expensive operation as well. See for instance the following benchmarks results for variable Number operations on Flash Player 10 :

+ 1.3 * 1.5 / 10.9 sqrt 76.7 invSqrt 92.2

# Quake Fast Inverse Square Root

A lot of research has been done to optimize the `invSqrt`

calculus. One of the most famous was discovered in Quake source code and is commonly refered as "fast inverse square root". It's written in C language as the following function :

float InvSqrt( float x ) { float xhalf = 0.5f*x; int i = *(int*)&x; i = 0x5f3759df - (i>>1); x = *(float*)&i; x = x*(1.5f - xhalf*x*x); return x; }

I'll not enter into technical details on how and why this is working, but you will notice two operations that are converting float-to-int and int-to-float using pointer calculus. This is used to get the internal representation of the float-bits as an integer, and vice-versa.

In Flash, you can do that by using a `ByteArray`

:

- write a Number by using the
`writeFloat()`

method - reset the
`position`

to 0 - use
`readInt()`

to read the float bits

Sadly, calling these functions is very slow, so using a ByteArray to implement `invSqrt`

is actually slower than using `1.0 / Math.sqrt`

...

# Haxe Optimizations

When you need to do a lot of `Math`

calculus in a loop, Haxe can really be helpful since it allows you to store the `Math`

class into a local variable outside of the loop :

var m = Math; for( p in array ) p.invDist = 1.0 / m.sqrt(p.x*p.x + p.y*p.y);

By doing this, you are *caching* the `Math`

class into a register, which increases performances by about 60% already !

But that's not all. Thanks to the `flash.Memory`

API and Flash Player 10 Alchemy Opcodes (see previous post about it) we can implement the Quake `invSqrt`

:

public function prepare() { var b = new flash.utils.ByteArray(); b.length = 1024; flash.Memory.select(b); } public function invSqrt( x : Float ) : Float { var half = 0.5 * x; flash.Memory.setFloat(0,x); var i = flash.Memory.getI32(0); i = 0x5f3759df - (i>>1); flash.Memory.setI32(0,i); x = flash.Memory.getFloat(0); x = x * (1.5 - half*x*x); return x; }

Here are the benchmarks results :

- classic
`invSqrt`

: 92.21 - optimized
`invSqrt`

(Math in a local variable) : 56.71 (+62%) - Memory
`invSqrt`

(using Quake algorithm) : 11.72 (+686% , or around 8 times faster !)

You'll notice that by using this, the inverse-square-root takes around the same time as as Float division ! Again very good news for haxe3D, physaXe and all other Haxe Flash applications that will use this trick !

Wow, nice trick. I will try to understand the math behind it.

love this trick. Magic.

here's the magic explaination:

http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/

..better do n = d + d than n = d * 2, wow! :)

how about substraction timing ??

The paper linked from there, http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf, suggests that there are better "magic" numbers; the one that author has found was 0x5f375a86

Ha! Thanks for the links.

Why they don't teach us these things on the college? =P

Wow, that's cool.... I'm one of the core-devs of Papervision3D, so alas can't use it... Couldn't resist checking how Alchemy itself performs on it. The swc is here: http://www.assembla.com/spaces/floorplanner-alchemy/documents

As to be expected its very slow... So slow in fact that its unusable (like a 1000 times slower then 1/Math.sqrt()).

Could you maybe review invsqrt.c to see if something's wrong with the code itself? (apart from some weird double-to-float casts it seems ok to me).

Keep up the good work, its wild!

Oops, that was a bad link :

http://www.assembla.com/spaces/floorplanner-alchemy/documents/dZ88ruUByr3Anoab7jnrAJ/download/invsqrt.c

:-) Think its in the category 'how *not* use Alchemy.

[Sorry, double post, my brain is missing]

Kind of reminds me of the optimizations for Sin/Cos that can be seen at http://lab.polygonal.de/2007/07/18/fast-and-accurate-sinecosine-approximation/ and only seem easy to implement in Haxe (because of it's inline functions). I'm so close to biting the bullet now and making the jump ... very soon.

Keep up the genius work!

"I'm one of the core-devs of Papervision3D, so alas can't use it"

Ahum. Statement withdrawn.

Why don't I use it?

Sorry for spamming your comments (got carried away by my old time fav code snip).

Very impressive stuff. Without really understanding the quake algorithm, I'll just say I'm thrilled by the benchmarks.

You can also check out the following for more optimisation fun (specifically the "faster math functions 2003"):

http://www.research.scea.com/research/research.html

Very impressive. Makes me eager to try and port this http://wonderwhy-er.deviantart.com/art/Collision-Sparks-99408656 on Haxe and compare performance gap :)

Good luck with your experiments.

Any plans to incorporate this into the haxe std lib?

1 / invsqrt(x) using this quake algorithm faster than Math.sqrt()?

Nihil: yes, see the measurements at the top of the post.

Those Quake guys are my heroes.

Hah, now 0x5f375a86 got a mention here : http://xkcd.com/664/.

I learn so much crap from xkcd. Does he know everything or what?

That's cool !

I knew about (i>>1) for i *= 2 operation. But what about float points? Does somebody know format of Flash floating point variables?

Where is mantissa? And where is the power?