Posted on Oct 22 2011

A few months ago I spent some time trying to run Perlin noise entirely on the GPU, using a hardware shader.

Of course, thanks to HxSL I didn't have to write it in assembler.

Before getting into code details, let's have a small demo :

Here's the final shader code in HxSL :

```
var input : {
	pos : Float3,
};

var perlinPos : Float3;

function vertex(delta : Float3, scale : Float) {
	out = pos.xyzw;
	// map clip space [-1,1] to [0,1] (y flipped), then scroll and scale
	perlinPos = ([(pos.x + 1) * 0.5, 1 - (pos.y + 1) * 0.5, 0] + delta) * scale;
}

// fetch a gradient from the 256x1 texture, decode it from [0,1] to [-1,1]
// and project the offset vector pp onto it
function gradperm( g : Texture, v : Float, pp : Float3 ) {
	return (g.get(v,single,nearest,wrap).xyz * 2 - 1).dot(pp);
}

function lerp( x : Float, y : Float, v : Float ) {
	return x * (1 - v) + y * v;
}

// quintic smoothing curve 6t^5 - 15t^4 + 10t^3
function fade( t : Float3 ) : Float3 {
	return t * t * t * (t * (t * 6 - 15) + 10);
}

function gradient( permut : Texture, g : Texture, pos : Float3 ) {
	var p = pos.frc();   // fractional part : position inside the cell
	var i = pos - p;     // integer part : cell coordinates
	var f = fade(p);     // smoothed interpolation weights
	var one = 1 / 256;
	i *= one;            // cell coordinates as texture coordinates
	var a = permut.get(i.xy, nearest, wrap) + i.z;
	return lerp(
		lerp(
			lerp( gradperm(g, a.x, p), gradperm(g, a.z, p + [ -1, 0, 0] ), f.x),
			lerp( gradperm(g, a.y, p + [0, -1, 0] ), gradperm(g, a.w, p + [ -1, -1, 0] ), f.x),
			f.y
		),
		lerp(
			lerp( gradperm(g, a.x + one, p + [0, 0, -1] ), gradperm(g, a.z + one, p + [ -1, 0, -1] ), f.x),
			lerp( gradperm(g, a.y + one, p + [0, -1, -1] ), gradperm(g, a.w + one, p + [ -1, -1, -1] ), f.x),
			f.y
		),
		f.z
	);
}

function fragment( permut : Texture, g : Texture) {
	var pos = perlinPos;
	var tot = 0;
	var per = 1.0;
	// two octaves : halve the amplitude and double the frequency each time
	for( k in 0...2 ) {
		tot += gradient(permut, g, pos) * per;
		per *= 0.5;
		pos *= 2;
	}
	// remap the noise to a grey level
	var n = (tot + 1) * 0.5;
	out = [n, n, n, 1];
}
```

Sadly, because Flash 11 limits shaders to 200 opcodes, we can only compute two octaves (the `for` loop is unrolled, so every octave adds to the opcode count).
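The loop in `fragment` is a plain fractal sum: every octave halves the amplitude and doubles the frequency. Here is a minimal CPU-side sketch in plain Haxe, assuming a hypothetical one-dimensional `noise` callback in place of the shader's `gradient` function:

```haxe
class OctaveSketch {
	// Sums `octaves` layers of a noise function, halving the amplitude and
	// doubling the frequency each time, like the loop in `fragment`.
	// `noise` is a hypothetical stand-in for the shader's `gradient` call.
	public static function fractalSum( noise : Float -> Float, x : Float, octaves : Int ) : Float {
		var tot = 0.0;
		var per = 1.0;
		for( k in 0...octaves ) {
			tot += noise(x) * per;
			per *= 0.5;
			x *= 2;
		}
		return tot;
	}

	static function main() {
		// A constant "noise" of 1 exposes the amplitude weights : 1 + 0.5
		var tot = fractalSum(function(x) return 1.0, 0.3, 2);
		trace(tot);             // 1.5
		trace((tot + 1) * 0.5); // 1.25, same remapping as the shader
	}
}
```

With two octaves the weights sum to 1.5, so `(tot + 1) * 0.5` only stays inside [0, 1] as long as the summed noise stays inside [-1, 1].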

You'll notice that two textures are used for this shader : the `permut` texture and the `g` texture.

They correspond to the two main operations of Perlin noise. For each point (x,y), the algorithm first takes the integer part (ix,iy), then uses a lookup in the `permut` texture to pseudo-randomize it.

This way the integer coordinates (ix,iy) give us a pseudo-random index into the gradient table, which selects the gradient at the top-left corner of the "box". All that is left is to fetch the other three gradient values at the remaining box corners, which are at (ix+1,iy), (ix,iy+1) and (ix+1,iy+1).

The algorithm then interpolates between these 4 corner values : at the top-left corner the result should equal the top-left gradient value, and so on... The interpolation is not linear but goes through a `fade` function, which smooths the transitions between gradients.
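To see what `fade` changes compared to a plain linear blend, here is a small sketch in plain Haxe. It reuses the `fade` and `lerp` formulas from the shader on single `Float` values; the class name and the sample point are only illustrative.

```haxe
class FadeSketch {
	// Same quintic as the shader's `fade`, on a single Float.
	public static function fade( t : Float ) : Float {
		return t * t * t * (t * (t * 6 - 15) + 10);
	}

	// Same linear blend as the shader's `lerp`.
	public static function lerp( x : Float, y : Float, v : Float ) : Float {
		return x * (1 - v) + y * v;
	}

	static function main() {
		// Blend two corner values at 10% of the way across the cell.
		var t = 0.1;
		trace(lerp(-1, 1, t));       // linear weight
		trace(lerp(-1, 1, fade(t))); // faded weight : stays much closer to the first corner
	}
}
```

`fade` keeps the end points fixed (0 maps to 0, 1 maps to 1) but flattens the curve near them, which is why neighbouring cells join without visible creases.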

Another issue with Flash 11 is the lack of Float textures : gradient values usually lie in [-1, 1], and all three values need to represent a unit-sized 3D vector. Because we have to encode the gradients into RGB texture color space, which covers [0, 1] with only 1/256 precision, each gradient lookup needs two additional operations, which is (g * 2 - 1). Normalizing the gradient after that is not really needed since the precision is good enough in most cases.
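The round trip through an 8-bit channel can be sketched in plain Haxe as follows. `encode` uses the same `(g + 1) * 127.5` packing as the texture-building code; `decode` assumes the sampler returns `byte / 255`, as is usual for normalized 8-bit textures, and then applies the shader's `* 2 - 1`.

```haxe
class GradientEncoding {
	// Packs a gradient component from [-1, 1] into a byte.
	public static function encode( g : Float ) : Int {
		return Std.int((g + 1) * 127.5);
	}

	// What the shader sees : the byte read back as byte / 255 (assumed),
	// then remapped with `* 2 - 1`.
	public static function decode( b : Int ) : Float {
		return (b / 255) * 2 - 1;
	}

	static function main() {
		for( g in [-1.0, 0.0, 1.0] )
			trace(g + " -> " + encode(g) + " -> " + decode(encode(g)));
	}
}
```

The end points -1 and 1 survive exactly, while 0 is stored as 127 and comes back as roughly -0.004, which gives an idea of the error the shader tolerates.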

Another important part of the code is the one that builds both textures :

```
static var PTBL = flash.Vector.ofArray([ 151, 160, 137, 91, 90, 15,
131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,
190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,
102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,
223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,
129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,
138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180
]);

// 16 gradient directions, 3 components each
static var GRAD = [
	1,1,0,
	-1,1,0,
	1,-1,0,
	-1,-1,0,
	1,0,1,
	-1,0,1,
	1,0,-1,
	-1,0,-1,
	0,1,1,
	0,-1,1,
	0,1,-1,
	0,-1,-1,
	1,1,0,
	0,-1,1,
	-1,1,0,
	0,-1,-1,
];

// wrapped lookup in the permutation table
inline function perm( x : Int ) {
	return PTBL[x & 0xFF];
}

// 256x256 texture : each texel stores the four hashed corner indices for (x,y)
function initPermut() {
	var bytes = new flash.utils.ByteArray();
	bytes.length = 256 * 256 * 4;
	flash.Memory.select(bytes);
	var out = 0;
	for( y in 0...256 )
		for( x in 0...256 ) {
			var a = perm(x) + y;
			var aa = perm(a);
			var ab = perm(a + 1);
			var b = perm(x + 1) + y;
			var ba = perm(b);
			var bb = perm(b + 1);
			flash.Memory.setByte(out++, ba); // B
			flash.Memory.setByte(out++, ab); // G
			flash.Memory.setByte(out++, aa); // R
			flash.Memory.setByte(out++, bb); // A
		}
	// c is the Context3D
	var t = c.createTexture(256, 256, TextureFormat.BGRA, false);
	t.uploadFromByteArray(bytes, 0);
	return t;
}

// 256x1 texture : the gradient for each index, encoded from [-1,1] to [0,255]
// (the function name is assumed)
function initGradient() {
	var bytes = new flash.utils.ByteArray();
	for( x in 0...256 ) {
		var p = (perm(x) & 15) * 3;
		var r = GRAD[p];
		var g = GRAD[p + 1];
		var b = GRAD[p + 2];
		bytes.writeByte(Std.int((b + 1) * 127.5));
		bytes.writeByte(Std.int((g + 1) * 127.5));
		bytes.writeByte(Std.int((r + 1) * 127.5));
		bytes.writeByte(255); // A
	}
	var t = c.createTexture(256, 1, TextureFormat.BGRA, false);
	t.uploadFromByteArray(bytes, 0);
	return t;
}
```

PTBL is the classic Perlin noise permutation table; everything else just precomputes some Perlin noise operations so that the noise can be evaluated with fewer texture lookups.
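As a rough illustration of that precomputation, the sketch below (plain Haxe) builds the four values stored in one texel of the permutation texture. To stay short it uses a toy permutation instead of PTBL, which is an assumption made only for this example; the hashing steps mirror `initPermut`.

```haxe
class PermutSketch {
	// Toy stand-in for PTBL : an odd multiplier gives a bijection on 0...255.
	// (An assumption for brevity; the article uses the classic Perlin table.)
	public static function perm( x : Int ) : Int {
		return ((x & 0xFF) * 167 + 13) & 0xFF;
	}

	// The four hashed corner indices stored in one texel, in the order aa, ab, ba, bb.
	public static function texel( x : Int, y : Int ) : Array<Int> {
		var a = perm(x) + y;
		var b = perm(x + 1) + y;
		return [perm(a), perm(a + 1), perm(b), perm(b + 1)];
	}

	static function main() {
		// One fetch yields the hashes for (x,y), (x,y+1), (x+1,y) and (x+1,y+1).
		trace(texel(3, 7));
	}
}
```

On the CPU these are six table reads per sample; baking them into a texture turns them into a single fetch in the shader, which then only has to add the z offset and read the gradient texture.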

Now all you have to do is build a quad covering part of the screen and draw it with a `delta` corresponding to the Perlin scroll and a `scale` value.
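To get a feel for how `delta` and `scale` act, here is the x/y part of the `vertex` mapping reproduced in plain Haxe. The class name and the numeric values are hypothetical, and the z component is left out.

```haxe
class PerlinPosSketch {
	// Mirrors the `vertex` function : clip-space x,y in [-1, 1] become [0, 1]
	// with y flipped, then `delta` is added and the result is scaled.
	public static function perlinPos( x : Float, y : Float, dx : Float, dy : Float, scale : Float ) : Array<Float> {
		return [((x + 1) * 0.5 + dx) * scale, (1 - (y + 1) * 0.5 + dy) * scale];
	}

	static function main() {
		var scale = 8.0; // hypothetical value : the quad spans 8 noise cells
		trace(perlinPos(-1,  1, 0, 0, scale));    // top-left corner     -> [0,0]
		trace(perlinPos( 1, -1, 0, 0, scale));    // bottom-right corner -> [8,8]
		trace(perlinPos(-1,  1, 0.25, 0, scale)); // scrolled            -> [2,0]
	}
}
```

Since `delta` is added before the multiplication, it is expressed in fractions of the quad : a `delta.x` of 0.25 scrolls the noise by a quarter of the visible area whatever the `scale`.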

• Jon Trausti
Nov 26, 2011 at 20:54

Awesome job! It would be really interesting if it was possible to get more than 6 octaves on the GPU. I'm creating a planet renderer and making the noise on the CPU is very intense, so it would be really neat to have the normal map created on the GPU for nice detail.

Do you know if there are any plans to increase the opcode limit?

Thanks!

• Nov 27, 2011 at 23:18

@Jon : reaching 3 octaves would already require some optimization. I guess 6 is out of the question given the current limitations. This would be possible with multiple passes, but there's no Float texture output in Flash either... You can generate several textures and compose them together at runtime, but you lose a lot of precision as well.

• Jon Trausti
Nov 28, 2011 at 11:54

Ahh I see. I really hope that they change the limitation. I've been wanting to compose different octaves at runtime but I wasn't aware of this precision loss, so that's a good warning that I shouldn't try it :)

You can try my latest planet rendering at http://icy.ice.is/temp/newplanet but it's still in early development and needs a lot of tweaking, especially the LOD. You can enter the ship by pressing E and control it with the mouse (you need to click and hold on the screen to rotate/pitch).

Hopefully one day I can generate 6+ octave Perlin noise on the GPU to make a normal map for the terrain.

Thank you.

• Nov 29, 2011 at 21:05

@Jon : looks great ;) the LOD transitions indeed need some tweaking but that's a good start. Which algorithm are you using to render the atmosphere btw ?

• Jon Trausti
Nov 30, 2011 at 12:14

Thanks. Yes, I'll have to increase the LOD where the slope difference is above some x threshold.

I'm using an atmosphere shader that I found on: