Tuesday, June 28, 2011

Screen-Space Directional Occlusion

I now have screen-space directional occlusion working well, here's a pic :) I found the model in the standard Max scenes, as I wanted better geometry to test the lighting with than my crappy programmer art.
The scene has just one diffuse shadow-casting spotlight; the SSDO provides all the rest of the colour, using the environment cube. Note the orange shining on the sides of the crazy model, with occlusion occurring in its upper fins. And the same model, with a different environment.
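For readers new to the technique: directional occlusion extends plain ambient occlusion by letting each unoccluded sample direction contribute the environment light coming from that direction, instead of just darkening a constant ambient term. That is how the environment's colour ends up tinting the model. The sketch below is a simplified CPU-side illustration in Python, not the shader used here. `env` stands in for the environment cube lookup, and `is_occluded` is a hypothetical callback standing in for the depth-buffer test that real SSDO performs on projected sample points.

```python
import math
import random
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def random_hemisphere_dir(normal: Vec3, rng: random.Random) -> Vec3:
    """Uniformly distributed unit direction in the hemisphere around `normal`."""
    while True:
        v = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        d2 = dot(v, v)
        if 1e-6 < d2 <= 1.0:  # rejection-sample inside the unit ball
            inv = 1.0 / math.sqrt(d2)
            v = (v[0] * inv, v[1] * inv, v[2] * inv)
            # Flip directions below the surface into the upper hemisphere.
            return v if dot(v, normal) >= 0.0 else (-v[0], -v[1], -v[2])


def directional_occlusion(
    normal: Vec3,                         # assumed unit length
    env: Callable[[Vec3], Vec3],          # stands in for the environment cube lookup
    is_occluded: Callable[[Vec3], bool],  # hypothetical; real SSDO tests the depth buffer
    samples: int = 64,
    seed: int = 0,
) -> Vec3:
    """Diffuse light (albedo 1) gathered from the environment through unoccluded directions."""
    rng = random.Random(seed)
    total = [0.0, 0.0, 0.0]
    for _ in range(samples):
        d = random_hemisphere_dir(normal, rng)
        if is_occluded(d):
            continue  # blocked directions contribute nothing
        w = dot(d, normal)  # Lambert cosine weight
        c = env(d)
        for i in range(3):
            total[i] += c[i] * w
    # Uniform hemisphere pdf is 1/(2*pi); dividing the irradiance by pi leaves 2/N.
    scale = 2.0 / samples
    return (total[0] * scale, total[1] * scale, total[2] * scale)
```

With a constant white environment and nothing occluded the result comes out close to (1, 1, 1); occluding half the hemisphere roughly halves it, and a coloured environment tints the result according to which directions are open.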

This is going to look great when weather changes roll in, or during sunsets around planets, or going into nebulae out in space..

Wednesday, June 8, 2011

Screen-Space Ambient

Currently, I'm still doing a lot of the rendering with multiple forward-rendering geometry passes, where screen-space techniques could be used instead. I've done it this way so far because performance has been fine for what I'm trying, and being able to easily turn off entire passes makes debugging much easier.

However, since lighting was added, with arbitrary numbers of lights (5 shadow-mapped lights are now in use), the engine has slowed down greatly. Time to revisit the render channels.

Ambient Lighting is an obvious task here. Currently, I draw the depth/normals map, create the ambient cube map, calculate SSDO, and then - using forward-pass geometry - draw the ambient lighting into the scene, looking up SSDO and the ambient cube map.

However, the screen-space data is all there ready to be used: screen-space normals and SSDO. Additionally, the screen-space fogging that fades the scene out to the ambient cube map is almost identical to what I want to do. So hell, let's do it!
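The screen-space version boils down to one full-screen pass: for each pixel, read the normal, look up the ambient cube map in that direction, and scale by the SSDO term. Here is a minimal Python sketch of that per-pixel logic, assuming flat per-pixel lists for the buffers, a scalar occlusion value per pixel, and a tiny ambient cube with one colour per face, so the lookup is just picking the face along the normal's major axis (a real cube map would also filter within the face). The function names are hypothetical, not those used in ssambient.ps3.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def cube_face(n: Vec3) -> str:
    """Cube-map face a direction points at: the axis with the largest magnitude."""
    x, y, z = n
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x >= 0.0 else "-x"
    if ay >= az:
        return "+y" if y >= 0.0 else "-y"
    return "+z" if z >= 0.0 else "-z"


def screen_space_ambient(
    normals: List[Vec3],            # per-pixel normals from the depth/normals pass
    occlusion: List[float],         # per-pixel SSDO term, assumed scalar here
    albedo: List[Vec3],             # per-pixel surface colour
    ambient_cube: Dict[str, Vec3],  # assumed: one colour per cube face
) -> List[Vec3]:
    """One full-screen pass: ambient = albedo * occlusion * ambient_cube(normal)."""
    out: List[Vec3] = []
    for n, occ, (r, g, b) in zip(normals, occlusion, albedo):
        ar, ag, ab = ambient_cube[cube_face(n)]
        out.append((r * ar * occ, g * ag * occ, b * ab * occ))
    return out
```

Because every input is already sitting in a screen-sized buffer, this touches each pixel once regardless of how many objects are in the scene, which is where the saving over a forward geometry pass comes from.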

Adding Screen-space Ambient

1. Add a new RenderChannel and generateSSAmbient() to CameraScene.py
2. Create data/screen_space_ambient.xml - copied from data/fog.xml as the effect is similar
3. Create shaders2/post/ssambient.vs3 and .ps3

NOT SURE WHY: I can't blend in the SSDO results in texture stage 3. I just can't seem to read the texture map there.

4. Remove old ambient lighting path

old way: 3098 / 34134 draw commands, 99.1 msec
new way: 2564 / 31921 draw commands, 89.9 msec

Apart from the issue mentioned above, the new screen-space ambient lighting looks exactly like the old forward-rendered ambient lighting, except that the number of draw calls drops by about 17% (3098 to 2564) and the render time drops by about 9% (99.1 to 89.9 msec). And there are many more rendering optimisations of this kind just waiting to be done..
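A quick check of the percentage changes, computed from the before/after numbers above:

```python
def percent_drop(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


print(f"{percent_drop(3098, 2564):.1f}%")    # 17.2%  (first draw-command count)
print(f"{percent_drop(34134, 31921):.1f}%")  # 6.5%   (second draw-command count)
print(f"{percent_drop(99.1, 89.9):.1f}%")    # 9.3%   (msec)
```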

Optimising the procedural scene generator

First of all, it's crazy doing procedural scene generation and the scene graph in Python. When you're dealing with 1000s of tiles, C++ is a far better match.

However, in keeping with the spirit of things, let's try and optimise the Python procedural scene first, as any high-level optimisations done are still likely to be beneficial after porting (and easier to prototype in Python!)

Step 1 in any optimisation is profiling, to see where the time is actually going. Here's a snippet from my results:

BEFORE - 47% in RenderChannelStrip.render

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3731   23.980    0.006   23.980    0.006 K:\Pegwars\current\Scripts\PyStingray\RenderChannelStrip.py:47(render)
    44150    2.884    0.000    2.884    0.000 d:\Pegwars\current\Scripts\Scene\Object.py:39(bind)
      287    2.862    0.010    3.243    0.011 d:\Pegwars\current\Scripts\Frustum\FarPlane.py:13(cull)
  2525213    2.594    0.000    2.594    0.000 {method 'append' of 'list' objects}
     2009    2.514    0.001    2.542    0.001 d:\Pegwars\current\Scripts\Scene\ObjectPool.py:27(updateObjectList)
     1724    2.240    0.001    3.550    0.002 {method 'sort' of 'list' objects}
1723/1436    2.215    0.001   47.822    0.033 d:\Pegwars\current\Scripts\PyStingray\UpdateChannelStrip.py:24(tick)
     1435    2.152    0.001    3.939    0.003 d:\Pegwars\current\Scripts\Frustum\Light.py:15(cull)
   351862    1.357    0.000    1.357    0.000 d:\Pegwars\current\Scripts\Scene\Object.py:71(tick)
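These listings are the standard output of Python's built-in cProfile and pstats modules, sorted by tottime (time spent in the function itself, excluding sub-calls; cumtime includes them). Here is a minimal sketch of producing such a report from code; `busy_frame` is a hypothetical stand-in for one frame of game work. From the command line, `python -m cProfile -s tottime main.py` gives a similar listing.

```python
import cProfile
import io
import pstats


def busy_frame(n: int = 20000) -> list:
    """Hypothetical stand-in for one frame of game work."""
    data = []
    for i in range(n):
        data.append(i * i)
    data.sort(reverse=True)
    return data


profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    busy_frame()
profiler.disable()

# Sort by tottime (self time) and show the top 10 entries, like the listings above.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by tottime surfaces functions that are expensive in themselves (like list 'append' here), while sorting by "cumulative" surfaces the call sites that drive them (like UpdateChannelStrip.tick).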

There are two obvious candidates here: UpdateChannelStrip.tick is taking up about twice as much cumulative time as rendering (47.8 s vs 24.0 s), and Python's list 'append' and 'sort' methods take a long time.

Now, Citypieces only need worldobject->tick() called once ever, when they are created, to set up their C++ reference frame. Options: try a onceOnly updateChannelStrip for these, OR call wo->tick() directly on creation. Any object with logic units should keep on ticking; any object without them can probably do without.
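A minimal sketch of the wants_tick idea: every object gets one tick() when it is added (e.g. to set up its reference frame), but only objects that ask for it stay in the per-frame update list. The class names echo the engine's, but these are hypothetical stand-ins, not the actual Pegwars code.

```python
class WorldObject:
    """Hypothetical stand-in for an engine world object."""

    def __init__(self, name: str, has_logic: bool = False) -> None:
        self.name = name
        self.wants_tick = has_logic  # only objects with logic units need per-frame ticks
        self.tick_count = 0

    def tick(self) -> None:
        # In the engine this would e.g. set up the C++ reference frame.
        self.tick_count += 1


class UpdateChannelStrip:
    """Hypothetical update strip that only keeps objects which want ticking."""

    def __init__(self) -> None:
        self.tickers: list = []

    def add(self, obj: WorldObject) -> None:
        obj.tick()  # every object gets exactly one tick on creation
        if obj.wants_tick:
            self.tickers.append(obj)

    def tick(self) -> None:
        for obj in self.tickers:
            obj.tick()
```

Static city pieces then cost nothing per frame: after adding a static piece and a logic-driven object and ticking the strip three times, the piece has been ticked once and the logic object four times.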

And the Python ObjectPool is currently using lists for 1000s of objects, and calling del() on objects, which means a linear search through the list each time! Change this to use set logic.
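A sketch of a set-based pool, assuming objects are hashable: removal becomes an average O(1) hash lookup instead of a linear list search, and the change in visibility between frames falls out of a symmetric difference. `ObjectPool` and `update_visible` here are hypothetical illustrations, not the real ObjectPool.py.

```python
from collections.abc import Hashable, Iterable
from typing import Set, Tuple


class ObjectPool:
    """Hypothetical set-based pool; objects must be hashable."""

    def __init__(self) -> None:
        self.objects: Set[Hashable] = set()
        self.visible: Set[Hashable] = set()

    def add(self, obj: Hashable) -> None:
        self.objects.add(obj)

    def remove(self, obj: Hashable) -> None:
        # Average O(1) hash lookup, versus list.remove()'s linear search.
        self.objects.discard(obj)

    def update_visible(
        self, now_visible: Iterable[Hashable]
    ) -> Tuple[Set[Hashable], Set[Hashable]]:
        """Return (entered, left) relative to the previous frame's visible set."""
        now = set(now_visible)
        changed = self.visible.symmetric_difference(now)  # in exactly one of the two
        entered = changed & now
        left = changed & self.visible
        self.visible = now
        return entered, left
```

The (entered, left) pair is the kind of data a visibility-diff callback needs, without ever scanning the full object list.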

AFTER - 62% in RenderChannelStrip.render

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4433   27.502    0.006   27.502    0.006 K:\Pegwars\current\Scripts\PyStingray\RenderChannelStrip.py:47(render)
      341    3.394    0.010    3.828    0.011 d:\Pegwars\current\Scripts\Frustum\FarPlane.py:13(cull)
     2387    2.940    0.001    2.964    0.001 d:\Pegwars\current\Scripts\Scene\ObjectPool.py:27(updateObjectList)
  2988137    2.930    0.000    2.930    0.000 {method 'append' of 'list' objects}
     1705    2.558    0.002    4.595    0.003 d:\Pegwars\current\Scripts\Frustum\Light.py:15(cull)
    43240    1.927    0.000    1.927    0.000 d:\Pegwars\current\Scripts\Scene\Object.py:42(bind)
      341    0.511    0.001    0.922    0.003 d:\Pegwars\current\Scripts\Frustum\View.py:11(cull)
        1    0.240    0.240   44.535   44.535 main.py:21(runPegwars)
     2387    0.205    0.000    2.440    0.001 d:\Pegwars\current\Scripts\Scene\ObjectPool.py:50(callVizDiffCallbacks)
     1050    0.201    0.000    0.201    0.000 {method 'symmetric_difference' of 'set' objects}
     2283    0.169    0.000    0.169    0.000 {method 'union' of 'set' objects}

Fantastic: render now takes up 62% of the time instead of only 47%, meaning less of the frame is spent on Python housekeeping. We aren't calling tick() on 1000s of static objects any more, and when this is ported to C++, the wants_tick facility gets ported too.

And the switch to set-based ObjectPools has paid off: 'symmetric_difference' of 'set' objects now takes a negligible amount of time, and we no longer need to erase from linear lists.

Let's go on..

The next standout is FarPlane.py (cull), which is done in Python. It puts objects from within the far plane into a Scene's object pool. This is a necessary function, but since we are moving the tile pieces around ourselves, we don't need it for this particular case, so for now I've bypassed it here. A gross hack.
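For reference, a far-plane cull of this kind typically projects each object's position onto the camera's forward axis and rejects it if its bounding sphere lies entirely beyond the far distance. A minimal sketch, assuming objects are (name, centre, radius) tuples and a unit-length forward vector; the real FarPlane.cull signature isn't shown here, so these names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Obj = Tuple[str, Vec3, float]  # hypothetical: (name, bounding-sphere centre, radius)


def far_plane_cull(
    objects: Sequence[Obj], cam_pos: Vec3, cam_forward: Vec3, far: float
) -> List[str]:
    """Names of objects whose bounding sphere is not entirely beyond the far plane."""
    fx, fy, fz = cam_forward  # assumed to be unit length
    kept: List[str] = []
    for name, (x, y, z), radius in objects:
        # Signed distance from the camera along the view direction.
        depth = (x - cam_pos[0]) * fx + (y - cam_pos[1]) * fy + (z - cam_pos[2]) * fz
        # Only the far plane is tested; near and side planes are left to other culls.
        if depth - radius <= far:
            kept.append(name)
    return kept
```

Even this simple test runs once per object per frame, which is why doing it in Python over thousands of tiles shows up in the profile.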

AFTER - 66% in RenderChannelStrip.render

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4524   30.175    0.007   30.175    0.007 K:\Pegwars\current\Scripts\PyStingray\RenderChannelStrip.py:47(render)
     2436    3.084    0.001    3.110    0.001 d:\Pegwars\current\Scripts\Scene\ObjectPool.py:27(updateObjectList)
    41420    2.667    0.000    2.667    0.000 d:\Pegwars\current\Scripts\Scene\Object.py:42(bind)
  2617790    2.667    0.000    2.667    0.000 {method 'append' of 'list' objects}
     1740    2.626    0.002    4.793    0.003 d:\Pegwars\current\Scripts\Frustum\Light.py:15(cull)
      349    0.568    0.002   30.889    0.089 d:\Pegwars\current\Scripts\Modules\DogFight.py:240(render)
      348    0.535    0.002    0.979    0.003 d:\Pegwars\current\Scripts\Frustum\View.py:11(cull)
        1    0.481    0.481   45.822   45.822 main.py:21(runPegwars)

So finally we've gone from rendering using 47% of the time to rendering using 66% of the time, all from easy optimisations. And the Python procedural scene now has no obvious glitches when moving around a 2k x 2k map, meaning I can delay porting to C++ a little while longer!