Tuesday, June 28, 2011

Screen-Space Directional Occlusion

Now that I have the screen-space directional occlusion working well, here's a pic :) I found the model in the standard Max scenes; I wanted some better geometry to test the lighting than my crappy programmer art.
The scene has just one diffuse shadow-casting spotlight, and the SSDO provides all the rest of the colour, using the environment cube. Note the orange shining on the sides of the crazy model, with occlusion occurring in its upper fins. And the same model, with a different environment.

This is going to look great when weather changes roll in, or during sunsets around planets, or going into nebulae out in space..

Wednesday, June 8, 2011

Screen-Space Ambient

Currently, I'm still doing a lot of the rendering with multiple forward-rendering geometry passes that could instead use screen-space techniques. I have done it this way so far because the performance for what I'm trying is fine, and it aids debugging greatly to be able to easily turn off entire passes.

However, since the lighting was added in, with arbitrary numbers of lights (5 shadow-mapped lights now being used), the engine has slowed down greatly. Time to revisit the render channels.

Ambient Lighting is an obvious task here. Currently, I draw the depth/normals map, create the ambient cube map, calculate SSDO, and then - using forward-pass geometry - draw the ambient lighting into the scene, looking up SSDO and the ambient cube map.

However the screen-space data is all there ready to be used - screen space normals and SSDO. Additionally, the screen-space fogging that fades the scene out to the ambient cube map is almost identical to what I want to do. So hell, let's do it!
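Per pixel, the idea boils down to something like the following. This is only a rough sketch of the maths in plain Python, not the actual shader: it assumes the ambient cube is six flat face colours and that the SSDO buffer holds a single visibility factor per pixel, and every name in it (cube_face, screen_space_ambient, ambient_cube) is made up for illustration.

```python
# Sketch of per-pixel screen-space ambient, in plain Python rather than a shader.
# Assumptions (not from the engine): the ambient cube is six flat face colours,
# and the SSDO buffer is a single visibility factor per pixel.

def cube_face(normal):
    """Pick the cube face that the normal points at most strongly."""
    x, y, z = normal
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x >= 0 else "-x"
    if ay >= az:
        return "+y" if y >= 0 else "-y"
    return "+z" if z >= 0 else "-z"


def screen_space_ambient(normals, ssdo, ambient_cube):
    """Return one ambient RGB colour per pixel."""
    result = []
    for normal, visibility in zip(normals, ssdo):
        r, g, b = ambient_cube[cube_face(normal)]
        result.append((r * visibility, g * visibility, b * visibility))
    return result


# Hypothetical test data: an orange +x face and a blue sky on +y.
ambient_cube = {
    "+x": (1.0, 0.5, 0.0), "-x": (0.0, 0.0, 0.0),
    "+y": (0.2, 0.4, 0.8), "-y": (0.0, 0.0, 0.0),
    "+z": (0.0, 0.0, 0.0), "-z": (0.0, 0.0, 0.0),
}
normals = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]  # one up-facing, one side-facing pixel
ssdo = [1.0, 0.5]                              # unoccluded, half occluded
pixels = screen_space_ambient(normals, ssdo, ambient_cube)
```

No geometry is touched here: everything comes from per-pixel buffers, which is why the forward pass can go.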

Adding Screen-space Ambient

1. Add a new RenderChannel and generateSSAmbient() to CameraScene.py
2. Create data/screen_space_ambient.xml - copied from data/fog.xml as the effect is similar
3. Create shaders2/post/ssambient.vs3 and .ps3

NOT SURE WHY : can't blend in the SSDO results in texture stage 3. Just can't seem to be able to read the texture map.

4. Remove old ambient lighting path

old way : 3098 / 34134 draw commands 99.1msec
new way : 2564 / 31921 draw commands 89.9msec

Apart from the issue mentioned above, the new screen-space ambient lighting looks exactly like the old forward-render ambient lighting, except the draw-command count drops by about 17% (3098 to 2564) and the render time drops by about 9% (99.1 to 89.9 msec). And there are many more instances of this kind of rendering optimisation just waiting to be done..
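As a quick sanity check on those figures, the relative drops can be computed straight from the raw old/new numbers. The helper name percent_drop is just for this example.

```python
def percent_drop(before, after):
    """Relative reduction from 'before' to 'after', as a percentage."""
    return (before - after) / before * 100.0


# Raw figures from the old way / new way measurements.
first_count_drop = percent_drop(3098, 2564)
second_count_drop = percent_drop(34134, 31921)
render_time_drop = percent_drop(99.1, 89.9)

print("first draw-command count: %.1f%%" % first_count_drop)
print("second draw-command count: %.1f%%" % second_count_drop)
print("render time: %.1f%%" % render_time_drop)
```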

Optimising the procedural scene generator

First of all, it's crazy doing procedural scene generation and the scene graph in Python. When you're dealing with 1000s of tiles, C++ is a far better match.

However, in keeping with the spirit of things, let's try and optimise the Python procedural scene first, as any high-level optimisations done are still likely to be beneficial after porting (and easier to prototype in Python!)

Step 1 in any optimisation is profiling, to see where we need to optimise. Here's a snippet from my results:

47% in RenderChannelStrip.render

ncalls tottime percall cumtime percall filename:lineno(function)
3731 23.980 0.006 23.980 0.006 K:\Pegwars\current\Scripts\PyStingray\RenderChannelStrip.py:47(render)
44150 2.884 0.000 2.884 0.000 d:\Pegwars\current\Scripts\Scene\Object.py:39(bind)
287 2.862 0.010 3.243 0.011 d:\Pegwars\current\Scripts\Frustum\FarPlane.py:13(cull)
2525213 2.594 0.000 2.594 0.000 {method 'append' of 'list' objects}
2009 2.514 0.001 2.542 0.001 d:\Pegwars\current\Scripts\Scene\ObjectPool.py:27(updateObjectList)
1724 2.240 0.001 3.550 0.002 {method 'sort' of 'list' objects}
1723/1436 2.215 0.001 47.822 0.033 d:\Pegwars\current\Scripts\PyStingray\UpdateChannelStrip.py:24(tick)
1435 2.152 0.001 3.939 0.003 d:\Pegwars\current\Scripts\Frustum\Light.py:15(cull)
351862 1.357 0.000 1.357 0.000 d:\Pegwars\current\Scripts\Scene\Object.py:71(tick)

There are two obvious candidates here: UpdateChannelStrip.tick is taking up twice as much time as rendering (47.8 seconds cumulative against 24.0 seconds), and Python 'append' and 'sort' of 'list' objects take a long time.
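For reference, a table with these columns (ncalls, tottime, percall, cumtime) is what Python's standard cProfile and pstats modules print. A minimal sketch of gathering one, where run_game is a hypothetical stand-in for the real entry point:

```python
import cProfile
import io
import pstats


def run_game():
    # Hypothetical stand-in for the real entry point; just burns some time.
    total = 0
    for i in range(100000):
        total += i * i
    return total


def profile(func, limit=10):
    """Run func under cProfile and return the report sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


report = profile(run_game)
print(report)
```

Sorting by tottime puts the functions that burn the most time in their own bodies at the top; sorting by cumtime instead highlights callers like tick that are expensive because of what they call.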

Now Citypieces only need worldobject->tick() called once ever, when they are created, to set up their C++ reference frame. Try a onceOnly updateChannelStrip for these, OR try calling wo->tick() directly in creation. Anyone with logic units should keep on ticking(). Anyone without can probably do without.
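Something along these lines would do it. This is a sketch under assumptions, not the engine code: the WorldObject and UpdateStrip classes are invented for the example, and the wants_tick flag stands for "has logic units".

```python
class WorldObject:
    """Hypothetical world object; counts how often it is ticked."""

    def __init__(self, name, wants_tick):
        self.name = name
        self.wants_tick = wants_tick  # True only for objects with logic units
        self.tick_count = 0

    def tick(self, dt):
        self.tick_count += 1


class UpdateStrip:
    """Hypothetical update strip that ticks static objects once only."""

    def __init__(self):
        self._tickers = []

    def add(self, obj):
        # One tick on creation, e.g. to set up the reference frame.
        obj.tick(0.0)
        # Only objects that ask for it are ticked every frame afterwards.
        if obj.wants_tick:
            self._tickers.append(obj)

    def tick(self, dt):
        for obj in self._tickers:
            obj.tick(dt)


strip = UpdateStrip()
city_piece = WorldObject("city_piece", wants_tick=False)
ship = WorldObject("ship", wants_tick=True)
strip.add(city_piece)
strip.add(ship)
for _ in range(3):
    strip.tick(1.0 / 60.0)
```

After three frames the city piece has been ticked once and the ship four times, so the per-frame cost scales with the number of objects that have logic rather than with the number of tiles.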

And the python ObjectPool is currently using lists for 1000s of objects, and calling del() on objects - linear search! Change this to use set logic.
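Roughly what the set logic could look like. This is a sketch only: the class body and the method name update_object_list are assumptions loosely modelled on the names in the profile, not the real ObjectPool.py.

```python
class ObjectPool:
    """Hypothetical set-based pool of currently visible objects."""

    def __init__(self):
        self.objects = set()

    def update_object_list(self, visible):
        """Swap in the new visible set and report what changed."""
        visible = set(visible)
        # Everything that is in exactly one of the old and new sets.
        changed = self.objects.symmetric_difference(visible)
        entered = changed & visible      # newly visible
        left = changed & self.objects    # no longer visible
        self.objects = visible
        return entered, left


pool = ObjectPool()
first = pool.update_object_list(["tile_a", "tile_b"])
second = pool.update_object_list(["tile_b", "tile_c"])
```

Set membership tests and removals are constant time on average, whereas removing an item from a list means walking the list to find it and then shifting everything after it.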

AFTER - 62% in RenderChannelStrip.render

ncalls tottime percall cumtime percall filename:lineno(function)
4433 27.502 0.006 27.502 0.006 K:\Pegwars\current\Scripts\PyStingray\RenderChannelStrip.py:47(render)
341 3.394 0.010 3.828 0.011 d:\Pegwars\current\Scripts\Frustum\FarPlane.py:13(cull)
2387 2.940 0.001 2.964 0.001 d:\Pegwars\current\Scripts\Scene\ObjectPool.py:27(updateObjectList)
2988137 2.930 0.000 2.930 0.000 {method 'append' of 'list' objects}
1705 2.558 0.002 4.595 0.003 d:\Pegwars\current\Scripts\Frustum\Light.py:15(cull)
43240 1.927 0.000 1.927 0.000 d:\Pegwars\current\Scripts\Scene\Object.py:42(bind)
341 0.511 0.001 0.922 0.003 d:\Pegwars\current\Scripts\Frustum\View.py:11(cull)
1 0.240 0.240 44.535 44.535 main.py:21(runPegwars)
2387 0.205 0.000 2.440 0.001 d:\Pegwars\current\Scripts\Scene\ObjectPool.py:50(callVizDiffCallbacks)
1050 0.201 0.000 0.201 0.000 {method 'symmetric_difference' of 'set' objects}
2283 0.169 0.000 0.169 0.000 {method 'union' of 'set' objects}

Fantastic, render now takes up 62% of the time, instead of only 47%, so a smaller share of the run is going on everything that isn't rendering. We aren't calling tick() on 1000s of static objects, and when this is ported to C++, the wants_tick facility gets ported too.

And the need for set-based ObjectPools has been addressed too - see how the 'symmetric_difference' of 'set' objects now takes a negligible amount of time, removing the need to erase from linear lists.

Let's go on..

The next standout is the python FarPlane.py (cull). This puts objects from within the farplane into a Scene's object pool. It's currently done in python. This is a necessary function, but for now, since we are moving tile pieces, we don't need to use this for this particular instance. A gross hack.
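For context, a far-plane cull of this kind can be as simple as a distance test against the camera. The sketch below assumes exactly that; FarPlane.py itself isn't shown here, so the function name, its arguments and the distance test are all guesses made for illustration.

```python
import math


def far_plane_cull(camera_pos, far_distance, objects):
    """Return the names of objects within far_distance of the camera.

    'objects' maps a name to an (x, y, z) position. Hypothetical layout.
    """
    visible = set()
    for name, position in objects.items():
        if math.dist(camera_pos, position) <= far_distance:
            visible.add(name)
    return visible


objects = {
    "near_tile": (100.0, 0.0, 0.0),
    "far_tile": (5000.0, 0.0, 0.0),
}
in_range = far_plane_cull((0.0, 0.0, 0.0), 1000.0, objects)
```

Done per frame over thousands of objects in Python, even a loop this small adds up, which is why it shows up in the profile.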

AFTER - 66% in RenderChannelStrip.render

ncalls tottime percall cumtime percall filename:lineno(function)
4524 30.175 0.007 30.175 0.007 K:\Pegwars\current\Scripts\PyStingray\RenderChannelStrip.py:47(render)
2436 3.084 0.001 3.110 0.001 d:\Pegwars\current\Scripts\Scene\ObjectPool.py:27(updateObjectList)
41420 2.667 0.000 2.667 0.000 d:\Pegwars\current\Scripts\Scene\Object.py:42(bind)
2617790 2.667 0.000 2.667 0.000 {method 'append' of 'list' objects}
1740 2.626 0.002 4.793 0.003 d:\Pegwars\current\Scripts\Frustum\Light.py:15(cull)
349 0.568 0.002 30.889 0.089 d:\Pegwars\current\Scripts\Modules\DogFight.py:240(render)
348 0.535 0.002 0.979 0.003 d:\Pegwars\current\Scripts\Frustum\View.py:11(cull)
1 0.481 0.481 45.822 45.822 main.py:21(runPegwars)

So finally we've gone from the render using 47% of the time, to render using 66% of the time, all from easy optimisations. And the python procedural scene now has no obvious glitches when moving around a 2k x 2k map, meaning I can delay porting to C++ a little while longer!

Wednesday, May 4, 2011

Pegwars in Darwin..

Been a while since the last Pegwars effort; since then I've packed up shop and moved to Darwin! Also since then I've been getting into Android programming. One of my Android apps involves porting some of the Stingray engine to Java and OpenGL, which was FUN. However, since dry season hit, and my cool 3D rain app can no longer be worked on (no way to test it lol), I've moved back onto Pegwars programming.

First step was reading my codebase and understanding it again. To do this, I followed through my DogFight and SpaceFlight python modules, and drilled down into how the Scene works, with n cameras and lights and render channels and Scene vizCallbacks etc. Very happy once I'd figured it out, to remember that I had come a long way. With hindsight and a fresh perspective, I jumped in and fixed a few long-standing bugs.

Then to test out my knowledge of the renderer, I launched in and implemented both depth-of-field scene blurring and materials that reflect the dynamic skybox, and got these new features done in a single night. That was an awesome feeling, I was really happy to get 2 huge features working in such a short amount of time. It required no new architecture work, just straight implementation using existing paradigms. I'm so happy right now that the rendering engine I made last year is so flexible and that I had come so far already.

I wrote up a how-to on how to add new Post-Processing stages.

How to add a Post-Processing stage
Each CameraScene owns a "post" render channel strip.
Currently each Camera Scene has the same python code and same post setup.

1. Add a Render Channel to CameraScene.post render channel strip. e.g. rcs.add( Stingray.RenderChannel(), "DepthOfField", 9.5 )

2. Need a list of Draw Commands to render your effect. Do this via XML, and load in python - see Scripts/Post
    self.dofCommands = []
    path = "data/depthOfField.xml"
    d = Stingray.root().open( path )
    for name in d.children().keys():
        dc = Stingray.DrawCommandClassFactory( path + "/" + name, None )
        self.postChannelStrip.strip.addDrawCommand( "DepthOfField", dc )
3. The SinglePassFilter DrawCommand takes 4 source textures and a destination texture in XML, e.g.
4. Create custom shaders for the steps.

Pixel Shader Post Processing Constants
//from ShaderConstants::setStandardPostProcessConstants()
float4x4 view : register(c0);
float4x4 proj : register(c4);
float4x4 viewProj : register(c8);
float4x4 invViewProj : register(c12);
float4x4 invView : register(c16);
float4 user1 : register(c20); //user constant 1
float4 user2 : register(c21); //user constant 2
float4 user3 : register(c22); //user constant 3
float4 user4 : register(c23); //user constant 4
float4 lut[64] : register(c24);
5. Assign python control over shader constants

In PyStingray/ShaderConstants.py
register( "dof", "nearZ", "shaders2/post/dof_lens.pso", 20, "near Z for depth of field camera lens" )

Then, based on the scene knowledge, I added something I always wanted to: image-based procedural world maps. This is really cool. I now have a 2Kx2K image, where each pixel creates a 250m x 250m tile of world, such as city pieces, water pieces, suburbs, forest, etc. The image itself is procedural: using the Python Imaging Library, I have a layer-based procedural image compositing system. In the end, I'm going to use this to generate planets, and allow players to build hand-made cities and do terraforming.
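To give a feel for the pixel-to-tile mapping, here is a toy version using nested lists instead of real images so it runs without the Python Imaging Library. Only the 250m tile size comes from the description above; the tile palette, the "later layers win" compositing rule and all the names are assumptions for the example.

```python
TILE_SIZE_M = 250.0  # each pixel is one 250m x 250m tile

# Hypothetical palette: pixel value -> tile type.
TILE_TYPES = {0: "water", 1: "city", 2: "suburb", 3: "forest"}


def composite(layers):
    """Stack layers bottom to top; None means 'transparent' in a layer."""
    height = len(layers[0])
    width = len(layers[0][0])
    result = [[0] * width for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                if layer[y][x] is not None:
                    result[y][x] = layer[y][x]
    return result


def tiles_from_map(world_map):
    """Turn each pixel into (tile type, world x in metres, world y in metres)."""
    tiles = []
    for y, row in enumerate(world_map):
        for x, value in enumerate(row):
            tiles.append((TILE_TYPES[value], x * TILE_SIZE_M, y * TILE_SIZE_M))
    return tiles


base = [[0, 0], [0, 0]]              # all water
cities = [[None, 1], [None, None]]   # one city pixel, rest transparent
forests = [[None, None], [3, None]]  # one forest pixel
world_map = composite([base, cities, forests])
tiles = tiles_from_map(world_map)
```

With the real image the same loop would walk a couple of thousand pixels per side, which is where tile streaming comes in: only the tiles near the camera need to exist at any one time.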

Got to go now, will post screenshots up later, but I need to finish optimising the world tile streaming; currently it's pretty jerky when moving around.

Sunday, April 17, 2011

One year on..

Wooahh, one year on, and to think I was just really gaining momentum.. Guess moving to a different city and the upswell of the Android app store and promise of $$$ is a fair enough distraction.

So today marks the occasion where I've taken up Pegwars again, and with the awesome benefit of a huge pause and the hindsight that affords, I fixed 3 long-standing bugs that were sucking the gumption out of me:

1. The python console that every so often went haywire is fixed. Solid.
2. The blooming that flickered in and out annoyingly, fixed. Solid.
3. Can now fly between solar systems without the weird rendering issue; Spaceflight App module now cleans its solar system up properly. This unlocks the door to the entire Pegwars game.

I've come to this fresh now and am really happy with the progress I made. There were some gumption traps present in some bugs, but the design is solid and I feel on the cusp of some pretty fast progress now.