Tile based deferred shading via OpenCL

0 comments, last by GameDev.net 11 years, 7 months ago
Overview
This technique implements tile based deferred shading using OpenCL for the lighting stage.
In the image above you can see 1024 point lights combined with diffuse albedo. A skybox illustrates the background, which is reflected on a sphere using environment mapping. The human figure in the middle is only there for showcasing purposes. The original model can be downloaded from here:
http://thefree3dmode...ed_i/14-1-0-564

Motivation & other implementations
I was very impressed by Intel's SIGGRAPH 2010 presentation, which showcased the same technique using DirectX 11 compute shaders:
http://software.inte...ring-pipelines/
Unfortunately the demo was implemented in DirectX, so it only runs on Windows. I therefore decided to implement a cross-platform version of it in OpenCL (and OpenGL). My version runs on Linux and Windows, and it could probably be ported to Mac too, but I don't have a Mac to try it on.
Frostbite 2 by DICE also implements this technique via DirectX 11 compute shaders.
http://www.frostbite2.com/?page_id=50
I also found a similar, but slower implementation of this technique on Mac:
http://www.idevgames...hread-8649.html

Implementation details
The implementation contains two types of light attenuation: a linear one and a full one (constant, linear and quadratic terms). It uses the Blinn-Phong lighting model. The scene contains 1024 point lights, but spot lights and directional lights could easily be added as well.
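The post does not list the exact formulas, so the following is only a minimal sketch of what the two attenuation types and the Blinn-Phong terms commonly look like. The function names, the `radius` parameter of the linear falloff and the `shininess` exponent are assumptions made for illustration, not taken from the demo's source.

```python
import math

def attenuation_linear(distance, radius):
    """Assumed linear falloff: 1 at the light, 0 at `radius` and beyond."""
    return max(0.0, 1.0 - distance / radius)

def attenuation_full(distance, constant, linear, quadratic):
    """Classic attenuation with constant, linear and quadratic terms."""
    return 1.0 / (constant + linear * distance + quadratic * distance * distance)

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def blinn_phong(normal, to_light, to_eye, shininess):
    """Return (diffuse, specular) scalar terms for unit-length input vectors.

    The half vector is undefined when to_light == -to_eye; that case is
    not handled in this sketch.
    """
    n_dot_l = max(0.0, dot(normal, to_light))
    half_vector = normalize(tuple(l + e for l, e in zip(to_light, to_eye)))
    n_dot_h = max(0.0, dot(normal, half_vector))
    # No specular highlight on surfaces facing away from the light.
    specular = n_dot_h ** shininess if n_dot_l > 0.0 else 0.0
    return n_dot_l, specular
```

Per light, the diffuse and specular terms would be multiplied by the light colour and by one of the two attenuation factors, then summed over all lights that affect the pixel.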
The G-buffer consists of:
RGBA for diffuse albedo (the alpha channel is free currently but could be used for storing per pixel specular intensity)
RG16F for normals that are encoded using spheremap projection:
http://aras-p.info/t...malStorage.html
R32F for linear depth, because OpenCL doesn't support reading from the hardware depth buffer yet:
http://mynameismjp.w...n-from-depth-3/
That is 96 bits (12 bytes) per pixel in total, which is about 11 MB at 720p (1280 x 720 x 12 bytes) and 25 MB at 1080p (1920 x 1080 x 12 bytes). An additional buffer may be added later for storing further material properties.
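To illustrate how a normal fits into the two channels of the RG16F target, here is a sketch of one common spheremap variant (the Lambert azimuthal equal-area form). The post does not say which variant the demo uses, so treat the formulas and the function names as assumptions. The normal is expected to be unit length and in view space.

```python
import math

def encode_normal(n):
    """Pack a unit normal (x, y, z) into two values in [0, 1]."""
    x, y, z = n
    # Undefined for z == -1 (a normal pointing exactly away from the viewer).
    f = math.sqrt(8.0 * z + 8.0)
    return (x / f + 0.5, y / f + 0.5)

def decode_normal(enc):
    """Recover the unit normal from the two stored values."""
    fx = enc[0] * 4.0 - 2.0
    fy = enc[1] * 4.0 - 2.0
    f = fx * fx + fy * fy
    g = math.sqrt(1.0 - f / 4.0)
    return (fx * g, fy * g, 1.0 - f / 2.0)
```

The sketch works in full float precision. Storing the two values as 16 bit floats adds a small quantization error on top of this.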

The OpenCL kernel outputs the result into an RGBA16F buffer. The fourth channel is needed because OpenCL doesn't support RGB texture formats yet. A possible optimization would be to output the result into an RG16F buffer containing the red and green channels, plus an R16F buffer for the blue channel.
Floating point buffers are needed because they enable gamma-correct effects (such as bloom, HDR, tone mapping etc.)
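The kernel itself is not shown in this post. As a rough illustration of the tile based idea only, the sketch below assigns lights to screen tiles on the CPU, using a simplified 2D test: every light is assumed to be already projected to a circle in pixel coordinates. The 16 pixel tile size, the circle representation and the function name are assumptions. Implementations of this technique usually test light volumes in 3D against a per-tile frustum bounded by the tile's minimum and maximum depth.

```python
def bin_lights_into_tiles(lights, width, height, tile_size=16):
    """Return {(tile_x, tile_y): [light indices]}.

    `lights` is a list of (center_x, center_y, radius) circles in pixels.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for index, (cx, cy, radius) in enumerate(lights):
        # Only visit the tiles touched by the circle's bounding box.
        first_tx = max(0, int((cx - radius) // tile_size))
        last_tx = min(tiles_x - 1, int((cx + radius) // tile_size))
        first_ty = max(0, int((cy - radius) // tile_size))
        last_ty = min(tiles_y - 1, int((cy + radius) // tile_size))
        for ty in range(first_ty, last_ty + 1):
            for tx in range(first_tx, last_tx + 1):
                # Closest point of the tile rectangle to the circle centre.
                nearest_x = min(max(cx, tx * tile_size), (tx + 1) * tile_size)
                nearest_y = min(max(cy, ty * tile_size), (ty + 1) * tile_size)
                dx, dy = cx - nearest_x, cy - nearest_y
                if dx * dx + dy * dy <= radius * radius:
                    bins[(tx, ty)].append(index)
    return bins
```

Each pixel then only has to loop over the lights stored for its own tile instead of all 1024 lights, which is where the technique gets its speed from.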

About the performance
I ran it on an AMD Radeon HD 5670 GDDR3 1GB and it averaged around 40-50 FPS.
On an AMD Radeon HD 5770 GDDR5 1GB, which has twice the power, it ran at around 80-100 FPS.
The Intel implementation ran at around 140-160 FPS on the 5770, and 60-80 FPS on the 5670. [s]But note that my implementation uses deferred shading (+at least 32 bits per pixel!), while Intel's uses deferred lighting.[/s] Intel's implementation uses deferred shading too, but optimization could still add a lot to the performance, and my implementation is not optimized yet.

Why this technique?
So why did I choose this implementation over the traditional deferred lighting techniques?
First of all, deferred shading eventually becomes faster once scene complexity reaches the critical level where rendering the scene twice becomes a bottleneck. Secondly, the traditional quad or geometry based deferred lighting techniques aren't very efficient because they need several passes to calculate the lighting. With this technique, no light geometry needs to be rendered at all. In addition, since almost every modern game engine already does view frustum culling, performance can be increased even further by pre-culling lights against the view frustum, so fewer lights need to be uploaded to GPU memory and processed. Furthermore, OpenCL allows per-architecture optimizations. This enables heavy optimization where it is needed most, on the mid-range video cards: AMD's VLIW5 and VLIW4, or NVIDIA's Fermi architecture. Lastly, since both NVIDIA and AMD are focusing on increasing general purpose GPU performance with their latest GPU generations (the newest GCN and the [s]upcoming[/s] Kepler), it is possible that this technique will see a serious speedup just by running it on those video cards.
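The pre-culling step mentioned above could look roughly like the following sketch, which treats each point light as a bounding sphere and tests it against the six frustum planes. The plane layout `(nx, ny, nz, d)` with unit normals pointing into the frustum, the `(center, radius)` light tuples and the function names are assumptions for illustration.

```python
def sphere_in_frustum(center, radius, planes):
    """Conservative test: False only if the sphere is fully outside a plane."""
    for nx, ny, nz, d in planes:
        signed_distance = nx * center[0] + ny * center[1] + nz * center[2] + d
        if signed_distance < -radius:
            return False
    return True

def cull_lights(lights, planes):
    """Keep only the (center, radius) lights that may touch the frustum."""
    return [light for light in lights
            if sphere_in_frustum(light[0], light[1], planes)]
```

The test is conservative: a sphere near a frustum corner can pass even though it lies outside, which only costs a little extra work on the GPU and never drops a visible light.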

Check out the video as well! (Note that the quality is bad because of YouTube and the capturing software.)
