🎉 Celebrating 25 Years of GameDev.net! 🎉

Warp Divergence and [flatten] attribute

Graphics and GPU Programming Programming CUDA HLSL

Started by __MONTE2020 June 01, 2022 09:19 AM

3 comments, last by __MONTE2020 2 years ago

__MONTE2020

Author

June 01, 2022 09:19 AM

Can anyone explain the difference in performance between the [flatten] and [branch] attributes, with respect to warp divergence, when branching in shaders?

For example,

[attribute-here]
if (a) doA();
else doB();

where a is some dynamic value (so neither uniform nor a compile-time constant). In both cases the threads in a warp will evaluate both doA() and doB(), right? With [branch] that happens because of divergence, and with [flatten], well, because we are flattening?

When exactly is [flatten] recommended (other than, maybe, here)?
Thanks

__MONTE2020

Author

June 11, 2022 01:04 PM

Anyone?

MJP

June 18, 2022 09:37 PM

If you flatten a branch then there will never be any divergence. If you check the compiler output, it will typically generate an instruction sequence where both sides of the branch are evaluated, and then the correct result is selected using a conditional move. If you force a branch instead, the compiler will emit actual branching instructions, which could result in divergence within your warp/wave. If there's divergence, then the performance and the number of instructions executed will be similar to the flattened case, but it will execute differently on the actual hardware (typically GPUs have a thread mask that they set appropriately at a branch point, which causes instructions from masked threads to have no effect). The main difference is that if all threads in the warp/wave take the same path, an actual branch can skip over the side that no thread needs.
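A toy CPU-side model in Python (not shader code, and not how any particular GPU or compiler is implemented) may make the difference concrete. It assumes a 32-lane warp, and do_a/do_b are hypothetical stand-ins for doA()/doB(). The flattened version always issues both sides and selects per lane; the branched version builds an execution mask and issues a side only if at least one lane is active on it. Both produce identical results; only the number of sides issued differs, and only when the warp is uniform.

```python
# Toy model of one 32-lane warp/wave executing
#     if (cond) out = do_a(x); else out = do_b(x);
# Illustration only; real hardware and compilers differ in the details.
WARP_SIZE = 32


def do_a(x):
    # Hypothetical stand-in for doA() from the question.
    return x * 2


def do_b(x):
    # Hypothetical stand-in for doB() from the question.
    return x + 100


def run_flattened(xs, conds):
    """[flatten]-style: every lane evaluates both sides, then a per-lane
    select (the "conditional move") keeps the correct value."""
    a = [do_a(x) for x in xs]
    b = [do_b(x) for x in xs]
    out = [ai if c else bi for ai, bi, c in zip(a, b, conds)]
    return out, 2  # both sides are always issued


def run_branched(xs, conds):
    """[branch]-style: build an execution mask and issue a side only if at
    least one lane is active on it; masked-off lanes are skipped there."""
    mask = [bool(c) for c in conds]
    out = [None] * len(xs)
    sides_issued = 0
    if any(mask):  # some lanes take the "if" side
        sides_issued += 1
        for i, x in enumerate(xs):
            if mask[i]:
                out[i] = do_a(x)
    if not all(mask):  # some lanes take the "else" side
        sides_issued += 1
        for i, x in enumerate(xs):
            if not mask[i]:
                out[i] = do_b(x)
    return out, sides_issued


xs = list(range(WARP_SIZE))
uniform = [True] * WARP_SIZE          # every lane takes the same path
divergent = [i % 2 == 0 for i in xs]  # lanes disagree

print(run_flattened(xs, uniform)[1], run_branched(xs, uniform)[1])      # 2 1
print(run_flattened(xs, divergent)[1], run_branched(xs, divergent)[1])  # 2 2
```

In the divergent case the branched warp still steps through both sides, which is why its cost ends up close to the flattened version.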

To be honest, the flatten attribute is a bit of a leftover from the earlier days of programmable GPUs. Initially they didn't really have true branching at all, and later on branching tended to be quite slow, so you only wanted to use it in cases where it really made a difference. These days that's not really the case, although there can still be some small performance differences between a flattened branch and a true branch that's divergent, since the latter might still have a bit of overhead from setting up masks and issuing the branch instructions. Flattened branches can also be inefficient if there are many results and side effects from each side of the branch, since each one will require a conditional move to select the correct result.
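To illustrate the last point, here is a minimal Python sketch (a model, not compiler output) of flattening an if/else whose sides each produce several results. The functions side_a/side_b and the select counting are hypothetical; the point is only that, once both sides have run, every result needs its own per-lane select, so the select count grows with the number of outputs.

```python
# Toy model of a flattened if/else whose sides each produce three results.
# side_a/side_b are hypothetical; the select count stands in for the
# conditional moves a compiler would emit after evaluating both sides.

def side_a(x):
    return (x + 1, x * 2, x - 3)  # three results


def side_b(x):
    return (x - 1, x * 3, x + 3)  # three results


def flatten_select(cond, x):
    """Evaluate both sides, then select each result individually.
    Returns the chosen results and the number of selects used."""
    ra = side_a(x)
    rb = side_b(x)
    selects = 0
    chosen = []
    for va, vb in zip(ra, rb):
        chosen.append(va if cond else vb)  # one conditional move per result
        selects += 1
    return tuple(chosen), selects


print(flatten_select(True, 10))   # ((11, 20, 7), 3)
print(flatten_select(False, 10))  # ((9, 30, 13), 3)
```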

IMO you probably don't need to worry about it in most cases; you can just write your code with branches and let the compiler sort it out. Mainly you want to know what to expect in terms of when a branch can and can't save you performance, depending on how divergent the branch is within your warp/wave. For “early out” optimizations it can sometimes make sense to use wave intrinsics to take the early-out path only if all threads can take it; that way you don't have to execute both the early-out and the expensive path.
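The wave-uniform early out can be sketched with another toy Python model. Python's all() stands in for an all-lanes vote such as HLSL's WaveActiveAllTrue (Shader Model 6.0); cheap/expensive and the "x == 0 can early out" rule are hypothetical. The key assumption, noted in the code as well, is that the expensive path also produces the correct answer for lanes that could have taken the early out, so a mixed wave can simply run the expensive path for everyone.

```python
# Toy model of a wave-uniform early out (illustration, not shader code).
# all() plays the role of an all-lanes vote such as WaveActiveAllTrue.
WAVE_SIZE = 32


def cheap(x):
    # Hypothetical early-out result.
    return 0


def expensive(x):
    # Hypothetical costly path. Assumption: for lanes that could early out
    # (x == 0) it returns the same value as cheap(), so running it is safe.
    return sum(x * k for k in range(8))


def shade_wave(xs):
    """Take the early out only if every lane in the wave can take it."""
    if all(x == 0 for x in xs):  # the wave-wide vote
        return [cheap(x) for x in xs], ["cheap"]
    # Mixed wave: run only the expensive path, for all lanes.
    return [expensive(x) for x in xs], ["expensive"]


def shade_wave_naive(xs):
    """Per-lane early out: a divergent wave ends up issuing both paths."""
    paths = []
    if any(x == 0 for x in xs):
        paths.append("cheap")
    if any(x != 0 for x in xs):
        paths.append("expensive")
    out = [cheap(x) if x == 0 else expensive(x) for x in xs]
    return out, paths


mixed = [0] * (WAVE_SIZE - 1) + [5]
print(shade_wave([0] * WAVE_SIZE)[1])  # ['cheap']
print(shade_wave(mixed)[1])            # ['expensive']
print(shade_wave_naive(mixed)[1])      # ['cheap', 'expensive']
```

Both versions give the same per-lane results; the voted version just never pays for both paths in one wave.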

__MONTE2020

Author

June 20, 2022 11:20 AM

@mjp Thanks for the detailed response, I appreciate it!

This topic is closed to new replies.
