Warp Divergence and [flatten] attribute

3 comments, last by __MONTE2020 2 years ago

Can anyone explain what the difference in performance is between the [flatten] and [branch] attributes with respect to warp divergence when branching in shaders?

For example,

[attribute-here] if(a) doA();
else doB();

where a is some dynamic value (so not uniform or a compile-time constant). In both cases, the threads in the warp will evaluate both doA() and doB(), right? With [branch] that happens because of divergence, and with [flatten], well, because we are flattening?

When exactly is [flatten] recommended (except maybe here)?
Thanks


Anyone?

If you flatten a branch then there will never be any divergence. If you check the compiler output, it will typically contain an instruction sequence where both sides of the branch are evaluated, and then the correct result is selected using a conditional move. If you force a branch instead, the compiler will emit actual branching instructions, which could result in divergence within your warp/wave. If there's divergence, then performance and the number of instructions executed will be similar to the flattened case, but it will execute differently on the actual hardware (typically GPUs have a thread mask that they set appropriately at a branch point, which causes instructions from masked-off threads to have no effect). The main difference is that if all threads in the warp/wave take the same path, then an actual branch can skip over the other side entirely.
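To make the difference concrete, here is a minimal HLSL sketch of the three forms. ShadeA and ShadeB are hypothetical stand-ins for doA() and doB() from the question, and the hand-written select is only an approximation of what a compiler might emit for the flattened case; the actual output depends on the compiler and the driver.

```hlsl
// Hypothetical stand-ins for doA()/doB(); not from the original thread.
float3 ShadeA(float3 n) { return saturate(n * 0.5f + 0.5f); }
float3 ShadeB(float3 n) { return abs(n); }

// Flattened: both sides are evaluated for every thread, then one result is kept.
float3 SelectFlattened(bool a, float3 n)
{
    float3 result;
    [flatten]
    if (a) result = ShadeA(n);
    else   result = ShadeB(n);
    return result;
}

// Roughly the same thing written by hand: compute both, then select.
float3 SelectByHand(bool a, float3 n)
{
    float3 ra = ShadeA(n);
    float3 rb = ShadeB(n);
    return a ? ra : rb;
}

// Branched: real branch instructions. If every thread in the warp/wave agrees
// on 'a', the untaken side is skipped; if they disagree, both sides run with
// the inactive threads masked off.
float3 SelectBranched(bool a, float3 n)
{
    float3 result;
    [branch]
    if (a) result = ShadeA(n);
    else   result = ShadeB(n);
    return result;
}
```

With sides this cheap, the flattened form is likely fine either way. The branched form only pays off when a side is expensive and the warp/wave is often coherent.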

To be honest, the flatten attribute is a bit of a leftover from the earlier days of programmable GPUs. Initially they didn't really have true branching at all, and then later on branching tended to be quite slow, so you only wanted to use it in cases where it really made a difference. These days that's not really the case, although there could still be some small perf differences between a flattened branch and a true branch that's divergent, since the latter might still have a bit of overhead from setting up masks and issuing the branch instructions. Flattened branches can also be inefficient if there are many results and side effects from each side of the branch, since each one will require a conditional move to select the correct result.

IMO you probably don't need to worry about it in most cases: you can just write your code with branches and let the compiler sort it out. Mainly you just want to know what to expect in terms of when a branch can and can't save you performance, depending on how divergent the branch is within your warp/wave. For “early out” optimizations it can sometimes make sense to use wave intrinsics to only take the early-out case if all threads can take the early-out; that way you don't have to execute both the early-out and the expensive path.
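A minimal sketch of that wave-level early-out, assuming Shader Model 6.0 or later for WaveActiveAllTrue. CheapPath, ExpensivePath, and the canEarlyOut flag are hypothetical names made up for illustration.

```hlsl
// Hypothetical cheap and expensive paths; not from the original thread.
float3 CheapPath(float3 x) { return x; }

float3 ExpensivePath(float3 x)
{
    float3 sum = 0.0f;
    for (uint i = 0; i < 64; ++i)
        sum += sin(x * (float)(i + 1));
    return sum / 64.0f;
}

float3 Shade(float3 x, bool canEarlyOut)
{
    // WaveActiveAllTrue returns the same value on every active lane, so this
    // branch is uniform across the wave and cannot diverge.
    if (WaveActiveAllTrue(canEarlyOut))
    {
        return CheapPath(x);
    }

    // At least one lane needs the expensive path, so the whole wave takes it.
    return ExpensivePath(x);
}
```

Lanes that could have taken the early-out on their own still run ExpensivePath whenever a neighbor in the wave can't, so this only works if the expensive path also gives an acceptable result for those lanes.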

@mjp Thanks for the detailed response, I appreciate it!

