Because you care about performance?
step being implemented as a library function on top of a conditional says nothing about how it performs compared with a dedicated instruction. Don't worry about the implementation.
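As a rough illustration, here is what "a library function on top of a conditional" could look like, written in C rather than shader code. The name step_ref is hypothetical and this is not taken from any particular driver or standard library. It only mirrors the documented behaviour of step(edge, x): 0.0 when x < edge, otherwise 1.0. A compiler is free to lower a conditional like this to a compare or select operation instead of a branch, which is why the source-level form tells you little about the generated code.

```c
/* Hypothetical reference version of GLSL's step(edge, x).
 * Returns 0.0 if x < edge, otherwise 1.0.
 * The conditional here is source-level only; the compiler decides
 * whether it becomes a branch, a select, or something else. */
static float step_ref(float edge, float x) {
    return x < edge ? 0.0f : 1.0f;
}
```

With edge = 0.5, inputs below 0.5 give 0.0 and inputs at or above 0.5 give 1.0.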
Because you are curious about GPU architectures?
Look at disassembly, (open source) driver code (including LLVM) and/or ISA documentation.