GPUs and NPUs are gaining optimizations for the transformer architecture. It’s not “the GPU is 3x faster this year”; it’s “the GPU has gates specifically designed to accelerate your LLM workload.”
See for instance [0], which is just starting to appear in commercial parts.
This is continuing; pretty much every low cost SoC maker is racing to build and extend ML optimizations.
0. https://www.synopsys.com/blogs/chip-design/best-edge-ai-proc...