> You seem to be moving the goalposts. First it was using in production, now it's building foundation models.
I'm not moving the goalposts, but maybe I wasn't clear. Corporations adopting fine-tuning and inference isn't especially relevant to H100 sales, which are the main cash cow (~$10B of revenue at 80-90% margins) and the driver of Nvidia's massive market-cap growth. What is relevant is corporations like Inflection AI building 22k-H100 clusters.
I work in academia, where PyTorch is more common, but are people in industry who fine-tune LLMs actually working directly with CUDA enough for it to be a big moat?
Companies are using both A100 and H100 for inference. The datacenter numbers include both, and I don't believe they break it down further. And no, by and large, enterprises are not building out big clusters themselves, but they account for much of the demand behind all those cloud-provider build-outs.
No, almost no one is using CUDA directly. But if you are a vendor with a new equivalent, integrating it with a framework like PyTorch is no small feat; there's a lot of work required from various parties. A rough sketch of typical user-level PyTorch code is below.
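To make the shape of the moat concrete, here's a minimal sketch of what everyday PyTorch code looks like. Nothing in it is CUDA-specific; the lock-in lives below this layer, in the kernels and dispatcher integration a competing vendor has to supply before their device string can slot in where `"cuda"` does:

```python
import torch
import torch.nn as nn

# User code targets PyTorch's device abstraction, not CUDA itself.
# Swapping hardware is a one-line change, provided the vendor has
# done the backend work to make their device show up here.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)  # dispatched to vendor kernels (cuDNN/cuBLAS on Nvidia)
print(y.shape, y.device)
```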
It would be interesting to see the breakdown of H100/A100 users. I would expect most inference users are like my lab and max out at a single DGX node, rather than running the large clusters that make up the bulk of the spend.
PyTorch has gotten a lot better on TPUs this year; I don't believe there's much of a performance hit now. JAX and TF (I don't use the latter anymore) of course work. I've never used Gaudi2, but it apparently works.
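For anyone who hasn't tried it, PyTorch-on-TPU goes through the torch_xla package. A minimal sketch, assuming a TPU VM with torch_xla installed:

```python
import torch
import torch_xla.core.xla_model as xm

# torch_xla exposes the TPU as just another PyTorch device.
device = xm.xla_device()

a = torch.randn(2048, 2048, device=device)
b = torch.randn(2048, 2048, device=device)
c = a @ b

# XLA traces lazily; mark_step() flushes the graph to the TPU.
xm.mark_step()
print(c[0, :4])
```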
All of this is to say: is it possible that Intel gets its fabs working, partners with its long-time ally Microsoft and with OpenAI to extend Triton to Gaudi3 or Gaudi4, and becomes a threat to Nvidia within 2-3 years? Absolutely.
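What makes that scenario plausible is that Triton kernels are written against Triton's own block-level abstractions rather than CUDA. Here's the canonical vector-add example; the launch currently targets CUDA tensors because Nvidia is the only mature backend, but note the kernel body contains no CUDA-specific code, so retargeting it is a compiler-backend problem for Intel, not a rewrite for users:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide tile.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```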
Is it similarly possible that Google ramps up development on JAX and TPUv5? Sure.
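The same portability argument applies on Google's side: JAX code is written against XLA and runs unchanged on CPU, GPU, or TPU. A minimal sketch (the layer function here is just an illustrative example, not any particular library's API):

```python
import jax
import jax.numpy as jnp

@jax.jit
def mlp_layer(w, b, x):
    # XLA compiles this for whatever backend is available
    # (CPU, Nvidia GPU, or TPU) with no source changes.
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
w = jax.random.normal(k1, (512, 512))
b = jax.random.normal(k2, (512,))
x = jax.random.normal(k3, (32, 512))
print(mlp_layer(w, b, x).shape, jax.devices())
```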
Neither of these possibilities, regardless of whether you think they're probable or improbable, would need a decade to catch up to Nvidia.