
I think the main bottleneck here is data movement. If you are streaming weights from a 6GB/s SSD, you'll get under 10% of the performance shown for the 3090 (which moves data at PCIe 4 speeds of 64GB/s).

Once in unified memory, the weights are accessible at a bit under half the rate they are on the 3090 (400GB/s on the M2 Max vs 936GB/s on the 3090).
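
To make the arithmetic behind those two ratios explicit, here is a rough back-of-the-envelope sketch. The four bandwidth numbers are the ones quoted in this comment, taken as-is; the 20 GB model size and the helper names are hypothetical, chosen only for illustration. It assumes throughput scales linearly with bandwidth, which only holds when the workload is purely bandwidth-bound.

```python
# Bandwidth figures as quoted in the comment, in GB/s.
SSD_GBPS = 6.0
# 64 GB/s is used as quoted; it matches the bidirectional total for a
# PCIe 4.0 x16 link. One direction is roughly 32 GB/s, which would
# double the SSD ratio below.
PCIE4_GBPS = 64.0
M2_MAX_GBPS = 400.0
RTX_3090_GBPS = 936.0

# Hypothetical model size, for illustration only (not from the comment).
MODEL_SIZE_GB = 20.0


def bandwidth_ratio(slower_gbps: float, faster_gbps: float) -> float:
    """Fraction of the faster link's throughput that the slower link reaches."""
    return slower_gbps / faster_gbps


def seconds_per_pass(size_gb: float, bandwidth_gbps: float) -> float:
    """Time to read every weight once at the given bandwidth."""
    return size_gb / bandwidth_gbps


ssd_vs_pcie = bandwidth_ratio(SSD_GBPS, PCIE4_GBPS)
m2_vs_3090 = bandwidth_ratio(M2_MAX_GBPS, RTX_3090_GBPS)

print(f"SSD vs PCIe 4:  {ssd_vs_pcie:.1%}")
print(f"M2 Max vs 3090: {m2_vs_3090:.1%}")

for name, gbps in [("SSD", SSD_GBPS), ("PCIe 4", PCIE4_GBPS),
                   ("M2 Max", M2_MAX_GBPS), ("3090 VRAM", RTX_3090_GBPS)]:
    t = seconds_per_pass(MODEL_SIZE_GB, gbps)
    print(f"{name:>10}: {t:.3f} s per full pass over the weights")
```

If each generated token needs one full read of the weights, the seconds-per-pass figure is a lower bound on per-token latency for that path, which is why the slowest link in the chain dominates.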


