Raptor Lake + 5080: 380.63 GB/s
Raptor Lake (CPU for reference): 20.41 GB/s
GB10 (DGX Spark): 116.14 GB/s
GH200: 1697.39 GB/s
In practice, this means I can get roughly 55 tokens per second running a larger model like gpt-oss-120b-Q8_0 on the DGX Spark.
55 t/s is much better than I expected.
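As a rough sanity check, the bandwidth and token rate for the DGX Spark can be tied together with a back-of-the-envelope calculation. This is only a sketch: it assumes token generation is limited purely by the measured bandwidth and that every token streams the same number of bytes, which real runs will not match exactly. The function names are made up for illustration.

```python
# Back-of-the-envelope link between bandwidth and token rate.
# Assumption (not a measurement): decoding is purely bandwidth-bound and
# each generated token streams a fixed number of bytes.

GB = 1e9  # bytes per GB, matching the GB/s figures quoted above


def implied_bytes_per_token(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """Bytes that would be streamed per token if bandwidth were the only limit."""
    return bandwidth_gb_s * GB / tokens_per_s


def bandwidth_bound_tokens_per_s(bandwidth_gb_s: float, bytes_per_token: float) -> float:
    """Inverse view: the token rate a given bandwidth could sustain."""
    return bandwidth_gb_s * GB / bytes_per_token


# Figures quoted in the post for the DGX Spark (GB10).
spark_bandwidth_gb_s = 116.14
spark_tokens_per_s = 55.0

per_token = implied_bytes_per_token(spark_bandwidth_gb_s, spark_tokens_per_s)
print(f"{per_token / GB:.2f} GB per token")  # prints "2.11 GB per token"
```

Under those assumptions, 55 t/s at 116.14 GB/s corresponds to about 2.11 GB moved per generated token. That is far less than the whole model, which would fit with only part of the weights being read for each token, but that is an inference from the arithmetic and not something I measured.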