wmf, 29 days ago, on: How Taalas “prints” LLM onto a chip?

Arguably, DRAM-based GPUs/TPUs are quite inefficient for inference compared to SRAM-based chips like Groq's and Cerebras's. GPUs are highly optimized, but they still lose to architectures that are better suited to inference.