Posts tagged with SRAM

Published on March 11, 2026
Low-Latency Inference with Speculative Decoding on d-Matrix Corsair and GPU
Inference, Performance, Hardware, SRAM, Speculative Decoding, d-Matrix
We evaluated running gpt-oss-120b with a 1.6B-parameter speculative decoder on d-Matrix Corsair. Compared with running the same speculative decoder on GPU at equivalent energy consumption, we found that the Corsair-based solution delivers a 2-5X end-to-end request speedup in configurations optimized for interactivity, and up to a 10X end-to-end speedup in energy-optimized configurations.
Published on March 5, 2026
The emerging role of SRAM-centric chips in AI inference
Inference, Performance, Hardware, SRAM
In this post, we discuss the major differences between GPUs and SRAM-centric accelerators (e.g., Cerebras, Groq, and d-Matrix), explain why the choice between near-compute and far-compute memory is the key tradeoff these architectures make, and describe what this means for inference workloads.
