Posts tagged with d-Matrix

Published on
March 11, 2026
Low-Latency Inference with Speculative Decoding on d-Matrix Corsair and GPU
Inference Performance Hardware SRAM Speculative Decoding d-Matrix
We evaluated gpt-oss-120b with a 1.6B-parameter speculative decoder on d-Matrix Corsair. Compared with the same speculative decoder on GPU at equivalent energy consumption, the Corsair-based solution delivers a 2-5x end-to-end request speedup on configurations optimized for interactivity, and up to a 10x end-to-end speedup on energy-optimized configurations.
