- Published on
We evaluated running gpt-oss-120b with a 1.6B parameter speculative decoder on d-Matrix Corsair. Compared to the same speculative decoder on GPU and equivalent energy consumption, we've found that the Corsair-based solution delivers 2-5X end-to-end request speedup on configurations optimized for interactivity, and up to 10X end-to-end speedup for energy-optimized configurations.