Gimlet Labs Blog

A blog about our lab's research on high-performance AI systems.

Splitting LLM inference across different hardware platforms


Separating the prefill and decode stages of LLM inference improves token throughput because the two stages have different resource needs. Although most deployments run both stages on NVIDIA hardware, multi-vendor disaggregation can actually improve efficiency while still meeting SLAs. Based on our performance models pairing NVIDIA B200s with Intel Gaudi 3, common workloads can see a 1.7x total cost of ownership (TCO) improvement compared to single-vendor disaggregation.
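
To make the headline number concrete, here is a minimal sketch of the per-token TCO arithmetic behind a comparison like this one. All costs and throughputs below are hypothetical placeholders, chosen only to roughly echo the 1.7x figure, not measured values from our models, and `tco_per_million_tokens` is an illustrative helper, not part of any published tooling.

```python
# Sketch of disaggregated-serving TCO arithmetic. Every number here is a
# hypothetical placeholder, not a measured result.

def tco_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars spent per million tokens on one accelerator pool."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1e6

# Single-vendor disaggregation: same (expensive) accelerator for both stages.
single_vendor = (
    tco_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=20_000)  # prefill
    + tco_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=4_000)  # decode
)

# Multi-vendor: keep the fast accelerator on prefill, move decode to a
# cheaper part whose throughput is lower but still SLA-compliant.
multi_vendor = (
    tco_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=20_000)  # prefill
    + tco_per_million_tokens(hourly_cost_usd=3.5, tokens_per_second=2_800)  # decode
)

print(f"single-vendor: ${single_vendor:.2f} per million tokens")
print(f"multi-vendor:  ${multi_vendor:.2f} per million tokens")
print(f"TCO ratio: {single_vendor / multi_vendor:.2f}x")
```

The intuition the sketch captures: decode is the cost-dominant stage, so even a slower accelerator can win on TCO there if its hourly price drops faster than its throughput does.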
October 13, 2025
By Zain Asgar, Michelle Nguyen, Sachin Katti, Natalie Serrino
Inference · Performance · TCO · Intel · NVIDIA