Posts tagged with TCO

Published on
October 20, 2025
Designing infrastructure for running efficient AI workloads
Inference Performance TCO
AI workloads are shifting from simple LLM inference to complex, multi-model workflows. To run them efficiently at scale, we need a system that can dynamically decompose workloads, plan and schedule them, and map execution to the right hardware.
Published on
October 13, 2025
Splitting LLM inference across different hardware platforms
Inference Performance TCO Intel NVIDIA
Separating the prefill and decode stages of LLM inference improves token throughput because their resource needs differ. Although most deployments use NVIDIA hardware for both stages, multi-vendor disaggregation can improve efficiency while maintaining SLAs. Based on our models using NVIDIA B200s and Intel Gaudi 3, common workloads can see a 1.7x TCO improvement compared to single-vendor disaggregation.
