Introducing Gimlet Labs: AI Infrastructure for the Agentic Era

We're excited to finally share what we've been building at Gimlet Labs. Our mission is to make AI workloads 10X more efficient by expanding the pool of usable compute and improving how it's orchestrated.

Over the last two years, rapid advances in AI have driven an explosion in inference demand, but infrastructure has struggled to keep pace. Agentic systems exacerbate the problem, generating 5-15X more tokens than traditional chat models. Infrastructure teams wrestle with GPU efficiency and scaling, while developers feel the downstream crunch: limited availability and degraded performance even for basic capabilities.

Gimlet solves this by decoupling agentic AI workloads from specific hardware. Our platform automatically disaggregates each workload into its component stages, and maps each stage to the most suitable accelerator. Compute-bound tasks go to high-throughput GPUs, memory-bound tasks to higher-bandwidth accelerators, and network-bound tasks to nodes with fast interconnect.

All of this happens automatically, so developers don't need to rewrite their workloads. The result is better performance for the same cost, or better cost for the same performance.
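To make the mapping concrete, here is a toy sketch of routing stages by their bottleneck. Everything here (the stage names, the `Bottleneck` classes, the pool labels) is an illustrative assumption, not Gimlet's actual API:

```python
# Hypothetical sketch of bottleneck-based placement; names are illustrative.
from dataclasses import dataclass
from enum import Enum

class Bottleneck(Enum):
    COMPUTE = "compute"   # e.g. prefill, large matmuls
    MEMORY = "memory"     # e.g. token-by-token decode, KV-cache reads
    NETWORK = "network"   # e.g. collectives, tool-call fan-out

@dataclass
class Stage:
    name: str
    bottleneck: Bottleneck

# Each bottleneck class routes to a different hardware pool.
POOLS = {
    Bottleneck.COMPUTE: "high-throughput GPUs",
    Bottleneck.MEMORY: "high-bandwidth accelerators",
    Bottleneck.NETWORK: "nodes with fast interconnect",
}

def place(stages):
    """Map each stage to the pool suited to its bottleneck."""
    return {s.name: POOLS[s.bottleneck] for s in stages}

plan = place([
    Stage("prefill", Bottleneck.COMPUTE),
    Stage("decode", Bottleneck.MEMORY),
    Stage("tool_fanout", Bottleneck.NETWORK),
])
print(plan["prefill"])  # high-throughput GPUs
```

A real placer would weigh load, memory capacity, and interconnect topology rather than a static table, but the lookup captures the core idea: the bottleneck, not the workload as a whole, determines the hardware.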

To make this work seamlessly, we've made significant technical advances. Under the hood, the system has three major components:

  • An intelligent workload orchestrator that translates agents into compute graphs, slices the compute graphs into fragments, and dynamically distributes the fragments across available hardware.
  • A hardware-agnostic compiler that optimizes the execution of fragments and lowers them to optimized implementations for particular accelerators.
  • Autonomous kernel generation to automatically create optimized kernels for different hardware platforms.
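The orchestrator's flow can be sketched in miniature: an agent becomes a compute graph (a DAG of stages), the graph is sliced into fragments, and fragments are distributed across available hardware. Every name below is an illustrative assumption, not Gimlet's actual API:

```python
# Toy sketch of the orchestrator pipeline; all names are hypothetical.

def topo_layers(graph):
    """Slice a DAG (node -> list of dependencies) into dependency layers.

    Nodes in the same layer have no dependencies on one another, so each
    layer's fragments can be dispatched in parallel."""
    remaining = dict(graph)
    layers = []
    while remaining:
        # Nodes whose dependencies have all been scheduled already.
        ready = sorted(n for n, deps in remaining.items()
                       if not set(deps) & set(remaining))
        layers.append(ready)
        for n in ready:
            del remaining[n]
    return layers

def distribute(layers, devices):
    """Round-robin each layer's fragments over the available devices."""
    placement = {}
    for layer in layers:
        for i, node in enumerate(layer):
            placement[node] = devices[i % len(devices)]
    return placement

# A minimal agent: parse the request, then retrieve and call the model
# in parallel, then merge the results.
agent_graph = {
    "parse": [],
    "retrieve": ["parse"],
    "llm_call": ["parse"],
    "merge": ["retrieve", "llm_call"],
}
layers = topo_layers(agent_graph)
# layers == [['parse'], ['llm_call', 'retrieve'], ['merge']]
placement = distribute(layers, ["gpu:0", "gpu:1"])
```

In the real system the "devices" would be heterogeneous accelerators chosen per fragment, and the compiler would then lower each fragment to an optimized implementation for its target.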

We've been working with early customers on this system and deploying it in datacenter environments; today we support hardware from vendors such as NVIDIA, Intel, and AMD. Our technology is built for heterogeneous environments, handling diversity not just across vendors, but also across hardware generations and product lines. Distributing agentic workloads across heterogeneous compute slashes the cost per token and unlocks capacity that would otherwise sit idle.

We're bringing this technology to a broader developer audience in two forms: a hosted platform for running agentic workloads built on the same stack, and a standalone toolkit for autonomous kernel generation.

Reaching our north star of making agentic AI workloads 10X more efficient will mean more tokens for the same cost, better power efficiency, and fuller utilization of existing hardware. For developers, it will mean more capacity, better prices, more flexibility, and faster performance.

If you're building or deploying agentic workloads, join our waitlist or reach out. Thank you to our investors at Factory and our incredible angel investors. We are also hiring. Collectively, we can create the next generation of AI infrastructure and rethink how agentic AI is run.

