Gimlet Labs Blog

A blog about our lab's research on high performance AI systems.

Speeding up PyTorch inference by 87% on Apple devices with AI-generated Metal kernels
Kernel Optimization, Performance, Apple Silicon

Our lab investigated whether frontier models can write optimized GPU kernels for Apple devices to speed up inference. We found that they can: our AI-generated Metal kernels were 1.87x faster across 215 PyTorch modules.
August 26, 2025
By Taras Sereda, Natalie Serrino, Zain Asgar