NVIDIA Breaks Text Generation Speed Record with Llama 4 Maverick — 1038 Tokens Per Second
NVIDIA has set a new record for per-user token generation speed, reaching 1038 tokens per second per user (TPS/user) on Meta's Llama 4 Maverick model. According to analysts at Artificial Analysis, the result was recorded on a DGX B200 system equipped with eight Blackwell-architecture GPUs and surpasses the previous leader, SambaNova, by 31%.
Until now the record belonged to SambaNova at 792 TPS/user, but NVIDIA pulled well ahead of all competitors thanks to a series of technical optimizations. In particular, Llama 4 Maverick was served with the TensorRT library and the Eagle-3 speculative decoding technique, which predicts output tokens ahead of time. These two technologies alone delivered a 4x performance improvement over NVIDIA's previous Blackwell results.
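The article does not explain how speculative decoding works. As a rough illustration (function names and the toy digit-cycling "models" below are hypothetical, not from Eagle-3), the core idea in greedy form: a cheap draft model proposes several tokens, the expensive target model verifies them, and the longest agreeing prefix is kept, so the output is identical to what the target alone would have generated, only faster.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=16):
    """Toy greedy speculative decoding: the draft proposes k tokens,
    the target verifies them and keeps the longest agreeing prefix.
    Output matches greedy decoding with the target model alone."""
    seq = list(prompt)
    while len(seq) < len(prompt) + max_new:
        # 1) Draft model cheaply proposes k tokens ahead.
        ctx, proposal = list(seq), []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target checks every proposed position (one batched pass
        #    in a real system, which is where the speedup comes from).
        ctx = list(seq)
        for t in proposal:
            want = target(ctx)
            seq.append(want)          # the target's token is kept either way
            if want != t:
                break                 # first mismatch: discard the rest
            ctx.append(t)
        else:
            seq.append(target(ctx))   # all k accepted: one bonus token
    return seq

# Toy "models": both predict the next digit in a cycle, so every draft
# token is accepted and each loop iteration emits k + 1 tokens at once.
next_digit = lambda ctx: (ctx[-1] + 1) % 10
out = speculative_decode(next_digit, next_digit, [0], k=4, max_new=8)
```

When draft and target usually agree, most iterations emit k + 1 tokens for a single target pass, which is the source of the multi-fold speedup.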
The performance chart shows NVIDIA and SambaNova far ahead of the rest of the field. Amazon (291 TPS) and Groq (276 TPS) follow in third and fourth place, while other providers, including Google Vertex, Together.ai, Deepinfra, Novita and Azure, failed to break the 200 TPS mark. The Fireworks, Lambda Labs and Kluster.ai platforms trailed further still, at under 180 TPS/user.
It is worth noting that the TPS/user (tokens per second per user) metric measures generation speed for an individual request rather than aggregate batch throughput, which matters most for chatbots and real-time AI services. The higher the TPS/user, the faster the model streams its answer to a single user - a key factor in the day-to-day operation of such models.
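For a sense of scale, a back-of-the-envelope calculation (the ~500-token reply length is an assumed example, not a figure from the article) of how long streaming a chat answer takes at the two record paces:

```python
def reply_seconds(tokens, tps_per_user):
    """Time to stream a reply at a given per-user generation rate,
    ignoring time-to-first-token and network overhead."""
    return tokens / tps_per_user

# Assumed ~500-token chat reply at the two leading rates from the chart.
nvidia_s = reply_seconds(500, 1038)      # NVIDIA's record pace
sambanova_s = reply_seconds(500, 792)    # previous SambaNova record
```

At these speeds the full reply streams in well under a second either way; the gap matters most for long outputs and agentic workloads that chain many generations.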
Beyond raw speed, NVIDIA moved to the FP8 data format in place of BF16 without sacrificing output accuracy, and applied optimizations for the model's Mixture-of-Experts layers along with CUDA-kernel-level techniques: spatial partitioning and dynamic shuffling of GEMM weights. All of this signals that NVIDIA is extending its leadership in AI infrastructure, especially in the LLM field.
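The article does not show what switching to FP8 means in practice. As a hedged sketch, the following simulates rounding to the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, bias 7, max finite value 448) in plain Python; it illustrates the format's coarser value grid and is in no way NVIDIA's actual kernel code.

```python
def e4m3_grid():
    """All non-negative finite FP8 E4M3 values. The exp=15/mant=7
    encoding is reserved for NaN, so the largest value is 448."""
    vals = {0.0}
    for m in range(1, 8):                  # subnormals: 2^-6 * m/8
        vals.add(2.0 ** -6 * m / 8)
    for e in range(1, 16):                 # normals: 2^(e-7) * (1 + m/8)
        for m in range(8):
            if e == 15 and m == 7:
                continue                   # NaN encoding, not a value
            vals.add(2.0 ** (e - 7) * (1 + m / 8))
    return sorted(vals)

_GRID = e4m3_grid()

def quantize_e4m3(x):
    """Round x to the nearest representable E4M3 value (saturating)."""
    x = max(-448.0, min(448.0, x))
    nearest = min(_GRID, key=lambda v: abs(abs(x) - v))
    return nearest if x >= 0 else -nearest
```

With only 3 mantissa bits, relative rounding error can reach a few percent, which is why FP8 inference is normally paired with per-tensor scaling to keep values in the format's usable range.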