
AMD Unveils ROCm 7: AI Inference Acceleration Up to 3.8x and Full MI350 Support

AMD has officially announced the next generation of its open-source software stack, ROCm 7, focused on AI inference acceleration and support for the Instinct MI350 series. The new stack replaces ROCm 6 and adds support for the FP8, FP6 and FP4 formats, as well as optimizations for distributed processing and prefill.

AMD ROCm 7

ROCm 7 introduces new algorithms and kernels, including GEMM autotuning, MoE, attention and Python kernels, and adds support for the vLLM v1, llm-d and SGLang frameworks. The main performance gains were recorded in inference tasks: up to 3.5× faster than ROCm 6, with a maximum of 3.8× on DeepSeek R1, 3.2× on Llama 3.1 70B and 3.4× on Qwen2-72B.
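
For readers who want a sense of how the new low-precision inference path is used in practice, below is a minimal sketch of FP8 serving through vLLM on ROCm hardware. The model name, quantization setting and parallelism degree are illustrative assumptions, not an AMD-published configuration.

```python
# Minimal sketch: FP8 inference with vLLM on ROCm GPUs.
# Assumes vLLM and a ROCm build of PyTorch are installed; the model and
# settings below are illustrative, not an AMD-recommended configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model from the article's benchmarks
    quantization="fp8",                         # one of the low-precision formats highlighted in ROCm 7
    tensor_parallel_size=8,                     # shard the model across 8 accelerators (assumption)
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize what ROCm 7 adds for inference."], params)
print(outputs[0].outputs[0].text)
```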

ROCm 7 on the MI355X outperforms an NVIDIA Blackwell B200 running CUDA by 30% in DeepSeek R1 (FP8) throughput. In addition, training is accelerated by up to 3× on Llama 2, Llama 3.1 and Qwen 1.5. The new stack also scales across CPUs, GPUs and DPUs, providing universal solutions for enterprise AI and GenAI workloads.