Nvidia slaps Groq into new LPX racks for faster AI response
Nvidia’s integration of 256 Groq 3 LPUs with Vera Rubin racks aims to boost large language model inference throughput up to 35× on trillion-parameter models.
- At GTC on Monday, Nvidia announced that it will integrate Groq 3 LPUs into its Vera Rubin NVL72 rack system, saying, "We're in production with the Groq chip."
- To speed decoding, Nvidia pairs Groq 3 LPUs with Rubin GPUs as decode accelerators, so the systems jointly compute every layer for each output token. The design exploits SRAM's far higher bandwidth, but because each chip holds so little memory, it requires deploying many chips.
- Each Groq 3 LPU delivers 1.2 petaFLOPS and carries 500 MB of memory; Nvidia plans LPX racks of 256 LPUs with 128 GB of aggregate on-chip SRAM and 640 TB/s of bandwidth. Ian Buck noted, "The tokens per second per chip is actually quite low."
- Given steep per-chip costs, the systems are likely to be adopted first by major AI companies such as OpenAI, Anthropic, and Meta, while Nvidia wagers inference providers could charge $45 per million tokens.
- Because each LPU holds so little on-chip memory, roughly a thousand of them are needed to serve a 1-trillion-parameter model (see the back-of-envelope sketch below). Nvidia plans to ship the systems later this year, with Samsung manufacturing the LPUs.
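The capacity and chip-count figures above can be sanity-checked with simple arithmetic. Below is a minimal sketch, assuming 4-bit (0.5-byte) weights and a standard bandwidth-roofline bound on decode; neither assumption comes from Nvidia's announcement.

```python
# Back-of-envelope check of the quoted figures (illustrative
# arithmetic, not Nvidia's published methodology).

CHIPS_PER_RACK = 256    # LPX rack size quoted by Nvidia
SRAM_PER_CHIP_GB = 0.5  # 500 MB per Groq 3 LPU
RACK_BW_TBS = 640       # aggregate SRAM bandwidth per rack

# 1. Aggregate on-chip SRAM per rack: 256 x 500 MB = 128 GB,
#    matching the announcement.
rack_sram_gb = CHIPS_PER_RACK * SRAM_PER_CHIP_GB
print(f"SRAM per rack: {rack_sram_gb:.0f} GB")

# 2. Chips needed to hold a 1-trillion-parameter model entirely
#    in SRAM, assuming (our assumption) 4-bit quantized weights.
params = 1e12
bytes_per_param = 0.5  # assumed FP4/INT4
model_gb = params * bytes_per_param / 1e9
chips_needed = model_gb / SRAM_PER_CHIP_GB
print(f"Model: {model_gb:.0f} GB -> {chips_needed:.0f} LPUs "
      f"(~{chips_needed / CHIPS_PER_RACK:.1f} racks)")

# 3. Roofline ceiling on decode: each output token streams all
#    weights once, so tokens/s <= aggregate bandwidth / model bytes.
#    Idealized single-sequence bound; ignores batching, KV cache,
#    and interconnect overhead.
racks = chips_needed / CHIPS_PER_RACK
agg_bw_tbs = racks * RACK_BW_TBS
tokens_per_s = agg_bw_tbs * 1e12 / (model_gb * 1e9)
print(f"Decode ceiling: ~{tokens_per_s:.0f} tokens/s")
```

Under these assumptions a 1-trillion-parameter model occupies 500 GB, which divides out to 1,000 LPUs, consistent with the "about a thousand" figure Nvidia cites.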
15 Articles
Analysis: Is Nvidia's Groq deal the endgame for AI chip startups?
At its 2026 GTC conference, Nvidia not only unveiled its Vera CPU but also officially launched the Groq 3 LPU chip, developed through a prior technology licensing arrangement with Groq and brought into its own ecosystem. Alongside it, Nvidia introduced the Groq 3 LPX platform - a server rack composed of 128 Groq 3 LPUs that can be directly integrated with the Vera Rubin solution. The move signals that Nvidia has successfully absorbed Groq's tech…
Decoding the Future of Inference At NVIDIA: Groq LPUs Join Vera Rubin Platform For Low-Latency Inference
With its upcoming Vera Rubin rack-scale architecture, NVIDIA will integrate LPUs from its Groq acquihire, marking a major expansion beyond GPUs alone for AI inference. (ServeTheHome)
Coverage Details
Bias Distribution
- 67% of the sources are Center