Google has launched its A4X virtual machines (VMs), powered by the Nvidia GB200 NVL72.

The VMs are purpose-built for training and serving extremely large-scale AI workloads, particularly those involving reasoning models, LLMs with long context windows, and scenarios requiring massive concurrency.

Google Cloud claims to be the first and only cloud provider to offer VMs powered by both B200s and GB200s.

The Nvidia GB200 NVL72 is a system made up of 72 Nvidia Blackwell GPUs and 36 Arm-based Nvidia Grace CPUs, connected via fifth-generation Nvidia NVLink.

Nvidia GB200 NVL72 – Nvidia

Each GB200 NVL72 offers more than one exaflop of training performance, and Google says the A4X VMs deliver a 4x increase in LLM training performance compared with the A3 VMs, which are powered by Nvidia H100 GPUs.

The A4X VMs enable the deployment of models across tens of thousands of Blackwell GPUs and are currently available in preview.

In a Google blog post, George Elissaios, VP of AI infrastructure and compute, and Roy Kim, director of cloud GPUs, note that the A4X VMs form part of Google's AI Hypercomputer supercomputing architecture and that users can "deploy and manage large clusters of A4X VMs with compute, storage and networking as a single unit," which they say makes it easier to manage the complexity of distributed workloads.

The VMs also include the Titanium ML network adaptor, which is based on Nvidia ConnectX-7 network interface cards (NICs) and delivers 28.8Tbps of non-blocking GPU-to-GPU traffic, while the NVL72 racks are interconnected via Google's Jupiter data center network fabric.
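
For context, that bandwidth figure is consistent with the rack's GPU count: assuming one 400Gbps ConnectX-7 link per GPU (an assumption, not something the blog post spells out), 72 GPUs × 400Gbps works out to 28.8Tbps of aggregate GPU-to-GPU bandwidth.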

"Developers and researchers need access to the latest technology to train and deploy AI models for specific applications and industries," said Alexis Bjorlin, VP of Nvidia DGX Cloud, Nvidia.

"Our collaboration with Google provides customers with enhanced performance and scalability, enabling them to tackle the most demanding generative AI, LLM, and scientific computing workloads while benefiting from the ease of use and global reach of Google Cloud."

The blog post also notes that the VMs are cooled by "Google's third-generation liquid cooling infrastructure" - though it does not detail the cooling methodology beyond stating that it is "based on learnings over years of operational experience."

In November 2024, reports emerged that Nvidia had redesigned the 72-GPU racks after finding that its Blackwell GPUs were overheating. Prior to this, in August, the Blackwell GPU family was reported to be facing delays due to an unexpected design flaw unrelated to the overheating; later that month, the company said the flaw had been resolved.

Google first offered a sneak preview of its GB200 NVL racks in October 2024. At the start of this month, Google made Nvidia's Blackwell B200 GPUs available in preview via its A4 VMs.

In January 2025, Lambda Labs deployed two GB200 NVL72 racks - one at EdgeCloudLink's data center in Mountain View, California, and the other at a Pegatron facility. CoreWeave has also brought GB200 NVL72 instances to its cloud platform.

In October 2024, Microsoft said it was the first cloud provider to be running a Blackwell system with GB200-powered AI servers, though this was reportedly not a GB200 NVL72 configuration.

In May 2024, a number of companies announced their intention to deploy GB200 GPUs, including Oracle, Amazon, Microsoft, and Google, as well as several smaller cloud providers.