Google has unveiled its seventh-generation Tensor Processing Unit (TPU), Ironwood.

Announcing the chip at its Next ‘25 event in Las Vegas, Google said Ironwood is its first TPU designed specifically for inferencing.

Google Ironwood TPU – Google

Built to support the computational requirements of what Google dubbed “the next phase of generative AI,” Ironwood comes in two liquid-cooled configurations sized to different AI workload demands: a 256-chip configuration and a 9,216-chip configuration.

Google says each individual chip boasts a peak compute of 4,614 teraflops and, when scaled to 9,216 chips, each pod totals 42.5 exaflops – which the company claims is more than 24x the compute power of El Capitan, the world’s most powerful supercomputer, which offers 1.742 exaflops.
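
Taking the announcement’s figures at face value, the per-pod number is simply the per-chip peak multiplied out – a quick sanity check (the rounding is ours):

```python
# Sanity check of the quoted Ironwood pod figures.
per_chip_teraflops = 4_614    # peak compute per chip, as quoted
chips_per_pod = 9_216         # the largest pod configuration
el_capitan_exaflops = 1.742   # El Capitan's quoted figure

pod_exaflops = per_chip_teraflops * chips_per_pod / 1_000_000  # tera -> exa
print(f"Pod total: {pod_exaflops:.1f} exaflops")                    # ~42.5
print(f"vs El Capitan: {pod_exaflops / el_capitan_exaflops:.1f}x")  # ~24.4
```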

However, it should be noted that El Capitan’s performance is measured at FP64 precision, whereas Google has used FP8 precision benchmarking – 8-bit calculations – to arrive at Ironwood’s compute figures.

While AI performance is commonly measured at FP8, traditional compute performance is measured in double-precision calculations, also known as FP64 – the industry standard for ranking large systems.

As a result, the two figures are not directly comparable: a system boasting 100 petaflops of FP64 performance is considerably more powerful than a system that has achieved 100 petaflops of FP8 performance, since each 64-bit operation handles eight times as many bits as an 8-bit one.
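
To illustrate the precision gap, here is a minimal sketch using NumPy and the ml_dtypes package, which provides FP8 formats for accelerator software such as JAX (the choice of the e4m3 FP8 variant here is our assumption, not something Google has specified for Ironwood):

```python
import numpy as np
import ml_dtypes  # pip install ml-dtypes; registers FP8 dtypes with NumPy

pi64 = np.float64(np.pi)  # FP64: ~15-16 significant decimal digits
pi8 = np.array(np.pi, dtype=ml_dtypes.float8_e4m3fn)  # FP8 (e4m3): 3 mantissa bits

print(pi64)        # 3.141592653589793
print(float(pi8))  # 3.25 -- the nearest value this FP8 format can represent
```

An FP8 flop count and an FP64 flop count therefore describe very different amounts of arithmetic work.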

Ironwood also offers 192GB of high-bandwidth memory (HBM) per chip – a 6x increase on Google’s sixth-generation Trillium TPU – with 7.4Tbps of HBM bandwidth, in addition to 1.2Tbps of bidirectional Inter-Chip Interconnect (ICI) bandwidth, a 1.5x improvement over Trillium.
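
Working backwards from the stated multipliers gives the implied Trillium baselines – a back-of-the-envelope check using only the figures in this article:

```python
# Implied Trillium (sixth-gen) figures, derived from the Ironwood multipliers above.
ironwood_hbm_gb = 192    # HBM capacity per chip
ironwood_ici_tbps = 1.2  # bidirectional ICI bandwidth

print(f"Trillium HBM: {ironwood_hbm_gb / 6:.0f}GB per chip")  # 32GB
print(f"Trillium ICI: {ironwood_ici_tbps / 1.5:.1f}Tbps")     # 0.8Tbps
```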

Speaking at a press briefing ahead of the launch, Amin Vahdat, VP and general manager of ML, systems, and Cloud AI, said that Ironwood is nearly 30x more power-efficient than the first cloud TPU from 2018 and almost twice as efficient as Trillium.

“We're excited to introduce our seventh generation TPU, Ironwood,” Vahdat said. “It's built from the ground up for inferencing at scale, ushering our customers into the age of inference, where it's no longer about the data bits of the model, but what the model can do with data after it's been trained.”

Furthermore, with the launch of Ironwood, Google says it is also pushing its network to “new heights”: its new 400G Cloud Interconnect and Cross-Cloud Interconnect offer up to 4x more bandwidth than the company’s existing 100G equivalents.

Additionally, Google said its Hyperdisk Exapool network block storage offering allows users to provision up to an exabyte of block storage capacity – the highest performance and capacity per AI cluster of any hyperscaler, the company claims – while a new Cloud Storage zonal bucket provides up to 20x faster random-read data loading than a Cloud Storage regional bucket, enabling users to colocate their primary storage with TPUs or GPUs.

“More than a decade ago, Google began investing in TPU development for our own services to push forward the frontier of what is possible in scale and efficiency,” said Vahdat. “As an industry, over the last eight years, we've seen a 10x year over year increase in demand for training and serving models, that's a factor of 100 million.”
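
Vahdat’s closing arithmetic holds up: 10x growth compounded annually over eight years is 10^8, i.e. a factor of 100 million.

```python
# 10x year-over-year growth compounded over eight years.
print(f"{10 ** 8:,}")  # 100,000,000
```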
