The AI boom has upended the data center sector, forcing a rapid shift to liquid cooling as facilities pivot from sub-10kW racks to 120kW ones.
That dramatic change alone has caused profound disruption in a sector that has historically moved at a cautious pace, but this week at its GTC developer conference, Nvidia laid out a pathway to an even more ambitious goal: 600kW racks by the end of 2027.
"The reason why we communicated to the world what Nvidia's next three or four-year roadmap is that everybody else can plan," Huang tells DCD.
"We're the first technology company in history that announced four generations of something. That's like somebody saying 'I'm going to announce my next four phones,' it makes no sense. But we are an infrastructure company, we are a factory for the world, and we're foundational for so many companies."
GTC: The Nvidia roadmap revealed
The first new GPU generation is the Blackwell Ultra, which will be released later this year. While it will consume more power on a per-chip basis than Blackwell, the DGX GB300 NVL72 is not expected to draw any more power than its GB200 predecessor, DGX head Charlie Boyle explains.
"Even though GB300 got faster, we optimized the chip, and we did some interesting things on cooling," he says.
To the naked eye, the Nvidia GB300 NVL72 doesn't look too different from the GB200. "You can only tell the difference when you go behind the rack," Boyle says.
"One of the racks is actually a busbar. And so, for the first time, we have a data center level power shell that can power the entire rack, you just slide the DGXs in there."
Alongside this, the company has ramped up the number and size of capacitors to smooth power flows. "We want the power subsystem in the server to absorb the shock. We don't want to send that back to the data center. When you send it back to the data center, all sorts of things start happening," Boyle adds.
Traditionally, data center operators have had to build in buffer power for potential peaks. "Depending on how conservative or up to the edge you are, a lot of customers are building 1.3 or 1.5x of the rack power," Boyle explains.
"That's wasted power, stranded power. But as power is the most important thing in the data center, I want to use every watt possible. And so with the new power systems in our B300 designs, and then continuing in our GB designs, you don't have to overprovision data center power in order to run these things at maximum capacity, even when you hit those peaks."
This should mean that GB300 racks will be as easy to deploy as their GB200 predecessors, although many data center operators are still struggling to meet these systems' density requirements at scale.
And things only continue to ramp up from there. In the second half of 2026, Nvidia is promising to deliver the Vera Rubin NVL144, with a new Arm chip and a new GPU. The company has yet to disclose how much power it expects that rack to consume, but it will likely be higher, with Boyle stating that “there'll be lots of steps in between” the 120kW scale and 600kW. “We've got to meet our customers where they are today.”
It's worth noting that, with this first Vera Rubin generation, Nvidia has changed its nomenclature for these racks.
In the DGX GB300 NVL72, the number denoted the 72 GPUs in the rack. From Rubin onwards, it will instead count reticle-sized GPUs. Blackwell and Rubin are each made up of two reticle-sized GPUs, which is why the number has doubled to 144 - but the number of GPU packages per rack is the same across both generations.
That then brings us to the Rubin Ultra NVL576 in the second half of 2027. Each Rubin Ultra has four reticle-sized GPUs, meaning a total of 144 of the larger footprint GPUs. Similarly, the number of the new Arm CPUs is expected to increase - although Nvidia has yet to confirm by how much.
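The arithmetic behind the renaming, using only the figures above, can be sketched in a few lines (the code is just illustrative bookkeeping):

```cuda
// Back-of-the-envelope check of the naming change, using only figures from
// the article. Under the old scheme the NVL number counted GPU packages;
// from Rubin onwards it counts reticle-sized dies, so dividing by the number
// of dies per package recovers the physical package count.
#include <cstdio>

int main() {
    // GB300 NVL72: 72 packages, each Blackwell built from 2 reticle-sized dies
    printf("GB300 NVL72:        %d packages x %d dies = %d reticle-sized GPUs\n",
           72, 2, 72 * 2);
    // Vera Rubin NVL144: the number now counts dies; 2 dies per Rubin package
    printf("Vera Rubin NVL144:  %d dies / %d per package = %d packages\n",
           144, 2, 144 / 2);
    // Rubin Ultra NVL576: 4 reticle-sized dies per package
    printf("Rubin Ultra NVL576: %d dies / %d per package = %d packages\n",
           576, 4, 576 / 4);
    return 0;
}
```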
Introducing Kyber
All of this brings the 'Kyber' rack to 600kW, beyond the limits of the vast majority of data centers today, especially if the racks are deployed at scale.
While much of the system may still change in the years to come, some facts are known. "GB200 & 300 are like 90 plus percent liquid-cooled," Boyle says. "But there's still some fans in it, there are components that aren't cold plate. The Kyber rack design is 100 percent liquid cooling, no fans."
Kyber also uses compute blades - smaller, vertically mounted servers - to pack more compute and networking into the rack.
Crucially, Kyber also includes a rack-sized sidecar to handle power and cooling. Therefore, while it is a 600kW rack, it requires two racks' worth of physical footprint, at least in the current version shown by Nvidia.
Blackwell racks also need supporting power and cooling infrastructure housed in separate racks, but one such system can serve multiple GB200 racks - so each 120kW rack takes up only a little more than one rack's worth of physical footprint.
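As a rough illustration of what that means for floor space, the sketch below assumes 1.2 rack footprints for a GB200 NVL72 - standing in for "one and a bit" - and two footprints for Kyber; the footprint counts are assumptions, while the power figures come from the announcements above.

```cuda
// Rough floor-space comparison implied by the article. The 1.2-footprint
// figure for GB200 is an assumption standing in for "one and a bit racks";
// the Kyber numbers (600 kW plus a full sidecar rack) are from the article.
#include <cstdio>

int main() {
    const double gb200_kw = 120.0, gb200_footprints = 1.2;   // assumed "one and a bit"
    const double kyber_kw = 600.0, kyber_footprints = 2.0;   // compute rack + sidecar

    printf("GB200 NVL72: ~%.0f kW per rack footprint\n", gb200_kw / gb200_footprints);
    printf("Kyber:       ~%.0f kW per rack footprint\n", kyber_kw / kyber_footprints);
    // Even after paying for the sidecar, Kyber packs roughly 3x more compute
    // power into the same floor area as today's GB200 deployments.
    return 0;
}
```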
Of course, the rapid leap in density in just a few years raises the question of just how far it can go. Over the past 18 months, the company has been talking to its supply chain about building 1MW racks. At GTC, rumors swirled of plans to double the density of the NVL576.
Getting beyond the MW rack level, should it happen, will not be easy. In issue 56 of DCD Magazine (out next week), Vertiv CEO Giordano Albertazzi explains that another leap in density will necessitate "a further revolution in the liquid cooling, and a paradigm change on the power side.
“Higher voltages, different types of power infrastructure, all things that are still dynamic," Albertazzi says.
He continues: "It's undeniable that the density will continue to increase. Will we get to 1MW exactly? I don't know, but the density will definitely go higher and higher because the compute will be so much more efficient when that happens.
“It will have a slightly bigger footprint, but it will not be huge. It won’t have the footprint of 10 racks. Certainly, it will be much more robust, simply because the weight will be totally different scales. But the concept would not be alien to what we think of today.”
Data center scale challenges
Back at GTC, Nvidia took more than a hundred executives from the data center sector to its campus to discuss the challenges of deploying ever-denser racks.
"I wanted to make sure I talk to DCD, because we've become an infrastructure company," Nvidia’s data center boss Ian Buck said. "We're not talking chips anymore. We're talking at data center scale."
He hopes that increased transparency about the GPU giant's plans will help stave off the worst. "The NVL72 kind of showed up and it's been a scramble to get the world's data centers to be able to support 120kW racks and move the world to liquid cooling this fast at this scale."
Going public allows potential suppliers "to have the courage to invest in building stuff that they would normally only do in a bespoke, one-off supercomputer as an R&D project," Buck explains.
"It's getting a little easier now that we've gotten NVL72 [out] to see the future, but it's very important for them to understand our road map so they know that build the next generation. Even the flow rate around a bend pipe matters. There's abrasion, we need to design the right kind of pipe in a way that we can scale it to millions of GPUs, so that we are building the right thing."
Managing to build something that can exist outside of a research lab is key, Buck says. "It's hard to know where the limit is. I came from a world of supercomputing: there, a megawatt cabinet is not unheard of. It's quite conceivable; those people don't flinch. The technology is there. The hard part is, can you do it on a mass scale?
“Can you do it at that scale of not building one spaceship, one Formula One car, but building all the world's cars like that? That is the question. Can we get a supply chain [to help us do this]?”
Of course, at some point, be it due to cooling or power constraints, the data center sector may be unable to support denser racks.
Jensen Huang doesn't see a limit coming soon. He dodged DCD's question about where the near-term cap on rack densities was, instead suggesting that there was no real end in sight.
"Well a data center is now 250MW; that's kind of the limit per rack," Huang told DCD. "I think the rest of it is just details. And so if you said that data centers are a gigawatt, then I would say a gigawatt rack sounds correct.”
He continued: "Of course, the rest is engineering and practicality. It isn't necessary to put everything in one rack.”
Core to Nvidia's pursuit of ever-greater density is the desire to get as many GPUs as possible on a single fabric - in this case, the company's proprietary NVLink interconnect.
“The overhead of the networking, the communications protocol, is extremely low level,” Huang said. “This means that all of the GPUs can work together as one. In fact, they talk to each other like they're addressing each other's memories. All of our GPUs that are connected over NVLink are essentially one giant chip.”
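What "addressing each other's memories" looks like at the software level can be sketched with standard CUDA peer-access calls, assuming two NVLink-connected GPUs enumerated as devices 0 and 1; this is generic CUDA, not a description of Nvidia's rack-scale software.

```cuda
// Minimal sketch: once peer access is enabled, a kernel running on GPU 0 can
// dereference a pointer that lives in GPU 1's memory directly, with the load
// travelling over NVLink. Standard CUDA runtime APIs; two GPUs assumed.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch_remote(const int* remote, int* local) {
    // Reads GPU 1's memory as if it were local to GPU 0
    *local = *remote + 1;
}

int main() {
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (!can_access) { printf("No peer access between GPU 0 and GPU 1\n"); return 0; }

    int *buf_gpu1, *buf_gpu0;
    cudaSetDevice(1);
    cudaMalloc(&buf_gpu1, sizeof(int));
    int value = 41;
    cudaMemcpy(buf_gpu1, &value, sizeof(int), cudaMemcpyHostToDevice);

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);            // map GPU 1's memory into GPU 0's address space
    cudaMalloc(&buf_gpu0, sizeof(int));
    touch_remote<<<1, 1>>>(buf_gpu1, buf_gpu0);  // GPU 0 loads directly from GPU 1

    int result = 0;
    cudaMemcpy(&result, buf_gpu0, sizeof(int), cudaMemcpyDeviceToHost);
    printf("GPU 0 read %d from GPU 1's memory and wrote %d\n", value, result);
    return 0;
}
```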
The problem is that this is all handled by copper, limiting how far the GPUs can be from each other.
"Nvidia will absolutely try to build the best possible building block," Buck said. "And that building block right now is how many GPUs you can connect in a single NVLink domain in copper."
He added: "You can only go a meter and a bit before the electrons just want to schmoo out. That's what's driving the density. And I think that's going to continue till it's exhausted, and then we'll go and figure out the next optical 'something-something,' the next scaling fabric to reduce the cost."
The company tried to go a different route in the past, announcing the DGX One Ranger, which "was the size of the entire keynote stage at Computex," Buck recalls. Instead of copper, it went optical, "but the problem was that, to provide that much NVLink bandwidth disaggregated like that, there was so much optical fiber and so many transceivers.
"Half your power was spent just on optics, just on all the connectivity versus the compute. It wasn't a good, efficient design, and it never made it to production."
Copper "doesn't burn any power, so the mission became, 'let's put as many GPUs as we can in an NVL.' So we'll keep doing that."
Photonic switching
The company has changed a lot since the days of Ranger, however. This week, it announced that it was launching silicon photonics networking switches - that is, for connecting racks to each other, not for connectivity within the rack.
That move alone is expected to significantly decrease networking power demands, crucially increasing the percentage of facility power that can go to deploying more GPUs.
But it does suggest at least a pathway to bringing optical further into the rack. "As long as you can use copper, you're going to use copper," Nvidia's SVP of networking, Gilad Shainer, said.
"But, at some point, there will not be any copper. This one is only 200 gig SerDes per lane. In three years down the road, when we transition to 400 gig per lane, that one to two meter [limit of] copper will go to nothing. That means that everything will become optical - but we build systems based on what you can do, what exists now.”
For now, "NVLink is copper [within] NVLink domains and scale out is fiber. In the future, you can assume that everything will become optical, and then there will be more innovations around co-packaged optics and how you bring optics directly to the switch and not go through transceivers."
Whatever the solution, the core focus is to fit as many GPUs within that single fabric as possible.
"We are going to scale up as far as we can," Huang said, repeatedly referencing the idea of thousands of GPUs in a single fabric, before the company may need to shift focus to 'scale out' across multiple racks.
Here, he is likely referring to reticle-sized GPUs, rather than distinct GPU packages - again following the naming convention change that sees the NVL576 contain 144 GPU packages, each made up of four reticle-sized GPUs.
"Is it necessary to put everything in one rack if there's no reason to scale up more than a few 1,000 processors because the mathematics of scale up and scale out is such that there are diminishing returns beyond 1,000 or 5,000 GPUs?
"After that, we could scale out with many racks, right? But if the scaling up is very effective, like it between 72 and 144, and 288 and 576, then we ought to try to scale up as much as we can. And so that's kind of where we are.
"Someday, I bet we'll find diminishing returns, in which case, we'll scale up to 4,000 - pick a number - X number of thousand GPUs. After that, we can scale it out - and then you don't have to rack it up as densely."