The University of Edinburgh has announced the installation of a Cerebras CS-3 system cluster at its EPCC supercomputing center.
Consisting of four CS-3 Wafer Scale Engine processors, the cluster will be used by the university to train models ranging from 240 billion to 1 trillion parameters, while enabling EPCC to run models at more than 2,000 tokens per second – up to 70x faster than “leading GPU solutions,” according to Cerebras.
First unveiled in March 2024, the Wafer Scale Engine 3 consists of four trillion transistors and 900,000 'AI cores,' alongside 44GB of on-chip SRAM. Sold as part of the CS-3 system, the chip is claimed by the company to be capable of 125 peak AI petaflops.
This is the third time the EPCC has deployed Cerebras chips, with the new CS-3 cluster representing a doubling of the compute capacity provided by the two CS-2 systems the university had previously installed.
First opened in 1990 and running national supercomputing services since 1994, EPCC is one of the largest supercomputer centers in Europe, focused on providing services and R&D across traditional HPC, data science, and AI workloads.
The partnership with Cerebras was born out of EPCC director Professor Mark Parsons’ ambition to invest in more interesting technologies and be at the “leading edge” of the industry. Announced in February 2021, the initial deployment saw EPCC become Cerebras’ first international customer, launching a CS-1 cluster alongside an HPE Superdome Flex Server system for front-end storage and pre-processing.
Speaking to DCD ahead of today’s announcement, Dr. Andy Hock, VP of product at Cerebras, said that because the new CS-3 cluster supports both training and inference, it will “radically expand the resources available for EPCC and its associated members.”
“There's been a lot of commercial attention on our systems for inference, and I think that's going to be a centerpiece of how EPCC users use and experiment with this new cluster,” Hock said. “But, more than that, I think EPCC is an important partner for us because they represent the intersection of not just model training, but also inference and HPC, and they're working not for commercial purpose, but to advance fundamental AI development.”
While EPCC still needs to deploy compute for training, the models built by its researchers are now sufficiently capable that inference runs are increasingly where the value is being delivered.
And while EPCC also has GPU clusters that sit alongside its Cerebras deployment, Hock claims that, unlike traditional GPUs that run into memory bottlenecks, the chip architecture offered by Cerebras provides “orders of magnitude more memory bandwidth, so our chips can deliver a fundamentally different level of inference performance… typically between 20x and 70x faster than a contemporary state-of-the-art GPU.”
As a demonstration of this, earlier this month Cerebras announced that, in partnership with Canadian silicon photonics firm Ranovus, it had been awarded a $45 million contract from the United States’ Defense Advanced Research Projects Agency (DARPA) to address compute bottlenecks.
Echoing Hock’s comments, Parsons said that while the UK government has been widely pushing the AI training narrative, very little attention has been given to inference – noting that in the government’s AI Opportunities Action Plan published in January 2025, inference is only mentioned once.
“All the value from AI comes from asking things questions or presenting it with an image to analyze. And one of the reasons why we wanted to get these CS-3s is because we know we need to understand inference and its value better.”
Beyond the value of its inferencing capabilities, Parsons said that, given EPCC’s £9.8 million ($12.5m) annual electricity bill, it is also incredibly important that Cerebras’ CS-3s are more than 2x more energy efficient per flop than GPUs for inferencing.
“With inferencing, [the chips] respond so much more quickly; they can [therefore] respond to many more questions in the time available, which, in itself, saves energy. So there are huge benefits to doing inference in these large devices, which, to be honest, I hadn't really considered until very recently,” he said.
“It would be wonderful if I could raise the money to buy a 64 CS-3 system, and we shall see where we get with those arguments. But... we would very much like to host a system like that.”