Cerebras unveiled its new AI supercomputer Andromeda at SC22. With 13.5 million cores across 16 Cerebras CS-2 systems, Andromeda boasts an exaflop of AI compute and 120 petaflops of dense compute. Its computing workhorse is Cerebras’ wafer-scale, manycore processor, WSE-2.
Each WSE-2 wafer has three physical planes, which handle arithmetic, memory, and communications. By itself, the memory plane’s 40 GB of onboard SRAM can hold an entire BERTWIDE. But the arithmetic plane also has some 850,000 independent cores and 3.4 million FPUs. Those cores share a collective 20 PB/s or so of internal bandwidth across the communication plane’s Cartesian mesh.
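To put those wafer-level totals in per-core terms, here is a quick back-of-the-envelope sketch. It uses only the rounded figures quoted above and assumes decimal units and an even split across cores. Neither assumption comes from Cerebras, so treat the results as rough averages rather than official specifications.

```python
# Rounded figures quoted in the article for a single WSE-2 wafer.
SRAM_BYTES = 40e9               # 40 GB of onboard SRAM (decimal gigabytes assumed)
CORES = 850_000                 # "some 850,000" cores, so already a rounded count
FPUS = 3_400_000                # 3.4 million FPUs
BANDWIDTH_BYTES_PER_S = 20e15   # "20 PB/s or so" of internal bandwidth


def per_core(total: float, cores: int = CORES) -> float:
    """Average share of a wafer-wide total, assuming an even split across cores."""
    return total / cores


sram_kb_per_core = per_core(SRAM_BYTES) / 1e3
fpus_per_core = per_core(FPUS)
bandwidth_gb_s_per_core = per_core(BANDWIDTH_BYTES_PER_S) / 1e9

print(f"SRAM per core:      ~{sram_kb_per_core:.0f} KB")
print(f"FPUs per core:      {fpus_per_core:.0f}")
print(f"Bandwidth per core: ~{bandwidth_gb_s_per_core:.1f} GB/s")
```

By this rough math, each core gets a few tens of kilobytes of local memory and four FPUs, which hints at why the design leans on a fast mesh between cores.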
Cerebras is emphasizing what it’s calling “near-perfect linear scaling,” which means that for a given job, two CS-2s will do that job twice as fast as one, three will take a third of the time, and so on. How? Andromeda’s CS-2 systems rely on parallelization, Cerebras said, from the cores on each wafer to the SwarmX fabric coordinating them all. But the supercomputer’s talents extend beyond its already impressive 16 nodes. Using the same data parallelization, researchers can yoke together up to 192 CS-2 systems for a single job.
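The scaling claim is easy to express as arithmetic. The sketch below is only an illustration: the 48-hour baseline and the 3.2-hour "measured" run are made-up numbers, and the function names are ours, not part of any Cerebras tooling.

```python
def ideal_runtime(single_system_hours: float, systems: int) -> float:
    """Runtime under perfectly linear scaling: N systems take 1/N of the time."""
    if systems < 1:
        raise ValueError("need at least one system")
    return single_system_hours / systems


def scaling_efficiency(single_system_hours: float,
                       measured_hours: float,
                       systems: int) -> float:
    """Fraction of the ideal speedup actually achieved (1.0 means perfectly linear)."""
    return single_system_hours / (systems * measured_hours)


BASELINE_HOURS = 48.0  # hypothetical job length on one CS-2

for n in (1, 2, 3, 16, 192):
    print(f"{n:>3} systems -> {ideal_runtime(BASELINE_HOURS, n):.2f} h (ideal)")

# Hypothetical measurement: the same job takes 3.2 h on 16 systems.
efficiency = scaling_efficiency(BASELINE_HOURS, 3.2, 16)
print(f"efficiency at 16 systems: {efficiency:.1%}")
```

"Near-perfect" scaling would mean the efficiency figure stays close to 100 percent as the system count grows, instead of sagging the way it typically does on loosely coupled clusters.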
Andromeda Scales Up With Epyc Wins
Andromeda gets its data from a bank of 64-core AMD EPYC 3 processors. These processors, AMD said via email, work in tandem with the CS-2 wafers, doing “a wide range of data pre- and post-processing.”
“AMD EPYC is the best choice for this type of cluster,” Cerebras founder and CEO Andrew Feldman told us, “because it offers unparalleled core density, memory capacity and IO. This made it the obvious choice to feed data to the Andromeda supercomputer.”
Alongside its sixteen second-gen wafer-scale engines, Andromeda runs on 18,164 Epyc 3 cores. However, that throughput comes at a price. All told, the system consumes some 500 kilowatts when it’s running at its peak.
Go Big or Go Home
Andromeda isn’t the fastest supercomputer on earth. Frontier, a supercomputer at the Oak Ridge National Lab capable of doing nuclear weapons simulations, passed the exaflop mark earlier this year. Frontier also runs at higher precision, 64-bit to Andromeda’s 16-bit half precision. But not every operation needs nuclear-weapons-grade precision. Andromeda isn’t trying to be Frontier.
“They’re a bigger machine. We’re not beating them. They cost $600 million to build. This is less than $35 million,” said Feldman.
Nor is Andromeda trying to usurp Polaris, a cluster of more than two thousand Nvidia A100 GPUs at Argonne National Lab. Indeed, like Andromeda, Polaris itself uses AMD EPYC cores to do pre- and post-processing. Instead, each supercomputer excels at a slightly different type of work.
Broadly speaking, CPUs are generalists while ASICs (including GPUs) and FPGAs are more specialized. That’s why crypto miners love GPUs. The blockchain involves a whole lot of repetitive math. But Andromeda is more specialized still. It excels at handling large sparse matrices: multi-dimensional arrays of tensor data in which most of the entries are zeroes.
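For readers who haven't met sparse matrices before, here is a minimal sketch of the general idea in plain Python. It stores only the nonzero entries and skips the zeroes during a matrix-vector multiply. This is a generic textbook illustration with hypothetical function names, not a description of how the WSE-2 handles sparsity in hardware.

```python
def to_sparse(dense):
    """Keep only nonzero entries, keyed by (row, column)."""
    return {
        (r, c): value
        for r, row in enumerate(dense)
        for c, value in enumerate(row)
        if value != 0
    }


def sparse_matvec(sparse, vector, n_rows):
    """Multiply a sparse matrix by a vector, touching only the nonzero entries."""
    result = [0.0] * n_rows
    for (r, c), value in sparse.items():
        result[r] += value * vector[c]
    return result


dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 2, 0, 0],
]
vector = [1, 2, 3, 4]

sparse = to_sparse(dense)
product = sparse_matvec(sparse, vector, len(dense))

print(f"stored {len(sparse)} of {len(dense) * len(dense[0])} entries")
print(product)
```

The multiply does three multiplications instead of sixteen. The payoff grows with the matrix: the more zeroes there are, the more work a sparsity-aware machine can skip.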
AI is profoundly data-intensive, both in the pipeline and the actual AI compute. So, Feldman said, Andromeda uses Epyc processors to streamline the process. “The AMD Epyc-based machines sit in servers outside the Cerebras CS-2s,” said Feldman, to coordinate and prepare the data. Then, Andromeda’s SwarmX and MemoryX fabrics take over.
A GPU cluster has to coordinate between each core, card, and server rack. This incurs an unavoidable delay. There’s also an exponential memory overhead as networks get larger and more complex. In contrast, WSE-2 handles much of its information pipeline within the same piece of hardware. At the same time, Cerebras’ manycore wafer-scale processors can do more on a single (gigantic) piece of silicon than a consumer CPU or GPU. This allows Andromeda to handle profoundly parallel tasks.
Large Language Models
In the same way that a Formula One racecar is wasted on surface streets, Andromeda finds its stride at scale. Nowhere is this more evident than its runaway success with large language models (LLMs).
Imagine an Excel spreadsheet with a row and column for each and every word in the whole English language. Natural language processing models use matrices, special grids not unlike a spreadsheet, to track the relationships between words. These models can have billions, even tens of billions, of parameters. Their sequences can be 50,000 tokens long. You’d think that as the training set grew, that exponential overhead would strike again. But LLMs often work using the sparse tensors Andromeda loves.
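The spreadsheet analogy can be made concrete with a toy example. The sketch below counts which words appear next to each other in two made-up sentences, then checks how much of the word-by-word grid stays empty. It is a deliberately simplified stand-in: real LLMs learn their parameters during training rather than counting neighbors, and the sentences and function name here are our own.

```python
from collections import Counter


def cooccurrence(sentences, window=1):
    """Count how often two words appear within `window` positions of each other."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(i + 1, min(i + 1 + window, len(words))):
                # Record the pair in both directions so the grid is symmetric.
                counts[(word, words[j])] += 1
                counts[(words[j], word)] += 1
    return counts


sentences = [
    "the wafer feeds the cores",
    "the cores share memory",
]

counts = cooccurrence(sentences)
vocab = sorted({w for s in sentences for w in s.lower().split()})

total_cells = len(vocab) ** 2   # one row and one column per word
filled_cells = len(counts)      # cells with at least one co-occurrence
zero_fraction = 1 - filled_cells / total_cells

print(f"{len(vocab)} words -> {total_cells} cells, {filled_cells} filled")
print(f"{zero_fraction:.0%} of the grid is empty")
```

Even with six words, two-thirds of the grid is empty. Most words never sit next to most other words, so the empty share tends to grow as the vocabulary does.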
Andromeda customers including AstraZeneca and GlaxoSmithKline report success using LLMs on Andromeda to research “omics” including the COVID genome and epigenome. During one experiment at the National Energy Technology Lab, scientists describe doing “GPU impossible” work with Andromeda that Polaris simply couldn’t complete. And it might not crunch numbers for nuclear bombs, but Andromeda is also at work on fusion research.
“Pairing the AI power of the CS-2 with the precision simulation of Lassen creates a CogSim computer that kicks open new doors for inertial confinement fusion (ICF) experiments at the National Ignition Facility,” said Brian Spears of the Lawrence Livermore National Lab.
Andromeda Meets Academia
Andromeda currently lives at Colovore, an HPC data center in Santa Clara. But Cerebras has also allocated time for academics and grad students to use Andromeda for free.
And there’s another thing grad students, in machine learning and elsewhere, may wish to note: Andromeda plays well with Python. In machine learning, that’s table stakes, but we mean really well. You can send an AI job to Andromeda, Cerebras says, “quickly and painlessly from a Jupyter notebook, and users can switch from one model to another with a few keystrokes.”
“It is extraordinary that Cerebras provided graduate students with free access to a cluster this big,” said Mateo Espinosa, a doctoral candidate at the University of Cambridge in the United Kingdom. Espinosa, who formerly worked at Cerebras, is working with Andromeda for his thesis on explainable artificial intelligence. “Andromeda delivers 13.5 million AI cores and near-perfect linear scaling across the largest language models, without the pain of distributed compute and parallel programming. This is every ML graduate student’s dream.”
Machine learning has to swim upstream in an ever-growing river of data. To a point, we can just throw more computing hardware at the task. But within and between networks, latency starts to stack up fast. To get the same amount done in a given time, you have to throw more energy at the problem. The sheer volume of data makes throughput its own bottleneck. That “triple point” is where Cerebras seeks to make its mark.
All images of Andromeda courtesy of Cerebras.