The most powerful chip for AI is from Cerebras: 900 thousand cores, 125 PetaFLOPS

Cerebras, a California-based company specializing in the design and manufacture of supercomputers for artificial intelligence applications, has announced the launch of its new generation of AI chips. The company promises double the performance of the previous generation at the same power consumption.

Dubbed WSE-3 (Wafer Scale Engine 3), the chip is a square measuring 21.5 centimeters on each side and uses almost an entire 300-millimeter silicon wafer to produce a single unit. Cerebras thus maintains its lead in producing the largest chip ever made.

As for transistors, WSE-3 contains 4 trillion of them, an increase of over 50% compared to the previous generation, made possible by more advanced production technologies. A single chip also packs 900,000 cores and delivers up to 125 PetaFLOPS of compute.

What changes with WSE-3, Cerebras’ most powerful chip for artificial intelligence

First of all, Cerebras seems to be keeping pace with Moore's law: the company's first chip debuted in 2019 and was built on TSMC's 16 nm manufacturing process. WSE-2 arrived in 2021 on TSMC's 7 nm process, while WSE-3 pushes miniaturization even further, down to 5 nm.

The number of transistors has more than tripled since the first Cerebras-branded megachip. There have also been clear improvements in on-chip memory and bandwidth: WSE-3 carries 44 GB of on-chip memory, offers a memory bandwidth of 21 Petabyte/s, and supports up to 214 Petabit/s of interconnect bandwidth. The leap in floating point operations per second (PetaFLOPS), however, outpaces all the other figures.

Cerebras WSE-3 compared with NVIDIA H100 GPU

WSE-3 is already being installed in a data center in Dallas (Texas, USA), where it will power a supercomputer capable of 8 ExaFLOPS. The "ExaFLOP generation" only entered the conversation in June 2023, and Cerebras is already unveiling a chip capable of handling something like 8 × 10^18 floating point operations per second, i.e. an 8 followed by 18 zeros.

In practice, WSE-3 can be used to train generative models with up to 24 trillion parameters, an impressive figure when compared with today's largest LLMs (Large Language Models), which reach up to about 1,500 billion parameters.

CS-3 is the supercomputer built with the new AI chip

Cerebras spokespeople explain that the supercomputer being launched in Dallas will use 64 of the new chips, combined into a single CS-3 system. And there is headroom: up to 2,048 WSE-3 chips can be coupled together to significantly extend the computational capabilities of each system.
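The figures quoted in the article are internally consistent, as a quick back-of-the-envelope check shows (the per-chip and chip-count values below are taken directly from the article):

```python
# Sanity check: aggregate compute of the 64-chip Dallas installation.
# Figures from the article: 125 PetaFLOPS per WSE-3 chip, 64 chips.
PFLOPS_PER_WSE3 = 125   # PetaFLOPS delivered by one WSE-3
CHIPS = 64              # chips in the Dallas system

total_pflops = PFLOPS_PER_WSE3 * CHIPS
total_eflops = total_pflops / 1000   # 1 ExaFLOPS = 1,000 PetaFLOPS

print(total_eflops)  # 8.0 -> matches the 8 ExaFLOPS quoted above
```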

Theoretically, a single WSE-3 chip would be equivalent, for AI-related workloads, to 62 NVIDIA H100 GPUs.

Training a widely used LLM such as Llama 70B from scratch would thus take just one day. The "70B" refers to the number of parameters in the model, i.e. the "weights" assigned to connections within the artificial neural network during training. They are essential to the model's operation, as they determine the network's ability to learn and generate accurate output.
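To make the notion of "parameters" concrete, here is a minimal sketch that counts the weights and biases of a toy dense feed-forward network. The layer sizes are made up for illustration and have nothing to do with Llama's actual architecture:

```python
# Each dense layer contributes (inputs x outputs) weights plus one
# bias per output neuron. Layer sizes here are purely illustrative.
layer_sizes = [512, 1024, 1024, 512]

total_params = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total_params += n_in * n_out + n_out  # weight matrix + bias vector

print(total_params)  # ~2.1 million parameters for this tiny network
```

A "70B" model applies the same kind of counting to far larger (and more varied) layers, ending up with roughly 70 billion such values.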

The agreement with Qualcomm

Although Cerebras computers are designed to optimize and speed up the training phase, Cerebras CEO Andrew Feldman claims that the real bottleneck lies in inference, i.e. the moment in which the generative model is actually executed.

Cerebras Qualcomm AI inference chip

According to company estimates, if every person on the planet used ChatGPT, running it would cost on the order of $1 trillion a year, not counting the immense amount of energy required. Operating costs are, of course, proportional to the size of the model and the number of users.

Cerebras and Qualcomm have entered into a collaboration aimed at reducing the cost of inference by a factor of 10. The approach will involve applying "ad hoc" techniques to compress the weights and remove unnecessary connections.

The networks trained by Cerebras will then be served, for example, by the Qualcomm AI 100 Ultra chip, which specializes precisely in inference workloads.

The images published in the article are taken from the official Cerebras website.
