NVIDIA Blackwell and B200 GPU: artificial intelligence increasingly protagonist

An essential part of NVIDIA's commitment to new artificial intelligence solutions and to the creation of humanoid robots is the launch of ever more powerful GPUs.

NVIDIA has just presented the Blackwell architecture, the evolution of the Hopper architecture announced in 2022. Its name pays tribute to David Harold Blackwell, the mathematician and statistician who was the first African American elected to the National Academy of Sciences.

What the NVIDIA Blackwell architecture looks like

NVIDIA's engineers highlighted six characteristics of the Blackwell architecture as fundamental. First of all, the new generation of GPUs is much more powerful.

The H100 GPU, based on the Hopper architecture, packs 80 billion transistors produced with TSMC's 4nm manufacturing process. The Blackwell GPU, by contrast, features 208 billion transistors, built on an improved version of the same process. By connecting two GPU dies via the 10 TB/s NV-HBI high-speed interconnect, NVIDIA obtains a single GPU with even higher performance.
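A quick back-of-envelope calculation from the figures quoted above (only the transistor totals come from NVIDIA; the derived numbers are simple arithmetic):

```python
# Figures from the article: 80 billion transistors for the H100,
# 208 billion for Blackwell, spread across two dies joined by NV-HBI.
H100_TRANSISTORS = 80e9
B200_TRANSISTORS = 208e9
B200_DIES = 2

per_die = B200_TRANSISTORS / B200_DIES            # transistors on each Blackwell die
growth = B200_TRANSISTORS / H100_TRANSISTORS      # generational growth factor

print(f"Transistors per Blackwell die: {per_die / 1e9:.0f} billion")
print(f"Blackwell vs. H100 transistor count: {growth:.1f}x")
```

In other words, each Blackwell die alone carries more transistors than a full H100, and the two-die package holds 2.6 times as many.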

The second-generation Transformer Engine optimizes calculations and modeling with improved inference capabilities, combining the NVIDIA TensorRT-LLM library for LLMs (Large Language Models), the NeMo framework and support for micro-tensor scaling.

The latest version of the NVLink interconnect offers bidirectional throughput of 1.8 TB/s per GPU, ensuring seamless high-speed communication, with the ability to connect up to 576 GPUs for the largest LLMs.
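A rough upper-bound estimate of what those two figures imply together (a sketch based only on the numbers quoted above, not an official NVIDIA specification):

```python
# Figures from the article: 1.8 TB/s bidirectional per GPU,
# NVLink domains of up to 576 GPUs.
PER_GPU_TBPS = 1.8
MAX_GPUS = 576

# Naive aggregate: every GPU driving its full NVLink bandwidth at once.
aggregate_tbps = PER_GPU_TBPS * MAX_GPUS
print(f"Aggregate NVLink bandwidth across {MAX_GPUS} GPUs: "
      f"{aggregate_tbps:.0f} TB/s")
```

Real-world throughput depends on topology and traffic patterns, but the calculation gives a sense of the scale the interconnect is designed for.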

The other important features of the latest generation NVIDIA GPUs

The dedicated RAS (Reliability, Availability, Serviceability) engine works at the chip level to increase resilience, maximize system uptime and reduce operating costs.

NVIDIA engineers also introduced advanced confidential computing features: new hardware-level cryptographic protections safeguard sensitive data and AI models from unauthorized access. According to the company led by Jensen Huang, throughput should not be affected by the addition of these security measures.

Finally, the dedicated decompression engine supports the latest compression formats, such as LZ4, Snappy and Deflate. Moreover, the ability to access the large memory of the Grace CPU at a bidirectional throughput of 900 GB/s accelerates queries of any kind, even the most complex, improving performance for data analytics and, more generally, for data scientists.
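For context, this is what a Deflate round trip looks like on the CPU, using Python's standard-library `zlib` (LZ4 and Snappy would need third-party packages). This sketch does not touch the GPU; it merely illustrates the kind of decompression step that Blackwell's engine offloads to hardware:

```python
import zlib

# Repetitive, analytics-style payload: compresses very well under Deflate.
payload = b"column-oriented analytics data " * 1000

compressed = zlib.compress(payload, level=6)   # Deflate compression on the CPU
restored = zlib.decompress(compressed)         # the step Blackwell accelerates in hardware

assert restored == payload
print(f"original: {len(payload)} bytes, compressed: {len(compressed)} bytes")
```

On Blackwell, the same decompression work would run on the dedicated engine instead of burning CPU or GPU compute cycles.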

The first GPU based on Blackwell is NVIDIA B200

The name of the first GPU based on the Blackwell architecture, B200, was leaked last February by Dell vice chairman Jeff Clarke, who got ahead of the announcement by revealing that NVIDIA was developing a GPU with a power consumption of 1,000W.

Platforms built on the B200 include the GB200 superchip, which pairs two B200 GPUs with a Grace CPU; the GB200 NVL72, which combines 72 B200 GPUs and 36 Grace CPUs; the DGX B200 integrated artificial intelligence system; and the HGX B200 server board.

NVIDIA also showed the DGX GB200 system, which combines as many as 36 GB200 superchips, and the DGX SuperPOD, a new-generation AI supercomputer that is itself built from DGX GB200 systems.

The DGX SuperPOD is a supercomputer that CEO Jensen Huang describes as the "engine of the AI industrial revolution". It combines NVIDIA's latest advances in accelerated computing, networking and software to enable businesses, industries and entire countries to develop and refine their artificial intelligence solutions. Consider that the NVIDIA Quantum-2 InfiniBand architecture allows tens of thousands of GB200 chips to be connected, demonstrating the scale of the systems that can be built.

The Quantum-X800 InfiniBand interconnect, announced separately, provides up to 1,800 GB/s of bandwidth to each GPU in the platform. With fourth-generation SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) technology, NVIDIA achieved 14.4 TeraFLOPS of in-network computing, four times more than the previous generation.
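The "four times" claim lets us back out the previous generation's figure (simple arithmetic from the numbers quoted above, not an official NVIDIA specification):

```python
# Figures from the article: 14.4 TeraFLOPS of in-network SHARP
# computing, stated as a 4x improvement over the prior generation.
BLACKWELL_SHARP_TFLOPS = 14.4
SPEEDUP = 4

previous_gen = BLACKWELL_SHARP_TFLOPS / SPEEDUP
print(f"Implied previous-generation SHARP throughput: "
      f"{previous_gen:.1f} TeraFLOPS")
```

That puts the prior generation at roughly 3.6 TeraFLOPS of in-network computation.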
