Intel Ponte Vecchio really exists: how high-performance GPUs work


Intel's Ponte Vecchio architecture has been talked about for more than three years, but very few people have had the opportunity to actually get their hands on these high-performance GPUs, which are designed primarily for data centers. Known for decades for its processors, the Santa Clara company is also trying to regain market share in the world of dedicated video cards, thanks to its Xe architecture.

Intel has also taken bold steps forward in the ultra-professional video card sector and, although there is already talk of a successor, Ponte Vecchio (PVC) is the spearhead of the company led by Pat Gelsinger. PVC marks a first ambitious milestone that lets Intel stand out above all in memory bandwidth and FP64 throughput, both fundamental for high-performance computing.

FP64 refers to the representation and manipulation of double-precision floating-point numbers, which use 64 bits per value. In scientific computing and other complex workloads that demand high numerical precision, FP64 is essential: it offers greater precision than narrower formats, at the cost of more computational resources.
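To make the precision trade-off concrete, here is a minimal sketch that adds 0.1 to a running total 100,000 times, once in double precision and once with every intermediate result rounded to single precision (FP32). It runs on the CPU using only the Python standard library and says nothing about Ponte Vecchio itself; the helper names `to_fp32` and `repeated_sum` are made up for this illustration.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def repeated_sum(step: float, count: int, single_precision: bool) -> float:
    """Add `step` to a running total `count` times.

    When `single_precision` is True, the step and every intermediate total
    are rounded to FP32, which emulates accumulating in single precision.
    """
    if single_precision:
        step = to_fp32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_fp32(total)
    return total


COUNT = 100_000
EXPECTED = 10_000.0  # exact value of 100,000 * 0.1 in real arithmetic

err64 = abs(repeated_sum(0.1, COUNT, single_precision=False) - EXPECTED)
err32 = abs(repeated_sum(0.1, COUNT, single_precision=True) - EXPECTED)

print(f"FP64 accumulated error: {err64:.3e}")
print(f"FP32 accumulated error: {err32:.3e}")
```

The FP32 run should drift visibly further from 10,000 than the FP64 run, which is why long-running scientific simulations tend to rely on FP64 hardware throughput.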

The Intel Ponte Vecchio architecture in brief

Ponte Vecchio is something of a pinnacle of chip complexity. Several versions currently exist: Intel Data Center GPU Max 1550, Max 1350 and Max 1100. All share an innovative physical design built from a large number of chiplets, which Intel calls tiles, and these host the key compute elements, named Xe Cores.

Intel Ponte Vecchio GPU

In the case of the Max 1100, the Xe Cores sit on a base tile measuring 640 mm², which not only integrates a gigantic 144 MB L2 cache but also acts as the interface to HBM2e, PCIe and peer GPUs. The term GPU peer refers to the ability of two or more graphics processing units (GPUs) to communicate directly with each other without going through the CPU or main system. The use of five different process nodes and advanced 3D stacking techniques makes PVC a fascinating product from a technological point of view.

The Max 1100 variant integrates 56 Xe Cores and reaches operating frequencies of up to 1.55 GHz. With 108 MB of L2 cache enabled on the base tile, it uses 48 GB of HBM2e memory with a theoretical bandwidth of 1.2 TB/s. The card comes in a PCIe form factor with a TDP of 300 W, positioning it against the AMD MI210 and Nvidia H100 PCIe.

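Every Xe Core consists of eight 512-bit vector engines. Across the 56 Xe Cores of the Max 1100 at 1.55 GHz, and counting sixteen 32-bit lanes per engine, that gives a nominal throughput of about 11.1 trillion operations per second.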
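The nominal throughput can be checked with back-of-the-envelope arithmetic. The sketch below multiplies Xe Cores, vector engines, lanes per engine and clock frequency, using the Max 1100 figures quoted in this article. It assumes 32-bit lanes and one operation per lane per clock; both are assumptions of this example rather than figures from Intel, and the function name is hypothetical. Under those assumptions the whole GPU comes out at roughly 11.1 × 10^12 lane operations per second.

```python
def peak_lane_ops_per_second(xe_cores: int, engines_per_core: int,
                             vector_bits: int, element_bits: int,
                             clock_hz: float) -> float:
    """Nominal vector throughput, assuming one operation per lane per clock."""
    lanes_per_engine = vector_bits // element_bits
    return xe_cores * engines_per_core * lanes_per_engine * clock_hz


# Max 1100 figures as quoted in the article.
MAX_1100 = dict(xe_cores=56, engines_per_core=8, vector_bits=512, clock_hz=1.55e9)
HBM_BANDWIDTH_BYTES_PER_S = 1.2e12  # 1.2 TB/s theoretical bandwidth

# Assumption: 32-bit elements, i.e. 16 lanes per 512-bit engine.
fp32_ops = peak_lane_ops_per_second(element_bits=32, **MAX_1100)

# Operations the GPU can issue per byte fetched from HBM2e at peak rates.
ops_per_byte = fp32_ops / HBM_BANDWIDTH_BYTES_PER_S

print(f"Nominal throughput: {fp32_ops / 1e12:.2f} trillion lane ops/s")
print(f"Compute-to-bandwidth ratio: {ops_per_byte:.2f} ops per byte")
```

The second number is a rough machine balance: a kernel that performs fewer operations per byte than this ratio is likely to be limited by memory bandwidth rather than by the vector engines.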

The Max 1550 goes even further: the GPU is made up of more than 100 billion transistors (excluding memory) on a total surface area of 2,330 mm². The TDP rises to 600 W and there are 128 Xe Cores, again paired with HBM2e memory.

Intel Data Center GPU Max series

Cache hierarchy and memory latency

Every Xe Core of the Max 1100 GPU integrates a generous 512 KB L1 cache, while the higher-level L2 cache is a 144 MB “colossus”. The only weak point seems to be its latency which, at 280 ns, is significantly higher than that of competing solutions. Many experts are pointing to this aspect, which could have a direct impact on the performance of GPUs equipped with large caches. Consumer GPUs, by contrast, tend to offer faster access to local memory.
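To put the 280 ns figure in perspective, it helps to convert it into clock cycles. The short sketch below assumes the Xe Cores are running at the 1.55 GHz peak frequency quoted for the Max 1100; the actual clock during a cache access may differ, and the function name is hypothetical.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds into clock cycles at a given frequency.

    One GHz is one cycle per nanosecond, so the conversion is a plain product.
    """
    return latency_ns * clock_ghz


L2_LATENCY_NS = 280.0  # L2 latency quoted in the article
CLOCK_GHZ = 1.55       # assumption: peak Max 1100 clock

cycles = latency_in_cycles(L2_LATENCY_NS, CLOCK_GHZ)
print(f"An L2 access costs roughly {cycles:.0f} cycles at {CLOCK_GHZ} GHz")
```

A wait of several hundred cycles per access is the kind of latency a GPU has to hide by keeping many threads in flight, which is why the figure matters for workloads that cannot expose enough parallelism.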

Chiplet architecture: blessing and curse for Intel

For a company historically anchored to a monolithic approach to building its processors, the choice of an innovative chiplet configuration is a real turning point, one also confirmed on the CPU side with Meteor Lake, an architecture that promises to be revolutionary.

The choice of chiplets has presented, and will continue to present, significant challenges for Intel. Duplicating logic across chiplets causes a certain overhead, and beyond the cost of data communication between chiplets, the implications for cooling and heat dissipation must also be weighed.

Ponte Vecchio is therefore a valuable learning experience for Intel's future high-end GPU development. It is not a point of arrival but rather the beginning of a new adventure.

The Max series cards are designed to best support artificial intelligence, machine learning and high-performance computing (HPC) applications.

The future of Intel data center GPUs

Flagship GPUs such as the Nvidia H100 and AMD MI210 mentioned earlier are among the most complex chips around and push performance to the maximum. They are the result of decades of experience in the design, development and production of large GPUs. Intel obviously cannot yet draw on the same experience, although with Ponte Vecchio it wanted to “sound the charge” once again.

The large-scale deployment of PVC in several supercomputers will give Intel the opportunity to collect valuable data on real-world performance, an excellent source of ideas for optimizing future architectures.
