The von Neumann architecture, named after the mathematician and computer scientist John von Neumann, is the classic design on which most modern computers are based. It is characterized by several key components that operate together in a coordinated manner: CPU, memory, I/O unit, system bus, control unit and storage. Although for decades it was possible to scale the von Neumann architecture in a fairly simple way, the bottlenecks of this approach have been clear for some time.
The CPU often has to wait for data to be retrieved from memory, which is slower than the CPU itself; there are energy efficiency problems; and scalability has reached a limit that is difficult to overcome. To address some of these problems, new architectures have been developed that exploit parallel processing and quantum computing.
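The cost of waiting on memory can be sketched with a simple roofline-style estimate: a workload takes at least as long as the slower of its compute time and its data-transfer time. The function names and every figure below are hypothetical and chosen only for illustration; they do not describe any specific processor.

```python
# Roofline-style lower bound on run time (illustrative sketch only).
# All numbers used with these functions are hypothetical assumptions.

def compute_time_s(operations, peak_ops_per_s):
    """Time needed if the processor never waited for data."""
    return operations / peak_ops_per_s

def memory_time_s(bytes_moved, bandwidth_bytes_per_s):
    """Time needed just to move the data between memory and processor."""
    return bytes_moved / bandwidth_bytes_per_s

def estimated_time_s(operations, bytes_moved, peak_ops_per_s, bandwidth_bytes_per_s):
    """The slower of the two phases bounds the total time from below."""
    return max(
        compute_time_s(operations, peak_ops_per_s),
        memory_time_s(bytes_moved, bandwidth_bytes_per_s),
    )

def is_memory_bound(operations, bytes_moved, peak_ops_per_s, bandwidth_bytes_per_s):
    """True when data movement, not arithmetic, dominates the estimate."""
    return memory_time_s(bytes_moved, bandwidth_bytes_per_s) > compute_time_s(
        operations, peak_ops_per_s
    )

if __name__ == "__main__":
    # Hypothetical workload: 1e9 operations touching 4e9 bytes of data,
    # on a machine with 1e11 ops/s peak and 1e10 bytes/s memory bandwidth.
    args = (1e9, 4e9, 1e11, 1e10)
    print("estimated time (s):", estimated_time_s(*args))
    print("memory bound:", is_memory_bound(*args))
```

In this made-up case the arithmetic would take a fraction of the time needed to move the data, which is the situation usually described as the von Neumann bottleneck.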
While the market began offering faster interfaces for exchanging information between CPU and memory a few years ago (think of Infinity Fabric, NVLink and CXL), experiments in neuromorphic computing continue in parallel. This field is inspired by the functioning of the human brain and seeks to develop computing systems based on biological principles.
IBM NorthPole overcomes the von Neumann bottleneck
In 2014, IBM’s research and development team led by Dharmendra Modha developed a chip (TrueNorth) whose functioning was inspired by brain mechanisms. In the following years, the working group continued its studies, arriving today at the unveiling of NorthPole, a chip specialized in neural inference.
Neural inference is a computational process that simulates the functioning of the nervous system, in particular biological neural networks, to extract information or make decisions based on data or stimuli received as input. The expression is used in the context of artificial intelligence and machine learning applications: artificial neural networks are used to perform inference operations, that is, to draw conclusions and make predictions based on data.
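As a minimal sketch of what inference means in practice, the example below runs an input through a tiny network whose weights are already fixed and returns the predicted class. The weights, layer sizes and function names are hypothetical and purely illustrative; they are unrelated to the models or software actually run on NorthPole.

```python
# Minimal inference sketch: a fixed, "already trained" two-layer network.
# Every weight and name here is an illustrative assumption.

def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters, as if produced by an earlier training phase.
HIDDEN_W = [[1.0, -1.0], [-1.0, 1.0]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[1.0, 0.0], [0.0, 1.0]]
OUTPUT_B = [0.0, 0.0]

def infer(inputs):
    """Forward pass only: no weights change, the network just predicts."""
    hidden = relu(dense(inputs, HIDDEN_W, HIDDEN_B))
    scores = dense(hidden, OUTPUT_W, OUTPUT_B)
    return scores.index(max(scores))  # index of the highest-scoring class

if __name__ == "__main__":
    print(infer([3.0, 1.0]))  # class 0: first input is larger
    print(infer([1.0, 3.0]))  # class 1: second input is larger
```

Inference involves only this forward pass; training, which adjusts the weights, is a separate phase.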
As IBM engineers explain, the biggest difference between NorthPole and conventional products is that each of its 256 cores comes with its own memory: this way, according to IBM, each core can perform inference tasks faster than any existing chip. Architecturally, NorthPole blurs the line between compute and memory. However, NorthPole’s biggest advantage is also its limit: integrating memory at core level means the chip cannot be used for more general-purpose operations.
The performance of the new IBM chip for artificial intelligence
The artificial intelligence processor born in Big Blue’s laboratories showed higher energy efficiency, better space efficiency and lower latency than any other chip currently on the market in tests conducted on the ResNet-50 image recognition model and the YOLOv4 object detection model.
Using the ResNet-50 model as a benchmark, NorthPole is noticeably more efficient than common 12nm GPUs and 14nm CPUs (NorthPole itself is built using a 12nm process). In both cases, NorthPole is 25 times more energy efficient in terms of the number of frames interpreted per joule of energy consumed.
NorthPole also outperformed the other chips in latency and in the space required for computation, measured as frames interpreted per second per billion transistors. According to Modha, on ResNet-50, NorthPole outperforms all major popular architectures, even those that use more advanced technological processes, such as a GPU built on a 4nm manufacturing node.
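The two efficiency metrics mentioned here can be written down explicitly. The sketch below assumes the usual definitions (one watt is one joule per second, so frames per second divided by watts gives frames per joule); the function names and the sample figures are hypothetical and are not IBM’s measurements.

```python
# Efficiency metrics as described in the text (illustrative sketch).
# Sample inputs are made-up values, not benchmark results.

def frames_per_joule(frames_per_second, power_watts):
    """Energy efficiency: 1 W = 1 J/s, so fps / W = frames / J."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return frames_per_second / power_watts

def fps_per_billion_transistors(frames_per_second, transistor_count):
    """Space efficiency: throughput normalized by transistor budget."""
    if transistor_count <= 0:
        raise ValueError("transistor_count must be positive")
    return frames_per_second / (transistor_count / 1e9)

def efficiency_ratio(candidate, baseline):
    """How many times better the candidate is on a 'higher is better' metric."""
    return candidate / baseline

if __name__ == "__main__":
    # Hypothetical chips with the same throughput but different power draw.
    chip_a = frames_per_joule(1000.0, 50.0)
    chip_b = frames_per_joule(1000.0, 2.0)
    print("chip A frames/J:", chip_a)
    print("chip B frames/J:", chip_b)
    print("B vs A:", efficiency_ratio(chip_b, chip_a))
```

Normalizing by energy and by transistor count is what allows chips built on different manufacturing nodes to be compared on the same scale.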
From a purely technical point of view, NorthPole contains 22 billion transistors in 800 square millimeters; it can perform 2,048 operations per core per cycle with 8-bit precision.
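These figures can be combined into some back-of-the-envelope numbers. The sketch below uses only values stated in the article (256 cores, 2,048 operations per core per cycle, 22 billion transistors, 800 square millimeters); the clock frequency is not given in the text, so it is left as a parameter, and the value used in the example is a placeholder, not NorthPole’s actual clock.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
# The clock frequency is NOT stated in the article; it is a free parameter.

CORES = 256                      # from the article
OPS_PER_CORE_PER_CYCLE = 2048    # at 8-bit precision, from the article
TRANSISTORS = 22_000_000_000     # from the article
DIE_AREA_MM2 = 800               # from the article

def peak_ops_per_cycle(cores=CORES, ops_per_core=OPS_PER_CORE_PER_CYCLE):
    """Upper bound on 8-bit operations completed chip-wide in one cycle."""
    return cores * ops_per_core

def peak_ops_per_second(clock_hz, cores=CORES, ops_per_core=OPS_PER_CORE_PER_CYCLE):
    """Peak throughput for a given (assumed) clock frequency in hertz."""
    return peak_ops_per_cycle(cores, ops_per_core) * clock_hz

def transistors_per_mm2(transistors=TRANSISTORS, area_mm2=DIE_AREA_MM2):
    """Average transistor density over the die."""
    return transistors / area_mm2

if __name__ == "__main__":
    print("ops per cycle:", peak_ops_per_cycle())
    print("transistors per mm^2:", transistors_per_mm2())
    # Placeholder clock of 1 MHz, used only to show the calculation.
    print("ops per second at 1 MHz:", peak_ops_per_second(1_000_000))
```

These are theoretical peaks: real workloads reach them only when every core is kept busy on every cycle.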
During testing, the NorthPole team focused primarily on uses related to computer vision, in part because funding for the project comes from the US Department of Defense. Some of the main applications under consideration include person and object detection, image segmentation and video classification. However, the chip can also be used in other areas, such as natural language processing or speech recognition.
The images published in the article are taken from the note published by IBM Research.