Convolutional neural networks: what they are. An animation reveals how they work

In the human brain, billions of neurons are joined together through synapses. This neural network, in which information travels as electrical and chemical impulses, allows the brain to process information, learn from past experience and respond to new stimuli.

Artificial neural networks (ANNs) are machine learning models inspired by the mechanisms of the brain. Artificial neurons are computational units organized in interconnected layers, with each connection carrying its own “weight”. In artificial neural networks, learning occurs through weight adjustment, in order to improve the network’s ability to perform a specific task, such as data classification, text generation, regression, and so on. Regression refers to a statistical model used to predict or estimate the value of a dependent variable based on the values of one or more independent variables. In short, the challenge is to establish how one variable is related to another or influences its trend.
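As a minimal sketch of what “learning through weight adjustment” can look like, the following example trains a single artificial neuron on a regression task using gradient descent. The toy data (points on the line y = 2x + 1), the learning rate and the number of epochs are assumptions chosen for illustration; real networks have many more weights and rely on dedicated libraries.

```python
# A single neuron: prediction = w * x + b.
# Toy data (an assumption for illustration): points on the line y = 2x + 1.

def train(xs, ys, lr=0.05, epochs=2000):
    """Adjust the weight w and bias b to reduce the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = train(xs, ys)
print(w, b)  # the learned values should end up close to 2 and 1
```

Each pass nudges the weight and bias in the direction that reduces the prediction error, which is the basic idea behind training larger networks as well.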

What are Convolutional Neural Networks (CNN)

Artificial neural networks can have different architectures and topologies. Convolutional neural networks (CNNs) are a specific type of artificial neural network designed for pattern recognition in two-dimensional data, such as images.

What pattern recognition is and how it works

Pattern recognition is the ability to identify “regularities”, structures or trends in complex data sets. The main objective is to identify patterns in the data: they can be visual in images, audible in audio signals, statistical in a numerical data set, and so on.

This technique is widely used in many fields, including computer vision (recognition of objects in images), natural language processing (text comprehension), signal processing (recognition of patterns in audio signals), medicine (diagnosis based on images or clinical data) and much more.

The processes of convolution, pooling and training

Convolutional neural networks attempt to emulate the brain’s processing mechanisms through the use of “filters” or “kernels”, which detect similar features in images. The process of convolution involves applying filters to different regions of an image to extract significant features. This step is normally repeated across several layers before moving on to pooling, which reduces the size of the resulting image while retaining the most important features.
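The two steps can be sketched in a few lines of plain Python. This is an illustrative example, not the implementation used by any particular library: the 5×5 image, the 2×2 vertical-edge filter and the function names `convolve2d` and `max_pool2d` are all assumptions made for the example, and max pooling is just one common pooling variant.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    sum the element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical filter that responds to a dark-to-bright vertical edge
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4 map, strongest where the edge is
pooled = max_pool2d(feature_map)         # 2x2 map that still contains the edge
print(feature_map)
print(pooled)
```

The feature map is non-zero only in the column where the edge sits, and pooling halves each dimension while keeping that strong response.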

After one or more convolution and pooling layers, CNNs usually contain one or more fully connected layers, similar to the layers of traditional artificial neural networks. These layers take the features extracted in the previous phases and use them to make complex decisions, such as image classification. The training phase allows the neural network to determine independently which characteristics of the images are important.
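As a rough sketch of this final stage, the example below flattens a small feature map, passes it through one fully connected layer and converts the resulting scores into class probabilities with softmax. The feature map, the weights and the two class labels are hypothetical values picked by hand; in a real CNN the weights would be learned during training.

```python
import math

def flatten(fmap):
    """Turn a 2D feature map into a flat list of values."""
    return [v for row in fmap for v in row]

def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input value."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2x2 feature map produced by earlier convolution and pooling
features = flatten([[2, 0], [2, 0]])

# Hand-picked (untrained) weights for two hypothetical classes
labels = ["edge", "no edge"]
weights = [[0.5, -0.5, 0.5, -0.5],
           [-0.5, 0.5, -0.5, 0.5]]
biases = [0.0, 0.0]

scores = dense(features, weights, biases)
probs = softmax(scores)
predicted = labels[probs.index(max(probs))]
print(scores, probs, predicted)
```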

The main differences between traditional artificial neural networks and convolutional neural networks

In traditional artificial neural networks, each neuron of a layer is connected to all the neurons of the next layer. This scheme is suitable for handling tabular or sequence data, but does not take into account the spatial structure of two-dimensional data such as images. Conversely, CNNs are able to “capture” this information by recognizing features such as lines, curves and textures. Furthermore, while traditional ANNs require a set of features manually extracted from the input, CNNs learn the relevant features on their own.
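One practical consequence of full connectivity can be shown with simple arithmetic. The sketch below counts the trainable parameters of a fully connected layer and of a convolutional layer applied to the same 100×100 RGB input. The layer sizes (64 neurons, 64 filters of 3×3) are assumptions chosen for illustration; the gap arises because a convolutional filter reuses the same small set of weights at every position of the image.

```python
def dense_params(n_inputs, n_neurons):
    """One weight per input for each neuron, plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

n_inputs = 100 * 100 * 3                 # 100x100 RGB image, flattened
dense_count = dense_params(n_inputs, 64)
conv_count = conv_params(3, 3, 3, 64)
print(dense_count)  # 1920064
print(conv_count)   # 1792
```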

While ANNs are used for speech recognition, natural language processing and tabular data classification, CNNs excel at object recognition, image segmentation and other tasks related to computer vision.

Understand how convolutional neural networks work with an animation

The concepts described so far are not exactly simple. For this reason, the authors of the Animated AI project have published some splendid animations on GitHub that explain how CNNs work.

The graphics show the basic operation of convolution, the process called “padding” (which addresses the reduction in output size after convolution) and the so-called “stride”, i.e. the way in which the filter moves over the image. The accompanying YouTube video is excellent because it serves as a bridge: it focuses not on calculations but on the fundamental concepts of CNNs.
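The effect of padding and stride on the output size can be checked with the standard formula for one spatial dimension: output = floor((n + 2p - k) / s) + 1, where n is the input size, k the filter size, p the padding added on each side and s the stride. The function name `conv_output_size` below is a hypothetical helper written for this example.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output size along one dimension for input size n and filter size k."""
    return (n + 2 * padding - k) // stride + 1

no_padding = conv_output_size(5, 3)                         # output shrinks to 3
same_padding = conv_output_size(5, 3, padding=1)            # padding keeps it at 5
strided = conv_output_size(5, 3, padding=1, stride=2)       # larger steps give 3
print(no_padding, same_padding, strided)
```

Without padding, a 3×3 filter turns a 5×5 input into a 3×3 output; one pixel of padding on each side restores the original 5×5 size, and a stride of 2 reduces it again.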

Convolutional neural network (CNN): animation of operation

The cube positioned at the top of the animation contains the input data, divided into a grid. After all, the input of a CNN is an image, i.e. an array of values corresponding to each pixel, which occupies a precise position in the image itself.

The characteristics of an RGB image

In an RGB (red, green, blue) image, the intensities of the three primary colors are represented by separate matrices. Each matrix represents the intensity of the respective color in each pixel that makes up the image. For example, if you have an image of 100×100 pixels, you have three matrices, each 100×100, representing the intensity of red, green and blue at each point in the image. While a grayscale image has only two dimensions (height and width), an RGB image has three (height, width and depth).
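To make the three-matrix structure concrete, here is a small sketch that builds a synthetic 100×100 RGB image as nested Python lists and splits it into its red, green and blue matrices. The pixel values (a simple gradient with a constant blue level) are invented for illustration; in practice an array library would typically hold the same data as a single height × width × 3 array.

```python
height, width = 100, 100

# Synthetic image: each pixel is an (R, G, B) tuple with values from 0 to 255.
# Red grows from left to right, green from top to bottom, blue is constant.
image = [
    [(x * 255 // (width - 1), y * 255 // (height - 1), 128) for x in range(width)]
    for y in range(height)
]

# One 100x100 matrix per primary color
red = [[pixel[0] for pixel in row] for row in image]
green = [[pixel[1] for pixel in row] for row in image]
blue = [[pixel[2] for pixel in row] for row in image]

# Height, width and depth (number of color channels)
shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)  # (100, 100, 3)
```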

Convolution and using deeper layers

By placing the filter over the cube, the CNN cuts out a portion of the input data (for example, a 3 × 3 window) and performs an operation that makes it easier to recognize the features around each pixel. Convolution computes the scalar product between the filter and the corresponding matrix of the input. The filter contains a specific pattern, while the output expresses how closely the input matches that pattern.
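The idea that the output measures how well the input matches the filter’s pattern can be illustrated with a single 3×3 window. In this sketch the filter is a hand-written vertical-line detector and the three input patches are hypothetical; the function `patch_response` simply computes the scalar product described above.

```python
def patch_response(patch, kernel):
    """Scalar product: multiply matching entries and add everything up."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))

# Hypothetical filter encoding a bright vertical line on a darker background
kernel = [[-1, 2, -1],
          [-1, 2, -1],
          [-1, 2, -1]]

vertical_line = [[0, 1, 0],
                 [0, 1, 0],
                 [0, 1, 0]]
horizontal_line = [[0, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0]]
flat_region = [[1, 1, 1],
               [1, 1, 1],
               [1, 1, 1]]

match = patch_response(vertical_line, kernel)       # 6: strong match
mismatch = patch_response(horizontal_line, kernel)  # 0: pattern not present
uniform = patch_response(flat_region, kernel)       # 0: no structure at all
print(match, mismatch, uniform)
```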

A shallow layer of the CNN can search for a particular color, a line or a curve; the deeper layers can identify, for example, the features of a dog, a dog’s muzzle or a human face, even that of a specific person. After examining one square window, the filter moves on to the next, until the entire two-dimensional structure provided as input has been analyzed. The step by which the filter moves is called the “stride”.
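A short sketch can list the windows a filter visits for a given stride. The helper `window_positions` is a hypothetical name introduced for this example; it returns the top-left corner of every window examined on a 5×5 input with a 3×3 filter, without padding.

```python
def window_positions(height, width, k, stride=1):
    """Top-left (row, column) of each k x k window visited by the filter."""
    return [
        (i, j)
        for i in range(0, height - k + 1, stride)
        for j in range(0, width - k + 1, stride)
    ]

stride_1 = window_positions(5, 5, 3, stride=1)  # 9 overlapping windows
stride_2 = window_positions(5, 5, 3, stride=2)  # 4 windows, two pixels apart
print(len(stride_1), stride_2)
```

With a stride of 1 the filter shifts one pixel at a time; with a stride of 2 it skips every other position, so fewer windows are examined and the output is smaller.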

