Why are they called “neural networks”?
Stephan Clémençon: The idea behind artificial neural networks is to model a decision based on input data by imitating the human decision-making process. The biological reference is just a metaphor. The intelligence of artificial neural networks is based on computation and is not at all like human intelligence. The brain hasn’t yet yielded its secrets to the point of allowing us to define its functioning by equations!
How do they work?
SC: A neural network is composed of several layers of computational nodes, or artificial neurons: an input layer, one or several hidden layers, and an output layer. Each node is a computational unit that performs multiplications and additions to obtain a result (a scalar product, to be precise), to which a function known as an “activation function” is then applied.
When information is sent to a neural network, each node in the first layer performs a calculation while applying weights (the network parameters) to the different components of the information. Like a simplified version of the synaptic transmission of neurotransmitters in the biological brain, the results of the calculation are then passed on to the neurons/nodes in the next layer. In this way, information is propagated through the network structure to the output layer to obtain the results that will encode the decision.
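By way of illustration, here is a minimal NumPy sketch (not taken from the interview; the layer sizes, random weights and sigmoid activation are arbitrary choices) of the propagation just described, with each node computing a scalar product followed by an activation function:

```python
import numpy as np

def sigmoid(z):
    # Activation function: maps the weighted sum to a value in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Propagate an input vector through a list of (weights, bias) layers."""
    activation = x
    for weights, bias in layers:
        # Each node computes a scalar product of its inputs with its weights,
        # adds a bias, then applies the activation function.
        activation = sigmoid(weights @ activation + bias)
    return activation  # the output layer encodes the decision

# Toy network: 3 inputs -> 4 hidden nodes -> 2 outputs, with random weights
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]
print(forward(np.array([0.5, -1.0, 2.0]), layers))
```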
How did neural networks come into being?
SC: It was the neurophysiologist Warren McCulloch and the logician Walter Pitts who, in the 1940s, first proposed a mathematical model of the cerebral neuron, known as the “artificial neuron”. In the 1950s, the psychologist Frank Rosenblatt developed the first machine learning algorithm, the perceptron, used to learn the parameters of the artificial neuron model (the weights) in order to optimize its predictive capacities. For this, it requires labeled data, representing the input information and the desired responses.
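As an illustration of Rosenblatt’s idea, here is a minimal sketch (not from the interview; the toy data, number of epochs and learning rate are arbitrary) of the perceptron rule, which adjusts the weights whenever a labeled example is misclassified:

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Rosenblatt's perceptron rule on labeled data (labels y in {-1, +1})."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # If the current weights misclassify the example,
            # nudge them toward the desired response.
            if yi * (w @ xi + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy labeled data: two linearly separable groups of points
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # should reproduce the labels
```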
The learning process consists of training AI models to make predictions by reproducing what already exists, which is why it needs examples. However, at the time, very little data was available. Scientists could only train very basic machine learning models compared with the deep learning models of today.
How did we go from “machine” learning to “deep” learning?
SC: There was a long “winter” period during which the technology saw little success in applications. Thanks to advances in theory and algorithms, most of the concepts were well documented in the scientific literature, but data and computing power were insufficient to train deeper neural networks with more and more layers of neurons.
It was Internet data that breathed new life into neural network research, especially applied research. Development accelerated in the 2010s with big data and the availability of labeled data, particularly pixel-based images (ImageNet), whose analysis is made easier by graphics cards.
The development of neural networks has since been marked by milestone applications involving increasingly complex tasks such as face recognition, machine listening, and the ability to play combinatorial games (AlphaGo). Today, the most spectacular advances are in the field of Natural Language Processing (NLP).
Do these applications use different types of neural networks?
SC: There is indeed a wide range of neural network structures. Some of the main models are the multilayer perceptron (MLP), which is the basic neural network, the convolutional neural network (CNN), which has a redundancy-based structure that mimics the organization of the visual cortex and is mainly used for images, and the recurrent neural network (RNN) for audio signals or time series.
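To make these families concrete, here is a minimal sketch using the PyTorch library (one possible toolkit among others, not mentioned in the interview; the layer sizes and input shapes are arbitrary) that instantiates a small MLP, CNN and RNN:

```python
import torch
from torch import nn

# Multilayer perceptron (MLP): fully connected layers for generic vector inputs
mlp = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# Convolutional network (CNN): weight-sharing filters suited to images
cnn = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1),
                    nn.ReLU(), nn.Flatten(),
                    nn.Linear(16 * 32 * 32, 2))

# Recurrent network (RNN): processes sequences such as audio or time series
rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)

print(mlp(torch.randn(1, 20)).shape)          # vector input
print(cnn(torch.randn(1, 3, 32, 32)).shape)   # 32x32 RGB image
out, h = rnn(torch.randn(1, 50, 8))           # sequence of 50 time steps
print(out.shape)
```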
Depending on the data available, different architectures must be tested to measure performance. This exploratory process can be accelerated using automated machine learning (AutoML), but that requires considerable computing power and raises issues concerning the energy use of AI.
What are the specific issues facing NLP?
SC: To implement machine learning, all the input and output data (images, sound signals, etc.) must be transformed into numbers, which is difficult to do for text. One possible approach is word embedding, a learning method in which words are represented by vectors; the network can then learn to predict the next word in a sentence from its context, a process known as self-supervised learning. The Word2Vec technique developed by Google in 2013 was one of the first and best-known models of this kind.
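As a concrete illustration, here is a minimal sketch (not from the interview) that trains word vectors on a toy corpus using the gensim implementation of Word2Vec; the corpus, vector size and context window are arbitrary choices:

```python
from gensim.models import Word2Vec

# Tiny toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "network", "learns", "word", "vectors"],
    ["words", "with", "similar", "contexts", "get", "similar", "vectors"],
    ["the", "model", "predicts", "a", "word", "from", "its", "context"],
]

# Train a small skip-gram model (sg=1); vector_size and window are arbitrary
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vec = model.wv["word"]                 # the learned embedding for "word"
print(vec.shape)                       # (50,)
print(model.wv.most_similar("word"))   # nearest neighbours in embedding space
```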
Since 2022, this type of technology has been superseded in practice by transformer models, which are more efficient at analyzing and producing natural language responses. These models, known as large language models (LLMs), power what is today known as “generative” AI, i.e. AI capable of carrying out generative tasks such as summarizing text, answering questions, and producing images or sound signals. This type of AI opens up vast perspectives that are still difficult to define.
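By way of illustration, here is a minimal sketch (not from the interview) that runs a small pretrained transformer language model through the Hugging Face transformers library; the choice of GPT-2 and the prompt are arbitrary:

```python
from transformers import pipeline

# Load a small pretrained transformer language model (GPT-2 here,
# chosen only because it is small and freely available)
generator = pipeline("text-generation", model="gpt2")

prompt = "Artificial neural networks are"
# The model continues the prompt token by token, each step predicting
# the next token from the context, as in self-supervised training
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```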
What are the risks associated with AI and neural networks?
SC: Without listing them all, there is first of all a “model bias”, in the sense that a mathematical model is not reality, but only a representation of it. Neural network models are popular because they can be perfectly numerically controlled and effectively reproduce existing data in a multitude of situations. But there’s no reason to believe that they won’t be outperformed by other types of mathematical function in the future.
Secondly, there’s a sampling bias during the learning phase of these models, since it is obviously impossible to provide exhaustive learning data, so models “generalize” from a finite number of examples. This generalization can often lead to errors.
Admittedly, with big data, this type of bias, or statistical error, has been drastically reduced. But in the absence of control over the acquisition protocol, the data used for AI are not necessarily representative of the target population or of the phenomenon being analyzed. This selection bias can also lead to major disparities in performance, for example if a neural network used for facial recognition carries out identity checks on population groups that are very poorly represented in the training image database. Not to mention the fact that some data is easy to manipulate or corrupt, or that exploiting protected data raises copyright issues…
So AI still makes a lot of mistakes despite the mass of data available?
SC: The expressiveness of the models and the number and representativeness of the data available enable us to get closer to a minimal level of error, but not necessarily zero! Some tasks present intrinsic difficulties. For example, a facial recognition algorithm is highly likely to perform poorly on certain population groups such as newborn babies or people with glasses and beards, bearing in mind that even the human eye sometimes has trouble telling the difference. In some fields, such as biology and finance, the phenomena analyzed are only very partially determined by the observations available: the evolution of the financial markets or cancer cell development is not written in the data!
That said, with recent progress, the level of error committed by AI for problems such as image analysis or speech recognition has been reduced considerably, to the point where these systems now surpass human experts. But while such success is to be welcomed, we must also be aware that such high-performance models run the risk of being misused for malicious purposes, such as creating fake faces, imitating voices or writing misleading texts.
You mentioned the issue of energy use in AI. How does this impact current and future developments?
SC: Initially, research into neural networks focused mainly on their predictive performance and the computational issues involved in scaling them up. Although “computational economy” has always been a key concern, model complexity and data redundancy were once seen as assets. Now, with the explosion in uses, especially of generative AI, it is crucial to think about the power consumption of learning algorithms and models, which are very energy intensive. It is a complex issue to quantify because it also depends on the type of computing infrastructure on which the algorithm is implemented or the model deployed.
The scientific community has, of course, been very active with regard to this subject. In this context, distributed learning, edge computing, and the compression of neural networks are becoming key methodological topics that are also a subject of focus among researchers at Télécom Paris. That said, with the ubiquity of digital technologies, the question of AI’s frugality should challenge society more broadly and call for collective reflection on possible restraint in certain uses.