Channel coding is a technique used in most telecommunication standards (4G, 5G, Wi-Fi, etc.) to make data transmissions more reliable. During such transmissions, errors can arise due to various sources of noise or interference, and corrupt the signal. “It’s like when you’re surrounded by noise in a restaurant and you don’t quite understand what someone’s saying to you,” says Elsa Dupraz, information processing researcher at IMT Atlantique. “To continue the analogy, in telecommunications, one option is to ‘speak’ – i.e. transmit – louder, but this also requires more energy. Another possibility is to code the information.”
The principle of channel coding is to introduce additional information, known as “redundancy”, into the data before it is transmitted. Since information is made up of a sequence of bits (the basic unit, with a binary value of 0 or 1), redundancy involves adding extra bits to the sequence. It can be likened to saying “M for Mom” when speaking aloud: redundant information is added to remove the ambiguity between the sounds “M” and “N”. Of course, in telecommunications, things are more complicated. Redundancies are processed algorithmically by a decoder to identify and correct errors. There are several ways of adding redundancy, and therefore several families of codes, including turbo codes and LDPC codes, which are widely used today.
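The idea can be illustrated with the simplest possible channel code, a repetition code: each bit is sent several times, and the decoder takes a majority vote. This is a toy sketch only, far weaker than the turbo and LDPC codes mentioned above, but it shows redundancy correcting an error introduced by noise.

```python
def encode(bits, r=3):
    """Repetition code: transmit each bit r times (the added redundancy)."""
    return [b for b in bits for _ in range(r)]

def decode(coded, r=3):
    """Majority vote over each group of r copies; corrects up to
    (r - 1) // 2 flipped bits per group."""
    return [int(sum(coded[i:i + r]) > r // 2) for i in range(0, len(coded), r)]

message = [1, 0, 1, 1]
sent = encode(message)      # 4 bits become 12: the message is longer but protected
sent[1] ^= 1                # noise flips one transmitted bit
assert decode(sent) == message
```

The price of this protection is bandwidth: here the message triples in length. Practical codes such as LDPC achieve far better trade-offs between added redundancy and correction power.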
Channel coding is Elsa Dupraz’s specialty. While the technique is well mastered in wireless transmissions, the young researcher is exploring applications in less conventional fields, including in-memory computing and DNA data storage. Her involvement in ambitious projects at the intersection between several disciplines earned her the 2024 IMT-Academy of Science Young Scientist Prize.
The fortuitous discovery of a multifaceted technology
Elsa Dupraz’s interest in channel coding was first sparked when she was still a PhD student. Her thesis focused on information theory and compression, which involves reducing the size of information in order to transmit it more quickly. She discovered that an original way to compress information was to use channel coding techniques. But it was during her postdoctoral research, part of which she undertook at the University of Arizona in the United States, that she realized how many different applications there were for these techniques, including information storage.
“In France, channel coding is widely used by telecoms companies, which are firmly established in local areas. In the USA and Asia, by contrast, the industrial ecosystem also focuses on information storage and the manufacture of hard drives and RAM, which makes it easier to apply channel coding in other scientific contexts,” says the researcher. In 2015, Elsa Dupraz joined IMT Atlantique (then known as Télécom Bretagne) and, inspired by her time in the USA, began looking at applications of channel coding in sectors other than telecommunications, including in-memory computing.
Limiting transfers between memory and processors in AI
In artificial intelligence, neural networks perform calculations by applying a weight matrix to the processed data, in order to adjust its importance. During training, the weights are updated at each step and stored in memory. At the following step, the weights are loaded from memory into the computing units (such as graphics processing units, or GPUs). Once the mathematical operations have been carried out, the readjusted weights are stored back in memory. And so on and so forth.
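The load, compute, store cycle described above can be sketched with a deliberately tiny example: a single linear "layer" trained with a gradient step (a hypothetical illustration, not any specific network from the article). Each call simulates one round trip between memory and the computing unit.

```python
def train_step(weights_in_memory, x, y, lr=0.1):
    """One training step: load weights, compute, store updated weights back."""
    w = list(weights_in_memory)                        # load: memory -> compute unit
    y_hat = sum(wi * xi for wi, xi in zip(w, x))       # apply weights to the data
    err = y_hat - y                                    # compare with the target
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]   # readjust the weights
    weights_in_memory[:] = w                           # store: compute unit -> memory
    return weights_in_memory
```

In a deep network this cycle repeats for millions of weights at every step, which is exactly the traffic that in-memory computing aims to eliminate.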
In very deep or complex neural networks, with a large number of weights, the constant to-and-fro between memory and computing units consumes a great deal of time and energy. Optimization therefore involves performing calculations where the weights are stored, without having to transfer them to external processors: this is in-memory computing. Nowadays, the latest non-volatile RAM technologies – which do not lose their information when the memory is turned off – are capable of supporting such operations, but in-memory computing is still in the prototype stage.
Memories are less reliable than processors and introduce errors into calculations, requiring correction mechanisms. “If we see these errors as noise, my aim is to measure the effect of this noise on the final result of computing and use channel coding to protect operations from it,” says Elsa Dupraz. Between 2019 and 2024, the researcher’s work was supported by the Transatlantic Research Partnership and Samuel de Champlain programs, thanks to international scientific collaborations with the University of Illinois at Urbana-Champaign, USA, and Polytechnique Montréal, Canada, respectively.
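Treating memory faults as noise can be illustrated with the simplest possible protection scheme, majority voting over several stored copies (triple modular redundancy). This is a toy model with an assumed fault type (random sign flips), not the specific codes from her research, but it shows redundancy in the memory masking storage errors.

```python
import random

def noisy_store(weights, p_flip):
    """Model an unreliable memory: each stored weight's sign may flip."""
    return [-w if random.random() < p_flip else w for w in weights]

def protected_read(weights, p_flip, copies=3):
    """Add redundancy in the memory itself: keep several copies and take a
    majority vote on the sign when reading back."""
    stored = [noisy_store(weights, p_flip) for _ in range(copies)]
    out = []
    for col in zip(*stored):
        votes_positive = sum(1 for w in col if w >= 0)
        sign = 1 if votes_positive > copies // 2 else -1
        out.append(sign * abs(col[0]))   # magnitudes are unchanged in this model
    return out
```

With three copies, a weight is corrupted only if at least two copies fail at once, so a per-copy error rate p falls to roughly 3p² — at the cost of tripling the storage, which is why real designs prefer more efficient codes.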
DNA, nature’s super hard drive
Once these programs were completed, the researcher devoted much of her energy to a very different field of application: DNA data storage. Faced with the ever-increasing energy issues surrounding data centers, DNA has been a serious avenue of research for the past ten years or so, as a medium for information storage. Because of its density, DNA could store information much more compactly than is currently the case. It is also sustainable and robust. “DNA is resistant to temperature variations – just like humans – which is not the case for data centers, which have to be constantly cooled,” says Elsa Dupraz.
That is how scientists began to consider the possibility of storing data on synthetic DNA. The principle is the same as for drug synthesis: DNA is made by combining the four nucleotide bases A, C, G and T, each of which can encode two bits of information. While some base sequences produce molecules with therapeutic properties, the idea behind DNA data storage is to assemble the bases in such a way as to synthesize “inactive” molecules. In France, this research is structured by the MoleculArXiv PEPR, which features numerous teams from a wide range of disciplines, including computer science, biology, chemistry and biosafety. Elsa Dupraz has offered her expertise to support error correction in two main operations: synthesis and sequencing.
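Since there are four bases and four possible two-bit patterns, any fixed assignment turns a bit stream into a base sequence. The mapping below (A=00, C=01, G=10, T=11) is an arbitrary choice for illustration; real schemes add constraints, for instance avoiding long runs of the same base.

```python
# Illustrative mapping: any fixed 2-bit assignment works
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def bits_to_dna(bits):
    """Encode a bit string (even length) as a DNA base sequence."""
    assert len(bits) % 2 == 0
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bits(seq):
    """Decode a base sequence back into its bit string."""
    return "".join(BASE_TO_BITS[base] for base in seq)

assert bits_to_dna("00011011") == "ACGT"
```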
Two levels of error
Today, the main challenges in DNA synthesis are to increase chain length, so as to increase storage capacity, and to reduce the time taken by this process. “Synthesis is a very reliable operation – originally developed for drug manufacturing – but a very slow one. It takes several hours to make a DNA molecule,” says Elsa Dupraz. One of the PEPR’s objectives is therefore to develop faster synthesis methods, even if this means tolerating a margin of error. In fact, in the first prototypes, anomalies such as insertions, deletions and substitutions of bases were introduced when writing the DNA, with similar errors occurring during sequencing. “Sequencing makes it possible to decipher the DNA, but reading it is not very reliable,” says the researcher. “This is where channel coding comes into play once again: in an attempt to retrieve the information, or even protect it.”
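A toy simulation helps to see why these three error types are difficult: at each position, a spurious base may be inserted, the base may be deleted, or it may be substituted. The probabilities and mechanism below are assumptions for illustration, not the statistical model developed by the team.

```python
import random

def ids_channel(seq, p=0.05, alphabet="ACGT", seed=None):
    """Toy insertion/deletion/substitution channel: each event occurs
    independently with probability p at every position."""
    rng = random.Random(seed)
    out = []
    for base in seq:
        if rng.random() < p:                  # insertion of a random base
            out.append(rng.choice(alphabet))
        r = rng.random()
        if r < p:                             # deletion: the base is lost
            continue
        if r < 2 * p:                         # substitution by another base
            out.append(rng.choice([b for b in alphabet if b != base]))
        else:                                 # base transmitted correctly
            out.append(base)
    return "".join(out)
```

Unlike the substitution-only noise typical of wireless channels, insertions and deletions change the length of the sequence and shift every subsequent position, so the decoder must also resynchronize — one reason existing channel models were not precise enough for DNA storage.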
With her team, she designed a statistical model to characterize and correct the errors specific to these problems. “In channel coding for wireless telecommunications, the models are well established and understood. There were a few for in-memory computing, but those available for DNA data storage were not precise enough to be used.” To develop a suitable model, the researcher was able to use data – original messages and readings – provided by a bioinformatics team, already a partner on a previous project funded by Labex Cominlabs, dnarXiv. This model, incorporated into the decoder, made it possible to correct errors in both synthesis and sequencing.
An innovative approach to data compression
While Elsa Dupraz’s work mainly focuses on creating channel codes to improve transmissions, she also explores more unusual uses for this technique, such as data compression, which she first investigated during her PhD. Typically, channel coding introduces redundancy, lengthening a message, but the process can also be run in reverse in order to compress one.
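One classical way to run a channel code "in reverse" is syndrome-based compression: instead of sending a 7-bit word, the sender transmits only its 3-bit syndrome under the parity-check matrix of the Hamming(7,4) code, and a receiver holding a correlated copy (differing in at most one bit) recovers the original exactly. This is a minimal textbook sketch of the idea, not the specific method from her thesis; practical schemes use much longer codes such as LDPC.

```python
# Parity-check matrix of the Hamming(7,4) code: column j is j in binary
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(x):
    """Compress a 7-bit word x into its 3-bit syndrome H·x (mod 2)."""
    return [sum(h * b for h, b in zip(row, x)) % 2 for row in H]

def decompress(s_x, y):
    """Recover x from its syndrome plus side information y, assuming
    x and y differ in at most one position."""
    s_e = [a ^ b for a, b in zip(syndrome(y), s_x)]   # syndrome of the difference
    pos = s_e[0] + 2 * s_e[1] + 4 * s_e[2]            # 0 means y already equals x
    x = list(y)
    if pos:
        x[pos - 1] ^= 1                               # flip the mismatched bit
    return x

x = [1, 0, 1, 1, 0, 0, 1]
y = list(x)
y[4] ^= 1                       # correlated side information at the receiver
assert decompress(syndrome(x), y) == x
```

Seven bits shrink to three, and it is precisely this kind of structured, code-based compression that can leave data in a form learning models may exploit more easily than conventionally compressed data.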
Although this method is less efficient than conventional compression techniques, it better preserves the structure of the data, facilitating its processing by learning models. Standard compression techniques alter the structure of the data, making it difficult to exploit for machine learning. Elsa Dupraz is studying the possibility of directly applying AI models to compressed data using channel codes. Proof, if proof were needed, of the great versatility of this technique and its researcher.