Article written by Nesrine Kaaniche, Associate Professor, Télécom SudParis – Institut Mines-Télécom and Aymen Boudguiga, Research Engineer in Cybersecurity, French Atomic Energy and Alternative Energies Commission (CEA)
Artificial intelligence (AI) is spreading throughout our daily lives, transforming fields such as medical diagnosis, transport, finance and security. It is increasingly involved in decision-making through the massive collection and analysis of our data, feeding decision-making algorithms and influencing our choices. This omnipresence raises important ethical and social issues, leading to a complex question with no easy answer: should we trust it?
With the EQUIHid project, we are investigating the potential of federated learning to make health services more equitable and respectful of privacy. Federated learning is a way to collaboratively train artificial intelligence models for a specific task, such as analyzing mammograms for early cancer detection or assessing images of skin lesions for melanoma. Each model is trained on data from a multitude of patients across different hospitals.
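To give a rough idea of how this works, here is a minimal sketch, in Python, of the federated averaging scheme that underlies most federated learning systems. The hospitals, model and data below are hypothetical placeholders, not the models developed in EQUIHid: each site refines a shared model on its own patients, and only the resulting parameters, never the records themselves, are sent back to be averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One hospital refines the shared model on its own patients' data
    (logistic regression trained by gradient descent); raw data never leaves the site."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))          # predicted probabilities
        grad = X.T @ (p - y) / len(y)         # gradient of the logistic loss
        w -= lr * grad
    return w

# Hypothetical local datasets from three hospitals (features, labels).
hospitals = [(rng.normal(size=(200, 5)), rng.integers(0, 2, 200)) for _ in range(3)]

global_w = np.zeros(5)
for _ in range(10):                           # federated rounds
    local_models = [local_update(global_w, X, y) for X, y in hospitals]
    # The server only sees model parameters, never patient records,
    # and averages them (weighted by dataset size) into the new global model.
    sizes = np.array([len(y) for _, y in hospitals])
    global_w = np.average(local_models, axis=0, weights=sizes)
```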
AI models reproducing inequalities
Imagine a model designed to diagnose skin cancer that achieves remarkable accuracy for light-skinned patients but becomes ineffective on darker skin tones. Such a model is inequitable, since it unfairly favors one group of patients over another. How is this possible?
The answer lies in the data used to train the model. If the data is biased, over-representing a certain type of patient, the model will perform better with this group, thus reproducing the bias present in the data.
If the training database is predominantly composed of images of light-skinned patients, the model will be less exposed to pigmentation variations and to forms of cancer that occur more frequently on darker skin. As a result, it will be less effective at diagnosing melanoma in dark-skinned patients. The consequences of such bias can be serious. Patients who receive an erroneous or late diagnosis could see their health prognosis seriously compromised. What’s more, these biases reinforce existing inequalities in the healthcare system, putting already marginalized groups at a disadvantage.
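This mechanism is easy to reproduce on synthetic data. The sketch below, using scikit-learn, trains a single classifier on a hypothetical dataset dominated by one group and then measures accuracy separately for each group; the features, group sizes and "pigmentation shift" are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_group(n, shift):
    """Hypothetical lesion features for one skin-tone group; the decision
    boundary is slightly shifted between groups to mimic pigmentation effects."""
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.8 * X[:, 1] + shift * X[:, 2] > 0).astype(int)
    return X, y

# Training set dominated by group A (e.g., light-skinned patients).
Xa, ya = make_group(5000, shift=0.0)
Xb, yb = make_group(250, shift=1.5)          # under-represented group B
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluate separately on fresh data from each group: accuracy drops for group B.
for name, shift in [("group A", 0.0), ("group B", 1.5)]:
    Xt, yt = make_group(2000, shift)
    print(name, "accuracy:", model.score(Xt, yt))
```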
Such biases are intensified when the large volumes of data on which these models are trained are not representative of the general population. The medical data used comes from hospital consultations, but each facility only sees a partial view of the problem through its local population, and will therefore struggle to obtain an equitable model on its own. One solution is to combine different data sources to enrich them, which is precisely the aim of equitable federated learning.
Equity, privacy and decentralized learning
The principle is that several entities communicate and cooperate directly with each other, without sharing potentially sensitive data and without having to centralize it on a shared site managed by a third party. Data sovereignty is thereby ensured. However, this is not sufficient to guarantee the privacy of patients whose data feeds federated learning. Indeed, even if patient data is never directly exposed, the trained models may leak sensitive health information in the event of a cyberattack.
Let’s take the previous example of a skin cancer diagnosis model. A hacker could query the model to try to infer personal details about a given patient, such as their likelihood of developing the disease. If the model responds with a high degree of certainty, this indicates that the patient in question was probably included in the training set, revealing their predisposition to this disease without any need to access their medical records directly.
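This kind of attack is known as membership inference. The following sketch illustrates its simplest variant, confidence thresholding, on an invented dataset: an overfitted model answers far more confidently about patients it was trained on than about patients it has never seen, and that gap alone betrays who was in the training set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Hypothetical patient records: features plus a "developed melanoma" label.
X = rng.normal(size=(400, 8))
y = (X[:, 0] + rng.normal(scale=0.5, size=400) > 0).astype(int)
X_train, y_train, X_out = X[:200], y[:200], X[200:]  # outsiders never seen in training

# A deliberately overfitted model leaks more about its training set.
model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

def seems_member(record, threshold=0.95):
    """Confidence-thresholding attack: if the model is unusually sure about
    this record, it was probably part of the training data."""
    confidence = model.predict_proba(record.reshape(1, -1)).max()
    return confidence >= threshold

members_flagged = np.mean([seems_member(x) for x in X_train])
outsiders_flagged = np.mean([seems_member(x) for x in X_out])
print(f"flagged as members: train={members_flagged:.2f}, unseen={outsiders_flagged:.2f}")
```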
The EQUIHid project aims to design new federated learning algorithms that are both privacy-friendly and capable of training non-discriminatory models in a decentralized way. As well as developing algorithms, the project also aims to study the problem theoretically, in order to assess how equity impacts the performance of AI models. How does equity interact with federated learning? How does it interfere with privacy-friendly learning? Finally, how do the three concepts interact with each other?
While the fields of equity, privacy and federated learning have each been extensively studied, their interactions are rarely considered in current scientific literature. It is therefore important to find the right balance to solve this three-parameter equation.
Towards the implementation of more equitable models
Researchers at the National University of Singapore (NUS) have demonstrated that equity in machine learning models has a cost in terms of privacy. Moreover, this cost is not evenly distributed: information leaks associated with learning models are significantly greater for disadvantaged subgroups, the very ones for whom equitable learning is crucial. In EQUIHid, we demonstrated that the more biased the training data, the higher the privacy cost of achieving equity for these subgroups.
During the first phase of the project, we explored an existing solution, FairFed, which constructs a learning model from several models with varying levels of equity, with the aim of creating a global model that is more equitable than each individual model. We sought to add further constraints to this approach, specifically respect for privacy. To this end, we developed an initial proposal based on homomorphic encryption and differential privacy techniques.
Homomorphic encryption is a cryptographic technique that enables mathematical operations to be performed on encrypted data without first having to decrypt it, ensuring data confidentiality during processing. Differential privacy, on the other hand, is a mathematical guarantee on statistical computations which ensures that it is very difficult to deduce whether or not a specific individual is present in a data set, even after aggregate statistics are published.
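As a concrete, simplified illustration of how the two techniques fit together, here is a sketch in Python. It assumes the python-paillier (phe) package for the homomorphic encryption part; the hospitals, updates and noise scale are hypothetical, and a real deployment would calibrate the noise to a formal privacy budget.

```python
import numpy as np
from phe import paillier   # assumes the python-paillier package is installed

rng = np.random.default_rng(3)
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Hypothetical model updates from three hospitals (a single parameter each, for brevity).
local_updates = [0.42, -0.17, 0.31]

# Differential privacy: each hospital perturbs its update with Laplace noise
# so that no individual patient's influence can be singled out from the aggregate.
noisy_updates = [u + rng.laplace(scale=0.05) for u in local_updates]

# Homomorphic encryption: updates are encrypted before being sent, yet the server
# can still add the ciphertexts together without ever seeing the cleartext values.
ciphertexts = [public_key.encrypt(u) for u in noisy_updates]
encrypted_sum = sum(ciphertexts[1:], ciphertexts[0])

# Only the key holder can decrypt, and only the aggregate is revealed.
average_update = private_key.decrypt(encrypted_sum) / len(noisy_updates)
print("aggregated (noisy) update:", average_update)
```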
Human-centered AI
Our solution builds on these concepts and makes it possible to train a common model from multiple models, each built on the data of a different entity. It combines them, weighting each one's contribution according to its level of equity. This ensures greater confidentiality of the training data and a more equitable model overall.
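In spirit, the aggregation step looks like the following sketch, where the fairness metric, validation data and weighting rule are simplified stand-ins for the ones actually studied in the project.

```python
import numpy as np

rng = np.random.default_rng(4)

def equity_score(w, X, group):
    """Hypothetical fairness metric: one minus the gap in positive-prediction
    rates between two demographic groups (1.0 means perfectly balanced)."""
    preds = (X @ w > 0).astype(int)
    return 1.0 - abs(preds[group == 0].mean() - preds[group == 1].mean())

def fair_aggregate(local_models, X_val, group_val):
    """Weight each hospital's model by how equitable it is on shared validation
    data, then average: fairer models contribute more to the global model."""
    scores = np.array([equity_score(w, X_val, group_val) for w in local_models])
    return np.average(local_models, axis=0, weights=scores / scores.sum())

# Toy usage: three local models, a validation set, and a binary group attribute.
local_models = [rng.normal(size=5) for _ in range(3)]
X_val = rng.normal(size=(500, 5))
group_val = rng.integers(0, 2, 500)
global_model = fair_aggregate(local_models, X_val, group_val)
```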
In the second phase of the project, we will address the issue of integrity in federated learning, to ensure that the model training process runs correctly and to avoid deviations with potentially serious consequences, such as the generation of a biased model leading to incorrect medical diagnoses, or a massive leak of sensitive data.
The issue of AI and equity has become a priority for European and international institutions. The Artificial Intelligence Act (AI Act), adopted by the European Parliament in March 2024, highlights fundamental rights in terms of data protection, human dignity and non-discrimination. Thus, research into detecting and reducing, or even eliminating, biases in learning models is a key challenge to promote more equitable, human-centered AI.