
Post-doc in “Ethical multiagent reinforcement learning: a normative-based approach” – post-doc contract of 15 months
- On-site
- Saint-Étienne, Auvergne-Rhône-Alpes, France
- Computer Science
Job description
Joining Mines Saint-Étienne means committing to an institution where science and innovation build a more sustainable future. It is a school of excellence where everyone has the opportunity to unlock their full potential and contribute to tackling the challenges of tomorrow.
Ranked among the top engineering schools in France and recognized worldwide, our school, a member of the Institut Mines-Télécom, educates the talents of tomorrow while actively addressing major industrial, digital, and environmental challenges. By joining us, you become part of a community of 500 staff members and 2,500 students, and take part in an ambitious project: combining academic excellence, cutting-edge research, and positive societal impact.
The Institut Mines-Télécom brings together France’s leading Grandes Écoles to tackle major industrial, digital, energy, and environmental challenges. With its eight public Grandes Écoles and two affiliated Graduate Schools, it is the leading public institute dedicated to training engineers and managers. Together, we imagine and build a sustainable future by educating the leaders who will shape tomorrow’s transitions.
🔍 What we expect from you
As a post-doc in Ethical Multiagent Reinforcement Learning, you will be at the heart of our research and innovation missions, assigned to the Laboratory of Informatics, Modelling and Optimization of the Systems (LIMOS, UMR 6158 CNRS) and hosted at the Institute Henri Fayol Education and Research Centre. Within this centre, you will combine your passion for science with your commitment to societal impact.
The Institute Henri Fayol focuses on current transformations in the digital, ecological and industrial transitions that are at the heart of the efficiency, resilience and sustainability of industry and territories. It develops a multi-disciplinary strategy combining strong skills in mathematical and industrial engineering, computer science and intelligent systems, environmental and organizational engineering, and responsible management and innovation, in conjunction with the EVS UMR 5600, LIMOS UMR 6158 and COACTIS research units.
The post-doc position is part of the ANR ACCELER-AI project (Adaptive Co-Construction of Ethics for LifElong tRustworthy AI, ANR-22-CE23-0028-01) [https://projet.liris.cnrs.fr/acceler-ai/], which aims to produce hybrid AI systems capable of learning behaviours in line with moral values, in co-construction with users. ACCELER-AI is a collaborative project among the University Claude Bernard Lyon 1 – Laboratoire d’InfoRmatique en Image et Systèmes d’information (LIRIS), the Lyon Catholic University (UCLy), and Mines Saint-Étienne – Laboratory of Informatics, Modelling and Optimization of the Systems (LIMOS), represented by the Institute Henri Fayol.
The growing importance of Artificial Intelligence (AI) systems in our daily lives makes it crucial and pressing to ensure that they are in line with (moral) values and respect social and legal norms. By interacting with humans and, more generally, by being immersed in our societies, these systems have a direct impact on our lives. This urges AI researchers to develop more ethically capable systems, shifting from ethics in design [2] to ethics by design, with "explicit ethical agents" [4] able to behave ethically thanks to the integration of reasoning on, and learning of, ethics.
Reinforcement learning (RL) methods enable agents to learn to make decisions, but they do not guarantee safety or compliance with ethical values or with legal and social norms. Safe reinforcement learning (SRL) [3] has been proposed to ensure reasonable system performance and respect for safety constraints during learning or deployment. Shield RL [1] monitors the environment and prevents the execution of actions that would violate formally defined safety constraints or norms. Norm Guided RL [5] uses a normative supervisor that evaluates the agent's potential actions against a normative system and informs the agent's decision-making process; the agent may still choose to perform an action that violates safety constraints or norms.
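To illustrate the difference between these two styles, here is a minimal Python sketch; it is not code from the project or from [1, 5], and the Q-value function, norm checker and penalty weight are hypothetical. A shield removes non-compliant actions before the choice is made, whereas a supervisor only penalises them, so the agent may still override.

```python
# Minimal sketch (hypothetical names, not code from [1] or [5]): a "shield"
# removes non-compliant actions before the choice is made, while a normative
# supervisor only penalises them, so the agent may still override.

from typing import Callable, Hashable, Iterable, List

State = Hashable
Action = Hashable

def shielded_choice(state: State,
                    actions: Iterable[Action],
                    violates: Callable[[State, Action], bool],
                    q: Callable[[State, Action], float]) -> Action:
    """Shield-style control: norm-violating actions are never executed."""
    allowed: List[Action] = [a for a in actions if not violates(state, a)]
    if not allowed:                       # no compliant action left: fall back
        allowed = list(actions)
    return max(allowed, key=lambda a: q(state, a))

def supervised_choice(state: State,
                      actions: Iterable[Action],
                      violates: Callable[[State, Action], bool],
                      q: Callable[[State, Action], float],
                      penalty: float = 10.0) -> Action:
    """Supervisor-style control: violations lower an action's score, but a
    sufficiently valuable violating action can still be chosen."""
    def score(a: Action) -> float:
        return q(state, a) - (penalty if violates(state, a) else 0.0)
    return max(actions, key=score)
```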
In the context of the ACCELER-AI project, we extended the Norm Guided RL approach proposed in [5] to express norms declaratively, with an ordering of preferences, using Answer Set Programming (ASP). Being declarative, these norms can easily be changed either by human beings or by dedicated agents. Our preliminary results in a simple Pac-Man scenario replicate previous work and corroborate its conclusion that directly influencing the learning process with information about violations of ethical constraints leads to better outcomes than only constraining the agents' action space at run time.
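To make this contrast concrete, the following Python sketch shows one possible way of folding violation information into a tabular Q-learning update rather than using it only to mask actions at run time; it is purely illustrative, with a hypothetical norm checker, and does not reproduce the project's ASP-based supervisor or the Pac-Man set-up.

```python
# Minimal tabular Q-learning sketch (hypothetical environment and norm
# checker). With shape_learning=True, the violation flag is folded into the
# reward used for the update; with False, the agent only ever sees the raw
# environment reward and must rely on run-time action masking alone.

from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable
QTable = DefaultDict[Tuple[State, Action], float]

def q_update(Q: QTable, state: State, action: Action, reward: float,
             next_state: State, next_actions: Iterable[Action],
             violated: bool, violation_penalty: float = 5.0,
             alpha: float = 0.1, gamma: float = 0.95,
             shape_learning: bool = True) -> None:
    """One Q-learning step, optionally informed by a norm-violation flag."""
    if shape_learning and violated:
        reward -= violation_penalty       # the learning signal "sees" the violation
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

# Usage sketch (norms_violated and env are placeholders for a real scenario):
# Q: QTable = defaultdict(float)
# q_update(Q, s, a, r, s2, env.actions(s2), violated=norms_violated(s, a))
```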
The post-doc will further investigate the Norm Guided RL approach in a different scenario and contrast its advantages and disadvantages with those of the Reward Shaping approach in RL [6]. Once this is concluded, the post-doc will extend the proposed approach to bound ethical behaviours in the context of Multi-Agent Reinforcement Learning (MARL) and demonstrate its effectiveness in a prototype demonstrator in the domain of mobility or smart grids.
Reporting to the ACCELER-AI project coordinators and working in collaboration with the other project partners, the post-doc's main tasks will be to:
Design and perform experiments with the Norm Guided RL approach in a single-agent context, considering different scenarios (e.g., harvesting [7]) and contrasting it with Reward Shaping.
Produce a state of the art on mechanisms for bounding agent learning in the context of MARL.
Design novel mechanisms for bounding the learning of ethical behaviours, extending the Norm Guided RL approach to the context of MARL.
Integrate the designed mechanisms and conduct experiments in the prototype demonstrator in the domain of mobility or smart grids.
Propose metrics to assess the effectiveness of the designed mechanisms (illustrative candidates are sketched after this list).
Transfer knowledge and results to the other academic partners.
Disseminate this work in conferences and journals of the domain.
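As noted in the list above, here is a purely illustrative Python sketch of candidate metrics (episode return and norm-violation rate); these are assumptions for illustration, not metrics prescribed by the project.

```python
# Illustrative candidates only: per-episode task return and norm-violation
# rate, aggregated over a set of evaluation episodes.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class EpisodeLog:
    rewards: List[float]      # per-step environment rewards
    violations: List[bool]    # per-step norm-violation flags

def episode_return(ep: EpisodeLog) -> float:
    return sum(ep.rewards)

def violation_rate(ep: EpisodeLog) -> float:
    return sum(ep.violations) / max(len(ep.violations), 1)

def summarise(episodes: List[EpisodeLog]) -> Dict[str, float]:
    n = max(len(episodes), 1)
    return {
        "mean_return": sum(episode_return(e) for e in episodes) / n,
        "mean_violation_rate": sum(violation_rate(e) for e in episodes) / n,
    }
```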
You will report on the activities and results related to your assigned missions.
Your ability to work in a project-based environment and to create links between research and practical applications will be essential for success.
[1] Carr, S., Jansen, N., Junges, S., & Topcu, U. (2023). Safe reinforcement learning via shielding under partial observability. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (AAAI'23/IAAI'23/EAAI'23), Vol. 37. AAAI Press, Article 1654, 14748–14756. doi: 10.1609/aaai.v37i12.26723
[2] Dignum, V. (2019). Responsible Artificial Intelligence: How to Develop and Use AI in a Responsible Way. Springer Nature.
[3] Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., & Wang, J. (2024). A Review of Safe Reinforcement Learning: Methods, Theories, and Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12), pp. 11216-11235. doi: 10.1109/TPAMI.2024.3457538
[4] Moor, J. H. (2006). The Nature, Importance, and Difficulty of Machine Ethics. IEEE Intelligent Systems, 21(4), pp. 18-21. doi: 10.1109/MIS.2006.80
[5] Neufeld, E., Bartocci, E., Ciabattoni, A., & Governatori, G. (2021). A Normative Supervisor for Reinforcement Learning Agents. In: Platzer, A., Sutcliffe, G. (eds) Automated Deduction – CADE 28. CADE 2021. Lecture Notes in Computer Science, vol 12699. Springer, Cham. doi: 10.1007/978-3-030-79876-5_32
[6] Laud, A. D. (2004). Theory and Applications of Reward Shaping in Reinforcement Learning. [Doctoral dissertation, University of Illinois at Urbana-Champaign] ProQuest Dissertations & Theses Global.
[7] Woodgate, J., & Ajmeri, N. (2025). Combining Normative Ethics Principles to Learn Prosocial Behaviour. Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 1-3.
Job requirements
🔍 What we are looking for
You hold a PhD (or equivalent doctorate) in Computer Science or a related field and possess:
Scientific expertise in Reinforcement Learning, Multi-Agent Systems, and/or Normative Systems.
Strong expertise in scientific publication and research.
Proficiency in English (C1 level) and, ideally, international experience.
Your additional strengths:
A CNU qualification in sections 26, 27, or 61 (viewed favourably).
Collaborations with industry and applied research experience.
Proven commitment to scientific outreach and public engagement.
🌍 Why join Mines Saint-Étienne?
We support each of our employees on the path to excellence, with the conviction that together, we can have a lasting and significant impact on our world.
Joining Mines Saint-Etienne is the opportunity to find:
A stimulating environment: cutting-edge experimental resources, a welcoming workplace and a solid international network (T.I.M.E., EULIST).
A real impact: contractual research projects worth €11 million/year, mainly with industrial partners.
An incomparable quality of life: 49 days of paid leave, partial remote working, 75% reimbursed public transport, financial support for carpooling and cycling and a social barometer where 83% of employees praise the quality of life at work.
Let's build a more sustainable future, through science, engineering, and projects that make sense.
📩 Apply now!
Deadline: 31 August 2025
Submit your application via our dedicated platform: https://institutminestelecom.recruitee.com/o/postdoc-in-ethical-multiagent-reinforcement-learning-a-normative-base-approach-post-doc-contract-of-15-months
Your application should include:
A cover letter,
A CV detailing your teaching activities, research work and, if possible, your links with the industrial and economic world,
If possible, one or more letters of recommendation,
A copy of your diplomas (PhD or doctorate),
A copy of an identity document.
Desired start date: 1 November 2025
ℹ️ Additional information
Nature and duration of the contract: post-doc position on a fixed-term contract of 15 months
Location: Saint-Étienne (42)
Remuneration set according to the candidate's profile, in line with the rules defined by the management framework of the Institut Mines-Télécom
The positions offered for recruitment are open to all; accommodations for candidates with disabilities are available on request
Position open to civil servants (category A) and/or people on public contracts
All applications may be subject to an administrative investigation
https://www.mines-stetienne.fr/recherche/centres-et-departements/institut-henri-fayol/
Contacts:
o On the content of the position:
Luis Gustavo NARDIN, Associate Professor (Researcher at LIMOS UMR 6158)
Email: luisgustavo.nardin@emse.fr
Tel.: +33 (0)4 77 49 97 00
o For HR and administrative aspects:
Amélie HUCHET – HR Officer
Email: amelie.huchet@emse.fr
Tel.: +33 (0)4 77 42 93 05
Recruitment process (expected dates):
o Application deadline: 31 August 2025
o Interviews: 1–15 September 2025
o Desired start date: 1 November 2025 (flexible)
Other relevant resources
o ACCELER-AI project: https://projet.liris.cnrs.fr/acceler-ai/
o Institute Henri Fayol: https://fayol.wp.imt.fr
o Laboratory of Informatics, Modelling and Optimization of the Systems (LIMOS): https://limos.fr
o Laboratoire d’InfoRmatique en Image et Systèmes d’information (LIRIS): https://liris.cnrs.fr
o Lyon Catholic University (UCLy): https://www.ucly.fr