
Post-doctoral fellow in model-based reinforcement learning - 24-month contract

  • Hybrid
    • Palaiseau, Île-de-France, France
  • Computer Science

Job description

Who are we?

Télécom Paris, part of the IMT (Institut Mines-Télécom) and a founding member of the Institut Polytechnique de Paris, is one of France's top 5 general engineering schools.

The driving purpose of Télécom Paris is to educate, imagine, and build in order to design digital models, technologies, and solutions for a society and economy that respect people and their environment.


We are looking for our future postdoctoral researcher in model-based reinforcement learning to join the Computer Science and Networks (INFRES) department at Télécom Paris.

Reinforcement learning (RL) has emerged as a useful paradigm for training agents to perform complex tasks. Model-based RL (MBRL), in particular, promises greater sample efficiency and sophisticated planning capabilities by enabling an agent to learn a predictive model of its environment. However, the direct application of current MBRL methods to safety-critical domains, such as autonomous robotics, transportation, or industrial control, is hindered by unresolved challenges.

The core scientific challenge: The limitations of current world models. Standard approaches to MBRL typically learn a monolithic, “black-box” world model, often using a large neural network as a function approximator. While these models can be highly effective for prediction within their training distribution, they suffer from two key limitations for deployment in sociotechnical systems:

  1. Brittleness and unpredictable failures: Learned models are prone to unpredictable failures when the agent encounters unseen states or dynamics (i.e., distributional shift). These failures are difficult to anticipate and can lead to unsafe behavior, as the model’s predictions are no longer reliable.

  2. Lack of verifiability: The learned models are opaque and do not come with formal guarantees. It is not possible to prove that the model will consistently respect fundamental constraints of the real world or be aligned with expected values, such as physical laws, safety rules, or logical invariants. This lack of verifiable correctness is a major barrier to building trustworthy and well-calibrated autonomous systems.

Research focus: Verifiable world models. The research will focus on developing a new class of structured, verifiable world models that integrate the flexibility of deep learning with the rigor of formal methods and compositional reasoning. The core research thrusts of this position are:

• Structured, neurosymbolic models: The research will investigate model architectures that are not learned from a blank slate. Instead, they will be designed to incorporate explicit symbolic knowledge. This could include known physical laws, logical rules, or safety constraints, which are treated as fixed, verifiable components of the model. The learning process then focuses on modeling the more complex, unknown aspects of the environment around these established truths.

• Compositional reasoning for safety: We will explore how a complex world model can be constructed by composing smaller, more specialized sub-models. A key research question is how to formally verify properties of the composite model based on the known properties of its individual components. This provides a modular and scalable path to certifying that the agent’s internal model of the world is, and remains, consistent with its safety specifications.

• Model adaptation: A truly intelligent agent must be able to adapt its understanding of the world from experience. This research will develop a framework for safe model adaptation. This involves creating MBRL algorithms where the agent can propose updates to its own world model structure, but these updates are only accepted after a formal verification step confirms that the new model still adheres to its core safety properties.

• Multitask learning: Task decomposition allows agents to learn transversal skills that can be useful in different contexts. Shared representations, multitask and multiobjective RL paradigms improve generalization. The research in this area will explore how to capture task decomposition in world models to enable multitask specifications with verifiable guarantees.
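
As an illustration of the model adaptation thrust above, here is a minimal sketch of a verification-gated model update. It is only a toy under stated assumptions, not the project's method: the finite state space, the action set, the invariant, and all names (`WorldModel`, `satisfies_invariant`, `accept_update`) are hypothetical, and an exhaustive check over a small finite space stands in for a real formal verification step.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy setting: a 1-D corridor with 5 cells and 3 actions.
STATES = range(0, 5)
ACTIONS = (-1, 0, 1)


@dataclass(frozen=True)
class WorldModel:
    """Toy deterministic world model: maps (state, action) to a next state."""
    step: Callable[[int, int], int]


def satisfies_invariant(model: WorldModel) -> bool:
    """Check that every predicted next state stays inside the corridor.

    Exhaustive enumeration is a stand-in for formal verification; it is
    only feasible here because the state and action sets are tiny.
    """
    return all(model.step(s, a) in STATES for s in STATES for a in ACTIONS)


def accept_update(current: WorldModel, proposed: WorldModel) -> WorldModel:
    """Keep the proposed model only if it still satisfies the invariant."""
    return proposed if satisfies_invariant(proposed) else current


def clipped_step(s: int, a: int) -> int:
    # Fixed symbolic knowledge: walls at both ends of the corridor.
    return min(max(s + a, 0), 4)


def unclipped_step(s: int, a: int) -> int:
    # A proposed update that "forgets" the walls and can leave the corridor.
    return s + a


def sticky_step(s: int, a: int) -> int:
    # A proposed update that changes the dynamics (cell 2 is absorbing)
    # but still respects the walls.
    return 2 if s == 2 else clipped_step(s, a)


current = WorldModel(clipped_step)
unsafe_proposal = WorldModel(unclipped_step)
safe_proposal = WorldModel(sticky_step)

after_unsafe = accept_update(current, unsafe_proposal)  # rejected
after_safe = accept_update(current, safe_proposal)      # accepted
```

In this sketch the unsafe proposal is rejected because it predicts states outside the corridor, while the safe proposal replaces the current model. In a realistic setting the enumeration would be replaced by a verifier suited to the model class.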

The successful candidate will lead the effort to solve these open problems through the development and implementation of RL algorithms. They will have the opportunity to make a significant impact in the field of trustworthy and well-calibrated artificial intelligence (AI) through international collaborations (e.g., UT Austin, MIT).

Your main responsibilities:

  • To carry out research in the field of model-based RL

  • To take on supervision and tutoring duties

  • To contribute to the reputation of the School, the Institut Mines-Télécom and the Institut Polytechnique de Paris

Job requirements

We are looking for a candidate with a solid theoretical understanding of reinforcement learning, accompanied by a strong foundation in mathematics. You must also have proven experience in programming reinforcement learning agents, particularly with tools such as JAX, PyTorch, Gym, etc.

A proven ability to publish in leading scientific conferences and journals is essential, as is an aptitude for sharing and disseminating your knowledge within the team. Finally, you must have a professional level of English in order to thrive in an international environment. You hold a PhD or equivalent.

Why join us?
You'll be working in a fast-growing, pleasant, green and accessible environment (especially for people with disabilities) just 20 km from Paris (RER B and C suburban train lines, close to major roads, shared shuttle departing from Porte d'Orléans). You will benefit from:

  • 49 days of annual leave (CA + RTT)

  • flexible working hours (depending on department activity)

  • telecommuting 1 to 3 days/week possible

  • 75% public transport pass reimbursement

  • Proximity to numerous sports facilities, concierge service, underground parking, in-house catering, etc.

  • Good to know: our social security contributions are lower than in the private sector

Other information:
Application deadline: January 10, 2026
Job type: 24-month fixed-term contract

Scientific contact person: Georgios Bakirtzis (bakirtzis@telecom-paris.fr)
Administrative contact person: Najoua Kharmaze

Funding: This postdoctoral position is partially supported by the chair Architecture of Complex Systems – Dassault Aviation, Naval Group, Dassault Systèmes, KNDS France, Agence de l’Innovation de Défense, Institut Polytechnique de Paris.

Our recruitment is based on skills, without distinction of origin, age, gender identity, or sexual orientation, and all our positions are open to individuals with disabilities.
