Postdoctoral position in test data generation by traffic morphing - 18-month contract

Job description


ABOUT TELECOM SUDPARIS

Telecom SudParis is a public graduate school of engineering, recognized at the highest level in the field of digital technology. The quality of its courses is founded on the scientific excellence of its faculty and on teaching methods that emphasize project management, innovation and intercultural understanding. Telecom SudParis is part of the Institut Mines-Telecom, the leading group of engineering schools in France, under the supervision of the Ministry for Industry. Together with Ecole Polytechnique, ENSTA Paris, ENSAE Paris and Telecom Paris, Telecom SudParis is a co-founder of the Institut Polytechnique de Paris, an institute of science and technology with an international vocation.

Its assets include a personalized curriculum, varied career opportunities, the no. 3 incubator in France, an ICT research center, an international campus shared with Institut Mines-Telecom Business School, and over 60 student societies and clubs.


MISSIONS

The evaluation of security mechanisms is a pillar of product certification, yet security monitoring suffers from a lack of methods and platforms for reproducible evaluation. In particular, to ensure a high level of security, intrusion detectors must be exposed to a sufficiently large, diverse and realistic sample of behaviour, both malicious, to assess their detection capability, and normal, to ensure that they generate few or no false positives.

Another important aspect of the evaluation is measuring the detector's ability to scale. Test data generation can therefore supplement existing test sets, bringing them to the critical size needed for stress testing.

The data generated in this way complements synthetic traffic generation: by learning patterns of legitimate or malicious traffic, it can produce data sets faster and at a larger scale (its effectiveness remains an open question) that are similar to the input data sets but have characteristics likely to induce detection errors. This type of traffic is well suited to testing intrusion detectors, especially those based on machine learning methods.

In this project, we aim to generate new (previously unseen) samples, particularly for network intrusion detectors, by modifying existing traffic. This approach can produce normal traffic that may be classified as malicious (false positive), or malicious traffic that may bypass the detector (false negative). For the latter case, this approach has already proven effective when applied to the transformation of Android malware. A further objective is to scale up the evaluation of intrusion detectors by generating a critical volume of test data and adapting the traffic to a target environment. This generation will have to fit into a data-based evaluation methodology for intrusion detectors, covering not only the formalisation of the properties to be evaluated but also the approaches to dataset construction (selection, generation, representation, quality, etc.) and the measures used to evaluate them. A known pitfall, however, is that many generation methods have shown limitations, notably in the realism and practicality of the data they generate.


ACTIVITIES

We propose to take inspiration from the traffic morphing approach, which transforms the shape of a network flow so as to evade statistical analysers. However, this approach remains very limited (it only modifies the packet size distribution) and requires knowledge of the target distribution. Our approach relies on generative neural networks to produce a greater diversity of traffic. For this purpose, variational autoencoders can be used to reproduce traffic that appears to come from the same distribution without being identical to it. Another option would be to use Natural Language Processing (NLP) methods to generate traffic in the style of reference traffic. A further application is to generate or transform malicious traffic so that it is harder to detect.
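To make the packet-size idea concrete, here is a minimal Python sketch of a naive, padding-only form of traffic morphing. It assumes the target distribution is known as a mapping from packet size (bytes) to probability, which is exactly the requirement noted above. It is not the optimised mapping used in the traffic morphing literature; the function name `morph_sizes` and the example values are hypothetical.

```python
import random
from typing import Dict, List, Optional


def morph_sizes(sizes: List[int], target_dist: Dict[int, float],
                rng: Optional[random.Random] = None) -> List[int]:
    """Naive padding-only morphing (illustrative assumption, not an optimised method).

    Each packet is padded to a size drawn from target_dist, restricted to
    target sizes >= the original size so that no payload is truncated.
    Packets larger than every target size are left unchanged; a complete
    morpher would split them instead.
    """
    rng = rng or random.Random(0)
    # Keep only target sizes with non-zero probability, in ascending order.
    support = sorted(t for t, p in target_dist.items() if p > 0)
    morphed = []
    for s in sizes:
        candidates = [t for t in support if t >= s]
        if not candidates:
            morphed.append(s)  # cannot pad down; a real morpher would split
            continue
        weights = [target_dist[t] for t in candidates]
        morphed.append(rng.choices(candidates, weights=weights, k=1)[0])
    return morphed


if __name__ == "__main__":
    source = [60, 60, 1500, 400, 1600]          # observed packet sizes (bytes)
    target = {100: 0.5, 576: 0.3, 1500: 0.2}    # hypothetical target distribution
    print(morph_sizes(source, target))
```

Padding only ever increases packet sizes, which preserves the payload but adds bandwidth overhead; choosing the mapping that minimises this overhead is what an optimised formulation of traffic morphing addresses.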

The approach first aims to identify, for each detector, the boundaries between classes with respect to the classification parameters, and then to propose transformations that reduce the distance between attack traffic (or packets) and legitimate traffic. The main challenge is that such a transformation may degrade the harmfulness of the resulting traffic.
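The idea of moving attack traffic towards legitimate traffic while preserving what makes it harmful can be sketched as follows. This is a minimal Python illustration under strong assumptions: the detector is a toy linear classifier, each flow is summarised by three hypothetical features (mean packet size, mean inter-arrival time, payload entropy), and "harmfulness" is crudely modelled by freezing the payload-related feature. The weights, step size and function names are invented for the example.

```python
from typing import List, Sequence, Tuple


def detector_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Toy linear detector (assumption): score > 0 means 'malicious'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def move_towards_legit(x: Sequence[float], legit_centroid: Sequence[float],
                       mutable: Sequence[bool], w: Sequence[float], b: float,
                       step: float = 0.1,
                       max_iter: int = 100) -> Tuple[List[float], bool]:
    """Shift the mutable features of x towards the legitimate centroid until
    the toy detector labels the sample benign (score <= 0) or max_iter is hit.

    Immutable features stand in for properties the attack needs in order to
    remain harmful; they are never modified.
    """
    x = list(x)
    for _ in range(max_iter):
        if detector_score(w, b, x) <= 0:
            return x, True
        for j, is_mutable in enumerate(mutable):
            if is_mutable:
                x[j] += step * (legit_centroid[j] - x[j])
    return x, detector_score(w, b, x) <= 0


if __name__ == "__main__":
    # Features: [mean packet size (bytes), mean inter-arrival time (ms), payload entropy]
    w, b = [0.01, -0.5, 1.0], -15.0
    attack = [1200.0, 2.0, 7.5]
    legit = [400.0, 20.0, 7.0]
    mutable = [True, True, False]  # payload entropy is frozen
    print(move_towards_legit(attack, legit, mutable, w, b))
```

Here the boundary is explicit because the detector is linear; for real detectors it would have to be estimated, and whether the transformed traffic still achieves its malicious effect would need to be checked separately.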

Furthermore, we seek to extend the set of generated parameters beyond packet size, to also include temporal parameters such as inter-arrival time. A generalisation of this approach aims to systematically analyse flow- and packet-level parameters in order to identify those that can be altered. This analysis can then open several lines of research for determining relevant transformations or generation methods for each parameter or type of parameter (numerical, textual, categorical, periodic, discrete, continuous, etc.).
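As an illustration of handling parameters by type, here is a minimal Python sketch that applies a type-specific transformation to each parameter of a flow record. The parameter names (`mean_iat_ms`, `pkt_count`, `proto`, `hour_of_day`), the `ParamSpec` schema and the perturbation ranges are hypothetical choices made for the example; textual parameters are omitted, and a real study would derive the transformations from the systematic analysis described above rather than fix them by hand.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ParamSpec:
    """Hypothetical description of one flow- or packet-level parameter."""
    name: str
    kind: str                            # "continuous", "discrete", "categorical" or "periodic"
    choices: Optional[List[Any]] = None  # observed values, for categorical parameters
    period: Optional[float] = None       # cycle length, for periodic parameters


def perturb(value: Any, spec: ParamSpec, rng: random.Random) -> Any:
    """Apply a type-appropriate transformation to a single parameter value."""
    if spec.kind == "continuous":
        # Multiplicative jitter of +/-20%, clipped at zero (e.g. inter-arrival time).
        return max(0.0, value * rng.uniform(0.8, 1.2))
    if spec.kind == "discrete":
        # Small integer offset, clipped at zero (e.g. packet count).
        return max(0, value + rng.randint(-2, 2))
    if spec.kind == "categorical":
        # Resample from the values observed for this parameter.
        return rng.choice(spec.choices)
    if spec.kind == "periodic":
        # Shift by up to 10% of the period and wrap around (e.g. hour of day).
        return (value + rng.uniform(-0.1, 0.1) * spec.period) % spec.period
    raise ValueError(f"unsupported parameter kind: {spec.kind}")


def perturb_flow(flow: Dict[str, Any], schema: List[ParamSpec],
                 rng: Optional[random.Random] = None) -> Dict[str, Any]:
    """Return a transformed copy of flow; parameters absent from schema are kept."""
    rng = rng or random.Random(0)
    out = dict(flow)
    for spec in schema:
        if spec.name in out:
            out[spec.name] = perturb(out[spec.name], spec, rng)
    return out


if __name__ == "__main__":
    schema = [
        ParamSpec("mean_iat_ms", "continuous"),
        ParamSpec("pkt_count", "discrete"),
        ParamSpec("proto", "categorical", choices=["tcp", "udp"]),
        ParamSpec("hour_of_day", "periodic", period=24.0),
    ]
    flow = {"mean_iat_ms": 12.5, "pkt_count": 30, "proto": "tcp",
            "hour_of_day": 23.5, "dst_port": 443}
    print(perturb_flow(flow, schema))
```

Separating the parameter description from the transformation keeps the dispatch easy to extend: a new parameter type only needs a new branch in `perturb`, and parameters judged unsafe to alter (here `dst_port`) are simply left out of the schema.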

Job requirements

TRAINING AND SKILLS

Level of training and/or experience required:

- PhD (Doctorat) obtained less than 3 years ago

Essential skills, knowledge and experience:

- Experience in modelling and/or simulation

- Knowledge of modelling languages and formalisms

- Knowledge of virtualisation and network security

Advantageous skills, knowledge and experience:

- Digital twin experience

Abilities and skills:

- Thoroughness

- Autonomy

- Teamwork

FURTHER INFORMATION
- Deadline for applications: February 28th, 2023

- Nature of the contract: fixed-term contract, 18 months

- Location of the position: Palaiseau (France)

- The positions offered for recruitment are open to all; accommodations for candidates with disabilities are available upon request

- Working conditions: teleworking possible, on-site restaurant and cafeteria, access by public transport (with employer contribution) and close to main roads, staff association