Machine Learning for Vehicular Network

Keywords: Machine Learning · VANET · Detection · V2X · Federated Learning · Autonomous Car

*) Context

With the development of vehicular communication techniques, novel communication systems and concepts such as Vehicular Ad hoc Networks (VANET), Vehicle-to-Everything (V2X), and the Internet of Vehicles (IoV) have been proposed and are being deployed in many countries. The European Commission strongly encourages road operators to work closely together on several Cooperative ITS (C-ITS) deployment projects such as Scoop@F, InterCor, C-Roads Platform and INDID. For example, the C-Roads Platform is dedicated to harmonizing C-ITS deployment activities across Europe. The goal is to deploy secure and interoperable cross-border C-ITS services for road users.

The most challenging requirements of IoV networks concern privacy and security. Cyberattacks that aim to disturb systems like V2X can degrade the efficiency and robustness of the whole Cooperative Intelligent Transport System (C-ITS) [GC20] [Lam+21]. Studying the security and privacy issues of these novel systems has therefore become an important research direction.

To attain the benefits of safe driving and efficient traffic, beacon messages containing vehicle status information (position, speed, ...) are broadcast periodically and publicly in V2X applications. Nevertheless, this exchange potentially jeopardizes the privacy of drivers and makes vehicle tracking feasible if all this information is collected and analyzed.

In VANET, attacks aim to corrupt routes to vehicles and thereby interrupt communication in the network, which may lead to accidents in some cases. Attacks can be divided into two groups: passive and active. Passive attacks involve eavesdropping on traffic to analyze essential routing information. This type of attack is difficult to detect, and defense mechanisms are not effective against it.
Active attacks use previously gathered information to create specially crafted packets with the objective of causing some kind of disruption in the network. Many known attacks on vehicular networks can be mentioned here, such as Sybil [Zha+14], DoS and jamming attacks [LQL19].

In this thesis, we focus on misbehavior detection mechanisms in vehicular networks, which are becoming a prominent area of research in VANET. Our first research direction will use Federated Learning (FL) [Yan+19] based ML techniques to study the security issues in the C-ITS scenario and to design further security standards, protocols and management schemes that improve the security of C-ITS. The second direction will cover the security and privacy issues of the FL system itself. Since data privacy in FL is known to be threatened by gradient leakage attacks [ZLH19], ensuring a robust and secure FL system deployed in C-ITS is a challenging task. The third direction will cover other state-of-the-art intelligent methods to be used for security research in future C-ITS [Pit+19].

*) Objective

The idea of this research is to use Machine Learning to detect misbehavior in VANET. In the literature, many studies use supervised algorithms and report measurements such as the true positive rate and the accuracy at a chosen threshold, where accuracy is the number of correctly predicted instances divided by the total number of instances. In misbehavior detection, however, the costs of false positive and false negative errors can differ from case to case and can change over time.

In this thesis, we start our research from Machine Learning (ML) enabled security solutions for vehicular networks [URL21] [Qiu+20]. In particular, we aim to study security solutions enhanced by state-of-the-art ML techniques such as Federated Learning (FL) [Yan+19].
As defined in [Yan+19]:

Define $N$ data owners $\{F_1, \ldots, F_N\}$, all of whom wish to train a machine learning model by consolidating their respective data $\{D_1, \ldots, D_N\}$. A conventional method is to put all data together and use $D = D_1 \cup \cdots \cup D_N$ to train a model $M_{SUM}$. A federated learning system is a learning process in which the data owners collaboratively train a model $M_{FED}$, in which process any data owner $F_i$ does not expose its data $D_i$ to others. In addition, the accuracy of $M_{FED}$, denoted as $V_{FED}$, should be very close to the performance of $M_{SUM}$, $V_{SUM}$. Formally, let $\delta$ be a non-negative real number; if $|V_{FED} - V_{SUM}| < \delta$, we say the federated learning algorithm has $\delta$-accuracy loss.

The initial motivation for designing FL is to bridge the gap between different data holders in order to train a more comprehensive ML model, while keeping each party's training data private and ensuring a good level of accuracy. In the C-ITS scenario, each communicating party (car manufacturers, road operators, etc.) is a C-ITS data holder that collects data with different features; training a model over all of these data together would protect road users from many attacks more effectively [Lu+20].

In this thesis, we start our research from ML-enabled security solutions for C-ITS. In particular, we aim to study security solutions enhanced by state-of-the-art ML techniques such as FL, as well as the robustness, safety and security of the ML systems used in intelligent vehicles. In this step, several FL approaches, such as horizontal and vertical Federated Learning, could be tested for detecting vehicular network attacks. We could also address the security of the Federated Learning data set using software security concepts such as software watermarking [ZJ15], that is, embedding a watermark into a program.

The second direction will investigate the safety of using ML techniques in autonomous vehicles.
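
The δ-accuracy-loss definition quoted from [Yan+19] can be made concrete with a small numerical experiment. The sketch below is only an illustration under stated assumptions: a one-parameter linear model, mean squared error standing in for the performance measure V, a FedAvg-style weighted averaging rule, and three synthetic data owners. None of these choices, nor the function names, come from [Yan+19] or from the thesis plan.

```python
import random


def local_update(w, data, lr=0.01, steps=5):
    """A few gradient-descent steps on one owner's private data D_i."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(owners, rounds=50):
    """FedAvg-style loop (assumed rule): only weights leave each owner."""
    w = 0.0
    total = sum(len(d) for d in owners)
    for _ in range(rounds):
        local_ws = [local_update(w, d) for d in owners]
        # Server step: average the local weights, weighted by data set size.
        w = sum(len(d) * lw for d, lw in zip(owners, local_ws)) / total
    return w


def centralized_fit(pooled):
    """Closed-form least squares on the pooled data D = D_1 U ... U D_N."""
    return sum(x * y for x, y in pooled) / sum(x * x for x, _ in pooled)


def mse(w, data):
    """Mean squared error, used here as the performance measure V."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical owners F_1..F_3 holding samples of y = 3x + noise
# drawn from different input ranges (synthetic data, fixed seed).
rng = random.Random(0)
owners = []
for lo, hi, n in [(0.0, 1.0, 20), (1.0, 2.0, 30), (2.0, 3.0, 50)]:
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    owners.append([(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs])
pooled = [pair for d in owners for pair in d]

w_sum = centralized_fit(pooled)    # parameter of M_SUM
w_fed = federated_average(owners)  # parameter of M_FED
v_sum = mse(w_sum, pooled)         # V_SUM
v_fed = mse(w_fed, pooled)         # V_FED
delta = abs(v_fed - v_sum)         # observed accuracy loss
print(f"V_SUM={v_sum:.6f} V_FED={v_fed:.6f} |V_FED - V_SUM|={delta:.2e}")
```

In this toy setting the pooled model is the exact least-squares optimum, so V_FED can never be better than V_SUM, and the printed gap is the accuracy loss paid for keeping each D_i local. Real misbehavior-detection models and non-IID vehicular data would behave differently; the sketch only shows how the two quantities in the definition are compared.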
In autonomous vehicles, intelligent driving systems (e.g. Tesla FSD, Google Waymo) can recognize the driving environment and help drivers make decisions, or take over driving entirely. However, recent research [Hua+20] shows that these ML-based intelligent driving systems are vulnerable because of the robustness issues of ML systems. Attackers could add perturbations to certain objects in the physical world to fool the recognition systems, which could lead to traffic accidents. Therefore, in this thesis, we will investigate this issue from a security perspective to help improve the safety and robustness of autonomous vehicles.

The second direction will also include research on the security and privacy issues of the FL system itself.

The third direction could include other state-of-the-art validation methods for security research in future C-ITS. It is important to validate our results against existing methods. Here, we can extend the scope of our research to study model uncertainty, for example, which helps to find annotation errors as well as misclassifications caused by errors when classifying data. This approach could be useful for detecting some attacks such as jamming [Gon+21]. Such an approach could be applied to logs or to other data such as images.

*) Background of the candidate

We are looking for a candidate with a Bac+5 (Master's level) in Computer Science and a very good background in Machine Learning. A background in Cybersecurity is essential. She or he must have good knowledge of VANET networks and of attacks on such networks. Knowledge of performance evaluation, optimization and modeling will be greatly appreciated, as will programming and simulation skills.
*) Application

Applicants should submit a:
- Cover letter
- Curriculum vitae
- Recommendation letters

For full consideration,

For any question, please contact:
• Mounira Msahli: mounira.msahli@telecom-paris.fr
• Han Qiu: qiuhan@tsinghua.edu.cn
• Gerard Memmi: gerard.memmi@telecom-paris.fr

*) Multidisciplinarity and cotutelle with Tsinghua University, Beijing, China

This thesis will be a cotutelle (joint supervision) between IP Paris and Tsinghua University. It will be supervised by:
• Gerard Memmi is a professor at Telecom Paris. His research covers data protection and privacy, energy profiling of software programs, and security of distributed systems. His research background in the security of distributed systems is a major asset for supervising the security and C-ITS attacks part.
• Han Qiu is an associate professor at Tsinghua University. His research interests include AI security, big data security, applied cryptography, and cloud computing. He will supervise Nan, especially on the Machine Learning and Federated Learning aspects.
• Mounira Msahli is an associate professor at Telecom Paris. Her current research interests include vehicular network security and the use of ML for cybersecurity. She will contribute to the supervision of the autonomous and connected car aspects and of the adaptability of the designed Federated ML solutions to the vehicular context.

Work has already begun in all three areas with the publication of the paper “Topological Graph Convolutional Network-Based Urban Traffic Flow and Density Prediction” in IEEE Transactions on Intelligent Transportation Systems (2020).

*) References

[CLZ19] Haowei Cao, Jialiang Lu, and Nan Zong. “A Multi-User 360-Video Streaming System for Wireless Network”. In: The 17th International Conference on Virtual-Reality Continuum and its Applications in Industry, VRCAI 2019, Brisbane, QLD, Australia, November 14-16, 2019. Ed. by Tomas Trescak et al. ACM, 2019, 24:1–24:5. doi: 10.1145/3359997.3365713.
url: https://doi.org/10.1145/3359997.3365713.
[GC20] Amrita Ghosal and Mauro Conti. “Security issues and challenges in V2X: A Survey”. In: Comput. Networks 169 (2020), p. 107093. doi: 10.1016/j.comnet.2019.107093. url: https://doi.org/10.1016/j.comnet.2019.107093.
[Gon+21] Xinyu Gong et al. “Model Uncertainty Based Annotation Error Fixing for Web Attack Detection”. In: J. Signal Process. Syst. 93.2-3 (2021), pp. 187–199. doi: 10.1007/s11265-019-01494-1. url: https://doi.org/10.1007/s11265-019-01494-1.
[Hua+20] Lifeng Huang et al. Universal Physical Camouflage Attacks on Object Detectors. 2020. arXiv: 1909.04326 [cs.CV].
[Lam+21] Ayyoub Lamssaggad et al. “A Survey on the Current Security Landscape of Intelligent Transportation Systems”. In: IEEE Access 9 (2021), pp. 9180–9208. doi: 10.1109/ACCESS.2021.3050038. url: https://doi.org/10.1109/ACCESS.2021.3050038.
[LQL19] Z. Lu, G. Qu, and Z. Liu. “A Survey on Recent Advances in Vehicular Network Security, Trust, and Privacy”. In: IEEE Transactions on Intelligent Transportation Systems 20.2 (2019), pp. 760–776. doi: 10.1109/TITS.2018.2818888.
[Lu+20] Y. Lu et al. “Federated Learning for Data Privacy Preservation in Vehicular Cyber-Physical Systems”. In: IEEE Network 34.3 (2020), pp. 50–56. doi: 10.1109/MNET.011.1900317.
[Pit+19] Nikolaos Pitropakis et al. “A taxonomy and survey of attacks against machine learning”. In: Comput. Sci. Rev. 34 (2019). doi: 10.1016/j.cosrev.2019.100199. url: https://doi.org/10.1016/j.cosrev.2019.100199.
[Qiu+20] Han Qiu et al. “Topological Graph Convolutional Network-Based Urban Traffic Flow and Density Prediction”. In: IEEE Transactions on Intelligent Transportation Systems (2020).
[URL21] A. Uprety, D. B. Rawat, and J. Li. “Privacy Preserving Misbehavior Detection in IoV Using Federated Machine Learning”. In: 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC). 2021, pp. 1–6. doi: 10.1109/CCNC49032.2021.9369513.
[Yan+19] Qiang Yang et al. “Federated Machine Learning: Concept and Applications”. In: ACM Trans. Intell. Syst. Technol. 10.2 (Jan. 2019). issn: 2157-6904. doi: 10.1145/3298981. url: https://doi.org/10.1145/3298981.
[Zha+14] K. Zhang et al. “Sybil Attacks and Their Defenses in the Internet of Things”. In: IEEE Internet of Things Journal 1.5 (2014), pp. 372–383. doi: 10.1109/JIOT.2014.2344013.
[Zho+11] T. Zhou et al. “P2DAP – Sybil Attacks Detection in Vehicular Ad Hoc Networks”. In: IEEE Journal on Selected Areas in Communications 29.3 (2011), pp. 582–594. doi: 10.1109/JSAC.2011.110308.
[ZJ15] N. Zong and C. Jia. “Software Watermarking Using Support Vector Machines”. In: 2015 IEEE 39th Annual Computer Software and Applications Conference. Vol. 2. 2015, pp. 533–542. doi: 10.1109/COMPSAC.2015.59.
[ZLH19] Ligeng Zhu, Zhijian Liu, and Song Han. Deep Leakage from Gradients. 2019. arXiv: 1906.08935 [cs.LG].