Jiri De Jonghe

AI Engineer - AI Safety Researcher - Data Scientist

[email protected]

Brussels, Belgium

Experience

AI & Data Senior Consultant

EY

September 2022 - Present

Brussels, Belgium

  • GenAI use cases: Retrieval-Augmented Generation (RAG), code generation, Question Answering (QA)...
  • Traditional ML use cases: developed ML models for financial services institutions using XGBoost, CatBoost, and scikit-learn
  • Research: investigated the robustness of NLP models and LLMs against adversarial attacks, leading to two accepted conference papers
  • Led go-to-market (GTM) strategies, facilitated stakeholder management, and contributed to client development through AI-driven insights

Education

Master of Computer Science

KU Leuven

2020 - 2022

Leuven, Belgium

Major in Artificial Intelligence. Master Thesis: Usage of Gaussian Mixture Models for the classification of (sub)genres of (Electronic Dance) Music

Bachelor of Civil Engineering

KU Leuven

2016 - 2020

Leuven, Belgium

Major in Computer Science. Bachelor Thesis: Casting to multi-screen videowall

Academic Publications

Belmoukadam, O., De Jonghe, J., Sassine, N. (2023) AdversNLP: A Practical Guide to Assessing NLP Robustness Against Text Adversarial Attacks. 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning.

Belmoukadam, O., De Jonghe, J., Ajridi, S. (2024) AdversLLM: A Practical Guide to Governance, Maturity and Risk Assessment for LLM-based Applications. NLAICSE 2024.

Notable Projects

LLMFromScratch

Recreating an LLM completely from scratch, written in C. The goal is to be a learning experience for people at all levels of technical expertise. We start all the way back from the perceptron and gradually build up to an LLM. Alongside the code, I provide clear explanations on two levels: one for less technical people who just want to understand how it works and develop an intuition, and one for people who want to dive into the technical details and are familiar with the basics of mathematics.

InstaNER

A CLI tool that automates the entire process of creating a Named Entity Recognition (NER) model: identifying the needed entities, creating train and test data, training a transformer-based model on the data, evaluating the results, and loading the model for inference, all while ensuring reproducibility. The goal is to provide an easy-to-use accelerator that allows non-technical people to start training their own NER models, and to show how synthetic data generation can speed up the model creation process.

Skills

Programming Languages

Python

Proficient

Python has been my bread and butter as a data scientist and AI engineer. I've used it extensively in both industry and academia.

C

Proficient

To understand anything low-level, C has been my go-to language. It forces me to think about the nitty-gritty details, which deepens my understanding of the subject. I've used it extensively for personal projects.

Go

Intermediate

Currently upskilling in Go with the goal of building future-proof systems. I believe it to be a better language for building AI software systems than Python, which is currently still the most used language. Mainly used for personal projects, but I've also used it in industry.

HTML / CSS / JavaScript / TypeScript / ReactJS

Intermediate

Learned enough front-end development to build full-stack applications. I've used these technologies occasionally in industry.

Frameworks and Tools

Tooling

Jupyter Notebooks, Neovim, VS Code, Cursor, Git, GitHub, Docker, Nginx, tmux, LaTeX, (Arch) Linux

Frameworks

NumPy, Pandas, Polars, PyTorch, TensorFlow, Keras, Transformers

Certifications

DP-100: Azure Data Scientist Associate

AZ-900: Azure Fundamentals

Natural Languages

Dutch - Native

English - Fluent

French - Intermediate

Spanish - Intermediate

German - Basic