I have a proven track record in AI data labeling, multimodal annotation, and LLM evaluation, with a strong focus on STEM and engineering content. I have expertise in text, image, audio, video, and satellite image classification and have been consistently delivering high-quality annotations to train state-of-the-art AI models. I have experience with self-driving cars (Lidar, vision, and object detection), remote sensing datasets, and scientific data annotation (equations, chemical structures, engineering drawings). Additionally, I have designed and implemented QA rubrics to verify model outputs for accuracy, consistency, and logical coherence in various STEM fields, including physics, math, biology, chemistry, and computational modeling.
What differentiates me is my hands-on experience in both labeling and evaluation, combined with my skills in RLHF, SFT, and prompt engineering. I have been creating well-structured challenges and rubrics that enhance the depth of reasoning and accuracy of AI models. My background in automation, robotics, and computational modeling has provided me with a strong technical foundation, allowing me to approach annotation and evaluation tasks with precision and scientific understanding. My multimodal expertise, paired with my STEM specialization, allows me to deliver accurate, scalable, and domain-relevant training data for state-of-the-art AI systems.
Expert · French · German · English
Labeling Experience
French LLM Data Annotation & Safety Review – Outlier.ai
Labelbox · Text · RLHF · Evaluation Rating
At Outlier.ai, I worked on professional French-language LLM projects that involved creating high-quality French prompts and responses, ranking model outputs, and documenting detailed rationales to support RLHF, SFT, and QA training. I also performed safety reviews and red-teaming in French to identify harmful, biased, or low-quality outputs, ensuring cultural and linguistic accuracy. The project emphasized adherence to strict rubrics, quality assurance protocols, tone/register control, and rigorous evaluation standards for large-scale model development.
2024
English AI Trainer
Scale AI · Text · Prompt Response Writing · SFT
Contributed to the training and improvement of advanced language models through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) by writing, editing, and fact-checking clear and contextually accurate responses across diverse topics.
Developed and optimized English prompts for instruction-based training, ensuring clarity, creativity, and alignment with linguistic and contextual guidelines.
Evaluated and rated AI-generated outputs using detailed rubrics to assess accuracy, clarity, factual integrity, and adherence to project-specific style and safety standards.
Conducted prompt–response pair creation and refinement to enhance model reasoning, creativity, and coherence within both SFT and RLHF workflows.
Performed ranking and comparative analysis of multiple AI responses to identify the best-performing outputs and guide model fine-tuning and reward model calibration.
Reviewed large volumes of AI-generated content for tone consistency and bias mitigation.
2025
Video Data Labeling & QA Evaluation for AI Models
Labelbox · Video · Bounding Box · Emotion Recognition
Primarily performed video labeling and QA on varied datasets to train AI models for computer vision and multimodal understanding tasks. Tasks ranged from frame-by-frame video annotation to bounding boxes, segmentation, object detection, and activity recognition, with an emphasis on high temporal and spatial accuracy. Implemented QA rubrics to evaluate and audit the outputs of AI vision models, including validating consistency, object-tracking quality, and agreement with ground-truth labels. Contributed to projects in the autonomous systems, surveillance analysis, and robotics domains, providing high-quality annotated video data and structured QA evaluations to enhance model reliability, reasoning, and overall performance.
2025
Voice & Speech Data Labeling and QA for AI Training
Scale AI · Audio · Emotion Recognition · Data Collection
Annotated and assessed voice and speech datasets to train models for speech recognition and conversational AI. Responsibilities included transcribing audio, diarizing speakers, classifying intents, and tagging sentiments, with an emphasis on clarity and context. Implemented QA rubrics for transcriptions and responses to audit AI-generated speech and conversational outputs for accuracy, tone, and fluency. Annotated multilingual and accented speech data to support inclusivity and robust performance in global use cases. Delivered high-quality audio datasets and structured QA evaluations that improved model accuracy for voice assistants, NLU, and speech-to-text applications.
2025
STEM & Multimodal AI Data Labeling Project
Labelbox · Image · Bounding Box · Text Generation
Wrote new math and physics prompts from scratch to establish baselines and instruct STEM-focused models in scientific reasoning. Developed tasks covering equations, proofs, problem solving, and applied physics scenarios, with a focus on clarity, logical complexity, and subject-matter accuracy. Labeled and formatted these data into high-quality training corpora that were later used for QA rubric scoring and iterative fine-tuning. This work contributed to more accurate models, fewer reasoning errors, and scalable evaluation protocols for STEM-focused language models.
2025
Education
University of Ottawa
Master of Science, Mechanical Engineering
2023 - 2025
Kwame Nkrumah University of Science and Technology