LLM Evaluation and Multilingual Text Annotation Project
I worked on large-scale AI training data projects focused on evaluating and improving LLM behavior across multilingual datasets (English, French, and Arabic). My tasks included rating model outputs, checking factual accuracy, assessing reasoning quality, rewriting prompts, and writing high-quality responses for supervised fine-tuning (SFT). I also handled code-understanding and debugging tasks, drawing on my Java/Spring Boot background to evaluate technical explanations, algorithms, and code snippets. In addition, I contributed to image-based verification tasks such as quality checks, classification, and identity/persona verification. Throughout these projects, I followed strict quality guidelines, maintained rating consistency, and delivered high-precision annotations across thousands of examples.
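For illustration, here is a minimal sketch of the kind of code-debugging task I would review, written in Java to match my background. The snippet and its bug are hypothetical, not actual project data: the original version computed a mean rating with integer division, which truncated the result and threw an ArithmeticException on empty input.

import java.util.List;

public class RatingStats {

    // Mean of a list of 1-5 rating scores.
    // Buggy original: "return sum / scores.size();" -- integer division
    // truncated the mean and threw ArithmeticException for an empty list.
    static double meanScore(List<Integer> scores) {
        if (scores.isEmpty()) {
            return 0.0; // guard added during the review
        }
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return (double) sum / scores.size(); // floating-point division keeps the fraction
    }

    public static void main(String[] args) {
        System.out.println(meanScore(List.of(4, 5, 3))); // prints 4.0
        System.out.println(meanScore(List.of()));        // prints 0.0 instead of crashing
    }
}

A typical evaluation of such a snippet would note both the crash on empty input and the silent truncation, then check that the proposed fix addresses each without changing the method's contract.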