LLM Evaluation and Prompt Rating for Human-AI Alignment
As a freelance AI trainer, I contributed to multiple large-scale language model tuning projects focused on human-aligned evaluation. My responsibilities included evaluating AI-generated responses in English and French, performing A/B comparison tasks, rewriting prompts for safety and clarity, and providing structured ratings on tone, helpfulness, logical consistency, and ethical concerns. I also contributed to supervised fine-tuning (SFT) through prompt crafting and response writing, and reviewed model outputs for sensitive content, coherence, and hallucinations. Each task followed strict annotation guidelines and was performed via proprietary web-based platforms (Outlier, Toloka, DataAnnotation). I consistently maintained over 95% agreement with gold-standard labels, helping ensure that output quality met alignment goals for safe LLM deployment.