AI Training & LLM Evaluation Specialist (Freelance) | LILT AI
As an AI Training & LLM Evaluation Specialist at LILT AI, I evaluated LLM-generated outputs for correctness and domain accuracy in various programming and technical domains. I created golden prompt–answer pairs to contribute to RLHF training data and performed model scoring and structured output assessments. Additionally, I conducted audio evaluations for speech prosody and dialect accuracy in different languages.
• Conducted side-by-side (SxS) model ratings and systematic QA for LLM fine-tuning.
• Authored prompt–response pairs and model feedback for RLHF datasets.
• Assessed STT/TTS audio quality in Arabic, English, and French.
• Produced structured reports to reduce hallucination rates and improve generative quality.