LLM Evaluation and AI Data Annotation Specialist (Freelancer)
Evaluated and ranked large language model (LLM) responses against multiple qualitative criteria in a freelance role. Crafted adversarial and multilingual prompts to probe model robustness across multi-turn conversations. Produced structured analytical rationales and documented observed model weaknesses in English and Hindi.
• Assessed LLM responses for relevance, coherence, reasoning quality, and instruction adherence
• Designed pressure scenarios and simulated user personas for realistic multi-turn dialogues
• Provided detailed feedback on logical gaps, hallucinations, and safety issues
• Contributed to multilingual AI model evaluation through Hindi prompt engineering