Prompt Evaluator / AI Response Reviewer
Worked as an AI Response Evaluator, assessing and improving the quality of large language model (LLM) outputs across technical, analytical, and general knowledge domains.
- Reviewed AI-generated responses against detailed evaluation guidelines, focusing on accuracy, relevance, completeness, safety, and clarity.
- Applied structured reasoning to rate responses, identify factual errors, logical inconsistencies, and gaps in explanation, and provide clear, evidence-based justifications for each evaluation.
- Contributed to improving model performance by analysing prompt-response interactions and suggesting prompt refinements to elicit higher-quality outputs: identifying ambiguous or poorly structured prompts, testing alternative phrasings, and evaluating the impact of these changes on response quality.
- Maintained high annotation accuracy and consistency across large volumes of tasks, with close attention to detail and adherence to strict quality standards.
- Handled edge cases such as biased, unsafe, or misleading outputs by applying policy guidelines to ensure responsible AI behaviour.
- Collaborated within a distributed workflow, meeting tight deadlines while maintaining quality benchmarks.
- Used analytical skills and experience with structured data to track patterns in model performance and support continuous improvement of the evaluation framework (a brief illustrative sketch follows below).
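
A minimal, illustrative sketch of the kind of structured tracking described in the last point, assuming simple per-task rating records; all field names and values here are hypothetical and not tied to any specific evaluation platform:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation records: one entry per rated prompt-response pair.
evaluations = [
    {"domain": "technical", "rating": 2, "issue": "factual error"},
    {"domain": "technical", "rating": 4, "issue": None},
    {"domain": "analytical", "rating": 3, "issue": "incomplete reasoning"},
    {"domain": "general", "rating": 5, "issue": None},
]

# Group ratings by domain to see where model outputs are weakest.
ratings_by_domain = defaultdict(list)
issue_counts = defaultdict(int)
for record in evaluations:
    ratings_by_domain[record["domain"]].append(record["rating"])
    if record["issue"]:
        issue_counts[record["issue"]] += 1

for domain, ratings in sorted(ratings_by_domain.items()):
    print(f"{domain}: mean rating {mean(ratings):.2f} over {len(ratings)} tasks")

# Recurring issue types feed back into prompt refinements and guideline updates.
for issue, count in sorted(issue_counts.items(), key=lambda kv: -kv[1]):
    print(f"{issue}: {count}")
```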