Generative AI Search Evaluation and Quality Assurance
Evaluated the accuracy and safety of AI-generated responses for a major Large Language Model (LLM) project. Tasks included ranking multiple model outputs on truthfulness, helpfulness, and harmlessness. Performed detailed side-by-side comparisons and provided linguistic feedback to refine the model's reasoning capabilities. Adhered to strict quality guidelines, with a focus on identifying hallucinations and ensuring factual consistency across large datasets.