AI Model Output Evaluation & Quality Assessment
Evaluated AI/LLM-generated outputs for accuracy, instruction adherence, and code correctness across multiple platforms. Tasks included rating model responses for helpfulness and safety, assessing code-generation quality in Python and JavaScript, and evaluating search result quality through page-quality and needs-met assessments. Applied structured rubrics to judge output relevance, factual accuracy, and technical soundness, with a written rationale supporting each rating.
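
For illustration, a rubric-based rating workflow of this kind can be sketched in Python. The dimension names, the 1-5 scale, and the unweighted average below are assumptions made for the sketch, not the actual rubrics used on any platform:

    from dataclasses import dataclass

    # Hypothetical rubric dimensions; real platform rubrics vary.
    RUBRIC = {
        "relevance": "Does the response address the prompt's intent?",
        "factual_accuracy": "Are the verifiable claims correct?",
        "technical_soundness": "Is the code or reasoning correct and idiomatic?",
    }

    @dataclass
    class Rating:
        dimension: str
        score: int       # assumed scale: 1 (poor) to 5 (excellent)
        rationale: str   # written justification accompanying each score

    def overall(ratings: list[Rating]) -> float:
        """Unweighted mean across dimensions; real rubrics may weight
        dimensions such as safety more heavily."""
        if not ratings:
            raise ValueError("at least one rating is required")
        return sum(r.score for r in ratings) / len(ratings)

    # Scoring a single model response against the rubric
    ratings = [
        Rating("relevance", 5, "Directly answers the user's question."),
        Rating("factual_accuracy", 4, "One minor factual slip."),
        Rating("technical_soundness", 5, "Code runs and handles edge cases."),
    ]
    print(f"Overall: {overall(ratings):.2f}")  # Overall: 4.67

Requiring a rationale string alongside each numeric score mirrors how structured rubrics keep ratings auditable: the number alone cannot be checked, but the justification can.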