AI Evaluation Specialist
• Supported large language model (LLM) training and evaluation through prompt assessment, response ranking, and reinforcement learning from human feedback (RLHF).
• Conducted instruction tuning, safety alignment, and output quality evaluation across diverse AI use cases.
• Applied critical reasoning and domain knowledge to assess model accuracy, factuality, tone, and policy compliance.
• Identified failure modes, hallucinations, and bias patterns, contributing to improvements in model robustness and reliability.
• Worked with structured and unstructured datasets, including conversational data, technical content, and edge-case scenarios.
• Maintained strict data privacy, security, and confidentiality standards while meeting high-volume quality benchmarks.