AI Model Evaluation & Test Design Specialist
As an AI Model Evaluation & Test Design Specialist, I reviewed and rated language model outputs for reasoning quality and logical consistency. I designed evaluation scenarios, established gold-standard behaviors, and conducted multi-level prompt testing to ensure LLM reliability. My work included quality assurance reviews and structured feedback loops to improve model performance.
• Evaluated complex, multi-turn text interactions for errors, hallucinations, and biases.
• Developed and implemented reproducible annotation frameworks for language model assessment.
• Used Label Studio, Prodigy, and Scale AI to manage annotation workflows and datasets.
• Conducted scenario-based QA reviews to test edge-case handling and instruction following.