AI Evaluation Specialist (Contract) – Handshake AI (Project Vela)
As an AI Evaluation Specialist on Project Vela, I evaluated large language model responses for accuracy, instruction adherence, and truthfulness. I designed and implemented multi-turn prompts to systematically surface issues such as hallucinations and logical inconsistencies. I then compared and ranked model outputs against structured, RLHF-aligned evaluation criteria.
• Authored multi-turn evaluation prompts to reliably surface constraint violations
• Analyzed language model response patterns for logical flaws and hallucinations
• Compared and ranked outputs using standardized evaluation frameworks in a production setting
• Applied structured evaluation frameworks to support RLHF processes