Generalist AI Training Contractor
As an AI Trainer and LLM Quality Evaluator, I designed and implemented structured multi-axis evaluation rubrics to assess large language model responses. I performed detailed side-by-side output comparisons, ranking and scoring responses on coherence, accuracy, safety, and adherence to instructions. I documented errors, hallucinations, and edge cases to support iterative improvements, refining criteria for consistency and defensibility.
• Designed and refined evaluation rubrics for LLM assessment
• Performed comparative response ranking and rating tasks
• Identified and classified model hallucinations and unsafe outputs
• Produced detailed rationales to support model training and QA