AI Model Evaluator / AI Trainer — Independent Contractor
Evaluated large language model outputs for reasoning quality, factual accuracy, safety, and instruction compliance in support of RLHF and model fine-tuning. Conducted pairwise comparisons and preference rankings, and wrote detailed rationales to guide model improvement. Applied taxonomy-based annotation, detected hallucinations and inconsistencies, and authored structured explanations for use by model teams.
• Evaluated GPT-4, Claude, and Gemini outputs across multiple benchmarking criteria.
• Produced structured rationales for model failures and annotated outputs against complex guidelines.
• Conducted dataset quality-assurance reviews and flagged annotation inconsistencies.
• Supported RLHF and fine-tuning pipelines for commercial AI systems.