AI Model Evaluation & Reasoning Specialist
I performed systematic evaluation of large language model (LLM) outputs for reasoning quality, instruction alignment, and output reliability. My work included rubric-based scoring, logical-consistency checks, and detection of hallucinated or factually inaccurate responses. The process involved comparative ranking of AI-generated text and documentation of the rationale behind each assessment outcome.

• Evaluated AI responses for logical coherence and adherence to detailed instructions.
• Analyzed multi-step reasoning chains, identifying unstated assumptions and logical fallacies.
• Developed structured frameworks to ensure assessment consistency and reproducibility.
• Documented concise justification reports for every evaluation result.