AI Evaluator / LLM Reasoning Analyst
Responsible for evaluating the reasoning, consistency, and robustness of large language models (LLMs). Performed hallucination detection, logical-flaw identification, and edge-case analysis on AI outputs, and developed structured prompts and workflows for validating model outputs and supporting adversarial testing.
• Evaluated LLM outputs using structured, repeatable methodologies.
• Identified hallucinations, logical inconsistencies, and edge-case failures.
• Designed and implemented prompts to probe model reasoning and reliability.
• Automated analysis workflows using Python and JSON-based tooling (see the sketch after this list).
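A minimal Python sketch of the kind of JSON-based review automation described above. Everything here is an illustrative assumption rather than the actual tooling: the file name `outputs.jsonl`, the record schema, and the heuristic checks are hypothetical stand-ins.

```python
"""Minimal sketch of a JSON-based LLM-output review workflow.

Illustrative only: the file name, record schema, and check heuristics
are hypothetical stand-ins for the tooling described above.
"""
import json
from pathlib import Path


def check_record(record: dict) -> list[str]:
    """Run lightweight consistency checks on one model output; return flags."""
    flags = []
    response = record.get("response", "")
    # Empty or whitespace-only outputs are treated as edge-case failures.
    if not response.strip():
        flags.append("empty_response")
    # Absolute claims are a cheap trigger for closer hallucination
    # review -- a triage heuristic, not a detector by itself.
    for marker in ("always", "never", "guaranteed"):
        if marker in response.lower():
            flags.append(f"absolute_claim:{marker}")
    # Self-contradiction heuristic: "yes" and "no" in one short answer.
    lowered = response.lower()
    if "yes" in lowered and "no" in lowered and len(response) < 200:
        flags.append("possible_contradiction")
    return flags


def review_file(path: Path) -> dict:
    """Read JSON-lines model outputs and aggregate per-check flag counts."""
    summary: dict[str, int] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            for flag in check_record(record):
                summary[flag] = summary.get(flag, 0) + 1
    return summary


if __name__ == "__main__":
    # Hypothetical input: one JSON object per line, e.g.
    # {"prompt": "...", "response": "..."}
    print(json.dumps(review_file(Path("outputs.jsonl")), indent=2))
```

In a workflow like this, heuristic flags only queue outputs for closer human review; they serve as triage signals, not final verdicts.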