AI Operations Analyst – Model Output Evaluator
As an AI Operations Analyst at Innodata, I conducted model output evaluations and safety benchmarking for large language models. My work focused on identifying hallucination patterns, refining instruction-following, and benchmarking model robustness using Multimango and proprietary tools. I performed high-complexity evaluations, multi-modal consistency checks, and ranked outputs for coding and logical tasks.
• Evaluated 500+ model-generated text outputs for logical accuracy and hallucinations.
• Refined and tested over 200 complex prompts against adversarial and instruction-following scenarios.
• Benchmarked model performance across text, image, and audio modalities for cross-modal accuracy.
• Ensured 95% alignment with gold-standard evaluation metrics through systematic analysis.