AI Data Analyst and Evaluator, Aurora Studio
As an AI Data Analyst and Evaluator at Aurora Studio, I reviewed and rated large language model outputs across multiple domains, including mathematics, finance, reasoning, and general knowledge. The work involved applying rubric-based scoring, RLHF preference ranking, and detailed written quality justifications to each output, ensuring alignment with gold-standard benchmarks and regulatory requirements. I contributed to model improvement by identifying edge cases and documenting systematic failure patterns.

• Evaluated LLM-generated text for factual accuracy and verified compliance with prompt instructions.
• Applied RLHF preference ranking and rubric-based evaluation to model responses.
• Audited outputs for hallucinations, bias, and mathematical or quantitative errors.
• Identified edge cases and contributed to evaluation-pipeline quality improvements.