Freelance Finance Domain Expert – Generative AI / LLM Evaluation
Led finance-focused evaluation projects assessing generative AI and LLM performance on real-world financial tasks. Developed scoring rubrics and calibration frameworks, and conducted deep error analysis to improve LLM reasoning quality. Designed and implemented structured feedback loops supporting model fine-tuning and reviewer alignment.
• Created prompt- and rubric-based evaluation workflows for financial models, statements, and risk analysis.
• Conducted preference ranking and in-depth failure mode analysis to isolate and address reasoning flaws.
• Developed annotation guidelines integrating finance domain standards into LLM output evaluation.
• Leveraged platform annotation tools and proprietary/internal evaluation software for structured assessments.