Data Labeler and Evaluator (LLM)
At Mentis AI, I wrote prompts, performed rubric-based evaluation, labeled data, and analyzed response accuracy, with a focus on finance and private credit. The work involved reviewing AI-generated responses to complex spreadsheets, memos, and presentations for accuracy and appropriateness, and it contributed to improving large language model (LLM) performance in financial contexts.
• Evaluated financial text responses for correctness and coherence.
• Labeled and reviewed AI outputs against provided rubrics in the finance domain.
• Provided prompt engineering feedback to improve model responses.
• Worked closely with AI researchers to support targeted AI optimizations.