LLM Response Evaluation and Prompt Authoring
Contributed to a generative AI training initiative through Outlier's Aether platform, working on human feedback tasks designed to improve the reasoning and alignment of large language models. Wrote and refined prompts across multiple difficulty tiers, then rated and ranked model-generated responses on accuracy, coherence, instruction adherence, and logical consistency. A professional background in contract review and legal research proved particularly useful here: evaluating how well a model interprets dense, structured language, follows conditional logic, and identifies gaps in reasoning requires the same close reading that compliance work demands. Maintained consistent output quality across high-volume workflows while adapting to evolving rubrics and annotation guidelines throughout the project.