Independent AI Researcher & Model Evaluator
Designed and executed a tiered protocol for evaluating the output of multiple large language models, analyzing model certainty, authority, hedging, framing, and sensitivity in contested or ambiguous narratives. Applied red-teaming, rubric development, and domain-specific knowledge to assess accuracy, nuance, and risk.
• Crafted structured protocols for systematic model testing.
• Conducted detailed rubric-based evaluations of sensitive topics.
• Applied practitioner knowledge of cultural and historical contexts.
• Explored patterns in LLM behavior through prompt engineering and analysis.