AI Output Evaluation & Prompt Engineering
Evaluated and rated AI-generated outputs across code generation, technical writing, and business documentation while building two AI-native startups. Assessed responses for accuracy, instruction-following, coherence, and reasoning quality, and iteratively refined system prompts based on evaluation findings to improve model behavior.