AI Content Evaluation & Application Development
In this role, I evaluated AI-generated text and code from multiple large language models on a daily basis. I applied prompt engineering techniques to improve AI output quality and accuracy, and assessed AI responses for factual accuracy, helpfulness, relevance, coherence, and potential harm. I also compared outputs across AI models to gauge quality and performance for various application needs.
• Conducted in-depth daily analyses of AI-generated code for correctness and efficiency
• Rated and ranked AI model outputs using LMSYS Chatbot Arena and related tools
• Provided comprehensive feedback on AI-generated content within remote evaluation workflows
• Worked extensively with VS Code and internal/proprietary tooling to test, debug, and refine AI-assisted development workflows