Text Evaluation and Prompt Rating for Large Language Models
I worked on a range of LLM evaluation projects: rating chatbot responses, comparing answer quality, classifying tone and appropriateness, and rewriting prompts for model fine-tuning. I also performed named entity recognition (NER), summarization, and content moderation tasks. My annotations helped train safer, more coherent language models, and I was consistently selected for high-trust tasks such as alignment tuning and red teaming.