AI Response Evaluator and Prompt Engineer
I worked on AI response evaluation projects, assessing the quality, accuracy, relevance, and safety of AI-generated text. Tasks included comparing multiple responses to the same prompt, ranking them by quality, identifying factual errors, detecting bias or harmful content, and providing structured feedback to improve model performance; an illustrative record format is sketched after this summary.

I also worked on prompt engineering and optimization, crafting and refining prompts to elicit more accurate, useful, and aligned responses. This involved testing alternative phrasings, adjusting instructions, and analyzing outputs for consistency across repeated runs (see the variant-testing sketch below).

In data labeling, I handled text classification tasks including topic tagging, sentiment analysis, and intent recognition. I kept labels consistent across large datasets by following detailed annotation guidelines, and I reviewed and corrected mislabeled data to improve dataset quality.

Quality measures included double-checking ambiguous cases, adhering to platform-specific guidelines, and participating in calibration exercises to stay aligned with client expectations; one common way such exercises are scored is sketched at the end.
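To make "structured feedback" concrete, here is a minimal sketch of the kind of evaluation record a pairwise comparison workflow can produce. The field names are my own illustration, not any platform's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ResponseEvaluation:
    """One annotator judgment comparing two responses to the same prompt."""
    prompt_id: str
    response_a: str
    response_b: str
    preferred: str                                      # "A", "B", or "tie"
    accuracy_issues: list[str] = field(default_factory=list)
    safety_flags: list[str] = field(default_factory=list)
    rationale: str = ""                                 # free-text justification

# Hypothetical example of a filled-in record.
ev = ResponseEvaluation(
    prompt_id="p-0042",
    response_a="...",
    response_b="...",
    preferred="A",
    accuracy_issues=["B misstates the release year"],
    rationale="A is factually correct and answers the question directly.",
)
```

Keeping the preference, the specific issues, and the free-text rationale in separate fields makes downstream aggregation and reviewer auditing straightforward.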
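Prompt-variant testing can be organized in a similarly structured way. The sketch below assumes a hypothetical query_model function standing in for whatever completion API a given platform exposes; it simply collects repeated outputs per phrasing so their consistency can be compared side by side:

```python
def query_model(prompt: str) -> str:
    # Placeholder: swap in the actual model/API call for the platform in use.
    raise NotImplementedError("replace with the real completion call")

# Three phrasings of the same instruction, to be compared for consistency.
variants = [
    "Summarize the following article in three sentences:\n{text}",
    "In exactly three sentences, summarize this article:\n{text}",
    "Give a three-sentence summary of the article below:\n{text}",
]

def test_variants(text: str, runs: int = 3) -> dict[str, list[str]]:
    """Collect repeated outputs per variant so consistency can be assessed."""
    return {
        v: [query_model(v.format(text=text)) for _ in range(runs)]
        for v in variants
    }
```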
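Calibration exercises are typically scored with an inter-annotator agreement metric. As one example, Cohen's kappa corrects raw agreement for chance; the self-contained sketch below computes it between one annotator's labels and a reference annotator's on a hypothetical calibration round:

```python
from collections import Counter

def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators' label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical calibration round: my labels vs. a reference annotator's.
mine = ["positive", "negative", "neutral", "positive", "negative"]
reference = ["positive", "negative", "positive", "positive", "negative"]
print(f"kappa = {cohen_kappa(mine, reference):.2f}")  # prints: kappa = 0.67
```

A kappa near 1.0 indicates the annotator is well calibrated against the reference; low values flag guideline sections that need re-reading or discussion.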