LLM Response Evaluation and Human Feedback Annotation – Welocalize
Project focused on evaluating and annotating AI-generated text responses to improve the quality and human-likeness of large language models. Tasks included rating responses for relevance, coherence, tone, clarity, and emotional alignment with user intent, as well as ranking multiple model outputs and providing structured human feedback for reinforcement learning from human feedback (RLHF). Additional work included prompt–response writing for supervised fine-tuning (SFT), text classification, emotion recognition, and summarization. Quality was maintained through strict adherence to detailed annotation guidelines, inter-annotator consistency checks, and regular quality reviews, ensuring high annotation accuracy across large-scale datasets.