LLM Response Evaluation and Text Quality Labeling
In this project, I evaluated AI-generated text responses for factual accuracy, coherence, and tone. Tasks included classifying responses against quality guidelines, comparing model outputs side by side, and flagging errors or biases in generated content. I worked with multilingual data (English, Portuguese, Spanish) and applied guideline-based criteria to keep annotations consistent and high quality. The project contributed directly to improving the performance of LLMs used in conversational AI systems.
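
To illustrate the shape of this workflow, below is a minimal Python sketch of an annotation record and a pairwise comparison label. The schema, field names, rating scale, and validation logic are hypothetical assumptions for illustration, not the project's actual tooling or guidelines.

    from dataclasses import dataclass, field
    from enum import Enum

    # Hypothetical rubric scale; the real guidelines defined their own criteria.
    class Rating(Enum):
        POOR = 1
        FAIR = 2
        GOOD = 3
        EXCELLENT = 4

    # Outcome of a side-by-side comparison of two model outputs.
    class Preference(Enum):
        A = "response_a"
        B = "response_b"
        TIE = "tie"

    @dataclass
    class ResponseAnnotation:
        """One labeled LLM response; field names are illustrative."""
        response_id: str
        language: str                                     # e.g. "en", "pt", "es"
        factual_accuracy: Rating
        coherence: Rating
        tone: Rating
        issues: list[str] = field(default_factory=list)   # e.g. ["bias", "hallucination"]

    @dataclass
    class PairwiseComparison:
        """Side-by-side comparison of two outputs for the same prompt."""
        prompt_id: str
        preferred: Preference
        rationale: str                                    # guideline clause justifying the choice

    def validate(annotation: ResponseAnnotation, allowed_languages: set[str]) -> None:
        """Basic consistency check before an annotation is submitted."""
        if annotation.language not in allowed_languages:
            raise ValueError(f"unsupported language: {annotation.language}")

    # Example usage:
    ann = ResponseAnnotation(
        response_id="r-001",
        language="pt",
        factual_accuracy=Rating.GOOD,
        coherence=Rating.EXCELLENT,
        tone=Rating.GOOD,
        issues=["minor hallucination"],
    )
    validate(ann, allowed_languages={"en", "pt", "es"})

Structuring annotations as typed records like this is one common way to enforce the consistency the guidelines require: every response is scored on the same dimensions, and invalid labels are rejected before submission.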