NLP Data Annotation and LLM Response Evaluation Project
- Worked on a self-driven academic project focused on improving the quality and reliability of language model outputs through data annotation and evaluation.
- Labeled and reviewed text datasets against structured guidelines for tasks such as intent classification, sentiment analysis, and response correctness.
- Evaluated LLM-generated responses for accuracy, relevance, clarity, and consistency with expected outputs.
- Stored annotations and evaluation results in structured formats such as JSON (see the record sketch below).
- Performed quality checks by reviewing edge cases and correcting inconsistent labels to improve dataset reliability (see the consistency-check sketch below).
- Collaborated with peers to refine labeling guidelines and keep annotations consistent across the dataset.
- Gained hands-on experience with prompt testing, structured output validation, and tool-assisted evaluation workflows in Python (see the validation sketch below).
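
A minimal sketch of how one annotation-plus-evaluation record might be stored as JSON. The field names (item_id, task, label, llm_response_scores, annotator, notes) and the 1-5 rubric are illustrative assumptions, not the project's actual schema:

```python
import json

# One annotation record combining the human label and the LLM-response
# rubric scores. All field names here are hypothetical placeholders.
record = {
    "item_id": "utt-0042",
    "task": "intent_classification",
    "text": "Can you reset my password?",
    "label": "account_support",
    "llm_response_scores": {  # assumed 1-5 rubric per evaluation axis
        "accuracy": 5,
        "relevance": 4,
        "clarity": 5,
        "consistency": 4,
    },
    "annotator": "annotator_01",
    "notes": "Edge case: could also be read as a how-to question.",
}

# Append to a JSON Lines file, one record per line, which keeps the
# dataset easy to stream, diff, and spot-check during review passes.
with open("annotations.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```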
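A consistency-check sketch for the quality-review step: items that received conflicting labels from different annotators are flagged for edge-case re-review. The sample triples and the double-annotation setup are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical double-annotated items: (item_id, annotator, label) triples.
annotations = [
    ("utt-0042", "annotator_01", "account_support"),
    ("utt-0042", "annotator_02", "account_support"),
    ("utt-0043", "annotator_01", "billing"),
    ("utt-0043", "annotator_02", "other"),  # disagreement
]

# Collect the distinct labels each item received.
labels_by_item = defaultdict(set)
for item_id, _annotator, label in annotations:
    labels_by_item[item_id].add(label)

# Items with more than one distinct label go back for edge-case review
# and a guideline discussion before the dataset is finalized.
for item_id, labels in sorted(labels_by_item.items()):
    if len(labels) > 1:
        print(f"{item_id}: conflicting labels {sorted(labels)} -> re-review")
```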
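A validation sketch in the spirit of the structured-output checks described above: it verifies that an LLM response parses as JSON and conforms to an expected contract. The required keys, allowed intents, and confidence range are assumptions, not the project's actual contract:

```python
import json

REQUIRED_KEYS = {"intent", "confidence"}  # assumed response contract
ALLOWED_INTENTS = {"account_support", "billing", "other"}

def validate_llm_output(raw: str) -> list[str]:
    """Return a list of problems with one LLM response; empty means valid."""
    problems = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    # Check that all required keys are present.
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    # Check that the predicted intent is one of the allowed labels.
    if data.get("intent") not in ALLOWED_INTENTS:
        problems.append(f"unexpected intent: {data.get('intent')!r}")

    # Check that confidence is a number in [0, 1].
    conf = data.get("confidence")
    if not (isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0):
        problems.append(f"confidence out of range: {conf!r}")
    return problems

# A well-formed response passes; a malformed one collects three problems.
print(validate_llm_output('{"intent": "billing", "confidence": 0.92}'))  # []
print(validate_llm_output('{"intent": "refund"}'))
```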