LLM Evaluation and Prompt Quality Rating Project
I participated in multiple large-scale AI evaluation and data-labeling projects for natural language models, assessing LLM outputs for accuracy, relevance, factual consistency, and ethical compliance. My work included prompt-response evaluation, hallucination detection, and side-by-side (SxS) comparative rating under strict quality guidelines. Each project demanded consistent attention to linguistic nuance, contextual understanding, and adherence to rating rubrics. I maintained a high quality score and met all AET (Average Evaluation Time) and consistency standards. These projects contributed to improving large-scale AI models used in global applications, with an emphasis on human alignment and responsible AI development.