AI Data Labeling & LLM Evaluation Project
Worked on large-scale AI training and evaluation projects focused on improving Large Language Model (LLM) performance. Evaluated AI-generated text for accuracy, relevance, safety, and instruction-following against detailed guidelines. Wrote prompt–response pairs for supervised fine-tuning (SFT) and performed RLHF-based evaluation, text generation, summarization, and question-answering tasks. Provided structured feedback to reduce hallucinations and improve model reliability, contributing to high-quality datasets used for SFT and evaluation benchmarks. Maintained strong accuracy and productivity in remote, deadline-driven environments.