LLM Evaluator
As an LLM Evaluator, I contributed to the training and evaluation of large language models. My work involved labeling and categorizing text data, ranking model outputs, and performing quality assurance for RLHF pipelines. I developed prompt datasets to enhance the performance and safety of conversational AI systems.
• Conducted LLM prompt evaluation for reasoning, bias, and hallucinations.
• Labeled, classified, and ranked large volumes of text outputs.
• Performed multi-stage QA checks to maintain accuracy above 98%.
• Created and curated datasets to support LLM fine-tuning and evaluation.