LLM Content Evaluator and AI Data Labeler
Evaluated and rated LLM-generated content for quality, accuracy, and alignment using rubric-based frameworks. Tasks included scoring outputs, analyzing content for safety and tone, and providing feedback to improve model responses. Worked across Supervised Fine-Tuning (SFT), RLHF, and RLVR workflows for comprehensive model assessment.
• Evaluated chatbot and general AI outputs on criteria such as reasoning depth and prompt alignment.
• Performed reinforcement learning-based reward-signal evaluations to improve alignment.
• Labeled data across multiple batches, contributing directly to instruction-dataset quality.
• Used a range of online productivity and annotation tools for remote AI labeling work.