Data Annotator
On Project Meereen, I was responsible for the preference ranking process. I evaluated multiple model outputs against strict rubrics for truthfulness and helpfulness and provided the feedback used to train reward models. That feedback was essential for aligning the model so that it did not just produce fluent text but followed complex instructions safely and effectively.

I also worked on high-complexity QA tasks that required deep reasoning, including creating gold-standard answer sets for Supervised Fine-Tuning. This meant synthesizing information from multiple sources so that the model could handle both factual retrieval and multi-step logical queries while meeting rigorous quality benchmarks. This work helped reduce hallucinations and ensured that the model gave precise, evidence-based responses.