Alignerr
I contributed to AI training projects for Alignerr, working across the data labeling lifecycle with Labelbox as the primary annotation tool. My responsibilities included Reinforcement Learning from Human Feedback (RLHF) and supervised fine-tuning (SFT), for which I created high-quality prompt-response pairs to train large language models. I also evaluated and rated LLM outputs for alignment, coherence, and factual accuracy. Projects ranged in size from targeted batches to large-scale datasets of thousands of samples. Additionally, I evaluated and corrected computer programs in languages such as Java and Python, checking logical correctness, code quality, and adherence to best practices. High accuracy and consistency were maintained through internal QA reviews, gold-standard references, and clear labeling guidelines, ensuring the outputs met the strict quality benchmarks set by the client.