Tasker
The Argon project on DataAnnotation focused on evaluating and improving large language model (LLM) outputs through a structured feedback and annotation process. The core task was to compare and rank AI-generated responses to user prompts against criteria such as accuracy, relevance, helpfulness, and clarity. Annotators worked across a wide range of English-language topics, from technical subjects like programming to general-knowledge tasks. The scope covered thousands of prompt-response pairs: ranking completions, identifying preferred responses, and flagging harmful or biased content, helping refine the model's understanding and alignment.

I contributed high-quality, consistent annotations and critical evaluations that fed directly into LLM training. The project enforced quality assurance through written guidelines, calibration tasks, and reviewer feedback, ensuring that only accurate and reliable rankings were accepted.