Data Analyst
Evaluating various AI responses and scoring them based on accuracy, directness, and alignment.
I have over two years of hands-on experience in AI training data, annotation, and model evaluation, working with platforms such as Outlier AI and Scale AI on large-scale RLHF pipelines. My work has focused on evaluating and ranking AI-generated responses across domains including software engineering, generalist topics, and embedded systems, ensuring high standards of correctness, instruction-following, safety, and reasoning quality. I consistently identified hallucinations, logical errors, and edge-case failures while maintaining a 95%+ quality score across 200+ hours of expert-level annotation tasks. I also contributed to dataset creation by authoring high-quality prompt-response pairs and designing adversarial prompts to systematically expose model weaknesses.

What sets me apart is my combination of deep technical expertise and structured evaluation methodology. I have built automated evaluation pipelines that assess LLM outputs using embedding similarity, heuristics, and custom scoring rubrics aligned with industry RLHF practices. My experience spans dataset curation, preference modeling, inter-annotator agreement analysis, and feedback-driven model improvement. This allows me to approach AI training not just as annotation but as a rigorous, metrics-driven process focused on improving model alignment, reliability, and real-world performance.
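For illustration, a minimal sketch of the kind of embedding-similarity scorer such a pipeline might combine with rubric scores; the weights, field names, and 1-5 scale below are illustrative assumptions, not details taken from this profile.

```python
from dataclasses import dataclass

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


@dataclass
class RubricScore:
    """Rubric axes mirroring the tagline: accuracy, directness, alignment."""
    accuracy: int    # 1-5: factual and technical correctness
    directness: int  # 1-5: answers the question without padding
    alignment: int   # 1-5: instruction-following and safety

    def total(self) -> float:
        # Weighted aggregate on a 1-5 scale; the weights are assumed examples.
        return 0.5 * self.accuracy + 0.2 * self.directness + 0.3 * self.alignment


def score_response(reference_emb: np.ndarray,
                   response_emb: np.ndarray,
                   rubric: RubricScore) -> float:
    """Blend semantic similarity with a human rubric score into one [0, 1] value."""
    semantic = (cosine_similarity(reference_emb, response_emb) + 1) / 2  # map [-1, 1] -> [0, 1]
    return 0.5 * semantic + 0.5 * (rubric.total() / 5)


# Example with random stand-in embeddings (a real pipeline would use a sentence encoder):
rng = np.random.default_rng(0)
ref, resp = rng.normal(size=384), rng.normal(size=384)
print(score_response(ref, resp, RubricScore(accuracy=5, directness=4, alignment=5)))
```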
As an AI Training Data Contractor at Outlier AI, I evaluated and ranked AI-generated code outputs to improve model accuracy across programming languages. I created high-quality prompt-response pairs for RLHF datasets, ensuring technical accuracy and factual consistency in the engineering and embedded systems domains, and my structured feedback and annotation work improved model instruction-following and safety.
• Assessed large volumes of AI-generated code and responses in Python, C++, and JavaScript.
• Provided expert-level corrections and flagged hallucinations, inaccuracies, and reasoning errors.
• Authored RLHF training data focused on firmware, sensor integration, and network protocols (a sketch of a typical preference record follows this list).
• Maintained high annotation quality, exceeding a 95% quality score throughout the contract.
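As context for the RLHF data authoring mentioned above, here is a hypothetical shape for a single preference record; the field names and serialization are illustrative assumptions, not Outlier AI's actual schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PreferenceRecord:
    """One RLHF preference pair; field names are illustrative, not a real schema."""
    prompt: str      # the task given to the model
    chosen: str      # the annotator-preferred response
    rejected: str    # the lower-ranked response
    rationale: str   # why the chosen response wins
    flags: list      # issue tags, e.g. ["hallucination", "logic_error"]


record = PreferenceRecord(
    prompt="Write a C function that debounces a GPIO input.",
    chosen="(corrected, compilable implementation)",
    rejected="(implementation with a timer-wraparound bug)",
    rationale="The chosen response handles timer wraparound; the rejected one does not.",
    flags=["logic_error"],
)

# Preference datasets are commonly serialized as JSON Lines, one record per line.
print(json.dumps(asdict(record)))
```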
As an AI Data Annotation Specialist at Scale AI, I annotated and reviewed technical responses for large language model fine-tuning focused on STEM and engineering. My work included adversarial prompt creation, rubric-based evaluation, and contributions to the reliability and compliance of model outputs, under strict annotation standards for language model safety evaluation.
• Labeled and reviewed prompts and responses with a focus on content quality, tone, and correctness.
• Developed adversarial prompts to challenge and expose weaknesses in LLM code and math reasoning (see the sketch after this list).
• Evaluated output batches for harmful or non-compliant responses, contributing to safety benchmarks.
• Supported the team by upholding consistent rubric standards across thousands of tasks.
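A toy sketch of how templated adversarial prompts can probe boundary conditions in code and math reasoning; the tasks and edge cases below are my own examples, not Scale AI material.

```python
import itertools

# Edge cases that commonly expose weaknesses in LLM-generated code.
EDGE_CASES = {
    "overflow": "with inputs near the maximum 32-bit integer",
    "empty": "when the input list is empty",
    "negative": "when every input is negative",
}

# Base tasks to be perturbed; purely illustrative.
TASKS = [
    "Write a Python function that returns the average of a list of ints",
    "Write a Python function that computes the running maximum of a list",
]


def adversarial_prompts():
    """Yield (tag, prompt) variants designed to trigger boundary-condition bugs."""
    for task, (tag, twist) in itertools.product(TASKS, EDGE_CASES.items()):
        yield tag, f"{task}, and make sure it behaves correctly {twist}."


for tag, prompt in adversarial_prompts():
    print(f"[{tag}] {prompt}")
```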
Advanced Level Qualification, General Science
Secondary School Certificate, General Secondary Education
Embedded Systems Engineer