Nicodemus Orenge

AI Evaluation, Annotation & RLHF Specialist, Outlier AI / Remotasks (Scale AI)

Nairobi, Kenya
Expert · Labelbox · CVAT · Supervisely

Key Skills

Software

Labelbox
CVAT
Supervisely
Remotasks
Scale AI

Top Subject Matter

Mathematics Domain Expertise
Computer Science
Scientific Reasoning

Top Data Types

Text
Image
Document

Top Task Types

No task types listed

Freelancer Overview

AI Evaluation, Annotation & RLHF Specialist, Outlier AI / Remotasks (Scale AI). Brings 2+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include Labelbox, CVAT, and Supervisely. Education includes a Bachelor of Science from the University of North Texas (2015). AI-training focus includes data types such as Text, and labeling workflows including Evaluation and Rating.

Expert

Labeling Experience

LLM Mathematical Reasoning Benchmark

Text
Independently designed and executed LLM Mathematical Reasoning Benchmark projects to systematically evaluate AI models. Assessed mathematical problem-solving, proof construction, and factual correctness of frontier language models using structured evaluation rubrics. Provided detailed comparative insights into open-source and GPT-4-class LLM mathematical performance for ongoing community reference. (An illustrative scoring sketch follows this entry.)
• Benchmarked algebra, calculus, and differential equations outputs using rigorous criteria.
• Scored proofs for reasoning completeness, step validity, and solution correctness.
• Compared open-source models and commercial LLMs across quantitative benchmarks.
• Synthesized findings to advance state-of-the-art LLM evaluation protocols.

2023 - Present
Labelbox
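The rubric itself is not reproduced in this profile. Purely as an illustration, a weighted, rubric-based scoring of a model's proof of the kind described above could be recorded as in the following minimal Python sketch; the criterion names, weights, and model labels are hypothetical.

from dataclasses import dataclass, field

# Hypothetical rubric: criterion -> weight. The real benchmark criteria are not
# published in this profile; these names and weights are illustrative only.
RUBRIC_WEIGHTS = {
    "final_answer_correct": 3,
    "steps_valid": 2,
    "reasoning_complete": 2,
    "notation_clear": 1,
}

@dataclass
class ProofEvaluation:
    """One rubric-scored evaluation of a model's solution to a benchmark item."""
    model: str
    problem_id: str
    scores: dict = field(default_factory=dict)  # criterion -> score in [0.0, 1.0]

    def weighted_total(self) -> float:
        """Weighted sum of criterion scores, normalized to a 0-10 scale."""
        raw = sum(RUBRIC_WEIGHTS[c] * self.scores.get(c, 0.0) for c in RUBRIC_WEIGHTS)
        return round(10 * raw / sum(RUBRIC_WEIGHTS.values()), 2)

# Comparing two hypothetical models on one calculus item.
evaluations = [
    ProofEvaluation("open-source-model", "calc-017", {
        "final_answer_correct": 1.0, "steps_valid": 0.5,
        "reasoning_complete": 0.5, "notation_clear": 1.0,
    }),
    ProofEvaluation("gpt4-class-model", "calc-017", {
        "final_answer_correct": 1.0, "steps_valid": 1.0,
        "reasoning_complete": 1.0, "notation_clear": 1.0,
    }),
]

for e in evaluations:
    print(f"{e.model:>20}  {e.problem_id}  score = {e.weighted_total()}")

Aggregating records like these across many problems is one way to produce the kind of comparative open-source vs. GPT-4-class figures the project description refers to.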

AI Evaluation, Annotation & RLHF Specialist, Outlier AI / Remotasks (Scale AI)

Labelbox · Text
As an AI Evaluation, Annotation & RLHF Specialist, I evaluated AI-generated outputs for quality, accuracy, and instruction adherence on diverse datasets for leading AI labs. My responsibilities encompassed assessing mathematical proofs, code generation, and science answers across text, image, and video formats using structured rubrics and adversarial testing. I utilized advanced annotation tools such as Labelbox, CVAT, Supervisely, and the Remotasks/Scale AI platforms to perform multimodal RLHF evaluation, preference ranking, and QA auditing. (An illustrative preference-judgment sketch follows this entry.)
• Reviewed 70,000+ AI-generated outputs, maintaining high annotation accuracy and throughput.
• Evaluated mathematical solutions, computer code, and scientific content, flagging inaccuracies and logical errors.
• Developed and refined rubrics for mathematical and scientific evaluation, and supported calibration of new annotators.
• Conducted adversarial testing, hallucination detection, and contributed to systematic failure analysis for model improvement.

2022 - Present
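The actual schemas of the platforms named above (Labelbox, CVAT, Supervisely, Remotasks/Scale AI) are not shown in this profile. As a rough illustration only, a single preference-ranking judgment of the kind described in this role could be captured as in the following minimal Python sketch, with all field names hypothetical.

from collections import Counter
from dataclasses import dataclass

@dataclass
class PreferenceJudgment:
    """One pairwise preference judgment between two model responses (illustrative schema)."""
    prompt_id: str
    response_a: str   # identifier of the first model response
    response_b: str   # identifier of the second model response
    preferred: str    # "a", "b", or "tie"
    rationale: str    # brief note on accuracy and instruction adherence

def preference_counts(judgments: list) -> Counter:
    """Aggregate how often each side was preferred across a batch of judgments."""
    return Counter(j.preferred for j in judgments)

# Example batch of hypothetical judgments across different task types.
batch = [
    PreferenceJudgment("math-042", "resp-a", "resp-b", "b",
                       "B shows every step; A skips the induction base case"),
    PreferenceJudgment("code-117", "resp-a", "resp-b", "a",
                       "A runs correctly and handles the empty-input edge case"),
    PreferenceJudgment("sci-009", "resp-a", "resp-b", "tie",
                       "Both factually correct and complete"),
]

print(preference_counts(batch))  # Counter({'b': 1, 'a': 1, 'tie': 1})

Batches of such judgments, aggregated per model, are what RLHF-style preference ranking and QA auditing typically feed back to model developers.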

Education

University of North Texas

Bachelor of Science, Computational Physics

2015 - 2015

Work History

G-Force Inc.

IT Systems & Data Support Analyst

Nairobi
2019 - 2020