LLM Mathematical Reasoning Benchmark
Independently designed and executed an LLM Mathematical Reasoning Benchmark project to systematically evaluate AI models. Assessed the mathematical problem-solving, proof construction, and factual correctness of frontier language models using structured evaluation rubrics. Provided detailed comparative insights into the mathematical performance of open-source and GPT-4-class LLMs for ongoing community reference.
• Benchmarked algebra, calculus, and differential equations outputs against rigorous criteria.
• Scored proofs for reasoning completeness, step validity, and solution correctness (illustrative scoring sketch below).
• Compared open-source and commercial LLMs across quantitative benchmarks.
• Synthesized findings to advance state-of-the-art LLM evaluation protocols.
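The proof-scoring bullet above describes a rubric-based protocol; the following is a minimal Python sketch of how such a rubric might be encoded. The class name, the 0-5 scale, and the equal weighting of the three criteria are assumptions made for illustration, not details of the project's actual rubric.

from dataclasses import dataclass

@dataclass
class ProofRubric:
    # Hypothetical rubric: each criterion scored 0-5 (assumed scale).
    reasoning_completeness: int  # are all required steps present?
    step_validity: int           # is each individual step logically sound?
    solution_correctness: int    # is the final result correct?

    def total(self) -> float:
        # Unweighted mean across the three criteria (assumed aggregation).
        return (self.reasoning_completeness
                + self.step_validity
                + self.solution_correctness) / 3

# Example: score one model-generated calculus proof.
score = ProofRubric(reasoning_completeness=4, step_validity=5, solution_correctness=5)
print(f"Proof score: {score.total():.2f} / 5")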