
Michael Umeokoli

Software Engineer

Manchester, United Kingdom
$13.00/hr · Expert
Oneforma · Mercor · Scale AI

Key Skills

Software

OneForma
Mercor
Scale AI
Data Annotation Tech

Top Subject Matter

Software Engineering / Developer Tools
Generative AI / LLMs (RLHF, model comparison, preference ranking, summarization for both text and code)
Computer Vision / Image Generation (image model evaluation project)

Top Data Types

Text
Image
Audio

Top Task Types

Evaluation Rating
Entity NER Classification
Question Answering
Object Detection
Computer Programming / Coding
Classification
Text Generation
Text Summarization
RLHF
Transcription
Data Collection
Bounding Box
Function Calling
Prompt/Response Writing (SFT)
Fine-Tuning

Freelancer Overview

Software Engineer with 5+ years of professional experience spanning complex technical workflows, research, and quality-focused execution. Education includes a Master of Science from Manchester Metropolitan University.

Expert · English · Igbo

Labeling Experience

AI Trainer

Computer Code Programming · RLHF
Worked on multiple projects evaluating code responses produced by different AI models. For each prompt, I reviewed two candidate code solutions, assessed them against a detailed rubric (including correctness, efficiency, code quality/readability, error handling, security considerations, and adherence to user requirements), and assigned structured scores. I then ranked which model response performed better for each criterion and produced an overall judgment. This work required careful execution of test cases, deep understanding of programming concepts (e.g. algorithms, API usage, concurrency, data structures), and consistent application of labeling guidelines. The resulting annotations were used to train and fine-tune AI models for higher-quality code generation and more reliable developer assistance. Languages and ecosystems frequently involved included Python, Go, JavaScript/TypeScript, and backend API patterns. I also provided written rationales for decisions where required, helping model trainers and researchers understand edge cases and nuanced trade-offs between different code solutions.
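
Illustratively, a single annotation of this kind can be captured as one structured record per prompt. The sketch below is a hypothetical Python schema; the field and criterion names are invented for illustration and do not reflect any platform's actual format.

```python
# Hypothetical schema for one pairwise code-evaluation annotation.
# Field and criterion names are illustrative, not any platform's real format.
from dataclasses import dataclass, field

CRITERIA = [
    "correctness",
    "efficiency",
    "code_quality",
    "error_handling",
    "security",
    "instruction_adherence",
]


@dataclass
class PairwiseCodeAnnotation:
    prompt_id: str
    scores_a: dict[str, int]  # rubric scores (e.g. 1-5) for response A, by criterion
    scores_b: dict[str, int]  # rubric scores for response B, by criterion
    per_criterion_winner: dict[str, str] = field(default_factory=dict)
    overall_winner: str = "tie"  # "A", "B", or "tie"
    rationale: str = ""  # written justification for the judgment

    def judge(self) -> None:
        """Derive per-criterion winners and an overall verdict from the scores."""
        wins = {"A": 0, "B": 0}
        for criterion in CRITERIA:
            a, b = self.scores_a[criterion], self.scores_b[criterion]
            verdict = "A" if a > b else "B" if b > a else "tie"
            self.per_criterion_winner[criterion] = verdict
            if verdict in wins:
                wins[verdict] += 1
        if wins["A"] != wins["B"]:
            self.overall_winner = "A" if wins["A"] > wins["B"] else "B"
```

In real rubrics the overall verdict typically weights criteria unequally (correctness usually dominates); the equal-weight tally above is purely illustrative.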


2024 - Present

Text Evaluator

Text · RLHF
Evaluated and ranked text responses from different AI models for the same prompts. For each task, I reviewed multiple candidate responses, scored them against a clear rubric (e.g. correctness, relevance, clarity, level of detail, tone, and safety/policy compliance), and selected which response was better overall and on specific criteria. The work involved careful reading, comparison, and consistent application of guidelines to produce high‑quality preference labels and scalar ratings. These annotations were used as RLHF training data and evaluation signals to compare models, improve response quality, and align model behavior more closely with user expectations. Tasks covered a range of domains including general knowledge, step‑by‑step reasoning, coding assistance, and professional writing. I also flagged edge cases (e.g. hallucinations, unsafe content, or incomplete reasoning) to help refine labeling policies and model safety.
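
For context on how preference labels like these feed RLHF training, reward models are commonly fit with a Bradley-Terry objective over chosen/rejected response pairs. The following is a minimal sketch assuming PyTorch; it illustrates the standard formulation rather than any specific project's pipeline.

```python
# Minimal Bradley-Terry preference loss, the standard objective for training
# reward models from chosen/rejected pairs. Illustrative only.
import torch
import torch.nn.functional as F


def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Under Bradley-Terry, P(chosen beats rejected) = sigmoid(r_c - r_r),
    # so the negative log-likelihood is -log sigmoid(r_c - r_r).
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()


# Toy example: scalar rewards a model might assign to two response pairs.
r_chosen = torch.tensor([1.7, 0.9])
r_rejected = torch.tensor([0.2, 1.1])
print(preference_loss(r_chosen, r_rejected))  # lower when rewards agree with labels
```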


2024

Image Evaluator

Image · Evaluation Rating
Evaluated image outputs from two different generative AI models to improve image quality using human feedback. For each prompt, I reviewed candidate images from both models and scored them along multiple dimensions including prompt fidelity, visual quality, composition, style consistency, and absence of safety issues or policy violations. Using a detailed rubric, I assigned structured ratings to each image and selected a preferred response per criterion and overall. My annotations were then used as RLHF-style preference data to compare models, identify strengths and weaknesses, and guide further model training and fine‑tuning for higher quality, safer image generation.
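
To give a sense of how such ratings support model comparison, per-dimension win rates can be tallied across preference records, as in the sketch below (the records and dimension names are invented for illustration).

```python
# Hypothetical win-rate tally between two image models from pairwise
# preference labels. The records and dimension names are invented.
from collections import Counter

labels = [
    {"prompt_fidelity": "A", "visual_quality": "B", "overall": "A"},
    {"prompt_fidelity": "A", "visual_quality": "A", "overall": "A"},
    {"prompt_fidelity": "B", "visual_quality": "B", "overall": "B"},
]


def win_rates(records: list[dict[str, str]], dimension: str) -> dict[str, float]:
    """Fraction of records in which each model won the given dimension."""
    counts = Counter(record[dimension] for record in records)
    total = sum(counts.values())
    return {model: counts[model] / total for model in ("A", "B")}


for dim in ("prompt_fidelity", "visual_quality", "overall"):
    print(dim, win_rates(labels, dim))
```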


2024

Education


Nnamdi Azikiwe University

Bachelor's, Agriculture

2016 - 2021

Manchester Metropolitan University

Master of Science, Computer Science


Work History


Bubex Labs

Software Engineer

Manchester
2025 - Present

University of Manchester

Research Software Engineer

Manchester
2024 - 2025