Simpore Sheick - LLM Evaluator Expert - Scale AI

Key Skills

Software

Other

Scale AI

Internal/Proprietary Tooling

Remotasks

Mercor

Top Subject Matter

Software Engineering and Computer Science

LLM Evaluation and Q&A

Top Data Types

Text

Image

Computer Code Programming

Top Label Types

RLHF

Text Generation

Question Answering

Fine Tuning

Text Summarization

Evaluation Rating

Computer Programming Coding

Prompt Response Writing SFT

Freelancer Overview

AI Math Expert - Outlier AI. Brings 4+ years of professional experience in research and quality-focused execution of complex workflows. Core strengths include internal/proprietary tooling. Education includes a Master of Science at Efrei Paris (2026) and a non-degree exchange program at Yuan Ze University (2024). AI-training focus includes data types such as computer code and text, and labeling workflows including RLHF and evaluation rating.

Intermediate

French

English

Labeling Experience

AI Math Expert - Outlier AI

RLHF

As an AI Math Expert at Outlier AI, I evaluated and ranked AI-generated code using RLHF methodologies. The primary focus was on improving LLM performance on algorithmic, data structure, and benchmark problems. I also designed prompts and test cases to enhance model accuracy in a wide range of technical tasks.

• Evaluated Python, TypeScript, and JavaScript code for correctness and adherence to best practices.
• Designed and implemented computer science prompts and test cases.
• Validated code quality on Docker/Kubernetes workflows and CI/CD pipelines.
• Focused on benchmarking model outputs with LeetCode and HackerRank problems.

2024 - Present

Code Evaluator

Computer Code Programming

Computer Programming Coding

Evaluated AI-generated code that attempted to resolve GitHub issues.

2025 - 2025

LLM Evaluation Team Project

Other

Text

In the LLM Evaluation team project, I helped build and evaluate a Streamlit web app for real-time comparison of LLM responses. My main tasks included using DeepEval to assess context precision, recall, and relevancy in grounded Q&A setups. This project involved evaluating RAG tasks across various document uploads.

• Compared Gemini Pro and LLaMA model outputs using established evaluation metrics.
• Measured and reported on Q&A context accuracy with PDF, DOCX, and TXT data.
• Implemented Pinecone retrieval for enhancing information relevancy.
• Collaborated in a team to analyze evaluation results and document findings.

2024 - 2025

Education

Efrei Paris

Master of Science, Cybersecurity and Cloud Computing

2023 - 2026

Yuan Ze University

Non-degree Exchange Program, Data Science

2023 - 2024

Work History

Headstarter

Software Engineering Resident

Remote

2024 - Present

Informatique CDC

Software Engineer Apprentice

Paris

2023 - Present