Andrew Mondejar

LLM Evaluation, Prompt Eng & Code QA Specialist | Full-Stack Python/MERN

Hialeah, USA
$40.00/hr | Intermediate | Labelbox | Toloka | V7 Labs

Key Skills

Software

Labelbox
Toloka
V7 Labs

Top Subject Matter

No subject matter listed

Top Data Types

Image
Text
Video

Top Task Types

Computer Programming Coding
Diagnosis
Evaluation Rating
Fine Tuning
Prompt Response Writing SFT

Freelancer Overview

I'm a bilingual (English/Spanish) full-stack engineer turned AI-training specialist who lives in the overlap of LLM evaluation, prompt engineering, and rigorous data labeling. Over the past six years I've shipped MERN + Python microservices at Vicsa & Co., authored 4,000+ Bloom-aligned STEM MCQs, and created, scored, and de-hallucinated 2,000+ prompts for instruction-following, code-generation, and RLHF pipelines. My sweet spot is designing reusable ontologies, writing testable labeling guidelines, and then automating the boring parts with Pandas, NumPy, and scalable REST/GraphQL APIs. On live projects I've lifted model pass@k by 18%, cut page-load time by 40%, and kept 95% test coverage with Jest & Playwright, all while mentoring global annotators on bias detection, consensus QA, and secure data handling (SOC 2).

Intermediate | English | Spanish

Labeling Experience

Labelbox

RLHF Prompt Evaluation & Code Output Rating

Labelbox | Text | Evaluation Rating | Computer Programming Coding
Curated 2,000+ multilingual prompts and completions for instruction-following and code-generation RLHF pipelines. Rated outputs for correctness, bias, and hallucination; wrote gold-standard responses; and auto-scored unit-test pass/fail results. This work raised model pass@1 accuracy by 12% and cut reviewer time by 30% under SOC 2 workflows.

2023 - 2025
Labelbox

Bilingual STEM MCQ Authoring & Difficulty Classification

Labelbox | Text | Classification | Question Answering
Authored and annotated 4,000+ English/Spanish multiple-choice questions across Python, SQL, C++, Physics, and Calculus. Tagged each item for Bloom-level difficulty, distractor quality, and learning-objective alignment; validated with consensus QA scripts that raised item-bank validity to 96% and shortened editorial cycles by 25%.

2022 - 2025

Education

The University of Texas at Arlington

Bachelor's in Computer Science
2019 - 2023

Work History

Vicsa & Co.

AI Prompt Engineer & Model Evaluator

Fort Worth
2023 - Present

Vicsa & Co.

Full-Stack Web Developer

Fort Worth
2023 - Present