Tomas Mejia - AI Training Data Specialist – Software

Key Skills

Software

Internal/Proprietary Tooling

Top Subject Matter

Healthcare – Medical Terminology & Clinical Conversations

Artificial Intelligence – LLM Evaluation & Prompt Quality

Software & Technology – App Development & Data Pipelines

Top Data Types

Image

Text

Document

Top Task Types

Fine Tuning

Classification

Segmentation

Freelancer Overview

I have hands-on experience designing and implementing AI data pipelines, work that draws on the same skills as data labeling and AI training: precision, structured thinking, and an understanding of how data quality affects model performance. For my PlantCare AI project, I built a full RAG pipeline from scratch, covering data chunking, embedding generation with sentence-transformers, cosine-similarity retrieval, and vector storage. That work showed me firsthand how data must be structured and curated for LLMs to produce reliable outputs. My work on multi-agent LLM systems with LangChain and the Gemini API also required rigorous prompt engineering and continuous evaluation of model outputs for accuracy, relevance, and consistency, tasks that closely mirror real-world AI training data workflows.

What sets me apart is the combination of technical AI/ML expertise and native-level bilingual fluency in Spanish and English (C1 certified). Two years of professional experience as a Medical Interpreter trained me to catch subtle linguistic errors, maintain strict quality standards, and work precisely in high-stakes settings where mistakes are not acceptable. Those qualities carry over directly to data annotation and model evaluation. I am familiar with AI development tools such as Claude, Cursor, and GitHub Copilot, and I understand how human feedback shapes model behavior, which lets me contribute effectively to an AI training data pipeline.

Entry Level

English

Spanish

Labeling Experience

AI Pipeline Engineer & LLM Data Quality Specialist

Text

Fine Tuning

Designed and implemented a full RAG (Retrieval-Augmented Generation) pipeline from scratch for PlantCare AI, including data chunking, embedding generation with sentence-transformers, cosine-similarity retrieval, and vector storage using Supabase pgvector. Built and evaluated a 4-agent LangChain system (Vision, Knowledge, Analysis, Response), requiring continuous assessment of LLM outputs for accuracy, relevance, and consistency. Applied prompt engineering techniques to improve model response quality and reduce hallucinations across multi-agent workflows.

2025

Education

Universidad Tecnológica de Pereira

Bachelor of Science, Software Development

2022

Work History

Interpretia

Medical Interpreter

Pereira

2024 - Present

Akorbi

Medical Interpreter

Pereira

2022 - 2023