Anthony Young

High-Ticket Travel RAG System Data Preparation and Embedding

Chicago, Kenya
$20.00/hr
Intermediate
Mindrift
Appen

Key Skills

Software

Mindrift
Appen

Top Subject Matter

Luxury travel pricing and product information
Lead generation and conversational agent data
Legal Services & Contract Review

Top Data Types

Document
Text
Image

Top Task Types

Text Generation
Segmentation
RLHF
Computer Programming / Coding
Prompt and Response Writing (SFT)

Freelancer Overview

Focused on high-ticket travel RAG system data preparation and embedding. Brings 6+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core strengths include internal and proprietary tooling. Education includes a Doctor of Philosophy from the University of Chicago (2019) and a Bachelor of Science from the Massachusetts Institute of Technology (MIT) (2014). AI-training focus covers the Document and Text data types and labeling workflows such as Text Generation.

English (Intermediate)

Labeling Experience

Omnichannel Lead Data Labeling and AI Workflow Maintenance

Text
Text Generation
I designed and maintained an Omnichannel Agentic Orchestration system for consolidating and labeling multi-source lead data for AI processing. My workflow used data cleaning nodes and LLM-based pipelines to structure, format, and annotate text data from diverse channels. Outputs fueled prompt-based AI models and required continuous monitoring for data labeling quality and workflow integrity.
• Built custom n8n workflows integrating WhatsApp, Email, Instagram, and Twilio for unified data input.
• Applied validation and transformation routines to ensure reliable training/evaluation datasets.
• Leveraged FastAPI backends for scalable, automated data processing and label reformatting.
• Enhanced agent performance for high-volume lead generation applications through high-quality labeled data.

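To illustrate the kind of validation and transformation routine described above, here is a minimal Python sketch. It is not the production workflow (which ran in n8n with FastAPI backends); the `Lead` schema, the field names `channel`, `contact`, and `text`, and the helper names are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical unified schema; the real schema is not described in this profile.
@dataclass(frozen=True)
class Lead:
    channel: str
    contact: str
    text: str


# Channels named in the workflow description above.
ALLOWED_CHANNELS = {"whatsapp", "email", "instagram", "twilio"}
# Deliberately loose email check, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_lead(raw: dict) -> Optional[Lead]:
    """Return a cleaned Lead, or None if the record fails validation."""
    channel = str(raw.get("channel", "")).strip().lower()
    contact = str(raw.get("contact", "")).strip()
    text = " ".join(str(raw.get("text", "")).split())  # collapse whitespace
    if channel not in ALLOWED_CHANNELS or not contact or not text:
        return None
    if channel == "email":
        contact = contact.lower()
        if not EMAIL_RE.match(contact):
            return None
    return Lead(channel, contact, text)


def clean_batch(records: Iterable[dict]) -> List[Lead]:
    """Normalize records, drop invalid ones, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for raw in records:
        lead = normalize_lead(raw)
        if lead is not None and lead not in seen:
            seen.add(lead)
            cleaned.append(lead)
    return cleaned
```

Rejecting records at this stage, rather than repairing them later, keeps malformed or duplicate messages out of the training and evaluation sets.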

2024 - Present

High-Ticket Travel RAG System Data Preparation and Embedding

Document
Text Generation
I built a High-Ticket Travel RAG System by labeling and embedding unstructured travel pricing PDFs for superior retrieval accuracy. The process involved scraping over 200 documents, chunking them with NLP techniques, and creating embeddings using state-of-the-art LLMs for downstream question answering. My proprietary pipeline transformed raw, unstructured data into a reliable retrieval-augmented dataset for business-critical AI workflows.
• Used pdfplumber and LangChain to extract and preprocess PDF data for effective document chunking.
• Generated Gemini-based embeddings inserted into Supabase pgvector, boosting precision in retrieval.
• Implemented robust quality checks to ensure data integrity and consistency across the labeled outputs.
• Supported high-value, decision-centric applications with document-based AI training and inference.

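The chunk-then-retrieve idea behind the pipeline above can be sketched in plain Python. This is a simplified stand-in, not the proprietary pipeline: it uses no pdfplumber, LangChain, Gemini, or pgvector calls, the function names and the `chunk_size` and `overlap` defaults are hypothetical, and the vectors are ordinary lists standing in for model-generated embeddings.

```python
import math
from typing import List, Sequence, Tuple


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word windows that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float],
          indexed: List[Tuple[str, Sequence[float]]],
          k: int = 3) -> List[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    ranked = sorted(indexed,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Overlapping windows reduce the chance that a price and the product it belongs to land in different chunks. In a real deployment the ranking step would typically be handled by the vector store rather than an in-memory sort.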

2024 - Present

Education

University of Chicago

Doctor of Philosophy, Computer Science and Applied Mathematics

2014 - 2019

Massachusetts Institute of Technology (MIT)

Bachelor of Science, Mathematics and Computer Science

2010 - 2014

Work History

Euser AI Solutions

Lead AI Engineer & Founder

Chicago
2024 - Present

Various Enterprise Clients

Senior Automation Engineer (Contract & Freelance)

N/A
2021 - 2023