Mathematics and Reasoning
Conduct rigorous technical review of training data for graduate-level mathematics models using Python (Sympy), Wolfram Alpha, and Overleaf/LaTeX.
I specialize in AI data training, quality analysis, and process improvement, with hands-on experience in labeling and validating complex datasets for advanced mathematics and reasoning models. My work at Invisible Technologies involved rigorous technical review and annotation of training data using Python (Sympy), Wolfram Alpha, and LaTeX, consistently achieving over 90% accuracy and strong client satisfaction. I have developed documentation and feedback systems that reduced rework by 30%, and I’m skilled in optimizing model performance through advanced prompting techniques across AI Safety and Finance domains. My background also includes automating data pipelines with Python and SQL, ensuring data integrity for research and financial applications. I am fluent in both English and Spanish, and I thrive in remote, high-volume environments where precision and process optimization are essential.
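To illustrate the kind of symbolic verification I use when reviewing mathematics training data, here is a minimal Sympy sketch; the integral and the claimed answer are hypothetical examples written for this profile, not client data.

```python
import sympy as sp

# Hypothetical example: verify a model's claimed value for a definite integral.
x = sp.symbols('x')
claimed = sp.pi / 4  # the answer under review

# Recompute the integral symbolically and compare it with the claim.
computed = sp.integrate(1 / (1 + x**2), (x, 0, 1))  # evaluates to atan(1) - atan(0)

# Simplifying the difference to zero is a robust symbolic equality check.
assert sp.simplify(computed - claimed) == 0, f"mismatch: claimed {claimed}, got {computed}"
print(f"Verified: the integral evaluates to {computed}")
```

Checks like this catch answers that look plausible in LaTeX but do not hold up symbolically, which manual review alone tends to miss.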
For this project, our team's goal was to create unseeded Enterprise Finance Persona prompts simulating real-world scenarios across a variety of industries, focused on finance department roles. These prompts challenge the model's ability to engage with complex reasoning, probability, statistics, word/case problems, and file formatting, reflecting the decision-making processes typical of finance functions in enterprise environments. The aim was to develop a comprehensive repository of tasks that push the model to provide solutions addressing the needs and objectives of the Enterprise Finance Persona: finding and correcting mistakes, providing detailed explanations, and analyzing the reasoning behind proposed solutions.
This project required evaluating LLM output for seeded prompts across a variety of fields related to data analysis. Every prompt either contained or required the model to interpret or output data in one of the following formats: CSV, TSV, JSON, HTML, or Markdown. In addition to seeded prompts, our team also generated prompts following the same file-formatting criteria outlined above. Quality metrics for this project included the usefulness of the prompts generated, response ranking, labeling of applicable errors, the quality of the edits made, the overall fluidity of the conversation, and adherence to the default/system preamble.
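As a brief illustration of how outputs in these formats can be machine-checked, the sketch below validates a model response as JSON and as CSV; it is a hypothetical helper written for this profile, not project code.

```python
import csv
import io
import json

def check_json(text: str) -> bool:
    """Return True if the text parses as valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def check_csv(text: str) -> bool:
    """Return True if every row has the same column count as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    return bool(rows) and all(len(row) == len(rows[0]) for row in rows)

# Hypothetical model outputs under review.
print(check_json('{"revenue": 1200, "quarter": "Q1"}'))  # True: well-formed JSON
print(check_csv("quarter,revenue\nQ1,1200\nQ2,1350"))    # True: consistent columns
print(check_json("{revenue: 1200}"))                     # False: unquoted key
```

Automating these structural checks lets reviewers focus their time on the substance of the response rather than its syntax.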
This project consisted of generating prompts and evaluating LLM output related to sensitive content. The main goal was to establish two types of safety constraints for the model, strict and contextual. Given the nature of the material generated and evaluated in each safety mode, the quality metrics for each mode differed. However, the project's baseline quality standard was always a minimum of 90% alignment, and failure to meet it for more than a week resulted in offboarding from the project.
I was part of a project commissioned by one of the largest AI companies in the space. The labeling tasks consisted of identifying instruction-following and truthfulness errors in the responses generated by the LLM, selecting the best response, and then editing it to ensure it complied with the prompt's intent and the client's style specifications. Additional evaluation tasks included writing prompts designed to stump the model and evaluating completions under various system/default preambles to test the limits of its instruction-following capabilities. Our team adhered to strict quality standards, requiring a minimum of 85% alignment across the multiple dimensions evaluated during quality control.
Bachelor of Science, Economics
Quality Analyst
Head of Research