Mathematics and Reasoning
Conduct rigorous technical review of training data for graduate-level mathematics models using Python (Sympy), Wolfram Alpha, and Overleaf/LaTeX.
I specialize in AI data training, quality analysis, and process improvement, with hands-on experience in labeling and validating complex datasets for advanced mathematics and reasoning models. My work at Invisible Technologies involved rigorous technical review and annotation of training data using Python (Sympy), Wolfram Alpha, and LaTeX, consistently achieving over 90% accuracy and strong client satisfaction. I have developed documentation and feedback systems that reduced rework by 30%, and I’m skilled in optimizing model performance through advanced prompting techniques across AI Safety and Finance domains. My background also includes automating data pipelines with Python and SQL, ensuring data integrity for research and financial applications. I am fluent in both English and Spanish, and I thrive in remote, high-volume environments where precision and process optimization are essential.
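To illustrate the kind of symbolic verification I use when reviewing mathematics training data, here is a minimal Sympy sketch; the integral and the claimed answer are hypothetical examples written for this profile, not client data.

```python
import sympy as sp

# Hypothetical example: verify a model's claimed value for a definite integral.
x = sp.symbols('x')
claimed = sp.pi / 4  # the answer under review

# Recompute the integral symbolically and compare it with the claim.
computed = sp.integrate(1 / (1 + x**2), (x, 0, 1))  # evaluates to atan(1) - atan(0)

# Simplifying the difference to zero is a robust symbolic equality check.
assert sp.simplify(computed - claimed) == 0, f"mismatch: claimed {claimed}, got {computed}"
print(f"Verified: the integral evaluates to {computed}")
```

Checks like this catch answers that look plausible in LaTeX but do not hold up symbolically, which manual review alone tends to miss.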
For this project, our team's goal was to create unseeded Enterprise Finance Persona prompts simulating real-world scenarios across a variety of industries, focused on finance department roles. These prompts challenge the model's ability to engage with complex reasoning, probability, statistics, word/case problems, and file formatting, reflecting the decision-making processes typical of finance functions in enterprise environments. The aim was to develop a comprehensive repository of tasks that push the model to provide solutions addressing the needs and objectives of the Enterprise Finance Persona: finding and correcting mistakes, providing detailed explanations, and analyzing the reasoning behind proposed solutions.
This project required evaluating LLM output for seeded prompts across a variety of fields related to data analysis. Every prompt either contained or required the model to interpret or output data in one of the following formats: CSV, TSV, JSON, HTML, or Markdown. In addition to seeded prompts, our team also generated prompts following the same file-formatting criteria outlined above. Quality metrics for this project included the usefulness of the prompts generated, response ranking, labeling of applicable errors, the quality of the edits made, the overall fluidity of the conversation, and adherence to the default/system preamble.
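As a brief illustration of how outputs in these formats can be machine-checked, the sketch below validates a model response as JSON and as CSV; it is a hypothetical helper written for this profile, not project code.

```python
import csv
import io
import json

def check_json(text: str) -> bool:
    """Return True if the text parses as valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def check_csv(text: str) -> bool:
    """Return True if every row has the same column count as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    return bool(rows) and all(len(row) == len(rows[0]) for row in rows)

# Hypothetical model outputs under review.
print(check_json('{"revenue": 1200, "quarter": "Q1"}'))  # True: well-formed JSON
print(check_csv("quarter,revenue\nQ1,1200\nQ2,1350"))    # True: consistent columns
print(check_json("{revenue: 1200}"))                     # False: unquoted key
```

Automating these structural checks lets reviewers focus their time on the substance of the response rather than its syntax.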
This project consisted of generating prompts and evaluating LLM output related to sensitive content. The main goal was to establish two types of safety constraints for the model, strict and contextual. Given the nature of the material generated and evaluated in each safety mode, the quality metrics for each mode differed. However, the project's baseline quality standard was always a minimum of 90% alignment, and failure to meet it for more than a week resulted in offboarding from the project.
I was part of a project commissioned by one of the largest AI companies in the space. The labeling tasks consisted of identifying instruction-following and truthfulness errors in the responses generated by the LLM, selecting the best response, and then editing it to ensure it complied with the prompt's intent and the client's style specifications. Additional evaluation tasks included writing prompts designed to stump the model and evaluating completions under various system/default preambles to test the limits of its instruction-following capabilities. Our team adhered to strict quality standards, requiring a minimum of 85% alignment across the multiple dimensions evaluated during quality control.
Bachelor of Science, Economics
Quality Analyst
Head of Research