Angel Olivera - Bilingual Engineer: Expert in LLM, Chatbot & Image-Based training.

Key Skills

Software

Remotasks

Scale AI

Internal/Proprietary Tooling

Top Subject Matter

Training in Engineering and Exact Sciences

LLM training in Spanish

Image-text synthesis

Top Data Types

Computer Code Programming

Image

Text

Top Task Types

Classification

Computer Programming Coding

Evaluation Rating

Prompt Response Writing SFT

Red Teaming

Freelancer Overview

I have participated in several data labeling and AI training projects, applying my skills in digital design, programming, embedded systems, and mathematics to improve AI models. In my work with Scale AI (in collaboration with OpenAI), I used tools like Photoshop and Blender to create challenging assets for DALL-E, improving its image-to-text accuracy. I also developed advanced prompts in both English and Spanish to train language models in my areas of expertise, keeping conversations detailed and progressively richer. In another project with OpenAI, I identified and corrected LLM problems (mainly hallucinations, factuality, and appropriateness-related errors) in GPT-3 and GPT-4 to improve their precision and safety for users. These experiences have strengthened my ability to improve AI model performance.

Intermediate, English, Spanish, Portuguese

Labeling Experience

ScaleAI/OpenAI Image-Text

Internal Proprietary Tooling, Image, Classification, Text Generation

The OpenAI Image/Text project aimed to enhance the image-to-text synthesis capabilities of AI models like DALL-E. The scope involved creating a diverse range of graphic assets using tools such as Paint, Photoshop and Blender to challenge and improve the AI's synthesis accuracy. I was responsible for designing and labeling datasets with intricate details to ensure the model's ability to generate accurate textual descriptions from visual inputs. The project was large-scale, involving thousands of images across various categories. Quality measures included iterative feedback loops and continuous testing to refine outputs, ensuring high precision and consistency in the AI's performance.

2023 - 2024

ScaleAI/OpenAI Hallucinations

Internal Proprietary Tooling, Text, Classification, Question Answering

The Hallucinations project aimed to identify and correct errors in language models, focusing on hallucinations, factual inaccuracies, and appropriateness issues in GPT-3 and GPT-4. The scope involved analyzing model outputs, identifying discrepancies, and providing corrective labels and feedback to improve precision and safety. Data labeling tasks included detailed error categorization and corrective suggestions, with each identified issue fully documented. The project was extensive, encompassing thousands of model interactions and outputs. Quality measures included rigorous validation processes and adherence to ethical guidelines to improve user safety and model reliability.

2023 - 2023

ScaleAI Domain Expertise Project

Scale AI, Text, Classification, Question Answering

This project focused on applying domain expertise (in my case, programming and mathematics) to improve language model proficiency. The scope included developing advanced prompts and datasets to train language models on complex mathematical and programming concepts. I performed data labeling tasks that involved creating and validating prompts in both English and Spanish, ensuring they were challenging yet educational. The project was significant in size, involving hundreds of prompts and test cases. Quality measures included peer reviews and benchmarking against established academic standards to maintain high accuracy and relevance in the model's outputs.

2023 - 2023

ScaleAI Quality Assurance Specialist MX

Scale AI, Video, Segmentation, Classification

The MXQA program was an extensive project comprising several parallel pipelines. In one pipeline, I categorized social media videos from platforms like Instagram and TikTok, using my expertise in English and Spanish along with basic knowledge of French and Portuguese, which was sufficient for the project requirements. Another pipeline involved reviewing videos frame by frame to identify specific motion-related activities, ensuring detailed and accurate categorization. A third pipeline focused on reviewing product SKUs for online marketplaces, where I was responsible for accurate categorization and description to improve product visibility for potential buyers. I began as a trainer, performing data annotation tasks such as extracting soundtracks and subtitles and detecting inappropriate content. By the second month, I advanced to a reviewer role across all pipelines, where I performed detailed quality assurance checks before final delivery to the client.

2022 - 2023

ScaleAI MathGen

Internal Proprietary Tooling, Text, Classification, Question Answering

The MathGen project focused on training a mathematical language model to improve its problem-solving capabilities. The scope involved creating datasets and exercises covering a wide range of mathematical topics, from basic arithmetic to advanced calculus. I performed data labeling tasks by generating and verifying mathematical problems and solutions, ensuring clarity and correctness in the datasets. The project was moderate in size, involving hundreds of problem sets and solutions. Quality measures included cross-verification with subject matter experts and adherence to educational standards, ensuring high accuracy and educational value in the model's outputs.

2022 - 2022

Education

Universidad Politécnica de Yucatán

Embedded Computing Systems, Programming, Computer Science, Exact Sciences.

Bachelor's Degree in Embedded Computing Systems

2019 - 2023

Work History

Scale AI

Expert AI Data Trainer

Remote

2022 - 2024

Scale AI

MXQA - Quality Assurance Specialist

Mérida

2022 - 2023