Alain Chirino Trujillo - Bilingual Language Engineer in AI research and design

Key Skills

Software

Appen

Data Annotation Tech

Dataloop

Prodigy

SuperAnnotate

Top Subject Matter

Figurative language annotation in Spanish

Video captioning and transcription

Picture emotion detection labeling

Top Data Types

Image

Text

Video

Top Task Types

Classification

Data Collection

Entity NER Classification

Fine Tuning

Prompt Response Writing SFT

Freelancer Overview

I have extensive experience in data labeling and AI training data, particularly in Natural Language Processing (NLP). My work has involved annotating large datasets, including over 10,000 examples for a Multilingual Euphemisms project, where I used corpus linguistics techniques and tools. I have also fine-tuned language models such as BERT and RoBERTa, improving their performance on tasks such as named entity recognition. My ability to manage and process large-scale data, combined with a deep understanding of machine learning and NLP, has enabled me to deliver high-quality training data that improves model accuracy and adaptability across domains.

I also have a strong academic background, including a Master's in Computational Linguistics, where I contributed to projects involving multilingual experiments and semantic network analysis. I develop and fine-tune models using techniques such as Domain-Adaptive Pre-training (DAPT) and span-level masking. These skills, together with a solid foundation in Python, TensorFlow, and data annotation tools, enable me to bridge the gap between data collection and model optimization.

Expert: English, Spanish

Labeling Experience

Emotion Recognition Detection in Conversations using MELD

SuperAnnotate, Video, Emotion Recognition

In this project, I focused on developing a multi-modal model for the automated detection of questionable or obscene content in children's media, utilizing multilingual methods of natural language processing, text/video analysis, and deep learning. A key aspect of my role involved fine-tuning the MARLIN model for Multimodal Emotion Detection and Recognition. This process required precise annotation skills to accurately label and integrate various modalities, including facial expressions, audio cues, and textual information, to enhance the model's emotion classification accuracy. Additionally, I served as a liaison and translator between Mexican and American research groups, facilitating effective collaboration and ensuring the successful integration of diverse perspectives into the project. My expertise in data annotation was crucial in training the model to recognize complex emotional cues across different languages and cultures.

2024

Multilingual Euphemisms Project

Appen, Text, Classification

The Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms (PETs) project focuses on identifying and disambiguating euphemistic language across multiple languages. My contributions included collecting and annotating over 6,000 data examples and enhancing algorithms in Python to generate PETs for research. I applied machine learning techniques to train and fine-tune models like BERT and RoBERTa, improving their ability to handle the ambiguity and vagueness inherent in euphemistic language. The project aimed to enhance the performance of NLP models in multilingual contexts, with findings shared through publications and workshops.

2022 - 2024

Education

Montclair State University

Master of Science, Computational Linguistics

2022 - 2024

UCP Enrique Jose Varona

Bachelor of Arts, English

2010 - 2015

Work History

University of Houston

Machine Learning Researcher (Intern)

Puebla

2024

Montclair State University

Graduate Research Assistant

Montclair, NJ

2022 - 2024