Nicolas Diaz Durana

LLM Evaluation and Text Classification Specialist in English & Spanish

Bogota D.C., Colombia
$25.00/hr · Intermediate · AWS SageMaker, Doccano, Labelbox

Key Skills

Software

AWS SageMaker
Doccano
Labelbox
Label Studio
LightTag
Prodigy

Top Subject Matter

No subject matter listed

Top Data Types

Audio
Text

Top Task Types

Classification
Data Collection
Prompt Response Writing SFT
Question Answering
Text Summarization

Freelancer Overview

As a linguist and mathematician with a strong background in NLP, I have worked on projects involving text processing, text classification, and prompt engineering for LLMs. My experience includes developing rule-based and statistical approaches to language tasks, working with Python libraries such as NLTK and spaCy, and experimenting with prompt design to optimize responses from large language models. With my interdisciplinary expertise, I can contribute effectively to AI training, data labeling, and linguistic data analysis tasks.

Intermediate · French, English, Spanish

Labeling Experience

WordCount_Teilur

Internal Proprietary Tooling · Text Classification
This project identifies the most common words in five large datasets, one per theme (data engineering, data analytics, data science, software engineering and business analytics), as well as the most common words across all five datasets combined. The datasets are CSV files built by web scraping several sources: GitHub, documentation pages, Glassdoor and topic-specific content sites (technical blogs and similar sources). Together, the five datasets contain 3,204,121 words. We used NLTK, collections, wordcloud, pandas, matplotlib and openpyxl. The accompanying report walks through each step, from loading the datasets to writing Excel files with the most common words per category. The project repository is on my GitHub profile: https://github.com/nykolai-d/teilur_wordcount
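The per-category and combined counting described above can be sketched with the standard library alone. This is a minimal sketch, not the project's code: the regex tokenizer and the short stopword list are assumptions standing in for NLTK's tokenizer and stopword corpus, the `text` column name is hypothetical, and the two in-memory CSVs are toy data.

```python
import csv
import io
import re
from collections import Counter

# Assumption: a tiny stopword list stands in for NLTK's English stopword corpus.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "with", "on"}

# Assumption: lowercase alphabetic tokens approximate nltk.word_tokenize for counting.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return its tokens, minus stopwords."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOPWORDS]


def count_words(rows, text_column="text"):
    """Count tokens in the given column of every CSV row (column name is hypothetical)."""
    counts = Counter()
    for row in rows:
        counts.update(tokenize(row.get(text_column) or ""))
    return counts


def most_common_by_category(csv_sources, n=10):
    """Return the top-n words per category, plus the top-n over all categories combined."""
    per_category = {}
    combined = Counter()
    for category, source in csv_sources.items():
        counts = count_words(csv.DictReader(source))
        per_category[category] = counts.most_common(n)
        combined.update(counts)  # adds counts rather than replacing them
    per_category["all"] = combined.most_common(n)
    return per_category


# Toy data: two one-row CSVs standing in for the scraped datasets.
sources = {
    "data_engineering": io.StringIO("text\nSpark pipelines and Airflow pipelines\n"),
    "data_science": io.StringIO("text\nModels and pipelines\n"),
}
result = most_common_by_category(sources, n=2)
print(result["all"])
```

Each value is a list of `(word, count)` pairs, which could then be written to Excel, for example with pandas `DataFrame.to_excel`, which uses openpyxl for `.xlsx` files.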

2022 - 2023

LSTM_Brown_LOB

Other · Text Classification
This project trains a Long Short-Term Memory (LSTM) network to classify English text by variety: British or American. The code is written in Python using nltk, sklearn, tensorflow, math and pandas. The full repository is on my GitHub profile: https://github.com/nykolai-d/LSTM_Brown_LOB/tree/main
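An LSTM classifier consumes fixed-length sequences of integer word IDs, so the text has to be converted before training. The sketch below covers only that preprocessing step (tokenizing, building a vocabulary, truncating and padding) in plain Python; it is not the project's code. The label convention (0 = American, 1 = British), the reserved IDs, `max_len`, and the example sentences are all assumptions made for illustration.

```python
import re

# Assumption: hypothetical label convention for the two varieties.
LABELS = {"american": 0, "british": 1}
PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for words not seen when building the vocabulary


def tokenize(text):
    """Lowercase word tokens; a simple stand-in for nltk.word_tokenize."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    """Assign each new token the next free ID, starting after PAD_ID and UNK_ID."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(text, vocab, max_len):
    """Map tokens to IDs, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Toy examples invented for illustration, not taken from the project's data.
samples = [
    ("The colour of the theatre was grey", "british"),
    ("The color of the theater was gray", "american"),
]
vocab = build_vocab(text for text, _ in samples)
X = [encode(text, vocab, max_len=8) for text, _ in samples]
y = [LABELS[label] for _, label in samples]
print(X, y)
```

Reserving ID 0 for padding matches what a Keras `Embedding` layer with `mask_zero=True` expects before an `LSTM` layer. In TensorFlow, `tf.keras.utils.pad_sequences` performs the padding step, though it pads at the start of each sequence by default.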

2022 - 2022

Education

Universidad Konrad Lorenz

Bachelor's in Mathematics
2017 - 2021

Université Rennes 2, France

Master's in Linguistics (Languages, Literatures and Linguistics)
2006 - 2008

Work History

Pixorize

Linguistic Project Manager

New York
2022 - 2024