Nicolas Diaz Durana - LLM Evaluation and Text Classification Specialist in English & Spanish

Key Skills

Software

AWS SageMaker

Doccano

Labelbox

Label Studio

LightTag

Prodigy

Top Subject Matter

No subject matter listed

Top Data Types

Audio

Text

Top Task Types

Classification

Data Collection

Prompt Response Writing SFT

Question Answering

Text Summarization

Freelancer Overview

As a linguist and mathematician with a strong background in NLP, I have worked on projects involving text processing, text classification, and prompt engineering for LLMs. My experience includes developing rule-based and statistical approaches to language tasks, working with Python libraries such as NLTK and spaCy, and experimenting with prompt design to optimize responses from large language models. With my interdisciplinary expertise, I can contribute effectively to AI training, data labeling, and linguistic data analysis tasks.

Intermediate French, English, Spanish

Labeling Experience

WordCount_Teilur

Internal Proprietary Tooling, Text, Classification

This project identifies the most common words in five large datasets covering data engineering, data analytics, data science, software engineering, and business analytics, as well as the most common words across the five datasets combined. The datasets are CSV files built by web scraping several sources: GitHub, documentation, Glassdoor, and specific content sites (technical blogs and similar sources). The five datasets contain 3,204,121 words in total. We used a variety of libraries and packages, including NLTK, collections, wordcloud, pandas, matplotlib, and openpyxl. The report shows the steps followed, from loading the datasets to writing the Excel files with the most common words per category. The project repository can be found on my GitHub profile: https://github.com/nykolai-d/teilur_wordcount

2022 - 2023
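The counting step of the WordCount_Teilur project can be illustrated with a minimal sketch. This is not the code from the repository: the function names, the tiny stop-word set, and the toy data are assumptions made for illustration, and only the standard library is used where the project itself lists NLTK and pandas.

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration; NLTK's stop-word corpus
# would be the natural replacement in a real run.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def most_common_words(datasets, top_n=3):
    """Return the top_n (word, count) pairs per category and for all categories combined."""
    per_category = {}
    combined = Counter()
    for name, documents in datasets.items():
        counts = Counter()
        for doc in documents:
            counts.update(tokenize(doc))
        per_category[name] = counts.most_common(top_n)
        combined.update(counts)
    per_category["all"] = combined.most_common(top_n)
    return per_category


# Toy stand-in for the scraped CSV content (illustrative data only).
sample = {
    "data_engineering": [
        "Pipelines move data to the warehouse",
        "Data pipelines and data quality",
    ],
    "data_science": [
        "Models learn from data",
        "Train the models with data",
    ],
}
result = most_common_words(sample, top_n=2)
```

Keeping one Counter per category and merging them into a combined Counter mirrors the per-theme and whole-corpus rankings described above without reading the data twice.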

LSTM_Brown_LOB

Other, Text, Classification

This project trains a Long Short-Term Memory (LSTM) network to classify English text by variety: British or American. The code is written in Python and uses nltk, sklearn, tensorflow, math, and pandas. The full repository can be found on my GitHub profile: https://github.com/nykolai-d/LSTM_Brown_LOB/tree/main

2022 - 2022
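Before an LSTM can classify text, each sentence has to become a fixed-length sequence of integer ids with a numeric label. The sketch below shows one common way to do that in plain Python. It is an assumption about the general approach, not the preprocessing used in the LSTM_Brown_LOB repository; the helper names, the label mapping, and the two example sentences are hypothetical.

```python
def build_vocab(texts):
    """Map each distinct lowercase token to an integer id.

    Id 0 is reserved for padding and id 1 for unknown tokens.
    """
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Convert text to token ids, truncated or right-padded to max_len."""
    ids = [vocab.get(token, 1) for token in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical label mapping for the two varieties.
LABELS = {"american": 0, "british": 1}

# Illustrative sentences only; a real run would draw on the corpora.
train = [
    ("the color of the theater", "american"),
    ("the colour of the theatre", "british"),
]

vocab = build_vocab(text for text, _ in train)
X = [encode(text, vocab, max_len=6) for text, _ in train]
y = [LABELS[label] for _, label in train]
```

The padded id sequences in `X` and the labels in `y` have the shape an embedding layer followed by an LSTM layer would typically consume.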

Education

Universidad Konrad Lorenz

Bachelor's in Mathematics

2017 - 2021

Université Rennes 2 in France

Master's in Linguistics, Languages, Literatures and Linguistics

2006 - 2008

Work History

Pixorize

Linguistic Project Manager

New York

2022 - 2024