Key Skills

Software

Roboflow

Internal/Proprietary Tooling

Top Subject Matter

Document Classification

Q&A from Documents

Image Classification

Top Data Types

Document

Image

Text

Top Task Types

Bounding Box

Classification

Fine Tuning

Text Generation

Text Summarization

Freelancer Overview

I am an experienced data labeler with a strong background in document processing and AI training data preparation. I am currently pursuing a degree in Data Science and bring hands-on experience from my role at ExaByte Company, where I label data for machine learning models and perform human evaluation of document processing tools. My academic projects have further developed my skills in entity and relationship extraction, which I apply to preparing high-quality training data for NLP tasks.

Entry Level

English

Spanish

Labeling Experience

Image Document Classification

Roboflow

Document

Classification

We classified documents to streamline the document processing pipeline for a Q&A system built on Large Language Models (LLMs).

Scope of the project: The goal was binary classification of each document as "simple" (containing only text) or "complex" (containing any visual element, such as charts, graphs, or tables). The label determines whether a document needs additional processing with a Vision-capable Large Language Model (VLLM) to improve indexing and Q&A performance.

Specific data labeling tasks performed:

Binary document classification: Categorized each document as simple or complex based on visual inspection.

Visual element presence check: Identified any non-textual element that would make a document complex.

Project size: We classified approximately 5,000 documents from various domains, including financial reports, scientific papers, and technical manuals.
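The labeling rule described above can be written as a minimal Python sketch. The class name, field names, and the `needs_vision_model` helper are hypothetical, chosen only for illustration; they are not part of Roboflow or of the project's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentLabel:
    """One labeled document (hypothetical structure, for illustration only)."""
    doc_id: str
    # Visual elements the labeler noticed, e.g. {"chart", "table"}.
    visual_elements: set = field(default_factory=set)

    @property
    def category(self) -> str:
        # Any visual element at all makes the document "complex";
        # a text-only document is "simple".
        return "complex" if self.visual_elements else "simple"


def needs_vision_model(label: DocumentLabel) -> bool:
    """Route only complex documents to the vision-capable model."""
    return label.category == "complex"


report = DocumentLabel("financial_report_01", {"chart", "table"})
memo = DocumentLabel("plain_memo_02")
print(report.category, needs_vision_model(report))
print(memo.category, needs_vision_model(memo))
```

Recording which elements were seen, instead of only the final label, keeps the visual element presence check auditable while the binary category is still derived from it.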

2024

Document Q&A

Internal/Proprietary Tooling

Document

Question Answering

We evaluated Brainbox, a tool for Q&A over documents using Large Language Models (LLMs).

Scope of the project: Identifying and categorizing hallucinations in answers generated by the LLMs.

Specific data labeling tasks performed:

Answer accuracy assessment: Evaluated LLM-generated answers against source documents for factual correctness.

Hallucination identification: Flagged instances where the LLM provided information not present in the source material.

Hallucination categorization: Classified types of hallucinations (e.g., factual errors, unsupported inferences, contradictions).

Confidence score validation: Compared LLM-reported confidence levels with actual answer accuracy.

Question-answer relevance rating: Assessed how well answers addressed the given questions.

Project size: We analyzed approximately 1,000 question-answer pairs across 100 diverse documents, covering topics such as legal contracts, technical manuals, and academic papers.
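One way to picture the confidence score validation task is to group answers by reported confidence and compare each group's human-judged accuracy. The sketch below is only an illustration: it assumes confidence is reported as a number between 0 and 1, which the project description does not state, and the `EvaluatedAnswer` type and function name are hypothetical, not part of Brainbox.

```python
from dataclasses import dataclass


@dataclass
class EvaluatedAnswer:
    confidence: float  # model-reported confidence, assumed to lie in [0, 1]
    is_correct: bool   # human judgment against the source document


def accuracy_by_confidence_bucket(answers, n_buckets=5):
    """Return {bucket index: accuracy} for every non-empty confidence bucket."""
    totals = [0] * n_buckets
    correct = [0] * n_buckets
    for answer in answers:
        # A confidence of exactly 1.0 is placed in the last bucket.
        idx = min(int(answer.confidence * n_buckets), n_buckets - 1)
        totals[idx] += 1
        correct[idx] += int(answer.is_correct)
    return {i: correct[i] / totals[i] for i in range(n_buckets) if totals[i]}


sample = [
    EvaluatedAnswer(0.90, True),
    EvaluatedAnswer(0.95, False),
    EvaluatedAnswer(0.10, False),
]
print(accuracy_by_confidence_bucket(sample))
```

If high-confidence buckets show accuracy well below their confidence range, the reported confidence overstates reliability for those answers.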

2024

Education

No Education added yet

Sofia Catalina G. hasn’t added any Education History to their OpenTrain profile yet.

Work History

ExaByte Company

Data Labeler and LLM Quality Evaluator

Bogota

2023 - Present