Sofia Catalina Galindo Mora

Data Labeling for LLM and Document AI in English and Spanish

Bogota, Colombia
$10.00/hr, Entry Level, Roboflow, Internal Proprietary Tooling

Key Skills

Software

Roboflow
Internal/Proprietary Tooling

Top Subject Matter

Document Classification
Q&A from Documents
Image Classification

Top Data Types

Document
Image
Text

Top Task Types

Bounding Box
Classification
Fine Tuning
Text Generation
Text Summarization

Freelancer Overview

Experienced data labeler with a strong background in document processing and AI training data preparation. I am currently pursuing a degree in Data Science and bring hands-on experience from my role at ExaByte Company, where I label data for machine learning models and perform human evaluation of document processing tools. My academic projects have further honed my skills in entity and relationship extraction, which helps me prepare high-quality training data for NLP tasks.

Entry Level, English, Spanish

Labeling Experience

Roboflow

Image Document Classification

Roboflow, Document, Classification
We conducted a document classification project to streamline the document processing pipeline for a Q&A system built on Large Language Models (LLMs).

Scope of the project: The primary goal was binary classification of documents into two categories: "simple" (containing only text) or "complex" (containing any visual elements such as charts, graphs, or tables). This classification determines whether additional processing with a Vision-capable Large Language Model (VLLM) is required for improved indexing and Q&A performance.

Specific data labeling tasks performed:
Binary document classification: Categorized each document as either simple or complex based on visual inspection.
Visual element presence check: Identified the presence of any non-textual elements that would classify a document as complex.

Project size: We classified approximately 5,000 documents from various domains, including financial reports, scientific papers, and technical manuals.

2024

Document Q&A

Internal Proprietary Tooling, Document, Question Answering
We conducted an extensive evaluation of Brainbox, a tool for Q&A over documents using Large Language Models (LLMs).

Scope of the project: Identifying and categorizing hallucinations in answers generated by the LLMs.

Specific data labeling tasks performed:
Answer accuracy assessment: Evaluated LLM-generated answers against source documents for factual correctness.
Hallucination identification: Flagged instances where the LLM provided information not present in the source material.
Hallucination categorization: Classified types of hallucinations (e.g., factual errors, unsupported inferences, contradictions).
Confidence score validation: Compared LLM-reported confidence levels with actual answer accuracy.
Question-answer relevance rating: Assessed how well answers addressed the given questions.

Project size: We analyzed approximately 1,000 question-answer pairs across 100 diverse documents, covering topics such as legal contracts, technical manuals, and academic papers.

2024

Work History

ExaByte Company

Data Labeler and LLM Quality Evaluator

Bogota
2023 - Present