Bitext

Agency
Redwood City, USA
$30.00/hr · Expert · 2000+ · GDPR

Key Skills

Software

AWS SageMaker
Anno-Mage
Appen
Argilla
Axiom AI
Clickworker
CloudFactory
Internal/Proprietary Tooling

Top Subject Matter

No subject matter listed

Top Data Types

Text

Top Task Types

Classification
Entity NER Classification
Evaluation Rating
Fine-Tuning
Function Calling

Company Overview

Bitext provides custom annotation for GenAI tasks, such as model training and evaluation, and for NLP tasks such as entity extraction, event extraction, sentiment analysis… We automate data annotation and generation for AI/NLP applications, with a focus on language model training and evaluation. Our unique differentiator: we combine automation tools with human-in-the-loop curation to annotate data. We also leverage proprietary NLG (Natural Language Generation) technology to produce and augment synthetic training data, as well as proprietary NLP tools for entity extraction, relationship detection, sentiment analysis, lemmatization, POS tagging, and phrase extraction. Bitext also provides off-the-shelf datasets for GenAI tasks (synthetically generated conversational datasets in 20 verticals) and for NLP tasks (manually curated resources such as morphological dictionaries, synonym dictionaries, and ontologies).

DAL: automation tools for data annotation and labelling
NLP: text annotation tools for NLP tasks in 70+ languages
NLP: lexical and semantic data in 70+ languages
NLG: synthetic text generation tools to generate custom datasets
Pre-built datasets to train and evaluate your assistant/chatbot

Expert: Hindi, Arabic, French, German, Korean, English, Spanish, Japanese, Portuguese, Chinese (Mandarin)

Security

Security Overview

We secure everything and work with the client's tools (such as AWS and Databricks) to protect their data. We ensure:

Physical security measures in our facilities, including CCTV surveillance and secure access to workstations.
Robust cybersecurity policies, featuring secure network infrastructure, firewalls, and antivirus software.
Employee confidentiality and data handling protocols, including non-disclosure agreements and training on data privacy.
Regular audits and compliance checks to maintain the highest security standards.

Security Credentials

GDPR

Labeling Experience

Financial Report Q&A

Internal/Proprietary Tooling · Text · Classification · Question Answering
Collaborated with the client on improving project specifications and developing annotation guidelines.

Annotation Tasks:
Paraphrase questions to remove ambiguity and add the implicit information (company names, industry, years…) needed to find the financial reports and tables that contain the answers
Generate new complex questions involving tables from multiple financial reports, and generate the data extraction and calculation steps needed to obtain the answers

Annotation Tools: Custom Excel-based interface using formulae and complex macros to check calculation steps and ensure data consistency
Data Format: Input: HTML; Output: JSON
Volume: Paraphrasing: 10K query/report pairs; Question generation: 8K report pairs
Timeframe: 2 months
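The kind of calculation-step consistency check the Excel macros performed can be sketched in Python. Everything here is a hypothetical illustration – the record fields, operation names, and figures are invented for the example and are not the project's actual schema:

```python
# Hypothetical sketch of a calculation-step consistency check: replay the
# annotated arithmetic over the extracted values and verify it reproduces
# the stated answer. Field names are illustrative, not the real schema.

def check_record(record: dict, tolerance: float = 0.01) -> bool:
    """Replay calculation steps over extracted values; compare to the answer."""
    values = dict(record["extracted_values"])  # work on a copy
    for step in record["calculation_steps"]:
        a, b = (values[name] for name in step["args"])
        if step["op"] == "add":
            values[step["out"]] = a + b
        elif step["op"] == "sub":
            values[step["out"]] = a - b
        elif step["op"] == "div":
            values[step["out"]] = a / b
        else:
            raise ValueError(f"unknown op: {step['op']}")
    return abs(values[record["answer_field"]] - record["answer"]) <= tolerance

# Invented example record: revenue growth computed from two reports.
record = {
    "extracted_values": {"revenue_2022": 394.3, "revenue_2021": 365.8},
    "calculation_steps": [
        {"op": "sub", "args": ["revenue_2022", "revenue_2021"], "out": "growth"}
    ],
    "answer_field": "growth",
    "answer": 28.5,
}
print(check_record(record))  # True
```

A check like this flags records where the annotated steps and the final answer disagree, so they can be routed back to a human annotator.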


2024 - 2024

NLQ Paraphrase

Internal/Proprietary Tooling · Text · Entity NER Classification · Question Answering
Collaborated with the client on improving project specifications and developing annotation guidelines.

Annotation Tasks: Paraphrasing natural language queries that are answerable by tables, covering:
Adding domain-specific filler words to entities in the query
Adding domain-specific synonyms to entities in the query
Rephrasing queries from explicit to implicit questions
Conversion of unit and date formats
Rephrasing verbs into noun phrases (and vice versa)
Conversion between complete names and abbreviations (and vice versa)
Changing between active and passive voice

Annotation Tools: Client-specific annotation tool
Data Format: Input: JSON; Output: JSON
Volume: Queries: 60K; Tables: 25K
Timeframe: 1.5 months
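Two of the transformations listed above – abbreviation/full-name conversion and date-format conversion – could look roughly like this. The abbreviation table and example query are invented for illustration, not project data:

```python
import re
from datetime import datetime

# Invented abbreviation table for illustration only.
ABBREVIATIONS = {"YoY": "year over year", "Q1": "first quarter", "rev.": "revenue"}

def expand_abbreviations(query: str) -> str:
    """Replace known abbreviations with their full forms."""
    for abbr, full in ABBREVIATIONS.items():
        query = re.sub(rf"\b{re.escape(abbr)}", full, query)
    return query

def convert_date_format(query: str) -> str:
    """Rewrite ISO dates (2023-06-30) into a spelled-out form (June 30, 2023)."""
    def spell_out(match: re.Match) -> str:
        return datetime.strptime(match.group(0), "%Y-%m-%d").strftime("%B %d, %Y")
    return re.sub(r"\b\d{4}-\d{2}-\d{2}\b", spell_out, query)

query = "Show YoY rev. growth as of 2023-06-30"
print(convert_date_format(expand_abbreviations(query)))
# Show year over year revenue growth as of June 30, 2023
```

In practice each paraphrase was produced or reviewed by annotators; rule-based rewrites like these only cover the mechanical cases.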


2024 - 2024

Business Name Generation

AWS SageMaker · Text · Classification · Entity NER Classification
Overview

The client needed to improve the recall of a model designed to extract information from large numbers of tables with hundreds of columns each – specifically, they wanted the model to learn to work transparently with both expanded (“account creation date”) and condensed/cryptic (“acct_cr_date”) column names.

We annotated a set of 100K tables and 1M columns, generating both expanded and condensed names for each column – this typically required disambiguation based on the field values and the context of the table. We developed pre-annotation tools that would identify whether an input column name was expanded or condensed and would generate the corresponding variation. This pre-annotation was then reviewed by human annotators, and their corrections were used to further improve the pre-annotation process. This automation allowed us to annotate the full set of 1M columns in a month.

Details

Collaborated with the client on improving project specifications and developing annotation guidelines.
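The expanded/condensed detection step described above might be sketched as follows. The abbreviation table and the heuristic are invented for illustration and are not Bitext's actual tooling:

```python
# Minimal sketch: classify a column name as condensed ("acct_cr_date") or
# expanded ("account creation date"), then generate the opposite variation.
# The abbreviation table is an invented example.

ABBREV_TO_WORD = {"acct": "account", "cr": "creation", "dt": "date", "num": "number"}
WORD_TO_ABBREV = {word: abbr for abbr, word in ABBREV_TO_WORD.items()}

def is_condensed(name: str) -> bool:
    """Heuristic: underscores or known abbreviation tokens suggest a condensed name."""
    tokens = name.replace("_", " ").split()
    return "_" in name or any(t in ABBREV_TO_WORD for t in tokens)

def variant(name: str) -> str:
    """Generate the opposite form; unknown tokens pass through unchanged."""
    if is_condensed(name):
        return " ".join(ABBREV_TO_WORD.get(t, t) for t in name.split("_"))
    return "_".join(WORD_TO_ABBREV.get(t, t) for t in name.split())

print(variant("acct_cr_date"))           # account creation date
print(variant("account creation date"))  # acct_cr_dt
```

In the project this classification was only a pre-annotation pass: its output was reviewed by human annotators, and their corrections fed back into the tool.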


2023 - 2024