Vemula Shivasai - AI Data Annotator – NLP & Agentic AI

Key Skills

Top Subject Matter

Medical/DICOM Domain Expertise

NLP Domain Expertise

Retrieval-Augmented Generation

Top Data Types

Image

Document

Text

Top Task Types

Classification

Text Summarization

Text Generation

Fine-Tuning

Evaluation Rating

Function Calling

Prompt-Response Writing (SFT)

RLHF

Object Detection

Entity (NER) Classification

Segmentation

Freelancer Overview

I am a fourth-year B.Tech student specializing in Artificial Intelligence and Machine Learning, and I currently work as a remote contractor at Turing, focusing on data annotation and AI training data. At Turing, I evaluate, label, and verify AI-generated outputs to improve the quality and accuracy of large language models. This work has given me strong expertise in data labeling workflows, quality assurance for AI training datasets, and prompt evaluation across multiple domains. I am experienced with text, code, and conversational data annotation, and my grounding in AI/ML concepts helps me annotate with precision and contextual accuracy.

Languages: English (Intermediate), Hindi, Telugu

Labeling Experience

Document Chunking & Summarization — Domain-Specific RAG Pipeline

Other | Document | Text Summarization

I developed a retrieval-augmented generation (RAG) pipeline that processes and labels academic PDFs by segmenting them into overlapping, information-rich chunks. The work covered text extraction, labeling rules for keyword coverage, and benchmarking of summarization performance, with embedding-based search and chunk selection at the core of the labeling strategy.

• Processed and annotated 594 pages from six academic PDFs, producing 695 labeled document chunks.
• Defined chunk-level labeling criteria that achieved 93.3% keyword coverage against benchmarks.
• Evaluated and compared summarization quality using ROUGE-L and cosine similarity.
• Built an interactive local web interface for end-to-end labeled Q&A workflows.
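The profile does not state how chunks were sized or how keyword coverage was computed. The sketch below shows one common approach under stated assumptions: fixed-size word windows with a configurable overlap, and coverage measured as the share of benchmark keywords that appear in at least one chunk. The function names `chunk_words` and `keyword_coverage` and the default sizes are hypothetical, not taken from the project.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words.

    Hypothetical helper: sizes are counted in whitespace-separated words, not model tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    stride = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def keyword_coverage(chunks: list[str], keywords: list[str]) -> float:
    """Share of keywords found (case-insensitively, as whole words) in at least one chunk.

    Assumes single-word keywords; punctuation attached to words is not stripped.
    """
    if not keywords:
        return 0.0
    vocab: set[str] = set()
    for chunk in chunks:
        vocab.update(chunk.lower().split())
    found = sum(1 for kw in keywords if kw.lower() in vocab)
    return found / len(keywords)
```

For example, a 10-word text split with `chunk_size=4` and `overlap=2` yields four windows, each sharing two words with the next. Overlap keeps a passage that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.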

2026 - 2026

Medical Image Classification — Diabetic Retinopathy Grading System

Other | Image | Classification

I fine-tuned an EfficientNet-B0 model on retinal images to classify diabetic retinopathy severity into five classes. This involved preparing and annotating the APTOS 2019 dataset, using weighted sampling to counter class imbalance, and validating performance across datasets. LIME and SHAP provided lesion-level interpretability.

• Curated and labeled a dataset of 35,000 medical retinal images for multi-class classification.
• Applied stratified data splits and annotation techniques to address a 10:1 class imbalance.
• Used LIME and SHAP to generate local, lesion-level explanations of predictions.
• Deployed the workflow as a REST endpoint with prediction-confidence logging.
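Weighted sampling for a 10:1 imbalance can be illustrated with inverse-frequency weights: each image is drawn with probability proportional to one over the size of its class, so every severity grade contributes equally in expectation. This is a minimal pure-Python sketch; the weighting scheme and the names `inverse_frequency_weights` and `weighted_resample` are assumptions, not the project's code. In PyTorch, the same per-sample weights would typically be passed to `torch.utils.data.WeightedRandomSampler`.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels: list[int]) -> list[float]:
    """Weight each sample by 1 / (size of its class), so each class sums to a total weight of 1.0."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def weighted_resample(labels: list[int], k: int, seed: int = 0) -> list[int]:
    """Draw k sample indices with replacement, with probability proportional to inverse class frequency.

    Hypothetical helper; the seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)
```

With 100 grade-0 labels and 10 grade-4 labels, each class receives a total weight of 1.0, so a resampled set is roughly balanced between the two. Sampling with replacement repeats minority-class images instead of discarding majority-class ones.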

2026 - 2026

Code-Mixed Hinglish-English NLP Pipeline — Translation Data Labeling

Other | Text

I fine-tuned the NLLB-200 model for machine translation of code-mixed Hinglish-English social media data, which required robust data cleaning and normalization. The work included labeling translation pairs, curating noisy datasets, and evaluating model outputs with BLEU and ROUGE-L. The pipeline produced high-quality, domain-specific label sets for multilingual NLP.

• Labeled and cleaned Hinglish-English text data for machine translation fine-tuning.
• Established annotation guidelines for translation accuracy and text normalization.
• Evaluated translation outputs with BLEU and ROUGE-L.
• Used the Hugging Face Transformers library for annotation and training tasks.
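ROUGE-L, reported here for translations and in the RAG project for summaries, scores a candidate by the longest common subsequence (LCS) of tokens it shares with a reference. The sketch below computes the balanced F1 form, assuming lowercase whitespace tokenization. Established toolkits apply their own tokenizers and optional stemming, so scores from these hypothetical `lcs_length` and `rouge_l_f1` helpers will not match them exactly.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming, O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1  # extend the LCS ending at the previous tokens
            else:
                curr[j] = max(prev[j], curr[j - 1])  # carry over the best LCS so far
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall over lowercase whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For instance, "the cat is on the mat" against the reference "the cat sat on the mat" shares a five-token LCS out of six tokens on each side, giving 5/6. Because LCS rewards in-order matches without requiring them to be contiguous, ROUGE-L tolerates inserted or substituted words better than fixed n-gram matching, which can help when scoring noisy code-mixed text.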

2025 - 2025

Education

Woxsen University

Bachelor of Technology, Artificial Intelligence and Machine Learning

2024 - 2027

INTI International College Subang

Undergraduate Studies, Artificial Intelligence and Machine Learning

2023 - 2024

Work History

Turing

LLM S2 Annotator

Hyderabad

2026 - Present