Research Intern in Contract Review, Compliance, and Legal Research
Abu Dhabi, United Arab Emirates
$20.00/hr · Entry Level · CrowdSource · Datatroniq · Datumbox
Key Skills
Software
CrowdSource
Datatroniq
Datumbox
AWS SageMaker
Top Subject Matter
Legal Services & Contract Review
Regulatory Compliance & Risk Analysis
Legal Research & Document Analysis
Top Data Types
Document
Text
Image
Top Task Types
Segmentation
Named Entity Recognition (NER) & Classification
Freelancer Overview
Research Intern in Contract Review, Compliance, and Legal Research, bringing 7+ years of professional experience across complex workflows, research, and quality-focused execution.
Education includes a Bachelor of Medicine and Bachelor of Surgery (MBBS) from Atal Bihari Vajpayee Institute of Medical Sciences (2025) and an MBBS internship at Dr. Ram Manohar Lohia Hospital (2025).
Entry Level · Hindi · French · English · Punjabi
Labeling Experience
Medical Image Annotation for Diabetic Retinopathy Detection and Grading Using Fundus Photography
Image · Segmentation
Contributed as a Medical Domain Expert and Image Annotation Specialist on a computer vision pipeline aimed at training a deep learning model to automatically detect and grade diabetic retinopathy from fundus photographs. The project was developed to support an AI-assisted screening tool intended for deployment in primary care settings where access to ophthalmologists is limited, enabling earlier detection and referral of high-risk diabetic patients.
Specific Data Labeling Tasks Performed:
Lesion Segmentation
Performed pixel-level segmentation of pathological findings on retinal images (a mask-rasterization sketch follows this list), including:
Microaneurysms (small red dots indicating early vascular damage)
Haemorrhages (flame-shaped or dot/blot type)
Hard exudates (bright yellow lipid deposits)
Soft exudates / cotton wool spots (pale fluffy lesions indicating ischemia)
Neovascularization (abnormal new vessel growth in PDR)
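To illustrate the output format, the sketch below shows one common way lesion polygons of the kind listed above can be rasterized into binary masks for model training. The annotation dict, label name, and coordinates are invented for illustration and are not the project's actual CVAT export schema.

# Minimal sketch: rasterize a CVAT-style lesion polygon into a binary mask.
# The annotation dict, label name, and coordinates are illustrative only.
import numpy as np
from PIL import Image, ImageDraw

def polygon_to_mask(points, height, width):
    """Rasterize one polygon [(x, y), ...] into a 0/1 numpy mask."""
    canvas = Image.new("L", (width, height), 0)
    ImageDraw.Draw(canvas).polygon(points, outline=1, fill=1)
    return np.array(canvas, dtype=np.uint8)

# Hypothetical annotation: a microaneurysm on a 1024x1024 fundus image.
annotation = {"label": "microaneurysm",
              "points": [(412, 305), (418, 301), (423, 308), (416, 314)]}
mask = polygon_to_mask(annotation["points"], height=1024, width=1024)
print(annotation["label"], "mask pixels:", int(mask.sum()))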
Bounding Box Annotation
Drew bounding boxes around discrete lesions and anatomical landmarks including:
Optic disc location and margins
Macula and foveal centre
Focal areas of tractional changes or fibrous proliferation
Image-level Grading / Classification
Assigned an overall severity grade to each fundus image using the International Clinical Diabetic Retinopathy Severity Scale (a label-mapping sketch follows the scale):
Grade 0: No apparent retinopathy
Grade 1: Mild NPDR
Grade 2: Moderate NPDR
Grade 3: Severe NPDR
Grade 4: Proliferative Diabetic Retinopathy (PDR)
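As a toy illustration of how these image-level labels can be encoded, the sketch below maps the five grades to names and applies a referral rule; the cut-off of moderate NPDR or worse is an assumption for the example, not the project's clinical protocol.

# Minimal sketch: ICDR grade labels plus a toy referral rule.
# The referral cut-off (grade >= 2) is an illustrative assumption.
ICDR_GRADES = {
    0: "No apparent retinopathy",
    1: "Mild NPDR",
    2: "Moderate NPDR",
    3: "Severe NPDR",
    4: "Proliferative Diabetic Retinopathy (PDR)",
}

def needs_referral(grade: int) -> bool:
    return grade >= 2  # assumed cut-off, for illustration only

for grade, name in ICDR_GRADES.items():
    print(f"Grade {grade}: {name} -> refer: {needs_referral(grade)}")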
Diabetic Macular Edema (DME) Flagging
Separately classified each image for the presence and severity of DME:
No DME
DME not involving the foveal centre
DME involving the foveal centre (vision-threatening — urgent referral flag)
Image Quality Assessment
Screened and filtered images prior to annotation for:
Adequate illumination and focus
Presence of ungradable artefacts (lens opacity, camera glare, poor dilation)
Assigned gradability score (Gradable / Partially Gradable / Ungradable) to each image before it entered the annotation pipeline
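A screening step like this can be partly automated; the sketch below is a minimal heuristic version assuming OpenCV is available, using variance of the Laplacian as a focus proxy and mean intensity as an illumination proxy. The thresholds and file name are arbitrary illustrative values, not the project's actual criteria.

# Minimal sketch: heuristic pre-annotation quality screen for fundus images.
# Thresholds and the file name are arbitrary illustrative values.
import cv2

def gradability(path: str) -> str:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        return "Ungradable"  # unreadable file
    focus = cv2.Laplacian(img, cv2.CV_64F).var()  # blur proxy
    brightness = img.mean()                       # illumination proxy
    if focus > 100 and 40 < brightness < 220:
        return "Gradable"
    if focus > 50:
        return "Partially Gradable"
    return "Ungradable"

print(gradability("fundus_0001.png"))  # hypothetical file name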
Landmark-based Annotation
Placed keypoint markers on standardised anatomical reference points used to normalize image orientation and scale across the dataset, ensuring consistency for model training.
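One common downstream use of such keypoints is to rotate and scale every image so that, for example, the optic disc-to-fovea axis is horizontal and of fixed length. A minimal sketch assuming OpenCV, with made-up coordinates:

# Minimal sketch: normalize orientation and scale from two keypoints
# (optic disc centre and foveal centre). All coordinates are made up.
import math
import numpy as np
import cv2

def align(img, disc_xy, fovea_xy, target_len=300.0):
    """Rotate/scale so the disc->fovea axis is horizontal at target_len px."""
    dx, dy = fovea_xy[0] - disc_xy[0], fovea_xy[1] - disc_xy[1]
    angle = math.degrees(math.atan2(dy, dx))  # current axis angle
    scale = target_len / math.hypot(dx, dy)   # bring axis to fixed length
    M = cv2.getRotationMatrix2D(disc_xy, angle, scale)
    return cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

img = np.zeros((1024, 1024, 3), dtype=np.uint8)  # placeholder image
aligned = align(img, disc_xy=(380.0, 512.0), fovea_xy=(620.0, 540.0))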
Project Size:
Total images annotated: 3,800 fundus photographs
Total lesion-level annotations: ~52,000 individual segmentation and bounding box instances
Team size: 5 annotators (2 with clinical background, 3 trained lay annotators)
Duration: 12 weeks
Annotation tool used: CVAT (Computer Vision Annotation Tool — open-source)
Quality Measures Adhered To:
Inter-Annotator Agreement (IAA): Cohen's Kappa score of 0.82 or above maintained for image-level grading; Intersection over Union (IoU) threshold of 0.75 or above required for all segmentation masks (both metrics are sketched in code after this list)
Annotation Guidelines: Followed a 30-page clinical annotation schema aligned with the International Clinical Diabetic Retinopathy and DME Disease Severity Scales and NHS Diabetic Eye Screening Programme standards
Gold Standard Validation: 8% of images were double-annotated by a senior clinician and used as benchmark references to calibrate annotator accuracy throughout the project
Ungradable Image Rate: Kept below 6% of total dataset through systematic quality screening prior to annotation
Data Privacy Compliance: All images were fully de-identified and stripped of EXIF metadata in accordance with GDPR and HIPAA Safe Harbor requirements before entering the annotation workflow
Audit Trail: All annotation sessions were logged with annotator ID, timestamp, time-on-task per image, and revision history for full traceability and quality review
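For reference, both headline metrics can be computed as in the sketch below, with Cohen's kappa from scikit-learn and IoU in plain NumPy; the annotator labels and masks are made up.

# Minimal sketch: the two QA metrics above, computed on made-up data.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Image-level grading agreement between two annotators (toy labels).
grades_a = [0, 1, 2, 2, 4, 3, 0, 1]
grades_b = [0, 1, 2, 3, 4, 3, 0, 0]
print("Cohen's kappa:", round(cohen_kappa_score(grades_a, grades_b), 3))

# Segmentation agreement: IoU between two binary masks (toy masks).
def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

a = np.zeros((64, 64), dtype=bool)
a[10:40, 10:40] = True
b = np.zeros((64, 64), dtype=bool)
b[15:45, 12:42] = True
print("IoU:", round(iou(a, b), 3), "(project threshold: >= 0.75)")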
2026 - Present
Clinical NLP Data Annotation for Symptom Extraction and Disease Classification in Pulmonary Conditions
Text · Named Entity Recognition (NER) & Classification
Project Title:
Clinical NLP Data Annotation for Symptom Extraction and Disease Classification in Pulmonary Conditions
Data Type:
Unstructured & Semi-structured Text Data
De-identified patient discharge summaries
Radiology reports (chest X-ray and HRCT findings)
Clinical notes from pulmonology outpatient consultations
Labelling Type:
Multi-label Text Annotation / Named Entity Recognition (NER) / Classification
Subject Matter:
Respiratory Medicine / Pulmonology — covering conditions including COPD, Pulmonary Fibrosis, Pneumonia, Pleural Effusion, and Pulmonary Tuberculosis
Project Description:
Contributed as a Medical Domain Expert and Clinical Data Annotator on a healthcare NLP pipeline aimed at training a machine learning model to automatically extract clinically relevant entities from pulmonary patient records. The project was designed to support a clinical decision-support system that could flag high-risk respiratory patients for early intervention.
Specific Data Labeling Tasks Performed:
Named Entity Recognition (NER) Tagging
Identified and tagged clinical entities within free-text notes (a span-annotation sketch follows this list), including:
Symptoms (e.g., dyspnea, haemoptysis, crepitations)
Diagnoses (e.g., COPD exacerbation, community-acquired pneumonia)
Medications (e.g., salbutamol, budesonide, azithromycin)
Lab values and vitals (e.g., SpO2 88%, FEV1/FVC ratio 0.62)
Procedures (e.g., bronchoscopy, pulmonary function test)
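A minimal sketch of how such tags are typically stored as character-offset spans over the raw note; the sentence, labels, and helper below are invented for illustration and are not the project's actual Label Studio schema.

# Minimal sketch: clinical NER tags as character-offset spans.
# The note text, label names, and helper are invented for illustration.
note = "Patient reports dyspnea on exertion; SpO2 88% on room air."

def span(text: str, substring: str, label: str):
    """Return a (start, end, label) span for the first match."""
    start = text.index(substring)
    return (start, start + len(substring), label)

spans = [span(note, "dyspnea", "SYMPTOM"),
         span(note, "SpO2 88%", "LAB_VALUE")]

for start, end, label in spans:
    print(f"{label:10s} -> {note[start:end]!r}")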
Relation Extraction Annotation
Labelled relationships between entities (a short sketch follows these examples), for example:
"Patient administered salbutamol for acute bronchospasm" → Drug–Indication relationship
"HRCT showed bilateral ground-glass opacities consistent with COVID-19 pneumonia" → Finding–Diagnosis relationship
Sentence-level Classification
Classified individual sentences from clinical notes into predefined categories:
Chief Complaint / History of Presenting Illness
Past Medical History
Examination Finding
Investigation Result
Impression / Diagnosis
Treatment Plan
Severity Scoring Labels
Applied standardized severity labels to diagnoses based on clinical criteria (a CURB-65 scoring sketch follows this list):
COPD: GOLD Stage I–IV
Pneumonia: Mild / Moderate / Severe / Critical (based on CURB-65 indicators present in text)
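For context, CURB-65 scores one point each for confusion, urea above 7 mmol/L, respiratory rate of 30 or more, low blood pressure (systolic below 90 or diastolic of 60 or less), and age 65 or over. The sketch below computes the score from extracted indicators; the mapping of scores onto the project's Mild/Moderate/Severe/Critical labels is an assumption for illustration.

# Minimal sketch: CURB-65 score from indicators extracted from text.
# The score-to-severity-label mapping is an illustrative assumption.
def curb65(confusion, urea_mmol_l, resp_rate, sbp, dbp, age):
    score = sum([confusion,
                 urea_mmol_l > 7,
                 resp_rate >= 30,
                 sbp < 90 or dbp <= 60,
                 age >= 65])
    labels = {0: "Mild", 1: "Mild", 2: "Moderate", 3: "Severe"}
    return score, labels.get(score, "Critical")  # 4-5 -> Critical (assumed)

print(curb65(confusion=False, urea_mmol_l=8.2, resp_rate=32,
             sbp=100, dbp=70, age=71))  # -> (3, 'Severe')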
Negation and Uncertainty Tagging
Flagged negated findings (e.g., "no pleural effusion noted") and uncertain language (e.g., "likely fibrotic changes") to prevent the model from treating absent findings as present — a critical nuance in clinical NLP.
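A toy rule-based version of this step is sketched below; the trigger-word lists are tiny illustrations, and production pipelines typically rely on NegEx-style rules or learned assertion classifiers rather than anything this simple.

# Minimal sketch: rule-based negation/uncertainty flags for a finding.
# Trigger lists are tiny illustrations, not a production rule set.
NEGATION = ("no ", "denies ", "without ", "absent")
UNCERTAIN = ("likely ", "possible ", "suspected ", "cannot exclude")

def assertion(sentence: str, finding: str) -> str:
    pre = sentence.lower().split(finding.lower())[0]  # text before finding
    if any(t in pre for t in NEGATION):
        return "negated"
    if any(t in pre for t in UNCERTAIN):
        return "uncertain"
    return "present"

print(assertion("No pleural effusion noted.", "pleural effusion"))  # negated
print(assertion("Likely fibrotic changes.", "fibrotic changes"))    # uncertain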
Inter-annotator Disagreement Resolution
Participated in weekly calibration sessions to resolve conflicting labels between annotators, using adjudication guidelines developed with the clinical lead.
Project Size:
Total records annotated: 2,400 clinical documents
Total annotation instances: ~38,000 individual entity tags
Team size: 6 annotators (3 clinical, 3 non-clinical)
Duration: 10 weeks
Annotation tool used: Label Studio (open-source)
Quality Measures Adhered To:
Inter-Annotator Agreement (IAA): Cohen's Kappa score maintained at 0.80 or above across all entity categories
Annotation Guidelines: Followed a 24-page internal clinical annotation schema developed in alignment with SNOMED CT and ICD-11 terminology
Gold Standard Validation: 10% of documents were randomly selected as a gold standard set, reviewed by a senior clinician, and used to benchmark annotator accuracy
Error Rate: Individual annotator error rate kept below 5% per review cycle
Data Privacy Compliance: All documents were de-identified in accordance with HIPAA Safe Harbor standards prior to annotation
Audit Trail: Every label change was logged with timestamp and annotator ID for full traceability
2025 - 2025
Education
St. Vincent’s Hospital, University of Melbourne
Clinical Elective, Endocrinology
Clinical Elective
2025 - 2025
Dr. Ram Manohar Lohia Hospital
Bachelor of Medicine and Bachelor of Surgery Internship, Medicine and Surgery
Bachelor of Medicine and Bachelor of Surgery Internship