AI Data Pipeline Developer & Annotator (PATAMDE)
Built an OCR pipeline for educational documents that uses Google Document AI for text extraction, with Mistral as a fallback, to keep extraction reliable. Designed and shipped an AI-assisted test generation platform for teachers and coaching institutes, featuring multilingual OCR ingestion, automatic chapter extraction, and question generation. Enabled real-time quiz generation and export directly from PDFs, images, and raw text on both web and mobile.
• Used the Gemini API, Mistral API, and Google Document AI to automate extraction and labeling tasks.
• Improved extraction reliability on low-quality scans, significantly reducing manual preparation effort.
• Built pipelines that generate quizzes and educational data from annotated documents and images.
• Managed the entire labeling lifecycle for data fed into the AI models that power the learning platforms.