AI Data Curation & Pipeline Architect
Helped architect and curate datasets for developing and optimizing generative AI systems. Implemented a RAG pipeline and curation workflow to ensure data quality and effective model responses. Logged and analyzed token usage and system performance to evaluate AI training results.
• Engineered and maintained semantic retrieval workflows backed by vector databases for LLM training.
• Curated and prioritized training samples using Python (Pandas) for dataset refinement.
• Used LangChain, Pinecone, FAISS, and internal scripts for preprocessing.
• Focused on optimizing data pipelines for rapid, accurate AI model tuning.