AI Data Labeling / Evaluation (Multimodal & Voice Assistants)
Worked as a Multimodal AI Evaluator, testing and assessing conversational AI systems in real-world scenarios:
- Interacted with voice-based AI assistants and evaluated their responses across dimensions including audio understanding, visual grounding, content quality, response relevance, and conversational flow.
- Conducted structured evaluations, assigning ratings and writing detailed, timestamp-based justifications for model behavior.
- Identified issues such as hallucinations, incorrect reasoning, weak context understanding, and poor interaction handling.
- Performed side-by-side comparisons of AI models (e.g., GPT vs. Gemini) on identical prompts to benchmark performance and highlight relative strengths and weaknesses.
- Evaluated multimodal inputs, including audio and camera-based interactions, to simulate real-world use cases such as navigation, product comparison, and decision-making.
- Maintained strict quality and compliance standards, ensuring consistency, objectivity, and adherence to data privacy guidelines throughout data collection and evaluation.
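The rating and side-by-side comparison workflow described above can be sketched in code. This is an illustrative sketch only: the dimension names, the 1-5 scale, the record shape, and the `side_by_side` helper are assumptions for demonstration, not the project's actual rubric or tooling.

```python
from dataclasses import dataclass, field
from statistics import mean

# Assumed rating dimensions and 1-5 scale (illustrative, not the real rubric).
DIMENSIONS = ("audio_understanding", "visual_grounding",
              "content_quality", "relevance", "conversational_flow")

@dataclass
class Evaluation:
    model: str                                  # e.g. "GPT" or "Gemini"
    prompt_id: str                              # identical prompt given to both models
    ratings: dict                               # dimension -> score on a 1-5 scale
    notes: list = field(default_factory=list)   # (timestamp, justification) pairs

def side_by_side(evals_a, evals_b):
    """Mean per-dimension score delta (model A minus model B) over shared prompts."""
    by_prompt_b = {e.prompt_id: e for e in evals_b}
    deltas = {d: [] for d in DIMENSIONS}
    for ea in evals_a:
        eb = by_prompt_b.get(ea.prompt_id)
        if eb is None:
            continue  # benchmark only prompts both models answered
        for d in DIMENSIONS:
            deltas[d].append(ea.ratings[d] - eb.ratings[d])
    return {d: mean(v) for d, v in deltas.items() if v}
```

Keying the comparison on `prompt_id` enforces the identical-prompt constraint: a model's response is only benchmarked against the other model's response to the same prompt.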