Image and Audio Prompt Code Generation
- Created and annotated prompts combining code snippets, charts, and diagrams to assess LLM capabilities on tasks such as bug fixing, algorithmic reasoning, and logic explanation, often framed entirely in Spanish.
- Evaluated LLM output for accuracy, clarity, and alignment with intended logic across multiple programming languages, including Python, JavaScript, and pseudocode-style instructions.
- Translated complex technical tasks into Spanish while preserving pedagogical intent, broadening accessibility and model coverage for non-English code learners.
- Maintained strict accuracy standards, contributing to internal datasets used by AI research teams for model fine-tuning and benchmark evaluation.
- Supported iterative refinement of prompt templates and evaluation rubrics, helping define what "good" outputs look like across multilingual and multimodal tasks.