AI Evaluation Contractor (Coding & Bilingual Tasks)
Reviewed and evaluated outputs from large language models for coding and bilingual tasks, ensuring output quality and alignment. Assessed Java code generated by AI models for correctness, logical consistency, and completeness. Analyzed model responses and provided structured feedback for training data improvement.
• Compared multiple model-generated outputs for instruction adherence
• Rated and ranked responses according to specific rubrics
• Identified logical flaws and assessed edge-case robustness in code
• Supported improvements to LLM training and data quality through feedback