AI Code Evaluation and RLHF Output Quality Assessment (Freelance/Entrepreneurship)
Evaluated AI-generated code outputs for correctness, efficiency, and adherence to best practices as part of RLHF-adjacent evaluation tasks. Responsibilities included prompt engineering, hallucination detection, and output quality assessment using common LLM integration patterns. Familiar with fine-tuning concepts, evaluation rubrics, and safety considerations for large language models.
• Evaluated Python, JavaScript, TypeScript, Java, C++, and SQL code outputs from AI models.
• Identified and flagged hallucinated or logically incorrect code outputs.
• Used annotation platforms such as Labelbox, Alignerr, Scale AI, and DataAnnotation Tech.
• Worked with the Claude API, OpenAI Realtime API, and Bland.ai for model evaluation and prompt testing (see the illustrative sketch below).
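For illustration, a minimal sketch of the kind of rubric-based code evaluation loop this work involved, assuming the official anthropic Python SDK; the model name, rubric text, and candidate snippet are hypothetical placeholders, not the actual evaluation setup.

    # Illustrative only: send an AI-generated snippet plus a grading rubric to a model
    # for critique, assuming the `anthropic` Python SDK. Model name, rubric, and
    # candidate code are hypothetical placeholders.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    RUBRIC = (
        "Rate the following code on correctness, efficiency, and adherence to best "
        "practices. Flag any hallucinated APIs or logically incorrect behavior."
    )

    candidate_code = """\
    def dedupe(items):
        return list(set(items))  # note: does not preserve input order
    """

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=512,
        messages=[{"role": "user", "content": f"{RUBRIC}\n\n{candidate_code}"}],
    )

    # The human evaluator reviews the model's critique alongside the code before scoring.
    print(message.content[0].text)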