AI Model Evaluation & Task Generation for STEM Domains and Computer Science (Python)
Acting as a Subject Matter Expert (SME) to train and evaluate Large Language Models (LLMs) in the domains of Computer Science, Mathematics, Physics, and Python programming.
Key Responsibilities:
* Task Generation: Creating computationally intensive STEM problems that require multi-step reasoning and Python coding to solve, designed specifically to challenge AI models and expose their reasoning failures.
* Model Evaluation (RLHF): Evaluating AI-generated code for correctness, efficiency, and adherence to constraints; analyzing failure modes such as logic errors, hallucinations, and suboptimal algorithms.
* Golden Solution Creation: Developing deterministic, reproducible, and efficient Python solutions (using libraries such as pandas, numpy, and scipy) alongside clear, human-readable explanations to serve as ground truth for model training.
* Vibe Coding/Rapid Prototyping: Executing rapid coding tasks to correct AI responses in line with programming standards and clean code principles.