AI Evaluation Engineer
As an AI Evaluation Engineer, I contributed to the evaluation of AI-generated code and alignment tasks. My work involved testing and reviewing outputs of frontier models to ensure their correctness and utility. This required designing challenging algorithmic tasks and providing reference solutions and feedback.
• Evaluated code models' outputs for accuracy and adherence to task instructions
• Designed adversarial test problems to stress-test AI reasoning and logic
• Collaborated on open-source contributions to align AI agents with human software practices
• Provided feedback to improve model performance based on test results