AI Model Evaluation & Prompt Engineering Specialist
As an AI Model Evaluation & Prompt Engineering Specialist, I designed analytical problems and evaluated AI model outputs using RLHF-style methods. My responsibilities included creating multi-step reasoning prompts, scoring model outputs against structured metrics, and refining prompt designs to improve evaluation rigor. The role required logical consistency checks and scenario simulations to support robust assessment of AI systems.

• Generated 50+ multi-step reasoning problems for use in RLHF workflows.
• Evaluated machine learning outputs for correctness and reliability using regression and classification metrics such as RMSE, R², and F1-score.
• Refined prompt engineering techniques to improve AI evaluation outcomes.
• Simulated real-world use cases with synthetic datasets for comprehensive model assessment.
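As an illustrative sketch only (not part of the original role description), the three metrics named above could be computed from scratch as follows; the function names and toy inputs are assumptions for demonstration, with RMSE and R² applied to regression outputs and F1 to binary (0/1) classification labels:

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error for regression outputs.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def f1(y_true, y_pred):
    # F1-score (harmonic mean of precision and recall) for binary labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

In practice these would typically come from a library such as scikit-learn; the hand-rolled versions simply make the definitions explicit.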