CharXiv
Project: Prompt Composition and Chain-of-Thought Evaluation

- Developed and evaluated reasoning-focused prompts designed to test model problem-solving and critical-thinking capabilities.
- Wrote each prompt to require deterministic, single-answer reasoning, often involving multi-step analysis such as comparison, trend detection, or pattern recognition from visual data.
- Ensured all prompts referenced only visual information, excluding captions or OCR text.
- Structured model reasoning via Chain of Thought (CoT) with 12-15 sequential, atomic steps to enhance interpretability and logical flow.
- Conducted RLHF evaluation, rating model reasoning for correctness, logical consistency, and visual-interpretation accuracy.
- Produced concise, verifiable final answers (MCQs or short-form responses) to benchmark model reliability.
- Followed standardized linguistic, formatting, and verification protocols to ensure reproducibility and high-quality data for model fine-tuning.
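The protocol above (12-15 atomic CoT steps, a concise deterministic answer) lends itself to automated checks. The sketch below is a minimal, hypothetical validator; the record fields and thresholds are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CoTRecord:
    """One annotation record: prompt, reasoning steps, final answer.
    Field names are hypothetical; the real schema was not specified."""
    prompt: str
    steps: list[str]
    final_answer: str

def validate(record: CoTRecord) -> list[str]:
    """Return a list of protocol violations (empty list means it passes)."""
    errors = []
    # CoT must contain 12-15 sequential, atomic steps.
    if not 12 <= len(record.steps) <= 15:
        errors.append(f"expected 12-15 steps, got {len(record.steps)}")
    # Atomicity heuristic: each step should read as a single sentence.
    for i, step in enumerate(record.steps, 1):
        if step.rstrip(".").count(".") > 0:
            errors.append(f"step {i} may contain more than one sentence")
    # Final answer must be concise and verifiable (MCQ letter or short form).
    if len(record.final_answer.split()) > 10:
        errors.append("final answer is not short-form")
    return errors
```

A check like this can gate submissions before human RLHF rating, so reviewers spend time on reasoning quality rather than formatting compliance.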