Code Evaluation
I performed pairwise comparisons of LLM-generated code solutions (Response A vs. Response B). Each comparison assessed functional correctness, logical reasoning, edge-case handling, performance, and adherence to the problem specification. For each pair, I delivered a structured judgment (better / worse / tie), identified failure modes, and wrote a concise rationale explaining the verdict, producing clear preference signals for model training and ranking.