AI & LLM Trainer
Evaluated unit test coverage and test quality to determine whether repositories and tasks were suitable for benchmarking LLM bug-fixing, code-modification, and reasoning performance. Modified, ran, and tested real-world Python codebases locally to assess LLM behavior in debugging and issue-resolution scenarios. Used Python and Git extensively to inspect repository histories, reproduce issues, review code changes, validate fixes, and document findings clearly.