Code Generation and Evaluation for LLM Training
I worked on training and evaluating LLMs for programming across seven task types (e.g., code generation, code review, debugging, test case generation, documentation, and refactoring) and many categories, including front-end, back-end, data engineering, data visualization, database management, scripting, and algorithms. Responsibilities included:

- Generating prompts tailored to specific task types and categories (e.g., for code generation, "build me X"; for debugging, "here's my code, I'm getting this error, please fix it").
- Performing domain classification to validate that prewritten prompts aligned with their target use cases.
- Evaluating model responses on dimensions such as correctness, clarity, efficiency, readability, and adherence to specifications, with criteria varying by category (e.g., database management also covered query optimization).
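The category-dependent rubric described above can be sketched as follows. This is a minimal, hypothetical illustration: the dimension names, category keys, and 1-5 scale are assumptions for the example, not the actual rubric used.

```python
# Hypothetical sketch of category-aware rubric scoring.
# All names and the 1-5 scale are illustrative assumptions.

BASE_DIMENSIONS = ["correctness", "clarity", "efficiency",
                   "readability", "spec_adherence"]

# Extra criteria some categories add on top of the base rubric,
# e.g. query optimization for database management work.
CATEGORY_EXTRAS = {
    "db_management": ["query_optimization"],
}

def rubric_for(category: str) -> list[str]:
    """Return the full list of evaluation dimensions for a category."""
    return BASE_DIMENSIONS + CATEGORY_EXTRAS.get(category, [])

def score_response(ratings: dict[str, int], category: str) -> float:
    """Average the 1-5 ratings across the category's rubric.

    Raises if any required dimension is unrated, so incomplete
    evaluations are caught rather than silently averaged.
    """
    dims = rubric_for(category)
    missing = [d for d in dims if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return sum(ratings[d] for d in dims) / len(dims)
```

For example, a database-management response would be scored on six dimensions (the five base ones plus `query_optimization`), while a front-end response would use only the base five.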