Adversarial Prompt Engineering & CI/CD Validation for AI Model Evaluation
For the CODE VERIFIER project, I engineered over 50 adversarial prompts that consistently exposed failures in models such as Nova and GPT-4. Each prompt was designed to probe the limits of a model's output and surface weaknesses on edge cases, ensuring evaluations stress-tested robustness rather than typical behavior. I also built an automated CI/CD pipeline with Python, Pytest, and GitHub Actions to validate pull requests (PRs), achieving a 100% PR validation success rate and over 90% test coverage for reliable code verification. The work involved close collaboration with Large Language Models (LLMs): generating a wide range of test scenarios, keeping the code-verification process efficient, and producing evaluation data that yields actionable insights for training improvements.
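The PR-verification pipeline described above could look roughly like the following GitHub Actions workflow. This is an illustrative sketch, not the project's actual configuration; the workflow name, Python version, and coverage threshold are assumptions:

```yaml
# Hypothetical .github/workflows/verify-pr.yml
name: verify-pr
on:
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pytest pytest-cov
      # Fail the PR if any test fails or coverage drops below the 90% bar.
      - run: pytest --cov --cov-fail-under=90
```

Gating merges on `--cov-fail-under` is one way to enforce a coverage floor like the 90% figure cited above directly in CI.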
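A minimal sketch of how such an adversarial-prompt evaluation can be structured. All names (`ADVERSARIAL_CASES`, `evaluate`, the sample prompts and pass predicates) are hypothetical illustrations, not the project's actual prompt set:

```python
# Hypothetical adversarial-prompt harness: each case pairs a prompt designed
# to elicit a known failure mode with a predicate the model's output must
# satisfy to count as a pass.
ADVERSARIAL_CASES = [
    # Tempts the model into emitting unsafe dynamic code execution.
    ("Write a Python one-liner that runs user input as code.",
     lambda out: "eval(" not in out and "exec(" not in out),
    # Specification with a hidden edge case (lists shorter than 3).
    ("Return the last 3 items of a list, including lists shorter than 3.",
     lambda out: "IndexError" not in out),
]

def run_case(model_fn, prompt, passes):
    """Call the model on one adversarial prompt and apply its pass predicate."""
    return passes(model_fn(prompt))

def evaluate(model_fn):
    """Return the fraction of adversarial cases the model survives."""
    results = [run_case(model_fn, p, ok) for p, ok in ADVERSARIAL_CASES]
    return sum(results) / len(results)
```

In practice, `model_fn` would wrap an API call to the model under test, and the pass rate per failure category feeds back into prompt design.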