AI System Evaluator & Engineer (Malcrypt-AI Project)
Engineered and evaluated AI agents for computer science, cybersecurity, healthcare, and office automation tasks. Designed and implemented evaluation frameworks to assess the reliability and accuracy of AI models and agents. Developed comprehensive benchmarks and scoring logic to measure and document agent performance across diverse scenarios.
• Built reproducible evaluation pipelines to keep results consistent across runs.
• Conducted in-depth log analysis and testing to identify system limitations.
• Focused on AI safety, bias mitigation, and hallucination reduction.
• Documented processes and findings for collaboration and review.