Freelance LLM Test Designer & QA Engineer – LLM/AI Model Evaluation & Testing
As a Freelance LLM Test Designer & QA Engineer, I designed and executed data labeling tasks for evaluating large language model outputs. My work included authoring evaluation criteria, data-driven test cases, and structured quality assessments for LLM-generated content. I also integrated automated evaluation pipelines, using JSON and YAML to keep configurations and results consistent.
• Designed test suites to assess intent classification, slot filling, and conversational accuracy in LLM-powered chatbots.
• Performed adversarial prompt and prompt-injection testing for AI safety, applying red-teaming methods to LLMs.
• Authored and validated model output data covering instruction following, coherence, and hallucination detection.
• Built CI-integrated frameworks to systematically rate model output quality and regression-test it over time.