LLM Evaluation and Prompt + Response Writing
This project involves evaluating the performance of large language models by analyzing generated responses and assessing prompt completions, as well as writing new prompts and responses for supervised fine-tuning (SFT). Tasks include red teaming to identify vulnerabilities, question-answering validation, and rating model outputs against detailed guidelines to improve model quality. The work demands close attention to linguistic nuance, logical consistency, cultural context, and alignment with client-specific quality standards. All annotations pass through a strict quality assurance process to maintain the project's accuracy benchmarks.
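
To illustrate the kind of artifacts this work produces, the sketch below shows what a single SFT example and its quality rating might look like. The field names, rating dimensions, and 1-5 scale are hypothetical placeholders for illustration, not the client's actual guidelines or schema.

from dataclasses import dataclass

# Hypothetical annotation schema for illustration only; real projects
# define their own fields, rating dimensions, and scales in the guidelines.

@dataclass
class SFTExample:
    prompt: str      # newly written instruction or question
    response: str    # reference answer written by the annotator
    category: str    # e.g. "question_answering" or "red_teaming"

@dataclass
class QualityRating:
    example_id: str
    accuracy: int               # 1-5: factual and logical correctness
    instruction_following: int  # 1-5: does the response satisfy the prompt?
    tone_and_style: int         # 1-5: linguistic nuance, cultural fit
    notes: str = ""             # free-text rationale for the QA reviewer

example = SFTExample(
    prompt="Explain in two sentences why the sky appears blue.",
    response=(
        "Sunlight is scattered by air molecules, and shorter blue "
        "wavelengths scatter more strongly than longer ones. That "
        "scattered blue light reaches our eyes from every direction."
    ),
    category="question_answering",
)

rating = QualityRating(
    example_id="ex-001",
    accuracy=5,
    instruction_following=5,
    tone_and_style=4,
    notes="Clear and correct; second sentence slightly terse.",
)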