LLM Response Evaluation & Instruction Tuning
Rated and ranked outputs from multiple language models on helpfulness, fluency, factual accuracy, and safety. Conducted side-by-side (A/B) comparison testing and provided detailed feedback for instruction-tuning datasets.