LLM Prompt Writing & AI Response Evaluation
In this project, I contributed to the development and fine-tuning of large language models by writing diverse prompts and evaluating AI-generated responses. Tasks included authoring high-quality prompts across various domains; assessing responses for accuracy, coherence, and helpfulness; and ranking multiple outputs against project-specific guidelines. I also flagged harmful, biased, or irrelevant content to ensure adherence to safety and ethical standards. The work required strong judgment, consistency, and attention to nuanced differences in language. All tasks were completed on Appen’s internal platform under strict quality and throughput metrics.
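The scoring-and-ranking workflow described above can be illustrated with a minimal sketch. The rubric dimensions (accuracy, coherence, helpfulness), the 1–5 scale, and the equal weighting are assumptions for illustration only, not Appen’s actual guidelines:

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    """One rater's rubric scores for a single model response (hypothetical schema)."""
    response_id: str
    accuracy: int       # 1-5 rubric score (assumed scale)
    coherence: int
    helpfulness: int
    flagged: bool = False  # harmful, biased, or irrelevant content

    @property
    def total(self) -> int:
        # Equal weighting across dimensions is an assumption for this sketch.
        return self.accuracy + self.coherence + self.helpfulness

def rank_responses(evals: list[Evaluation]) -> list[Evaluation]:
    """Exclude flagged responses, then rank the rest by total score, best first."""
    safe = [e for e in evals if not e.flagged]
    return sorted(safe, key=lambda e: e.total, reverse=True)

evals = [
    Evaluation("a", accuracy=4, coherence=5, helpfulness=3),
    Evaluation("b", accuracy=5, coherence=4, helpfulness=5),
    Evaluation("c", accuracy=2, coherence=3, helpfulness=2, flagged=True),
]
ranking = rank_responses(evals)
print([e.response_id for e in ranking])  # flagged "c" is excluded from the ranking
```

In practice, rankings like this feed preference-tuning pipelines, which is why consistency across raters matters as much as any single score.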