LLM Integration & Response Evaluation — FlowState Platform
I evaluated outputs from LLMs (Llama 3.3 70B served via Groq) integrated into a production platform, focusing on improving model-generated responses. The role involved creating and iterating on prompts, providing structured feedback, and scoring outputs for accuracy and quality. I applied model-evaluation best practices analogous to RLHF-style trainer workflows.

• Designed prompt variations and ran response evaluations
• Used structured quality scoring for model responses
• Provided feedback loops to improve the accuracy of LLM outputs
• Benchmarked and compared LLM responses on productivity tasks
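The scoring-and-benchmarking workflow above can be sketched as a small harness. This is a minimal illustration, not the platform's actual code: the rubric criteria, the 1–5 scale, and all names (`Evaluation`, `benchmark`) are hypothetical choices for the example.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical rubric: criteria scored 1-5 per response.
CRITERIA = ("accuracy", "relevance", "clarity")

@dataclass
class Evaluation:
    prompt_variant: str   # which prompt variation produced the response
    response_id: str
    scores: dict          # criterion -> int in [1, 5]

    def overall(self) -> float:
        # Average across all rubric criteria for one response.
        return mean(self.scores[c] for c in CRITERIA)

def benchmark(evals):
    """Average overall score per prompt variant, for comparing variants."""
    by_variant = {}
    for e in evals:
        by_variant.setdefault(e.prompt_variant, []).append(e.overall())
    return {v: round(mean(xs), 2) for v, xs in by_variant.items()}

# Toy data: two responses from prompt variant "v1", one from "v2".
evals = [
    Evaluation("v1", "r1", {"accuracy": 4, "relevance": 5, "clarity": 3}),
    Evaluation("v1", "r2", {"accuracy": 3, "relevance": 4, "clarity": 4}),
    Evaluation("v2", "r3", {"accuracy": 5, "relevance": 5, "clarity": 4}),
]
print(benchmark(evals))  # → {'v1': 3.83, 'v2': 4.67}
```

Aggregating per prompt variant like this turns individual quality scores into a side-by-side comparison, which is the basis for deciding which prompt variation to keep iterating on.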