Scale AI (Outlier)
I evaluated LLM-generated Python and JavaScript responses to coding prompts, selecting the response that best answered each question. I scored each response and gave feedback on where it did well and where it could improve.