Bulba
The Bulba projects included Bulba MultiTurn, T2, and Code Eval. Each project required evaluating a model response, rating it, justifying the rating, and writing alternative responses where applicable. The MultiTurn project required interacting with the model over multiple turns within a specific context and evaluating whether its responses were as expected. Scale AI provided clear metrics for evaluating the model, with a strong emphasis on the functionality, security, and accuracy of its responses.