Python LLM response comparison
This project's objective was to compare the coding skills of two models by having the trainer ask them to perform the same task over several prompt-response turns and evaluate both models' solutions.
I have mainly worked in AI training, comparing different models' responses to code-related tasks. This work involved asking two different models to perform a coding task such as implementing a new feature request, creating tests, refactoring, or writing documentation, among others. I have also performed security testing of models, bypassing a model's security boundaries and using it to download and execute custom-built malware. In addition, I have evaluated text model responses in terms of factuality, instruction following, writing tone, and other aspects depending on the customer's request. This work also included comparing the responses of two models to the same prompt.
This project evaluated a model's accuracy in calling the appropriate functions to retrieve the information requested by a user. Depending on the context (healthcare or customer support), there was a set of functions the model could call. I needed to verify that the model called the right functions, at the right time, and with the right parameters.
This project tested how well Anthropic's Claude security boundaries prevented the model from being used to download and install malware using public social network profiles.
Master of Engineering (M.Eng.) in Information Systems Security, Computer Systems Security
AWS Security Architect