Multi-modal prompts
The task involved creating multi-modal prompts to train a vision-language model that can generate or edit UI code based on screenshots and natural-language instructions. The goal is to simplify the development process by enabling the model to understand visual inputs and plain-language guidance to produce clean, functional UI code. A minimal sketch of what such a prompt might look like appears below.

What was expected:

- Reasoning (text): The response should explain the process clearly and logically, with reasoning that aligns with the prompt and image(s).
- Generated code: Check whether the code is written in the correct language.
- UI design: Did the code build? Verify that the resulting UI matches the provided image(s) and prompt requirements, and look for any mismatches, broken components, or missing elements.

Scoring used a 1-5 scale with three review levels. A task could only move forward with a score of 4 from the first-level reviewer and a score of 5 from each of the next two levels.
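As a rough illustration, the sketch below assembles one screenshot plus a plain-language instruction into a chat-style multi-modal message. The content-parts layout follows the common OpenAI-style format; the original task does not specify a message format, so the structure, the helper name, and the sample instruction here are all assumptions for illustration only.

```python
import base64
import json


def build_ui_prompt(screenshot_bytes: bytes, instruction: str) -> list[dict]:
    """Assemble a chat-style multi-modal prompt (hypothetical helper):
    one screenshot plus a natural-language instruction, encoded in the
    OpenAI-style content-parts format."""
    image_b64 = base64.b64encode(screenshot_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ]


if __name__ == "__main__":
    # Placeholder bytes stand in for a real screenshot; in practice you
    # would read the PNG from disk.
    fake_screenshot = b"\x89PNG..."  # hypothetical stand-in data
    messages = build_ui_prompt(
        fake_screenshot,
        "Recreate this login form in React with Tailwind CSS, "
        "matching the layout and colors in the screenshot.",
    )
    print(json.dumps(messages, indent=2)[:400])
```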
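The gating rule is simple enough to state as code. The sketch below is one reading of the rule above, treating the first-level requirement of "a score of 4" as "at least 4" on the 1-5 scale; if the rule was exactly 4, the first comparison would be `==` instead.

```python
def task_advances(level_scores: list[int]) -> bool:
    """Return True if a task clears all three review levels.

    Assumed reading of the rule: the first-level reviewer must give
    at least 4, and the two subsequent reviewers must each give 5,
    all on the 1-5 scale.
    """
    if len(level_scores) != 3:
        raise ValueError("expected exactly three review-level scores")
    first, second, third = level_scores
    return first >= 4 and second == 5 and third == 5


assert task_advances([4, 5, 5])
assert not task_advances([5, 4, 5])  # level 2 fell short of 5
assert not task_advances([3, 5, 5])  # level 1 fell short of 4
```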