SFT & RLHF Data Labelling
For supervised fine-tuning (SFT) of the model (7B parameters), we have prepared 100k Q&A pairs from the documents. Likewise, for reinforcement learning with human feedback (RLHF), we have collected a response/reject dataset; for this segmentation work we have collected around 12k docs.
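
As an illustration of how the two datasets described above might be stored, here is a minimal sketch in Python. The record layouts, the field names (`question`, `answer`, `prompt`, `chosen`, `rejected`, `doc_id`), and the JSONL output are assumptions made for this example; the actual schema used for labelling is not specified here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SFTExample:
    """One Q&A pair for supervised fine-tuning (hypothetical layout)."""
    question: str
    answer: str
    doc_id: str  # assumed: identifier of the source document


@dataclass
class PreferenceExample:
    """One response/reject pair for RLHF (hypothetical layout)."""
    prompt: str
    chosen: str    # the response the labeller preferred
    rejected: str  # the response the labeller rejected
    doc_id: str    # assumed: identifier of the source document


def is_valid_preference(ex: PreferenceExample) -> bool:
    """Basic sanity check: no empty fields, and the two responses differ."""
    prompt = ex.prompt.strip()
    chosen = ex.chosen.strip()
    rejected = ex.rejected.strip()
    return bool(prompt) and bool(chosen) and bool(rejected) and chosen != rejected


def to_jsonl(records) -> str:
    """Serialize dataclass records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


sft_rows = [SFTExample("What is the notice period?", "30 days.", "doc-0001")]
pref_rows = [
    PreferenceExample(
        prompt="Summarize clause 4.",
        chosen="Clause 4 limits liability to direct damages.",
        rejected="Clause 4 is about payment.",
        doc_id="doc-0001",
    )
]

sft_jsonl = to_jsonl(sft_rows)
pref_jsonl = to_jsonl(r for r in pref_rows if is_valid_preference(r))
```

Keeping a `doc_id` on each record is one way to trace a labelled pair back to its source document; whether the real pipeline does this is an assumption.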