News Topic Classification for Cameroonian Pidgin English
A news topic classification dataset was constructed for Cameroonian Pidgin English, a low-resource African creole language. The data labeling work had three parts: generating 1,150 news headlines with an LLM, assigning each headline one of six topic labels (Politics, Sports, Business, Technology, Health, and Entertainment), and having a fluent native speaker manually verify every generated headline for linguistic accuracy and label correctness. The dataset was split into 900 training, 100 development, and 150 test examples. Quality was maintained through native-speaker review of all annotations and through stratified splitting, which keeps the label distribution balanced across all data splits.
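The stratified 900/100/150 split described above could be reproduced along the following lines. This is a minimal sketch, not the procedure actually used: the function names `allocate` and `stratified_split`, the random seed, and the largest-remainder rounding are all assumptions, and the headlines are placeholder strings because the per-label counts and the real data are not given here.

```python
import random
from collections import defaultdict

# The six topic labels named in the dataset description.
LABELS = ["Politics", "Sports", "Business", "Technology", "Health", "Entertainment"]


def allocate(group_sizes, total, target):
    """Share `target` slots across labels in proportion to label frequency.

    Uses largest-remainder rounding so the counts sum to exactly `target`.
    (Hypothetical helper, not taken from the original work.)
    """
    quotas = {lab: size * target / total for lab, size in group_sizes.items()}
    counts = {lab: int(q) for lab, q in quotas.items()}
    leftover = target - sum(counts.values())
    # Give the remaining slots to the labels with the largest fractional part.
    by_remainder = sorted(quotas, key=lambda lab: quotas[lab] - counts[lab], reverse=True)
    for lab in by_remainder[:leftover]:
        counts[lab] += 1
    return counts


def stratified_split(examples, n_dev, n_test, seed=0):
    """Split (headline, label) pairs into train/dev/test, stratified by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for headline, label in examples:
        by_label[label].append((headline, label))

    sizes = {lab: len(items) for lab, items in by_label.items()}
    dev_counts = allocate(sizes, len(examples), n_dev)
    test_counts = allocate(sizes, len(examples), n_test)

    train, dev, test = [], [], []
    for lab in sorted(by_label):
        items = by_label[lab]
        rng.shuffle(items)  # randomize within each label before slicing
        d, t = dev_counts[lab], test_counts[lab]
        dev.extend(items[:d])
        test.extend(items[d:d + t])
        train.extend(items[d + t:])
    return train, dev, test


# Placeholder data only: 1,150 synthetic items spread evenly over the six labels.
examples = [(f"headline-{i}", LABELS[i % len(LABELS)]) for i in range(1150)]
train, dev, test = stratified_split(examples, n_dev=100, n_test=150)
print(len(train), len(dev), len(test))  # prints "900 100 150"
```

Allocating the development and test quotas per label first, and leaving the remainder for training, guarantees the exact split sizes while keeping each label's share in every split within one example of its share in the full dataset.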