Financial Document Classification & Entity Extraction for European Invoices
Performed multi-label classification and named-entity annotation on a dataset of 2,000+ European financial documents (invoices, payment orders, receipts) in Polish and Portuguese. Tasks included labeling vendor names, tax IDs (Polish NIP, Portuguese NIF), VAT amounts, line-item descriptions, payment dates, and cash flow categories. Developed annotation guidelines to handle region-specific formatting (European decimal and thousands separators, local date formats). Maintained inter-annotator agreement above 95% through iterative calibration rounds. The labeled dataset was used to fine-tune an LLM-based extraction pipeline for automated bookkeeping.
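
To illustrate the kind of region-specific formatting the annotation guidelines had to cover, here is a minimal Python sketch that normalizes amounts and dates. It is not the project's actual tooling. The function names, the assumption that a comma is always the decimal separator, and the list of accepted day-first date patterns are all hypothetical choices made for this example.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Assumed day-first patterns; real documents may use others.
DATE_FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%d-%m-%Y")


def parse_european_amount(text: str) -> Decimal:
    """Parse '1.234,56', '1 234,56 zł' or '€ 1234,56' into a Decimal.

    Assumes the comma is the decimal separator and that dots or
    spaces only group thousands.
    """
    # Keep digits, separators and a minus sign; drop currency text and spaces.
    cleaned = re.sub(r"[^\d,.\-]", "", text)
    # Remove thousands dots, then turn the decimal comma into a dot.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def parse_day_first_date(text: str) -> date:
    """Try each assumed day-first pattern and return the first match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")
```

A value such as "1 234,56 zł" is read as 1234.56 and "03/04/2024" as 3 April 2024. A dot-decimal amount like "1234.56" would be misread under these assumptions, which is the sort of ambiguity written guidelines need to resolve explicitly.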