Write a script that:
1. Loads the new NER model.
2. Runs it on new PDFs (or on raw texts).
3. Compares the extracted entities with the reference annotations from the validation/test set.
4. Outputs evaluation metrics (for example, precision, recall, and F1 per entity type) and, optionally, visualizations such as a confusion matrix.
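The comparison and metrics steps could look roughly like the sketch below. It uses only the standard library and makes several assumptions that are not part of the request. Predictions and reference annotations are both lists of `(start_char, end_char, label)` spans. A prediction counts as correct only when both its boundaries and its label exactly match a reference span, which is one common NER convention and not necessarily the one this project needs. The model is abstracted as a `predict` callable. With a spaCy pipeline, for instance, it might be `lambda t: [(e.start_char, e.end_char, e.label_) for e in nlp(t).ents]` after `nlp = spacy.load(model_dir)`. PDF text would first have to be extracted (e.g. with pypdf's `PdfReader(path).pages` and `page.extract_text()`). The names `evaluate` and `format_confusion` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical span format: (start_char, end_char, label).
Span = Tuple[int, int, str]


def _prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate(
    predict: Callable[[str], Iterable[Span]],
    dataset: List[Tuple[str, Iterable[Span]]],
) -> Tuple[Dict[str, Dict[str, float]], Counter]:
    """Exact-match, span-level scoring of `predict` against gold spans.

    Returns per-label metrics plus a "micro" average, and a confusion
    counter keyed by (gold_label, predicted_label); "O" marks a span
    present on only one side (missed or spurious entity).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    confusion: Counter = Counter()
    for text, gold in dataset:
        pred_set, gold_set = set(predict(text)), set(gold)
        # Exact matches: same boundaries and same label.
        for _, _, label in pred_set & gold_set:
            tp[label] += 1
        for _, _, label in pred_set - gold_set:
            fp[label] += 1
        for _, _, label in gold_set - pred_set:
            fn[label] += 1
        # Confusion matrix aligned on boundaries only (assumes at most
        # one label per span on each side).
        gold_by_span = {(s, e): lab for s, e, lab in gold_set}
        pred_by_span = {(s, e): lab for s, e, lab in pred_set}
        for span in gold_by_span.keys() | pred_by_span.keys():
            pair = (gold_by_span.get(span, "O"), pred_by_span.get(span, "O"))
            confusion[pair] += 1
    labels = sorted(set(tp) | set(fp) | set(fn))
    metrics = {lab: _prf(tp[lab], fp[lab], fn[lab]) for lab in labels}
    metrics["micro"] = _prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return metrics, confusion


def format_confusion(confusion: Counter) -> str:
    """Render the confusion counter as a plain-text table (rows = gold)."""
    labels = sorted({g for g, _ in confusion} | {p for _, p in confusion})
    width = max(len(x) for x in labels + ["gold\\pred"]) + 2
    rows = ["gold\\pred".ljust(width) + "".join(x.rjust(width) for x in labels)]
    for g in labels:
        cells = "".join(str(confusion[(g, p)]).rjust(width) for p in labels)
        rows.append(g.ljust(width) + cells)
    return "\n".join(rows)
```

A typical run would call `metrics, confusion = evaluate(predict, dataset)`, print each label's scores, and then print `format_confusion(confusion)`. For a plotted confusion matrix, the same counts could be passed to a plotting tool such as scikit-learn's `ConfusionMatrixDisplay`. The `"O"` row and column separate missed and spurious entities from label confusions.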