Fine-Tuning DistilBERT on the FiNER-139 Dataset
The model checkpoints are in the distilbert-finetuned-ner directory at the repo root; checkpoint-1407 is the checkpoint on which all of the evaluation was done.
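As a minimal sketch, assuming the checkpoint path above and a standard `transformers` install, the fine-tuned model can be loaded like this (the example sentence is made up):

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Assumed path, based on the repo layout described in this README.
checkpoint_dir = "distilbert-finetuned-ner/checkpoint-1407"

tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForTokenClassification.from_pretrained(checkpoint_dir)

# Group sub-word pieces back into whole entity spans.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)
print(ner("The notes mature on June 15, 2028."))
```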
TODO: to be published to Hugging Face (see the TODOs at the end of this README).
DISTILLBERT
|_ distilbert-finetuned-ner
|_ src
| |_ data_preparation
| |_ training
|_ DatExploration.ipynb
DatExploration.ipynb is the notebook in which we examine the dataset and look at the distribution of tokens and labels.
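For reference, here is a sketch of the kind of exploration the notebook performs, assuming FiNER-139 is pulled from the Hugging Face Hub as `nlpaueb/finer-139` (the notebook itself may load the data differently):

```python
from collections import Counter

from datasets import load_dataset

# Assumption: FiNER-139 is loaded from the Hub as "nlpaueb/finer-139".
dataset = load_dataset("nlpaueb/finer-139", split="train")
label_names = dataset.features["ner_tags"].feature.names

# Distribution of labels across all tokens in the training split.
counts = Counter(label_names[tag] for tags in dataset["ner_tags"] for tag in tags)
for label, count in counts.most_common(10):
    print(f"{label}: {count}")
```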
The four labels chosen for evaluation are listed below (a data-preparation sketch follows the list):
- B-ShareBasedCompensationArrangementByShareBasedPaymentAwardAwardVestingPeriod1
- I-ShareBasedCompensationArrangementByShareBasedPaymentAwardAwardVestingPeriod1
- B-DebtInstrumentMaturityDate
- I-DebtInstrumentMaturityDate
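One way the data preparation step could restrict the task to these four labels is to remap every other tag to `O`. This is a sketch under that assumption, not necessarily how src/data_preparation actually does it:

```python
from datasets import load_dataset

dataset = load_dataset("nlpaueb/finer-139")
label_names = dataset["train"].features["ner_tags"].feature.names
O_ID = label_names.index("O")

# The four labels kept for evaluation; everything else collapses to "O".
KEPT = {
    "B-ShareBasedCompensationArrangementByShareBasedPaymentAwardAwardVestingPeriod1",
    "I-ShareBasedCompensationArrangementByShareBasedPaymentAwardAwardVestingPeriod1",
    "B-DebtInstrumentMaturityDate",
    "I-DebtInstrumentMaturityDate",
}

def keep_only_selected(example):
    example["ner_tags"] = [
        tag if label_names[tag] in KEPT else O_ID for tag in example["ner_tags"]
    ]
    return example

dataset = dataset.map(keep_only_selected)
```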
Evaluation results for checkpoint-1407 on these labels:

| eval_loss | eval_precision | eval_recall | eval_f1 | eval_accuracy |
|---|---|---|---|---|
| 0.044684 | 0.770968 | 0.784893 | 0.777868 | 0.976305 |
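The precision/recall/F1 columns are entity-level scores of the kind produced by seqeval; a sketch of computing them with the `evaluate` library, assuming (not confirmed by this repo) that the training script evaluates this way:

```python
import evaluate

# seqeval reports entity-level precision/recall/F1 plus token-level accuracy.
seqeval = evaluate.load("seqeval")
predictions = [["O", "B-DebtInstrumentMaturityDate", "I-DebtInstrumentMaturityDate", "O"]]
references = [["O", "B-DebtInstrumentMaturityDate", "O", "O"]]
print(seqeval.compute(predictions=predictions, references=references))
```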
All dependencies are defined in requirements.txt. Install them into a fresh venv by running `python -m pip install -r requirements.txt` from the repo root.
TODOs:
- Export the model to ONNX (see the sketch after this list)
- Benchmark ONNX Runtime inference against the original DistilBERT model
- Publish the model to Hugging Face
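A hedged sketch of the planned ONNX export using Hugging Face Optimum (paths assumed from the layout above; the eventual implementation may differ):

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer

checkpoint_dir = "distilbert-finetuned-ner/checkpoint-1407"  # assumed path

# export=True converts the PyTorch checkpoint to ONNX on load.
ort_model = ORTModelForTokenClassification.from_pretrained(checkpoint_dir, export=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)

ort_model.save_pretrained("distilbert-finetuned-ner-onnx")
tokenizer.save_pretrained("distilbert-finetuned-ner-onnx")
```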