This repository contains the source code to reproduce the experiments from our project for the Skoltech Machine Learning course (Paper, Slides).
Large language models (LLMs) have helped researchers achieve tremendous results in NLP. However, their interpretability is still an active area of research, part of which concerns the contextualized embeddings produced by LLMs. Previous work demonstrated that some dimensions of LLM embeddings are important to the representational quality of these embeddings for task-specific knowledge. In this study, we analyze the importance of individual embedding components by probing on simple tasks. Our results suggest that several embedding dimensions are directly responsible for specific linguistic properties.
The implementation is in Python and GPU-based. Tested with torch 2.2.1 and a single Tesla T4 on Google Colab.
Local setup:
- Clone this repository and install the dependencies, for example:

```shell
git clone https://github.com/pkseniya/EmbeddingComponents.git
pip install -r ./EmbeddingComponents/requirements.txt  # install the required libraries
pip install -e ./EmbeddingComponents/                  # install SentEval
```
All the experiments are provided as fairly self-explanatory Jupyter notebooks (`notebooks/`). For convenience, most of the evaluation output is preserved. Auxiliary source code is placed in `.py` modules (`feature_importance/`).
- `python -m examples.bert` – computes embeddings for the probing tasks and saves the results into the `datasets` folder
- `notebooks/outliers.ipynb` – calculation of outlier dimensions of the embeddings
- `notebooks/outlier_vs_random_vs_all.ipynb` – comparison of logistic-regression accuracies on all, outlier, and random features
- `notebooks/logreg.ipynb` – feature importance of embedding components via logistic regression
- `notebooks/shap.ipynb` – feature importance of embedding components via SHAP with an MLP
- `python -m feature_importances.catboost` – feature importance of embedding components from gradient boosting
- `notebooks/fvalue.ipynb` – feature importance of embedding components via the ANOVA F-value
- `notebooks/one_feature_classification.ipynb` – classification accuracy using a single embedding component
- `plot_methods.sh` – plots feature importance vs. deviation from the mean (outlierness of the component)
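To illustrate what "outlier dimensions" means here, a minimal sketch of one common criterion from the literature: a dimension whose mean magnitude deviates from the average dimension by several standard deviations. The threshold and the planted toy dimensions below are illustrative; the `notebooks/outliers.ipynb` notebook may use a different criterion.

```python
import numpy as np

def outlier_dimensions(embeddings, n_sigma=3.0):
    # dimensions whose mean |activation| deviates from the average
    # dimension by more than n_sigma across-dimension standard deviations
    mean_abs = np.abs(embeddings).mean(axis=0)   # per-dimension mean magnitude
    mu, sigma = mean_abs.mean(), mean_abs.std()
    return np.where(mean_abs > mu + n_sigma * sigma)[0]

# toy check: plant two large dimensions in otherwise small embeddings
rng = np.random.default_rng(0)
emb = rng.normal(0, 0.1, size=(1000, 768))
emb[:, [61, 217]] += 5.0
print(outlier_dimensions(emb))  # expected to contain 61 and 217
```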
The results show that models trained on outlier features significantly outperform models trained on the same number of randomly chosen features. We also notice that for some tasks the quality on outlier features is comparable to the quality on all features (e.g., Length, OddManOut), while on others it is significantly lower (e.g., WordContent). We therefore conclude that outlier dimensions hold more information about the encoded sentence than other features.
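The comparison above can be sketched on synthetic data, where a few (hypothetical) informative dimensions carry the label; this is an illustration of the experimental setup, not the repo's actual code or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 768
X = rng.normal(size=(n, d))
# toy assumption: three dimensions actually determine the label
informative = np.array([61, 217, 500])
y = (X[:, informative].sum(axis=1) > 0).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

def acc(cols):
    # logistic-regression accuracy using only the given feature columns
    clf = LogisticRegression(max_iter=1000).fit(Xtr[:, cols], ytr)
    return clf.score(Xte[:, cols], yte)

random_cols = rng.choice(d, size=len(informative), replace=False)
print(f"all features:     {acc(np.arange(d)):.2f}")
print(f"informative only: {acc(informative):.2f}")
print(f"random subset:    {acc(random_cols):.2f}")
```

The informative subset matches or beats a random subset of the same size, which is the shape of the result reported above for outlier vs. random features.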
We obtained feature importances for all probing tasks and plotted the features along the axes "deviation from the average" vs. "feature importance". The results for the BigramShift task are presented in the plot. Notably, the fraction of outlier dimensions (orange) above the 95th percentile by importance is much larger than overall. There is also a clear trend, with outlier dimensions occupying the upper-right corner of the plot. We therefore conclude that outlier dimensions tend to have high feature importance.
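The statistic behind that observation can be sketched as follows: compare the fraction of outlier dimensions among the top-5% most important features with their overall fraction. The importance values and outlier indices below are toy placeholders.

```python
import numpy as np

def frac_outliers_in_top(importance, outlier_idx, q=95):
    # fraction of outlier dimensions among features whose importance
    # is above the q-th percentile
    top = np.where(importance > np.percentile(importance, q))[0]
    return np.isin(top, outlier_idx).mean()

# toy check: make the (hypothetical) outlier dims highly important
rng = np.random.default_rng(1)
imp = rng.random(768)
outlier_idx = [61, 217]
imp[outlier_idx] = 2.0

overall = len(outlier_idx) / len(imp)            # fraction among all dims
in_top = frac_outliers_in_top(imp, outlier_idx)  # fraction above 95th pct
print(f"overall: {overall:.3f}, above 95th percentile: {in_top:.3f}")
```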
We check whether outlier dimensions specialize in a particular type of task. For that purpose we took the results of the CatBoost model. The probing tasks were divided into groups by the linguistic property they test: surface, syntactic, and semantic. A group containing all tasks (general) was also considered. Then, for every group, we found the features that ranked at the top of the feature importances for every task in the group. Some outliers capture a particular type of information: outlier dimension 61 specializes in surface information, while 217 captures syntactic information. No outlier dimensions were found to be important for all tasks.
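The grouping step can be sketched as an intersection of per-task top-k importance rankings. The task grouping and the random importance vectors below are toy assumptions; in the repo the importances come from the CatBoost models.

```python
import numpy as np

# hypothetical task groups (SentEval-style probing task names)
task_groups = {
    "surface":   ["Length", "WordContent"],
    "syntactic": ["BigramShift", "TreeDepth"],
}

def shared_top_features(importances, tasks, k=10):
    # dimensions ranked in the top-k by importance for *every* task in the group
    tops = [set(np.argsort(importances[t])[-k:]) for t in tasks]
    return set.intersection(*tops)

# toy data: dimension 61 is important for all surface tasks
rng = np.random.default_rng(2)
importances = {t: rng.random(768) for g in task_groups.values() for t in g}
for t in task_groups["surface"]:
    importances[t][61] = 2.0
print(shared_top_features(importances, task_groups["surface"]))  # contains 61
```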
Contact: petrushina.ke@phystech.edu


