Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models
Ali Khajegili Mirabadi, Katherine Rich, Hossein Farahani, Ali Bashashati
International Conference on Learning Representations (ICLR) 2025
📄 Paper
📽️ ICLR Page (presentation + poster)
Boltzmann Semantic Score (BSS) is a novel metric for evaluating the semantic alignment between the representation spaces of Large Vision Models (LVMs) and Large Language Models (LLMs) using paired medical image-report datasets.
Unlike existing qualitative approaches, BSS offers a quantitative, scalable, and expert-free way to assess the semantic fidelity of LVMs.
For a dataset of paired images and medical reports:
- Use LLMs (or any proper text embedder) to create a structural representation of expert-written pathology reports
- Use LVMs to create an analogous structure from medical images
- Define BSS as the structural alignment between the two modalities using a new Boltzmann-based similarity measure
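To make the idea concrete, here is a minimal NumPy sketch of one plausible reading of a Boltzmann-based alignment measure: turn each modality's pairwise similarities into row-wise Boltzmann (softmax) distributions and measure how well the two neighborhood structures agree. This is an illustration only, not the exact BSS definition from the paper; the temperature, the cosine kernel, and the Bhattacharyya-style overlap below are assumptions made for the sketch.

```python
import numpy as np

def boltzmann_distribution(feats, temperature=0.1):
    """Pairwise cosine similarities -> row-wise Boltzmann (softmax) weights."""
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)               # exclude self-similarity
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def semantic_alignment(vision_feats, text_feats, temperature=0.1):
    """Agreement between the two modalities' neighborhood structures, in (0, 1]."""
    pv = boltzmann_distribution(vision_feats, temperature)
    pt = boltzmann_distribution(text_feats, temperature)
    # Bhattacharyya-style overlap of matched rows, averaged over samples
    return float(np.mean(np.sum(np.sqrt(pv * pt), axis=1)))

# Toy check: a lightly perturbed copy of the text space should score higher
# than a randomly shuffled (misaligned) one.
rng = np.random.default_rng(0)
text = rng.normal(size=(50, 64))
aligned = semantic_alignment(text + 0.01 * rng.normal(size=(50, 64)), text)
mismatched = semantic_alignment(text[rng.permutation(50)], text)
assert 0.0 < mismatched < aligned <= 1.0
```

The toy check at the end captures the intended behavior of such a score: semantically aligned embedding spaces preserve each other's neighborhood structure, while a shuffled pairing destroys it.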
In the paper, we demonstrate the Boltzmann Semantic Score as follows:
- ✅ Evaluated 5 LLMs (e.g., Command-R, Bio-Llama3, Llama3, Gemma, Jamba)
- ✅ Evaluated 7 LVMs (e.g., PLIP, UNI, CTransPath, Phikon, Swin, ViT, Lunit-Dino) using BSS
- 📈 Found strong correlation between BSS and downstream tasks like retrieval accuracy and survival C-index
- ✅ Scalable and model-agnostic
- ✅ No need for expert annotations or qualitative attention maps
- ✅ Quantifies semantic alignment between visual and textual spaces
- ✅ Applicable to any domain with paired image-text data (e.g., medical, industrial inspection)
We use paired whole-slide images (WSIs) and pathology reports from 32 TCGA cancer types, covering ~9,500 patients.
▶ Download Sample Precomputed Features:
LVM Feature Files (Google Drive)
LLM features precomputed as a database dictionary can be found here: ./assets/generated_files/database/text/
These include:
- `.pt` — LLM embeddings of pathology reports, stored as a dictionary
- `.h5` — LVM features from the 7 vision models

After downloading, place the files in the dedicated directory `./assets/LVM`.
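For orientation, here is a minimal sketch of the pairing step BSS relies on, using NumPy stand-ins. The case IDs, dimensions, and key layout below are illustrative only; the real `.pt` dictionary is loaded with `torch.load` and the `.h5` files with `h5py`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the LLM report database (.pt): {case_id: report embedding}.
llm_db = {f"case_{i:03d}": rng.random(1024, dtype=np.float32) for i in range(5)}

# Illustrative stand-in for one LVM's feature file (.h5): {case_id: slide embedding}.
lvm_db = {f"case_{i:03d}": rng.random(768, dtype=np.float32) for i in range(4)}

# BSS compares the two representation spaces case by case, so the modalities
# must be paired on shared case IDs before scoring.
shared = sorted(set(llm_db) & set(lvm_db))
text_feats = np.stack([llm_db[c] for c in shared])    # shape: (n_cases, llm_dim)
vision_feats = np.stack([lvm_db[c] for c in shared])  # shape: (n_cases, lvm_dim)
assert text_feats.shape[0] == vision_feats.shape[0] == len(shared)
```

Note that the two embedding dimensions need not match: the score compares structure within each space, not the raw vectors across spaces.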
We are now releasing a small portion of the data. We will be releasing the LLM encodings of pathology reports soon. Please stay tuned!
The directory is structured as follows:

```
Boltzmann/
├── boltzmann_semantic_score/   # Core code for computing the Boltzmann Semantic Score (BSS)
├── text_retrieval/             # Code for information retrieval tasks using LLM embeddings
├── assets/                     # Directory containing all input and output files
│   ├── files/                  # Preprocessed inputs required to run the code
│   └── generated_files/        # Outputs generated by running the code
└── README.md                   # Project overview and documentation
```

To run the code with the correct dependencies, use the provided YAML file to create a conda environment:
```bash
conda env create -f assets/files/cuda12_4.yaml
```

Clone the repository:

```bash
git clone https://github.com/AIMLab-UBC/Boltzmann.git
cd Boltzmann
```

To reproduce the LLM-based text retrieval pipeline described in the paper (only for a small sample set of TCGA-LGG and TCGA-GBM), run the following scripts in order:
```bash
1. ./text_retrieval/text_create_database.sh     # Step 1: Builds the encoded database of all LLM features
2. ./text_retrieval/text_search_eval.sh         # Step 2: Performs retrieval evaluation using the created database
3. ./text_retrieval/search_result_reporter.sh   # Step 3: Aggregates the results into a final report
```

Note: the precomputed LLM database is already provided in `./assets/generated_files/database/text/`, so you can skip Step 1 when simply testing the module. Run Step 1 only if you have the raw LLM features for each report and want to build the database instances yourself.

For the survival analysis, please follow the steps in `./survival_module/run_batch.sh`.

To evaluate the semantic alignment between vision and language models using the Boltzmann Semantic Score, simply run (for the given toy datasets, you can choose between LGG or GBM):

```bash
bash ./boltzmann_semantic_score/vision_language_score_evaluator.sh
```

Note: as long as your LVM and LLM features follow the same structure, you can deploy the code on any other dataset!
If you use this work, please cite:
@inproceedings{mirabadi2025boltzmann,
title={Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models},
author={Ali Khajegili Mirabadi and Katherine Rich and Hossein Farahani and Ali Bashashati},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=9yJKTosUex}
}