
kl-project-2025

Process

Each JSON file in the SciERC dataset contains (see the loading sketch after this list):

  • Documents
  • Sentences belonging to those documents
  • Relations between entities within those sentences
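
As a rough sketch of how these files can be read, the snippet below loads one JSON-lines file. The field names ("doc_key", "sentences", "relations") follow the commonly distributed SciERC format and are an assumption here, not necessarily the exact layout of the files in this repository.

```python
import json

def load_scierc_documents(path):
    """Load SciERC-style records; assumes one JSON document per line."""
    documents = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            documents.append({
                "doc_key": record["doc_key"],        # document identifier
                "sentences": record["sentences"],    # list of token lists
                "relations": record["relations"],    # relation spans per sentence
            })
    return documents
```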

The goal is to generate questions for each document so that the LLM can produce triplets in the same structured format as defined by the SciERC dataset.
These triplets follow the structure:

$$[subject:label, relationship, object:label]$$
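
A small helper for rendering triplets in this form might look like the following (the function and argument names are illustrative only):

```python
def format_triplet(subject, subject_label, relation, obj, object_label):
    # e.g. format_triplet("CNN", "Method", "Used-For", "image classification", "Task")
    # -> "[CNN:Method, Used-For, image classification:Task]"
    return f"[{subject}:{subject_label}, {relation}, {obj}:{object_label}]"
```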

Entity and Relation Labels

Both entities and relations are constrained to a predefined set of valid labels.

Entity Labels

  • Method
  • Task
  • Dataset

Relation Labels

  • Used-For
  • Part-Of
  • Compare-With
  • SubClass-Of
  • Synonym-Of
  • Evaluated-With
  • Benchmark-For
  • Trained-With
  • SubTask-Of
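
Given the label sets above, triplets coming back from the LLM can be validated before comparison, so that anything using an unknown label is discarded. A minimal sketch, assuming each triplet is a (subject, subject_label, relation, object, object_label) tuple:

```python
ENTITY_LABELS = {"Method", "Task", "Dataset"}
RELATION_LABELS = {
    "Used-For", "Part-Of", "Compare-With", "SubClass-Of", "Synonym-Of",
    "Evaluated-With", "Benchmark-For", "Trained-With", "SubTask-Of",
}

def is_valid_triplet(triplet):
    """triplet = (subject, subject_label, relation, object, object_label)."""
    _, subject_label, relation, _, object_label = triplet
    return (subject_label in ENTITY_LABELS
            and object_label in ENTITY_LABELS
            and relation in RELATION_LABELS)
```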

Experimental Procedure

During the experiment, the LLM must be prompted with clear and specific instructions to ensure the correct extraction of these triplets for later comparison.
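The exact prompt is an implementation detail; one possible shape, shown only as an illustration and not as the prompt actually used in this repository, is:

```python
PROMPT_TEMPLATE = """You are an information extraction system.
Read the text below and output only triplets of the form
[subject:label, relationship, object:label].
Entity labels: Method, Task, Dataset.
Relation labels: Used-For, Part-Of, Compare-With, SubClass-Of, Synonym-Of,
Evaluated-With, Benchmark-For, Trained-With, SubTask-Of.
Output one triplet per line and nothing else.

Text:
{document_text}
"""

def build_prompt(document_text):
    # Fills the template with the document the LLM is questioned about.
    return PROMPT_TEMPLATE.format(document_text=document_text)
```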

Because the SciERC dataset is quite extensive, only a subset of the available documents from the different JSON files should be used.
This makes the experiment computationally feasible while maintaining representative coverage of entity and relation types.

Five documents from the training dataset were randomly selected, and gold standard KGs were generated for them. The questions prompted to the LLM must now be representative enough of these documents for the LLM to construct a good KG.
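
The selection itself can be as simple as seeded random sampling, so the same five documents are reused across runs. A sketch, assuming the documents have already been loaded as dictionaries:

```python
import random

def sample_documents(documents, k=5, seed=42):
    """Pick a fixed, reproducible subset of documents for the experiment."""
    rng = random.Random(seed)
    return rng.sample(documents, k)
```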


Gold Standard Construction

The gold standard Knowledge Graph (KG), used as the reference for evaluation, must correspond exactly to the same set of documents for which the LLM-generated triplets were produced.

Including any additional documents in the gold standard that were not part of the LLM question set would introduce bias into the comparison process.
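
In practice this means restricting both graphs to the same document identifiers before comparing them, and only then computing the hallucination rate. A hedged sketch, where each KG is assumed to be a mapping from doc_key to a set of triplets:

```python
def hallucination_rate(gold_kg, llm_kg):
    """Fraction of LLM triplets that do not appear in the gold standard.

    Both arguments map doc_key -> set of triplets. Only documents present in
    both graphs are compared, so extra gold documents cannot skew the score.
    """
    shared_keys = gold_kg.keys() & llm_kg.keys()
    llm_triplets = set().union(*(llm_kg[k] for k in shared_keys))
    gold_triplets = set().union(*(gold_kg[k] for k in shared_keys))
    if not llm_triplets:
        return 0.0
    return len(llm_triplets - gold_triplets) / len(llm_triplets)
```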


About

Repository for my course Knowledge and Language. This project evaluates how knowledge injection methods such as RAG and PRAG affect the hallucination rate of LLMs. The hallucination rate is measured by comparing a gold standard Knowledge Graph to the one extracted from the LLM's output.
