Datanadi/Awesome-Entity-Resolution

 
 

Awesome Entity Resolution Resources


Open-Source Software

End-to-End Entity Resolution

  • Splink (Python, SQL, Spark) - Scalable Fellegi-Sunter and rule-based entity resolution using your choice of SQL or Spark backend.
  • Zingg (Python, Java) - Scalable, active learning model for entity resolution.
  • dedupe (Python) - Active learning and flexible Python tooling for entity resolution.
  • PyJedAI (Python, Java) - State-of-the-art entity resolution clustering algorithms.
  • DeepMatcher (Python) - Deep learning-based entity resolution.
  • FastLink (R) - Easy, scalable Fellegi-Sunter entity resolution on your laptop.
  • RecordLinkage (Python) - Toolkit for prototyping entity resolution systems.
  • dblink (R, Spark) - Scalable Bayesian graphical entity resolution.
  • exchanger (R, C++) - More flexible Bayesian graphical entity resolution on your laptop.
  • RELAIS (R, SQL, Java) - Record linkage software used at the Italian National Statistics Institute.
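
Several of the tools above are described as Fellegi-Sunter implementations. As a rough orientation, the scoring idea can be sketched in plain Python: each compared field contributes a log-likelihood-ratio weight built from an m-probability (chance of agreement among true matches) and a u-probability (chance of agreement among non-matches). The function name, field names, and probability values below are illustrative assumptions and are not taken from any listed package.

```python
import math

def match_weight(agreements, m, u):
    """Sum per-field Fellegi-Sunter weights for one record pair.

    agreements: dict mapping field name -> True/False (did the field agree?)
    m, u:       dicts mapping field name -> probability, strictly between 0 and 1
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            # Agreement weight: log2(m / u)
            total += math.log2(m[field] / u[field])
        else:
            # Disagreement weight: log2((1 - m) / (1 - u))
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total

# Hypothetical parameters, chosen only for illustration.
m_probs = {"surname": 0.9, "birth_year": 0.95}
u_probs = {"surname": 0.1, "birth_year": 0.02}

score = match_weight({"surname": True, "birth_year": False}, m_probs, u_probs)
```

Pairs whose total weight exceeds a chosen threshold are treated as matches. In practice the m and u values are estimated from data (for example with expectation-maximization) rather than set by hand.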

Evaluation

  • ER-Evaluation (Python) - End-to-end evaluation, including summary statistics for monitoring, principled performance metric estimators, and error analysis.
  • clevr (R) - Performance metrics and error tables.
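
For readers new to the performance metrics mentioned above, here is a minimal sketch of pairwise precision and recall computed from two clusterings. It assumes each clustering is a dict mapping record IDs to cluster labels; the function names are hypothetical and do not reflect the API of ER-Evaluation or clevr.

```python
from itertools import combinations

def linked_pairs(membership):
    """Return the set of record pairs that share a cluster label."""
    clusters = {}
    for record, cluster in membership.items():
        clusters.setdefault(cluster, []).append(record)
    pairs = set()
    for records in clusters.values():
        # Sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(records), 2):
            pairs.add((a, b))
    return pairs

def pairwise_precision_recall(predicted, truth):
    """Pairwise precision and recall of a predicted clustering."""
    pred, true = linked_pairs(predicted), linked_pairs(truth)
    true_positives = len(pred & true)
    # Convention assumed here: an empty denominator yields 1.0.
    precision = true_positives / len(pred) if pred else 1.0
    recall = true_positives / len(true) if true else 1.0
    return precision, recall

# Toy example with made-up record IDs.
predicted = {"a": 1, "b": 1, "c": 1, "d": 2}
truth = {"a": "x", "b": "x", "c": "y", "d": "y"}
precision, recall = pairwise_precision_recall(predicted, truth)
```

This enumerates all within-cluster pairs, so it only suits small or sampled datasets. It also assumes complete ground truth, which is rarely available in real deployments.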

String Comparison

  • jellyfish (Python, C) - Fast string distance and phonetic matching.
  • py_stringmatching (Python, C) - Large set of string comparison functions and tokenization methods.
  • textdistance (Python) - Very large collection of sequence comparison functions, including token-based distances.
  • SecondString (Java) - Java implementation of string comparison functions.
  • StringCompare (Python, C++) - Time- and space-efficient implementation of common string distance functions. Designed for maintainability and extensibility.
  • Comparator (R, C++) - Efficient string comparison functions in R.
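
To illustrate the kinds of functions these libraries provide, here is a dependency-free sketch of one character-based measure (Levenshtein edit distance) and one token-based measure (Jaccard similarity over whitespace tokens). The function names are illustrative and do not correspond to any listed library's API; the libraries above are considerably faster and cover many more measures.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def jaccard_tokens(a: str, b: str) -> float:
    """Jaccard similarity of lowercase whitespace-delimited token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # assumed convention: two empty strings are identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

distance = levenshtein("kitten", "sitting")
similarity = jaccard_tokens("Acme Corp", "acme corp ltd")
```

Character-based distances tolerate typos within a word, while token-based measures tolerate reordered or missing words, which is why record linkage pipelines often combine both.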

Embeddings (for pairwise comparison)

  • Entity Embed (Python, PyTorch) - PyTorch text embedding model for blocking.
  • FaceNet-PyTorch (Python, PyTorch) - Embeddings for facial identity resolution.

Data Cleaning and Parsing

  • cleanco (Python) - Company name cleaning.
  • libpostal (C, and bindings for Python, Java, Go, Ruby, PHP, and NodeJS) - Multinational address parsing.
  • ftfy (Python) - Fixes text (Unicode artifacts) for you.
  • PyJanitor (Python) - Clean code for clean data.
  • ProbablePeople (Python) - Western name parser.
  • python-nameparser (Python) - Separate names into individual components.
  • Nominally (Python) - Name parser for record linkage.

Data Quality Control

Blocking, Candidate Selection, and Search

Commercial Solutions

Books

Contributors

About

List of entity resolution software and resources.
