-
Notifications
You must be signed in to change notification settings - Fork 0
Home
This wiki contains comprehensive documentation for pat2vec.
pat2vec is a Python-based tool designed to transform raw electronic health records (EHR) into structured, time-series feature vectors. This process makes the data suitable for machine learning tasks, particularly binary classification. It can aggregate data at the patient level or construct detailed longitudinal timelines.
Compute summary statistics (e.g., the mean of n variables) for each unique patient, resulting in one row per patient. This is ideal for models requiring a single representation per individual.
Generate a monthly time series for each patient that includes:
- Biochemistry results
- Demographic attributes
- MedCat-derived clinical text annotations
The time series spans up to 25 years retrospectively, aligned to each patient's diagnosis date, enabling a consistent retrospective view across varying start times.