Warning
This project is still in development and may change quickly. I will add more functionality in the future; for now it contains the bare minimum needed to support the project.
CensusForge is a Python toolkit for retrieving data from the U.S. Census API. It relies on a local SQLite metadata database for fast lookups; the code that builds this database lives in a separate GitHub repository. CensusForge simplifies working with Census datasets by providing a unified interface for:
- Downloading and caching geographic files
- Querying the Census API
- Looking up dataset, variable, year, and geography metadata
- Returning results as Polars or GeoPandas objects
CensusForge consists of two main classes:
- `DataPull`: Handles local metadata queries and file downloads
- `CensusAPI`: Extends `DataPull` and adds direct Census API querying
Install with pip:

```shell
pip install CensusForge
```

The following example shows how to query the Census API using the `CensusAPI` class.

```python
from CensusForge import CensusAPI


def main():
    ca = CensusAPI()
    # Request a handful of variables from the 2019 dataset and print the result
    print(
        ca.query(
            dataset="acs-acs1-pumspr",
            year=2019,
            params_list=["AGEP", "SCH", "SCHL", "HINCP", "PWGTP", "PUMA"],
        )
    )


if __name__ == "__main__":
    main()
```

Running the above will:
- Look up the dataset in the local metadata database
- Construct the correct Census API URL
- Fetch the API response
- Convert it to a Polars DataFrame
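
The URL construction and response parsing steps above can be sketched with the standard library alone. This is a minimal illustration, not CensusForge's actual implementation: the helper names `build_census_url` and `rows_to_records` are hypothetical, the mapping from `acs-acs1-pumspr` to the path `acs/acs1/pumspr` is an assumption, and the sample payload is made up to mimic the array-of-arrays JSON (header row first) that the Census API returns.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_census_url(dataset: str, year: int, params_list: list, extra: str = "") -> str:
    """Assemble a Census API URL (hypothetical helper, not part of CensusForge)."""
    # Assumption: the dataset name maps to the API path by swapping "-" for "/"
    path = dataset.replace("-", "/")
    # Keep commas unescaped so the variable list stays readable
    query = urlencode({"get": ",".join(params_list)}, safe=",")
    return f"{BASE_URL}/{year}/{path}?{query}{extra}"


def rows_to_records(payload: str) -> list:
    """Convert the JSON array-of-arrays (header row first) into a list of dicts."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]


url = build_census_url(
    "acs-acs1-pumspr", 2019, ["AGEP", "HINCP", "PUMA"], extra="&for=state:*"
)

# Made-up payload for illustration only; real responses come from the API
sample = '[["AGEP", "HINCP", "PUMA", "state"], ["34", "52000", "00101", "72"]]'
records = rows_to_records(sample)
```

In the library itself the parsed rows end up in a Polars DataFrame rather than a list of dicts; the sketch stops short of that step to stay dependency-free.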
The repository is laid out as follows:

```
CensusForge/
│
├── CensusAPI.py      # CensusAPI and DataPull classes
├── database.db       # Local SQLite metadata database
├── jp_tools/         # Utility functions (e.g., file download helper)
│
├── data/             # Output directory for downloaded/cached files
└── README.md         # Project documentation
```
`CensusAPI.query()` queries a Census dataset using any set of variables or geography parameters.
Example

```python
ca.query(
    dataset="acs-acs1-pumspr",
    year=2019,
    params_list=["AGEP", "HINCP", "PUMA"],
    extra="&for=state:*"
)
```

Metadata lookup methods:

| Method | Description |
|---|---|
| `get_database(id)` | Returns dataset name for ID |
| `get_database_id(name)` | Returns dataset ID |
| `get_year(id)` | Returns year for ID |
| `get_year_id(year)` | Returns year ID |
| `get_variable_id(name)` | Returns variable ID |
| `get_geo_id(name)` | Returns geography type ID |
| `get_geo_years(dataset_id, geo_id)` | Returns valid years for a dataset+geography |
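
To give a feel for what a name-to-ID lookup against a local metadata database involves, here is a small sketch using Python's built-in `sqlite3` module and an in-memory database. The table name `dataset_table` and its columns are hypothetical; the real `database.db` schema may differ, and CensusForge may issue these queries through a different engine.

```python
import sqlite3

# Hypothetical schema for illustration; the real database.db may differ
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset_table (id INTEGER PRIMARY KEY, dataset TEXT UNIQUE);
    INSERT INTO dataset_table (id, dataset) VALUES (1, 'acs-acs1-pumspr');
    """
)


def get_database_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the ID for a dataset name, or raise if the name is unknown."""
    row = conn.execute(
        "SELECT id FROM dataset_table WHERE dataset = ?", (name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"Unknown dataset: {name!r}")
    return row[0]


def get_database(conn: sqlite3.Connection, dataset_id: int) -> str:
    """Return the dataset name for an ID, or raise if the ID is unknown."""
    row = conn.execute(
        "SELECT dataset FROM dataset_table WHERE id = ?", (dataset_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"Unknown dataset ID: {dataset_id}")
    return row[0]


dataset_id = get_database_id(conn, "acs-acs1-pumspr")
dataset_name = get_database(conn, dataset_id)
```

Parameterized queries (the `?` placeholders) keep user-supplied names out of the SQL string, which matters if dataset names ever come from untrusted input.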
Downloads a geographic file (if missing), caches it as Parquet, and returns a GeoDataFrame.
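
The download-if-missing behaviour described above follows a common caching pattern, sketched here with the standard library only. The function `cached_fetch` and the stand-in `fake_download` are hypothetical names for illustration; the sketch stores raw bytes, whereas the library caches the data as Parquet and returns a GeoDataFrame.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(cache_path: Path, fetch: Callable[[], bytes]) -> bytes:
    """Return cached bytes if the file exists; otherwise fetch, store, and return them."""
    if cache_path.exists():
        return cache_path.read_bytes()
    data = fetch()
    # Create the cache directory on first use
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data


calls = []


def fake_download() -> bytes:
    """Stand-in for a real network download; records each invocation."""
    calls.append(1)
    return b"fake geographic file"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "example.bin"
    first = cached_fetch(target, fake_download)   # triggers the download
    second = cached_fetch(target, fake_download)  # served from the cache
```

Passing the download step in as a callable keeps the caching logic testable without network access.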

Requirements:

- Python 3.9+
- DuckDB
- GeoPandas
- Polars
- Requests
- jp_tools (for download helper)
Install dependencies:
```shell
pip install -r requirements.txt
```

To run tests or modify the project:

```shell
git clone https://github.com/yourusername/CensusForge.git
cd CensusForge
pip install -e .
```

Citation:

```bibtex
@software{ouslan2026censusforge,
  author    = {Ouslan, Alejandro},
  title     = {CensusForge},
  month     = jan,
  year      = 2026,
  publisher = {Zenodo},
  version   = {0.5.0},
  doi       = {10.5281/zenodo.18121581},
  url       = {https://doi.org/10.5281/zenodo.18121581}
}
```

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
You may copy, modify, and distribute this software only under the terms of the GPL-3.0 license.
Full license text: https://www.gnu.org/licenses/gpl-3.0.en.html