diff --git a/handbook/_analytic.md b/handbook/_analytic.md deleted file mode 100644 index d16f856..0000000 --- a/handbook/_analytic.md +++ /dev/null @@ -1,1620 +0,0 @@ -## Catalog - -The following data is available at: **`/n/dominici_nsaph_l3/Lab/projects/analytic/`** - -### MedPar (Admissions) - -`````{dropdown} **admissions_by_year** - -```{list-table} -:header-rows: 0 - -* - data_source - - MedPar -* - fasse_location - - `admissions_by_year` -* - rce_location - - `~/shared_space/ci3_health_data/medicare/gen_admission/ 1999_2016/targeted_conditions/cache_data/admissions_by_year/` -* - date_created - - Feb 20 2020 -* - size - - 22 GB -* - files - - -``` -``` - ├── admissions_1999.fst - ├── admissions_2000.fst - ├── ... - └── admissions_2016.fst -``` -````{dropdown} header -``` -QID : chr -AGE : int -SEX : int -RACE : int -SSA_STATE_CD : int -SSA_CNTY_CD : int -PROV_NUM : int -ADM_SOURCE : chr -ADM_TYPE : int -ADATE : chr -DDATE : chr -BENE_DOD : chr -DODFLAG : chr -ICU_DAY : int -CCI_DAY : int -ICU : int -CCI : int -DIAG1 : chr -DIAG2 : chr -DIAG3 : chr -DIAG4 : chr -DIAG5 : chr -DIAG6 : chr -DIAG7 : chr -DIAG8 : chr -DIAG9 : chr -DIAG10 : logi -diag11 : logi -diag12 : logi -diag13 : logi -diag14 : logi -diag15 : logi -diag16 : logi -diag17 : logi -diag18 : logi -diag19 : logi -diag20 : logi -diag21 : logi -diag22 : logi -diag23 : logi -diag24 : logi -diag25 : logi -YEAR : int -LOS : int -Parkinson_pdx : int -Parkinson_pdx2dx_10 : int -Parkinson_pdx2dx_25 : int -Alzheimer_pdx : int -Alzheimer_pdx2dx_10 : int -Alzheimer_pdx2dx_25 : int -Dementia_pdx : int -Dementia_pdx2dx_10 : int -Dementia_pdx2dx_25 : int -CHF_pdx : int -CHF_pdx2dx_10 : int -CHF_pdx2dx_25 : int -AMI_pdx : int -AMI_pdx2dx_10 : int -AMI_pdx2dx_25 : int -COPD_pdx : int -COPD_pdx2dx_10 : int -COPD_pdx2dx_25 : int -DM_pdx : int -DM_pdx2dx_10 : int -DM_pdx2dx_25 : int -Stroke_pdx : int -Stroke_pdx2dx_10 : int -Stroke_pdx2dx_25 : int -CVD_pdx : int -CVD_pdx2dx_10 : int -CVD_pdx2dx_25 : int -CSD_pdx : int 
-CSD_pdx2dx_10 : int -CSD_pdx2dx_25 : int -Ischemic_stroke_pdx : int -Ischemic_stroke_pdx2dx_10: int -Ischemic_stroke_pdx2dx_25: int -Hemo_Stroke_pdx : int -Hemo_Stroke_pdx2dx_10 : int -Hemo_Stroke_pdx2dx_25 : int -zipcode_R : int -Race_gp : chr -Sex_gp : chr -age_gp : chr -Dual : int -``` -```` -````` - -### MBSF (Denominator) -`````{dropdown} **denom** - -```{list-table} -:header-rows: 0 - -* - data_source - - MBSF -* - fasse_location - - `denom` -* - size - - 7.4 GB -* - files - - -``` -``` - ├── qid_data_2009.fst - ├── qid_data_2010.fst - ├── ... - ├── qid_data_2016.fst - ├── qid_entry_exit.fst - └── year_zip_confounders.fst -``` -````{dropdown} header (qid_data_yyyy) -``` -qid : chr -year : int -zip : int -sex : int -age : int -dual : chr -dead : logi -hmo_mo: chr -fips : int -race : chr -sexM : num -``` -```` -````{dropdown} header (year_zip_confounders) -``` -zip : num -year : int -mean_bmi : num -smoke_rate : num -hispanic : num -pct_blk : num -medhouseholdincome: num -medianhousevalue : num -poverty : num -education : num -popdensity : num -pct_owner_occ : num -summer_tmmx : num -winter_tmmx : num -summer_rmax : num -winter_rmax : num -city : chr -statecode : chr -latitude : num -longitude : num - -min_year: 2000 -max_year: 2016 -``` -```` -````` - -### Annual Exposure per Medicare Beneficiary -`````{dropdown} **qid_yr_exposures** - -```{list-table} -:header-rows: 0 - -* - rce_location - - `~/shared_space/ci3_analysis/dmork/Data/DLM_ADRD` -* - fasse_location - - `qid_yr_exposures` -* - dataset_author - - Daniel Mork -* - date_created - - April 2022 -* - size - - 139 GB -* - description - - Annual exposure measurements (columns, 2000-2016) for each Medicare beneficiary (rows) tied to their zip code of residence in a given year. Exposures (xxx in file name) include: no2, ozone, pm2.5, pm2.5components, pr (precipitation), rmax (max humidity), tmmx (max temperature), zip (zip code of residence).
-* - files - - -``` - -``` - ├── qid_yr_no2.fst - ├── qid_yr_ozone.fst - ├── qid_yr_pm25comp_br.fst - ├── qid_yr_pm25comp_ca.fst - ├── qid_yr_pm25comp_cu.fst - ├── qid_yr_pm25comp_ec.fst - ├── qid_yr_pm25comp_fe.fst - ├── qid_yr_pm25comp_k.fst - ├── qid_yr_pm25comp_nh4.fst - ├── qid_yr_pm25comp_ni.fst - ├── qid_yr_pm25comp_no3.fst - ├── qid_yr_pm25comp_oc.fst - ├── qid_yr_pm25comp_pb.fst - ├── qid_yr_pm25comp_si.fst - ├── qid_yr_pm25comp_so4.fst - ├── qid_yr_pm25comp_v.fst - ├── qid_yr_pm25comp_z.fst - ├── qid_yr_pm25.fst - ├── qid_yr_pr.fst - ├── qid_yr_rmax.fst - ├── qid_yr_tmmx.fst - └── qid_yr_zip.fst -``` - -````{dropdown} header (qid_yr_xxx.fst): -``` -qid : chr -2000: num -2001: num -2002: num -2003: num -2004: num -2005: num -2006: num -2007: num -2008: num -2009: num -2010: num -2011: num -2012: num -2013: num -2014: num -2015: num -2016: num -``` -```` -````` - -### MBSF (Enrollment file, denominator) -`````{dropdown} **denom_by_year** - -```{list-table} -:header-rows: 0 - -* - data_source - - MBSF, census (interpolated), BRFSS (interpolated), PM2.5 exposure, seasonal temperature -* - rce_location - - `~/shared_space/ci3_health_data/medicare/mortality/ 1999_2016/wu/cache_data/merged_by_year_v2` -* - fasse_location - - `denom_by_year` -* - git_repository - - [github.com/NSAPH/National-Causal-Analysis](https://github.com/NSAPH/National-Causal-Analysis/tree/master/MergedData) -* - dataset_author - - Ben Sabath, Xiao Wu -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2016 -* - processing_description - - Recommended for use. Available in both `.fst` and `.csv` formats on FASSE. -* - date_created - - Apr 2021 -* - size - - 7.4 GB -* - files - - -``` - -``` - ├── confounder_exposure_merged_nodups_health_1999.fst - ├── ... 
- └── confounder_exposure_merged_nodups_health_2016.fst -``` -````{dropdown} header -``` -zip : int -year : int -qid : chr -dodflag : chr -bene_dod : chr -sex : int -race : int -age : int -hmo_mo : chr -hmoind : chr -statecode : chr -latitude : num -longitude : num -dual : chr -death : int -dead : logi -entry_age : int -entry_year : int -entry_age_break : int -followup_year : num -followup_year_plus_one : num -pm25_ensemble : num -pm25_no_interp : num -pm25_nn : num -ozone : num -ozone_no_interp : num -zcta : int -poverty : num -popdensity : num -medianhousevalue : num -pct_blk : num -medhouseholdincome : num -pct_owner_occ : num -hispanic : num -education : num -population : num -zcta_no_interp : int -poverty_no_interp : num -popdensity_no_interp : num -medianhousevalue_no_interp : num -pct_blk_no_interp : num -medhouseholdincome_no_interp: num -pct_owner_occ_no_interp : num -hispanic_no_interp : num -education_no_interp : num -population_no_interp : int -smoke_rate : num -mean_bmi : num -smoke_rate_no_interp : num -mean_bmi_no_interp : num -amb_visit_pct : num -a1c_exm_pct : num -amb_visit_pct_no_interp : num -a1c_exm_pct_no_interp : num -tmmx : num -rmax : num -pr : num -cluster_cat : chr -fips_no_interp : int -fips : int -summer_tmmx : num -summer_rmax : num -winter_tmmx : num -winter_rmax : num -``` -```` -````` - -### AD/ADRD Hospitalization -`````{dropdown} **hospitalization** - -```{list-table} -:header-rows: 0 - -* - data_source - - MedPar derived -* - rce_location - - `~/shared_space/ci3_analysis/dmork/Data/DLM_ADRD` -* - fasse_location - - `hospitalization` -* - dataset_author - - Daniel Mork -* - description - - The first recorded hospitalization for each individual broken down by primary/secondary/any billing code (ICD). 
-* - size - - 1.2 GB -* - files - - -``` -``` - ├── First_hosp_AD_any.fst - ├── First_hosp_AD_primary.fst - ├── First_hosp_ADRD_any.fst - ├── First_hosp_ADRD_primary.fst - ├── First_hosp_ADRD_secondary.fst - └── First_hosp_AD_secondary.fst -``` -````{dropdown} header -``` -QID : Factor -ADATE: Date -year : num -``` -```` -````` - -### Medicare Entry Age -`````{dropdown} **medicare_entry_age** - -```{list-table} -:header-rows: 0 - -* - data_source - - MBSF derived -* - rce_location - - `/nfs/nsaph_ci3/scratch/jan2021_whanhee_cache/entry_age/` -* - fasse_location - - `medicare_entry_age` -* - size - - 2.3 GB -* - date_created - - Jan 26, 2021 -* - dataset_author - - Ben Sabath, Whanhee Lee -* - spatial_resolution - - zipcode -* - git_repository - - [NSAPH/data_requests](https://github.com/NSAPH/data_requests/blob/master/request_projects/jan2021_whanhee_fisrt_hosps/code/1_create_indivdual_vars.R) -* - files - - -``` -``` - └── medicare_entry_age.csv -``` -````` - -### Years in Medicare -`````{dropdown} **years_in_medicare** -```{list-table} -:header-rows: 0 - -* - data_source - - MBSF derived -* - rce_location - - `/nfs/nsaph_ci3/scratch/jan2021_whanhee_cache/follow_up/` -* - fasse_location - - `years_in_medicare` -* - description - - Number of years a beneficiary has been in Medicare (that is, the number of years since the beneficiary entered Medicare). Allows for grouping by how long beneficiaries have been in Medicare. -* - size - - 8.8 GB -* - date_created - - Jan 26, 2021 -* - temporal_coverage - - 1999-2016 -* - dataset_author - - Ben Sabath, Whanhee Lee -* - spatial_resolution - - zipcode -* - git_repository - - [NSAPH/data_requests](https://github.com/NSAPH/data_requests/blob/master/request_projects/jan2021_whanhee_fisrt_hosps/code/1_create_indivdual_vars.R) -* - files - - -``` -``` - ├── follow_up_year_2000.fst - ├── ...
- └── follow_up_year_2016.fst -``` -````` - -### Temperature Humidity Precipitation -`````{dropdown} **temperature_seasonal_zipcode** -```{list-table} -:header-rows: 0 -* - rce_location - - `/nfs/nsaph_ci3/ci3_confounders/data_for_analysis/earth_engine/ temperature/temperature_seasonal_zipcode_combined.csv` -* - fasse_location - - `temperature_seasonal_zipcode` -* - dataset_author - - Xiao Wu, Ben Sabath -* - date_created - - Jul 23, 2020 -* - data_source - - Google Earth Engine provides a single interface for interacting with a number of geospatial data sources. The sources used and links to their documentation are: [GRIDMET](https://developers.google.com/earth-engine/datasets/catalog/IDAHO_EPSCOR_GRIDMET), [NLDAS](https://developers.google.com/earth-engine/datasets/catalog/NASA_NLDAS_FORA0125_H002), [MODIS MOD10A1.006](https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD10A1), [GLDAS](https://developers.google.com/earth-engine/datasets/catalog/NASA_GLDAS_V021_NOAH_G025_T3H), [NOAA CDR PATMOSX](https://developers.google.com/earth-engine/datasets/catalog/NOAA_CDR_PATMOSX_V53), [NOAA NCEP Climate Forecast System V2](https://developers.google.com/earth-engine/datasets/catalog/NOAA_CFSV2_FOR6H) -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2019 -* - temporal_resolution - - annually -* - description - - This dataset contains information on temperature, relative humidity, and total precipitation data. The data is available as raster files on Google earth engine. The temporal and spatial resolutions varied by data source, but all were available at a daily resolution or more frequently. Where the time resolution of the rasters is more than daily, daily averages for each raster were calculated. Next, using Google earth engine's spatial averaging algorithms and a set of polygons representing the areas of interest, the daily value for each polygon was calculated. 
The polygons used were the ones described in the preceding section. The results of this calculation were then downloaded as a CSV file to the RCE. At this point, there is one file for each year. Following this, annual averages are calculated for each location, and these are combined into a single file. The daily values are also combined into a single file. For the `combined_zips` files (which combine the zip code polygon-based measures with the point-based estimates to address zip codes without area) there is an additional step. Values for zip codes not in the polygon-based measure are taken from the point-based measures to address the ~7000 zip codes without area that are missing from the polygon shape file. -* - git_repository - - [NSAPH/data_documentation](https://github.com/NSAPH/data_documentation/blob/master/earth_engine_docs/earth_engine_data.Rmd) -* - meteorological - - Temperature (K) - variable name: tmmx (Source: GRIDMET); Relative Humidity - variable name: rmax (Source: GRIDMET) -* - size - - 65 MB -* - header - - `ZIP,year,summer_tmmx,summer_rmax,winter_tmmx,winter_rmax` -* - files - - -``` -``` - └── temperature_seasonal_zipcode_combined.csv -``` -````` - -### Pollution-Census-Temperature covariates -`````{dropdown} **merged_covariates_pm_census_temp** -```{list-table} -:header-rows: 0 -* - data_source - - US Census/ACS, Business Analyst Data Set, BRFSS -* - rce_location - - `/nfs/nsaph_ci3/ci3_health_data/medicare/ mortality/1999_2016/wu/output_data /merged_covariates.csv` -* - fasse_location - - `merged_covariates_pm_census_temp` -* - dataset_author - - Xiao Wu, Ben Sabath -* - date_created - - May 29, 2019 -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - publication - - https://www.science.org/doi/10.1126/sciadv.aba5692 -* - git_repository - - 
[nejm_confounder_summary/nejm_confounder](https://github.com/NSAPH/data_documentation/blob/master/nejm_confounder_summary/nejm_confounders.csv) and [rce_data_list/confounder_data](https://github.com/NSAPH/data_documentation/blob/master/rce_data_list/confounder_data.csv) -* - size - - 296 MB -* - header - - `zip, year, pm25_ensemble, pm25_no_interp, pm25_nn, ozone, ozone_no_interp, zcta, poverty, popdensity, medianhousevalue, pct_blk, medhouseholdincome, pct_owner_occ, hispanic, education, population, zcta_no_interp, poverty_no_interp, popdensity_no_interp, medianhousevalue_no_interp, pct_blk_no_interp, medhouseholdincome_no_interp, pct_owner_occ_no_interp, hispanic_no_interp, education_no_interp, population_no_interp, smoke_rate, mean_bmi, smoke_rate_no_interp, mean_bmi_no_interp, amb_visit_pct, a1c_exm_pct, amb_visit_pct_no_interp, a1c_exm_pct_no_interp, tmmx, rmax, pr, cluster_cat, fips, fips_no_interp` -* - files - - -``` -``` - └── merged_covariates.csv -``` -````` - -### Population-Weighted Daily County-Level Heat Metrics -`````{dropdown} **county_heat_metrics** -```{list-table} -:header-rows: 0 -* - data_source - - ERA5-Land gridded data -* - fasse_location - - `heatvars_county_2000-2020` -* - dataset_author - - Keith Spangler -* - date_created - - June 17, 2022 -* - spatial_coverage - - contiguous US -* - spatial_resolution - - county -* - temporal_coverage - - 2000-2020 -* - temporal_resolution - - daily -* - publication - - https://pubmed.ncbi.nlm.nih.gov/35715416/ -* - size - - 1.03 GB -* - header - - `"StCoFIPS", "Date", "Tmin_C", "Tmax_C", "Tmean_C", "TDmin_C", "TDmax_C", "TDmean_C", "NETmin_C", "NETmax_C", "NETmean_C", "HImin_C", "HImax_C", "HImean_C", "HXmin_C", "HXmax_C", "HXmean_C", "WBGTmin_C", "WBGTmax_C", "WBGTmean_C", "UTCImin_C", "UTCImax_C", "UTCImean_C", "Flag_T", "Flag_TD", "Flag_NET", "Flag_HI", "Flag_HX", "Flag_WBGT", "Flag_UTCI"` -* - files - - -``` -``` - └── Heatvars_County_2000-2020_v1.2.Rds -``` -````` - -### Medicaid - Respiratory 
Hospitalizations in Children -`````{dropdown} **medicaid_children_99-12** -```{list-table} -:header-rows: 0 -* - data_source - - Medicaid -* - rce_location - - `/nfs/nsaph_ci3/ci3_health_data/medicaid/respiratory /1999_2012/youth_resp_hosps_jlee/data` -* - fasse_location - - `medicaid_children_99-12` -* - dataset_author - - Jenny Lee -* - date_created - - 2021 -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2012 -* - temporal_resolution - - annually -* - description - - The data prepared for this project consists of the Medicaid Fee For Service population, with unrestricted Medicaid benefits, under the age of 20 from 1999-2012. This data also includes all hospitalizations for that population, with indicators included regarding whether or not they were associated with a set of respiratory hospitalizations. See the schema for the hospitalization data below for details on specific indicators. -* - git_repository - - [NSAPH/data_requests](https://github.com/NSAPH/data_requests/tree/master/request_projects/feb2021_jenny_medicaid_resp) -* - exposures - - Xiao Wu's CausalGPS PM2.5 data -* - size - - 14 GB -* - files - - -``` -``` -├── denom -│ ├── denom_under_20_1999.fst -│ ├── ... -│ └── denom_under_20_2012.fst -└── hosps - ├── under_20_admissions_1999.fst - ├── ... 
- └── under_20_admissions_2012.fst -``` -````` - -### Exposure-census-BRFSS confounders -`````{dropdown} **confounders** -```{list-table} -:header-rows: 0 -* - data_source - - US Census, BRFSS -* - rce_location - - `/nfs/nsaph_ci3/scratch/jan2021_whanhee_cache/cache_dir/ merged_exposure_confounders/` -* - fasse_location - - `confounders` -* - dataset_author - - Ben Sabath, Whanhee Lee -* - date_created - - Apr 23, 2021 -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode, zcta -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - git_repository - - [data_requests](https://github.com/NSAPH/data_requests/blob/master/request_projects/jan2021_whanhee_fisrt_hosps/code/6_join_exposure_to_confounders.R) -* - size - - 247 MB -* - header - - `ZIP, year, zcta, poverty, popdensity, medianhousevalue, pct_blk, medhouseholdincome, pct_owner_occ, hispanic, education, population, pct_asian, pct_native, pct_white, smoke_rate, mean_bmi, pm25.current_year, ozone.current_year, no2.current_year, ozone_summer.current_year, pm25.one_year_lag, ozone.one_year_lag, no2.one_year_lag, ozone_summer.one_year_lag` -* - files - - -``` -``` - ├── merged_confounders_2000.csv - ├── ...
- └── merged_confounders_2016.csv -``` -````` - -### ADRD Hospitalization Records -`````{dropdown} **adrd_hospitalization** -```{list-table} -:header-rows: 0 -* - dataset_author - - Shuxin Dong -* - date_created - - Jan 27, 2022 -* - data_source - - MedPar (admissions) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode (unaggregated) -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - daily (with admission date) -* - description - - extract the ADRD hospitalizations based on the Chronic Condition Warehouse -* - rce_location - - `~/shared_space/ci3_analysis/ data_ADRDhospitalization/ ADRDhospitalization_CCWlist/` -* - fasse_location - - `adrd_hospitalization` -* - size - - 1.9 GB -* - git_repository - - https://github.com/ShuxinD/ADRDdata -* - other - - The Chronic Condition Warehouse list for ADRD: https://www2.ccwdata.org/web/guest/condition-categories -* - files - - -``` -``` - ├── ADRD_2000.fst - ├── ... - └── ADRD_2016.fst -``` -````{dropdown} header -``` -QID : chr -ADATE : Date -DDATE : Date -zipcode_R : int -DIAG1 : chr -DIAG2 : chr -DIAG3 : chr -DIAG4 : chr -DIAG5 : chr -DIAG6 : chr -DIAG7 : chr -DIAG8 : chr -DIAG9 : chr -DIAG10 : chr -AGE : int -Sex_gp : chr -Race_gp : chr -SSA_STATE_CD : int -SSA_CNTY_CD : int -PROV_NUM : int -ADM_SOURCE : chr -ADM_TYPE : int -Dual : int -year : num -AD_primary : logi -AD_any : logi -AD_secondary : logi -ADRD_primary : logi -ADRD_any : logi -ADRD_secondary: logi -``` -```` -````` - -### Medpar File 2000-2016 Clean -`````{dropdown} **medpar_hospital_clean_0619** -```{list-table} -:header-rows: 0 -* - dataset_author - - Mahdieh Danesh Yazdi -* - date_created - - May 2019 -* - data_source - - MedPar (admissions) -* - spatial_coverage - - US -* - size - - 1.8 GB -* - spatial_resolution - - zipcode, city -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - admissions date -* - processing_description - - The data was limited to the years 2000-2016 (1999 was dropped). 
Demographic data was removed (use demographic data from the denominator file). Duplicated admission records were removed. For multiple admissions on the same day, the record with the longer length of stay was kept, along with those without missing diagnostic codes. The data was subset to keep only the first two diagnostic codes. A diabetes variable was created (the ICD codes used should be reviewed clinically prior to use). -* - rce_location - - `~/shared_space/ci3_mdaneshyazdi/Medpar_Data/ data/medpar_hospital_clean_0619.rds` -* - fasse_location - - `medpar_hospital_clean_0619` -* - files - - -``` -``` - └── medpar_hospital_clean_0619.rds -``` -````` - -### Denominator File 2000-2016 Clean -`````{dropdown} **denominator_clean_0619** -```{list-table} -:header-rows: 0 -* - dataset_author - - Mahdieh Danesh Yazdi -* - date_created - - May 2019 -* - data_source - - MBSF (denominator) -* - spatial_coverage - - US -* - size - - 3.8 GB -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - processing_description - - The data was limited to the years 2000-2016 (1999 was dropped). Rows with empty or missing QID values were dropped. Those whose sex changed through follow-up were dropped. Those whose race changed through follow-up were assigned the "Other/Unknown" category. Those who had multiple dates of death in different years were dropped. For those with multiple dates of death in the same year, the earlier date of death was assigned. If duplicate rows existed, one with a date of death and one without, the row with the non-missing date of death was kept. Multiple QID-year rows with differing values of other variables were removed. Observations with invalid zip codes were removed. Warning: There may be excess deaths on the last day of the month due to CMS processing. Sometimes when the exact date of death is unknown, it is assigned to the last day of the month.
-* - rce_location - - `~/shared_space/ci3_mdaneshyazdi/Denominator_Data/ data/denominator_clean_0619.rds` -* - fasse_location - - `denominator_clean_0619` -* - files - - -``` -``` - └── denominator_clean_0619.rds -``` -````` - -### Denominator Clean Merged with Exposure and Covariate Data -`````{dropdown} **merged_denominator_clean_0619_exp_conf** -```{list-table} -:header-rows: 0 -* - dataset_author - - Mahdieh Danesh Yazdi -* - date_created - - February 2020 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - size - - 30 GB (`fst`) and 4.4 GB (`rds`) -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - The clean denominator file merged with annual PM2.5, NO2, O3 levels from 1-km exposure models generated by Qian Di and Weeberb Requia, aggregated to zip code level by Yaguang Wei. Also merged with covariate data from the Census, ACS, BRFSS, and Dartmouth Health Atlas created by Ben Sabath. Missing values were filled in using interpolated/extrapolated values from Liuhua Shi. (Negative values were set to 0 and values greater than 100% were set to 100%). Other missing values were dropped. The exposure values and covariate data may need to be updated depending on the study being done.
-* - rce_location - - `~/shared_space/ci3_mdaneshyazdi/ Merged_Data/data/denominator.rds` and `~/shared_space/ci3_mdaneshyazdi/ Merged Data/denominator.fst` -* - fasse_location - - `merged_denominator_clean_0619_exp_conf` -* - files - - -``` -``` - ├── denominator.rds - └── denominator.fst -``` -````` - -### Hospital Admissions Merged with Denominator, Exposure, and Covariates -`````{dropdown} **national_exp_0621** -```{list-table} -:header-rows: 0 -* - dataset_author - - Mahdieh Danesh Yazdi -* - date_created - - Jun 2021 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - The clean denominator file merged with the clean hospital admissions data, limited to FFS patients, and then merged with annual PM2.5, NO2, O3 levels and warm-season O3 levels from 1-km exposure models generated by Qian Di and Weeberb Requia, aggregated to zip code level by Yaguang Wei. Also merged with covariate data from the Census, ACS, BRFSS, and Dartmouth Health Atlas created by Ben Sabath. Missing values were filled in using interpolated/extrapolated values from Liuhua Shi. (Negative values were set to 0 and values greater than 100% were set to 100%). Other missing values were dropped. The exposure values and covariate data may need to be updated depending on the study being done. Individuals may have multiple admissions per year.
-* - size - - 32 GB -* - rce_location - - `~/shared_space/ci3_mdaneshyazdi/ Merged_Data/data/national_exp_0621.fst` -* - fasse_location - - `national_exp_0621` -* - files - - -``` -``` - └── national_exp_0621.fst -``` -````` - - -### Aggregated 2010-2016 Medicare Mortality Data with PM2.5 Exposure and ZIP code level variables -`````{dropdown} **aggregate_medicare_data_2010to2016** -```{list-table} -:header-rows: 0 -* - description - - These data contain annually aggregated PM2.5 exposure data, demographic data from the census, and mortality data plus individual-level characteristics for the entire Medicare population in 1999-2016. See [Xiao’s paper](https://www.science.org/doi/10.1126/sciadv.aba5692) for processing description. -* - dataset_author - - Xiao Wu, Ben Sabath -* - date_created - - 2020 -* - data_source - - Medicaid, Exposure Data, Census Data -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2010-2016 -* - temporal_resolution - - Annually -* - publication - - https://www.science.org/doi/10.1126/sciadv.aba5692 -* - rce_location - - `~shared_space/ci3_analysis/causal_rule_ensemble /aggregate_medicare_data_2010to2016.fst` -* - fasse_location - - `aggregate_medicare_data_2010to2016` -* - git_repository - - https://github.com/wxwx1993/National_Causal -* - files - - -``` -``` - └── aggregate_medicare_data_2010to2016.fst -``` -````` - -### Nationwide Medicare Strata - -`````{dropdown} **erc_strata** -```{list-table} -:header-rows: 0 -* - dataset_author - - Kevin Josey -* - date_created - - Aug 5 2022 -* - data_source - - Medicare File from Xiao et al.'s Science Advances paper (see `denom_by_year`) -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annual -* - description - - Data were divided and aggregated into custom strata, then subsetted depending on several individual factors.
I further merged these data tables with neighborhood level covariates. -* - rce_location - - `~/shared_space/ci3_analysis/josey_erc_strata/Data` -* - fasse_location - - `erc_strata` -* - git_repository - - https://github.com/kevjosey/erc-strata -* - size - - 6.9 GB -* - files - - -``` -``` -├── aggregate_data_qd.RData -├── aggregate_data_rm.RData -├── national_merged2016_qd.RData -├── national_merged2016_rm.RData -├── qd -│ ├── 0_all_qd.RData -│ ├── 0_asian_qd.RData -│ ├── 0_black_qd.RData -│ ├── 0_hispanic_qd.RData -│ ├── 0_white_qd.RData -│ ├── 1_all_qd.RData -│ ├── 1_asian_qd.RData -│ ├── 1_black_qd.RData -│ ├── 1_hispanic_qd.RData -│ ├── 1_white_qd.RData -│ ├── 2_all_qd.RData -│ ├── 2_asian_qd.RData -│ ├── 2_black_qd.RData -│ ├── 2_hispanic_qd.RData -│ └── 2_white_qd.RData -└── rm - ├── 0_all_rm.RData - ├── 0_asian_rm.RData - ├── 0_black_rm.RData - ├── 0_hispanic_rm.RData - ├── 0_white_rm.RData - ├── 1_all_rm.RData - ├── 1_asian_rm.RData - ├── 1_black_rm.RData - ├── 1_hispanic_rm.RData - ├── 1_white_rm.RData - ├── 2_all_rm.RData - ├── 2_asian_rm.RData - ├── 2_black_rm.RData - ├── 2_hispanic_rm.RData - └── 2_white_rm.RData -``` -````` - -### CVD Medicaid - -`````{dropdown} **cvd_medicaid** -```{list-table} -:header-rows: 0 -* - dataset_author - - Ben Sabath -* - date_created - - January 28, 2020 -* - data_source - - Medicaid -* - spatial_coverage - - US (continental) -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2002-2012 -* - temporal_resolution - - daily -* - size - - 86 GB -* - git_repository - - [dec2019_medicaid_platform_cvd](https://github.com/NSAPH/data_requests/tree/master/request_projects/dec2019_medicaid_platform_cvd) -* - rce_location - - `~/shared_space/ci3_health_data /medicaid/cvd/2010_2011/desouza-2` -* - fasse_location - - `cvd_medicaid` -* - publication - - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7896354/ -* - files - - -``` -``` -├── [2.3G] cvd.csv -├── [2.1G] cvd.sas7bdat -├── [6.3K] CVD-specific data 
dictionary-07-12-2018.docx -├── [6.8K] data_dictionary.md -├── [ 70G] merged_cvd_data.csv -├── [ 18K] merge.out -│ ├── [5.7K] log.txt -│ ├── [ 77] r_error.0 -│ └── [ 12K] r_out.0 -├── [1.7K] merge.R -├── [ 906] readme -└── [ 899] r.submit -``` -````` - - -### Aggregated CVD cohort Medicare - -`````{dropdown} **aggregated_cvd_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - April 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics -* - rce_location - - `~/shared_space/ci3_health_data/medicare/gen_admission /1999_2016/Klompmaker/merged_data/cvd2/` -* - fasse_location - - `aggregated_cvd_cohort_medicare` -* - size - - 38 GB -* - files - - -``` -``` -├── [4.2G] aggregate_CVD_65yrs.fst -├── [3.8G] aggregate_CVD_75yrs.fst -├── [3.1G] aggregate_CVD_85yrs.fst -├── [6.5G] aggregate_CVD.fst -├── [4.8G] aggregate_death_CVD.fst -├── [4.6G] aggregate__excl_1yrhosp_CVD.fst -├── [4.3G] aggregate_excl_1yrhosp_RES.fst -├── [1.2M] cc_zipyear_all.fst -├── [1.2M] cc_zipyear_confounder.fst -├── [941K] cc_zipyear_cvd.fst -├── [347M] CVD_count.fst -├── [354M] CVD_death_count.fst -├── [439M] time_count.fst -└── [439M] time_death_count.fst -``` -````` - -### Aggregated CHD cohort Medicare - -`````{dropdown} **aggregated_chd_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - April 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - 
annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics -* - rce_location - - `~/shared_space/ci3_health_data/medicare /gen_admission /1999_2016/Klompmaker/merged_data/chd2/` -* - fasse_location - - `aggregated_chd_cohort_medicare` -* - size - - 35 GB -* - files - - -``` -``` -├── [4.1G] aggregate_CHD_65yrs.fst -├── [3.8G] aggregate_CHD_75yrs.fst -├── [3.2G] aggregate_CHD_85yrs.fst -├── [ 14G] aggregate_CHD.fst -├── [4.3G] aggregate_excl_1yrhosp_CHD.fst -├── [1.2M] cc_zipyear_chd.fst -├── [ 92M] CHD_count.fst -└── [116M] time_count.fst -``` -````` - -### Aggregated CBV cohort Medicare - -`````{dropdown} **aggregated_cbv_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - April 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). 
Person records were aggregated by zip code, year and individual demographics -* - rce_location - - `~/shared_space/ci3_health_data/medicare/gen_admission /1999_2016/Klompmaker/merged_data/cbv2/` -* - fasse_location - - `aggregated_cbv_cohort_medicare` -* - size - - 35 GB -* - files - - -``` -``` -├── [4.1G] aggregate_CBV_65yrs.fst -├── [3.8G] aggregate_CBV_75yrs.fst -├── [3.2G] aggregate_CBV_85yrs.fst -├── [ 14G] aggregate_CBV.fst -├── [4.4G] aggregate__excl_1yrhosp_CBV.fst -├── [ 93M] CBV_count.fst -├── [1.2M] cc_zipyear_cbv.fst -└── [117M] time_count.fst -``` -````` - - -### Aggregated ADRD cohort Medicare - -`````{dropdown} **aggregated_adrd_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - February 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NDVI, blue space, park cover, NO2, PM2.5, ozone, temperature, humidity). 
Person records were aggregated by zip code, year and individual demographics -* - rce_location - - `~/shared_space/ci3_health_data/medicare/gen_admission /1999_2016/Klompmaker/merged_data/alz2/` -* - fasse_location - - `aggregated_adrd_cohort_medicare` -* - size - - 28 GB -* - files - - -``` -``` -├── [3.6G] aggregate_ALZ_65yrs.fst -├── [3.4G] aggregate_ALZ_75yrs.fst -├── [2.8G] aggregate_ALZ_85yrs.fst -├── [4.5G] aggregate_ALZ.fst -├── [4.4G] aggregate_death_ALZ.fst -├── [3.8G] aggregate_excl_1yrhosp_ALZ.fst -├── [358M] ALZ_count.fst -├── [387M] ALZ_death_count.fst -├── [471M] time_count.fst -└── [472M] time_death_count.fst -``` -````` - -### Aggregated PD cohort Medicare - - -`````{dropdown} **aggregated_pd_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - February 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NDVI, blue space, park cover, NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics. 
-* - rce_location - - `~/shared_space/ci3_health_data/medicare /gen_admission/1999_2016 /Klompmaker/merged_data/par2/` -* - fasse_location - - `aggregated_pd_cohort_medicare` -* - size - - 29 GB -* - files - - -``` -``` -├── [4.4G] aggregate_death_PAR.fst -├── [4.4G] aggregate_excl_1yrhosp_PAR.fst -├── [3.7G] aggregate_PAR_65yrs.fst -├── [3.4G] aggregate_PAR_75yrs.fst -├── [2.9G] aggregate_PAR_85yrs.fst -├── [4.6G] aggregate_PAR.fst -├── [ 94M] PAR_count.fst -├── [405M] PAR_death_count.fst -├── [222M] time_count.fst -└── [486M] time_death_count.fst -``` -````` - -### Daily County Level Heatwave Associated Hospitalizations - -`````{dropdown} **daily_county_level_heatwave_assosciated_hospitalizations** -```{list-table} -:header-rows: 0 -* - dataset_author - - Ben Sabath -* - date_created - - July 10, 2020 -* - size - - 7.7 GB -* - data_source - - MedPar (admissions), MBSF (denominator), Medicaid MAX -* - spatial_coverage - - US -* - spatial_resolution - - county -* - temporal_coverage - - 2006-2016, 1999-2016 -* - temporal_resolution - - daily -* - description - - FIPS code, race, sex, age, and dual eligibility were determined for each case based on the information in the patient summary file for that individual in the year of their admission. The denominator for each observation is calculated monthly and contains all individuals who are eligible for Fee for Service (FFS) hospitalization coverage and have not died prior to that month. The CCS codes included were 2, 50, 55, 114, 157, 159, and 244. ICD processing was done using the ICD package (Wasey 2018). The author of this package asks that it be cited in papers using data that was created with the package. 
-* - rce_location - - `~/shared_space/ci3_health_data/medicare/heat_related` -* - fasse_location - - `daily_county_level_heatwave_assosciated_hospitalizations` -* - publication - - https://arxiv.org/abs/2102.10478 -* - git_repository - - [https://github.com/wxwx1993/TS_Stochastic](https://github.com/wxwx1993/TS_Stochastic) -* - files - - -``` -``` -├── 1999_2016 -│ └── county_ccs_hosps -│ ├── cache_dir -│ │ ├── daily_counts -│ │ │ ├── daily_counts_by_ccs_1999.fst -│ │ │ ├── ... -│ │ │ └── daily_counts_by_ccs_2016.fst -│ │ └── denom -│ │ ├── ffs_patient_summary_by_county_1999.fst -│ │ ├── ... -│ │ └── ffs_patient_summary_by_county_2016.fst -│ ├── data -│ │ ├── daily_ccs_heatwave_counts_by_fips_1999.fst -│ │ ├── ... -│ │ └── daily_ccs_heatwave_counts_by_fips_2016.fst -│ └── data_daily_hosp_mort -│ ├── daily_only_ccs_heatwave_hosp_mort_counts_by_fips_1999.fst -│ ├── ... -│ └── daily_only_ccs_heatwave_hosp_mort_counts_by_fips_2016.fst -└── 2006_2016 - └── county_ccs_hosps - ├── cache_dir - │ ├── daily_counts - │ │ ├── daily_counts_by_ccs_2006.fst - │ │ ├── ... - │ │ └── daily_counts_by_ccs_2016.fst - │ └── denom - │ ├── ffs_patient_summary_by_county_2006.fst - │ ├── ... - │ └── ffs_patient_summary_by_county_2016.fst - ├── data - │ ├── daily_ccs_heatwave_counts_by_fips_2006.fst - │ ├── ... 
- │ ├── daily_ccs_heatwave_counts_by_fips_2016.fst - │ ├── Daily_Heat_CCS_2006-2016_with_Temperature_by_WFO.Rda - │ ├── Daily_Heat_CCS_2006-2016_with_Temperature_by_WFO_v0.Rda - │ ├── Daily_Heat_CCS_2006-2016_with_Temperature_ERA5Land.Rda - │ ├── Daily_Heat_CCS_2006-2016_with_Temperature.Rda - │ └── Daily_Heat_CCS_2006-2016_with_Temperature_v0.Rda - ├── readme.md - └── schema.yml -``` -````` - - -### Hospitalizations for kidney disease and comorbidities - -`````{dropdown} **medicare_for_kidney_diseases** -```{list-table} -:header-rows: 0 -* - dataset_author - - Ana Trisovic -* - date_created - - July 10, 2022 -* - data_source - - MedPar (admissions), MBSF (denominator), confounders -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - git_repository - - [mbsf-medpar-aki-first-hospitalization](https://github.com/NSAPH-Data-Processing/mbsf-medpar-aki-first-hospitalization) -* - description - - Special modifications for the kidney diseases for numerators and denominators (people at risk) for the analysis by Whanhee Lee. 
-* - rce_location - - `~/shared_space/ci3_analysis/whanhee_revisions` -* - fasse_location - - `medicare_for_kidney_diseases` -* - size - - 31 GB -* - header - - `year, sex, race, zip, dual, follow_up, entry_age_group, aki_primary_secondary_first_hosp, aki_primary_secondary_first_hosp_denom, ckdhosp_prior_aki, ckdhosp_prior_aki_denom, diabeteshosp_prior_aki, diabeteshosp_prior_aki_denom, diabetes_primary_aki_secondary_first_hosp, diabetes_primary_aki_secondary_first_hosp_denom, csd_primary_aki_secondary_first_hosp, csd_primary_aki_secondary_first_hosp_denom, ihd_primary_aki_secondary_first_hosp, ihd_primary_aki_secondary_first_hosp_denom, pneumonia_primary_aki_secondary_first_hosp, pneumonia_primary_aki_secondary_first_hosp_denom, hf_primary_aki_secondary_first_hosp, hf_primary_aki_secondary_first_hosp_denom, ami_primary_aki_secondary_first_hosp, ami_primary_aki_secondary_first_hosp_denom, cerd_primary_aki_secondary_first_hosp, cerd_primary_aki_secondary_first_hosp_denom, uti_primary_aki_secondary_first_hosp, uti_primary_aki_secondary_first_hosp_denom, zcta, poverty, popdensity, medianhousevalue, pct_blk, medhouseholdincome, pct_owner_occ, hispanic, education, population, pct_asian, pct_native, pct_white, smoke_rate, mean_bmi, pm25.current_year, ozone.current_year, no2.current_year, ozone_summer.current_year, pm25.one_year_lag, ozone.one_year_lag, no2.one_year_lag, ozone_summer.one_year_lag` -* - files - - -``` -``` -└── [ 27G] final.csv -``` -````` - -### IHD medicare hospitalizations (2005) - -`````{dropdown} **ihd_medicare_hosp_2005** -```{list-table} -:header-rows: 0 -* - dataset_name - - IHD medicare hospitalizations (2005) -* - dataset_author - - Cory Zigler -* - date_created - - Oct 4 2018 -* - data_source - - MedPar (admissions) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2005 -* - temporal_resolution - - annually -* - size - - 234 MB -* - rce_location - - `~/shared_space/ci3_analysis/ zigler_lab/projects/ 
BipartiteInterference_GPS/ BipartiteInterference_GPS/ Data` -* - fasse_location - - `ihd_medicare_hosp_2005` -* - files - - -``` -``` -├── [4.8K] 00Tree.html -├── [348K] AnnualFacilityData.Rda -├── [773K] AnnualUnitData.Rda -├── [ 12K] Create Analysis Data.R -├── [6.4K] Create HyADS Adjacency Matrix.R -├── [9.3K] Create Power Plant Data.R -├── [5.8K] Create Zip Code Data.R -├── [ 10M] data_nomed.Rda -├── [ 31K] facilities_for_analysis.Rda -├── [ 53M] HyADSmat.Rda -├── [108M] HyADSmat_replaced20191212.Rda -├── [3.1M] MonthlyFacilityData.Rda -├── [9.7M] MonthlyUnitData.Rda -├── [ 11M] out.zip_pp.rda -├── [ 114] Readme -├── [5.6M] ZipcodeData.Rda -└── [ 89K] zips_included.rda -``` -````` - -### Daily Florida Hospitalization Counts by Zip - -`````{dropdown} **daily-florida-hosp-counts-zip** -```{list-table} -:header-rows: 0 -* - dataset_author - - Ben Sabath, Kate Burrows -* - date_created - - February 07 2020 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - Florida -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2016 -* - temporal_resolution - - daily -* - processing_description - - Denominator file linked with hospitalization data. This is the raw unprocessed data. -* - size - - 2.1 GB -* - rce_location - - `~/shared_space/ci3_health_data /medicare/gen_admission /1999_2016/burrows/cache_data` -* - fasse_location - - `daily-florida-hosp-counts-zip` -* - files - - -``` -``` -├── [308K] Burrows_DataRequest_September2019.pdf -├── [ 19M] death_count -│ ├── [1.0M] death_count_1999.fst -│ ├── [1.0M] ... -│ └── [1.2M] death_count_2016.fst -├── [104M] hosp_count -│ ├── [5.5M] hosp_count_1999.fst -│ ├── [5.6M] ... -│ └── [5.2M] hosp_count_2016.fst -├── [1.6G] merged_data -│ ├── [ 86M] daily_zips_1999.fst -│ ├── [106M] ... -│ └── [106M] daily_zips_2016.fst -└── [7.2M] zip_denom - ├── [382K] zip_denom_1999.fst - ├── [440K] ... 
- └── [450K] zip_denom_2016.fst -``` -````` - -### Coal PM2.5 Source Impacts - -`````{dropdown} **coal_exposure_pm25** -```{list-table} -:header-rows: 0 -* - dataset_author - - Lucas Henneman -* - date_created - - Sep 14, 2022 -* - data_source - - HyADS exposure modeling -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2020 -* - temporal_resolution - - annually -* - rce_location - - `/nfs/home/H/henneman /shared_space/ci3_nsaph/ LucasH/disperseR/ main/output/ zips_model.lm.cv_single_poly` -* - fasse_location - - `coal_exposure_pm25` -* - GitHub repository/directory on how the data was processed - - https://github.com/lhenneman/coal_unit_PM25 -* - exposures - - This was created with the HyADS model using emissions from EPA's CAMD database. -* - meterological - - NOAA/NCAR reanalysis data. -* - size - - 6.3 GB -* - files - - -``` -``` -├── [300M] zips_pm25_byunit_1999.fst -├── [291M] ... -├── [134M] zips_pm25_byunit_2020.fst -├── [599K] zips_pm25_total_1999.fst -├── [599K] ... -└── [599K] zips_pm25_total_2020.fst -``` -````` - -### Aggregated 2000-2016 Medicare Mortality Data with PM2.5 Exposure by ZIP code - -`````{dropdown} **aggregated_2000-2016_medicare_mortality_pm25_zip** -```{list-table} -:header-rows: 0 -* - dataset_author - - Xiao Wu, Ben Sabath -* - date_created - - 2020 -* - data_source - - Medicaid, Exposure Data, Census Data -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - Annually -* - processing_description - - See [Xiao’s paper](https://www.science.org/doi/10.1126/sciadv.aba5692) for processing description. 
-* - rce_location - - `~/shared_space/ci3_mic6949/ input_data/aggregate_data.RDS` -* - fasse_location - - `aggregated_2000-2016_medicare_mortality_pm25_zip` -* - publication - - [Xiao’s paper](https://www.science.org/doi/10.1126/sciadv.aba5692) -* - git_repository - - [National_Causal](https://github.com/wxwx1993/National_Causal) -* - size - - 166 MB -* - files - - -``` -``` -└── [166M] aggregate_data.RDS -``` -````` - - -### Mortality Prediction - -`````{dropdown} **Mortality Prediction** -```{list-table} -:header-rows: 0 -* - dataset_author - - Kaela Nelson -* - date_created - - March, 2021 -* - data_source - - Medicare and Medicaid Beneficiary Summary files, Census, and environmental/exposure data -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2011-2016 -* - temporal_resolution - - Annually -* - fasse_location - - `mortality_prediction` -* - description - - The hospitalization data was processed by converting the primary and secondary diagnoses into binary indicators. The data was then aggregated by zipcode and year, with cause-specific hospitalizations summed and other variables such as age averaged. For the seasonal environmental and exposure data, rolling averages were calculated over a 4-year period to capture long-term trends. Predictors were included for winter, spring, summer, and fall, as well as temperature and humidity. The demographic data obtained from the US Census Bureau was already available by zipcode and year. Subsequently, the census, hospitalization, and environmental/exposure data were merged by zipcode and year. To determine yearly death counts by zipcode, the number of deaths per year per zipcode was summed. In order to predict the death counts for the following year, all death counts were shifted one year forward for each zipcode. 
-* - git_repository - - [NSAPH-Data-Processing/mortality_prediction](https://github.com/NSAPH-Data-Processing/mortality_prediction/tree/main) -* - size - - 103 GB -* - files - - -``` -``` -. -├── [177M] hosp_aggregated_zip_2011_2016.csv -├── [ 73G] medicare_2011_2016.csv -├── [ 69M] medicare_aggregated_zip_2011_2016.csv -├── [175M] medicare_deaths_monthly_2011_2016.csv -├── [4.6G] medicare_hosp_admin_2011_2016.csv -├── [893M] medicare_hosp_merged_zip_2011_2016.csv -├── [151M] merged_df3.csv -├── [1.4G] merged_df3_monthly.csv -├── [1.4G] merged_med_seasonal_2011_2016_v2.csv -├── [330M] monthly_temp_by_zip.csv -├── [195M] seasonal_environmental_data.csv -├── [144M] state_nerged_aggregated_.csv -├── [ 86M] train_test_merged.csv -├── [159M] zip_test_monthly.csv -├── [220M] zip_test_monthly_v2.csv -├── [ 14M] zip_test_yearly.csv -├── [7.3M] zip_test_yearly_higher_pop.csv -├── [6.8M] zip_test_yearly_lower_pop.csv -├── [ 14M] zip_test_yearly_v2.csv -├── [818M] zip_train_monthly.csv -├── [1.1G] zip_train_monthly_v2.csv -├── [ 73M] zip_train_yearly.csv -├── [ 37M] zip_train_yearly_higher_pop.csv -├── [ 34M] zip_train_yearly_lower_pop.csv -└── [ 71M] zip_train_yearly_v2.csv -``` -````` - - - diff --git a/handbook/_toc.yml b/handbook/_toc.yml index 1809bef..aef5665 100644 --- a/handbook/_toc.yml +++ b/handbook/_toc.yml @@ -34,11 +34,8 @@ parts: chapters: - file: red - file: fasse - - file: fasse_partitions - - file: fasse_github - - file: fasse_efficient - title: Efficient Resource Utilization on FASSE - - file: rstudio + - file: labshare_github + - file: fairshare - file: cannon - file: bashrc - file: vscode diff --git a/handbook/analytic.md b/handbook/analytic.md deleted file mode 100644 index f401935..0000000 --- a/handbook/analytic.md +++ /dev/null @@ -1,1585 +0,0 @@ -# Analytic data sets - -## Catalog - -The following data is available at: **`/n/dominici_nsaph_l3/Lab/projects/analytic/`** - -### MedPar (Admissions) - -`````{dropdown} **admissions_by_year** - 
-```{list-table} -:header-rows: 0 - -* - data_source - - MedPar -* - fasse_location - - `admissions_by_year` -* - rce_location - - `~/shared_space/ci3_health_data/medicare/gen_admission/ 1999_2016/targeted_conditions/cache_data/admissions_by_year/` -* - date_created - - Feb 20 2020 -* - size - - 22 GB -* - files - - -``` -``` - ├── admissions_1999.fst - ├── admissions_2000.fst - ├── ... - └── admissions_2016.fst -``` -````{dropdown} header -``` -QID : chr -AGE : int -SEX : int -RACE : int -SSA_STATE_CD : int -SSA_CNTY_CD : int -PROV_NUM : int -ADM_SOURCE : chr -ADM_TYPE : int -ADATE : chr -DDATE : chr -BENE_DOD : chr -DODFLAG : chr -ICU_DAY : int -CCI_DAY : int -ICU : int -CCI : int -DIAG1 : chr -DIAG2 : chr -DIAG3 : chr -DIAG4 : chr -DIAG5 : chr -DIAG6 : chr -DIAG7 : chr -DIAG8 : chr -DIAG9 : chr -DIAG10 : logi -diag11 : logi -diag12 : logi -diag13 : logi -diag14 : logi -diag15 : logi -diag16 : logi -diag17 : logi -diag18 : logi -diag19 : logi -diag20 : logi -diag21 : logi -diag22 : logi -diag23 : logi -diag24 : logi -diag25 : logi -YEAR : int -LOS : int -Parkinson_pdx : int -Parkinson_pdx2dx_10 : int -Parkinson_pdx2dx_25 : int -Alzheimer_pdx : int -Alzheimer_pdx2dx_10 : int -Alzheimer_pdx2dx_25 : int -Dementia_pdx : int -Dementia_pdx2dx_10 : int -Dementia_pdx2dx_25 : int -CHF_pdx : int -CHF_pdx2dx_10 : int -CHF_pdx2dx_25 : int -AMI_pdx : int -AMI_pdx2dx_10 : int -AMI_pdx2dx_25 : int -COPD_pdx : int -COPD_pdx2dx_10 : int -COPD_pdx2dx_25 : int -DM_pdx : int -DM_pdx2dx_10 : int -DM_pdx2dx_25 : int -Stroke_pdx : int -Stroke_pdx2dx_10 : int -Stroke_pdx2dx_25 : int -CVD_pdx : int -CVD_pdx2dx_10 : int -CVD_pdx2dx_25 : int -CSD_pdx : int -CSD_pdx2dx_10 : int -CSD_pdx2dx_25 : int -Ischemic_stroke_pdx : int -Ischemic_stroke_pdx2dx_10: int -Ischemic_stroke_pdx2dx_25: int -Hemo_Stroke_pdx : int -Hemo_Stroke_pdx2dx_10 : int -Hemo_Stroke_pdx2dx_25 : int -zipcode_R : int -Race_gp : chr -Sex_gp : chr -age_gp : chr -Dual : int -``` -```` -````` - -### MBSF (Denominator) 
-`````{dropdown} **denom** - -```{list-table} -:header-rows: 0 - -* - data_source - - MBSF -* - fasse_location - - `denom` -* - size - - 7.4 GB -* - files - - -``` -``` - ├── qid_data_2009.fst - ├── qid_data_2010.fst - ├── ... - ├── qid_data_2016.fst - ├── qid_entry_exit.fst - └── year_zip_confounders.fst -``` -````{dropdown} header (qid_data_yyyy) -``` -qid : chr -year : int -zip : int -sex : int -age : int -dual : chr -dead : logi -hmo_mo: chr -fips : int -race : chr -sexM : num -``` -```` -````{dropdown} header (year_zip_confounders) -``` -zip : num -year : int -mean_bmi : num -smoke_rate : num -hispanic : num -pct_blk : num -medhouseholdincome: num -medianhousevalue : num -poverty : num -education : num -popdensity : num -pct_owner_occ : num -summer_tmmx : num -winter_tmmx : num -summer_rmax : num -winter_rmax : num -city : chr -statecode : chr -latitude : num -longitude : num - -min_year: 2000 -max_year: 2016 -``` -```` -````` - -### Annual Exposure per Medicare Beneficiary -`````{dropdown} **qid_yr_exposures** - -```{list-table} -:header-rows: 0 - -* - rce_location - - `~/shared_space/ci3_analysis/dmork/Data/DLM_ADRD` -* - fasse_location - - `qid_yr_exposures` -* - dataset_author - - Daniel Mork -* - date_created - - April 2022 -* - size - - 139 GB -* - description - - Annual exposure measurements (columns, 2000-2016) for each Medicare beneficiary (rows), tied to their zip code of residence in a given year. Exposures (xxx in file name) include: no2, ozone, pm2.5, pm2.5components, pr (precipitation), rmax (max humidity), tmmx (max temperature), zip (zip code of residence). 
-* - files - - -``` - -``` - ├── qid_yr_no2.fst - ├── qid_yr_ozone.fst - ├── qid_yr_pm25comp_br.fst - ├── qid_yr_pm25comp_ca.fst - ├── qid_yr_pm25comp_cu.fst - ├── qid_yr_pm25comp_ec.fst - ├── qid_yr_pm25comp_fe.fst - ├── qid_yr_pm25comp_k.fst - ├── qid_yr_pm25comp_nh4.fst - ├── qid_yr_pm25comp_ni.fst - ├── qid_yr_pm25comp_no3.fst - ├── qid_yr_pm25comp_oc.fst - ├── qid_yr_pm25comp_pb.fst - ├── qid_yr_pm25comp_si.fst - ├── qid_yr_pm25comp_so4.fst - ├── qid_yr_pm25comp_v.fst - ├── qid_yr_pm25comp_z.fst - ├── qid_yr_pm25.fst - ├── qid_yr_pr.fst - ├── qid_yr_rmax.fst - ├── qid_yr_tmmx.fst - └── qid_yr_zip.fst -``` - -````{dropdown} header (qid_yr_xxx.fst): -``` -qid : chr -2000: num -2001: num -2002: num -2003: num -2004: num -2005: num -2006: num -2007: num -2008: num -2009: num -2010: num -2011: num -2012: num -2013: num -2014: num -2015: num -2016: num -``` -```` -````` - -### MBSF (Enrollment file, denominator) -`````{dropdown} **denom_by_year** - -```{list-table} -:header-rows: 0 - -* - data_source - - MBSF, census (interpolated), BRFSS (interpolated), PM2.5 exposure, seasonal temperature -* - rce_location - - `~/shared_space/ci3_health_data/medicare/mortality/ 1999_2016/wu/cache_data/merged_by_year_v2` -* - fasse_location - - `denom_by_year` -* - git_repository - - [github.com/NSAPH/National-Causal-Analysis](https://github.com/NSAPH/National-Causal-Analysis/tree/master/MergedData) -* - dataset_author - - Ben Sabath, Xiao Wu -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2016 -* - processing_description - - Recommended for use. Available in both `.fst` and `.csv` formats on FASSE. -* - date_created - - Apr 2021 -* - size - - 7.4 GB -* - files - - -``` - -``` - ├── confounder_exposure_merged_nodups_health_1999.fst - ├── ... 
- └── confounder_exposure_merged_nodups_health_2016.fst -``` -````{dropdown} header -``` -zip : int -year : int -qid : chr -dodflag : chr -bene_dod : chr -sex : int -race : int -age : int -hmo_mo : chr -hmoind : chr -statecode : chr -latitude : num -longitude : num -dual : chr -death : int -dead : logi -entry_age : int -entry_year : int -entry_age_break : int -followup_year : num -followup_year_plus_one : num -pm25_ensemble : num -pm25_no_interp : num -pm25_nn : num -ozone : num -ozone_no_interp : num -zcta : int -poverty : num -popdensity : num -medianhousevalue : num -pct_blk : num -medhouseholdincome : num -pct_owner_occ : num -hispanic : num -education : num -population : num -zcta_no_interp : int -poverty_no_interp : num -popdensity_no_interp : num -medianhousevalue_no_interp : num -pct_blk_no_interp : num -medhouseholdincome_no_interp: num -pct_owner_occ_no_interp : num -hispanic_no_interp : num -education_no_interp : num -population_no_interp : int -smoke_rate : num -mean_bmi : num -smoke_rate_no_interp : num -mean_bmi_no_interp : num -amb_visit_pct : num -a1c_exm_pct : num -amb_visit_pct_no_interp : num -a1c_exm_pct_no_interp : num -tmmx : num -rmax : num -pr : num -cluster_cat : chr -fips_no_interp : int -fips : int -summer_tmmx : num -summer_rmax : num -winter_tmmx : num -winter_rmax : num -``` -```` -````` - -### AD/ADRD Hospitalization -`````{dropdown} **hospitalization** - -```{list-table} -:header-rows: 0 - -* - data_source - - MedPar derived -* - rce_location - - `~/shared_space/ci3_analysis/dmork/Data/DLM_ADRD` -* - fasse_location - - `hospitalization` -* - dataset_author - - Daniel Mork -* - description - - The first recorded hospitalization for each individual broken down by primary/secondary/any billing code (ICD). 
-* - size - - 1.2 GB -* - files - - -``` -``` - ├── First_hosp_AD_any.fst - ├── First_hosp_AD_primary.fst - ├── First_hosp_ADRD_any.fst - ├── First_hosp_ADRD_primary.fst - ├── First_hosp_ADRD_secondary.fst - └── First_hosp_AD_secondary.fst -``` -````{dropdown} header -``` -QID : Factor -ADATE: Date -year : num -``` -```` -````` - -### Medicare Entry Age -`````{dropdown} **medicare_entry_age** - -```{list-table} -:header-rows: 0 - -* - data_source - - MBSF derived -* - rce_location - - `/nfs/nsaph_ci3/scratch/jan2021_whanhee_cache/entry_age/` -* - fasse_location - - `medicare_entry_age` -* - size - - 2.3 GB -* - date_created - - Jan 26, 2021 -* - dataset_author - - Ben Sabath, Whanhee Lee -* - spatial_resolution - - zipcode -* - git_repository - - [NSAPH/data_requests](https://github.com/NSAPH/data_requests/blob/master/request_projects/jan2021_whanhee_fisrt_hosps/code/1_create_indivdual_vars.R) -* - files - - -``` -``` - └── medicare_entry_age.csv -``` -````` - -### Years in Medicare -`````{dropdown} **years_in_medicare** -```{list-table} -:header-rows: 0 - -* - data_source - - MBSF derived -* - rce_location - - `/nfs/nsaph_ci3/scratch/jan2021_whanhee_cache/follow_up/` -* - fasse_location - - `years_in_medicare` -* - description - - Number of years a beneficiary has been in Medicare (or in other words, the number of years since one has entered Medicare). Allows for grouping on how long beneficiaries have been in Medicare. -* - size - - 8.8 GB -* - date_created - - Jan 26, 2021 -* - temporal_coverage - - 1999-2016 -* - dataset_author - - Ben Sabath, Whanhee Lee -* - spatial_resolution - - zipcode -* - git_repository - - [NSAPH/data_requests](https://github.com/NSAPH/data_requests/blob/master/request_projects/jan2021_whanhee_fisrt_hosps/code/1_create_indivdual_vars.R) -* - files - - -``` -``` - ├── follow_up_year_2000.fst - ├── ... 
- └── follow_up_year_2016.fst -``` -````` - -### Temperature Humidity Precipitation -`````{dropdown} **temperature_seasonal_zipcode** -```{list-table} -:header-rows: 0 -* - rce_location - - `/nfs/nsaph_ci3/ci3_confounders/data_for_analysis/earth_engine/ temperature/temperature_seasonal_zipcode_combined.csv` -* - fasse_location - - `temperature_seasonal_zipcode` -* - dataset_author - - Xiao Wu, Ben Sabath -* - date_created - - Jul 23, 2020 -* - data_source - - Google Earth Engine provides a single interface for interacting with a number of geospatial data sources. The sources used and links to their documentation are: [GRIDMET](https://developers.google.com/earth-engine/datasets/catalog/IDAHO_EPSCOR_GRIDMET), [NLDAS](https://developers.google.com/earth-engine/datasets/catalog/NASA_NLDAS_FORA0125_H002), [MODIS MOD10A1.006](https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD10A1), [GLDAS](https://developers.google.com/earth-engine/datasets/catalog/NASA_GLDAS_V021_NOAH_G025_T3H), [NOAA CDR PATMOSX](https://developers.google.com/earth-engine/datasets/catalog/NOAA_CDR_PATMOSX_V53), [NOAA NCEP Climate Forecast System V2](https://developers.google.com/earth-engine/datasets/catalog/NOAA_CFSV2_FOR6H) -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2019 -* - temporal_resolution - - annually -* - description - - This dataset contains information on temperature, relative humidity, and total precipitation data. The data is available as raster files on Google earth engine. The temporal and spatial resolutions varied by data source, but all were available at a daily resolution or more frequently. Where the time resolution of the rasters is more than daily, daily averages for each raster were calculated. Next, using Google earth engine's spatial averaging algorithms and a set of polygons representing the areas of interest, the daily value for each polygon was calculated. 
The polygons used were the ones described in the preceding section. The results of this calculation were then downloaded as a CSV file to the RCE. At this point, there is one file for each year. Following this, annual averages are calculated for each location, and these are combined into a single file. The daily values are also combined into a single file. For the `combined_zips` files (which combine the zip code polygon-based measures with the point-based estimates to address zip codes without area) there is an additional step. Values for zip codes not in the polygon-based measure are taken from the point-based measures to address the ~7000 zip codes without area that are missing from the polygon shape file. -* - git_repository - - [NSAPH/data_documentation](https://github.com/NSAPH/data_documentation/blob/master/earth_engine_docs/earth_engine_data.Rmd) -* - meterological - - Temperature (K) - variable name: tmmx (Source: GRIDMET); Relative Humidity - variable name: rmax (Source: GRIDMET) -* - size - - 65 MB -* - header - - `ZIP,year,summer_tmmx,summer_rmax,winter_tmmx,winter_rmax` -* - files - - -``` -``` - └── temperature_seasonal_zipcode_combined.csv -``` -````` - -### Pollution-Census-Temperature covariates -`````{dropdown} **merged_covariates_pm_census_temp** -```{list-table} -:header-rows: 0 -* - data_source - - US Census/ACS, Business Analyst Data Set, BRFSS -* - rce_location - - `/nfs/nsaph_ci3/ci3_health_data/medicare/ mortality/1999_2016/wu/output_data /merged_covariates.csv` -* - fasse_location - - `merged_covariates_pm_census_temp` -* - dataset_author - - Xiao Wu, Ben Sabath -* - date_created - - May 29, 2019 -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - publication - - https://www.science.org/doi/10.1126/sciadv.aba5692 -* - git_repository - - 
[nejm_confounder_summary/nejm_confounder](https://github.com/NSAPH/data_documentation/blob/master/nejm_confounder_summary/nejm_confounders.csv) and [rce_data_list/confounder_data](https://github.com/NSAPH/data_documentation/blob/master/rce_data_list/confounder_data.csv) -* - size - - 296 MB -* - header - - `zip, year, pm25_ensemble, pm25_no_interp, pm25_nn, ozone, ozone_no_interp, zcta, poverty, popdensity, medianhousevalue, pct_blk, medhouseholdincome, pct_owner_occ, hispanic, education, population, zcta_no_interp, poverty_no_interp, popdensity_no_interp, medianhousevalue_no_interp, pct_blk_no_interp, medhouseholdincome_no_interp, pct_owner_occ_no_interp, hispanic_no_interp, education_no_interp, population_no_interp, smoke_rate, mean_bmi, smoke_rate_no_interp, mean_bmi_no_interp, amb_visit_pct, a1c_exm_pct, amb_visit_pct_no_interp, a1c_exm_pct_no_interp, tmmx, rmax, pr, cluster_cat, fips, fips_no_interp` -* - files - - -``` -``` - └── merged_covariates.csv -``` -````` - -### Population-Weighted Daily County-Level Heat Metrics -`````{dropdown} **county_heat_metrics** -```{list-table} -:header-rows: 0 -* - data_source - - ERA5-Land gridded data -* - fasse_location - - `heatvars_county_2000-2020` -* - dataset_author - - Keith Spangler -* - date_created - - June 17, 2022 -* - spatial_coverage - - contiguous US -* - spatial_resolution - - county -* - temporal_coverage - - 2000-2020 -* - temporal_resolution - - daily -* - publication - - https://pubmed.ncbi.nlm.nih.gov/35715416/ -* - size - - 1.03 GB -* - header - - `"StCoFIPS", "Date", "Tmin_C", "Tmax_C", "Tmean_C", "TDmin_C", "TDmax_C", "TDmean_C", "NETmin_C", "NETmax_C", "NETmean_C", "HImin_C", "HImax_C", "HImean_C", "HXmin_C", "HXmax_C", "HXmean_C", "WBGTmin_C", "WBGTmax_C", "WBGTmean_C", "UTCImin_C", "UTCImax_C", "UTCImean_C", "Flag_T", "Flag_TD", "Flag_NET", "Flag_HI", "Flag_HX", "Flag_WBGT", "Flag_UTCI"` -* - files - - -``` -``` - └── Heatvars_County_2000-2020_v1.2.Rds -``` -````` - -### Medicaid - Respiratory 
Hospitalizations in Children -`````{dropdown} **medicaid_children_99-12** -```{list-table} -:header-rows: 0 -* - data_source - - Medicaid -* - rce_location - - `/nfs/nsaph_ci3/ci3_health_data/medicaid/respiratory /1999_2012/youth_resp_hosps_jlee/data` -* - fasse_location - - `medicaid_children_99-12` -* - dataset_author - - Jenny Lee -* - date_created - - 2021 -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2012 -* - temporal_resolution - - annually -* - description - - The data prepared for this project consists of the Medicaid Fee For Service population, with unrestricted Medicaid benefits, under the age of 20 from 1999-2012. This data also includes all hospitalizations for that population, with indicators included regarding whether or not they were associated with a set of respiratory hospitalizations. See the schema for the hospitalization data below for details on specific indicators. -* - git_repository - - [NSAPH/data_requests](https://github.com/NSAPH/data_requests/tree/master/request_projects/feb2021_jenny_medicaid_resp) -* - exposures - - Xiao Wu's CausalGPS PM2.5 data -* - size - - 14 GB -* - files - - -``` -``` -├── denom -│ ├── denom_under_20_1999.fst -│ ├── ... -│ └── denom_under_20_2012.fst -└── hosps - ├── under_20_admissions_1999.fst - ├── ... 
- └── under_20_admissions_2012.fst -``` -````` - -### Exposure-census-BRFSS confounders -`````{dropdown} **confounders** -```{list-table} -:header-rows: 0 -* - data_source - - US Census, BRFSS -* - rce_location - - `/nfs/nsaph_ci3/scratch/jan2021_whanhee_cache/cache_dir/ merged_exposure_confounders/` -* - fasse_location - - `confounders` -* - dataset_author - - Ben Sabath, Whanhee Lee -* - date_created - - Apr 23, 2021 -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode, zcta -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - git_repository - - [data_requests](https://github.com/NSAPH/data_requests/blob/master/request_projects/jan2021_whanhee_fisrt_hosps/code/6_join_exposure_to_confounders.R) -* - size - - 247 MB -* - header - - `ZIP, year, zcta, poverty, popdensity, medianhousevalue, pct_blk, medhouseholdincome, pct_owner_occ, hispanic, education, population, pct_asian, pct_native, pct_white, smoke_rate, mean_bmi, pm25.current_year, ozone.current_year, no2.current_year, ozone_summer.current_year, pm25.one_year_lag, ozone.one_year_lag, no2.one_year_lag, ozone_summer.one_year_lag` -* - files - - -``` -``` - ├── merged_confounders_2000.csv - ├── ... 
- └── merged_confounders_2016.csv -``` -````` - -### ADRD Hospitalization Records -`````{dropdown} **adrd_hospitalization** -```{list-table} -:header-rows: 0 -* - dataset_author - - Shuxin Dong -* - date_created - - Jan 27, 2022 -* - data_source - - MedPar (admissions) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode (unaggregated) -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - daily (with admission date) -* - description - - extract the ADRD hospitalizations based on the Chronic Condition Warehouse -* - rce_location - - `~/shared_space/ci3_analysis/ data_ADRDhospitalization/ ADRDhospitalization_CCWlist/` -* - fasse_location - - `adrd_hospitalization` -* - size - - 1.9 GB -* - git_repository - - https://github.com/ShuxinD/ADRDdata -* - other - - The Chronic Condition Warehouse list for ADRD: https://www2.ccwdata.org/web/guest/condition-categories -* - files - - -``` -``` - ├── ADRD_2000.fst - ├── ... - └── ADRD_2016.fst -``` -````{dropdown} header -``` -QID : chr -ADATE : Date -DDATE : Date -zipcode_R : int -DIAG1 : chr -DIAG2 : chr -DIAG3 : chr -DIAG4 : chr -DIAG5 : chr -DIAG6 : chr -DIAG7 : chr -DIAG8 : chr -DIAG9 : chr -DIAG10 : chr -AGE : int -Sex_gp : chr -Race_gp : chr -SSA_STATE_CD : int -SSA_CNTY_CD : int -PROV_NUM : int -ADM_SOURCE : chr -ADM_TYPE : int -Dual : int -year : num -AD_primary : logi -AD_any : logi -AD_secondary : logi -ADRD_primary : logi -ADRD_any : logi -ADRD_secondary: logi -``` -```` -````` - -### Medpar File 2000-2016 Clean -`````{dropdown} **medpar_hospital_clean_0619** -```{list-table} -:header-rows: 0 -* - dataset_author - - Mahdieh Danesh Yazdi -* - date_created - - May 2019 -* - data_source - - MedPar (admissions) -* - spatial_coverage - - US -* - size - - 1.8 GB -* - spatial_resolution - - zipcode, city -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - admissions date -* - processing_description - - The data was limited to the years 2000-2016 (1999 was dropped). 
Demographic data was removed (use demographic data from denominator file). Duplicated admission records were removed. For multiple admissions on the same day, the longer length of stay was kept and those without missing diagnostic codes. Subset data to keep only first two diagnostic codes. A diabetes variable was created (would review ICD codes used clinically prior to use). -* - rce_location - - `~/shared_space/ci3_mdaneshyazdi/Medpar_Data/ data/medpar_hospital_clean_0619.rds` -* - fasse_location - - `medpar_hospital_clean_0619` -* - files - - -``` -``` - └── medpar_hospital_clean_0619.rds -``` -````` - -### Denominator File 2000-2016 Clean -`````{dropdown} **denominator_clean_0619** -```{list-table} -:header-rows: 0 -* - dataset_author - - Mahdieh Danesh Yazdi -* - date_created - - May 2019 -* - data_source - - MBSF (denominator) -* - spatial_coverage - - US -* - size - - 3.8 GB -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - processing_description - - The data was limited to the years 2000-2016 (1999 was dropped). Rows with empty or missing QID values were dropped. Those whose sex changed through follow up were dropped. Those whose race changed through follow up were assigned "Other/Unknown" category. Those who had multiple dates of death in different years were dropped. For those with multiple dates of death in the same year, earlier date of death was assigned. If duplicate rows existed, one with date of death and one without, the row with non-missing date of death was kept. Multiple QID-year rows with differing values of other variables were removed. Observations with invalid zip codes were removed. Warning: There may be excess deaths on the last day of the month due to CMS processing. Sometimes when the exact date of death is unknown, it is assigned to the last day of the month. 
-* - rce_location - - `~/shared_space/ci3_mdaneshyazdi/Denominator_Data/ data/denominator_clean_0619.rds` -* - fasse_location - - `denominator_clean_0619` -* - files - - -``` -``` - └── denominator_clean_0619.rds -``` -````` - -### Denominator Clean Merged with Exposure and Covariate Data -`````{dropdown} **merged_denominator_clean_0619_exp_conf** -```{list-table} -:header-rows: 0 -* - dataset_author - - Mahdieh Danesh Yazdi -* - date_created - - February 2020 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - size - - 30 GB (`fst`) and 4.4 GB (`rds`) -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - The clean denominator file merged with annual PM2.5, NO2, O3 levels from 1-km exposure models generated by Qian Di and Weeberb Requia aggregated to zip code level by Yaguang Wei. Also merged with covariate data from the Census, ACS, BRFSS, and Dartmouth Health Atlas created by Ben Sabath. Missing values were filled in using interpolated/extrapolated values from Liuhua Shi. (Negative values were set to 0 and values greater than 100% were set to 100%). Other missing values were dropped. The exposure values and covariate data may need to be updated depending on the study being done. 
-* - rce_location - - `~/shared_space/ci3_mdaneshyazdi/ Merged_Data/data/denominator.rds` and `~/shared_space/ci3_mdaneshyazdi/ Merged Data/denominator.fst` -* - fasse_location - - `merged_denominator_clean_0619_exp_conf` -* - files - - -``` -``` - ├── denominator.rds - └── denominator.fst -``` -````` - -### Hospital Admissions Merged with Denominator, Exposure, and Covariates -`````{dropdown} **national_exp_0621** -```{list-table} -:header-rows: 0 -* - dataset_author - - Mahdieh Danesh Yazdi -* - date_created - - Jun 2021 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - The clean denominator file merged with the clean hospital admissions data, limited to FFS patients, and then merged with annual PM2.5, NO2, O3 levels and warm-season O3 levels from 1-km exposure models generated by Qian Di and Weeberb Requia aggregated to zip code level by Yaguang Wei. Also merged with covariate data from the Census, ACS, BRFSS, and Dartmouth Health Atlas created by Ben Sabath. Missing values were filled in using interpolated/extrapolated values from Liuhua Shi. (Negative values were set to 0 and values greater than 100% were set to 100%). Other missing values were dropped. The exposure values and covariate data may need to be updated depending on the study being done. Individuals may have multiple admissions per year. 
-* - size - - 32 GB -* - rce_location - - `~/shared_space/ci3_mdaneshyazdi/ Merged_Data/data/national_exp_0621.fst` -* - fasse_location - - `national_exp_0621` -* - files - - -``` -``` - └── national_exp_0621.fst -``` -````` - - -### Aggregated 2010-2016 Medicare Mortality Data with PM2.5 Exposure and ZIP code level variables -`````{dropdown} **aggregate_medicare_data_2010to2016** -```{list-table} -:header-rows: 0 -* - description - - aggregate_medicare_data_2010to2016.fst only contains data for year 2011, pm2.5 level in 2010 and 2011 and the mortality in the following 5 years. That is, the dataset contains enrollees of year 2011 and information of 2010 exposures and the outcome is `dead in the following 5 years`. The data is aggregated at the ZIP code level. -* - dataset_author - - Falco J. Bargagli-Stoffi, Riccardo Cadei -* - date_created - - 2020 -* - data_source - - Medicaid, Exposure Data, Census Data -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2011 -* - temporal_resolution - - Annually -* - publication - - Causal Rule Ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects https://arxiv.org/abs/2009.09036 -* - rce_location - - `~shared_space/ci3_analysis/causal_rule_ensemble /aggregate_medicare_data_2010to2016.fst` -* - fasse_location - - `aggregate_medicare_data_2010to2016` -* - files - - -``` -``` - └── aggregate_medicare_data_2010to2016.fst -``` -````` - -### Nationwide Medicare Strata - -`````{dropdown} **erc_strata** -```{list-table} -:header-rows: 0 -* - dataset_author - - Kevin Josey -* - date_created - - Aug 5 2022 -* - data_source - - Medicare File from Xiao et al.'s Science Advances paper (see `denom_by_year`) -* - spatial_coverage - - contiguous US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annual -* - description - - Data were divided and aggregated into custom strata, then subsetted depending on several individual 
factors. I further merged these data tables with neighborhood level covariates. -* - rce_location - - `~/shared_space/ci3_analysis/josey_erc_strata/Data` -* - fasse_location - - `erc_strata` -* - git_repository - - https://github.com/kevjosey/erc-strata -* - size - - 6.9 GB -* - files - - -``` -``` -├── aggregate_data_qd.RData -├── aggregate_data_rm.RData -├── national_merged2016_qd.RData -├── national_merged2016_rm.RData -├── qd -│ ├── 0_all_qd.RData -│ ├── 0_asian_qd.RData -│ ├── 0_black_qd.RData -│ ├── 0_hispanic_qd.RData -│ ├── 0_white_qd.RData -│ ├── 1_all_qd.RData -│ ├── 1_asian_qd.RData -│ ├── 1_black_qd.RData -│ ├── 1_hispanic_qd.RData -│ ├── 1_white_qd.RData -│ ├── 2_all_qd.RData -│ ├── 2_asian_qd.RData -│ ├── 2_black_qd.RData -│ ├── 2_hispanic_qd.RData -│ └── 2_white_qd.RData -└── rm - ├── 0_all_rm.RData - ├── 0_asian_rm.RData - ├── 0_black_rm.RData - ├── 0_hispanic_rm.RData - ├── 0_white_rm.RData - ├── 1_all_rm.RData - ├── 1_asian_rm.RData - ├── 1_black_rm.RData - ├── 1_hispanic_rm.RData - ├── 1_white_rm.RData - ├── 2_all_rm.RData - ├── 2_asian_rm.RData - ├── 2_black_rm.RData - ├── 2_hispanic_rm.RData - └── 2_white_rm.RData -``` -````` - -### CVD Medicaid - -`````{dropdown} **cvd_medicaid** -```{list-table} -:header-rows: 0 -* - dataset_author - - Ben Sabath -* - date_created - - January 28, 2020 -* - data_source - - Medicaid -* - spatial_coverage - - US (continental) -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2002-2012 -* - temporal_resolution - - daily -* - size - - 86 GB -* - git_repository - - [dec2019_medicaid_platform_cvd](https://github.com/NSAPH/data_requests/tree/master/request_projects/dec2019_medicaid_platform_cvd) -* - rce_location - - `~/shared_space/ci3_health_data /medicaid/cvd/2010_2011/desouza-2` -* - fasse_location - - `cvd_medicaid` -* - publication - - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7896354/ -* - files - - -``` -``` -├── [2.3G] cvd.csv -├── [2.1G] cvd.sas7bdat -├── [6.3K] CVD-specific data 
dictionary-07-12-2018.docx -├── [6.8K] data_dictionary.md -├── [ 70G] merged_cvd_data.csv -├── [ 18K] merge.out -│ ├── [5.7K] log.txt -│ ├── [ 77] r_error.0 -│ └── [ 12K] r_out.0 -├── [1.7K] merge.R -├── [ 906] readme -└── [ 899] r.submit -``` -````` - - -### Aggregated CVD cohort Medicare - -`````{dropdown} **aggregated_cvd_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - April 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics -* - rce_location - - `~/shared_space/ci3_health_data/medicare/gen_admission /1999_2016/Klompmaker/merged_data/cvd2/` -* - fasse_location - - `aggregated_cvd_cohort_medicare` -* - size - - 38 GB -* - files - - -``` -``` -├── [4.2G] aggregate_CVD_65yrs.fst -├── [3.8G] aggregate_CVD_75yrs.fst -├── [3.1G] aggregate_CVD_85yrs.fst -├── [6.5G] aggregate_CVD.fst -├── [4.8G] aggregate_death_CVD.fst -├── [4.6G] aggregate__excl_1yrhosp_CVD.fst -├── [4.3G] aggregate_excl_1yrhosp_RES.fst -├── [1.2M] cc_zipyear_all.fst -├── [1.2M] cc_zipyear_confounder.fst -├── [941K] cc_zipyear_cvd.fst -├── [347M] CVD_count.fst -├── [354M] CVD_death_count.fst -├── [439M] time_count.fst -└── [439M] time_death_count.fst -``` -````` - -### Aggregated CHD cohort Medicare - -`````{dropdown} **aggregated_chd_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - April 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - 
annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics -* - rce_location - - `~/shared_space/ci3_health_data/medicare /gen_admission /1999_2016/Klompmaker/merged_data/chd2/` -* - fasse_location - - `aggregated_chd_cohort_medicare` -* - size - - 35 GB -* - files - - -``` -``` -├── [4.1G] aggregate_CHD_65yrs.fst -├── [3.8G] aggregate_CHD_75yrs.fst -├── [3.2G] aggregate_CHD_85yrs.fst -├── [ 14G] aggregate_CHD.fst -├── [4.3G] aggregate_excl_1yrhosp_CHD.fst -├── [1.2M] cc_zipyear_chd.fst -├── [ 92M] CHD_count.fst -└── [116M] time_count.fst -``` -````` - -### Aggregated CBV cohort Medicare - -`````{dropdown} **aggregated_cbv_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - April 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). 
Person records were aggregated by zip code, year and individual demographics -* - rce_location - - `~/shared_space/ci3_health_data/medicare/gen_admission /1999_2016/Klompmaker/merged_data/cbv2/` -* - fasse_location - - `aggregated_cbv_cohort_medicare` -* - size - - 35 GB -* - files - - -``` -``` -├── [4.1G] aggregate_CBV_65yrs.fst -├── [3.8G] aggregate_CBV_75yrs.fst -├── [3.2G] aggregate_CBV_85yrs.fst -├── [ 14G] aggregate_CBV.fst -├── [4.4G] aggregate__excl_1yrhosp_CBV.fst -├── [ 93M] CBV_count.fst -├── [1.2M] cc_zipyear_cbv.fst -└── [117M] time_count.fst -``` -````` - - -### Aggregated ADRD cohort Medicare - -`````{dropdown} **aggregated_adrd_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - February 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NDVI, blue space, park cover, NO2, PM2.5, ozone, temperature, humidity). 
Person records were aggregated by zip code, year and individual demographics -* - rce_location - - `~/shared_space/ci3_health_data/medicare/gen_admission /1999_2016/Klompmaker/merged_data/alz2/` -* - fasse_location - - `aggregated_adrd_cohort_medicare` -* - size - - 28 GB -* - files - - -``` -``` -├── [3.6G] aggregate_ALZ_65yrs.fst -├── [3.4G] aggregate_ALZ_75yrs.fst -├── [2.8G] aggregate_ALZ_85yrs.fst -├── [4.5G] aggregate_ALZ.fst -├── [4.4G] aggregate_death_ALZ.fst -├── [3.8G] aggregate_excl_1yrhosp_ALZ.fst -├── [358M] ALZ_count.fst -├── [387M] ALZ_death_count.fst -├── [471M] time_count.fst -└── [472M] time_death_count.fst -``` -````` - -### Aggregated PD cohort Medicare - - -`````{dropdown} **aggregated_pd_cohort_medicare** -```{list-table} -:header-rows: 0 -* - dataset_author - - Jochem Klompmaker -* - date_created - - February 2022 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - Denominator file linked with hospitalization data and merged with confounders and exposures (NDVI, blue space, park cover, NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics. 
-* - rce_location - - `~/shared_space/ci3_health_data/medicare /gen_admission/1999_2016 /Klompmaker/merged_data/par2/` -* - fasse_location - - `aggregated_pd_cohort_medicare` -* - size - - 29 GB -* - files - - -``` -``` -├── [4.4G] aggregate_death_PAR.fst -├── [4.4G] aggregate_excl_1yrhosp_PAR.fst -├── [3.7G] aggregate_PAR_65yrs.fst -├── [3.4G] aggregate_PAR_75yrs.fst -├── [2.9G] aggregate_PAR_85yrs.fst -├── [4.6G] aggregate_PAR.fst -├── [ 94M] PAR_count.fst -├── [405M] PAR_death_count.fst -├── [222M] time_count.fst -└── [486M] time_death_count.fst -``` -````` - -### Daily County Level Heatwave Associated Hospitalizations - -`````{dropdown} **daily_county_level_heatwave_assosciated_hospitalizations** -```{list-table} -:header-rows: 0 -* - dataset_author - - Ben Sabath -* - date_created - - July 10, 2020 -* - size - - 7.7 GB -* - data_source - - MedPar (admissions), MBSF (denominator), Medicaid MAX -* - spatial_coverage - - US -* - spatial_resolution - - county -* - temporal_coverage - - 2006-2016, 1999-2016 -* - temporal_resolution - - daily -* - description - - FIPS code, race, sex, age, and dual eligibility were determined for each case based on the information in the patient summary file for that individual in the year of their admission. The denominator for each observation is calculated monthly and contains all individuals who are eligible for Fee for Service (FFS) hospitalization coverage and have not died prior to that month. The CCS codes included were 2, 50, 55, 114, 157, 159, and 244. ICD processing done using the ICD package(Wasey 2018). The author of this package asks that it be cited in papers using data that was created using the package. 
-* - rce_location - - `~/shared_space/ci3_health_data/medicare/heat_related` -* - fasse_location - - `daily_county_level_heatwave_assosciated_hospitalizations` -* - publication - - https://arxiv.org/abs/2102.10478 -* - git_repository - - [https://github.com/wxwx1993/TS_Stochastic](https://github.com/wxwx1993/TS_Stochastic) -* - files - - -``` -``` -├── 1999_2016 -│ └── county_ccs_hosps -│ ├── cache_dir -│ │ ├── daily_counts -│ │ │ ├── daily_counts_by_ccs_1999.fst -│ │ │ ├── ... -│ │ │ └── daily_counts_by_ccs_2016.fst -│ │ └── denom -│ │ ├── ffs_patient_summary_by_county_1999.fst -│ │ ├── ... -│ │ └── ffs_patient_summary_by_county_2016.fst -│ ├── data -│ │ ├── daily_ccs_heatwave_counts_by_fips_1999.fst -│ │ ├── ... -│ │ └── daily_ccs_heatwave_counts_by_fips_2016.fst -│ └── data_daily_hosp_mort -│ ├── daily_only_ccs_heatwave_hosp_mort_counts_by_fips_1999.fst -│ ├── ... -│ └── daily_only_ccs_heatwave_hosp_mort_counts_by_fips_2016.fst -└── 2006_2016 - └── county_ccs_hosps - ├── cache_dir - │ ├── daily_counts - │ │ ├── daily_counts_by_ccs_2006.fst - │ │ ├── ... - │ │ └── daily_counts_by_ccs_2016.fst - │ └── denom - │ ├── ffs_patient_summary_by_county_2006.fst - │ ├── ... - │ └── ffs_patient_summary_by_county_2016.fst - ├── data - │ ├── daily_ccs_heatwave_counts_by_fips_2006.fst - │ ├── ... 
- │ ├── daily_ccs_heatwave_counts_by_fips_2016.fst - │ ├── Daily_Heat_CCS_2006-2016_with_Temperature_by_WFO.Rda - │ ├── Daily_Heat_CCS_2006-2016_with_Temperature_by_WFO_v0.Rda - │ ├── Daily_Heat_CCS_2006-2016_with_Temperature_ERA5Land.Rda - │ ├── Daily_Heat_CCS_2006-2016_with_Temperature.Rda - │ └── Daily_Heat_CCS_2006-2016_with_Temperature_v0.Rda - ├── readme.md - └── schema.yml -``` -````` - - -### Hospitalizations for kidney disease and comorbidities - -`````{dropdown} **medicare_for_kidney_diseases** -```{list-table} -:header-rows: 0 -* - dataset_author - - Ana Trisovic -* - date_created - - July 10, 2022 -* - data_source - - MedPar (admissions), MBSF (denominator), confounders -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - git_repository - - [mbsf-medpar-aki-first-hospitalization](https://github.com/NSAPH-Data-Processing/mbsf-medpar-aki-first-hospitalization) -* - description - - Special modifications for the kidney diseases for numerators and denominators (people at risk) for the analysis by Whanhee Lee. 
-* - rce_location - - `~/shared_space/ci3_analysis/whanhee_revisions` -* - fasse_location - - `medicare_for_kidney_diseases` -* - size - - 31 GB -* - header - - `year, sex, race, zip, dual, follow_up, entry_age_group, aki_primary_secondary_first_hosp, aki_primary_secondary_first_hosp_denom, ckdhosp_prior_aki, ckdhosp_prior_aki_denom, diabeteshosp_prior_aki, diabeteshosp_prior_aki_denom, diabetes_primary_aki_secondary_first_hosp, diabetes_primary_aki_secondary_first_hosp_denom, csd_primary_aki_secondary_first_hosp, csd_primary_aki_secondary_first_hosp_denom, ihd_primary_aki_secondary_first_hosp, ihd_primary_aki_secondary_first_hosp_denom, pneumonia_primary_aki_secondary_first_hosp, pneumonia_primary_aki_secondary_first_hosp_denom, hf_primary_aki_secondary_first_hosp, hf_primary_aki_secondary_first_hosp_denom, ami_primary_aki_secondary_first_hosp, ami_primary_aki_secondary_first_hosp_denom, cerd_primary_aki_secondary_first_hosp, cerd_primary_aki_secondary_first_hosp_denom, uti_primary_aki_secondary_first_hosp, uti_primary_aki_secondary_first_hosp_denom, zcta, poverty, popdensity, medianhousevalue, pct_blk, medhouseholdincome, pct_owner_occ, hispanic, education, population, pct_asian, pct_native, pct_white, smoke_rate, mean_bmi, pm25.current_year, ozone.current_year, no2.current_year, ozone_summer.current_year, pm25.one_year_lag, ozone.one_year_lag, no2.one_year_lag, ozone_summer.one_year_lag` -* - files - - -``` -``` -└── [ 27G] final.csv -``` -````` - -### IHD medicare hospitalizations (2005) - -`````{dropdown} **ihd_medicare_hosp_2005** -```{list-table} -:header-rows: 0 -* - dataset_name - - IHD medicare hospitalizations (2005) -* - dataset_author - - Cory Zigler -* - date_created - - Oct 4 2018 -* - data_source - - MedPar (admissions) -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2005 -* - temporal_resolution - - annually -* - size - - 234 MB -* - rce_location - - `~/shared_space/ci3_analysis/ zigler_lab/projects/ 
BipartiteInterference_GPS/ BipartiteInterference_GPS/ Data` -* - fasse_location - - `ihd_medicare_hosp_2005` -* - files - - -``` -``` -├── [4.8K] 00Tree.html -├── [348K] AnnualFacilityData.Rda -├── [773K] AnnualUnitData.Rda -├── [ 12K] Create Analysis Data.R -├── [6.4K] Create HyADS Adjacency Matrix.R -├── [9.3K] Create Power Plant Data.R -├── [5.8K] Create Zip Code Data.R -├── [ 10M] data_nomed.Rda -├── [ 31K] facilities_for_analysis.Rda -├── [ 53M] HyADSmat.Rda -├── [108M] HyADSmat_replaced20191212.Rda -├── [3.1M] MonthlyFacilityData.Rda -├── [9.7M] MonthlyUnitData.Rda -├── [ 11M] out.zip_pp.rda -├── [ 114] Readme -├── [5.6M] ZipcodeData.Rda -└── [ 89K] zips_included.rda -``` -````` - -### Daily Florida Hospitalization Counts by Zip - -`````{dropdown} **daily-florida-hosp-counts-zip** -```{list-table} -:header-rows: 0 -* - dataset_author - - Ben Sabath, Kate Burrows -* - date_created - - February 07 2020 -* - data_source - - MedPar (admissions), MBSF (denominator) -* - spatial_coverage - - Florida -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2016 -* - temporal_resolution - - daily -* - processing_description - - Denominator file linked with hospitalization data. This is the raw unprocessed data. -* - size - - 2.1 GB -* - rce_location - - `~/shared_space/ci3_health_data /medicare/gen_admission /1999_2016/burrows/cache_data` -* - fasse_location - - `daily-florida-hosp-counts-zip` -* - files - - -``` -``` -├── [308K] Burrows_DataRequest_September2019.pdf -├── [ 19M] death_count -│ ├── [1.0M] death_count_1999.fst -│ ├── [1.0M] ... -│ └── [1.2M] death_count_2016.fst -├── [104M] hosp_count -│ ├── [5.5M] hosp_count_1999.fst -│ ├── [5.6M] ... -│ └── [5.2M] hosp_count_2016.fst -├── [1.6G] merged_data -│ ├── [ 86M] daily_zips_1999.fst -│ ├── [106M] ... -│ └── [106M] daily_zips_2016.fst -└── [7.2M] zip_denom - ├── [382K] zip_denom_1999.fst - ├── [440K] ... 
- └── [450K] zip_denom_2016.fst -``` -````` - -### Coal PM2.5 Source Impacts - -`````{dropdown} **coal_exposure_pm25** -```{list-table} -:header-rows: 0 -* - dataset_author - - Lucas Henneman -* - date_created - - Sep 14, 2022 -* - data_source - - HyADS exposure modeling -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 1999-2020 -* - temporal_resolution - - annually -* - rce_location - - `/nfs/home/H/henneman /shared_space/ci3_nsaph/ LucasH/disperseR/ main/output/ zips_model.lm.cv_single_poly` -* - fasse_location - - `coal_exposure_pm25` -* - GitHub repository/directory on how the data was processed - - https://github.com/lhenneman/coal_unit_PM25 -* - exposures - - This was created with the HyADS model using emissions from EPA's CAMD database. -* - meterological - - NOAA/NCAR reanalysis data. -* - size - - 6.3 GB -* - files - - -``` -``` -├── [300M] zips_pm25_byunit_1999.fst -├── [291M] ... -├── [134M] zips_pm25_byunit_2020.fst -├── [599K] zips_pm25_total_1999.fst -├── [599K] ... -└── [599K] zips_pm25_total_2020.fst -``` -````` - -### Aggregated 2000-2016 Medicare Mortality Data with PM2.5 Exposure by ZIP code - -`````{dropdown} **aggregated_2000-2016_medicare_mortality_pm25_zip** -```{list-table} -:header-rows: 0 -* - dataset_author - - Xiao Wu, Ben Sabath -* - date_created - - 2020 -* - data_source - - Medicaid, Exposure Data, Census Data -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - Annually -* - processing_description - - See [Xiao’s paper](https://www.science.org/doi/10.1126/sciadv.aba5692) for processing description. 
-* - rce_location - - `~/shared_space/ci3_mic6949/ input_data/aggregate_data.RDS` -* - fasse_location - - `aggregated_2000-2016_medicare_mortality_pm25_zip` -* - publication - - [Xiao’s paper](https://www.science.org/doi/10.1126/sciadv.aba5692) -* - git_repository - - [National_Causal](https://github.com/wxwx1993/National_Causal) -* - size - - 166 MB -* - files - - -``` -``` -└── [166M] aggregate_data.RDS -``` -````` - - -````{warning} -Storage space on FASSE is limited, so do not copy analytic data to your own folder! Create symlinks to the data in your `data` folder. -Symbolic links (or symlinks) are special files that point to files or directories in other locations on your system. -You will be able to use data with symlinks as normal. - -Create the symlink in your `data` folder in the following way: -``` -cd data -ln -s /n/dominici_nsaph_l3/Lab/projects/analytic/fasse_location . -``` -```` - -```{note} -Do you need data that is not here but exists on RCE? -If so, fill in the form [here](https://gist.github.com/atrisovic/93d379dd84e31f0d63b965de8d529777) to get it transferred to FASSE. -``` - -## Data questions - -1. What data sources (MedPar, MBSF, other) were used to create this data file? How many different data sources went into it? -2. What, if any, processing was done to the data sources? Were there any selections (cuts), data quality checks, or aggregations? -3. Was this data used in any publication (add a link)? -4. Is there any git repository (or subfolder) related to it (add git location)? -5. What is the RCE source location? -6. When was the data created and by whom? -7. What are the spatial and temporal resolutions? - diff --git a/handbook/bashrc.md b/handbook/bashrc.md index 88441b6..2532195 100644 --- a/handbook/bashrc.md +++ b/handbook/bashrc.md @@ -1,9 +1,9 @@ # .bashrc -The `.bashrc` file is a script executed whenever a new terminal session starts in an interactive, non-login shell. 
It's used to configure the environment, define aliases, set environment variables, and customize the command prompt. This is important, especially when working on data analysis to ensure reproducibility. Even more so when working in CANNON and FASSE, as certain configurations must be made to allow you to use common tools like Python, Git, and VS Code. +The `.bashrc` file is a script executed whenever a new terminal session starts in an interactive, non-login shell. It's used to configure the environment, define aliases, set environment variables, and customize the command prompt. This is important, especially when working on data analysis to ensure reproducibility. Even more so when working in CANNON, as certain configurations must be made to allow you to use common tools like Python, Git, and VS Code. -## Setting up a `.bashrc` file in CANNON/FASSE: -1. Log into CANNON/FASSE and open Terminal +## Setting up a `.bashrc` file in CANNON: +1. Log into CANNON and open Terminal 2. Navigate to your home directory. There should already be a `.bashrc` file in this directory. 3. Open and edit it with at least the following configurations to allow you to use Python, Git, and VS Code. diff --git a/handbook/cannon.md b/handbook/cannon.md index 9010dff..44b9277 100644 --- a/handbook/cannon.md +++ b/handbook/cannon.md @@ -7,7 +7,7 @@ The following are instructions for logging in to CANNON and setting up your own 1. Get a FASRC account by requesting it [here](https://docs.rc.fas.harvard.edu/kb/get-a-fasse-account-and-project-group/). 2. Navigate to the [Add Grants page](https://portal.rc.fas.harvard.edu/request/grants/add) in the portal; you will need to log in with your FASRC account 3. Expand the plus sign next to “Other” -4. 
Find the project group you want to be added to; it could be `dominici_lab`, `dominici_nsaph`, or even `access to dominici_nsaph- Protected data: dat20-0613,dat21-0471,dua19-1403 (Approvers: Danielle Braun)`. Let us know if you don't see any entry that includes `dominici` or `Braun`! 5. Select the checkbox for the project group you want to be added to Your PI will have to approve the addition. Once you’re notified of the approval, it can take up to an hour for your permissions to be configured. If you’re not able to access the VPN or your home directory, try waiting an hour and logging in again. @@ -35,7 +35,7 @@ align: center ``` ```{warning} -Cannon is the Faculty of Arts and Sciences research computing cluster for users with Data Security Level 2 data. If you have Data Security Level 3 data, you must use the FAS Secure Environment (FASSE) cluster. +Cannon is the Faculty of Arts and Sciences research computing cluster for users with Data Security Level 2 data. ``` ## Step 2. Access CANNON diff --git a/handbook/data.md b/handbook/data.md deleted file mode 100644 index 547f459..0000000 --- a/handbook/data.md +++ /dev/null @@ -1,522 +0,0 @@ -# Data sources - -On this page you can find a description of common data sources and their location on the cluster. - -## Health data - -The following contains the description of the original/raw CMS data. - -`````{dropdown} MBSF and MedPar - -```{list-table} -:header-rows: 0 - -* - data_source - - [MBSF](https://resdac.org/cms-data/files/mbsf-base) and [MedPar](https://resdac.org/cms-data/files/medpar) -* - description - - MedPar includes hospitalizations for FFS individuals (1999-2018). MBSF is the enrollment file and also includes mortality for everyone (1999-2018). 
-* - fasse_location - - Append `/n/dominici_nsaph_l3/Lab/data/` to the beginning of the paths: `ci3_d_medicare/original_data/cms_medicare/data` -* - size - - 733 GB -* - files - - -``` -````{dropdown} Medicare data folder tree -``` -├── 1999 -│ ├── denominator -│ └── inpatient -├── 2000 -│ ├── denominator -│ └── inpatient -├── 2001 -│ ├── denominator -│ └── inpatient -├── 2002 -│ ├── denominator -│ └── inpatient -├── 2003 -│ ├── denominator -│ └── inpatient -├── 2004 -│ ├── denominator -│ └── inpatient -├── 2005 -│ ├── denominator -│ └── inpatient -├── 2006 -│ ├── denominator -│ └── inpatient -├── 2007 -│ ├── denominator -│ └── inpatient -├── 2008 -│ ├── denominator -│ └── inpatient -├── 2009 -│ ├── denominator -│ └── inpatient -├── 2010 -│ ├── denominator -│ └── inpatient -├── 4334 -│ ├── 2011 -│ ├── 2012 -│ ├── 2015 -│ └── Extract File Documentation -├── 4580 -│ ├── 2013 -├── 5819 -│ ├── 2014 -│ └── Extract File Documentation -├── 7087 -│ ├── 2015 -│ └── Extract File Documentation -├── 8183 -│ ├── 2016 -│ └── Extract File Documentation -├── 10411 -│ └── 2017 -├── 2018 -│ └── extract_file_documentation -├── Medicare Claims -├── Medicare Enrollment -└── Xwalk -``` -```` -````` - -`````{dropdown} MCBS - -```{list-table} -:header-rows: 0 - -* - data_source - - [MCBS](https://www.cms.gov/Research-Statistics-Data-and-Systems/Research/MCBS) -* - description - - Survey for sample of all Medicare or just FFS (1999-2004, 2007-2013, 2015-2017). Check out NSAPH MCBS documentation [here](./mcbs.md). 
-* - fasse_location - - `/n/dominici_nsaph_l3/data/mcbs/` -* - size - - placeholder -``` -````` - -``` {seealso} -Check out the following resources about the CMS health data: -- [RESDAC Using Medicare Hospitalization Information and the MedPAR](http://resdac.umn.edu/sites/resdac.umn.edu/files/Using%20Medicare%20Hospitalization%20Information%20and%20the%20MedPAR%20(Slides).pdf) -- [Coverage Denials: Government And Private Insurer Policies For Medical Necessity In Medicare](https://www.healthaffairs.org/doi/pdf/10.1377/hlthaff.2021.01054) -- [RESDAC Online learning](https://resdac.org/online-learning) -- [RESDAC Learning and workshops](https://resdac.org/learn) -``` - -## Exposure data - -The following is the description of the air pollution exposure data. - -### ZIP code-level PM2.5, PM2.5 components, Ozone, and NO2 in the contiguous US - -`````{dropdown} PM2.5, Ozone, NO2 -```{list-table} -:header-rows: 0 -* - dataset_author - - Yaguang Wei -* - date_created - - Oct 19, 2022 -* - data_source - - Gridded PM2.5, PM2.5 components, ozone, and NO2; Esri ZIP code area and point files; U.S. ZIP code database. -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 for PM2.5, ozone, and NO2; 2000-2019 for PM2.5 components. -* - temporal_resolution - - daily, annually -* - description - - For general ZIP Codes with a polygon representation, we estimated their pollution levels by averaging the predictions of grid cells whose centroids lie inside the polygon of that ZIP Code; For other ZIP Codes such as Post Offices or large volume single customers, we treated them as a single point and predicted their pollution levels by assigning the predictions of the nearest grid cell. These are updated ZIP code-level predictions. We filled in the missing values for grids, and added about 200 zip codes that are missing in the Esri files each year. The geographic information for the additional zip codes is extracted from US ZIP code database. 
**Version 2 update:** The v2 files (`exposure/ozone/O3_v2`, `exposure/pm25/PM25_v2`, `exposure/no2/NO2_v2`) differ from v1 in two ways: (1) zip codes outside the contiguous US are excluded; (2) a `state` column is added to each file, identifying the state each zip code belongs to. No exposure values were changed. This version (v2) is available on [NASA SEDAC](https://sedac.ciesin.columbia.edu/data/set/aqdh-pm2-5-o3-no2-concentrations-zipcode-contiguous-us-2000-2016). -* - git_repository - - [ZIP_add_missing](https://github.com/NSAPH-Data-Processing/ZIP_add_missing) and private [ZIP_add_missing](https://github.com/yycome/ZIP_add_missing) -* - publication - - The data are officially published through NASA SEDAC at [sedac.ciesin.columbia.edu](https://sedac.ciesin.columbia.edu/data/set/aqdh-pm2-5-o3-no2-concentrations-zipcode-contiguous-us-2000-2016). -* - fasse_location - - Add `/n/dominici_nsaph_l3/Lab/data/` to the beginning of the paths: `exposure/ozone/O3_v1`, `exposure/ozone/O3_v2`, `exposure/pm25/PM25_v1`, `exposure/pm25/PM25_v2`, `exposure/no2/NO2_v1`, `exposure/no2/NO2_v2`, `exposure/pm25_components/pm25_components_v2` -* - files - - -``` -``` -├── [2.3G] NO2 -│ ├── [6.5M] Annual -│ │ ├── [394K] 2000.rds -│ │ ├── [391K] ... -│ │ └── [391K] 2016.rds -│ └── [2.3G] Daily -│ ├── [393K] 20000101.rds -│ ├── [392K] ... -│ └── [395K] 20161231.rds -├── [2.3G] O3 -│ ├── [6.4M] Annual -│ │ ├── [390K] 2000.rds -│ │ ├── [385K] ... -│ │ └── [384K] 2016.rds -│ ├── [2.3G] Daily -│ │ ├── [394K] 20000101.rds -│ │ ├── [388K] ... -│ │ └── [371K] 20161231.rds -│ └── [6.5M] Summer -│ ├── [391K] 2000_summer.rds -│ ├── [387K] ... -│ ├── [386K] 2016_summer.rds -│ └── [ 101] readme.txt -├── [2.3G] PM25 -│ ├── [6.5M] Annual -│ │ ├── [395K] 2000.rds -│ │ ├── [392K] ... -│ │ └── [392K] 2016.rds -│ └── [2.3G] Daily -│ ├── [397K] 20000101.rds -│ ├── [395K] ... -│ └── [393K] 20161231.rds -├── [ 88M] PM25_components -│ ├── [4.4M] 2000.rds -│ ├── [4.4M] ... 
-│ ├── [4.4M] 2019.rds -│ └── [ 850] readme.txt -└── [ 974] README.md -``` -````` - -### PM2.5 Components - Obsolete - -`````{dropdown} PM2.5 component data -```{list-table} -:header-rows: 0 - -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2019 -* - temporal_resolution - - annually -* - size - - 251 MB -* - processing_description - - Superseded by `/n/dominici_nsaph_l3/Lab/data/exposure/pm25_components/pm25_components_v2`. These are annual estimates of PM2.5 species at ZIP Code level across the contiguous US, aggregated from Heresh's grid-level estimates. For a general ZIP Code, which has a normal street delivery route and can therefore be represented by a polygonal area, we estimate the ZIP Code-level PM2.5 by averaging the predictions of grid cells whose centroids lie inside the polygon of that ZIP Code; for other ZIP Codes that do not have polygon representations, for example an apartment building, a military base, or a post office, we treat them as single points and estimate their ZIP Code-level PM2.5 from the prediction of the nearest grid cell. For ec, oc, nh4, no3, and so4 the units are micrograms per cubic meter; for br, ca, cu, fe, k, ni, pb, si, v, and z the units are nanograms per cubic meter. -* - fasse_location - - Prepend `/n/dominici_nsaph_l3/Lab/data/` to the path: `exposure/pm25_components/pm25_components_v1` -* - git_repository - - https://github.com/yycome/PM25_Components -* - publication - - Amini, H., M. Danesh-Yazdi, Q. Di, W. Requia, Y. Wei, Y. Abu Awad, L. Shi, M. Franklin, C.-M. Kang, J. M. Wolfson, P. James, R. Habre, Q. Zhu, J. S. Apte, Z. J. Andersen, X. Xing, C. Hultquist, I. Kloog, F. Dominici, P. Koutrakis, J. Schwartz. 2022. Annual Mean PM2.5 Components (EC, NH4, NO3, OC, SO4) 50m Urban and 1km Non-Urban Area Grids for Contiguous U.S., 2000-2019 v1. (Preliminary Release). Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). 
https://doi.org/10.7927/7wj3-en73 -* - dataset_author - - Yaguang Wei -* - header - - `ZIP, br, ca, cu, ec, fe, k, nh4, ni, no3, oc, pb, si, so4, v, z` -* - files - - -``` -``` - ├── 2000.csv - ├── ... - └── 2019.csv -``` -````` - -### Predicted daily smoke PM2.5 over the Contiguous US, 2006 - 2020 - -`````{dropdown} Predicted daily smoke PM2.5 -```{list-table} -:header-rows: 0 -* - dataset_author - - Marissa Childs -* - date_created - - October 24, 2020 -* - data_source - - other (exposure predictions) -* - spatial_coverage - - Contiguous US -* - spatial_resolution - - originally 10 km (gridded), aggregated to zcta, census tract, and county by area and population-weighted averages -* - temporal_coverage - - 2006 - 2020 -* - temporal_resolution - - daily -* - exposures - - PM2.5 from smoke -* - processing_description - - none -* - fasse_location - - Prepend `/n/dominici_nsaph_l3/Lab/data/` to the path: `exposure/predicted_daily_smoke_pm25` -* - publication - - https://doi.org/10.1021/acs.est.2c02934 -* - git_repository - - [daily-10km-smokePM](https://github.com/echolab-stanford/daily-10km-smokePM) -* - size - - 6 GB -* - files - - -``` -``` -├── 10km_grid -│   ├── 10km_grid_wgs84 -├── county -│   └── tl_2019_us_county -├── tract -│   └── tracts -│   ├── tl_2019_01_tract -│   ├── tl_2019_04_tract -│   ├── ... -│   └── tl_2019_56_tract -└── zcta - └── tl_2019_us_zcta510 -``` -````` - - -### Space weather data - -`````{dropdown} Space weather data -```{list-table} -:header-rows: 0 -* - dataset_author - - Carolina L Zilli Vieira -* - date_created - - Oct 17 2022 -* - data_source - - NASA solar and geomagnetic activity data from https://omniweb.gsfc.nasa.gov/html/omni_source.html, DAAC NASA (solar radiation) from https://daac.ornl.gov/, BARTOL Neutron Station (neutrons) from https://neutronm.bartol.udel.edu/ -* - spatial_coverage - - Global UTC (from raw data) converted to local time. 
-* - spatial_resolution - - county -* - temporal_coverage - - 1996-2022 -* - temporal_resolution - - daily -* - processing_description - - The source data are global and reported in UTC, so they carry no spatial information. To obtain a spatial dimension, we converted the global UTC data to US local time and then used the local time zone to identify the county. The values differ slightly by location depending on the time zone. We provide daily data, which can be aggregated to monthly and annual data. -* - fasse_location - - `/n/dominici_nsaph_l3/data/exposure/solar_activity` -* - git_repository - - [solar_data_timezone_to_zipcode](https://github.com/NSAPH-Data-Processing/solar_data_timezone_to_zipcode) -* - size - - 1.18 GB -* - files - - -``` -````` - -### PM2.5 US High Resolution Grid, 2000-2016 - -`````{dropdown} PM2.5 US Grid -```{list-table} -:header-rows: 0 - -* - spatial_coverage - - US -* - spatial_resolution - - 1km x 1km -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - size - - ~80 MB/year -* - fasse_location - - Prepend `/n/dominici_nsaph_l3/Lab/data/exposure` to `/pm25/whole_us/annual/grid_pts/qd_new_predictions`. -* - processing_description - - Merge by row the 1-column matrix PM2.5 values (`PredictionStep2_Annual_PM25_USGrid_20**0101_20**1231.rds`) with the corresponding 1km x 1km United States Grid Matrix (`USGridSite.rds`). For data visualization, see: https://github.com/wxwx1993/National_Causal/blob/master/pm_map.R. -* - publication - - Q. Di, H. Amini, L. Shi, I. Kloog, R. Silvern, J. Kelly, M. B. Sabath, C. Choirat, P. Koutrakis, A. Lyapustin, Y. Wang, L. J. Mickley, J. Schwartz, An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environ. Int. 130, 104909 (2019). https://pubmed.ncbi.nlm.nih.gov/31272018/ -* - files - - -``` -``` - ├── PredictionStep2_Annual_PM25_USGrid_20000101_20001231.rds - ├── ... 
- ├── PredictionStep2_Annual_PM25_USGrid_20160101_20161231.rds - ├── readme.txt - └── USGridSite.rds -``` -````` - -## Confounder data - -### Gridmet - -This project aggregates Gridmet data into social boundaries such as zip codes or census tracts -that can then be joined with data available at those social units such as Medicare or Census data. Specifically it does the following: - -1. Start point: GRIDMET climate data (4x4km grid), Census Bureau Zip Code Tabulation Area (ZCTA) boundaries -2. Aggregation technique: area weight -3. Output: ZCTAs with the area-weighted average of each GRIDMET variable - -`````{dropdown} Gridmet data - -```{list-table} -:header-rows: 0 -* - fasse_location - - `/n/dominici_nsaph_l3/Lab/data/data/gridmet/` -* - dataset_author - - Nate Fairbank -* - date_created - - July 15, 2022 -* - spatial_coverage - - Continental US -* - spatial_resolution - - 4x4km aggregated to Zip Code Tabulation Area (ZCTA) -* - temporal_coverage - - 2000-2018 -* - temporal_resolution - - daily -* - data_source - - GRIDMET, Census Bureau -``` -``` {dropdown} data_source description -1. [GRIDMET data](https://www.northwestknowledge.net/metdata/data/permanent/) - -All original GRIDMET variables are preserved. There are a total of 18: - - Primary Climate Variables (9): Maximum temperature, minimum temperature, precipitation accumulation, downward surface shortwave radiation, wind-velocity, wind direction, humidity (maximum and minimum relative humidity and specific humidity) - - Derived variables (7): Reference evapotranspiration (ASCE Penman-Monteith), Energy Release Component*, Burning Index*, 100-hour and 1000-hour dead fuel moisture, mean vapor pressure deficit, 10-day Palmer Drought Severity Index *fuel model G (conifer forest) - - Variables from data processing (2): - - CRS: originally "coordinate reference system", this had a value of "1" for every grid in GRIDMET. As these grids were tabulated into ZCTAs, these "1"s were tabulated as well. 
Thus, this number indicates how many grids (partial or whole) were part of the area aggregation for that zip code. - - AreaProp: To do the area weighting, each ZCTA/grid pairing was given a percentage of how much of the ZCTA's area was contained in that grid. For each ZCTA, these proportions sum to 1, meaning that 100% of the ZCTA's area was accounted for. Thus this represents a "check" on the process. A small minority of the data does NOT sum to "1". These are cases on the edge of the map, such as the Florida Keys, that GRIDMET's data does not fully cover. -- For documentation on GRIDMET variables please refer to [their materials](https://www.climatologylab.org/gridmet.html). -- Notes from the GRIDMET files: - - author: John Abatzoglou - University of Idaho, jabatzoglou@uidaho.edu - - The projection information for this file is: GCS WGS 1984. - - Citation: Abatzoglou, J.T., 2013, Development of gridded surface meteorological data for ecological applications and modeling, International Journal of Climatology, DOI: 10.1002/joc.3413 - - Days correspond approximately to calendar days ending at midnight, Mountain Standard Time (7 UTC the next calendar day) - -2. [Census Bureau Zip Code Tabulation Area (ZCTA) TIGER/Line Files and Shapefiles](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2000.html) - -ZCTAs were used because they represent the government's "best guess" at what the spatial boundaries of a zip code are. While zip codes are commonly perceived as denoting spatial boundaries, they are in fact just a collection of addresses. Furthermore, they are "working units" that are defined and changed based on the needs (and whims) of the postal service. There is a degree of compromise/subjectivity here. The best answer would be "don't use zip codes as a unit of analysis". If they must be used, ZCTAs represent the best solution. -- NOT ALL ZIP CODES HAVE A CORRESPONDING ZCTA. 
ZCTAs are a trademark of the Census Bureau, an organization fundamentally concerned with PEOPLE. Zip Codes are a trademark of the US Postal Service, an organization fundamentally concerned with MAIL. Some zip codes map to a single address or very small collection of addresses. These represent high-volume mail facilities (think PO boxes, etc.), and are NOT included as separate ZCTAs. While frustrating from a pure data perspective (why is there all this unmatched data!?) this makes sense from a practical perspective. If a Medicare patient gave a PO Box as their address, and we used that PO Box's zip code to infer what their exposure was, we'd be making an inappropriate inference, as that patient doesn't actually live inside their PO Box. If matching all these "point" zip codes is necessary, a zip to ZCTA crosswalk is available here: `/n/dominici_nsaph_l3/Lab/data/shapefiles/` -- Because zip codes change constantly, ZCTAs have to be updated. They were first created following the 2000 census, and started receiving annual updates in 2007. Thus, this process uses the annual file for all data for that year, and the 2000 census file for years 2000-2006. -- The Census has made major updates to the ZCTAs every decade. For the 2000 Census, they include suffixes such as "XX" and "HH" to indicate large, unpopulated land areas such as national parks and bodies of water. -- "HH" suffix used to represent large water bodies -- For more about ZCTAs, read [here](https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html) -``` -``` {dropdown} processing_description -- Stage 1: Crosswalk Development (done in ArcGIS): - 1. GRIDMET's 4x4km grid was imported and transformed into defined polygon formats (rather than raster or point features) - 2. Census Bureau's ZCTA shapefiles for that year were imported - 3. The "tabulate intersection" tool was used to calculate, for each ZCTA/grid pair, the proportion of the ZCTA's area that the grid square contributed. 
For example, if ZCTA 12345 overlapped 3 grids, there would be three rows: (12345, Grid A, .4), (12345, Grid B, .2), (12345, Grid C, .4). - 4. The crosswalk produced in step 3 was exported -- Stage 2: Area-weighted aggregation: - 1. The crosswalk for that year is imported. - 2. For each day, the GRIDMET file is imported. - 3. The data for each grid (all 16 variables) is joined to the crosswalk by lat/long pair for that grid. Note that if a grid square overlaps, say, three ZCTAs, then its data will be repeated 3 times so that it can be weighted appropriately for each ZCTA. - 4. The data is multiplied by the ZCTA proportion for that grid square. - 5. The data is grouped by ZCTA with the aggregation method "sum". - 6. That day is appended to the netCDF file - 7. An annual netCDF file is exported. -``` -````` - -## Shapefiles - -`````{dropdown} Zipcode_info -```{list-table} -:header-rows: 0 -* - dataset_author - - Yaguang Wei -* - date_created - - Jun 3, 2020 -* - data_source - - The daily and annual estimations of ambient PM2.5 at ZIP Codes; U.S. ZIP code database. -* - spatial_coverage - - US -* - spatial_resolution - - zipcode -* - temporal_coverage - - 2000-2016 -* - temporal_resolution - - annually -* - description - - For general ZIP Codes with a polygon representation, we estimated their -pollution levels by averaging the predictions of grid cells whose centroids lie inside the polygon of that ZIP Code; for other ZIP Codes such as Post Offices or large volume single customers, we treated them as a single point and predicted their pollution levels by assigning the predictions of the nearest grid cell. Further description is available on [Spatial_aggregation](https://www.overleaf.com/project/6248df38346ed665a2b1fb08). 
-* - git_repository - - [Yaguang_pm25_code](https://github.com/NSAPH/National-Causal-Analysis/tree/master/Exposures/code/yaguang_pm25_code) - -* - fasse_location - - `/n/dominici_nsaph_l3/Lab/data/shapefiles/zip_shape_files/Zipcode_Info` -* - files - - -``` -pobox_csv -``` -├── pobox_csv -│ ├── ESRI00USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI01USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI02USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI03USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI04USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI05USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI06USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI07USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI08USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI09USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI10USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI11USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI12USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI13USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI14USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI15USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI16USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI17USZIP5_POINT_WGS84_POBOX.csv -│ ├── ESRI18USZIP5_POINT_WGS84_POBOX.csv -│ └── ESRI19USZIP5_POINT_WGS84_POBOX.csv -└── polygon - ├── ESRIUSZIP5_POLY_WGS84.cpg - ├── ESRIUSZIP5_POLY_WGS84.dbf - ├── ESRIUSZIP5_POLY_WGS84.prj - ├── ESRIUSZIP5_POLY_WGS84.sbn - ├── ESRIUSZIP5_POLY_WGS84.sbx - ├── ESRIUSZIP5_POLY_WGS84.shp - ├── ESRIUSZIP5_POLY_WGS84.shp.xml - └── ESRIUSZIP5_POLY_WGS84.shx - -yy: 00, 01, ..., 18, 19 -``` -````` - - -### ZIP to ZCTA crosswalk (2015) -`````{dropdown} **zip_to_zcta** -```{list-table} -:header-rows: 0 -* - rce_location - - `~/shared_space/ci3_exposure/locations/zcta/crosswalk/` -* - fasse_location - - `/n/dominici_nsaph_l3/Lab/data/shapefiles/zip_to_zcta` -* - date_created - - Nov 2, 2015 -* - spatial_coverage - - contiguous US -* - size - - 1.8 MB -* - header - - `ZIP,PO_NAME,STATE,ZIP_TYPE,ZCTA` -* - files - - -``` -``` - └── Zip_to_ZCTA_crosswalk_2015_JSI.csv -``` -````` - -## Other data - -The following are other commonly-used public data 
sources, many of which may be found in the **confounders** folder on FASSE. - -- [CMS Synthetic data: 2008, 2009, and 2010](https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF) -- [CDC, Behavioral Risk Factor Surveillance System (BRFSS) data for body mass index and smoking status](https://www.cdc.gov/brfss/annual_data/annual_2008.htm) -- [PM2.5 concentrations (US) from Di et al. 2019](https://beta.sedac.ciesin.columbia.edu/data/set/aqdh-pm2-5-concentrations-contiguous-us-1-km-2000-2016) -- [Census data](https://www.census.gov/data/developers/data-sets/acs-5year.2010.html) -- [GridMET data](https://www.climatologylab.org/gridmet.html) -- [County Presidential Election Returns 2000-2020](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ&version=10.0) -- [United States Broadband Usage Percentages Dataset](https://github.com/microsoft/USBroadbandUsagePercentages) - diff --git a/handbook/fasse_efficient.md b/handbook/fairshare.md similarity index 87% rename from handbook/fasse_efficient.md rename to handbook/fairshare.md index 7700148..9960dea 100644 --- a/handbook/fasse_efficient.md +++ b/handbook/fairshare.md @@ -1,10 +1,10 @@ -# Efficient Resource Utilization on FASSE +# Fairshare on FASRC clusters As members of our research group, we share the responsibility to ensure that our computational resources on the Slurm cluster are used efficiently. To promote fair and effective use, please take a moment to review the following guidelines on resource requests and usage. ## Fairshare Policy -Fairshare determines the fraction of system resources allocated to users, assigning scores to users based on their resource usage, and establishing priority levels for users based on these scores. Given that FASSE users come from different groups that have different resource needs, Fairshare aims to establish a method for prioritizing job allocation. 
This allows users who haven't fully utilized their allocated resources to receive higher priority for their jobs, ensuring that groups that have exceeded their resource allocation do not monopolize the system. +Fairshare determines the fraction of system resources allocated to users, assigning scores to users based on their resource usage, and establishing priority levels for users based on these scores. Given that FASRC cluster users come from different groups that have different resource needs, Fairshare aims to establish a method for prioritizing job allocation. This allows users who haven't fully utilized their allocated resources to receive higher priority for their jobs, ensuring that groups that have exceeded their resource allocation do not monopolize the system. Read more about the Fairshare policy [here](https://docs.rc.fas.harvard.edu/kb/fairshare/). @@ -76,10 +76,8 @@ Lab moderators can use the `sreport` command to see the usage of the resources b sreport cluster AccountUtilizationByUser account=dominici_lab Start=2024-03-21 End=2024-03-28 ``` -## Best Practices in Using FASSE +## Best Practices in Using FASRC clusters -- **FASSE is for L3 data** - - Utilize FASSE exclusively for handling sensitive and L3 data. For other computations, such as those involving simulated data, please transition to Cannon. - **Understanding your needs** - Ensure you fully understand the resource requirements of your job before submission. - Conduct small-scale tests or pilot runs to assess the CPU and memory requirements. For instance: @@ -98,9 +96,8 @@ sreport cluster AccountUtilizationByUser account=dominici_lab Start=2024-03-21 E - **Communication** - If you anticipate a large or unusual resource request, consider discussing it with the group. This can help ensure that your needs are met without adversely impacting others. - **Other resources** - - FASSE (and Cannon) documentation provides a wealth of knowledge on best practices and available resources. 
Familiarize yourself with it to ensure efficient utilization. + - FASRC documentation provides a wealth of knowledge on best practices and available resources. Familiarize yourself with it to ensure efficient utilization. - Fairshare documentation: https://docs.rc.fas.harvard.edu/kb/fairshare/ - - FASSE partitions: https://docs.rc.fas.harvard.edu/kb/fasse/#SLURM_and_Partitions - SLURM: https://docs.rc.fas.harvard.edu/kb/running-jobs/ diff --git a/handbook/fasse_github.md b/handbook/fasse_github.md deleted file mode 100644 index a65f802..0000000 --- a/handbook/fasse_github.md +++ /dev/null @@ -1,14 +0,0 @@ -# FASSE and GitHub Project Work - -If you are working in a team on a project which uses GitHub, you might encounter some issues while using FASSE. If you and your teammates are both working from a folder in FASSE and connecting to GitHub through this folder, your work may suddenly start to disappear and be overwritten, even if you and your teammates are working on the project at different times. - -However, we do not have to stop using GitHub! To avoid this problem, you and your teammates can each make your own folder in FASSE in the /Lab/projects/ folder to work on the project. For example, you can have a folder named `johnsmith_health_pollution` and your teammate will have project folder `janesmith_health_pollution` (where health_polllution will be a descriptive title of the exposures and health outcomes). In each of these folders you can link to the project GitHub repository, where you will be able to push and pull without an issue, ideally to different feature branches (see guide to branches [here](https://nsaph.github.io/handbook/collaborative.html)). - -Since you will have multiple folders, it is especially important to remember to use symbolic links to your input data as these files can be very large. 
When you and your teammates have completed your work on the project, you can collapse back down to your `health_pollution` folder, which should reference the main branch from your GitHub repository. - -## Summary - -- Each teammate should have their own project folder in FASSE which is connected to one GitHub repository -- Use symbolic links to data sources -- It is a good idea to use different GitHub branches -- At the end of the project, collapse down to one folder in FASSE diff --git a/handbook/fasse_partitions.md b/handbook/fasse_partitions.md deleted file mode 100644 index c995670..0000000 --- a/handbook/fasse_partitions.md +++ /dev/null @@ -1,27 +0,0 @@ -# FASSE Compute Partitions - -If your FASSE session has crashed due to insufficient memory, it may be due to using the incorrect partition. A partition is a queue for your work, and FASSE has several of these, each with different sizes and restrictions. In RStudio, you can select which partition you would like to use before you start your session. Below you can see where to type the name of the desired partition. - - -Screen Shot 2022-08-23 at 5 15 37 PM - - - -Sometimes the data you are working with can require a very large amount of memory, such as individual data from Medicare enrollment files. If you are anticipating using a lot of memory, you should request from the fasse_bigmem partition. - -```{note} -To view FASSE partitions and learn more, see [the official FASSE documentation](https://docs.rc.fas.harvard.edu/kb/fasse/#articleTOC_15). -``` - -For your convenience, the list is also below, updated August 2022: -|Partition |Number of Nodes |Cores per Node |CPU Core Types| Mem per Node (GB)| Time Limit |Max Jobs |Max Cores |MPI Suitable? 
|GPU Capable?| -|--- |------ |---- | ------ | ----- | ----- | -- | --- | ---- | ----- | -|fasse |42 |48 |Intel "Cascade Lake" |184 |7 days |none |none |yes |No| -|fasse_bigmem |6 |64 |Intel "Ice Lake" |499 |7 days| none |none |yes| No| -|fasse_gpu |4 |32 |Intel "Cascade Lake" |373 |7 days| none| none |yes |Yes (4 V100/node)| -|test |5 |48 |Intel "Cascade Lake"| 184 |8 hours| 5 |96 cores| yes| No| -|remoteviz |1| 32 |Intel "Cascade Lake" |373 |7 days| none| none| no |Shared V100 GPUs for rendering| -|serial_requeue| varies| varies| Intel| varies| 7 days |none |none |No| Yes| -|PI/Lab nodes| varies| varies |varies| varies| none| none| none| varies| varies| - - diff --git a/handbook/labshare_github.md b/handbook/labshare_github.md new file mode 100644 index 0000000..35fcb83 --- /dev/null +++ b/handbook/labshare_github.md @@ -0,0 +1,14 @@ +# Lab Share and GitHub Project Work + +If you are working in a team on a project which uses GitHub, you might encounter some issues while using Lab Share. If you and your teammates are both working from a folder in Lab Share and connecting to GitHub through this folder, your work may suddenly start to disappear and be overwritten, even if you and your teammates are working on the project at different times. + +However, we do not have to stop using GitHub! To avoid this problem, you and your teammates can each make your own folder in Lab Share in the /Lab/projects/ folder to work on the project. For example, you can have a folder named `johnsmith_health_pollution` and your teammate will have project folder `janesmith_health_pollution` (where health_pollution will be a descriptive title of the exposures and health outcomes). In each of these folders you can link to the project GitHub repository, where you will be able to push and pull without an issue, ideally to different feature branches (see guide to branches [here](https://nsaph.github.io/handbook/collaborative.html)). 
+ +Since you will have multiple folders, it is especially important to remember to use symbolic links to your input data as these files can be very large. When you and your teammates have completed your work on the project, you can collapse back down to your `health_pollution` folder, which should reference the main branch from your GitHub repository. + +## Summary + +- Each teammate should have their own project folder in Lab Share which is connected to one GitHub repository +- Use symbolic links to data sources +- It is a good idea to use different GitHub branches +- At the end of the project, collapse down to one folder in Lab Share diff --git a/handbook/rce.md b/handbook/rce.md deleted file mode 100644 index dc096f8..0000000 --- a/handbook/rce.md +++ /dev/null @@ -1,52 +0,0 @@ -# Working on RCE - -Unless you are at Harvard and connected to the Harvard network, you will first need to connect to the Harvard's VPN (https://vpn.harvard.edu/) through the Cisco AnyConnect application in order to work on the Research Computing Environment (RCE). - -See the official RCE documentation here: https://rce-docs.hmdc.harvard.edu - -```{warning} -The RCE is no longer maintained and the users should plan for their transition to FASSE. -``` - -## Access RCE - -You can access the RCE in the three following ways: - -1. In the command line with: `ssh username@rce.hmdc.harvard.edu` -2. In the web browser at: https://rce.hmdc.harvard.edu/nxwebplayer -3. In [the NoMachine software](https://rce-docs.hmdc.harvard.edu/nx4_installation). - -## Running RStudio on RCE - -Inside the browser or NoMachine interface run the following application: `Menu` -> `Applications` -> `RCE Powered Applications` -> `Anaconda Shell` (see figure below). - -```{figure} imgs/img.png ---- -scale: 90% -align: center ---- -``` - -Set the number of CPUs and the memory size for your job. Make sure that the allocated memory exceeds -the size of the data you want to process. 
For instance, if your dataset is 20 GB in size, allocate -40 GB or 60 GB of memory. - -```{figure} imgs/job_size.png ---- -scale: 30% -align: center ---- -``` - -When the shell is open, run the following commands to load NSAPH's R environment and RStudio. - -```bash -export CONDA_ENVS_PATH=/nfs/projects/n/nsaph_common/conda/envs/ -export CONDA_PKGS_PATH=/nfs/projects/n/nsaph_common/conda/pkgs/ -source activate nsaph -rstudio -``` - -## RCE Conda environment - -[Steps for setting up a Conda environment](https://github.com/NSAPH/CausalGPS-test/blob/main/Analyses/scaling_synthetic_rce_1/scaling_synthetic_rce.md#steps-for-setting-up-environment) diff --git a/handbook/rstudio.md b/handbook/rstudio.md deleted file mode 100644 index 8e8d9c1..0000000 --- a/handbook/rstudio.md +++ /dev/null @@ -1,50 +0,0 @@ -## Using RStudio on FASSE - -There are two ways to use RStudio and access your files on FASSE: - -1. RStudio Server or -2. Remote Desktop (or Containerized Remote Desktop). You may find RStudio Server to be the easiest to use because you can use shortcuts like Cmd + C (or Ctrl + C) there. Meanwhile, Remote Desktop (or Containerized Remote Desktop) is useful because you can use Terminal there to upload files to GitHub. - -When starting an RStudio session, you may want to use the following options: - -- Partition: "fasse" (or "test" if "fasse" is taking too long to start because other user(s) are using that partition too). -- R version to be loaded with RStudio: `R/4.0.5-fasrc02 Comp gcc` (or whatever version of R you're looking to use). -- Location of your R_LIBS_USER folder: `$HOME/apps/{your project name}/R_4.0.5:$R_LIBS_USER`. More on this last option below. - -## Installing R Packages - -To install R packages in RStudio Server in the FASSE cluster, you will need to configure the proxies according to our [Proxy Settings](https://docs.rc.fas.harvard.edu/kb/proxy-settings/) guidelines. 
To prepare for package installation, run the following two commands on RStudio Server before installing any packages. In this example, we want to install package `argparse` as explained by [this link](https://docs.rc.fas.harvard.edu/kb/rstudio-server-vs-rstudio-desktop/#Installing_R_packages_in_RStudio_Server_in_the_FASSE_cluster): - -```r -Sys.setenv(http_proxy="http://rcproxy.rc.fas.harvard.edu:3128") -Sys.setenv(https_proxy="http://rcproxy.rc.fas.harvard.edu:3128") -install.packages("argparse") -``` - - -If you don't want to install the same R packages over and over again each time you open FASSE, create a \$R_LIBS_USER folder. -[This link](https://docs.rc.fas.harvard.edu/kb/r-packages/) explains some of this, and below are some tips for setting up your \$R_LIBS_USER, but you might want to go to Harvard FAS Research Computing's weekly [office hours](https://www.rc.fas.harvard.edu/training/office-hours/) to get more help. - -You'll want to edit your `.bashrc` file (which you can see in your home folder by clicking "View" > "Show Hidden Files") to be the following. -In this example, the user wants to use R version 4.0.5. `projects`, `ml_r4`, and `ml_rstudio` are examples of Terminal commands that you may want to create "aliases" (i.e., shortcuts) for. -After you do this, you can open a Terminal window (or more than one Terminal window) and type your aliases to, in this example, change the working directory to the `/n/dominici_nsaph_l3/Lab/projects` folder or load RStudio (which requires loading R first). - -```shell -# .bashrc - -# Source global definitions -if [ -f /etc/bashrc ]; then -.
/etc/bashrc -fi - -export R_LIBS_USER=$HOME/apps/{your project name}/R_4.0.5:$R_LIBS_USER - -# Aliases -alias projects='cd /n/dominici_nsaph_l3/Lab/projects' -alias ml_r4='module load gcc/9.3.0-fasrc01 R/4.0.5-fasrc02' -alias ml_rstudio='module load rstudio/1.1.453-fasrc01' -``` - -```{note} -If you want to use your \$R_LIBS_USER to access R packages you've installed in previous sessions (as already described above, this allows you to not have to `install.packages()` in RStudio every time you start a FASSE session), you'll need to create a folder, which is called `apps/R_4.0.5` in this example (that is, this example requires a folder called "R_4.0.5" within a folder called "apps"), in your private home folder. -``` diff --git a/handbook/vscode.md b/handbook/vscode.md index 38a3966..9fbaf8b 100644 --- a/handbook/vscode.md +++ b/handbook/vscode.md @@ -2,14 +2,14 @@ ## Background and Installation VS Code, or Visual Studio Code, is a popular tool that allows for users to work in a variety of languages, including Python, Python notebooks, R, C, and more, all from one app. -It also can work with Git and Docker. It is not currently an option as an interactive app in FASSE like RStudio or Jupyter, but you can still use VS Code in CANNON/FASSE fairly easily, in 2 ways: +It also can work with Git and Docker. It is not currently an option as an interactive app in FASRC clusters like RStudio or Jupyter, but you can still use VS Code fairly easily, in 2 ways: 1. **Virtual Desktop** 2. **SSH Tunnel** ## Setting Up a Virtual Desktop -1. Launch **Remote Desktop** in CANNON/FASSE and follow the steps to create a session. +1. Launch **Remote Desktop** in CANNON and follow the steps to create a session. 2.
Open VS Code: In a new Terminal window, run: ```bash module load vscode @@ -75,11 +75,10 @@ Before setting up the SSH Tunnel, ensure your `.bashrc` file is configured as sh - Or create an **[Azure VM](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal?tabs=ubuntu)** -### Connecting to CANNON/FASSE +### Connecting to CANNON 1. In VS Code, open the **Command Palette** (`F1` or `⇧⌘P`). 2. Select **Remote-SSH: Connect to Host...** and enter: - `username@login.rc.fas.harvard.edu` (for CANNON) - - `username@fasselogin.rc.fas.harvard.edu` (for FASSE) 3. Enter your **Harvard password**, followed by your **multi-factor authentication code** (e.g., Microsoft Authenticator, Duo). 4. VS Code will establish the connection, displaying progress notifications and logs in the **Remote - SSH output channel**. 5. Once connected, you’ll see an empty VS Code window. The **Status Bar** (bottom left corner) shows the active remote session.
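
To avoid retyping the full host name each time, the connection target from the steps above can also be stored in your local SSH configuration, which the Remote-SSH extension reads when listing hosts. The following is a minimal sketch, assuming the default `~/.ssh/config` location on your own machine; the alias `cannon` is a hypothetical name chosen for illustration, and `username` is a placeholder for your own account name:

```text
# ~/.ssh/config on your local machine (not on the cluster)
# "cannon" is a hypothetical alias; replace "username" with your own account name
Host cannon
    HostName login.rc.fas.harvard.edu
    User username
```

With an entry like this in place, entering `cannon` at the **Remote-SSH: Connect to Host...** prompt should be equivalent to typing `username@login.rc.fas.harvard.edu`. The password and multi-factor authentication prompts are unchanged.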