Synthetic Medicare Data for Environmental Health Studies
Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)
Field | Value | |
Title |
Synthetic Medicare Data for Environmental Health Studies
|
|
Identifier |
https://doi.org/10.7910/DVN/L7YF2G
|
|
Creator |
Khoshnevis, Naeem
Wu, Xiao
Braun, Danielle |
|
Publisher |
Harvard Dataverse
|
|
Description |
We present a synthetic Medicare claims dataset linked to environmental exposures and potential confounders. Most environmental health studies that rely on claims data are subject to data restrictions, so the data cannot be shared publicly. The Centers for Medicare & Medicaid Services (CMS) has generated publicly available synthetic Medicare claims data for 2008-2010. In this dataset, we link the 2010 synthetic Medicare claims data to environmental exposures and potential confounders. We aggregated the 2010 synthetic Medicare claims data to the county level. Data are compiled for the contiguous United States, which in 2010 included 3109 counties. We merged the synthetic Medicare claims data with air pollution exposure data, specifically estimates of PM2.5 exposure from Di et al. (2019, 2021), who provide daily and annual estimates of PM2.5 exposure at 1 km × 1 km grid cells in the contiguous United States. We use data from the Census Bureau (United States Census Bureau, 2021), the Centers for Disease Control and Prevention (CDC, 2021), and GridMET (Abatzoglou, 2013) to obtain potential confounders. The mortality rate, which serves as the outcome, was computed from the synthetic Medicare data (CMS, 2021). We imputed missing observations using the average of the surrounding counties, except for the CDC confounders, for which we generated a normal distribution for each state and drew the missing values at random from it. The steps for generating the merged dataset are provided in the NSAPH Synthetic Data GitHub repository (https://github.com/NSAPH/synthetic_data). Analytic inferences should not be drawn from this synthetic dataset. The aggregated dataset has 46 columns and 3109 rows. |
|
Subject |
Earth and Environmental Sciences
Medicine, Health and Life Sciences
Environmental Health
Medicare Claims Data
Air Pollution
Mortality Rate
Causal Inference
Contiguous United States |
|
Contributor |
Khoshnevis, Naeem
|
|
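The two imputation rules mentioned in the Description (the average of surrounding counties, and a random draw from a per-state normal distribution for the CDC confounders) can be illustrated with a minimal Python sketch. This is not the authors' code; the actual steps are in the NSAPH repository linked above. The function names, the adjacency table, the toy county labels, and the choice to estimate each state's mean and standard deviation from its observed counties are all assumptions made for illustration.

```python
import random
import statistics


def impute_from_neighbors(values, neighbors):
    """Fill missing county values with the mean of observed adjacent counties.

    values: dict of county id -> float, or None where the value is missing.
    neighbors: dict of county id -> list of adjacent county ids
               (a hypothetical adjacency table, not part of the dataset).
    """
    filled = dict(values)
    for county, value in values.items():
        if value is None:
            observed = [
                values[n]
                for n in neighbors.get(county, [])
                if values.get(n) is not None
            ]
            if observed:  # leave the gap if no neighbor has a value
                filled[county] = statistics.mean(observed)
    return filled


def impute_from_state_normal(values, state_of, seed=0):
    """Fill missing county values with a draw from a per-state normal.

    Assumption: the state's mean and standard deviation are estimated
    from the counties in that state that do have a value.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    by_state = {}
    for county, value in values.items():
        if value is not None:
            by_state.setdefault(state_of[county], []).append(value)

    filled = dict(values)
    for county, value in values.items():
        if value is None:
            observed = by_state.get(state_of[county], [])
            if len(observed) >= 2:  # stdev needs at least two points
                mu = statistics.mean(observed)
                sigma = statistics.stdev(observed)
                filled[county] = rng.gauss(mu, sigma)
    return filled


# Toy inputs with made-up county labels and values.
toy_values = {"A": 10.0, "B": None, "C": 14.0}
toy_neighbors = {"B": ["A", "C"]}
toy_states = {"A": "X", "B": "X", "C": "X"}

neighbor_filled = impute_from_neighbors(toy_values, toy_neighbors)
state_filled = impute_from_state_normal(toy_values, toy_states, seed=42)
```

With the toy inputs, county "B" receives 12.0 under the neighbor rule, the mean of its two neighbors. Under the state rule it receives a random draw, so the result depends on the seed.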