Datasets

Published September 17, 2023

1 Vanderbilt Biostatistics Datasets

Most of the datasets on this page are in the S dumpdata and R compressed save() file formats. Some are also available in Excel, ASCII (.csv), and Stata (.dta) formats. If you need one of the datasets we maintain converted to a non-S format, please e-mail Frank Harrell to make a request.

If you install the R Hmisc package, you can retrieve most of the datasets stored here with getHdata(); for example, getHdata(titanic3).
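A minimal sketch of this retrieval step, assuming Hmisc is installed and the machine can reach hbiostat.org. As we understand the Hmisc documentation, calling getHdata() with no argument returns the names of the available datasets, and a downloaded data frame is created in the global environment rather than returned as the function's value; check ?getHdata in your installed version.

```r
# Sketch: fetching datasets with Hmisc::getHdata().
# Assumes Hmisc is installed and an internet connection is available.
library(Hmisc)

# With no argument, getHdata() returns the names of the available datasets
avail <- getHdata()
print(head(avail))

# Download titanic3; the data frame is created in the global environment
# rather than returned as the function's value
getHdata(titanic3)
str(titanic3)

# To browse the variable metadata (the contents() column in the table) instead
# of downloading the data, use what = "contents"; this opens a web page:
# getHdata(titanic3, what = "contents")
```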

Permission is granted to anyone wishing to use the datasets provided here. Please cite the original paper, which for most datasets is given in our notes linked below, and include the note “Data obtained from http://hbiostat.org/data courtesy of the Vanderbilt University Department of Biostatistics.”


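Columns: Description | R | S-Plus (.sdd) / Stata (.dta) | Excel | ASCII | contents()
Each dataset title below is followed by a line listing its files in this column order; NA marks a format that is not available.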
Meningitis dataset
abm.html abm.sav abm.dta abm.xls NA Cabm.html
Cardiac catheterization diagnostic data
acath.html acath.sav acath.dta acath.xls.zip NA Cacath.html
Bacteremia data
Source bacteremia.sav NA NA NA Cbacteremia.html
Body Fat Data
Source R code bodyfat.rda NA NA bodyfat.csv Cbodyfat.html
CRASH-2
crash2.html crash2.rda crash2.dta NA NA Ccrash2.html
WHO ARI Multicentre Study of clinical signs and etiologic agents
Description ari.sav ari_other.sav NA NA ari.zip ari.html
Rosner’s estriol data
NA birth.estriol.sav NA NA birth_estriol.csv Cbirth.estriol.html
Boston neighborhood housing prices data
boston.html boston.sav boston.sdd NA NA Cboston.html
Cervical Dystonia longitudinal dataset
cdystonia.html cdystonia.sav cdystonia.dta NA NA Ccdystonia.html
U.S. counties and 1992 presidential election dataset
counties.html counties.sav counties.sdd countiesxls.zip NA Ccounties.html
Diabetes data
diabetes.html diabetes.sav diabetes.sdd diabetes.xls diabetes.csv Cdiabetes.html
Duchenne muscular dystrophy dataset
dmd.html dmd.sav dmd.sdd NA dmd.csv Cdmd.html
Esophageal pH Data
Article esopH.rda esopH.dta NA NA CesopH.html
Article esopH2.rda NA NA NA CesopH2.html
German Breast Cancer Data
GermanBreastCa gbsg.sav NA NA gbsg_ba_ca.dat Cgbsg.html
Hypertension data from the Dominican Republic
DominicanHTN.html DominicanHTN.sav NA DominicanHTN.xls NA CDominicanHTN.html
Rosner’s FEV data
FEV.html FEV.sav FEV.sdd NA FEV.csv CFEV.html
Depression drug trial data
hamdp.html hamdp.rda NA NA NA Chamdp.html
Rosner’s hospital data
NA hospital.sav NA NA NA Chospital.html
Rat vaginal cancer data
kprats.html kprats.sav kprats.sdd NA NA Ckprats.html
Rosner’s lead data
NA lead.sav lead.dta NA NA Clead.html
NHANES glycohemoglobin data
NhanesGh nhgh.rda nhgh.dta NA nhgh.tsv nhgh.html
1996 Olympics medal counts
olympics.1996.html olympics.1996.sav olympics.1996.sdd NA olympics.1996.asc Colympics.1996.html
Mayo Clinic primary biliary cirrhosis data
pbc.html pbc.sav pbc.dta pbc.xls NA Cpbc.html
Plasma Retinol/Beta-Carotene dataset
plasma.html plasma.sav plasma.sdd NA NA Cplasma.html
Byar & Greene prostate cancer data
prostate.html prostate.rda prostate.dta prostate.xls NA Cprostate.html
Right heart catheterization dataset
rhc.html rhc.sav rhc.sdd NA rhc.csv Crhc.html
Drug safety dataset
Slides safety.rda NA NA NA NA
Schizophrenia dataset
Schizophrenia schizophrenia.rda NA NA NA NA
40-observation sex-age-response data
sex.age.response.html sex.age.response.sav sex.age.response.dta NA NA Csex.age.response.html
Sicily interrupted time series dataset
Tutorial sicily.rda NA NA sicily.csv sicily.html
Simulated HIV dataset
Liu, Shepherd, Li, Harrell 2017 simhiv.sav NA NA NA NA
Simulated longitudinal ordinal clinical trial with 250,000 patients and a separate file for the first 500 patients
simlongord.html simlongord.rda simlongord500.rda NA NA NA NA
Stress Echocardiography Data
stressEcho.html stressEcho.sav stressEcho.sdd NA stressEcho.csv CstressEcho.html
SUPPORT study datasets
Description support.sav support.dta support.xls support.tsv Csupport.html
NA support2.sav support2.sdd NA support2csv.zip Csupport2.html
Data for Titanic passengers
titanic.html titanic.sav titanic.sdd NA titanic.txt Ctitanic.html
NA titanic2.sav titanic2.sdd NA NA Ctitanic2.html
NA titanic3.sav titanic3.dta titanic3.xls titanic3.csv Ctitanic3.html
titanic5.html NA NA titanic5.xlsx titanic5.csv NA
VA lung cancer data
valung.html valung.sav valung.sdd NA valung.csv Cvalung.html
Very low birth weight infant
vlbw.html vlbw.sav vlbw.dta vlbw.sdd NA vlbw.zip Cvlbw.html
Data sets from Dupont, W. D. (2002). Statistical Modeling for Biomedical Researchers
Bernard et al. (1997) NA NA NA 1.3.2.Sepsis.csv NA
Bernard et al. (1997) NA NA NA 1.4.11.Sepsis.csv NA
Parl et al. (1989) NA NA NA 10.7.ERpolymorphism.csv NA
Lang et al. (1995) NA NA NA 11.2.Isoproterenol.csv NA
Lang et al. (1995) NA NA NA 11.2.Long.Isoproterenol.csv NA
(no ref) NA NA NA 11.AreaUnderCurve.csv NA
Brent et al. (1999) NA NA NA 2.12.Poisson.csv NA
Gross et al. (1999) NA NA NA 2.18.Funding.csv NA
Levy (1999) NA NA NA 2.20.Framingham.csv NA
Eisenhofer et al. (1999) NA NA NA 2.ex.vonHippelLindau.csv NA
Gross et al. (1999) NA NA NA 3.ex.Funding.csv NA
Bernard et al. (1997) NA NA NA 4.11.Sepsis.csv NA
Bernard et al. (1997) NA NA NA 4.18.Sepsis.csv NA
Breslow & Day (1980) NA NA NA 4.21.EsophagealCa.csv NA
Bernard et al. (1997) NA NA NA 4.ex.Sepsis.csv NA
Breslow & Day (1980) NA NA NA 5.5.EsophagealCa.csv NA
Scholer et al. (1997) NA NA NA 5.ex.InjuryDeath.csv NA
O’Donnell et al. (2000) NA NA NA 6.9.Hemorrhage.csv NA
Dupont et al. (1985) NA NA NA 6.ex.Breast.csv NA
Levy (1999) NA NA NA 8.12.Framingham.csv NA
Levy (1999) NA NA NA 8.7.Framingham.csv NA
(no ref) NA NA NA 8.8.2.Person-Years.csv NA
(no ref) NA NA NA 8.8.2.Survival.csv NA
Scholer et al. (1997) NA NA NA 8.ex.InjuryDeath.csv NA
(no ref) NA NA NA 11.ex.Sepsis.csv NA

Note: To make CSV files from R save files, do the following:

load(url('https://hbiostat.org/data/repo/foo.sav'))
ls()   # find name of data frame just loaded (here assumed 'foo')
write.table(foo, file='foo.csv', sep=',', col.names=NA)   # col.names=NA keeps row names under a blank header
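The steps above rely on ls() to discover the loaded object's name. A small variation, sketched here as a hypothetical helper sav_to_csv() (not part of Hmisc or base R), uses the fact that load() invisibly returns the names of the objects it restored, and loads into a fresh environment so the workspace is not cluttered. It assumes each save file holds a single data frame; the demonstration uses a made-up data frame and temporary files so it runs offline.

```r
# Hypothetical helper (not from Hmisc or base R): convert an R save() file,
# given as a local path or an http(s) URL, to a CSV file.
sav_to_csv <- function(src, csv_file) {
  env <- new.env()
  # load() accepts a file path directly; for a URL, wrap it in url() as above
  source_arg <- if (grepl("^https?://", src)) url(src) else src
  obj_names <- load(source_arg, envir = env)   # names of restored objects
  # Assumption: the file holds one data frame, so take the first object
  df <- get(obj_names[1], envir = env)
  # col.names = NA writes a blank header cell above the row names
  write.table(df, file = csv_file, sep = ",", col.names = NA)
  invisible(obj_names[1])
}

# Offline demonstration with a small made-up data frame and temporary files
demo <- data.frame(id = 1:3, age = c(41, 57, 63))
tmp_sav <- tempfile(fileext = ".sav")
tmp_csv <- tempfile(fileext = ".csv")
save(demo, file = tmp_sav)

loaded_name <- sav_to_csv(tmp_sav, tmp_csv)
back <- read.csv(tmp_csv, row.names = 1)
print(loaded_name)
print(back)
```

For a repository file the call would be, for example, sav_to_csv('https://hbiostat.org/data/repo/foo.sav', 'foo.csv'), with foo standing in for a real dataset name as in the note above.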

2 Other Datasets Available from the Web

3 Longitudinal Datasets

  • Personality and Subjective Age: Evidence from Six Samples (Replication Package): https://hrsdata.isr.umich.edu/sites/default/files/documentation/other/1641492433/HRS_Replication__Package_Stephan_et_al.pdf
  • National Longitudinal Survey of Youth 1997: https://dasil.sites.grinnell.edu/downloadable-data/
  • National Longitudinal Survey of Youth (1997–2012): a longitudinal project that follows a sample of American youth born in 1980–84 across various aspects of life from 1997 to 2012. Download: CSV (41.0 MB)
  • Cebu Longitudinal Health and Nutrition Survey: https://dataverse.unc.edu/dataverse/cebu; Cohort Profile: The Cebu Longitudinal Health and Nutrition Survey
  • The Wisconsin Longitudinal Study (WLS): https://www.ssc.wisc.edu/wlsresearch/
  • National Longitudinal Study of Adolescent to Adult Health, 1994-2008: https://heardlibrary.github.io/digital-scholarship/script/r/nlsaah/
  • The Add Health Study: Design
  • Search studies associated with available BioLINCC resources: https://biolincc.nhlbi.nih.gov/studies/?q=longitudinal
  • Longitudinal Studies of HIV-Associated Lung Infections and Complications (Lung HIV): https://biolincc.nhlbi.nih.gov/studies/lung_hiv/
  • UK Data Service: https://beta.ukdataservice.ac.uk/datacatalogue/studies/?Search=Longitudinal#!?
  • UA Little Rock, Publicly Available Data Sets: https://ualr.edu/irb/files/2019/07/Public-Use-Data-Sets.pdf
  • NCHS Longitudinal Studies of Aging: https://www.cdc.gov/nchs/data_access/ftp_data.htm

Thanks to Drew Levy for compiling this list of longitudinal studies.