Journal Article

Decentralized and reproducible geocoding and characterization of community and environmental exposures for multisite studies

Cole Brokamp, Chris Wolfe, Todd Lingren, John Harley and Patrick Ryan

in Journal of the American Medical Informatics Association

Published on behalf of American Medical Informatics Association

Volume 25, issue 3, pages 309-314
Published in print March 2018 | ISSN: 1067-5027
Published online November 2017 | e-ISSN: 1527-974X | DOI:



Objective

Epidemiological studies frequently geocode study participants' addresses and characterize their geographic, community, and environmental exposures. However, participant addresses are identifiable protected health information (PHI), so geocoding must be conducted in a manner compliant with the Health Insurance Portability and Accountability Act (HIPAA). Our objective was to create a software application for this process that addresses limitations of current approaches.

Materials and Methods

We used a containerization platform to create DeGAUSS (Decentralized Geomarker Assessment for Multi-Site Studies), a software application that facilitates reproducible geocoding and geomarker assessment while maintaining the confidentiality of PHI. To validate the software, we geocoded 215 350 addresses in Hamilton County, Ohio, using DeGAUSS, ArcGIS, Google, and SAS, and compared the results to a gold-standard approach. We then distributed DeGAUSS to sites in an ongoing multisite study (Electronic Medical Records and Genomics, or eMERGE). Each site independently geocoded its participants' addresses, assigned median census tract–level income and distance to the nearest major roadway, removed the associated PHI, and returned deidentified data.
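
The per-site step described above (assign geomarkers, then strip PHI before any data leave the site) can be sketched in Python. This is a minimal illustration under stated assumptions, not DeGAUSS's implementation: it assumes addresses have already been geocoded to coordinates in a projected coordinate system measured in meters, represents major roadways as straight line segments and census tracts as simple polygons, and uses hypothetical names (`deidentify`, `xy`, `dist_to_road_m`, `tract_median_income`) chosen only for this example.

```python
import math
from typing import Optional

# Hypothetical types for this sketch; coordinates are assumed to be in a
# projected CRS with units of meters (not raw latitude/longitude).
Point = tuple[float, float]
Segment = tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from p to the closest point on segment seg."""
    x, y = p
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: treat it as a single point
        return math.hypot(x - x1, y - y1)
    # Project p onto the segment's line, clamping to the segment's endpoints.
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def distance_to_nearest_road(p: Point, roads: list[Segment]) -> float:
    """Minimum distance from p to any major-roadway segment (brute force)."""
    return min(point_segment_distance(p, seg) for seg in roads)


def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray casting: toggle on each polygon edge crossed by a ray from p toward +x."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tract_income(p: Point, tracts: dict[str, tuple[list[Point], int]]) -> Optional[int]:
    """Median income of the first tract polygon containing p, else None."""
    for polygon, income in tracts.values():
        if point_in_polygon(p, polygon):
            return income
    return None


def deidentify(records: list[dict], roads: list[Segment], tracts) -> list[dict]:
    """Assign geomarkers, then return only the study ID and the geomarkers."""
    return [
        {
            "id": r["id"],
            "dist_to_road_m": distance_to_nearest_road(r["xy"], roads),
            "tract_median_income": tract_income(r["xy"], tracts),
        }
        for r in records  # "address" and "xy" (PHI) are deliberately not copied
    ]
```

Only the study ID and the two geomarkers are returned; the address and coordinates never appear in the output. A production tool would use real roadway and tract boundary files and a spatial index rather than scanning every segment and polygon.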

Results

Within the multisite study, the addresses of 52 244 participants across 5 sites were geocoded; the median distance to the nearest major roadway was 10 022 m and the median census tract income was $57 266, demonstrating the feasibility of DeGAUSS within a multisite study. Compared to other commonly used geocoding platforms, DeGAUSS had similar geocoding and geomarker assessment accuracy.
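
The abstract does not state which accuracy metric underlies this comparison. One common choice, assumed here, is the great-circle distance between each platform's geocode and the gold-standard location for the same address, summarized by its median, alongside the share of addresses the platform geocoded at all. A minimal Python sketch, with hypothetical function names (`haversine_m`, `match_rate`, `median_error_m`):

```python
import math
from statistics import median
from typing import Optional

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters

LatLon = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against rounding pushing sqrt(h) slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def match_rate(candidate: dict[str, Optional[LatLon]], gold: dict[str, LatLon]) -> float:
    """Share of gold-standard addresses that the platform geocoded at all."""
    return sum(candidate.get(k) is not None for k in gold) / len(gold)


def median_error_m(candidate: dict[str, Optional[LatLon]], gold: dict[str, LatLon]) -> float:
    """Median distance from each platform geocode to its gold-standard point.

    Addresses the platform did not geocode are skipped; raises
    statistics.StatisticsError if none were geocoded.
    """
    errors = [haversine_m(candidate[k], gold[k]) for k in gold if candidate.get(k) is not None]
    return median(errors)
```

Reporting the match rate separately keeps a platform that silently drops hard-to-geocode addresses from looking more accurate than it is.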

Conclusion

The open source DeGAUSS software overcomes multiple challenges in the use of address data in multisite studies and also serves as a more general reproducible research tool for geocoding and geomarker assessment.

Keywords: geocoding; geomarker assessment; multisite study; reproducible research

Journal Article.  4158 words.  Illustrated.

Subjects: Medical Statistics and Methodology; Bioinformatics and Computational Biology; Biomathematics and Statistics
