__________________________________________________________________

The world's user-generated road map is more than 80% complete:
Data Release

Chris Barrington-Leigh and Adam Millard-Ball
__________________________________________________________________


Table of Contents
_________________

1 DATA SETS
2 COMPILED COUNTRY-LEVEL DATA
3 VISUAL ASSESSMENT
4 AGGREGATED HISTORY
5 CITATION
6 CONTACT


This file describes the data that accompanies "The world's
user-generated road map is more than 80% complete." [citation and DOI
to be added]


1 DATA SETS
===========

Three separate data sets are provided:
- a compiled country-level dataset
- the visual assessment data (Section 3.1 of the paper)
- the OpenStreetMap history, aggregated by density, national and
  sub-national boundaries. Note that the raw history file is available
  online at https://planet.openstreetmap.org/planet/full-history/

The analysis can be recreated using the code at
https://gitlab.com/cpbl/osm-completeness, or
http://sprawl.research.mcgill.ca/PLoS2017/osm-completeness

For the compiled country-level dataset and the OpenStreetMap history, a
more recent analysis is also provided. That is, the files in the main
directory reflect the analysis in the published paper. The files in the
2017-update directory reflect an updated analysis using a version of
the raw history file from April 3, 2017. (Note there is no update to
the visual assessment.)

You can download ALL data available through this release through the
following single file:
http://sprawl.research.mcgill.ca/PLoS2017/Barrington-Leigh-Millard-Ball-PLoSOne2017-data-release-all.zip

You can download just the smaller files (see sections 2 and 3, below)
with the following file:
http://sprawl.research.mcgill.ca/PLoS2017/Barrington-Leigh-Millard-Ball-PLoSOne2017-data-release-small.zip

If you are downloading data in conjunction with the "osm-completeness"
code package, follow download instructions given with that package,
i.e. unzip the "all" file, above, from within the repository folder
("osm-completeness/")


2 COMPILED COUNTRY-LEVEL DATA
=============================

The compiled country-level dataset is provided as a .tsv file and a
Pandas dataframe:
- countries_compiled.pandas
- countries_compiled.tsv

This dataset will be the most relevant for most users. It compiles all
the estimates from the visual assessment and parametric fits, along
with country-level data from the World Bank and other sources.

There is one row for each country, identified by ISOalpha3 (ISO alpha3
country code) plus one row for the entire world (ISOalpha3='ALL').

The fields/columns are defined in country_datadictionary.tsv. See the
paper for complete details. The broad categories of data included are:
(a) Visual assessment data. These measures can be recreated from the
    observation-level data in visual_assessment.tsv (described below)
(b) Data from the OSM history database (these are the actual, not
    modeled, lengths)
(c) Parametric fits data
(d) Combined data from the parametric fits and visual assessment
(e) Worldwide Governance Indicators (secondary data, provided for
    convenience)
(f) Other country-level data (secondary data, provided for convenience)
(g) IRF World Roads data (secondary data, provided for convenience)


3 VISUAL ASSESSMENT
===================

The observation-level dataset from the visual assessment is provided as
a .tsv file and a Pandas dataframe.
- visual_assessment.pandas
- visual_assessment.tsv

Each row corresponds to one observation. See Section 3.1 of the paper
for a full discussion.
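As a rough illustration of working with visual_assessment.tsv, the
sketch below reads a tab-separated file using only the Python standard
library and computes, per country, a weighted ratio of present to total
segments. It relies on the field names defined in this section. The
helper names (read_tsv, weighted_present_share), the toy rows, and the
choice of a weighted ratio are assumptions made for illustration; this
is not necessarily the estimator used in the paper.

```python
import csv
import io
from collections import defaultdict


def read_tsv(source):
    """Read a tab-separated file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(source, delimiter="\t"))


def weighted_present_share(rows, sample_type="popWeighted"):
    """Weighted ratio of present to total segments per country.

    This summary is an assumption for illustration, not the paper's method.
    """
    present = defaultdict(float)
    total = defaultdict(float)
    for row in rows:
        if row["sampleType"] != sample_type:
            continue  # keep only the requested sample
        w = float(row["weight"])
        present[row["ISOalpha3"]] += w * float(row["NpresentSegs"])
        total[row["ISOalpha3"]] += w * float(row["totSegs"])
    # Skip countries whose weighted segment total is zero
    return {iso: present[iso] / total[iso] for iso in total if total[iso] > 0}


# Invented toy rows, for illustration only. With the real file, one would
# pass open("visual_assessment.tsv", newline="") instead of the StringIO.
toy = io.StringIO(
    "ISOalpha3\tNpresentSegs\ttotSegs\tsampleType\tweight\n"
    "CAN\t9\t10\tpopWeighted\t0.5\n"
    "CAN\t5\t10\tpopWeighted\t0.5\n"
    "CAN\t10\t10\thighDensity\t1.0\n"
    "USA\t8\t10\tpopWeighted\t1.0\n"
)
shares = weighted_present_share(read_tsv(toy))
print(shares)
```

The same read_tsv helper should also work for countries_compiled.tsv,
where the world row could be selected by testing ISOalpha3 == "ALL".
Users of Pandas may prefer pandas.read_csv with sep="\t".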
The fields are defined as follows:
- ISOalpha3: ISO alpha3 country code
- country: country name
- NmissingSegs: number of segments missing in the observed area
- NpresentSegs: number of segments present in the observed area
- totSegs: total segments (missing+present)
- density: population density (from Landscan) of the Landscan pixel
  [persons/km2]
- lat: latitude of the sampled point (center of the observed area)
- long: longitude of the sampled point (center of the observed area)
- sampleType: "highDensity" for the sample restricted to the densest
  urban area, "popWeighted" for the probability-weighted sample of the
  country. See Section 3.1 of the paper
- weight: inverse probability sampling weights, constrained to sum to 1
  within each country


4 AGGREGATED HISTORY
====================

The aggregated history is provided as .hd5 files, which can be loaded
using the read_hdf function in Pandas.

There are 6 different files:
- osmHistory_level-1D.hd5: global resolution
- osmHistory_level-0.hd5: country-level resolution
- osmHistory_level-1.hd5: sub-national resolution, based on the GADM
  boundaries (see below)
In each case, a variant with the _density suffix also disaggregates by
population density.

The data are in long form, with each row representing the number and
length of OSM ways on a given date, and (where applicable) intersecting
a given geography in a given density range. The geographic boundaries
are v2.8 of the Global Administrative Areas dataset
(http://www.gadm.org). Note that ways are double counted where they
intersect more than one geography.

The fields are defined as follows:
- date: the date of the analysis
- roadFlag: whether the way is tagged with one of the "road" tags. True
  indicates a road, while False indicates a track, pedestrian path or
  similar way. See Section 3.1 of the paper
- densDecile: the density decile of pixels intersecting the way, as
  calculated from the Landscan density raster. -1 indicates missing
  data.
- freq: number of OSM ways in the dataset
- length: length of OSM ways in the dataset
- ISOalpha3: ISO code of the country (for the country-level dataset)
- id0: the GADM id_0 code for the country (for the sub-national
  dataset). Note that ids are specific to GADM v2.8, and may change in
  subsequent versions of GADM.
- id1: the GADM id_1 code for the sub-national unit (for the
  sub-national dataset). Note that ids are specific to GADM v2.8, and
  may change in subsequent versions of GADM


5 CITATION
==========

Barrington-Leigh, Christopher and Millard-Ball, Adam (2015), "The
world's user-generated road map is more than 80% complete," [citation
to follow]


6 CONTACT
=========

For further questions, please contact:
- Chris Barrington-Leigh, McGill University:
  http://alum.mit.edu/www/cpbl
- Adam Millard-Ball, University of California, Santa Cruz:
  http://people.ucsc.edu/~adammb
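

Returning to the aggregated history files of Section 4: because the
data are in long form, a time series of road length can be obtained by
filtering on roadFlag and summing length within each date. The sketch
below does this for plain dict records using only the Python standard
library. The function name, the toy rows, and the use of text dates are
assumptions for illustration; the units of length and the stored type
of date are not stated here and should be checked against the paper and
the files themselves.

```python
from collections import defaultdict


def road_length_by_date(records, iso=None):
    """Sum `length` over rows flagged as roads, grouped by `date`.

    If `iso` is given, only rows with that ISOalpha3 code are used. Because
    ways are double counted across geographies, summing over all countries
    is not a substitute for the global-resolution file.
    """
    totals = defaultdict(float)
    for row in records:
        if not row["roadFlag"]:
            continue  # skip tracks, pedestrian paths and similar ways
        if iso is not None and row.get("ISOalpha3") != iso:
            continue
        totals[row["date"]] += float(row["length"])
    return dict(sorted(totals.items()))


# Invented toy rows, for illustration only. With the real files, records
# might be obtained via pandas.read_hdf(path).to_dict("records"), which
# needs the PyTables package and possibly a key argument (an assumption,
# not tested here).
toy = [
    {"date": "2015-01-01", "ISOalpha3": "CAN", "roadFlag": True, "densDecile": 3, "freq": 10, "length": 12.5},
    {"date": "2015-01-01", "ISOalpha3": "CAN", "roadFlag": False, "densDecile": 3, "freq": 4, "length": 4.0},
    {"date": "2015-01-01", "ISOalpha3": "CAN", "roadFlag": True, "densDecile": 7, "freq": 2, "length": 2.5},
    {"date": "2015-01-01", "ISOalpha3": "USA", "roadFlag": True, "densDecile": 5, "freq": 20, "length": 30.0},
    {"date": "2016-01-01", "ISOalpha3": "CAN", "roadFlag": True, "densDecile": 3, "freq": 12, "length": 15.0},
]
print(road_length_by_date(toy, iso="CAN"))
```

Summing over densDecile, as done here, collapses the _density
disaggregation; grouping on (date, densDecile) instead would keep it.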