Mapping NYC Open Data pt 1 - MapPluto & GeoJson (on Mac OSX)

I embarked on a data project last week that uses the treasure trove of data on New York City’s Open Data Portal. The project studies the link between NYC Department of Buildings (DOB) violations and gentrification in Brooklyn, and I knew I ultimately wanted to visualize my data on a map. I quickly found that the open data needed a lot of processing to get it into any sort of shape for web-based data visualization. The biggest issue was that the buildings data didn’t include any geolocation fields like latitude and longitude, which made mapping it impossible.


Luckily, I found a huge data gem in MapPluto, an open dataset with massive amounts of data on every single building in NYC, split up by borough and including geolocation data. HOWEVER, even with MapPluto there’s still some processing to do before we can use it. At the time of writing, MapPluto doesn’t offer Brooklyn as GeoJSON, only as a Shapefile or CSV, and we need GeoJSON to map the data on the web. So this is a guide to converting NYC’s MapPluto data into a usable format for web-based dataviz projects!


What are these formats?

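On the MapPluto website, I downloaded the Brooklyn dataset as a Shapefile. A shapefile is really a bundle of files that share one base name, and the one we point our converter at ends in .shp (specifically BKMapPLUTO.shp from the .zip file). Keep the other files from the .zip in the same folder, though: the .dbf holds the attribute table (all the building data), the .shx is an index into the shapes, and the .prj describes the coordinate system. The converter reads them alongside the .shp.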
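
Because the converter reads those sidecar files alongside the .shp, it can help to confirm the unzipped folder is complete before converting. Here’s a minimal bash sketch. The function name check_shapefile_bundle is hypothetical (it is not part of GDAL), and it assumes the usual layout of one shared base name with lowercase .shp/.shx/.dbf/.prj extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GDAL): check that a .shp has its sidecar files.
# ASSUMPTION: all parts share one base name and use lowercase extensions.
check_shapefile_bundle() {
  local shp="$1"
  local base="${shp%.shp}"   # e.g. .../BKMapPLUTO.shp -> .../BKMapPLUTO
  local missing=0 ext
  for ext in shp shx dbf prj; do
    if [ ! -f "$base.$ext" ]; then
      echo "missing: $base.$ext" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 = all four files found
}

# Example, using the folder name from this post:
# check_shapefile_bundle ~/Downloads/bk_mappluto_shapefile_bundle/BKMapPLUTO.shp && echo "bundle looks complete"
```

It returns 0 when all four files are present and prints each missing file otherwise, so it can gate the conversion with &&.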

We can’t use .shp files with dataviz libraries like Leaflet or D3, so we need to convert this data into a format that these libraries can understand - JSON! Or specifically, GeoJSON.

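What is GeoJSON? It’s plain JSON that follows a standard structure: a "FeatureCollection" holding a list of "features", each with a "geometry" (a type such as Point or Polygon, plus its coordinates) and a set of "properties". Mapping software and libraries know how to read that structure. Here’s a simplified example with a single point: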

{
  "crs": {
    "properties": {
      "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
    },
    "type": "name"
  },
  "features": [
    {
      "geometry": {
        "coordinates": [
          -73.98066920914556,
          40.673810438749584
        ],
        "type": "Point"
      },
      "properties": {},
      "type": "Feature"
    }
  ],
  "type": "FeatureCollection"
}

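A data object like the one above can be fed into a mapping library, which will know where to put the point on a map! One gotcha: GeoJSON lists coordinates as [longitude, latitude], so -73.98… is the longitude and 40.67… is the latitude, the reverse of the familiar “lat, long” order.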
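
To see that structure without any GIS tooling, here’s a small bash sketch that prints a one-point FeatureCollection for a given longitude and latitude. The helper name point_feature_collection is hypothetical, and it does no escaping or validation; it only exists to show the shape of the data and the longitude-first order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a one-point GeoJSON FeatureCollection.
# Arguments are longitude first, then latitude, because that is GeoJSON's order.
# No validation or escaping: illustration only.
point_feature_collection() {
  local lon="$1" lat="$2"
  printf '{"type": "FeatureCollection", "features": [{"type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [%s, %s]}}]}\n' "$lon" "$lat"
}

# Example: the Brooklyn point from above, saved as a file a mapping library could load.
# point_feature_collection -73.98066920914556 40.673810438749584 > point.geojson
```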


Converting a shapefile to GeoJSON

We can convert the Shapefile into GeoJSON in the terminal using ogr2ogr, a command-line tool that comes with the GDAL library. On a Mac, install GDAL with Homebrew:

brew install gdal

Once it’s installed, we have access to the ogr2ogr command. Its general format is:

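# Command format:
# -f "file_format"      the format to write, e.g. "GeoJSON"
# -t_srs crs_type       the CRS to reproject the output into
# destination_data      path of the file to create
# source_data           path of the file to read

ogr2ogr -f "file_format" -t_srs crs_type destination_data source_data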

We also need to convert MapPluto’s CRS, not just its file format. CRS stands for “Coordinate Reference System”, and it determines what the numbers in each coordinate mean. MapPluto ships in a New York State Plane projection (Long Island zone, EPSG:2263), whose coordinates are measured in feet rather than degrees of longitude and latitude, so if we keep the default CRS we won’t be able to use this data in online data-viz. We need to convert it to CRS:84, a world-standard longitude/latitude system (WGS 84, the one GeoJSON expects), which lets us plot on basically any web map like OpenStreetMap or Google Maps.
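
Before converting, you can peek at the CRS the shapefile actually ships with: it’s stored as WKT text in the .prj file next to the .shp. For MapPluto, expect a State Plane projection measured in feet rather than a longitude/latitude system. A minimal bash sketch, with a hypothetical helper name and the assumption that the .prj shares the .shp’s base name:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GDAL): print the source CRS of a shapefile.
# ASSUMPTION: the .prj sits next to the .shp and shares its base name.
show_source_crs() {
  local prj="${1%.shp}.prj"
  if [ ! -f "$prj" ]; then
    echo "no .prj found at $prj; ogr2ogr would need -s_srs to know the source CRS" >&2
    return 1
  fi
  cat "$prj"   # the CRS as WKT text
}

# Example:
# show_source_crs ~/Downloads/bk_mappluto_shapefile_bundle/BKMapPLUTO.shp
```

If the .prj is missing, ogr2ogr can’t tell what it’s reprojecting from, and you’d have to name the source CRS yourself with -s_srs.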

The command above takes four arguments: the output format, the target CRS, and the destination and source paths. Fill them in and let it run! It may take several minutes since the MapPluto file is almost 500MB in size!

# Replace the ~/Downloads paths with the path to your source .shp file and the path where you want the .geojson file to go.

ogr2ogr -f "GeoJSON" -t_srs crs:84 ~/Downloads/1.geojson ~/Downloads/bk_mappluto_shapefile_bundle/BKMapPLUTO.shp 
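
If you run the conversion more than once, a small wrapper can save some head-scratching. This is a bash sketch under stated assumptions, not part of GDAL: the name shp_to_geojson is hypothetical, and its checks (two arguments given, source exists, destination doesn’t, ogr2ogr is installed) are just guardrails around the same ogr2ogr command shown above. ogr2ogr’s GeoJSON output typically refuses to overwrite an existing file, so the sketch checks for that up front and prints a clearer message.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of GDAL) around the ogr2ogr call above.
# Usage: shp_to_geojson SOURCE.shp DESTINATION.geojson
shp_to_geojson() {
  if [ "$#" -ne 2 ]; then
    echo "usage: shp_to_geojson SOURCE.shp DESTINATION.geojson" >&2
    return 2
  fi
  local src="$1" dest="$2"

  # Fail early if the shapefile isn't where we think it is.
  if [ ! -f "$src" ]; then
    echo "source not found: $src" >&2
    return 1
  fi

  # ASSUMPTION: ogr2ogr's GeoJSON output won't overwrite an existing file,
  # so refuse up front with a clearer message.
  if [ -e "$dest" ]; then
    echo "destination already exists: $dest (delete it or pick a new name)" >&2
    return 1
  fi

  # ogr2ogr ships with GDAL (brew install gdal).
  if ! command -v ogr2ogr >/dev/null 2>&1; then
    echo "ogr2ogr not found; install GDAL first" >&2
    return 127
  fi

  # Same conversion as above: write GeoJSON, reprojected to CRS:84.
  # Quoting both paths keeps folders with spaces working.
  ogr2ogr -f "GeoJSON" -t_srs crs:84 "$dest" "$src"
}
```

Usage: shp_to_geojson ~/Downloads/bk_mappluto_shapefile_bundle/BKMapPLUTO.shp ~/Downloads/1.geojson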

When this finishes, you should have a massive file containing over 275,000 features, one for every building in Brooklyn.
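
A quick sanity check is to count the features in the output. GDAL’s own ogrinfo -so -al ~/Downloads/1.geojson prints a “Feature Count” line and is the more robust option. As a lighter-weight alternative, here’s a grep-based bash sketch. The helper name is hypothetical, and it assumes ogr2ogr wrote one feature per line (each containing "type": "Feature"), so it isn’t a general GeoJSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GDAL): count features in ogr2ogr's GeoJSON output.
# ASSUMPTION: one feature per line, each containing "type": "Feature" (how ogr2ogr
# lays out GeoJSON). The pattern won't match "FeatureCollection" because of the
# closing quote. This is a quick check, not a general GeoJSON parser.
count_geojson_features() {
  grep -c '"type": "Feature"' "$1"
}

# Example:
# count_geojson_features ~/Downloads/1.geojson
```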

You can also visit a repository I’ve started to host all of the data I’m working with on this project. Feel free to download the file called mappluto_brooklyn.geojson from there, and stay tuned for more posts about mapping this data on the web!

