This post was contributed by a community member. The views expressed here are the author's own.


Mapping the location of building permit submissions in Watertown

Each week I look for data published on the Watertown website and try to present it in a more interesting form than a PDF, Excel or PowerPoint document. Today I focused on building permits.

Hi,

Each week I try to find and hack on data published in the Watertown document center, with the aim of presenting it in a way people might find more interesting than a PDF, Excel or PowerPoint presentation. Today I decided to focus on building permit submissions. We’re about to do some water damage work in my daughter’s room, so this felt like a relevant area to hack on.

The Patch blog doesn't allow embedding of dynamic maps, so you'll have to either head over to my blog to view the full details or click on the attached photo to get a sense of how this works.


Hack Goal

Determine the effort involved in creating an automated process to place issued building permits from Watertown, MA on a Google map.

Data for this hack

Tools used in this hack

My Hack Results and a description

Overall I’m pretty happy with how this hack turned out. I was able to take some pretty interesting information that was locked up in a very boring PDF and display it on a map. Here are the hoops I had to jump through to get this working.


  1. Download the PDF from the Watertown website.
  2. Open up DeskUNPDF and use its conversion tool to identify the tabular data in the PDF and extract it as a comma separated value (CSV) file.
  3. Write a little Ruby script that cleans up the data further, sends each address to the Google Geocoding API to get latitude and longitude coordinates, and writes the results out as a new CSV file for Socrata to take over.
  4. Upload the new data set to Socrata and combine the lat/long fields into a location field.
  5. Use Socrata to generate the Google Map view.
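
For anyone who wants to try something similar, here is a minimal sketch of what step 3 could look like. It is an illustration rather than the exact script used for this hack: the `address` column name, the helper names and the choice to append "Watertown, MA" to every street address are all assumptions, and you would need your own Google API key.

```ruby
require "csv"
require "json"
require "net/http"
require "uri"

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

# Collapse stray whitespace left over from the PDF extraction and append the
# town so the geocoder does not match a same-named street somewhere else.
def clean_address(raw)
  street = raw.to_s.strip.gsub(/\s+/, " ")
  return nil if street.empty?
  "#{street}, Watertown, MA"
end

# Build the Geocoding API request URI for one address.
def geocode_uri(address, api_key)
  uri = URI(GEOCODE_ENDPOINT)
  uri.query = URI.encode_www_form(address: address, key: api_key)
  uri
end

# Pull [lat, lng] out of a Geocoding API JSON response, or nil if no match.
def parse_location(body)
  data = JSON.parse(body)
  return nil unless data["status"] == "OK"
  location = data.dig("results", 0, "geometry", "location")
  location && [location["lat"], location["lng"]]
end

# Perform the network call for a single address.
def geocode(address, api_key)
  parse_location(Net::HTTP.get(geocode_uri(address, api_key)))
end

# Read the extracted CSV, geocode each row, and write a new CSV with
# latitude/longitude columns appended.
# Assumes a header row containing an "address" column (hypothetical name).
def add_coordinates(in_path, out_path, api_key)
  rows = CSV.read(in_path, headers: true)
  CSV.open(out_path, "w") do |out|
    out << rows.headers + %w[latitude longitude]
    rows.each do |row|
      address = clean_address(row["address"])
      lat, lng = address ? geocode(address, api_key) : nil
      out << row.fields + [lat, lng]
    end
  end
end
```

Calling `add_coordinates("permits.csv", "permits_geocoded.csv", ENV["GOOGLE_API_KEY"])` would produce the file to upload in step 4 (both file names are placeholders). A real run would also want to pause between requests to stay within Google's usage limits.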

How to improve this

Automation

  1. Automate the downloading of building permit PDFs from the Watertown website
  2. Detect when new PDFs are available for download
  3. Script the PDF to CSV conversion using the DeskUNPDF command line interface rather than the UI
  4. Automate the updating of the dataset on Socrata
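
Items 1 and 2 could be handled by a small script along these lines. This is only a rough sketch under a few assumptions: that the permit PDFs are linked from a single listing page (you pass in its URL), that a plain text file of already-downloaded URLs is enough to detect new ones, and that the helper names are my own invention.

```ruby
require "net/http"
require "set"
require "uri"

# Extract the absolute URL of every PDF linked from an HTML page.
def pdf_links(html, base_url)
  html.scan(/href=["']([^"']+\.pdf)["']/i)
      .flatten
      .map { |href| URI.join(base_url, href).to_s }
      .uniq
end

# Return only the links not yet recorded in the seen-file (one URL per line).
def unseen_links(links, seen_path)
  seen = File.exist?(seen_path) ? File.readlines(seen_path, chomp: true).to_set : Set.new
  links.reject { |link| seen.include?(link) }
end

# Download each new PDF into dir and record its URL as seen.
# listing_url is whatever page lists the permit PDFs (not specified here).
def fetch_new_pdfs(listing_url, dir, seen_path)
  html = Net::HTTP.get(URI(listing_url))
  unseen_links(pdf_links(html, listing_url), seen_path).each do |link|
    target = File.join(dir, File.basename(URI(link).path))
    File.binwrite(target, Net::HTTP.get(URI(link)))
    File.open(seen_path, "a") { |f| f.puts(link) }
  end
end
```

Run from a scheduled job, `fetch_new_pdfs` would only download files it has not seen before, which leaves the PDF-to-CSV conversion and the Socrata update as the remaining steps to script.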

Data integrity

One major sticking point is that DeskUNPDF isn’t picking up the first two rows in each PDF, so I need to ask them if they know what might be going on. Missing the first two rows isn’t a huge deal, but I’d like the dataset to be accurate.

Trends

While I think seeing the individual permits on a map is better than looking at a list in a PDF, I would like to see this data in graph form with permits plotted over time. Obviously the summer months are busy for permits, but I would be quite interested to see how weeks and months compare with previous years.
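
As a starting point for that kind of graph, permits could be counted per month straight from the CSV. The sketch below assumes a column named `issue_date` holding dates like `07/02/2012`; both the column name and the date format are guesses and would need to match the real data.

```ruby
require "csv"
require "date"

# Count permits per calendar month, keyed as "YYYY-MM" and sorted by month.
# date_column and date_format are assumptions about the extracted CSV.
def permits_per_month(csv_text, date_column: "issue_date", date_format: "%m/%d/%Y")
  counts = Hash.new(0)
  CSV.parse(csv_text, headers: true).each do |row|
    raw = row[date_column].to_s.strip
    next if raw.empty?
    date = begin
      Date.strptime(raw, date_format)
    rescue ArgumentError
      next # skip rows whose date did not survive the PDF extraction
    end
    counts[date.strftime("%Y-%m")] += 1
  end
  counts.sort.to_h
end
```

Feeding the result into any charting tool would give the month-by-month view, and comparing the same month across different years is then a matter of matching up the keys.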

Code

Not a ton of code here, but you can find the Ruby script below. If others are interested in helping me write scrapers for this data, the scripts will be updated on GitHub.

Thanks,

Matt

