Mapping the location of building permit submissions in Watertown

Each week I look for data published on the Watertown website and for ways to present it that are more interesting than a PDF, Excel or PowerPoint document. Today I focused on building permits.


Each week I try to find and hack on data published in the Watertown document center, with the aim of presenting it in a way people might find more interesting than a PDF, Excel or PowerPoint file. Today I decided to focus on the building permit submissions. We’re about to do some water damage work in my daughter’s room, so this felt like a relevant area to hack on.

The Patch blog doesn't allow embedding of dynamic maps, so you'll have to either head over to my blog to view the full details or click on the attached photo to get a sense of how this works.

Hack Goal

Determine the effort involved in creating an automated process to place issued building permits from Watertown, MA on a Google map.

Data for this hack

Tools used in this hack

My Hack Results and a description

Overall I’m pretty happy with how this hack turned out. I was able to take a very boring presentation of this pretty interesting information that was locked up in a PDF and display it on a map. Here are the hoops that I had to jump through to get this working.

  1. Download the PDF from the Watertown website.
  2. Open DeskUNPDF and use its conversion tool to identify the tabular data in the PDF and extract it as a comma-separated values (CSV) file.
  3. Write a little Ruby script that cleans up the data further and sends each address to the Google Geocoding API to get latitude and longitude coordinates. The script then writes the results out as a new CSV file for Socrata to take over.
  4. Upload the new data set to Socrata and map the lat/long fields into a location field.
  5. Use Socrata to generate the Google Map view.
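
For anyone curious what step 3 might look like, here is a minimal Ruby sketch. It is not the actual script from this hack. The `address` column name, the `TOWN_SUFFIX` constant and the helper names are all assumptions made for illustration, and current versions of the Google Geocoding API expect an API key to be passed along with the address.

```ruby
require "csv"
require "json"
require "net/http"
require "uri"

# Assumption: the converted rows only carry a street address,
# so the town is appended before geocoding.
TOWN_SUFFIX = "Watertown, MA"

# Collapse stray whitespace and append the town.
# Returns nil for blank addresses.
def clean_address(raw)
  street = raw.to_s.strip.gsub(/\s+/, " ")
  street.empty? ? nil : "#{street}, #{TOWN_SUFFIX}"
end

# Ask the Google Geocoding API for coordinates.
# Returns [lat, lng], or nil when the API reports no match.
def google_geocode(address, api_key)
  uri = URI("https://maps.googleapis.com/maps/api/geocode/json")
  uri.query = URI.encode_www_form(address: address, key: api_key)
  body = JSON.parse(Net::HTTP.get(uri))
  return nil unless body["status"] == "OK"
  location = body["results"].first["geometry"]["location"]
  [location["lat"], location["lng"]]
end

# Read the converted CSV, geocode each row and write a new CSV with
# latitude and longitude columns appended.
# `geocoder` is any callable that takes an address and returns
# [lat, lng] or nil.
def add_coordinates(input_path, output_path, geocoder)
  rows = CSV.read(input_path, headers: true)
  CSV.open(output_path, "w") do |out|
    out << rows.headers + ["latitude", "longitude"]
    rows.each do |row|
      address = clean_address(row["address"])
      lat, lng = address && geocoder.call(address)
      out << row.fields + [lat, lng]
    end
  end
end
```

A call might look like `add_coordinates("permits.csv", "permits_geocoded.csv", ->(a) { google_geocode(a, ENV["GOOGLE_API_KEY"]) })`, where both file names are placeholders. Passing the geocoder in as a lambda keeps the CSV handling easy to try out without touching the network.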

How to improve this


  1. Automate the downloading of building permit PDFs from the Watertown website
  2. Detect when new PDFs are available for download
  3. Script the PDF to CSV conversion using the DeskUNPDF command line interface rather than the UI
  4. Automate the updating of the dataset on Socrata
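
As a rough idea of how item 2 could work, one option is to fingerprint each downloaded PDF and compare it with the fingerprints from earlier runs. The sketch below assumes the PDF bytes have already been downloaded; the state file and the method names are hypothetical and nothing here is specific to the Watertown website.

```ruby
require "digest"
require "json"

# Load the digests recorded on earlier runs.
# `state_path` is a hypothetical JSON file holding an array of digests.
def load_seen(state_path)
  File.exist?(state_path) ? JSON.parse(File.read(state_path)) : []
end

# Return true, and record the digest, when these PDF bytes have not
# been seen before. Return false for a PDF that was already processed.
def new_pdf?(pdf_bytes, state_path)
  digest = Digest::SHA256.hexdigest(pdf_bytes)
  seen = load_seen(state_path)
  return false if seen.include?(digest)
  File.write(state_path, JSON.generate(seen + [digest]))
  true
end
```

Comparing content digests rather than file names means a PDF that is re-posted under the same name with new permits would still be picked up.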

Data integrity

One major sticking point is that DeskUNPDF isn’t picking up the first two rows in each PDF, so I need to ask them whether they know what might be going on. Missing the first two rows isn’t a huge deal, but I’d like the dataset to be accurate.


While I think seeing the individual permits on a map is better than looking at a list in a PDF, I would also like to see this data in graph form, with permits plotted over time. Summer months are presumably high for permits, but I would be quite interested to see how weeks and months compare with previous years.
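
A first step toward that kind of graph could be counting permits per month from the CSV. This is only a sketch: the `issue_date` column name and the `%m/%d/%Y` date format are guesses about what the converted file contains, not something taken from the actual data.

```ruby
require "csv"
require "date"

# Count permits per calendar month, keyed as "YYYY-MM".
# The default column name and date format are assumptions
# about the converted CSV.
def permits_per_month(csv_path, date_column: "issue_date", format: "%m/%d/%Y")
  counts = Hash.new(0)
  CSV.foreach(csv_path, headers: true) do |row|
    value = row[date_column]
    next if value.nil? || value.strip.empty? # skip rows without a date
    month = Date.strptime(value.strip, format).strftime("%Y-%m")
    counts[month] += 1
  end
  counts.sort.to_h # chronological order, since keys sort as strings
end
```

The resulting hash could then be handed to whatever charting tool is convenient, and the same grouping by week would only need a different `strftime` pattern.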


There's not a ton of code here, but you can find the Ruby script below. If others are interested in helping me write scrapers for this data, the scripts will be kept up to date on GitHub.



This post is contributed by a community member. The views expressed in this blog are those of the author and do not necessarily reflect those of Patch Media Corporation.

Matt MacDonald September 06, 2011 at 10:26 AM
Hi Sonny, I'm still interested in the crime/fire incident data and while I continue to press on that front I'm looking for interesting information that the town makes available and how it can be presented in more useful ways. Thanks, Matt
david brooks September 08, 2011 at 03:37 AM
Sonny, sounds like you know what its like to wet yourself
David Simpson September 20, 2011 at 05:11 PM
For Matt: http://www.buildfax.com/

