overplot: Overheard in New York meets Google Maps

I've been a fan of Overheard in New York for a while. At some point, it occurred to me that each quote has a pretty precise location attached to it and that it would be cool to plot all of them on a map. I eventually got motivated enough to actually do it. The result is overplot, a Google Maps API-powered visualization of all of the quotes I could get my hands on.

Technical Details

The most basic issue with implementing this is geocoding all of the location strings (like "Canal & Broadway") to latitude/longitude pairs. When I began this, the best option seemed to be geocoder.us. While it was good enough for prototyping, it quickly became apparent that the quality of its data was not good enough. The most severe problem seemed to be that addresses in the southern part of Manhattan were off by half a block to the northwest. Thankfully, a few weeks after I began working on this, Google added a geocoding component to their API. Their geocoder turned out to be much faster and more reliable. It is not perfect, but since the set of addresses is pretty tightly constrained, I was able to add some rewriting rules to make the input more easily parsed. As of right now, 54% of the addresses are geocoded. One remaining issue is that some of the locations are actually businesses (like "Saks Fifth Avenue"). I may be able to use the Google AJAX Search API to do local searches for them and get their actual addresses.
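To make the idea of rewriting rules concrete, here is a minimal sketch in JavaScript. The rules themselves, the normalizeLocation name, and the ", New York, NY" suffix are assumptions made up for illustration; they are not the rules overplot actually uses.

```javascript
// Hypothetical rewriting rules, applied in order. Each entry is [pattern, replacement].
const REWRITE_RULES = [
  [/\s*&\s*/g, ' and '],        // "Canal & Broadway" -> "Canal and Broadway"
  [/\bAve\b\.?/gi, 'Avenue'],   // "5th Ave." -> "5th Avenue"
  [/\s+/g, ' '],                // collapse runs of whitespace
];

// Turns a raw location string into something a geocoder is more likely to parse.
function normalizeLocation(raw) {
  let result = raw.trim();
  for (const [pattern, replacement] of REWRITE_RULES) {
    result = result.replace(pattern, replacement);
  }
  // Assumed suffix: constrain the geocoder to New York.
  return result.trim() + ', New York, NY';
}
```

Because every location is known to be in New York, appending the city is a cheap way to disambiguate intersections that also exist elsewhere.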

I didn't want to directly scrape the HTML of the site to extract all of the quotes. I ended up using the data stored in Google Reader's archive of the site's feed. This allowed me to get at the quotes themselves more easily, without having to worry about the chrome of the site. Going back to October 6, 2005, I had in hand 5,578 quotes at 2,838 locations.

Since I had accumulated so many quotes, indicating them with the standard marker object caused performance problems, because so many can be visible in the same area. This is a well-known problem with version 2 of the Maps API, and the traditional solution is clustering. However, clustering doesn't really make sense here: when zoomed in at the street level, one has no choice but to display the hundred or so markers that are visible at the same time (the southern part of Manhattan is particularly densely covered).

In the end, my solution was to write my own marker implementation. Instead of each marker being its own overlay, I put all of them in the same overlay (see the QuotesOverlay class). Additionally, I did not split each marker into several layers (shadow, image, click area); having the shadow be part of the image works well enough. In order to deal with clicks while in the shadow of an info window, I added a global click handler that checks if the click event falls within the boundaries of a marker (see handleMapClick). Finally, I hide and show markers so that only those bounded by the currently viewed area are displayed. All of this led to a performance improvement by a factor of 8 in Firefox on the Mac.
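Two of the ideas above, showing only the markers inside the viewed area and hit-testing clicks in one global handler, can be sketched roughly as follows. This is not the actual QuotesOverlay or handleMapClick code: the function names, the marker object shape, and the icon size are all assumptions, and markers are assumed to already carry pixel coordinates.

```javascript
// Hypothetical icon size in pixels; the icon is assumed to be anchored at its bottom center.
const MARKER_WIDTH = 12;
const MARKER_HEIGHT = 20;

// Returns only the markers whose lat/lng falls inside the viewed bounds.
// (Ignores bounds that cross the antimeridian, which is fine for Manhattan.)
function visibleMarkers(markers, bounds) {
  return markers.filter((m) =>
    m.lat >= bounds.south && m.lat <= bounds.north &&
    m.lng >= bounds.west && m.lng <= bounds.east);
}

// Returns the marker whose icon rectangle contains the click point, or null.
// Later entries are assumed to be drawn on top, so the array is scanned backwards.
function findMarkerAt(markers, clickX, clickY) {
  for (let i = markers.length - 1; i >= 0; i--) {
    const m = markers[i];
    const left = m.x - MARKER_WIDTH / 2;
    const top = m.y - MARKER_HEIGHT;
    if (clickX >= left && clickX <= left + MARKER_WIDTH &&
        clickY >= top && clickY <= m.y) {
      return m;
    }
  }
  return null;
}
```

With a single handler like this, the markers themselves need no per-element event listeners, which is part of what makes one shared overlay cheap.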

I then ran into a user interface problem. Even with marker rendering being efficient, when zoomed out the island of Manhattan becomes a sea of red, which is not very useful. I decided that switching to a neighborhood mode made more sense there. The problem was where to get the neighborhood boundaries. The map used in cabs seemed like a good start, but it made some interesting choices (e.g. TriBeCa was not bounded by Canal St.). New York Magazine's map also turned out to be of dubious quality (e.g. it combined Gramercy and Murray Hill). In the end, the best resource turned out to be Wikipedia's list. Even there, there is some debate (a mild edit war over the southern border of Spanish Harlem, 86th or 96th Street, appears to be going on). I'm not entirely satisfied with the current divisions, but they generally make sense.
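The post does not say how quotes get assigned to neighborhoods once the boundaries are known. One common approach, shown here purely as an assumption, is a ray-casting point-in-polygon test against each boundary. The function names and the { name, boundary } data shape are hypothetical.

```javascript
// Ray-casting point-in-polygon test. `polygon` is an array of {lat, lng} vertices.
function pointInPolygon(point, polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const a = polygon[i];
    const b = polygon[j];
    // Does the edge a-b straddle the point's latitude, with the crossing to its east?
    const crosses = (a.lat > point.lat) !== (b.lat > point.lat) &&
      point.lng < ((b.lng - a.lng) * (point.lat - a.lat)) / (b.lat - a.lat) + a.lng;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Returns the name of the first neighborhood containing the point, or null.
function neighborhoodFor(point, neighborhoods) {
  const match = neighborhoods.find((n) => pointInPolygon(point, n.boundary));
  return match ? match.name : null;
}
```

Since the boundaries rarely change, this lookup could be done once when the data is generated rather than in the browser.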

Once I actually plotted the 28 neighborhoods on the map, I noticed that it was slow to redraw, especially in Firefox. I was using GPolylines to draw the neighborhood outlines, one per area. Possibly due to the switch to SVG for polyline rendering, this seemed to have high overhead. In the end, similar to my solution to the marker performance problem, I switched to a single overlay for all neighborhood outlines. Inside this overlay I used a canvas object to do my own rendering. To deal with MSIE's lack of a canvas implementation, I used the ExplorerCanvas implementation of the interface on top of MSIE's VML API.
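As a rough sketch of the single-canvas approach, the function below strokes every outline in one pass using the standard 2D canvas path methods (beginPath, moveTo, lineTo, closePath, stroke). The drawOutlines name and the data shape are assumptions, not the overlay's real code, and the projection from latitude/longitude to pixels is assumed to have happened already.

```javascript
// Draws every neighborhood outline onto one 2D canvas context.
// `outlines` is an array of polygons; each polygon is an array of {x, y} pixel points.
function drawOutlines(ctx, outlines) {
  ctx.beginPath();
  for (const points of outlines) {
    if (points.length < 2) continue; // nothing to draw for a lone point
    ctx.moveTo(points[0].x, points[0].y);
    for (let i = 1; i < points.length; i++) {
      ctx.lineTo(points[i].x, points[i].y);
    }
    ctx.closePath(); // connect the last vertex back to the first
  }
  ctx.stroke(); // a single stroke call covers all polygons
}
```

In a browser, ctx would come from calling getContext('2d') on the canvas element placed inside the overlay.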

The site loads all of the data upfront (in a 728K almost-but-not-quite JSON file). This allowed me to do a simple client-side search that works pretty quickly once all of the quotes are tokenized (done the first time a search is executed).
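A client-side search with lazy tokenization might look something like the sketch below. The QuoteSearch class, its tokenizer, and the "every query word must match" rule are assumptions for illustration; the actual search code may work differently.

```javascript
// Hypothetical search index that tokenizes quotes the first time search() is called.
class QuoteSearch {
  constructor(quotes) {
    this.quotes = quotes; // array of quote strings
    this.index = null;    // Map from token to Set of quote indices, built on demand
  }

  // Lowercases the text and splits it into runs of letters, digits, and apostrophes.
  static tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9']+/g) || [];
  }

  buildIndex() {
    this.index = new Map();
    this.quotes.forEach((quote, i) => {
      for (const token of QuoteSearch.tokenize(quote)) {
        if (!this.index.has(token)) this.index.set(token, new Set());
        this.index.get(token).add(i);
      }
    });
  }

  // Returns the quotes that contain every token of the query, in original order.
  search(query) {
    if (this.index === null) this.buildIndex(); // pay the tokenization cost only once
    const tokens = QuoteSearch.tokenize(query);
    if (tokens.length === 0) return [];
    let matches = null;
    for (const token of tokens) {
      const hits = this.index.get(token) || new Set();
      matches = matches === null
        ? new Set(hits)
        : new Set([...matches].filter((i) => hits.has(i)));
    }
    return [...matches].sort((a, b) => a - b).map((i) => this.quotes[i]);
  }
}
```

Deferring the index build keeps the initial page load focused on the map, and visitors who never search never pay for tokenization.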

Update on 12/17/2012: The code, including the scraping/generation tools, is now available on GitHub.

Update on 12/1/2013: I have migrated the web UI from v2 to v3 of the Google Maps API (the v2 API was deprecated a few years after overplot was released). However, none of the tools used in data generation (geocoding, neighborhood boundary drawing, etc.) have been migrated. Additionally, I did a minimal migration, and have not investigated whether any of the performance hacks that I did in 2006 (e.g. a custom overlay for drawing lots of markers) are still necessary.

11 Comments

Wow! That's probably one of the best Gmaps mashups I've ever seen... And that's an awesome solution you came up with for the size problem.
Hi,

I've read your blog for a long long time and this post tipped me over the edge to make a comment - I love it, serious kudos for putting such effort in and making such a slick hack. You rock.

akaDruid
Very nicely done! Yet I did not find all of the Overheard in NY quotes on your map. Why?
Very well done, thanks!

Feature suggestion: If there was something like "link this page" one could send cites to friends.
Very nice work! and very good explanation as well!!
BTW, is the data feed dynamic? or it's pretty much static?
NYC Red Light Cameras Google Maps
http://www.photoenforced.com/ny.html
I have never seen this site before, it’s hilarious! Thank you for making and showing us this great “Gmaps mashup”. =)
Hi Mihai,

I must say that I am very very impressed with what you have done with overplot. I'm working on my own little google maps project as an exercise to get my feet wet with javascript and I am taking some ideas from what you've done and trying to apply them in my project.

I really like the way that you combine multiple quotes into a single overlay object and I have found a way to do something similar, but I have issues with the overlay not resizing if the content grows after the overlay is initially displayed.

Could you give me any tips on how you get the overlay to resize based on the size of the quote as you click through multiple quotes? Maybe just point me to the right place in code so I can work on understanding how you have accomplished this.

Thanks,
Chris
cjmartin[at]gmail[dot]com

P.S. my project is located at:
http://idt.gatech.edu/~cmartin9/geocash
if you would like to check it out.
Chris, you can look at QuoteSet.prototype.displayQuote in http://persistent.info/overplot/classes.js to see how I handle refreshing of the info window. Basically toggling the display of the contents and calling openInfoWindow makes it refresh.
Very nice mashup. I think that's the best one I've seen. Killer work!!
This is very interesting indeed. With your permission, I wanted to implement the neighborhood part in my map, but keep some parts of my own, in particular my info window. I commented out the parts of your code that deal with the points and managed to get the neighborhoods to work, but I am having two problems.
1- I cannot click on my markers anymore.
2- I cannot get the neighborhood overlay to clear for zooms higher than 15.
Is there any suggestion so I can just get the neighborhood map?

thank you in advance for any help you can provide.
http://pizzainny.com/pizza_delivery_map.php
