Strangeloop 2011 Day 2

I’m headed back home from Strangeloop 2011 this morning. Once again I booked an early flight, so I was up at 4:45 to get to the airport (when will I learn?). The conference was a smashing success as far as I am concerned. It was extremely well run and the talks were full of solid content. I saw far less marketing than I’ve seen at other conferences, which was really nice. Most of the marketing I did see was companies trying to recruit new developers. There seems to be a lot of demand out there right now for innovative thinkers and people who are eager to stay on the cutting edge. Makes me think…

I started the day with a talk by Jake Luciani called “Hadoop and Cassandra”. It was an introduction to a tool called Brisk, which takes some of the pain out of bringing up Hadoop clusters and running MapReduce jobs. In essence it embeds the components of Hadoop inside Cassandra, making it easy to deploy and to scale with no downtime. It replaces HDFS with CassandraFS, which in and of itself looks really interesting: it turns the Cassandra DB into a distributed file system. How they are doing that sounds like a topic for another post once I’ve had time to read more about it. Jake showed a demo that looked quite impressive: he brought up a four-node Hadoop-on-Cassandra cluster and ran a portfolio manager application, splitting it into an OLTP side and an OLAP side. Brisk definitely deserves further investigation.


Getting data from the Infochimps Geo API in R

I am very intrigued by the Infochimps Geo API, so I wanted to play around with it a little and pull the data into R. I’ll start by getting data from the American Community Survey Topline API for a 10 km radius around where I live.

First, some setup code. It loads a couple of libraries that we’ll need (RJSONIO and ggplot2), then sets up some variables that we’ll use later to construct the REST call into Infochimps Geo.

library(RJSONIO)
library(ggplot2)

api.uri <- "http://api.infochimps.com/"
acs.topline <- "social/demographics/us_census/topline/search?"
api.key <- "apikey=xxxxxxxxxx"

radius <- 10000  # in meters
lat <- 44.768202
long <- -91.491603

columns <- c("geography_name", "median_household_income",
             "median_housing_value", "avg_household_size")

Note: if you want to use this code, you’ll need to replace the x’s in api.key with your own Infochimps API key.
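
As a rough sketch of how the variables above might be combined into a request URL, something like the following could work. The query parameter names (g.radius, g.latitude, g.longitude) and the build.uri helper are assumptions for illustration, not something confirmed here, so check them against the Infochimps documentation before relying on them. The sketch only builds the string and makes no network call.

```r
# Sketch only: the g.* parameter names and build.uri are illustrative assumptions.
api.uri <- "http://api.infochimps.com/"
acs.topline <- "social/demographics/us_census/topline/search?"
api.key <- "apikey=xxxxxxxxxx"  # placeholder, replace with your own key

radius <- 10000  # in meters
lat <- 44.768202
long <- -91.491603

# Concatenate the base URL, endpoint, key, and geographic filters into one string.
build.uri <- function(base, endpoint, key, radius, lat, long) {
  paste0(base, endpoint, key,
         "&g.radius=", radius,
         "&g.latitude=", lat,
         "&g.longitude=", long)
}

uri <- build.uri(api.uri, acs.topline, api.key, radius, lat, long)
print(uri)
```

Wrapping the concatenation in a small function makes it easy to try a different radius or location without retyping the whole URL.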
