Visualising house price data with IPython

Data downloaded from the Land Registry.

I’ve become interested in house prices recently, and there is so much data available. I’ve also only recently discovered the wonders of IPython notebooks, and of handling data with the Python library pandas. So it seems like a good opportunity to learn how to use these tools and how to deal with geographical data (I have a sneaking suspicion this will be trickier than I expect).

Doing this I want to learn:

  • Using IPython notebooks
  • Data processing with pandas
  • Plotting geographical data
  • Something about house prices, maybe

Tools to use:

So much data

There is a whole ton of government data for the UK available from http://data.gov.uk/, including loads relating to house prices. There is a handy page from the Land Registry which lets you filter their price paid data and download a subset. For this plotting exercise I’ve downloaded all the sales data for Cambridge from 2014, and saved it as a CSV file in the res directory. I’ve also manually added some column headings to the CSV so that pandas knows which column is which when the file is loaded.

First off, I’ll load the data using the pandas read_csv function, and tell it to parse the dates (which are in day-first format) and use them as the index to the data.
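As a rough sketch of that loading step: the column names and the three rows below are made up for illustration (the real file has whatever headings were added by hand), and an in-memory string stands in for the CSV in the res directory.

```python
import io

import pandas as pd

# Stand-in for the CSV file; the column names and rows are invented for this example.
sample_csv = io.StringIO(
    "Date,Price,Postcode,Type,City\n"
    "03/01/2014,250000,CB1 2AB,T,CAMBRIDGE\n"
    "14/02/2014,410000,CB4 1XY,S,CAMBRIDGE\n"
    "01/12/2014,185000,,F,CAMBRIDGE\n"
)

sales = pd.read_csv(
    sample_csv,
    parse_dates=["Date"],  # turn the Date column into timestamps
    dayfirst=True,         # 03/01/2014 is 3 January, not 1 March
    index_col="Date",      # use the sale date as the row index
)

print(sales.head())
```

With a real file, the first argument would simply be the path to the CSV instead of the StringIO object.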


House price data table

Cleaning up

The full data table contains some columns I’m not interested in here, e.g. I specified the search to only include Cambridge, so the city column is redundant. Let’s trim down the table a little bit and drop some columns using pandas.

Also, I know there will be some NaN values in the Postcode column which will cause issues later, so let’s also drop any rows which have NaNs in this column.
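Something like the following would do both clean-up steps. The small table here is invented so the example runs on its own, and apart from Postcode the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Tiny invented table standing in for the loaded data.
sales = pd.DataFrame(
    {
        "Price": [250000, 410000, 185000],
        "Postcode": ["CB1 2AB", "CB4 1XY", np.nan],
        "City": ["CAMBRIDGE"] * 3,
    },
    index=pd.to_datetime(["2014-01-03", "2014-02-14", "2014-12-01"]),
)

# Drop the redundant column, then any row with no postcode.
trimmed = sales.drop(columns=["City"])
cleaned = trimmed.dropna(subset=["Postcode"])

print(cleaned)
```

Passing subset to dropna matters here: without it, a row would be dropped for a NaN in any column, not just Postcode.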

Sanity check

Let’s also make sure the data looks sensible, and practice processing it a bit with pandas. We can make a few summary stats and plots:
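A minimal sketch of the kind of summary I mean, on invented prices with an assumed Price column; it covers the numbers only and leaves the plots aside.

```python
import pandas as pd

# Invented sales, indexed by sale date as in the loaded table.
sales = pd.DataFrame(
    {"Price": [250000, 410000, 185000, 320000]},
    index=pd.to_datetime(["2014-01-03", "2014-01-20", "2014-02-14", "2014-12-01"]),
)

# Count, mean, quartiles, min and max of the sale prices.
summary = sales["Price"].describe()

# Number of sales in each calendar month (1 = January).
sales_per_month = sales.groupby(sales.index.month).size()

print(summary)
print(sales_per_month)
```

With matplotlib installed, calling hist() or plot() on the Price column should give a quick histogram or time series of the same data.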


House price histogram

House price plot

House price box plot

House price sales graph

Getting location data

Currently I have a list of postcodes, and need to turn these into geographical data. For this there’s a handy API at Postcodes.io and I’ll call it with the Python Requests library.

First off, I needed to learn how to use the API (and Requests). But this being Python, it turns out to be pleasingly simple!
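Here is a hedged sketch of a single lookup. It assumes the single-postcode endpoint at https://api.postcodes.io/postcodes/ followed by the postcode, and that an unknown postcode comes back as a 404; the helper names are my own, not part of any library.

```python
from urllib.parse import quote

API_ROOT = "https://api.postcodes.io/postcodes"


def postcode_url(postcode):
    """Build the lookup URL, escaping the space inside the postcode."""
    return f"{API_ROOT}/{quote(postcode.strip())}"


def lookup_postcode(postcode, timeout=10):
    """Return the API's 'result' dict for one postcode, or None if not found."""
    import requests  # imported here so postcode_url works without Requests installed

    response = requests.get(postcode_url(postcode), timeout=timeout)
    if response.status_code == 404:  # assumption: unknown postcodes give a 404
        return None
    response.raise_for_status()
    return response.json()["result"]
```

Calling lookup_postcode with a valid postcode should return a dictionary that includes, among other things, latitude and longitude entries.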

Practice over

Cool! What I’m really interested in here are the latitude and longitude that correspond to the postcode centre. It’s easy to pull these out into a list with a list comprehension (also making sure to handle any values that couldn’t be successfully looked up).
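For illustration, assuming the response is the list returned by a bulk lookup (one entry per query, with a result of None when the lookup failed), the comprehension could look like this. The entries below are hand-written stand-ins, not real API output.

```python
# Hand-written stand-in for a bulk lookup's result list; the values are made up.
bulk_results = [
    {"query": "CB1 2AB", "result": {"latitude": 52.20, "longitude": 0.13}},
    {"query": "NOT A CODE", "result": None},
    {"query": "CB4 1XY", "result": {"latitude": 52.22, "longitude": 0.12}},
]

# One (lat, lon) pair per query, keeping a placeholder for failed lookups
# so the list stays aligned with the original postcodes.
coords = [
    (item["result"]["latitude"], item["result"]["longitude"])
    if item["result"] is not None
    else (None, None)
    for item in bulk_results
]

print(coords)
```

Keeping a placeholder rather than skipping failures means the coordinates can be attached back to the table row by row.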

The real thing

Now that I know how to get the lat/lon data from postcodes, I’ll do the same for the entire dataset.
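One way to scale this up, sketched under a couple of assumptions: that the bulk endpoint takes a POST with a JSON body of the form {"postcodes": [...]}, and that it accepts a limited number of postcodes per request (I believe 100, hence the batch size). The function names are hypothetical, and the fetch parameter is only there so the batching logic can be tried without touching the network.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_lookup(postcodes, timeout=10):
    """POST one batch of postcodes and return the API's result list."""
    import requests  # imported here so the batching helpers work without Requests

    response = requests.post(
        "https://api.postcodes.io/postcodes",
        json={"postcodes": list(postcodes)},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()["result"]


def lookup_all(postcodes, fetch=bulk_lookup, batch_size=100):
    """Map each postcode to (lat, lon), or to None if it could not be found."""
    coords = {}
    for batch in chunked(list(postcodes), batch_size):
        for item in fetch(batch):
            result = item["result"]
            coords[item["query"]] = (
                (result["latitude"], result["longitude"]) if result else None
            )
    return coords
```

Passing the unique postcodes from the table to lookup_all avoids asking for the same postcode twice.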

Maps 🙁

Now I have some geographical price data itching to be plotted on a map. This turned out to be trickier than I expected.

First we need a map to plot the data on. Stamen Mapstack lets you export really nicely stylised maps (based on OpenStreetMap data), simple enough to be nice and legible, but containing all the street data needed to make sense of the house locations.


Cambridge stylised map

Latitude, longitude, pixels

Now we have a map in pixels, and some data with latitude and longitude co-ordinates. If I want to plot one on top of the other I need to make these match up nicely. I’m sure there are some tools that let you plot lat/lon data directly on a map, but I want to use the pretty map (as an image) from Stamen. I’m going to use Basemap to plot the points using a Mercator projection that should match the map, but I still need to convert from lat/lon to the pixel units of the map image.

Where’s the box?

I made the map image by manually panning and zooming to what looked like an appropriate location. The output of Stamen Mapstack gives the centre latitude and longitude, and the zoom level. A useful answer on Stack Overflow (once converted to Python) lets me convert from these parameters to the bounding lat/lon box of the map.
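The conversion can be sketched with the usual Web Mercator tile arithmetic. This is not necessarily the code from that answer; it assumes standard 256-pixel tiles, that the image size in pixels is known, and the function names are my own.

```python
import math

TILE_SIZE = 256  # assumption: standard 256-pixel web map tiles


def latlon_to_world_px(lat, lon, zoom):
    """Project lat/lon to pixel coordinates on the whole-world map at this zoom."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * scale
    return x, y


def world_px_to_latlon(x, y, zoom):
    """Inverse of latlon_to_world_px."""
    scale = TILE_SIZE * 2 ** zoom
    lon = x / scale * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / scale))))
    return lat, lon


def bounding_box(centre_lat, centre_lon, zoom, width_px, height_px):
    """Return (west, south, east, north) for an image centred on the given point."""
    cx, cy = latlon_to_world_px(centre_lat, centre_lon, zoom)
    # Pixel y grows downwards, so the top edge of the image is the northern edge.
    north, west = world_px_to_latlon(cx - width_px / 2, cy - height_px / 2, zoom)
    south, east = world_px_to_latlon(cx + width_px / 2, cy + height_px / 2, zoom)
    return west, south, east, north
```

As a sanity check, a 256 by 256 image centred on (0, 0) at zoom 0 should cover the whole Web Mercator world, from longitude -180 to 180 and roughly latitude -85 to 85.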

Plot plot!

Now I can convert to pixel units, and try plotting the postcode data!
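To show the idea without Basemap, here is a sketch of the same conversion using plain Mercator maths. It assumes the bounding box is given as (west, south, east, north) and that pixel (0, 0) is the top-left corner of the image; the box values in the usage line are invented.

```python
import math


def mercator_y(lat):
    """Mercator northing (unitless) for a latitude in degrees."""
    return math.asinh(math.tan(math.radians(lat)))


def latlon_to_image_px(lat, lon, box, width_px, height_px):
    """Convert lat/lon to (x, y) pixels inside an image covering box."""
    west, south, east, north = box
    # Longitude is linear in x under the Mercator projection.
    x = (lon - west) / (east - west) * width_px
    # Latitude is not linear in y, so interpolate in projected units,
    # measuring down from the top (northern) edge.
    top, bottom = mercator_y(north), mercator_y(south)
    y = (top - mercator_y(lat)) / (top - bottom) * height_px
    return x, y


# Invented bounding box, roughly around Cambridge, and an 800 by 600 image.
box = (0.05, 52.17, 0.20, 52.24)
print(latlon_to_image_px(52.205, 0.125, box, 800, 600))
```

Because y is measured from the top, these coordinates line up with an image drawn with its origin in the upper-left corner, which is matplotlib’s default for imshow.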


House sale map cambridge

A real plot

So plotting location data, with house price encoded by colour, seems like a reasonable thing to do. I’ll also add a bit of dithering (random jitter) to separate the overlapping points, and I guess it reflects the uncertainty in the postcode-based locations.
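A small sketch of the dithering step, assuming the points are already in pixel units. The Gaussian noise and its scale are my own choices for illustration, and add_jitter is a hypothetical helper.

```python
import numpy as np


def add_jitter(values, scale, seed=None):
    """Return values with Gaussian noise of the given standard deviation added."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, scale, size=values.shape)


# Three sales sharing one postcode would otherwise sit on exactly the same pixel.
x_px = [412.0, 412.0, 412.0]
y_px = [230.0, 230.0, 230.0]

x_jittered = add_jitter(x_px, scale=3.0, seed=0)
y_jittered = add_jitter(y_px, scale=3.0, seed=1)

print(x_jittered, y_jittered)
```

The jittered coordinates can then go to matplotlib’s scatter, with the prices passed as the c argument and a colour map chosen via cmap.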

House price map cambridge

Done!

It worked, and was pretty easy thanks to many useful tools that cleverer people have written and explained how to use.

Now to try and make something informative…

See/download the original IPython notebook.
