How we built a startup in 54 hours

June 13, 2010

Last weekend I attended the London Startup Weekend, a 54-hour event hosted at the IBM building on London’s Southbank. It was a fantastic event. I met loads of great people, had a lot of fun, and successfully launched a new website! Arianna, Pedro and Debbie have already written up great summaries of the event, so instead I’ll focus on how our team built and launched our project: automatic event management for Twitter.

After forming a group, we spent most of Friday evening and Saturday morning discussing ideas around the original pitch of Twitter calendar integration. We discussed a whole range of ideas, including an event broker service, calendar availability widgets, and a Twitter/Google Calendar mashup. It took until lunchtime on Saturday to finalise our idea: a service that would automatically work out the date of an event mentioned in a tweet and keep track of these events. There were now only 36 hours left!

I got straight to work on the code, using Django. I knew we’d need to pull in tweets and then analyse the dates. Using the python-twitter library, I wrote a management command that pulled in all tweets containing specific hashtags every minute.
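The fetching code itself isn’t shown here, so what follows is only a rough sketch of the filtering step a command like this might perform after each fetch. It assumes tweets arrive as plain (id, text) pairs; the function names and the #tweevents tag in the example are hypothetical, and the actual python-twitter call and the once-a-minute scheduling are left out.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the lower-cased hashtags found in a tweet's text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def new_matching_tweets(tweets, wanted_tags, seen_ids):
    """Keep tweets that carry any wanted hashtag and haven't been stored yet.

    `tweets` is a list of (id, text) pairs, e.g. the result of one fetch;
    `seen_ids` is a set updated in place so the next poll skips repeats.
    """
    wanted = {tag.lower().lstrip("#") for tag in wanted_tags}
    fresh = []
    for tweet_id, text in tweets:
        if tweet_id in seen_ids or not (hashtags(text) & wanted):
            continue
        seen_ids.add(tweet_id)
        fresh.append((tweet_id, text))
    return fresh


seen = set()
batch = [(1, "Launch party #tweevents tomorrow"), (2, "no tags here")]
print(new_matching_tweets(batch, ["#tweevents"], seen))
```

Keeping the seen IDs between polls is what stops the same tweet being stored twice when consecutive searches overlap.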

The next stage was to work out a date from the tweets. A Stack Overflow question suggested two options: the parsedatetime library, and some pyparsing example code. I tried the parsedatetime library first, and it gave some fairly good results right away. It wasn’t so good at more complicated dates, though. I tried the pyparsing example, but unless it was given just the date string (e.g. “Next week” rather than “See you next week”), it failed to work out a date at all. I briefly looked at using NLTK to extract the date from a tweet, but, worried about running out of time, I stopped there and stuck with parsedatetime.
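To illustrate the difference between parsing a bare date string and finding a date inside a whole tweet, here’s a toy, stdlib-only sketch. It is not parsedatetime or pyparsing; it just searches the full text for a few relative date phrases, so “See you next week” works as well as “Next week”. The phrase table and the find_event_date name are made up for illustration, and real date parsing libraries handle far more than this.

```python
import re
from datetime import date, timedelta

# Toy phrase table: (pattern, function turning the match into an offset from today).
# "next week" is simplified to "seven days from today".
PHRASES = [
    (re.compile(r"\btoday\b", re.I), lambda m: timedelta(days=0)),
    (re.compile(r"\btomorrow\b", re.I), lambda m: timedelta(days=1)),
    (re.compile(r"\bnext week\b", re.I), lambda m: timedelta(weeks=1)),
    (re.compile(r"\bin (\d+) days?\b", re.I), lambda m: timedelta(days=int(m.group(1)))),
]


def find_event_date(tweet, today):
    """Search anywhere in the tweet for the first phrase in PHRASES that matches.

    Returns a date, or None if no known phrase appears.
    """
    for pattern, to_offset in PHRASES:
        match = pattern.search(tweet)
        if match:
            return today + to_offset(match)
    return None


print(find_event_date("See you next week", date(2010, 6, 13)))  # 2010-06-20
```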

While I’d been busy programming, Guillume had come up with a great name for our service: tweevents. He then set about registering the domain name and setting up a Twitter account and a Facebook fan page. Once we had the name sorted, I set up an online logo competition offering $25 to the winner, and an hour later we had our logo.

Gabriel worked hard overnight to produce some HTML/CSS for the site, so on Sunday morning I integrated it into the Django project. We then added features to the website, such as links to add events to your Google Calendar, hCalendar markup, and the ability to filter events by Twitter username. The rest of the day was spent putting a presentation together.
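For anyone unfamiliar with hCalendar, it’s a microformat that marks up events in ordinary HTML using agreed class names (vevent, summary, dtstart, url), with the machine-readable date in an abbr title. Here’s a minimal sketch of rendering one event that way; the hcalendar function and its arguments are hypothetical, not the tweevents template.

```python
from datetime import date
from html import escape


def hcalendar(summary, start, url=None):
    """Render one event as hCalendar microformat markup.

    The human-readable date goes inside the abbr; the ISO 8601 date goes in
    its title attribute, where microformat parsers look for it.
    """
    title = f'<span class="summary">{escape(summary)}</span>'
    if url:
        title = f'<a class="url" href="{escape(url)}">{title}</a>'
    return (
        '<div class="vevent">'
        f"{title} on "
        f'<abbr class="dtstart" title="{start.isoformat()}">'
        f'{start.strftime("%B %d, %Y")}</abbr>'
        "</div>"
    )


print(hcalendar("Startup Weekend demo", date(2010, 6, 13)))
```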

So tweevents is now up and running. Within 54 hours we’d gone from a rough idea to a working website. Our presentation gives some details about what’s next on the business side, but there are also some things I’d like to get done on the development side. The most important is improved date parsing. I plan to give the python-dateutil library a try, and failing that, go back and look into NLTK in more detail. Guillume’s also working hard on improving the library we’re already using. Should none of the Python options work, I’ve also come across some great date parsing libraries for other languages, such as Chronic for Ruby and Datejs for JavaScript. A Rails or Node.js rewrite might be on the cards! There are also features we could add, such as Facebook event creation, or filtering the list of events to just your Twitter followers or the people you follow.

A huge thanks to Damien, James and Franck who organised and helped out at the event, and made it all so much fun. Of course, tweevents wouldn’t be much without the rest of the team. Thanks also to my girlfriend, who was left on her own with our two-month-old daughter for the entire weekend!

FireEagle OAuth and Python2.5 Woes

May 18, 2010

Back in February I started work on integrating Yahoo’s FireEagle location service into Geomium, and I ran into a problem with Python 2.5. Using Steve Marshall’s Python library, the included test.py script worked perfectly with Python 2.6, but under Python 2.5 I’d get back an “Invalid OAuth signature” error.

I posted the problem to the OAuth user group but didn’t get any response, so I got in touch with Yahoo. After quite a bit of back and forth we finally figured out the problem, which I’m posting here to try and save others from months of frustration!

The Yahoo guys noticed that with Python 2.5 the HTTP Host header was being sent as “fireeagle.yahooapis.com:443”, whereas 2.6 sends “fireeagle.yahooapis.com”. Including the port results in an invalid OAuth signature, because the signature is generated assuming the port isn’t included. I dug into the Python 2.5 httplib code and came across this:

 813    if self.port == HTTP_PORT:
 814        self.putheader('Host', host_enc)
 815    else:
 816        self.putheader('Host', "%s:%s" % (host_enc, self.port))

In Python 2.6 the comparison on line 813 is done with self.default_port instead of HTTP_PORT, so HTTPS requests to the default port (443) don’t get the port appended. I also noticed that, later in the code, passing in your own Host header stops httplib from creating one for you:

 875     def _send_request(self, method, url, body, headers):
 876         # honour explicitly requested Host: and Accept-Encoding headers
 877         header_names = dict.fromkeys([k.lower() for k in headers])
 878         skips = {}
 879         if 'host' in header_names:
 880             skips['skip_host'] = 1
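To make the signature problem concrete, here’s a minimal sketch of OAuth 1.0 HMAC-SHA1 signing using only the standard library. It assumes, as the Yahoo debugging suggested, that one side signs the URL without the port while the other effectively includes “:443”; the parameter values and the /api/0.1/user path are purely illustrative. The OAuth 1.0 spec says the default port must be left out of the signature base string URI, which is why an extra “:443” breaks verification.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _enc(value):
    # OAuth percent-encoding: only unreserved characters are left as-is
    return quote(value, safe="-._~")


def signature_base_string(method, base_url, params):
    """METHOD & encoded base URL & encoded, sorted parameter string."""
    pairs = sorted((_enc(k), _enc(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), _enc(base_url), _enc(param_string)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    key = f"{_enc(consumer_secret)}&{_enc(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Hypothetical request values, purely for illustration
params = {"oauth_consumer_key": "key", "oauth_nonce": "n0nce",
          "oauth_signature_method": "HMAC-SHA1", "oauth_timestamp": "1274140800",
          "oauth_token": "token", "oauth_version": "1.0"}
path = "/api/0.1/user"
signed = hmac_sha1_signature(
    signature_base_string("GET", "https://fireeagle.yahooapis.com" + path, params),
    "secret", "tsecret")
checked = hmac_sha1_signature(
    signature_base_string("GET", "https://fireeagle.yahooapis.com:443" + path, params),
    "secret", "tsecret")
print(signed == checked)  # False
```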

So the fix turns out to be really simple – explicitly set the Host header yourself. That’s exactly what I’ve done in my fork of the fireeagle library (see the fix). I’ve also sent a pull request, so hopefully this fix will make it back into the original library. Thanks to Arnab Nandi and Anand S from Yahoo for helping to debug things on their end.
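For anyone hitting the same thing today, Python 3’s http.client behaves the same way: if you pass a Host header to request(), it skips generating its own. Here’s a minimal sketch that checks this without touching the network by swapping in a fake socket. _CaptureSocket and raw_request are hypothetical helpers for illustration, not part of the fireeagle library.

```python
import http.client


class _CaptureSocket:
    """Hypothetical stand-in socket: records outgoing bytes instead of sending them."""

    def __init__(self):
        self.sent = b""

    def sendall(self, data):
        self.sent += data


def raw_request(host, path, headers):
    """Build a GET request with http.client and return the raw text it would send."""
    conn = http.client.HTTPSConnection(host)
    conn.sock = _CaptureSocket()  # pretend we're already connected
    conn.request("GET", path, headers=headers)
    return conn.sock.sent.decode("latin-1")


# Passing Host explicitly suppresses http.client's auto-generated header,
# so exactly one Host line is sent, with the value we chose.
sent = raw_request("fireeagle.yahooapis.com", "/", {"Host": "fireeagle.yahooapis.com"})
print(sent.count("Host:"))  # 1
```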

PyWebShot – Generate website thumbnails using Python

April 11, 2010

There have been lots of links to automatic website thumbnail generators on sites like Reddit and Hacker News today, including webkit2png and CutyCapt. Well, it just so happens that a few weeks ago I wrote my own website thumbnail generator, and today I got around to putting it on GitHub.

The code is based on Matt Biddulph’s screenshot-tng script, but heavily modified to be more user-friendly and to provide more options. It uses embedded Mozilla for rendering, so it requires the python-gtkmozembed package.

You can specify a resolution to take the screenshot at, and also a resolution for the thumbnail. When generating the thumbnail, the aspect ratio is preserved. You can also specify a delay, so that the screenshot is only taken a set number of seconds after the page loads. Here’s an example of running PyWebShot with three URLs, and its output:

$ ./pywebshot.py -t 500x250 http://www.coderholic.com http://geomium.com/update/598/ http://jobs.plasis.co.uk
Loading http://www.coderholic.com... saved as www.coderholic.com.png
Loading http://geomium.com/update/598/... saved as geomium.com.update.598..png
Loading http://jobs.plasis.co.uk... saved as jobs.plasis.co.uk.png
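Two details of the behaviour above are easy to sketch. The output filenames appear to be the URL minus its scheme, with slashes turned into dots (which is where the double dot in geomium.com.update.598..png comes from), and a thumbnail size can be computed by scaling the screenshot to fit inside the requested box while keeping its aspect ratio. Both functions are illustrative guesses inferred from the example output, not PyWebShot’s actual code, and the 1024x768 screenshot size in the usage is an assumption.

```python
def output_filename(url):
    """Guess at the naming scheme: drop the scheme, turn slashes into dots, add .png."""
    without_scheme = url.split("://", 1)[-1]
    return without_scheme.replace("/", ".") + ".png"


def fit_within(size, box):
    """Scale (width, height) to the largest size that fits in box, keeping the aspect ratio."""
    width, height = size
    max_width, max_height = box
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(output_filename("http://geomium.com/update/598/"))  # geomium.com.update.598..png
print(fit_within((1024, 768), (500, 250)))  # (333, 250)
```

With a 500x250 box, a 4:3 screenshot is limited by the height, so the thumbnail comes out narrower than 500 pixels rather than being stretched or cropped.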

If you have a huge list of URLs you’d like to generate screenshots for, you can put them all into a file and generate images for all of them with the following command:

$ cat urls.txt | xargs ./pywebshot.py

For more details and the source code see the PyWebShot project page on GitHub.

8 Reasons Why You Should Try Django

January 6, 2010


I have been using Python for quite a few years, but mostly for writing one-off sysadmin scripts, command-line utilities, and of course PyRadio. Most of my web development work has been done with PHP. The language gets a lot of bad press, some deserved and some not so much. I’ve had my own gripes, but all-in-all I’ve been fairly happy with PHP.

Several months ago, though, I thought I’d give Django, a Python-based web framework, a try. I was completely blown away! Compared to the PHP frameworks I’d worked with, such as Cake, it was so much more of a pleasure to work with. So here are 8 reasons why you should give Django a try yourself if you haven’t already. You won’t be disappointed!

1. Great Documentation

The Django documentation is well written, extremely comprehensive, and up to date. The official documentation contains detailed API references, loads of relevant examples, and tutorials for those getting started. If that isn’t enough, there’s also a whole book that’s available for free online.

2. It’s Python

The fact that it’s Python is a huge plus point for me. It’s a great language that doesn’t suffer from many of the well-documented inconsistencies that PHP does, and it includes some nice features such as decorators and first-class functions. Going back to PHP you soon start to miss the little things, such as the ability to assign multiple values at once, and the simplicity of slicing lists.
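For anyone coming from PHP, here’s a quick sketch of the features mentioned above: assigning multiple values at once, list slicing, and a decorator (which works because functions are first-class objects). The names are just examples.

```python
import functools

# Assign multiple values at once (tuple unpacking), including a swap
first, second = 1, 2
first, second = second, first

# Slicing lists
numbers = [0, 1, 2, 3, 4, 5]
head = numbers[:2]             # [0, 1]
tail = numbers[-2:]            # [4, 5]
evens = numbers[::2]           # [0, 2, 4]
reversed_numbers = numbers[::-1]


# A decorator is just a function that takes a function and returns a new one
def shout(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    return "hello, " + name


print(greet("django"))  # HELLO, DJANGO
```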

3. The ORM

Django’s object-relational mapper abstracts away the database, so you rarely need to worry about your database schema or write SQL queries by hand. If you’re used to writing SQL queries then the QuerySet API takes a little getting used to, but it’s really worth the effort. Projects like South make the ORM even more powerful, allowing you to make schema changes and data migrations automatically.

4. Built in Development Server

Where PHP really shines is in its ease of deployment. Setting up a local development server can be a bit of a pain, though, especially if you’re working on several different sites. Django comes with a built-in development server, so you can be up and running within minutes! From your project’s root directory you just run

./manage.py runserver

and access your Django website at http://localhost:8000 – awesome!

5. The Admin Interface

Django’s built-in admin interface is practically a full-blown CMS, allowing you to add, delete or update your data. It’s pretty much all automatic, but it’s also fairly configurable. Check the documentation to see what it can do!

6. Reusable Applications

Django projects are broken up into “applications”, and there are lots of existing reusable applications that you can use for your own projects, such as those for user registration, Facebook integration, blogging, and many more.

Existing applications are great, but the project/application distinction also forces you to think about your own project’s structure, so you’re more likely to build reusable components that you can use in other projects of your own, or even share for others to use.

7. Templates

I’ve always been a bit dubious about the merits of PHP template engines such as Smarty. The Django template layer is great though. The inheritance model works well, and the restrictive language really forces you to have a very clean separation of presentation and logic.

8. Forms

I usually find dealing with user input one of the most boring parts of web development. It takes time to get right, and it’s repetitive. The Django Form API really simplifies things. You define your form class, including any validation rules, add a few lines to your template and a few lines to your view, and you’re done!


So those are my 8 reasons why you should give Django a go. If you’re already a Django user, let me know if you have any points to add!

Parsing CSV data in Python

September 3, 2009

Python provides the csv module for parsing comma-separated value files. It allows you to iterate over each line in a csv file and gives you a list of the items on that row. For example, given the following csv data:

id, name, date
0, name, 2009-01-01
1, another name, 2009-02-01

You’d end up with something like:

["id", "name", "date"],
["0", "name", "2009-01-01"],
["1", "another name", "2009-02-01"]

In some situations it is nice to have a dictionary of keys and values though, so that instead of a simple list of columns we end up with:

{"id": "0", "name": "name", "date": "2009-01-01"},
{"id": "1", "name": "another name", "date": "2009-02-01"}

This would allow us to refer to fields by name rather than position in the list. Do you really want to remember that date is in position 2? And what happens if the input data changes, and a new column is added between name and date? If we’re referring to columns by position then we’ll have to change our existing code, but by referring to it by name we won’t have to change anything.

It turns out this is pretty easy to achieve in only a few lines of Python:

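import csv

items = []
with open('data.csv', newline='') as f:
    # skipinitialspace ignores the space after each comma, so " name" becomes "name"
    data = csv.reader(f, skipinitialspace=True)
    # Read the column names from the first line of the file
    fields = [name.strip() for name in next(data)]
    for row in data:
        item = {}
        # Zip together the field names and values, adding each pair to our dictionary
        for name, value in zip(fields, row):
            item[name] = value.strip()
        items.append(item)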

The csv module also lets you specify a delimiter, so if your data is tab-separated you just need to make a single change:

data = csv.reader(open('data.tsv'), delimiter='\t')

Update

Thanks to several people for mentioning csv.DictReader, which does exactly what I’ve described here. Looking at its code, it does something very similar, but it also takes rows of different lengths into account, skips blank rows, and uses the method Tim mentioned in the comments for creating the dictionary:

    # From csv.py (Python 2)
    def next(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = self.reader.next()
        self.line_num = self.reader.line_num

        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == []:
            row = self.reader.next()
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d
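With DictReader, the whole example from the top of this post shrinks to a few lines. Here’s a minimal Python 3 sketch, using io.StringIO in place of a real file; skipinitialspace=True is needed because the sample data has a space after each comma, and read_records is just an illustrative name.

```python
import csv
import io

# Sample data from the top of this post; io.StringIO stands in for a real file
SAMPLE = """id, name, date
0, name, 2009-01-01
1, another name, 2009-02-01
"""


def read_records(f):
    # DictReader takes the field names from the first row and yields one dict
    # per remaining row; skipinitialspace drops the blank after each comma
    return list(csv.DictReader(f, skipinitialspace=True))


records = read_records(io.StringIO(SAMPLE))
for record in records:
    print(record["id"], record["name"], record["date"])
```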