JavaScript: The Good Parts

July 18, 2010

I’ve been a huge fan of Douglas Crockford and his articles about JavaScript for a long time. I often point people to his article “The World’s Most Misunderstood Programming Language” when I hear them complaining about the language. It’s taken me a couple of years to get around to reading his book, JavaScript: The Good Parts, but it exceeded all of my expectations.

Being an avid reader of his online stuff, and having watched various talks of his, I knew the book would be well written and informative, but I thought it’d probably just repeat much of what I’d already read without providing much new information or insight. To a certain extent this was true; the book does reiterate what’s said in many of his online articles and talks, but it’s absolutely amazing for a different reason: the code examples.

The book is extremely succinct. There’s no padding, and very little prose joining one section to the next. There is a common thread throughout the book, though, and that’s the code. Examples of good coding practices are repeated, and functions written in earlier chapters are often reused in later ones. So although the book appears to present one feature at a time with a small code example, you’re actually building up more and more complex JavaScript programs, and the book ends with a full JSON parser!

Crockford’s coding style seems to match his writing style: succinct and to the point. There’s some really great code in the book that I think even non-JavaScript developers could appreciate. Here’s my favourite, from the chapter on regular expressions:

var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
var url = 'http://www.ora.com:80/goodparts?q#fragment';
 
var result = parse_url.exec(url);
var names = ['url', 'scheme', 'slash', 'host', 'port', 'path', 'query', 'hash'];
var blanks = '       ';
var i;
for (i = 0; i < names.length; i += 1) {
    document.writeln(names[i] + ':' + blanks.substring(names[i].length), result[i]);
}

Which outputs:

url:    http://www.ora.com:80/goodparts?q#fragment
scheme: http
slash:  //
host:   www.ora.com
port:   80
path:   goodparts
query:  q
hash:   fragment
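To experiment with the pattern outside a browser, here’s a rough Python translation. Python’s `re` module accepts the book’s regular expression unchanged when it’s written as a raw string; the `parse_url` function and the dict it returns are illustrative, not from the book.

```python
import re

# The book's pattern, unchanged apart from being a Python raw string.
# Groups: 1 scheme, 2 slashes, 3 host, 4 port, 5 path, 6 query, 7 fragment.
PARSE_URL = re.compile(
    r"^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)"
    r"(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$"
)
NAMES = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"]


def parse_url(url):
    """Map each name to its captured part (None for absent parts),
    or return None if the string doesn't match at all."""
    match = PARSE_URL.match(url)
    if match is None:
        return None
    # Group 0 is the whole match, like result[0] in the JavaScript version.
    return dict(zip(NAMES, [match.group(0)] + list(match.groups())))


for name, value in parse_url("http://www.ora.com:80/goodparts?q#fragment").items():
    label = name + ":"
    print(f"{label:<8}{value}")  # pad labels to 8 columns, like `blanks`
```

The loop prints the same eight lines as the JavaScript version. Optional parts that are missing come back as `None`, so `parse_url("example.com")` gives a host but no scheme or port.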

I also love the explanations Crockford gives as to why he avoids certain parts of JavaScript and limits himself to a “good” subset of the language. He doesn’t simply say “This is bad and you shouldn’t use it”; the advice usually comes with a story about a time when he used that feature and got caught out, which is much more compelling.

I think it’s fair to say that JavaScript: The Good Parts is one of my favourite programming books: it’s succinct, and packed full of great code examples and best practices for writing maintainable, bug-free code. It’s a fantastic JavaScript reference, but much of what the book talks about is relevant to programming in general. Highly recommended!

How we built a startup in 54 hours

June 13, 2010

Last weekend I attended the London Startup Weekend, a 54-hour event hosted at the IBM building on London’s Southbank. It was a fantastic event. I met loads of great people, had a lot of fun, and successfully launched a new website! Arianna, Pedro and Debbie have already written up great summaries of the event, so instead I’ll focus on how our team managed to build and launch our project: automatic event management for Twitter.

After forming a group, we spent most of Friday evening and Saturday morning discussing ideas around the original pitch of Twitter calendar integration. We discussed a whole range of ideas, including an event broker service, calendar availability widgets, and a Twitter/Google Calendar mashup. It took until lunchtime on Saturday to finalise our idea. We’d settled on a service that would automatically work out the date of an event mentioned in a tweet and keep track of these events. There were now only 36 hours left!

I got straight to work on the code, using Django. I knew we’d need to pull in tweets and then analyse the dates. Using the python-twitter library, I wrote a management command that ran every minute and pulled in all tweets containing some specific hashtags.
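The management command itself isn’t shown in the post. As a rough, hypothetical sketch of the filtering step such a command needs, the function below takes tweets as plain `(id, text)` pairs (standing in for whatever python-twitter returned), keeps only those newer than the last id already stored and tagged with a tracked hashtag, and returns the new high-water mark so the next one-minute run can skip tweets it has already seen. The hashtag names are made up.

```python
import re

# Hypothetical hashtags; the ones actually tracked aren't named in the post.
TRACKED_HASHTAGS = {"#lsw", "#tweevents"}
HASHTAG_RE = re.compile(r"#\w+")


def select_new_tweets(tweets, tracked, since_id=0):
    """Return (matching tweets, newest id seen).

    `tweets` is an iterable of (id, text) pairs. Tweets with an id at or
    below `since_id` were handled by an earlier run and are skipped; the
    rest are kept if they mention any tracked hashtag (case-insensitive).
    """
    tracked = {tag.lower() for tag in tracked}
    selected = []
    newest = since_id
    for tweet_id, text in tweets:
        if tweet_id <= since_id:
            continue
        newest = max(newest, tweet_id)
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if tags & tracked:
            selected.append((tweet_id, text))
    return selected, newest
```

Persisting `newest` between runs (in the database, say) is what keeps each scheduled run from storing the same tweet twice.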

The next stage was to work out a date from the tweets. A Stack Overflow question suggested two options: the parsedatetime library and some pyparsing example code. I tried parsedatetime first, and it gave some fairly good results right away, though it wasn’t so good at more complicated dates. I tried the pyparsing example, but unless it was given just the date string (e.g. “Next week” instead of “See you next week”), it failed to work out a date at all. I briefly investigated using the NLTK to extract the date from a tweet, but, worried about running out of time, I stopped there and stuck with parsedatetime.
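To show why it matters whether a parser can cope with the whole tweet, here’s a deliberately tiny, hypothetical extractor. It is not parsedatetime or the pyparsing example: it recognises only a handful of relative phrases. It does, however, search for them anywhere in the text before resolving them against a reference date, and locating the phrase inside the tweet is the step the pyparsing example left to the caller.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Hypothetical toy grammar: a few relative phrases, matched anywhere in the text.
PHRASE_RE = re.compile(
    r"\b(today|tomorrow|next week|(?:on |next )?(?:" + "|".join(WEEKDAYS) + r"))\b",
    re.IGNORECASE,
)


def extract_date(tweet, reference):
    """Find the first date-like phrase in `tweet` and resolve it against
    the `reference` date. Returns None if nothing is recognised."""
    match = PHRASE_RE.search(tweet)
    if match is None:
        return None
    phrase = match.group(1).lower()
    if phrase == "today":
        return reference
    if phrase == "tomorrow":
        return reference + timedelta(days=1)
    if phrase == "next week":
        return reference + timedelta(weeks=1)
    # A weekday name (optionally after "on"/"next"): a simplification that
    # takes its next occurrence strictly after the reference, 1 to 7 days ahead.
    target = WEEKDAYS.index(phrase.split()[-1])
    days_ahead = (target - reference.weekday() - 1) % 7 + 1
    return reference + timedelta(days=days_ahead)


print(extract_date("See you next week", date(2010, 6, 5)))  # → 2010-06-12
```

Anything outside the toy grammar (“the 14th”, “in a fortnight”) returns `None`, which is roughly where a real library such as parsedatetime earns its keep.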

While I’d been busy programming, Guillume had come up with a great name for our service: tweevents. He’d then set about registering the domain name and setting up a Twitter account and a Facebook fan page. Once we had the name sorted, I set up an online logo competition offering $25 to the winner, and an hour later we had our logo.

Gabriel worked hard overnight to produce some HTML/CSS for the site, so on Sunday morning I integrated that into the Django project. We then added features to the website, such as links to add events to your Google Calendar, hCalendar markup, and the ability to filter events by Twitter username. The rest of the day was spent putting a presentation together.
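hCalendar is a microformat: ordinary HTML in which class names such as `vevent`, `summary`, `dtstart` and `url` mark up an event so that tools can pick it out of the page. The helper below is a minimal sketch of emitting that markup from Python; the function name, link text and exact HTML layout are assumptions, not tweevents’ actual templates.

```python
from datetime import date
from html import escape


def hcalendar(summary, start, url=None):
    """Render one event as hCalendar markup.

    "vevent" marks the event; "summary", "dtstart" and "url" are hCalendar
    property class names. The machine-readable ISO 8601 date sits in the
    title attribute of an <abbr>, with a human-readable date as its text.
    """
    lines = ['<div class="vevent">']
    lines.append(f'  <span class="summary">{escape(summary)}</span>')
    lines.append(
        f'  <abbr class="dtstart" title="{start.isoformat()}">'
        f'{start.strftime("%d %B %Y")}</abbr>'
    )
    if url:
        lines.append(f'  <a class="url" href="{escape(url)}">Original tweet</a>')
    lines.append("</div>")
    return "\n".join(lines)


print(hcalendar("Startup drinks & demos", date(2010, 6, 12)))
```

Escaping the summary and URL matters here because tweet text routinely contains `&` and `<`.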

So tweevents is now up and running. Within 54 hours we’d gone from a rough idea to a working website. The presentation we gave has some details about what’s next on the business side, but there are also some things I’d like to get done on the development side. Most important is improved date parsing. I plan to give the python-dateutil library a try, and failing that, go back and look into NLTK in more detail. Guillume’s also working hard on improving the library we’re already using. Should none of the Python options work, I’ve also come across some great date parsing libraries for other languages, such as Chronic for Ruby and Datejs for JavaScript. A Rails or Node.js rewrite might be on the cards! There are also features we could add, such as Facebook event creation, or filtering the list of events to just your Twitter followers or the people you follow.

A huge thanks to Damien, James and Franck, who organised and helped out at the event and made it all so much fun. Of course, tweevents wouldn’t be much without the rest of the team. Thanks also to my girlfriend, who was left on her own with our 2-month-old daughter for the entire weekend!

FireEagle OAuth and Python2.5 Woes

May 18, 2010

Back in February I started work on integrating Yahoo’s FireEagle location service into Geomium, and I ran into a problem with Python 2.5. With Steve Marshall’s Python library, the included test.py script worked perfectly with Python 2.6, but under Python 2.5 I’d get back an “Invalid OAuth signature” error.

I posted the problem to the OAuth user group but didn’t get any response, so I got in touch with Yahoo. After quite a bit of back and forth we finally figured out the problem, which I’m posting here to try to save others from months of frustration!

The Yahoo guys noticed that with Python 2.5 the HTTP Host header was being sent as “fireeagle.yahooapis.com:443”, whereas 2.6 sends “fireeagle.yahooapis.com”. The inclusion of the port results in an invalid OAuth signature, because the signature is generated assuming the port isn’t included. I dug into the Python 2.5 httplib code and came across this:
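For context, OAuth 1.0 (RFC 5849) signs a normalised form of the request URL in which the port appears only when it isn’t the scheme’s default, so `:443` on an HTTPS request is dropped; a stray `:443` on one side is one way the two parties end up signing different strings. Here’s a minimal Python 3 sketch of that normalisation using only `urllib.parse` (the function name and sample path are illustrative):

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def base_string_uri(url):
    """Normalise a request URL the way OAuth 1.0 signatures expect.

    Scheme and host are lower-cased, the query and fragment are dropped
    (query parameters are signed separately), and the port is kept only
    if it differs from the scheme's default.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname  # urlsplit already lower-cases this
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    return urlunsplit((scheme, netloc, parts.path, "", ""))


print(base_string_uri("https://fireeagle.yahooapis.com:443/example/path"))
```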

 813    if self.port == HTTP_PORT:
 814        self.putheader('Host', host_enc)
 815    else:
 816        self.putheader('Host', "%s:%s" % (host_enc, self.port))

In Python 2.6 the comparison on line 813 is made against self.default_port instead of HTTP_PORT, which prevents the port from being added to HTTPS requests. I also noticed that, later on in the code, passing in your own Host header prevents one being created for you:

    def _send_request(self, method, url, body, headers):
        # honour explicitly requested Host: and Accept-Encoding headers
        header_names = dict.fromkeys([k.lower() for k in headers])
        skips = {}
        if 'host' in header_names:
            skips['skip_host'] = 1

So the fix turns out to be really simple: explicitly set the Host header. That’s exactly what I’ve done in my fork of the fireeagle library (see the fix). I’ve also sent a pull request, so hopefully this fix will make it back into the original library. Thanks to Arnab Nandi and Anand S from Yahoo for helping to debug things on their end.
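The fork’s actual change isn’t reproduced here, but the same idea carries over to Python 3’s `http.client` (httplib’s successor), which still has the `skip_host` check: supply your own `Host` header and no automatic one is added. In this sketch `host_header` and `signed_get` are hypothetical helper names, and `oauth_headers` stands in for the `Authorization` header produced by whatever OAuth library signs the request.

```python
import http.client

DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(scheme, host, port):
    """Host header value with the port left off when it's the scheme's
    default, matching what Python 2.6 sends."""
    if port is None or port == DEFAULT_PORTS[scheme]:
        return host
    return f"{host}:{port}"


def signed_get(host, path, oauth_headers, port=443):
    """Send an HTTPS GET with an explicit Host header.

    Because "Host" is already present, http.client skips adding its own
    (the same skip_host logic as in the httplib excerpt above).
    """
    headers = dict(oauth_headers)
    headers["Host"] = host_header("https", host, port)
    conn = http.client.HTTPSConnection(host, port, timeout=10)
    try:
        conn.request("GET", path, headers=headers)
        return conn.getresponse().read()
    finally:
        conn.close()
```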

PyWebShot – Generate website thumbnails using Python

April 11, 2010

There have been lots of links to automatic website thumbnail generators on sites like Reddit and Hacker News today, including webkit2png and CutyCapt. Well, it just so happens that a few weeks ago I wrote my own website thumbnail generator, and today I got around to putting it on GitHub.

The code is based on Matt Biddulph’s screenshot-tng script, but heavily modified to be more user-friendly and provide more options. It uses embedded Mozilla for rendering, and therefore requires the python-gtkmozembed package.

You can specify a resolution to take the screenshot at, and also a resolution for the thumbnail. When the thumbnail is generated, the aspect ratio is preserved. You can also specify a delay, so that the screenshot is only taken a given number of seconds after the page loads. Here’s an example of running PyWebShot with 3 URLs, and its output:

$ ./pywebshot.py -t 500x250 http://www.coderholic.com http://geomium.com/update/598/ http://jobs.plasis.co.uk
Loading http://www.coderholic.com... saved as www.coderholic.com.png
Loading http://geomium.com/update/598/... saved as geomium.com.update.598..png
Loading http://jobs.plasis.co.uk... saved as jobs.plasis.co.uk.png
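Preserving the aspect ratio comes down to scaling both dimensions by the same factor. PyWebShot’s own resizing code isn’t shown here, so this is a hypothetical sketch that assumes the thumbnail must fit inside the box given with `-t`: taking the smaller of the two scale factors guarantees neither side overflows.

```python
def parse_size(text):
    """Parse a 'WIDTHxHEIGHT' string such as '500x250' into (500, 250)."""
    width, height = text.lower().split("x")
    return int(width), int(height)


def fit_within(src_w, src_h, max_w, max_h):
    """Largest size with the source's aspect ratio that fits in max_w x max_h.

    Using the smaller of the two scale factors keeps both sides inside the
    box; rounding gives whole pixels, and max(1, ...) avoids a zero size.
    """
    scale = min(max_w / src_w, max_h / src_h)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))
```

For a 1024x768 screenshot and `-t 500x250`, the height is the limiting side, so the thumbnail comes out at 333x250.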

If you have a huge list of URLs you’d like to generate screenshots for, you can put them into a file and generate images for all of them with the following command:

$ cat urls.txt | xargs ./pywebshot.py

For more details and the source code see the PyWebShot project page on GitHub.

Clojure: 12 New Programming Languages Update 1

March 20, 2010

At the start of the year I announced that I was setting myself the challenge of learning 12 new programming languages during 2010. That works out at a language a month, so seeing as it’s almost the end of March you might expect me to be wrapping up my third language. Unfortunately that isn’t the case. I’m just about to move on to my second. I’m still optimistic I can achieve the target of 12 new languages this year though, so expect future updates to be more regular.

Getting Started with Clojure

Getting up and running with Clojure was made easy thanks to the wealth of documentation. There’s a great getting started guide, and a guide specifically for Clojure on Ubuntu. For a programming environment, there’s a round-up of Clojure IDEs. I stuck to Vim, but I didn’t take it as far as this guide, which describes turning Vim into a fairly comprehensive Clojure IDE.

Clojure has a REPL, which I always find makes learning a new language easier. When you want to find something out, just type it in and see what the result is! The default REPL doesn’t support arrow-key navigation or pressing up to recall previous commands, though, so it can be a little frustrating. There are guides on enhancing the REPL with this functionality.

One of the first things I did was put together the following shell script which either runs the specified Clojure script, or gives you a REPL if no script was specified.

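#!/bin/sh
# Location of the Clojure jar; adjust for your installation
CLOJURE_JAR="/opt/clojure-1.1.0/clojure.jar"
if [ -z "$1" ]; then
        # No script given: start an interactive REPL
        java -jar "${CLOJURE_JAR}"
else
        # Run the named script, passing any extra arguments through
        java -jar "${CLOJURE_JAR}" "$@"
fi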

Writing Code

I decided to write a port scanner in Clojure, which would introduce me to command-line argument handling, the network API, and parallelization.

One of the great things about Clojure is that, despite being such a new language, there is so much example code available on the web. I was able to find a Clojure network scanner by Travis Whitton, which covered all of the network-related code I’d need for my port scanner. Travis uses Clojure’s agents for parallelization, which is something else I borrowed from his script. I was amazed at how easy it was to parallelize the lookups: so much simpler than managing threads by hand. If there’s only one thing I take away from Clojure, it’ll be its interesting approach to parallelization.

So without further ado, here is my Clojure port scanner:

(import '(java.io IOException)
        '(java.net Socket InetSocketAddress))

;; Exactly one argument (the host to scan) is expected
(when-not (== (count *command-line-args*) 1)
  (println "Usage: scanner <hostname>")
  (System/exit 1))

(def hostname (first *command-line-args*))

;; Returns the port if a TCP connection succeeds within timeout ms,
;; otherwise false. SocketTimeoutException and UnknownHostException are
;; both subclasses of IOException, so this one catch clause covers
;; refused connections, timeouts and unknown hosts.
(defn port-open? [hostname port timeout]
  (let [sock-addr (InetSocketAddress. hostname port)]
    (try
      (with-open [sock (Socket.)]
        (.connect sock sock-addr timeout)
        port)
      (catch IOException e false))))

(defn host-port-open? [port]
  (port-open? hostname port 5000))

(def port-list (range 1 1024))

;; One agent per port; each agent's initial state is its port number
(def agents (for [port port-list] (agent port)))

(println (str "Scanning " hostname "..."))

;; send-off runs host-port-open? on each agent's state in a thread pool
;; meant for blocking I/O, replacing the state with the port or false
(doseq [a agents]
  (send-off a host-port-open?))

;; Block until every agent has finished its action
(apply await agents)

(doseq [a (filter deref agents)]
  (println (str @a " is open")))

(shutdown-agents)

I’m sure it is far from an idiomatic solution, so any suggestions for improvement are welcome. Running the scanner with my shell script gives the following output:

$ ./clj.sh scanner
Usage: scanner <hostname>

$ ./clj.sh scanner github.com
Scanning github.com...
22 is open
80 is open
443 is open
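For comparison, here’s a rough Python 3 sketch of the same scanner. A thread pool stands in for Clojure’s agents: each port check is handed to the pool as a blocking task, much as `send-off` hands actions to a pool meant for blocking I/O, and collecting the results plays the role of `await`. The function names, the default of 100 worker threads and the `main` wrapper are arbitrary choices, not a translation of Travis Whitton’s code.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def port_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host not found
        return False


def scan(host, ports, workers=100, timeout=5.0):
    """Check `ports` concurrently; return the open ones in ascending order."""
    ports = sorted(ports)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: port_open(host, p, timeout), ports))
    return [p for p, is_open in zip(ports, results) if is_open]


def main(argv):
    """Command-line entry point; returns the process exit status."""
    if len(argv) != 2:
        print("Usage: scanner <hostname>")
        return 1
    hostname = argv[1]
    print(f"Scanning {hostname}...")
    for port in scan(hostname, range(1, 1024)):  # ports 1-1023, as in the Clojure version
        print(f"{port} is open")
    return 0
```

To run it as a script, import `sys` and call `sys.exit(main(sys.argv))`.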

What’s next?

I’ve barely scratched the surface of Clojure, but I’ve certainly become more aware of some of the concepts and idioms the language uses, which is what I was hoping for. I’ll be looking for more projects in the future where I can make use of it. For now, though, I need to move on to another language as part of my challenge. As I’m running behind, I plan to go with one that I don’t think will be too unfamiliar: either Go or Fantom. I’ll keep you posted!
