PyWebShot – Generate website thumbnails using Python

April 11, 2010

There have been lots of links to automatic website thumbnail generators on sites like Reddit and Hacker News today, including webkit2png and CutyCapt. Well, it just so happens that a few weeks ago I wrote my own website thumbnail generator, and today I got around to putting it on GitHub.

The code is based on Matt Biddulph’s screenshot-tng script, but heavily modified to be more user-friendly and provide more options. It uses embedded Mozilla for rendering, and therefore requires the python-gtkmozembed package.

You can specify a resolution to take the screenshot at, and also a resolution for the thumbnail. When generating the thumbnail the aspect ratio will be preserved. You can also specify a delay, so that the screenshot is only taken so many seconds after loading the page. Here’s an example of running PyWebShot with 3 URLs, and the resulting images:

$ ./pywebshot.py -t 500x250 http://www.coderholic.com http://geomium.com/update/598/ http://jobs.plasis.co.uk
Loading http://www.coderholic.com... saved as www.coderholic.com.png
Loading http://geomium.com/update/598/... saved as geomium.com.update.598..png
Loading http://jobs.plasis.co.uk... saved as jobs.plasis.co.uk.png
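Judging only from the output above, the image name looks like the URL with the scheme removed, every slash turned into a dot, and .png appended. The following is a small shell sketch of that mapping; it is inferred from the three sample lines, not taken from the PyWebShot source, and the function name url_to_filename is made up for illustration.

```shell
#!/bin/sh
# Sketch of the URL-to-filename mapping suggested by the sample output.
# ASSUMPTION: inferred from the example run, not from pywebshot.py itself.
url_to_filename() {
    # Strip the scheme, e.g. "http://"
    name="${1#*://}"
    # Replace every slash with a dot
    name=$(printf '%s' "$name" | tr '/' '.')
    printf '%s.png\n' "$name"
}

url_to_filename "http://geomium.com/update/598/"
```

A trailing slash in the URL becomes a trailing dot, which would explain the double dot in geomium.com.update.598..png.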

If you have a huge list of URLs you’d like to generate screenshots for, you can put them all into a file and generate images for them all with the following command:

$ cat urls.txt | xargs ./pywebshot.py
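If the URL file also contains blank lines or comments, it may help to filter those out first and to hand xargs a limited number of URLs per run. The sketch below assumes that lines starting with # are comments (a convention chosen here, not something PyWebShot defines) and uses echo as a stand-in for ./pywebshot.py so that it runs anywhere; clean_urls is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only lines that are neither empty nor comments.
# ASSUMPTION: '#' marks a comment line in the URL file.
clean_urls() {
    grep -v -e '^#' -e '^[[:space:]]*$'
}

# Demo input; "echo run" stands in for ./pywebshot.py.
# xargs -n 2 passes at most two URLs to each invocation.
printf '%s\n' '# my sites' 'http://www.coderholic.com' '' \
    'http://jobs.plasis.co.uk' 'http://geomium.com/update/598/' \
    | clean_urls | xargs -n 2 echo run
```

With a real file this would become something like: clean_urls < urls.txt | xargs -n 10 ./pywebshot.py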

For more details and the source code see the PyWebShot project page on GitHub.

Server Monitoring with Munin

October 21, 2009

Munin is an excellent open source tool for monitoring and graphing server performance metrics. It can be configured to send out alert emails when something goes wrong with your server, and the graphs make it easy to view trends over time: You could see that your site gets much less traffic on a Sunday, for example, or that the number of database queries performed per day has doubled in the last 2 months.

On a Debian-based system installing Munin is as simple as running the following command, and then going to http://your-server/munin/ in your browser:

sudo aptitude install munin

Munin comes with lots of monitoring plugins by default, including those for MySQL, PostgreSQL, Apache, Tomcat, Squid, and for things such as CPU and memory usage, load average, network traffic, and many more. You can also find lots of user-submitted plugins on sites like Munin Exchange.

Munin doesn’t have to be used solely for monitoring server performance though. Because it is so easy to extend, Munin is also a great tool for tracking trends over time that have nothing to do with server performance. In just a few lines of code you could write plugins to track the following stats about your website:

  • Number of User signups
  • Google PageRank
  • Pages in Google’s index
  • Number of backlinks
  • Number of twitter mentions
  • Alexa traffic rank

The number of pages in Google’s index is actually a plugin I’ve written. Simply put the following code in an executable file in your /etc/munin/plugins directory to see it in action:

#!/bin/sh
# Munin Plugin to display the number of pages in the
# google index for all of the given websites
# Ben Dowling - www.coderholic.com

# Change this to whatever sites you're interested in
websites="www.yahoo.com www.google.com www.twitter.com"

if [ "$1" = "autoconf" ]; then
        echo yes
        exit 0
fi

if [ "$1" = "config" ]; then

        echo 'graph_title Number of Pages in Google Index'
        echo 'graph_args --base 1000 -l 0 '
        echo 'graph_vlabel number of pages'
        echo 'graph_category google'
        echo 'graph_info This graph shows the number of pages in the Google index for a given website.'

        i=0
        for site in $websites
        do
                name="site_${i}"
                echo "${name}.label ${site}"
                echo "${name}.draw LINE2"
                echo "${name}.info The number of pages in the google index."
                i=$((i+1))
        done
        exit 0
fi

i=0
for site in $websites
do
        name="site_${i}"
        value=$(wget -q --user-agent=Firefox -O - "http://www.google.com/search?q=site:${site}" | grep -E "of about [0-9,]+" -o | grep -E "[0-9,]+" -o | sed "s/,//g")
        echo "${name}.value ${value}"

        i=$((i+1))
done
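The extraction step in the plugin can be tried on its own, without contacting Google. The sketch below runs the same grep and sed pipeline against a made-up line in the "of about N" format the plugin looks for; the sample text and the extract_count name are illustrative, and whether Google's result pages still contain that phrase is not something the sketch checks.

```shell
#!/bin/sh
# Made-up sample in the format the plugin greps for
sample='Results 1 - 10 of about 1,230,000 for site:www.example.com'

extract_count() {
    # Isolate "of about 1,230,000", keep only the number, drop the commas
    grep -E -o "of about [0-9,]+" | grep -E -o "[0-9,]+" | sed "s/,//g"
}

value=$(printf '%s\n' "$sample" | extract_count)
# Fall back to "U" (unknown to Munin) if nothing matched
echo "site_0.value ${value:-U}"
```

The fallback matters because an empty match would otherwise print a line with no value at all.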

For more details about Munin see their homepage, which also includes detailed documentation on writing your own plugins.

Let me know if you can think of any more Munin plugins that could be interesting, or if you’ve used any yourself!

linewatch – an alternative to linux’s watch

July 18, 2009

I often use the Linux watch command to monitor the output of certain commands. When I’m copying lots of files, say, I watch the target directory to see which files have already been copied across, using the following command:

watch ls -l

The watch program clears the screen and displays the output of “ls -l” every 2 seconds.

Sometimes I’ll want to monitor a command that only outputs a single line. If I wanted to see the total number of files in a directory rather than the files themselves I could use the command “ls -l | wc -l”. The fact that watch clears the whole screen can be a little annoying here though, because the command is only outputting a single line. That is why I came up with the following small bash script, linewatch.

Linewatch repeatedly calls any arguments passed to it every 2 seconds (in the same way watch does), but only clears a single line rather than the whole screen. Here is the code:

#!/bin/bash
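# Escape sequence that erases the current line and returns the cursor to column 0
clearline="\033[2K\r"
# Join all arguments into a single command string
command="$*"

while true
do
    # Command substitution strips the trailing newline, so the cursor stays
    # on the same line and the next update can overwrite it
    output=$(eval "$command")
    printf '%b%s' "$clearline" "$output"
    sleep 2
done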

And here is an example of how to call it:

$ ./linewatch "ls -l | wc -l"
24

The number of files in the current directory (24 in the example) will keep updating every 2 seconds. Just hit Ctrl-C when you want to quit.
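The same idea extends naturally to a configurable interval, much like watch's own -n option. The following is a sketch of one way to do that, not part of linewatch itself: linewatch_n is a hypothetical name, and the extra count argument (0 meaning "run until interrupted") is only there to make the loop easy to try out. It captures the command's output first so that the trailing newline is dropped and the line can be overwritten in place.

```shell
#!/bin/sh
# Hypothetical variant of linewatch.
# Usage: linewatch_n INTERVAL COUNT COMMAND...
#   INTERVAL  seconds to sleep between runs
#   COUNT     number of runs, or 0 to run until interrupted
linewatch_n() {
    interval="$1"
    count="$2"
    shift 2
    cmd="$*"
    i=0
    while true
    do
        # $(...) strips the trailing newline from the command's output
        output=$(eval "$cmd")
        # Erase the current line, return to column 0, print the new value
        printf '\033[2K\r%s' "$output"
        i=$((i + 1))
        if [ "$count" -gt 0 ] && [ "$i" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
    # Finish with a newline so the shell prompt starts on a fresh line
    printf '\n'
}
```

For example, linewatch_n 5 0 "ls -l | wc -l" would refresh the count every 5 seconds until Ctrl-C is pressed.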

SVN Change Monitoring Script

July 2, 2009

I came up with the following shell script recently to monitor code changes in a Subversion repository. On the first run it will email out the 10 most recent changes. After that the script mails out all changes since the last time it was run. You can set it up to run as a daily cron job which mails you all changes made to your favourite open source project!

It wouldn’t take much to get it working with other version control systems such as Git or Bazaar, or to do some nice formatting of the output instead of outputting the raw svn log as-is. Let me know if you find it useful!

#!/bin/bash
# Shell script to email the latest changes in an SVN
# repository to a specified email address.
# Ben Dowling - www.coderholic.com

svnUrl="http://anonsvn.wireshark.org/wireshark/trunk/"
lastRevisionFile="./.last-revision"
mailto="ben@coderholic.com"

function getCurrentRevision {
  # Get the current SVN revision, eg. "r4670"
  currentRevision=$(svn log "$svnUrl" -r HEAD 2>/dev/null | head -n2 | grep -v -- "-------" | awk '{ print $1 }')
  # Strip off the 'r'
  currentRevision="${currentRevision:1}"
  echo "$currentRevision"
}

currentRevision=$(getCurrentRevision)

# If we've run this program before then we've stored the SVN revision at the time
if [ -f "$lastRevisionFile" ]
then
  lastRevision=$(cat "$lastRevisionFile")
  # Check what the current revision is, and exit if there
  # haven't been any changes since we last checked
  if [ "$currentRevision" -lt "$lastRevision" ]
  then
      echo "No changes since last check"
      exit
  fi
else
  # We haven't run this program before, so set the last revision to the current revision - 10
  lastRevision=$(echo "$currentRevision - 10" | bc)
fi

# Mail the SVN changes
svn log "$svnUrl" -r "HEAD:${lastRevision}" | mail -s "SVN changes for $svnUrl" "$mailto"

# Store the current revision + 1 as the last revision
revision=$(echo "$currentRevision + 1" | bc)
echo "$revision" > "$lastRevisionFile"
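The revision handling can be checked offline against a sample of svn log output. The sketch below is an illustration rather than a drop-in replacement: the log text is made up, parse_revision and first_run_start are hypothetical helper names, and clamping the first-run start at revision 1 is an added assumption for repositories with fewer than 10 commits.

```shell
#!/bin/sh
# Made-up sample of what "svn log -r HEAD" prints
sample_log='------------------------------------------------------------------------
r4670 | someuser | 2009-07-02 10:15:00 +0100 (Thu, 02 Jul 2009) | 1 line

Fix typo
------------------------------------------------------------------------'

parse_revision() {
    # Print the digits of the first line that starts with r<number>
    sed -n 's/^r\([0-9][0-9]*\) .*/\1/p' | head -n 1
}

first_run_start() {
    # Go back 10 revisions from the given one, but never below revision 1
    start=$(($1 - 10))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    echo "$start"
}

revision=$(printf '%s\n' "$sample_log" | parse_revision)
echo "current revision: $revision"
echo "first run would start at: $(first_run_start "$revision")"
```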

The Ultimate Scalability Presentation

April 29, 2009

At work we’re experiencing some fairly rapid growth, and our single production server is starting to feel the strain. I’ve been doing a lot of investigation into how we can scale the site, and thankfully there is lots of information out there.

The “Do you Scale” presentation I saw at PHP London a couple of months ago gave a good high level overview of scalability issues, and included some useful techniques to help you scale.

I think I’ve found the ultimate scalability presentation though: “Real World Web: Performance & Scalability”. The 189 slides contained within this presentation cover almost everything I’ve read elsewhere, and it’s packed full of practical advice!
