Sunday, 20 February 2011

Checking an ISBN

International Standard Book Numbers consist of a string of digits followed by a checksum digit to help detect typos, not unlike schemes used in credit card numbers. Here I show a small Python function to calculate this check digit.

You can pass it a string of 9 or 10 digits; if you pass in 10 digits, the last one is ignored. The function returns a string consisting of the first 9 digits followed by the check digit (which may be the character X).

The most common use case is checking whether a given ISBN is well formed by passing it to isbnchecksum() and comparing the result with the original:

if myisbn == isbnchecksum(myisbn):
    pass  # ... proceed ...

For the last 5 or 6 years the 10-digit ISBN has been replaced by a 13-digit ISBN starting with 978 or 979. This 13-digit sequence is compatible with the so-called EAN numbers used for marking all kinds of goods, not just books, where the first 3 digits represent a country code. For books this country is the fictional Bookland. The checksum for these ISBN-13 codes is calculated differently; the algorithm can be found on Wikipedia together with additional information on ISBNs, and a short ISBN-13 check is included at the end of this post. The small snippet shown below is part of a larger module that I wrote to retrieve information from sources like the Library of Congress and Amazon. That module can be found on my homepage.

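import string

def isbnchecksum(line):
    """Calculate the checksum for an isbn-10 number.

    Accepts 9 digits, or 10 digits of which the last is ignored, and
    returns the first 9 digits followed by the check digit ('0'-'9' or 'X').
    """
    if len(line) == 10:
        line = line[0:9]
    if len(line) != 9:
        raise ValueError('ISBN should be 9 digits, excluding checksum!')
    total = 0
    # weights run from 10 for the first digit down to 2 for the ninth
    for count, ix in enumerate(line):
        total = total + (10 - count) * int(ix)
    total = total % 11
    if total != 0:
        total = 11 - total
    # a check value of 10 is written as the character X
    if total == 10:
        line = line + 'X'
    else:
        line = line + string.digits[total]
    return line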
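The calculation can also be written as a small helper that returns only the check character. The sketch below assumes Python 3; the name isbn10_check_digit() is hypothetical, and the sample numbers 0-306-40615-2 and 0-8044-2957-X are only used to exercise both a digit and an X check value.

```python
def isbn10_check_digit(first9):
    """Return the ISBN-10 check character ('0'-'9' or 'X') for 9 digits.

    Hypothetical helper: the weights run from 10 down to 2, and the check
    digit is whatever brings the weighted sum up to a multiple of 11.
    """
    if len(first9) != 9 or not all(c in '0123456789' for c in first9):
        raise ValueError('expected exactly 9 digits')
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11  # the outer % 11 maps 11 back to 0
    return 'X' if check == 10 else str(check)


# ISBNs are usually printed with hyphens, so strip them first
isbn = '0-306-40615-2'.replace('-', '')
print(isbn10_check_digit(isbn[:9]))              # → 2
print(isbn10_check_digit(isbn[:9]) == isbn[9])   # → True
print(isbn10_check_digit('080442957'))           # → X
```

For 0-306-40615-2 the weighted sum is 130, which leaves 9 modulo 11, so the check digit is 11 - 9 = 2.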

After I published this I realized the solution wasn't all that Pythonic. A more elegant implementation (with slightly different semantics: it checks a complete 10-character ISBN and returns True or False) might be the following bit of code, although some people argue that the ternary x if c else y operator is not Pythonic in any way. Note that we use the atoi() function from the locale module instead of string.atoi(), because the latter no longer exists in Python 3.

from locale import atoi

def isbn10checksum(isbn):
    """Return True if a 10-character isbn-10 has a valid check digit."""
    if len(isbn) != 10:
        raise ValueError('isbn should be 10 digits')
    # weights 10 down to 1; 'X' stands for the value 10
    return 0 == sum((10 - w) * atoi('10' if d == 'X' else d)
                    for w, d in enumerate(isbn.upper())) % 11
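The check works because, for a valid ISBN-10, the weighted sum over all ten characters (weights 10 down to 1) is a multiple of 11. One consequence of the "slightly different semantics" is that isbn10checksum() accepts an X in any position. The sketch below, with the hypothetical name is_valid_isbn10() and plain int() instead of locale.atoi(), only allows X as the final character and returns False for other non-digits.

```python
def is_valid_isbn10(isbn):
    """Sketch: validate a 10-character ISBN-10 with hyphens already removed.

    'X' (or 'x') is accepted only in the last position, where it
    stands for the value 10.
    """
    isbn = isbn.upper()
    if len(isbn) != 10:
        raise ValueError('isbn should be 10 characters')
    body, last = isbn[:9], isbn[9]
    if not all(c in '0123456789' for c in body):
        return False
    if last not in '0123456789X':
        return False
    values = [int(c) for c in body] + [10 if last == 'X' else int(last)]
    # weights 10, 9, ..., 1; a valid ISBN-10 sums to a multiple of 11
    return sum((10 - w) * v for w, v in enumerate(values)) % 11 == 0


print(is_valid_isbn10('0306406152'))  # → True
print(is_valid_isbn10('03064X6152'))  # → False
```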

Just for completeness' sake, here is the code for an isbn-13 check. It is not very elegant, but it is a nice example of Python's extended slicing (strides) in action:

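from locale import atoi

def isbn13checksum(isbn):
    """Return True if a 13-digit isbn-13 has a valid check digit."""
    if len(isbn) != 13:
        raise ValueError('isbn should be 13 digits')
    # digits at even positions get weight 1, those at odd positions weight 3
    c = (10 - (sum(atoi(d) for d in isbn[0:12:2])
               + sum(3 * atoi(d) for d in isbn[1:12:2])) % 10) % 10
    return atoi(isbn[12]) == c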
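Because ISBN-13 numbers for existing books are formed by prefixing 978 to the first 9 digits of the ISBN-10 and recomputing the check digit, the same weighting (alternating 1 and 3, total a multiple of 10) gives a simple conversion. The sketch below is an illustration only; both function names are hypothetical and the input is assumed to be an ISBN-10 without hyphens.

```python
def isbn13_check_digit(first12):
    """Return the ISBN-13 check digit for a string of 12 digits.

    Weights alternate 1, 3, 1, 3, ...; the check digit brings the
    weighted sum up to a multiple of 10.
    """
    if len(first12) != 12 or not all(c in '0123456789' for c in first12):
        raise ValueError('expected exactly 12 digits')
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10):
    """Hypothetical helper: convert an ISBN-10 (no hyphens) to ISBN-13.

    The old check digit is dropped, '978' is prepended and a new
    ISBN-13 check digit is computed.
    """
    first12 = '978' + isbn10[:9]
    return first12 + isbn13_check_digit(first12)


print(isbn10_to_isbn13('0306406152'))  # → 9780306406157
```

Here the weighted sum of 978030640615 is 93, so the check digit is 10 - 3 = 7.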

Saturday, 12 February 2011

Python 3 Web Development, beginners guide, the RAW version

My new book is available in a RAW version

The people at Packt Publishing decided to put my new book on their website in its RAW version, meaning you can get early access to it and at a discounted price as well.
In the coming weeks I'll be working hard together with the editorial team at Packt to get the book ready for its final version. The book is a beginner's guide to developing web applications in Python, with a bit of help from JavaScript and jQuery on the client side. And although we take small steps, we take a lot of them and end up with a fairly elaborate framework that will help you develop sophisticated web applications with surprisingly little effort.

Monday, 7 February 2011

Singleton objects

Singleton objects are not some fancy concept but a very practical way to represent a specific value as an object, as Python's None object shows.

Many programming languages provide syntax for declaring constants. Literals like 123 or "a string" are almost always constants, but if you want to give a constant a meaningful name you need a different approach.

Referring to a value by name is simple enough (after all, a variable assignment does just that), but you need some way to indicate that the variable shouldn't be altered after the assignment. Python does not provide a way to declare a constant, but there are ways around this, see for example (this recipe).
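One common workaround, shown here as a minimal sketch and not necessarily the approach of the recipe mentioned above, is a namespace object whose attributes can be bound once but never rebound. The class name _Constants is hypothetical, and a determined caller can still bypass it by writing to its __dict__ directly.

```python
class _Constants:
    """Namespace whose attributes can be bound once but never rebound.

    Illustrative sketch only; the class and its name are hypothetical.
    """

    def __setattr__(self, name, value):
        # refuse to rebind a name that already has a value
        if name in self.__dict__:
            raise AttributeError('cannot rebind constant %r' % name)
        super().__setattr__(name, value)

    def __delattr__(self, name):
        raise AttributeError('cannot delete constant %r' % name)


const = _Constants()
const.ANSWER = 42
print(const.ANSWER)  # → 42
try:
    const.ANSWER = 43
except AttributeError as e:
    print(e)  # → cannot rebind constant 'ANSWER'
```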

Paradoxically, it is comparatively simple to define a class that allows only a single instance. There are arguments for and against this Singleton design pattern, and this article is a good starting point if you want to read about it. Whatever the merits of the pattern, you can't avoid it, because several constants in Python are singletons: None and NotImplemented are the only instances of their types, and True and False are the only two instances of bool.
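A minimal way to get such a class in Python 3 is to override __new__ so that it caches the first instance and hands it back on every later call. This is a sketch of one common technique, not the way the built-in constants are implemented in C; the class name Singleton is hypothetical.

```python
class Singleton:
    """A class that only ever creates a single instance.

    Minimal sketch: __new__ stores the first instance on the class and
    returns that same object on every later call.
    """

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


a = Singleton()
b = Singleton()
print(a is b)  # → True

# The built-in None behaves the same way in Python 3: calling its type
# hands back the one existing None object.
print(type(None)() is None)  # → True
```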

The None implementation also illustrates another practical advantage: when comparing a value against a singleton we can check whether the two are identical (with the is operator) instead of comparing their values (with the == operator). Although this might not be the primary purpose, it does give us a clear speed advantage, as the next snippet shows:

import timeit

count = 1000000  # number of executions per statement

# Python 3.8 and later print a SyntaxWarning for "123 is None"
# ("is" with a literal); the timing itself is unaffected.
statements = ["123 == None", "a == None", "123 is None", "a is None"]

for stmt in statements:
    t = timeit.Timer(stmt=stmt, setup='a=123')
    # timeit() returns the total time in seconds for `count` runs;
    # scale it to microseconds per single execution
    print(stmt, "%.2f usec/pass" % (1e6 * t.timeit(number=count) / count))
The results of running this bit of code on my Samsung NC10 netbook give me the following output:
123 == None 0.29 usec/pass
a == None 0.31 usec/pass
123 is None 0.20 usec/pass
a is None 0.21 usec/pass
This is a significant difference (roughly 30%), and although it doesn't seem to amount to much, it might shave several seconds off your algorithm when you compare, for example, the results of a large database query, because Python's database interfaces represent SQL NULL values as None.
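To illustrate the database case, the sketch below uses the standard sqlite3 module with an in-memory database; the table and its rows are made-up sample data. SQL NULL comes back as None, so an identity test is all that is needed to pick out the missing values.

```python
import sqlite3

# In-memory database with made-up sample rows, some containing NULL
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE books (isbn TEXT, title TEXT)')
conn.executemany('INSERT INTO books VALUES (?, ?)',
                 [('0306406152', 'Some title'),
                  ('080442957X', None),
                  (None, 'Another title')])

rows = conn.execute('SELECT isbn, title FROM books').fetchall()
# sqlite3 returns SQL NULL as None, so an identity check suffices
missing_title = [isbn for isbn, title in rows if title is None]
print(missing_title)  # → ['080442957X']
conn.close()
```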