Ikanobori Weblog

Advanced Python users might know generators and generator comprehensions, this article is meant for people who are just getting started with Python and want to speed up some of their tasks.

Firstly I’ll start off with a little introduction of what list comprehensions are. List comprehensions are a nifty little syntactic sugar to make the creation of lists a lot easier.

Take this list for example:

lst = []
for x in xrange(10):
lst.append(x)

That is an awful lot of code for a for creating a simple list. We could do better:

lst = [x for x in xrange(10)]

Now that is a whole lot better. Easier to read, and far shorter to write. You can make list comprehensions as nifty as you like. Take this for example:

lst = [check(x) for x in xrange(10) if x % 2]

This creates a list of the outcome of check() on all uneven number from zero to ten. You could even go berserk and start nesting list comprehensions but that would just be to fondle your inner perl coder.

lst = [x for x in [y for y in xrange(10)]]

Or build a general list if you add them to together:

lst = [(x,y,z) for x in xrange(10) for y in xrange(10) for z in xrange(10)]

Which generates a nice list of a thousand items.

If you create a list comprehension like this, python executes the statements to form the list and puts that whole list in memory.

Now let’s go on to a real world test case:

I have a text file containing numbers, a random amount of number per line and the numbers are of random size. Can you quickly give me a sum of all numbers? The file has to be read from stdin.

Sounds hard? Well I’ll show you the code straight away:

#!/usr/bin/env python
import sys

data = [[int(i) for i in line.split()] for line in sys.stdin]

total = 0

for line in data:
  total  += sum(line)

print 'Grand total %d' % total

As you can see I use a nested list comprehension. The first list comprehension (the outer one) reads all data from stdin and creates a second list of every number on that line.

The output of the list comprehension would be like this:

[[1,2,3],
[4,5,6]]

And the total sum would be: 17677692470 (I used this 17MB text file which I generated). I’ve ran this script ten times and got an average of 4.571 seconds to execute it, not too bad considering it has to spit throught 17 MB’s of text and 322456 lines.

However, we can make that faster, 35.33% faster.

Python also knows generators and generator comprehensions. This is not the article to go into depth about generators and the specifics of generator comprehensions (read the two PEP’s I linked to get a feel for them). I’ll just show you the edited code which uses generator comprehensions to do the same trick.

#!/usr/bin/env python
import sys

data = ([int(i) for i in line.split()] for line in sys.stdin)

total = 0

for line in data:
  total  += sum(line)

print 'Grand total %d' % total

That’s all there is to it, I’ve changed the outer [ and ] to ( and ) and the comprehension has become a generator. Generators are lazy, so the next value does not get calculated until their .next() method gets called. This makes Python not read the whole of stdin to memory, it will just read a line when it’s necessary.

The effects on this file are already dramatic, it takes 2.956 seconds on average to parse the same input file with the generator.

However, if you would be working with even larger input files (let’s say 2 GB) you might not even be able to do this with lists. Do you want to put 2 GB in memory? I don’t think so.

Read up on iterators if you are working with this kind of stuff a lot, especially parsing log files.

Any comments on the code of this article can be left in the comments, I’ll check them!

23 COMMENTS
Jason Baker
April 9, 2009

If anyone wants to know more about how generator comprehensions (iterators) are faster than list comprehensions (lists), there’s a good email on the mailing list: http://markmail.org/message/t2a6tp33n5lddzvy

Ronny Pfannschmidt
April 9, 2009

how does sum(int(number) for line in file for number in line.split()) perform against your file?

April 9, 2009

@Ronny Pfannschmidt:
When I do one run on your solution it takes 2.79 seconds to parse it, that is marginally faster then the generator comprehension.

Care to elaborate why you’d do it that way?

Masklinn
April 9, 2009

If you’re interested in “system” data manipulation and extraction using generators, you should check out “Generator Tricks for System Programmers” (http://www.dabeaz.com/generators-uk/) which is a *very* cool presentation on this very subject by David Beazley.

Also, check Ronny Pfannschmidt’s suggestion, it’s slightly faster than your final one (4.89s down to 4.69s on my machine, with your initial listcomp at 6.30s and a listcomp-based computation flattened the same way at 5.40s)

(and what this means is that nested comprehensions *really* shouldn’t be used when you can avoid them)

Pythoneer
April 9, 2009

Here’s another one:

total = sum(sum(map(int, line.split())) for line in sys.stdin)

Figuring out why this is ~30% faster than Ronny’s version is left as an exercise.

snuxoll
April 9, 2009

#!/usr/bin/env ruby

total = 0

STDIN.each_line { |line| total += line.split(” “).inject(0) { |sum, x| sum + x.to_i } }

puts “GRAND TOTAL: #{total}”

April 10, 2009

@pythoneer:
Yours takes 2.25 seconds. Because the sum does it’s calculations in C reducing opcode overhead?

@snuxoll:
Ruby 1.8: 19.91 seconds
Ruby 1.9: 4.04 seconds

So the fastest Ruby up until now is 4.04 seconds which is faster then the slowest Python implementation in here but almost twice as slow as the fastest Python implementation in here.

Anyone care to see if it goes faster still?

RayNbow
April 10, 2009

Fairly fast Haskell version:

http://raynbow.pastebin.com/f69881d8b

snuxoll
April 10, 2009

@raynbow
I can’t get that to compile with ghc6.

RayNbow
April 10, 2009

@snuxoll
I’m using GHC 6.10.1. You’ll need the ByteString package from http://hackage.haskell.org/.

However, if you plan on installing more Haskell packages in the future, I recommend getting the Cabal tool first:

1. Download the following files:
http://hackage.haskell.org/packages/archive/HTTP/4000.0.5/HTTP-4000.0.5.tar.gz
http://hackage.haskell.org/packages/archive/zlib/0.5.0.0/zlib-0.5.0.0.tar.gz
http://www.haskell.org/cabal/release/cabal-install-0.6.2/cabal-install-0.6.2.tar.gz

2. Compile and install each of these packages, saving Cabal install for last since it depends on the first two:

$ runghc Setup.hs configure
$ runghc Setup.hs build
$ runghc Setup.hs install

Note: the setup file is sometimes called Setup.lhs

3. You should now have the Cabal install tool. You can use it to install ByteString:

$ cabal update
$ cabal install bytestring

If you run into problems, let me know. :)
(Or you could always ask for help in #haskell on Freenode)

April 10, 2009

I’ve ran RayNbow’s solution. It averages at 2.084 seconds for a ten time run. That’s only marginally faster than the fastest Python solution.

I would have thought the difference would have been bigger, maybe the limiting factor is disk IO and we are closing it now.

snuxoll
April 10, 2009

@ikanobori
His solution loads the whole file into memory and then operates on it, though I don’t think there’s an easy way to do streaming IO in haskell.

RayNbow
April 10, 2009

No, it can be faster. Here’s plain vanilla x86 code:
 
http://raynbow.pastebin.com/f62a90643
 
$ gcc -O2 linesum.s -o x86linesum
$ time x86linesum < input_list_vs_gen.txt
Grand total 17677692470
 
real 0m0.471s
user 0m0.456s
sys 0m0.012s

I’ve also tested this asm program against a C program written by luite (from irc.tweakers.net), and his C program was actually about 0.04s seconds faster.
 
(This shows that gcc knows more about the complex x86 hardware than I do ;) )

RayNbow
April 10, 2009

@snuxoll
If you were referring to my Haskell solution, it does not load the whole file in memory.
 
(I used Lazy ByteStrings)

snuxoll
April 10, 2009

@raynbow
Ah, I haven’t had much of a chance to play with haskell yet, so I was unaware how ByteString’s worked.

luite
April 10, 2009

apparently, the previous version of my program (which RayNbow tested), spent most of the time in the getchar calls. A simple optimization is:

http://pastebin.com/f3c78db10

this program heavily uses 64 bit arithmetic. The following version is a bit faster on 32 bit systems:

http://pastebin.com/f10f47f00

RayNbow
April 10, 2009

I’ve disassembled luite’s fastest version and compared it to my slower x86 asm, for those who are interested:
 
http://raynbow.pastebin.com/f7a06b1ea

sjoerd
April 10, 2009

PHP:
$s=0; $f = fopen(’php://stdin’,'r’);
while (!feof($f))
$s += array_sum(explode(” “,fgets($f,4096)));
echo $s;

snuxoll
April 10, 2009

@sjoerd
Yours may work on the test file, but what happens if a line expands past 4096 bytes? That’s a very bad assumption.

snuxoll
April 11, 2009

I did it in vala, runs in just over a second on my VPS.

http://gist.github.com/93333

[...] one of my posts caused quite a bit of attention so I am revisiting the subject. I’ll start by [...]

Clean
April 12, 2009

How are you guys passing that huge file to the standard input? I am not getting that :(

sjoerd
April 15, 2009

@snuxoll
I assumed this contest was about summing up the numbers in the testfile, not a generic sum-of-file script. After reading the full blogpost I understand your concerns about the 4KB buffer.

Post a comment