Thursday, September 24, 2009

The Beauty of Python

OpenOffice pastes tables into plain text 1 cell per line with no indication of row breaks. Dumb. This prompted me to convert the text into a list of tuples in Python. I got as far as a list of cell data, then I needed to take every 4 items in the list and make a tuple out of them.

Here's the problem I was faced with:

input: ["a", "b", "c", "d", "w", "x", "y", "z", ...]
output: [("a", "b", "c", "d"), ("w", "x", "y", "z"), ...]

After asking around in the Python IRC channel, Andy got the following solution, and while writing this post I also found it in Python's documentation for the zip function:

output = zip(*[iter(input)]*4)

It's hard to overstate my satisfaction: this is such an intricate, concise solution.
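One caveat if you try this on Python 3: there zip returns a lazy iterator rather than a list, so the one-liner needs a list() around it to give the output shown above. Here's a minimal sketch, assuming the cell data lives in a list I'm calling cells (my own name for this example, picked so it doesn't shadow the built-in input):

```python
# Hypothetical sample data standing in for the pasted table cells.
cells = ["a", "b", "c", "d", "w", "x", "y", "z"]

# On Python 3, zip() is lazy, so wrap it in list() to materialize the rows.
rows = list(zip(*[iter(cells)] * 4))

print(rows)
# [('a', 'b', 'c', 'd'), ('w', 'x', 'y', 'z')]
```

Note that zip stops at the shortest argument, so if the number of cells isn't a multiple of 4, the leftover cells at the end are silently dropped.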

What is going on

From the inside out:
iter(input): this makes an iterator object that traverses the input list. The iterator stores the state of which item it's on, and what comes next.
[iter(input)]: this makes a list with 1 item in it, the iterator.
[iter(input)]*4: this takes the list, repeats it 4 times, and concatenates the copies. ex: [1,2]*3 => [1,2,1,2,1,2]. An important thing to note in this step is that the list is not holding four iterators: just the one iterator object, with its state information, occupies all four slots in the list.
zip: see Python's documentation. Note that the function takes its lists as separate arguments.
zip(*[iter(input)]*4): the * expands the list to its right into an argument list for the zip function. So it's equivalent to giving 4 arguments to the zip function. Here's where the magic happens.

The zip function traverses each of its four arguments in parallel. Before an iterator has been moved, its "next" item is the list's first item. The zip function asks the first list's iterator for the "next" item, giving the first item in the list. Then the zip function asks the second iterator for its "next" item, which would normally be the first item in the second list. However, the second iterator is the same object as the first iterator, so its "next" item is the second item in the list, not anyone's first item. The same thing happens for the third and fourth arguments, and then the next round starts at the fifth item. And so the zip function is tricked into thinking that it is traversing 4 separate lists of length 2 each when really it is traversing a single list of length 8. The result is exactly what we want.
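To convince myself, the walk-through above can be spelled out by hand. This is only a rough sketch of what zip does, not its actual implementation, and the names cells, it, and slots are mine:

```python
cells = ["a", "b", "c", "d", "w", "x", "y", "z"]

it = iter(cells)
slots = [it] * 4

# All four slots hold the very same iterator object, not four copies.
assert all(slot is it for slot in slots)

# Roughly what zip does: ask each argument in turn for its next item,
# and stop as soon as any of them runs out.
rows = []
while True:
    row = []
    for slot in slots:
        try:
            row.append(next(slot))
        except StopIteration:
            break
    if len(row) < len(slots):
        break
    rows.append(tuple(row))

print(rows)
# [('a', 'b', 'c', 'd'), ('w', 'x', 'y', 'z')]
```

Because every next() call advances the one shared iterator, each "column" picks up where the previous one left off, which is the whole trick.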

1 comment:

Jordan said...

Programmer: "Python!!! I need you to intelligently separate an input list, duplicate said list a specified number of times and concatenate said lists together, but you remember those lists and traverse them in this compiled list simultaneously."

Python: "Baling ba ding ding zip()."

Programmer: "You are made of win."

Python: "I know."