Archive for the 'Programming' Category

Crow Divination 2nd in Search Results

May 29 2010 Published by under Programming

I came up with the idea to make a program for divining crow cries from jcrow’s. With slight modifications over time, it has come second to the original source of information.

Screenshot

No responses yet

Coursetree Prerequisites

Apr 05 2010 Published by under CourseTree,Programming

While working on the tree part of the coursetree project, I ran into the question of how to display the course dependencies. I have written a recursive function that returns a recursive list of course prerequisites:

[u'SE 112', [u'MATH 135', []]]

I wanted to turn it into a diagram of course prerequisites and decided 2 flat lists could retain the data:

  • a list for the levels
  • a list for connections between courses

For example,

tree
would translate to

[[1],[2,6],[3,5,7],[4]]

and

[(1,2),(2,3),(2,5),(3,4),(1,6),(6,7)]

Separating the Levels

We start off with a recursive list representing the tree structure:

[1,[2,6,[3,5,[4,[]],[]],[7,[]]]]

To tackle the problem, I first solved a similar one: flattening a list.
This can easily be done in Scheme, as syntax and type declarations do not get in the way.

(define (flatten sequence)
  (cond ((null? sequence) '()) ; simplest case: ()
        ((list? (car sequence)) ; a list in front: ((1) ...)
         (append (flatten (car sequence))
                 (flatten (cdr sequence))))
        (else (cons (car sequence) ; an atom in front: (1 ...)
                    (flatten (cdr sequence))))))

> (flatten '(1 (2 3)))
(list 1 2 3)

Here’s the same code translated into Python:

car = lambda lst: lst[0]
cdr = lambda lst: lst[1:]
def flatten(seq):
    if not seq:
        return list()
    elif isinstance(car(seq), list):
        return flatten(car(seq)).extend(flatten(cdr(seq)))
    else:
        return [car(seq), flatten(cdr(seq))]

Unfortunately, flatten in Python produces a hypernested structure for flat lists, as (1 2 3) in Scheme is equibalent to (cons 1 (cons 2 (cons 3 empty))) or (1 . (2 . 3)).

>>> flatten([1,2,3])
[1, [2, [3, []]]]

The more serious problem is that it exposes a quirk in Python:

>>> [1].extend([])
>>>

Yes, that means the list disappears after being extended with an empty list . . . one of those unpleasant surprises.
So let’s redefine a flat list in Python:

def flatten(tree):
        result = []
        for node in tree:
                if isinstance(node, list):
                        result.extend(flatten(node))
                else:
                        result.append(node)
        return result

>>> flatten([1,2,3])
[1,2,3]
>>> flatten([1,[2,3]])
[1,2,3]

There is a similarity between this approach and the recursive one: different actions are taken for a list node and other nodes. I used a combined functional and imperative approach to solve the problem:

car = lambda lst: lst[0]
cdr = lambda lst: lst[1:]

'''
depth: returns the maximum nesting level of a list

Given:
    ls, a list

Result:
    an integer
'''

def depth(ls):
    if not ls:
        return 0
    elif isinstance(car(ls),list):
        return max(depth(car(ls))+1,depth(cdr(ls)))
    else:
        return max(1,depth(cdr(ls)))


'''
strip: returns the list elements of a list

Given:
    ls, a list

Result:
    ls, the modified list
'''

def strip(ls, top):
    if top:
        for item in top:
            if item in ls:
                ls.remove(item)
    elif cdr(ls):
        ls = car(ls) + strip(cdr(ls), top) # case like [[1], ...]
    else:
        ls = car(ls)  # case like [[1]]
    return ls


'''
level: returns the top level elements of a list

Given:
    ls, a list

Result:
    a new list
'''

def level(ls):
    if not ls:
        return []
    elif not isinstance(car(ls),list):
        return [car(ls)] + level(cdr(ls))
    else:
        return level(cdr(ls))

'''
levelize: returns a list of lists, each list is contains the items of a level

Given:
    ls, a list

Result:
    a new list
'''

def levelize(ls):
    result = []
    a = list(ls)
    for i in range(2*depth(ls)):
        if not i%2:
            result.append(level(a))
        a = strip(a, level(a))
    return result

>>> levelize([1,[2,6,[3,5,[4,[]],[]],[7,[]]]])
[[1], [2, 6], [3, 5, 7], [4]]

Connecting the Nodes

We start off with a recursive list representing the tree structure, slightly different from the list for separating the levels:

[1,[2,[3,[4],5],6,[7]]]

Again, a mix of recursion and iteration easily solves the problem:

'''
pair: returns a list of lists, each list has an odd and even pair

Given:
        ls, a list

Result:
        a list
'''

def pair(ls):
    result = []
    while ls:
        result.append(ls[0:2])
        ls = ls[2:]
    return result

'''
connect: returns a list of tuples, each tuple represents an edge of the graph

Given:
        ls, a list

Result:
        a list of tuples
'''

def connect(ls):
    result = []
    if cdr(ls):
        if cdr(cdr(ls)):
            for item in pair(ls):
                result.extend(connect(item))
        else:
            second = car(cdr(ls))
            for item in level(second):
                result.append((car(ls),item))
            result.extend(connect(second))
    return result

>>> connect([1,[2,[3,[4],5],6,[7]]])
[(1, 2), (1, 6), (2, 3), (2, 5), (3, 4), (6, 7)]

Hopefully, you’ve had fun reading this article and meanwhile came up with a better way to represent the tree as a flat structure.

No responses yet

UW Course Calendar Scraper

Mar 15 2010 Published by under CourseTree,Programming

I’ve had the idea of making a self-updating, navigable tree of Waterloo courses. This is the first step. (Actually, not the first step for me. It started with Django, which had to do with my last work report’s comparison to Zen Cart. Some credit goes to Thomas Dimson for inspiration. He made the Course Qualifier.) The main idea for this step is to gather all the information to be stored in a database. With that (the idea and plan) begins the coding phase:

from scrapy.item import Item, Field

class UcalendarItem(Item):
    course = Field()
    name = Field()
    desc = Field()
    prereq = Field()
    offered = Field()

I wanted to gather the course (“SE 101”), name (“Introduction to Methods of Software Engineering”), desc (“An introduction …”), prereq (“Software Engineering students only”), and offered (“F”) each as separate fields
snapshot4
In order to do that, I wrote a spider to crawl the page:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from ucalendar.items import UcalendarItem

class UcalendarSpider(BaseSpider):
    domain_name = "uwaterloo.ca"
    start_urls = [
        "http://www.ucalendar.uwaterloo.ca/0910/COURSE/course-SE.html"
    ]

    def parse(self, response):
    hxs = HtmlXPathSelector(response)
    tables = hxs.select('//table[@width="80%"]')
    items = []
        for table in tables:
        item = UcalendarItem()
        item['desc'] = table.select('tr[3]/td/text()').extract()
        item['name'] = table.select('tr[2]/td/b/text()').extract()
        item['course'] = table.select('tr[1]/td/b/text()').re('([A-Z]{2,5} \d{3})')
        item['offered'] = table.select('tr[3]/td').re('.*\[.*Offered: (F|W|S)+,* *(F|W|S)*,* *(F|W|S)*\]')
        item['prereq'] = table.select('tr[5]/td/i/text()').re('([A-Z]{2,5} \d{3})')
            items.append(item)
    return items

SPIDER = UcalendarSpider()

There are several things to note:

  • The prereq field here cannot identify “For Software Engineering students only”. The regular expression only matches the course code.
  • Offered, unlike other fields, can contain more than one item
  • Prereq may be empty

Finally, the spider pipes its results to an output format. CSV format meets the requirements, as it can be inserted into a database.

import csv

class CsvWriterPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('items.csv', 'wb'))

    def process_item(self, spider, item):
    try:
        self.csvwriter.writerow([item['course'][0], item['name'][0], item['desc'][0], item['prereq'][0], item['offered'][0]])
    except IndexError:
        self.csvwriter.writerow([item['course'][0], item['name'][0], item['desc'][0], ' '.join(item['prereq']), ' '.join(item['offered'])])
        return item

Two gotchas:

  • Because prereq might be empty, there needs to be an exception handler
  • Offered may be variable length. The list needs to be joined to output all of the terms the course is offered.

This part of the project was done in 2 hours with Scrapy. The project can be found in the downloads section.

No responses yet

Debugging with the Scientific Method

Mar 12 2010 Published by under Programming

Recently, I was assigned to fix bugs on a large project done by 2 previous co-op students. I came across the Debugging section of Steve McConnell’s Code Complete 2 while looking up maintenance in the index. He suggests using the scientific method, summarized here:

  1. Stabilize the error
  2. Gather data
  3. Analyze the data
  4. Brainstorm hypothesises
  5. Carry out the test plan
  6. Did the experiment prove or disprove the hypothesis?

I applied this to a recent debugging problem. The bug is as follows, reported by a tester:

I entered data into an ouctome and assigned a rating.

Then used the Prev Next button to goto the next page and forgot to save the previous data.

Can there be a warning you have unsaved data before I lose this.

this should be for everyone student/employer/staff that enter data.

I started by trying to understand the problem, which I interpreted as:

The user expected to be able to use the prev and next buttons when filling out the forms. There was already a save button, which took them back to a view of all the outcomes instead of the next one. The user thought they had to click save for the page to remember their input.

Understanding the problem in terms of the business goals of the application opens up many possibilities, I chose to make the next and previous buttons save the input and take the user to a different page. This is where the debugging begins. First, I checked the HTML output for the save button and looked up its id in php scripts using grep. Finding the correct place to edit the code was easy. Making the right modification was the hard part. The code looked like this:

$(document).ready(function () {
    $('#form_save_button').click(function () {
        $('#form_submitted').val('0');
        $('#form').submit();
    });
});

Fortunately, I knew enough Javascript and functional programming to understand what the code was doing, but only on the surface. When the button is clicked, the function is executed. It sets a hidden form field to submitted status so that php stores the input in the database. So I wrote what looked like innocent lines of code:

$('#next').click(function () {
    if(document.getElementsByName('rating')[0]
    && document.getElementsByName('info')[0]){
        $('#form').submit();
    }
});
$('#prev').click(function () {
    if(document.getElementsByName('rating')[0]
    && document.getElementsByName('info')[0]){
        $('#form').submit();
    }
});

It worked immediately in Firefox. I thought I was done. Next, I got a report from one of the testers that it did not work for them. I tried it again in IE with the expected result. At this point, superstition came into play. I had previous experience with IE where a javascript error prevented an independent section of code from working. I hypothesized it could be the if statement, because it may not be allowed with jQuery. I tested my hypothesis by taking out the if statement. As no progress was made, I checked the error console in Firefox. It gave an error about $(‘#next’).click on object which does not exist. So I moved the script down below the area where the next link was created. It still did not work in IE. I decided the brute force approach was to learn jQuery and understand exactly what the code was doing. The tutorial was surprisingly short. I made sure my code used the correct jQuery syntax. When I read the documentation on the click method, an idea came to me that IE went away from the page without executing the registered event. There was other evidence supporting this hypothesis in Firefox, clicking next rather than save took many times longer. At this point, I doubted mousedown would work, as I already tried onclick. Luckily, I did look up documentation on mousedown. It looked like it was the correct way to prove or disprove my hypothesis. Switching from click to mousedown did verify my hypothesis. To my surprise, hitting next saved data in IE, with the same speed as hitting save.

No responses yet

Gray Matter Matters

Aug 27 2009 Published by under Programming

I asked a question on probably one of the higher traffic forums, and it was answered within an hour. It still took 3 people to put together the answer because one line of code couldn’t work without the other. I had to have about the same level of technical knowledge to make use of the information and ask the question. What made this answer question style different was that the question was answered immediately unlike some other forums I’ve used. You can see it for yourself here:
Python FTP Most Recent File – Stack Overflow

Coincidence? It was on this website that I first heard of dive into python, the online book I used to learn the language. Except I learned dive into python 3 and asked the question from Google. A lot can be said about the use of the democratic process where knowledge matters.

No responses yet

Shifting Unicode Codepoints

Aug 26 2009 Published by under Programming

While working on a word search generator that translated letter symbols (sort of like cryptograms, except this one’s impossible to solve without the key), php ran out of memory on a rather ordinary function.

function uniord($c) {
    $h = ord($c{0});
    if ($h <= 0x7F) {
        return $h;
    } else if ($h < 0xC2) {
        return false;
    } else if ($h <= 0xDF) {
        return ($h & 0x1F) << 6 | (ord($c{1}) & 0x3F);
    } else if ($h <= 0xEF) {
        return ($h & 0x0F) << 12 | (ord($c{1}) & 0x3F) << 6
                                 | (ord($c{2}) & 0x3F);
    } else if ($h <= 0xF4) {
        return ($h & 0x0F) << 18 | (ord($c{1}) & 0x3F) << 12
                                 | (ord($c{2}) & 0x3F) << 6
                                 | (ord($c{3}) & 0x3F);
    } else {
        return false;
    }
}
function unichr($c) {
    if ($c <= 0x7F) {
        return chr($c);
    } else if ($c <= 0x7FF) {
        return chr(0xC0 | $c >> 6) . chr(0x80 | $c & 0x3F);
    } else if ($c <= 0xFFFF) {
        return chr(0xE0 | $c >> 12) . chr(0x80 | $c >> 6 & 0x3F)
                                    . chr(0x80 | $c & 0x3F);
    } else if ($c <= 0x10FFFF) {
        return chr(0xF0 | $c >> 18) . chr(0x80 | $c >> 12 & 0x3F)
                                    . chr(0x80 | $c >> 6 & 0x3F)
                                    . chr(0x80 | $c & 0x3F);
    } else {
        return false;
    }
}
function asc_shift($string, $amount) {
 $key = substr($string, 0, 1);
 if(strlen($string)==1) {
   return unichr(uniord($key) + $amount);
 } else {
   return unichr(uniord($key) + $amount) . asc_shift(substr($string, 1, strlen($string)-1), $amount);
 }
}

Now I understand why there is a need for a lower level implementation of UTF-8 strings by default in php 6. Python support of UTF-8 support has already been included in Python 3.0 released December 3rd, 2008.

No responses yet

A lot of steps for just getting one file

Aug 26 2009 Published by under Programming

If you ever wanted to include PEAR.php in a script, but never installed it before on a shared host, a google search for pear.php download should have solved the problem. On the contrary, following the first links on google is a huge waste of time. In my situation, I only needed pear.php to instantiate a class. The intuitive solution is to just download the file and ftp it to the same location as the file that includes it. That works, except the official installation guide never says that.

No responses yet

Notepad++ Autocompletion

Jul 03 2009 Published by under Programming

There’s a handy feature in Notepad++ that’s not enabled by default.
screenshot.7_380x392
This works on Windows 7 or Vista, but the pop-up window blinks. Here’s the way to enable it:

  1. Settings -> Preferences -> Backup/Auto-competion
  2. check enable auto-completion
  3. right-click and edit properties for Notepad++

On another note, you can make it load faster by adding the -noPlugin flag to it:
screenshot.8

No responses yet

« Prev