March, 2010 | Minds and Machines

SearchJump Updated with Embedded Favicons

Mar 28 2010 Published by Yuguang Zhang under SearchJump

Now you will notice that the icons load a lot faster and all at the same time if you are using either the minimal version or the plain version.

Note: the new version has a new namespace, so the old version must be uninstalled manually from Greasemonkey.

SearchJump Fixes

Mar 25 2010 Published by Yuguang Zhang under SearchJump

The panel now hides itself after a single click instead of two.
By default, the script is only active on Google.
An icon has been added for Clusty.
The event listener for toggling the panel is only applied to the hide link.
Visual aspects are more balanced.

UW Course Calendar Scraper

Mar 15 2010 Published by Yuguang Zhang under CourseTree,Programming

I’ve had the idea of making a self-updating, navigable tree of Waterloo courses. This is the first step. (Actually, not the first step for me. It started with Django, which had to do with my last work report’s comparison to Zen Cart. Some credit goes to Thomas Dimson for inspiration. He made the Course Qualifier.) The main idea for this step is to gather all the information to be stored in a database. With that (the idea and plan) begins the coding phase:

from scrapy.item import Item, Field

class UcalendarItem(Item):
    course = Field()
    name = Field()
    desc = Field()
    prereq = Field()
    offered = Field()

I wanted to gather the course (“SE 101”), name (“Introduction to Methods of Software Engineering”), desc (“An introduction …”), prereq (“Software Engineering students only”), and offered (“F”) as separate fields.

In order to do that, I wrote a spider to crawl the page:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from ucalendar.items import UcalendarItem

class UcalendarSpider(BaseSpider):
    domain_name = "uwaterloo.ca"
    start_urls = [
        "http://www.ucalendar.uwaterloo.ca/0910/COURSE/course-SE.html"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # One item is built per table matched by this XPath.
        tables = hxs.select('//table[@width="80%"]')
        items = []
        for table in tables:
            item = UcalendarItem()
            item['desc'] = table.select('tr[3]/td/text()').extract()
            item['name'] = table.select('tr[2]/td/b/text()').extract()
            # Course codes look like "SE 101": 2 to 5 capital letters, a space, 3 digits.
            item['course'] = table.select('tr[1]/td/b/text()').re(r'([A-Z]{2,5} \d{3})')
            # Up to three terms (F, W, S) inside the bracketed "Offered:" note.
            item['offered'] = table.select('tr[3]/td').re(r'.*\[.*Offered: (F|W|S)+,* *(F|W|S)*,* *(F|W|S)*\]')
            item['prereq'] = table.select('tr[5]/td/i/text()').re(r'([A-Z]{2,5} \d{3})')
            items.append(item)
        return items

SPIDER = UcalendarSpider()

There are several things to note:

The prereq field here cannot identify “For Software Engineering students only”. The regular expression only matches the course code.
Offered, unlike the other fields, can contain more than one item.
Prereq may be empty.
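
The points above can be tried outside Scrapy with Python's standard re module. This is only a minimal sketch: the sample strings are made up for illustration, the helper names course_codes and offered_terms are hypothetical, and the Offered pattern is a simplified variant rather than the exact expression used in the spider.

```python
import re

# Same course-code pattern as in the spider: 2 to 5 capitals, a space, 3 digits.
COURSE_CODE = re.compile(r'[A-Z]{2,5} \d{3}')

# Simplified (assumed) pattern: capture everything after "Offered: "
# inside the square brackets, then pick out the term letters.
OFFERED = re.compile(r'\[[^\]]*Offered: ([FWS, ]+)\]')


def course_codes(text):
    """Return every course code found in text (possibly an empty list)."""
    return COURSE_CODE.findall(text)


def offered_terms(text):
    """Return the list of term letters from a bracketed Offered note."""
    match = OFFERED.search(text)
    if match is None:
        return []
    return re.findall(r'[FWS]', match.group(1))


# Made-up sample inputs, not taken from the real calendar page.
no_codes = course_codes("Software Engineering students only")
two_codes = course_codes("Prereq: SE 101, CS 137; Software Engineering students only")
terms = offered_terms("An introduction ... [Offered: F, W]")

print(no_codes)
print(two_codes)
print(terms)
```

A purely textual prerequisite yields an empty list, which is why prereq may be empty, and the Offered note yields one entry per term.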

Finally, the spider pipes its results to an output format. CSV format meets the requirements, as it can be inserted into a database.

import csv

class CsvWriterPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('items.csv', 'wb'))

    def process_item(self, spider, item):
        try:
            self.csvwriter.writerow([item['course'][0], item['name'][0], item['desc'][0], item['prereq'][0], item['offered'][0]])
        except IndexError:
            self.csvwriter.writerow([item['course'][0], item['name'][0], item['desc'][0], ' '.join(item['prereq']), ' '.join(item['offered'])])
        return item

Two gotchas:

Because prereq might be empty, there needs to be an exception handler.
Offered may be variable length. The list needs to be joined to output all of the terms the course is offered.
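
As an alternative to the exception handler, both gotchas could be handled by always joining the list-valued fields. The following is a sketch of that idea, not the pipeline used in the project: it uses only the standard csv module, writes to an in-memory buffer instead of items.csv, and the names item_to_row and write_items as well as the sample item are hypothetical.

```python
import csv
import io


def item_to_row(item):
    """Flatten one scraped item (a dict of lists) into a CSV row."""
    def first(values):
        # Single-valued fields: take the first entry, or '' if nothing matched.
        return values[0] if values else ''

    return [
        first(item['course']),
        first(item['name']),
        first(item['desc']),
        ' '.join(item['prereq']),   # empty list becomes ''
        ' '.join(item['offered']),  # every term is kept
    ]


def write_items(items, stream):
    """Write all items to an open text stream in CSV format."""
    writer = csv.writer(stream)
    for item in items:
        writer.writerow(item_to_row(item))


# Made-up sample item with no prerequisite and two terms.
sample = {
    'course': ['SE 101'],
    'name': ['Introduction to Methods of Software Engineering'],
    'desc': ['An introduction ...'],
    'prereq': [],
    'offered': ['F', 'W'],
}

buffer = io.StringIO()
write_items([sample], buffer)
print(buffer.getvalue())
```

Joining unconditionally means an empty prereq list simply becomes an empty column, and no term in offered is dropped.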

This part of the project was done in 2 hours with Scrapy. The project can be found in the downloads section.

Debugging with the Scientific Method

Mar 12 2010 Published by Yuguang Zhang under Programming

Recently, I was assigned to fix bugs on a large project done by 2 previous co-op students. I came across the Debugging section of Steve McConnell’s Code Complete 2 while looking up maintenance in the index. He suggests using the scientific method, summarized here:

Stabilize the error
Gather data
Analyze the data
Brainstorm hypotheses
Carry out the test plan
Did the experiment prove or disprove the hypothesis?

I applied this to a recent debugging problem. The bug is as follows, reported by a tester:

I entered data into an ouctome and assigned a rating.

Then used the Prev Next button to goto the next page and forgot to save the previous data.

Can there be a warning you have unsaved data before I lose this.

this should be for everyone student/employer/staff that enter data.

I started by trying to understand the problem, which I interpreted as:

The user expected to be able to use the prev and next buttons when filling out the forms. There was already a save button, which took them back to a view of all the outcomes instead of the next one. The user thought they had to click save for the page to remember their input.

Understanding the problem in terms of the business goals of the application opens up many possibilities. I chose to make the next and previous buttons save the input and take the user to a different page. This is where the debugging begins. First, I checked the HTML output for the save button and looked up its id in the PHP scripts using grep. Finding the correct place to edit the code was easy. Making the right modification was the hard part. The code looked like this:

$(document).ready(function () {
    $('#form_save_button').click(function () {
        $('#form_submitted').val('0');
        $('#form').submit();
    });
});

Fortunately, I knew enough JavaScript and functional programming to understand what the code was doing, but only on the surface. When the button is clicked, the function is executed. It sets a hidden form field to submitted status so that PHP stores the input in the database. So I wrote what looked like innocent lines of code:

$('#next').click(function () {
    if (document.getElementsByName('rating')[0]
            && document.getElementsByName('info')[0]) {
        $('#form').submit();
    }
});
$('#prev').click(function () {
    if (document.getElementsByName('rating')[0]
            && document.getElementsByName('info')[0]) {
        $('#form').submit();
    }
});

It worked immediately in Firefox, and I thought I was done. Then one of the testers reported that it did not work for them. I tried it again in IE and, as expected, reproduced the failure. At this point, superstition came into play. I had previous experience with IE where a JavaScript error prevented an independent section of code from working. I hypothesized that the if statement was the cause, because it might not be allowed with jQuery. I tested this hypothesis by taking out the if statement, but it made no difference.

Next, I checked the error console in Firefox. It gave an error about calling $('#next').click on an object which does not exist. So I moved the script down below the area where the next link was created. It still did not work in IE.

I decided the brute force approach was to learn jQuery and understand exactly what the code was doing. The tutorial was surprisingly short. I made sure my code used the correct jQuery syntax. When I read the documentation on the click method, an idea came to me: IE was navigating away from the page without executing the registered event handler. There was other evidence supporting this hypothesis: in Firefox, clicking next rather than save took many times longer. At this point, I doubted mousedown would work, as I had already tried onclick. Luckily, I looked up the documentation on mousedown anyway. It looked like the right way to prove or disprove my hypothesis. Switching from click to mousedown confirmed it. To my surprise, hitting next now saved data in IE, with the same speed as hitting save.
