Course Planner Prerequisite Parsing Fixes

Nov 24 2013 Published by under CourseTree,Programming

One of the bugs still in the Waterloo Course Planner is the handling of prerequisite sentences that end with “* students only.”. The fix I made was rather simple. Though converting existing test cases to unit tests did not help because none of the older grammar rules were changed and therefore the tests were not broken, it did help in the development of new grammar rules. Python unit tests does a nice job of pinpointing the exact place where the expected results differ from running the code.

Failure
Traceback (most recent call last):
  File "N:\Projects\ply\prereqyacc.py", line 182, in testQuirks
    self.assertEqual(results, map(parser.parse, prereqs))
AssertionError: Lists differ: ['MATH 127/MATH 128/MATH 137/M... != ['MATH 127/MATH 128/MATH 137/M...

First differing element 1:
MATH 115/MATH 119
MATH 115, MATH 119

- ['MATH 127/MATH 128/MATH 137/MATH 147', 'MATH 115/MATH 119']
?                                                  ^

+ ['MATH 127/MATH 128/MATH 137/MATH 147', 'MATH 115, MATH 119']
?                                                  ^^

“* students only.” appearing in a string results in the tokenizer printing out a list of invalid tokens. There are two ways of ignoring strings in PLY:

  1.  t_ignore_ in the tokenizer
  2. change t_ignore_ to a token and add a new rule that returns an empty string in the parser

I used the second method this time, since the semicolon preceding “* students only.” would also need to be ignored. Semicolons are treated as a signal for an “and” clause otherwise. The new rule looks like the following:

def p_restriction(p):
    'semi_restriction : SEMI STUDENTS_ONLY'
    p[0] = ''

No responses yet

Coding Horror and the Illumination

Nov 03 2010 Published by under CourseTree

Can you imagine anything useful coming out of this line of code?

prereqPattern='''([A-Z]{2,5} \d{3}/[A-Z]{2,5} \d{3}\s*and\s*[A-Z]{2,5} \d{3}/[A-Z]{2,5} \d{3})|([A-Z]{2,5} \d{3}\s*and\s*\(*[A-Z]{2,5} \d{3}\)*)|(\)\s*and\s*\()|(One of [A-Z]{2,5} \d{3}\s*[, or\d{3}]*;|[A-Z]{2,5} \d{3}[, \d{3}]*;|.*,\s*[A-Z]{2,5} \d{3}.*;|[A-Z]{2,5} \d{3} or [A-Z]{2,5} \d{3}|[A-Z]{2,5} \d{3} or \d{3}|[A-Z]{2,5} \d{3})'''

I can, because I’ve seen it. Here’s a small slice:

MATH 137 or 147
 ACTSC 231, STAT 230 or 240
MATH 136 or 146
ACTSC 232
ACTSC 371
AFM 272;ACTSC 291
STAT 330, 333
STAT 330, 333
 ACTSC 331, STAT 330
AFM 372;ACTSC 391;ACTSC 231;ACTSC 231 and BUS 393;STAT 330;STAT 334
AFM 372;ACTSC 391;ACTSC 231;ACTSC 231 and BUS 393;STAT 333 or 334
ACTSC 331
AFM 101 or BUS 227
AFM 101

A master piece at its time, and quite decent to maintain with the documentation:

'CS 134 or 145 or a mark of 60% or higher in 136 or 138; Honours Mathematics or Software Engineering students only.'
'([A-Z]{2,5} \d{3}) or (\d{3})*'
('CS 134', '145')

'One of CS 134, 136, 138, 145; Computer Science students only.'
'([A-Z]{2,5} \d{3})([, \d{3}]*;)'
('CS 134', ', 136, 138, 145;')

'Computer Science students only.'
'(\D* students only)'
'Computer Science students only'

'CS 350 or ECE 354'
'([A-Z]{2,5} \d{3}) or ([A-Z]{2,5} \d{3})*'
('CS 350', 'ECE 354')

'CM 339/CS 341 or CS 350'
'([A-Z]{2,5} \d{3})/([A-Z]{2,5} \d{3})( or [A-Z]{2,5} \d{3})*'
('CM 339', 'CS 341', ' or CS 350')

'DAC 201/ENGL 203 and DAC 202/ENGL 204'
 '([A-Z]{2,5} \d{3})/([A-Z]{2,5} \d{3}).*([A-Z]{2,5} \d{3})/([A-Z]{2,5} \d{3})*'
('DAC 201', 'ENGL 203', 'AC 202', 'ENGL 204')

And a light comes through. There is a better way, better than I ever imagined. In fact, the ugly phase of evolution comes before the flowering. A metaphor for this would be the pollution during industrialization and a bloom of clean industries following it due to the emphasis on knowledge rather than machine labor.
Simply pass on the parenthesis in the lexer and rewrite the grammar so that it can see more of the input before having to make a decision for shift/reduce conflicts. This is similar to using a parser with more lookahead, an oracle that sees infinitely many steps ahead.

Yet, no matter how sweet the code,

it is time to part ways.

What comes next,

no one knows.

This age is past,

and the ground is laid for the next one.

No responses yet