Coursetree 2.0: An Intelligent Backend Coming Soon

Jun 24 2011

The goal of coursetree 2.0 is to leverage the current cloud infrastructure to deliver semantic applications that help users find the information they are looking for.

Features currently planned:

  1. Course search that understands what the user wants
  2. Filtering of irrelevant links
  3. Pattern-based degree data mining

Draft implementation strategy:

  1. Let Google search index Wikipedia and video links
  2. Use a Bayesian classifier to categorize link content into subjects
  3. Template induction and template scraping
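
As a rough illustration of step 2, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. The `NaiveBayes` class, the training snippets, and the subject labels are all hypothetical and chosen only for illustration; the actual classifier and training data planned for coursetree are not specified here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # subject -> word -> count
        self.doc_counts = Counter()              # subject -> number of training texts
        self.vocab = set()

    def train(self, text, subject):
        words = text.lower().split()
        self.word_counts[subject].update(words)
        self.doc_counts[subject] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_subject, best_score = None, float("-inf")
        for subject, n_docs in self.doc_counts.items():
            counts = self.word_counts[subject]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_subject, best_score = subject, score
        return best_subject


# Toy training data (assumed, not taken from real course links).
clf = NaiveBayes()
clf.train("derivative integral limit calculus theorem", "math")
clf.train("matrix vector linear algebra eigenvalue", "math")
clf.train("cell protein gene dna organism", "biology")
clf.train("evolution species gene ecosystem", "biology")

print(clf.classify("lecture on the integral and the derivative"))
```

Working in log space avoids underflow on long texts, and the add-one smoothing keeps a single unseen word from zeroing out a subject.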

Features under consideration:

  1. Adaptable prerequisite semantic analysis
  2. Fully automated template learning and template extraction
  3. Relevant course links/suggested courses

Tentative ideas:

  1. Genetic algorithm for grammar rule generation with fitness score assigned according to the total number of parse errors
  2. Use hashing algorithms to detect similar sections of a page, then feed those sections to wrapper induction to generate a template
  3. Build map of courses using anti-requisites and display nearest neighbors
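
One way idea 2 could work is sketched below: reduce each page section to its sequence of tag names, hash every run of three consecutive tags, and compare the resulting fingerprint sets with Jaccard similarity. The choice of CRC32, the shingle size of 3, and the sample HTML are assumptions made for this sketch, not decisions from the coursetree design.

```python
import re
import zlib


def tag_sequence(html):
    """Reduce a section to its sequence of opening tag names, ignoring text."""
    return re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)


def shingle_hashes(tags, k=3):
    """Hash every run of k consecutive tags into a set of fingerprints."""
    if len(tags) < k:
        return {zlib.crc32(" ".join(tags).encode())}
    return {zlib.crc32(" ".join(tags[i:i + k]).encode())
            for i in range(len(tags) - k + 1)}


def similarity(a, b):
    """Jaccard similarity between the fingerprint sets of two sections."""
    ha = shingle_hashes(tag_sequence(a))
    hb = shingle_hashes(tag_sequence(b))
    return len(ha & hb) / len(ha | hb)


# Hypothetical page sections: two course entries and an unrelated footer.
course_a = "<div><h3>CS 101</h3><p>Intro to programming</p><ul><li>Prereq: none</li></ul></div>"
course_b = "<div><h3>CS 245</h3><p>Logic and computation</p><ul><li>Prereq: CS 101</li></ul></div>"
footer = "<div><a href='/about'>About</a><span>Contact</span></div>"

print(similarity(course_a, course_b))
print(similarity(course_a, footer))
```

The two course entries share the same tag structure and so score as identical, while the footer shares no three-tag run with them. Sections that score above some threshold would then be grouped and handed to wrapper induction as examples of one template.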

Unsuccessful incubation features:

  1. Using genetic algorithms to generate templates for wrapper induction
  2. Switching to parse trees to extract noun phrases as Wikipedia link candidates
  3. YQL for video link scraping

Lessons learned and salvaged:

  1. Don’t use genetic programming methods where a meaningful score cannot be assigned to each individual “program”; many of the generated templates simply failed and scored zero
  2. Parse-tree extraction succeeded in some cases (for example, with a comma-separated list of noun phrases), but in others the parse tree marked single words as noun phrases instead of the longer, more desirable phrase
  3. Video sites change frequently, and the nested JavaScript callbacks with closures used to glue previews to links made that code a prime candidate for recycling
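
The first lesson can be illustrated with a small sketch of fitness-proportional selection. When every individual scores zero, selection has nothing to prefer and degenerates into a uniform random pick, so the search cannot improve. Both score lists are hypothetical; the graded variant (fraction of fields extracted) is only an assumed example of a score that does separate individuals.

```python
def selection_probabilities(scores):
    """Fitness-proportional (roulette wheel) selection probabilities."""
    total = sum(scores)
    if total == 0:
        # No individual scored anything: fall back to a uniform random pick.
        return [1 / len(scores)] * len(scores)
    return [s / total for s in scores]


# Hypothetical all-or-nothing fitness: 1 if a template extracts everything, else 0.
all_or_nothing = [0, 0, 0, 0]
# Hypothetical graded fitness: fraction of fields each template extracts.
graded = [0.0, 0.25, 0.5, 0.25]

print(selection_probabilities(all_or_nothing))
print(selection_probabilities(graded))
```

With the all-or-nothing scores every template is equally likely to be selected, whereas the graded scores favor the partially working template.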
