Coursetree 2.0: An Intelligent Backend Coming Soon
The goal of Coursetree 2.0 is to leverage the current cloud infrastructure to deliver semantic applications that help users find the information they are looking for.
Features currently planned:
- Course search that understands what the user wants
- Filtering of irrelevant links
- Pattern-based degree data mining
Draft implementation strategy:
- Rely on Google's search index for Wikipedia and video links
- Use a Bayesian classifier to categorize link content into subjects
- Template induction and template scraping
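The Bayesian classification step could look roughly like the following minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. The class name, the regex tokenizer, and the toy training snippets are hypothetical; the plan above does not say which Bayesian model, features, or subject labels Coursetree 2.0 actually uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # training documents per subject
        self.word_counts = defaultdict(Counter)  # word frequencies per subject
        self.vocab = set()

    def train(self, text, subject):
        tokens = tokenize(text)
        self.doc_counts[subject] += 1
        self.word_counts[subject].update(tokens)
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_subject, best_score = None, -math.inf
        for subject, n_docs in self.doc_counts.items():
            # Log prior: share of training documents labelled with this subject.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[subject].values())
            for token in tokens:
                # Smoothed log likelihood; unseen words keep a small non-zero probability.
                count = self.word_counts[subject][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_subject, best_score = subject, score
        return best_subject


# Toy training data, invented for illustration only.
clf = NaiveBayesClassifier()
clf.train("derivatives integrals limits calculus", "Mathematics")
clf.train("matrix eigenvalues linear algebra vector", "Mathematics")
clf.train("sorting algorithms data structures recursion", "Computer Science")
clf.train("compilers operating systems programming", "Computer Science")
print(clf.classify("an introduction to sorting and recursion"))  # Computer Science
```

Working in log space avoids floating-point underflow when a page contributes many tokens, and add-one smoothing keeps a single unseen word from zeroing out a subject's score.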
Features under consideration:
- Adaptable prerequisite semantic analysis
- Fully automated template learning and template extraction
- Relevant course links/suggested courses
Tentative ideas:
- Genetic algorithm for grammar rule generation, with each candidate's fitness score based on its total number of parse errors
- Use hashing algorithms to detect similar sections of a page, then feed those similar sections to wrapper induction to generate a template
- Build a map of courses using anti-requisites and display nearest neighbors
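The hashing idea from the list above can be sketched as follows, under several assumptions: sections are small HTML fragments, "similar" means an identical markup skeleton (tags kept, text runs masked) hashed with SHA-1, and a naive position-by-position alignment stands in for a real wrapper-induction algorithm. All function names and sample sections are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# A token is either a whole tag or a run of text between tags.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(section):
    """Split an HTML fragment into tag tokens and text tokens."""
    return TOKEN_RE.findall(section)


def skeleton_hash(tokens):
    """Hash the markup skeleton: tags are kept, every text run becomes '*'."""
    skeleton = "".join(t if t.startswith("<") else "*" for t in tokens)
    return hashlib.sha1(skeleton.encode("utf-8")).hexdigest()


def group_similar(sections):
    """Bucket sections by skeleton hash; keep only buckets with 2+ members."""
    groups = defaultdict(list)
    for section in sections:
        tokens = tokenize(section)
        groups[skeleton_hash(tokens)].append(tokens)
    return [g for g in groups.values() if len(g) > 1]


def induce_template(group):
    """Naive stand-in for wrapper induction.

    Positions where every section agrees become literal template text;
    positions that vary become numbered slots. Returns the template and
    the slot values extracted from each section.
    """
    template, slot_positions = [], []
    for i, column in enumerate(zip(*group)):
        if len(set(column)) == 1:
            template.append(column[0])
        else:
            template.append("{%d}" % len(slot_positions))
            slot_positions.append(i)
    records = [[tokens[i] for i in slot_positions] for tokens in group]
    return "".join(template), records


sections = [
    "<li><b>CS 101</b> Intro to Programming</li>",
    "<li><b>CS 201</b> Data Structures</li>",
    "<p>Contact the registrar</p>",
]
for group in group_similar(sections):
    template, records = induce_template(group)
    print(template)  # <li><b>{0}</b>{1}</li>
    print(records)   # [['CS 101', ' Intro to Programming'], ['CS 201', ' Data Structures']]
```

Exact hashing only groups sections whose structure matches perfectly; a locality-sensitive scheme such as SimHash or MinHash would be needed to catch near-duplicates, and the alignment step would then have to tolerate differing token counts.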
Features that did not survive incubation:
- Using genetic algorithms to generate templates for wrapper induction
- Switching to parse trees to extract noun phrases as Wikipedia link candidates
- YQL for video link scraping
Lessons learned and salvaged:
- Don’t use genetic programming methods where a score cannot be assigned to each individual “program”: many of the generated templates simply failed outright and scored zero
- Noun-phrase extraction from parse trees worked in some cases (for example, a comma-separated list of noun phrases), but in others the parse tree marked single words as noun phrases instead of the more desirable longer phrase
- Because video sites change frequently, the nested JavaScript callbacks and closures used to glue previews to links made the code a prime candidate for recycling
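The first lesson can be illustrated with fitness-proportional (roulette-wheel) selection, a common selection scheme in genetic algorithms: when every candidate scores zero, the total fitness is zero, so selection has nothing to prefer and degenerates into a uniform random pick. In this sketch the target template, the candidate population, and the similarity-based graded fitness (using Python's `difflib.SequenceMatcher`) are hypothetical stand-ins; a real graded score would more plausibly come from something like the parse-error count mentioned under the tentative ideas.

```python
import random
from difflib import SequenceMatcher

# Hypothetical "correct" wrapper template, used only to score candidates.
TARGET = "<li>{0}</li>"


def all_or_nothing(template):
    """Pass/fail fitness: 1 only for a template that works perfectly."""
    return 1.0 if template == TARGET else 0.0


def graded(template):
    """Partial-credit fitness: similarity to the working template, in [0, 1]."""
    return SequenceMatcher(None, template, TARGET).ratio()


def roulette_select(population, fitness, rng):
    """Fitness-proportional (roulette-wheel) selection of one individual.

    Returns (individual, degenerate), where degenerate is True when no
    candidate scored above zero and the pick was therefore uniform.
    """
    scores = [fitness(ind) for ind in population]
    total = sum(scores)
    if total == 0:
        # No candidate earned any credit: selection has no signal to follow.
        return rng.choice(population), True
    pick = rng.uniform(0, total)
    running = 0.0
    for ind, score in zip(population, scores):
        running += score
        if running >= pick:
            return ind, False
    return population[-1], False  # guard against floating-point round-off


population = ["<li>{0}</p>", "<p>{0}</li>", "<b>{0}</b>"]
rng = random.Random(42)
print(roulette_select(population, all_or_nothing, rng)[1])  # True
print(roulette_select(population, graded, rng)[1])          # False
```

With the pass/fail score, every near-miss template is indistinguishable from a hopeless one, which matches the experience recorded above; a graded score lets selection favor candidates that are closer to working.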