NSF Sponsored Project 1350364

TEACHER: A Pilot Study on Mining the Web for Customized Curriculum Planning

School of Computer Science, Carnegie Mellon University

Principal Investigators: Jaime Carbonell (PI), Yiming Yang (Co-PI)

PhD Students: Hanxiao Liu, Wanli Ma


With massive quantities of educational materials freely available on the web, the vision of universal education appears within our grasp. General-purpose search engines are insufficient as they do not focus on educational materials, objectives, pre-requisite relations, etc., nor do they stitch together multiple sources to create customized curricula for students' goals and current knowledge. To address these issues, our project focuses on:

  1. Extracting educational units from diverse web sites and representing them in a large directed graph, whose nodes are content descriptors and whose edges encode pre-requisite and other relations
  2. Conducting multi-field topic inference via a new family of learning models to infer relations among educational units and enrich the graph
  3. Automated curricular planning, namely providing sequences of lessons, courses, exercises and other education units for a student to achieve his/her educational goals, conditioned on current skills. The curriculum planner enriches a graph traversal path, with alternate paths, reinforcement options, and conditional branches.

TEACHER can apply to learn specific job skills, to reinforce classroom instructions, or as stand-alone academic support to address, for instance, the huge percentage of students who attempt MOOCs but never complete them due to lack of requisite skills and lack of guidance on how to acquire them.

Graph Learning Framework

We use the observed prerequisite relations among courses to learn a directed universal concept graph, and then use the induced concept graph to predict unobserved prerequisite relations among a broader range of courses. The system can be particularly useful to induce prerequisite relations among different providers of courses such as the universities or MOOCs. Meanwhile, the induced directed graph among concepts is crucial for any further customized curriculum planning.

The figure above illustrates our framework of two-scale directed graphs: The higher-level graphs have courses (nodes) with perquisite relations (links). The lower-level graph consists of universal concepts (nodes) and pairwise preference in learning or teaching concepts. The links between the two levels are system-assigned weights of concepts to each course.


We have proposed a novel ranking-based formulation for learning the concept graph and predicting the strength of prerequisite relationship between each pair of courses. Unlike conventional ranking-based methods in which each training example involves only a pair of data points, each of our training example is constructed from a triplet of courses thus the training size can be huge. We designed an efficient parameter reduction technique to address the scalability issue. Our algorithm is able to handle one million training examples with extremely high-dimensionality in a few minutes on a single desktop machine, while showing promising predicting performance under various experiment settings.

Representation Schemes

We explore different answers with four alternate choices: Word-based Representation, Sparse Coding of Words, Distributed Word Embedding and Category-based representation. The first two representation only involve our collected educational datasets. The last two representations take advantage of a large amount of academic Wikipedia articles.

New Datasets and Algorithms

We collected course listings, including course descriptions and available pre-requisite structure from MIT, Caltech, CMU and Princeton. The first two were complete course catalogs, and the latter two required spidering and scraping, and hence we collected only Computer Science and Statistics for CMU, and Mathematics for Princeton.

We summarize the datasets statistics in the following table

University # Courses # Prerequisites # Words
MIT 2322 1173 15396
Caltech 1048 761 5617
CMU 83 150 1955
Princeton 56 90 454

For more details, see a sample course in our collected dataset and the visualization of a subgraph from MIT Open Courseware (red vertices: EECS courses; green vertices: Mathematics courses).

The current version of dataset can be downloaded from Here.

Algorithm implementations and the associated publications for this project:

Illustrative Examples

Concept Graph Learning

We list several examples of linked Wikipedia categories in our system-induced concept graph. Our system "suggests" the concept in the 2nd column should be taught before the corresponding concept in the first column.

1st Wikipedia Category Linked Wikipedia Category
"Partial differential equations" "Ordinary differential equations"
"Ethnography" "Geography"
"Extrasolar Planets" "Chaos Theory"
"British Industrial designers" "Graphic Design"

The following figures are visualizations of our system-induced concept graphs of the MIT dataset and the Princeton dataset (all courses are from Mathematics department) based on Wikipedia category representation of the course concepts. The darker edge indicates the stronger prerequisite strength from the source vertex to the target vertex.

Transfer Learning

The following bar graph summarizes the performance of our proposed algorithm in within-university (red) and cross-university (blue) prerequisite prediction tasks. A higher MAP score indicates a better performance. By comparing the red bars (when the training university and the test university is the same) and the blue bars (when the training university is not the same as the test university), we see some performance loss in the transfer learning, which is expected, but nonetheless we have significant positive transfer.

Curriculum Planning

We represent a student's background by a set of courses that she/he has taken, and her/his goal by another set of courses that she/he wants to learn. For example, we consider a student with only some basic knowledge about mathematics, who wants to pursue a master degree in Economics, or prepare to work as an economist. We represent her background by a set of courses from the MIT Opencourseware:

{ 18.01: Single Variable Calculus, 18.014: Calculus with Theory, 18.100C: Real Analysis, 18.06: Linear Algebra }

and her goal by

{ 14.461: Advanced Macroeconomics I, 14.452: Economic Growth, 14.384: Time Series Analysis, 14.54: International Trade }

A preliminary version of our curriculum planner can generate a plan for the student as:

{18.01, 18.014, 18.06} --> 14.121: Microeconomic Theory I --> 14.122: Microeconomic Theory II --> 14.461

{18.014, 18.100C} --> 14.451: Dynamic Optimization Methods with Applications --> 14.452

{18.01, 18.014, 18.100C, 18.06} --> 14.01: Principles of Microeconomics --> 14.54

The arrows indicate the predicted prerequisite links in our concept graph. The planner would also suggest that the student might have inadequate knowledge to take the course 14.384: Time Series Analysis, since her background set does not include any courses related to stochastic processes or probabilities.

We are investigating search-based planning methods (e.g. breadth-first-search, A*) but applied at the course level or at the concept level, and projected to the course level. We are also investigating planning with probabilistically-inferred prerequisite links.

Last modified at Mon Nov 17 04:10:36 EST 2014