11-741/11-641/11-441 Machine Learning for Text Mining

Instructor: Yiming Yang

TAs: Hanxiao Liu, Jiachen Li

Time and Location: TR, 12:00 - 1:20pm, HH B131

Prerequisites:

·    Courses in data structures and algorithms, linear algebra, and introductory probability; strong programming skills

·    Introductory machine learning is helpful but not required

Syllabus and detailed course materials: a CMU account is required for access, either on campus or via VPN.

Course Description

This is a full-semester, lecture-oriented course (12 units for 11-741 and 11-641; 9 units for 11-441) for PhD, MS, and undergraduate students who meet the prerequisites. It offers a blend of core theory, algorithms, evaluation methodologies, and applications of scalable data-analytic techniques. Specifically, it focuses on the following topics:

·    Clustering

·    Link analysis

·    Collaborative filtering (recommender systems)

·    Matrix factorization

·    Social media analysis

·    Web-scale text classification

·    Learning to rank for document retrieval

·    Statistical significance tests

Note that 11-741 and 11-641 are 12-unit courses for graduate students, while 11-441 is a 9-unit course for undergraduates. The lectures and exams are the same in all three courses, but the required workload differs: the required coursework in 11-441 is a subset of that in 11-641, which in turn is a subset of that in 11-741. See the Grading section for the detailed distinctions. 11-741 is among the required courses for PhD candidates in the Language Technologies Institute, whereas 11-641 counts only as a master's-level course. Graduate students can choose either 11-741 or 11-641, depending on their career goals and backgrounds. Undergraduate students should take 11-441; exceptions are possible with the instructor's approval.

Should you take this course?  Yes, if:

·    You're a CS student interested in machine learning techniques for large-scale text mining

·    You like AI, machine learning, and/or theoretical CS, and want to apply them to hard real-world problems

·    You're a non-CS student who can program well, has strong mathematical skills, and is interested in machine learning and its applications to text and social media

·    You're a language technology minor (this course is an elective option)

·    You're interested in broad applications of machine learning, such as web-scale classification, structure discovery from massive data, learning to recommend, social-community discovery, sentiment analysis, and trend detection over time sequences

·    You're curious about statistical significance tests for machine learning and have the necessary background