11-443/11-643 Machine Learning for Text Analysis

(Previously Scalable Analytics: 11-443 for undergrads, 11-643 for graduates)

Instructor: Yiming Yang

TA: Andrew Hsi

Time: Fall 2014, Mondays and Wednesdays 3-4:20pm

Office hours: by appointment

Location:

Prerequisites:

·    CS courses on data structures and algorithms, strong programming skills, linear algebra and introductory probability;

·    Intro Machine Learning is not required but is helpful

Details, Syllabus, Lectures and Assignments: [Here]

Course Description

This is a full-semester, lecture-oriented course (12 units), intended for students in MS programs and undergraduates who meet the prerequisites. Replacing and expanding the second half of the former 11-641/11-441, Search Engines and Web Mining, it offers a blend of core theory, implementation and application of scalable data analytics techniques. Specifically, it covers the following topics:

·    Clustering techniques

·    Recommender systems

·    Web-scale text classification

·    Authority detection in social media

·    Trend detection in social media

·    Sentiment analysis in social media

·    Learning to rank for document retrieval

·    Statistical significance tests

·    Dimensionality reduction with PCA and SVD

·    Feature selection/induction techniques

Grading

Students will be evaluated by homework assignments (6 assignments, totaling 60%), a closed-book midterm (20%) and an open-book final exam (20%) in the form of a Capstone Project Proposal (CPP Guidelines). Undergraduate students can waive one of the 6 homework assignments (but if they do all 6, they will receive extra credit). Also, undergraduate students can take the final exam pass/fail (i.e., you get the full 20% if you pass).

Should you take this course?  Yes, if:

·    You're a CS student interested in machine learning techniques for large-scale text mining

·    You like AI, machine learning, and/or theoretical CS, and want to apply them to hard real-world problems

·    You're a non-CS student who can program well, has mathematical ability, and is interested in machine learning and its applications to text and social media

·    You're a language technology minor (this course is an elective option)

·    You're interested in broad applications of machine learning such as web-scale classification, structure discovery from massive data, learning to recommend, social-community discovery, sentiment analysis, trend detection over time sequences, etc.

·    You’re curious about statistical significance tests for machine learning and have the background.