11-443/11-643 Machine Learning for Text Analysis

(Starting from Spring 2015, this course will be renamed and cross-listed as 11-741/11-641/11-441)

Instructor: Yiming Yang

TA: Andrew Hsi

Time: Fall 2014, Mondays and Wednesdays 3-4:20pm

Office hours: by appointment

Location:

Prerequisites:

    CS courses on data structures and algorithms, strong programming capabilities, linear algebra and intro probability;

    Intro Machine Learning is not required but helpful

Details, Syllabus, Lectures and Assignments: [Here]

Course Description

This is a full-semester lecture-oriented course (12 units), intended for students in MS programs and undergraduates who meet the prerequisites. Replacing and expanding the second half of the former 11-641/11-441, Search Engines and Web Mining, it offers a blend of core theory, implementation and application of scalable data analytic techniques. Specifically, it covers the following topics:

    Clustering techniques

    Recommender systems

    Web-scale text classification

    Authority detection in social media

    Trend detection in social media

    Sentiment analysis in social media

    Learning to rank for document retrieval

    Statistical significance tests

    Dimensionality reduction with PCA and SVD

    Feature selection/induction techniques

Grading

Students will be evaluated by homework assignments (6 assignments, totaling 60%), a closed-book midterm (20%) and an open-book final exam (20%) in the form of a Capstone Project Proposal (CPP Guidelines). Undergraduate students can waive one of the 6 homework assignments (but if they do all six, they will receive extra credit). Also, undergraduate students can take the final exam pass/fail (i.e., you get the full 20% if you pass).

Should you take this course? Yes, if:

    You're a CS student interested in machine learning techniques for large-scale text mining

    You like AI, machine learning, and/or theoretical CS, and want to apply them to hard real-world problems

    You're a non-CS student who can program well, has mathematical ability, and is interested in machine learning and its applications to text and social media

    You're a language technology minor (this course is an elective option)

    You're interested in broad applications of machine learning such as web-scale classification, structure discovery from massive data, learning to recommend, social-community discovery, sentiment analysis, trend detection over time sequences, etc.

    You're curious about statistical significance tests for machine learning and have the background