Statistical learning: Algorithms and theory


Class Times: Tuesdays and Thursdays 1:15-2:30
Location: 2240b CIEMAS
Instructor: Sayan Mukherjee
Office Hours: By appointment
Email Contact: sayan at stat dot duke dot edu
Similar class: Statistical learning theory

Course description

The problem of supervised learning will be developed in the framework of statistical learning theory. Two classes of machine learning algorithms that have been used successfully in a variety of applications will be studied in depth: regularization algorithms and voting algorithms. Support vector machines (SVMs) are an example of a popular regularization algorithm and AdaBoost is an example of a popular voting algorithm. The course will
1) introduce these two classes of algorithms
2) illustrate practical uses of the algorithms via problems in computational biology and computer graphics
3) state theoretical results on the generalization and consistency of these algorithms.

Prerequisites

Familiarity with probability, functional analysis, and linear algebra will be very helpful. We try to keep the mathematical prerequisites to a minimum, but we will introduce complicated material at a fast pace.

Grading

Three problem sets count for 50% of the grade, and a final project counts for the remaining 50%.

Syllabus

The material covered in each class is (hopefully) contained in the lecture notes that I am preparing. These notes go into much greater detail than I will cover in class. Note that I have not yet had a chance to add references.
  • S. Mukherjee Statistical Learning: algorithms and theory.


  • Class Date Title
    Class 01 Tue 30 Aug Course at a glance
    Class 02 Thu 1 Sept Learning problem in perspective
    Class 03 Tue 6 Sept Regularization and Reproducing Kernel Hilbert Spaces
    Class 04 Thu 8 Sept Kernel ridge regression
    Class 05 Tue 13 Sept Support Vector Machines for classification
    Class 06 Thu 15 Sept Spline models and regularization networks
    Class 07 Tue 20 Sept SVMs applied (tricks of the trade for practitioners)
    Class 08 Thu 22 Sept Probably approximately correct (PAC) framework and the boosting hypothesis
    Class 09 Tue 27 Sept Adaptive boosting (AdaBoost)
    Class 10 Thu 29 Sept AdaBoost: what statisticians say
    Class 11 Tue 4 Oct AdaBoost: geometry and dynamical systems
    Class 12 Thu 6 Oct Boosting: spam detectors and computational biology
    Oct 7-11 Fall break
    Class 13 Thu 13 Oct Splines: computer graphics
    Class 14 Tue 18 Oct Multiclass classification and text classification
    Class 15 Thu 20 Oct Generalization and consistency
    Class 16 Tue 25 Oct One-dimensional concentration inequalities
    Class 17 Thu 27 Oct Vapnik-Chervonenkis classes, shattering dimensions, and covering numbers
    Class 18 Tue 1 Nov Vapnik-Chervonenkis classes, shattering dimensions, and covering numbers
    Class 19 Thu 3 Nov Kolmogorov chaining and Dudley's entropy integral
    Class 20 Tue 8 Nov Symmetrization and Rademacher averages
    Class 21 Thu 10 Nov Stability of Tikhonov regularization
    Class 22 Tue 15 Nov Feature selection and learning gradients
    Class 23 Thu 17 Nov Regularization in a coherent Bayesian framework
    Nov 22-27 Thanksgiving
    Class 24 Tue 29 Nov Geometry and topology in learning
    Class 25 Thu 1 Dec Project presentations

    Math Camp 1 TBD Analysis and basic probability theory
    Math Camp 2 TBD More analysis and probability theory

    Reading List

    The books and papers listed below are useful general reference reading, especially from the theoretical viewpoint.