Class Times: Tuesdays and Thursdays 2:50-4:05
Location: Physics 128
Instructor: Sayan Mukherjee
Office Hours: By appointment
Email Contact: sayan at stat dot duke dot edu

Course description
This course uses statistical methods and tools from applied probability to address problems in computational molecular biology. Biological problems in sequence analysis, structure prediction, gene expression analysis, phylogenetic trees, and statistical genetics will be addressed. The following statistical topics and techniques will be used to address the biological problems: classical hypothesis testing, Bayesian hypothesis testing, multiple hypothesis testing, extremal statistics, Markov chains, continuous Markov processes, Expectation Maximization and imputation, classification methods, and clustering methods. Along the way we'll learn about gambling, card shuffling, and coin tossing.

Problem sets
- Problem set #1
- Problem set #2
- Problem set #3

Prerequisites
- STA 213: Introduction to Statistical Methods
- MTH 104: Linear Algebra and Applications
- Basic knowledge of biology

Grading
There will be four problem sets that together account for 40% of the grade, a midterm exam that accounts for 20%, and a final exam that accounts for 40%. Students who have an A after the midterm (counting both the exam and the homework) will have the option to complete a final project in lieu of the final exam.

Syllabus
The material covered in each class is (hopefully) contained in the lecture notes that I am preparing. Most of this material will be taken from the five books listed in the reading list.
- S. Mukherjee. Course notes
- S. Mukherjee. Course notes for classification and regression (sections 2, 3, mainly 5)
Class     Date         Title
Class 01  Thur 12 Jan  Course at a glance
Class 02  Tue 17 Jan   Overview of probability and stats and notation (I)
Class 03  Thur 19 Jan  Overview of probability and stats and notation (II)
Class 04  Tue 24 Jan   Statistical inference
Class 05  Thur 26 Jan  More statistical inference
Class 06  Tue 31 Jan   Classical hypothesis testing (with applications)
Class 07  Thur 2 Feb   Bayesian hypothesis testing (with applications)
Class 08  Tue 7 Feb    Multiple hypothesis testing
Class 09  Thur 9 Feb   Intro to theory of Markov chains and random walks (I)
Class 10  Tue 14 Feb   BLAST and its cousins (I)
Class 11  Thur 16 Feb  BLAST and its cousins (II)

Reading List
- W.J. Ewens and G.R. Grant. Statistical Methods in Bioinformatics: An Introduction.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
- D. Sorensen and D. Gianola. Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics.
- M.S. Waterman. Introduction to Computational Biology.
- K. Lange. Mathematical and Statistical Methods for Genetic Analysis.
- P. Davis and D.H. Kenyon. Of Pandas and People: The Central Question of Biological Origins.