Useful Links
 JHU library
 JHU academic calendar: class dates, finals dates, holidays, etc.
 JHU course listings, descriptions and which courses are offered by semester and department
High-Dimensional Approximation, Probability, and Statistical Learning
Location & Time: Shaffer 300, Tue–Thu, 10:30–11:45 a.m.
Office hours: Wed 14:45–15:45 in Wyman bldg. N438 with the instructor; also with the TAs: Ian McPherson, Wyman S425, Friday 13:00–14:30; Aranyak Acharyya, Wyman S425, Tuesday 15:00–16:30, except for Tuesday Feb. 6th, when they will be held online (Zoom link in my email) from 16:30 to 18:00.
TAs: Aranyak Acharyya, Ian Oliver McPherson.
Exams: Mar. 28th. Final: submission of a report on the final project, due May 6th. Oral presentations of the projects will take place in the final week of classes.
All materials for the course will be made available here.
Synopsis:
The course covers fundamental mathematical ideas for approximation and statistical learning problems in high dimensions. We start with studying high-dimensional phenomena, through both the lenses of probability (concentration inequalities) and geometry (concentration of measure phenomena). We then consider a variety of techniques and problems in high-dimensional statistics and machine learning, ranging from dimension reduction (random projections, embeddings of metric spaces, manifold learning) to classification and regression (with connections to approximation theory, Fourier analysis and wavelets, Reproducing Kernel Hilbert Spaces, tree-based methods, multiscale methods), estimation of probability measures in high dimensions (density estimation, but also estimation of singular measures, and connections with optimal transport). We then consider graphs and networks, Markov chains, random walks, spectral graph theory, models of random graphs, and applications to clustering. Finally, we discuss problems at the intersection of statistical estimation, machine learning, and dynamical/physical systems, in particular Markov state space models, interacting particle/agent systems, as well as model reduction for stochastic dynamical systems. Computational aspects will be discussed in all topics above (including, e.g., randomized linear algebra, fast nearest neighbors methods, basic optimization).
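The concentration phenomena the synopsis opens with can be seen in a few lines of code. A minimal Python/NumPy sketch (the course's own demos are in MATLAB; the sample sizes and the name `relative_spread` are ours): the Euclidean norm of a standard Gaussian vector in R^d concentrates around sqrt(d), so its relative fluctuation shrinks as the dimension grows.

```python
import numpy as np

# Concentration of the Euclidean norm: for X ~ N(0, I_d), ||X|| concentrates
# around sqrt(d), so its relative fluctuation shrinks as d grows.
rng = np.random.default_rng(0)

def relative_spread(d, n=2000):
    """Standard deviation over mean of ||X|| across n Gaussian samples in R^d."""
    norms = np.linalg.norm(rng.standard_normal((n, d)), axis=1)
    return norms.std() / norms.mean()

spread_low = relative_spread(2)      # low dimension: norms fluctuate a lot
spread_high = relative_spread(2000)  # high dimension: norms nearly deterministic
```

Increasing the dimension by a factor of 1000 shrinks the relative spread by roughly a factor of sqrt(1000), in line with the 1/sqrt(d) rate predicted by standard concentration inequalities.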
Homework: available on Gradescope.
Homework 1, Homework 2, Homework 3
References: we will use materials from several lecture notes, books, and papers, including:
 Foundations of Data Science, A. Blum, J. Hopcroft, and R. Kannan.
 High-Dimensional Probability: An Introduction with Applications in Data Science, R. Vershynin.
 High-Dimensional Statistics, P. Rigollet (see also here).
 Universal Algorithms for Learning Theory Part I: Piecewise Constant Functions, Binev et al.
 Estimation in high dimensions: a geometric perspective (part of this material is now part of the textbook above by the same author), R. Vershynin.
 Measure Concentration lecture notes, A. Barvinok.
 Nonlinear Approximation, R. A. DeVore.
 Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science, A. Bandeira.
 Introduction to nonparametric estimation, A. Tsybakov.
 A distribution-free theory of nonparametric regression, L. Györfi, M. Kohler, A. Krzyżak, and H. Walk.
 Lectures on Spectral Graph Theory, F.R.K. Chung.
 Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Coifman et al.
Mathematical and Computational Foundations of Data Science
Location & Time: Shaffer 300, Tue–Thu, 12:00–13:15.
Office hours: Thu 13:30–14:30, Wyman bldg. N438 with the instructor; Fri 12:00–13:00 in Krieger 201 with TA Elham Matinpour.
TA: Elham Matinpour.
Exams: Mar. 28th, plus possible pop quizzes. Final: presentation/poster on the final project, May 13th, 9 a.m.–12 p.m.
Synopsis:
The course covers several topics in Data Science, focusing on key mathematical and statistical concepts and technical ideas common to many developments in the field, as well as on algorithms and their computational aspects. The emphasis is on fundamental mathematical ideas (including basic functional analysis and approximation theory, concentration inequalities from a probabilistic and geometric point of view, analysis of and on graphs), core statistical techniques (e.g. linear regression, parametric and nonparametric methods), and machine learning techniques for unsupervised (e.g. clustering, manifold learning), supervised (classification, regression), and semi-supervised learning. We also cover algorithmic and computational aspects of the above and their foundations, including basics of numerical linear algebra and of linear and nonlinear optimization, needed to implement solutions to the problems above in a computationally efficient fashion. Applications will include statistical signal processing, imaging, inverse problems, graph processing, and problems at the intersection of statistics/machine learning and physical/dynamical systems (e.g. learning interaction kernels in agent-based models, model reduction for stochastic dynamical systems).
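Several of the themes above (basic numerical linear algebra, unsupervised learning) meet in the classic example of principal component analysis via the SVD. A minimal Python/NumPy sketch (the course demos use MATLAB; the toy data and variable names here are invented for illustration):

```python
import numpy as np

# PCA via the SVD: toy data concentrated near a line in R^3, so the first
# principal component should capture almost all of the variance.
rng = np.random.default_rng(1)
t = rng.standard_normal(500)
X = np.outer(t, [3.0, 2.0, 1.0]) + 0.1 * rng.standard_normal((500, 3))

Xc = X - X.mean(axis=0)                   # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)           # fraction of variance per component
scores = Xc @ Vt[:2].T                    # 2-D projection onto top components
```

Because the data is essentially one-dimensional plus small noise, the first entry of `explained` is close to 1; the same computation underlies linear dimension reduction on real data sets.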
Linear Algebra Done Right, Sheldon Axler.
Numerical Linear Algebra, L.N. Trefethen and D. Bau (the first few chapters provide some of the background in linear algebra)
Finite-Dimensional Vector Spaces, P. Halmos.
Linear Algebra and Learning from Data, Gilbert Strang.
The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman (electronic version available at the link above, within JHU network/with JHU login)
A basic introduction to Markov Chains, by J. Rocca
Markov Chains, by J.R. Norris
High-Dimensional Probability: An Introduction with Applications in Data Science, R. Vershynin, for the material regarding concentration inequalities and the geometry of high-dimensional random vectors [advanced]
Foundations of Data Science, A. Blum, J. Hopcroft, and R. Kannan: we discuss some of the material on geometric concentration phenomena in Chp. 2. It can also serve as a reference for SVD/PCA (Chp. 3) [moderately advanced]
Matlab tutorial: from Mathworks here. Many more are available online; just use your favourite search engine to look for “matlab tutorial”. Cleve Moler's books (selected chapters): Numerical Computing with MATLAB and Experiments with MATLAB.
High-Dimensional Approximation, Probability, and Statistical Learning
Location & Time: Maryland 109, Tue–Thu, 10:30–11:45
Office hours: Tue 13:30–14:30 in Wyman bldg. N438 with the instructor; also Wed 14:00–16:00 in Wyman bldg. S425 with TA Bicheng Zhu.
TAs: Shishu Qiu, Bicheng Zhu
Final exam: May 16th, 14:00–17:00
All materials for the course will be made available here. The course is designed for in-person instruction, with heavy use of blackboard/whiteboard and in-class questions and discussions, including discussions of the projects.
Synopsis:
The course covers fundamental mathematical ideas for approximation and statistical learning problems in high dimensions. We start with studying high-dimensional phenomena, through both the lenses of probability (concentration inequalities) and geometry (concentration of measure phenomena). We then consider a variety of techniques and problems in high-dimensional statistics and machine learning, ranging from dimension reduction (random projections, embeddings of metric spaces, manifold learning) to classification and regression (with connections to approximation theory, Fourier analysis and wavelets, Reproducing Kernel Hilbert Spaces, tree-based methods, multiscale methods), estimation of probability measures in high dimensions (density estimation, but also estimation of singular measures, and connections with optimal transport). We then consider graphs and networks, Markov chains, random walks, spectral graph theory, models of random graphs, and applications to clustering. Finally, we discuss problems at the intersection of statistical estimation, machine learning, and dynamical/physical systems, in particular Markov state space models, Hidden Markov Models, and interacting particle/agent systems, as well as model reduction for stochastic dynamical systems. Computational aspects will be discussed in all topics above (including, e.g., randomized linear algebra, fast nearest neighbors methods, basic optimization).
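The random projections mentioned above can be sketched directly: by the Johnson–Lindenstrauss lemma, a scaled Gaussian matrix nearly preserves all pairwise distances of a point cloud. A hedged Python/NumPy illustration (dimensions, sample sizes, and names are chosen only for this demo):

```python
import numpy as np

# Johnson–Lindenstrauss sketch: a d x k Gaussian matrix scaled by 1/sqrt(k)
# approximately preserves all pairwise distances of a point cloud.
rng = np.random.default_rng(2)
d, k, n = 10_000, 1_000, 20
X = rng.standard_normal((n, d))               # n points in R^d
P = rng.standard_normal((d, k)) / np.sqrt(k)  # random projection matrix
Y = X @ P                                     # projected points in R^k

def pairwise_dists(Z):
    diffs = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

D, Dp = pairwise_dists(X), pairwise_dists(Y)
mask = ~np.eye(n, dtype=bool)                 # ignore zero diagonal
distortion = np.max(np.abs(Dp[mask] / D[mask] - 1.0))
```

Despite a 10x reduction in dimension, every pairwise distance is preserved up to a few percent; the lemma guarantees that k need only grow logarithmically in the number of points for a fixed distortion.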
Syllabus
Code: examples in the first two lectures.
Homework:
Homework set 1, due Feb. 10th.
Homework set 2, due Feb. 17th.
Homework set 3, due Feb. 24th.
Homework set 4, due Mar. 3rd.
Homework set 5, due Mar. 10th.
Homework set 6, due Mar. 17th.
References: we will use materials from several lecture notes, books, and papers, including:
 Foundations of Data Science, A. Blum, J. Hopcroft, and R. Kannan.
 High-Dimensional Probability: An Introduction with Applications in Data Science, R. Vershynin.
 High-Dimensional Statistics, P. Rigollet (see also here).
 Universal Algorithms for Learning Theory Part I: Piecewise Constant Functions, Binev et al.
 Estimation in high dimensions: a geometric perspective (part of this material is now part of the textbook above by the same author), R. Vershynin.
 Measure Concentration lecture notes, A. Barvinok.
 Nonlinear Approximation, R. A. DeVore.
 Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science, A. Bandeira.
 Introduction to nonparametric estimation, A. Tsybakov.
 A distribution-free theory of nonparametric regression, L. Györfi, M. Kohler, A. Krzyżak, and H. Walk.
 Lectures on Spectral Graph Theory, F.R.K. Chung.
 Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Coifman et al.
Introduction to Harmonic Analysis and its Applications
Location & Time: Maryland 109, Tue–Thu, 12:00–13:15.
Office hours: Thu 13:30, Wyman bldg. N438 with the instructor; Thu 16:00–18:00 in Krieger 204 with C. Quinn (TA).
TA: Casey Quinn
Final exam: May 15th, 9:00–12:00
All materials for the course will be made available here. The course is designed for in-person instruction, with heavy use of blackboard/whiteboard and in-class questions and discussions, including discussions of the projects.
Syllabus
Homework:
Homework set 4, due Mar. 3rd.
Homework set 5, due Mar. 10th.
Homework set 6, due Mar. 17th.
References: We will use materials from several books and papers, including:
 Fourier Analysis, E. M. Stein and R. Shakarchi
 A wavelet tour of signal processing, S. Mallat
 Real Analysis, G.B. Folland
 Ten Lectures on Wavelets, I. Daubechies
 Harmonic Analysis for Engineers and Applied Scientists, G.S. Chirikjian and A.B. Kyatkin
 Mathematics of Medical Imaging, C. Epstein
High-Dimensional Approximation, Probability, and Statistical Learning
Location & Time: Spring Semester, Homewood Campus, Shaffer 100, Tue–Thu 10:30–11:15 AM
Office hours: Tue, 1:30PM, Krieger 308, starting Feb. 1st; Zoom meeting ID 910 9593 5339.
TAs: Jianyu Lin (office hours: Wed 3–4 pm, Zoom only, with meeting ID 970 1934 3339, usual password); Qiangang Fu.
Final exam: May 11, 2pm, in Shaffer 300.
All materials for the course will be made available here (in particular, there is no Blackboard or other site with materials for this course).
While the course is designed for in-person instruction, with heavy use of blackboard/whiteboard, occasional remote participation is possible via Zoom, with the meeting ID 910 9593 5339 (see email(s) for the (usual) password).
Synopsis:
The course covers fundamental mathematical ideas for approximation and statistical learning problems in high dimensions. We start with studying high-dimensional phenomena, through both the lenses of probability (concentration inequalities) and geometry (concentration of measure phenomena). We then consider a variety of techniques and problems in high-dimensional statistics and machine learning, ranging from dimension reduction (random projections, embeddings of metric spaces, manifold learning) to classification and regression (with connections to approximation theory, Fourier analysis and wavelets, Reproducing Kernel Hilbert Spaces, tree-based methods, multiscale methods), estimation of probability measures in high dimensions (density estimation, but also estimation of singular measures, and connections with optimal transport). We then consider graphs and networks, Markov chains, random walks, spectral graph theory, models of random graphs, and applications to clustering. Finally, we discuss problems at the intersection of statistical estimation, machine learning, and dynamical/physical systems, in particular Markov state space models, Hidden Markov Models, and interacting particle/agent systems, as well as model reduction for stochastic dynamical systems. Computational aspects will be discussed in all topics above (including, e.g., randomized linear algebra, fast nearest neighbors methods, basic optimization).
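The spectral graph theory and clustering topics above admit a compact illustration: on a graph with two dense communities, the sign pattern of the Fiedler vector (the eigenvector of the graph Laplacian for the second-smallest eigenvalue) recovers the partition. A Python/NumPy sketch on a planted two-block toy graph (not a course demo; block sizes and edge probabilities are invented):

```python
import numpy as np

# Spectral bipartition of a planted two-community graph via the Fiedler vector.
rng = np.random.default_rng(3)
n = 40                                    # 20 nodes per community
labels = np.array([0] * 20 + [1] * 20)
p_in, p_out = 0.9, 0.05                   # dense within, sparse across
A = rng.random((n, n)) < np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = np.triu(A, 1)
A = (A + A.T).astype(float)               # symmetric adjacency, no self-loops

L = np.diag(A.sum(axis=1)) - A            # unnormalized graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                   # second-smallest eigenvector
pred = (fiedler > 0).astype(int)

# Up to a global sign flip, the Fiedler sign pattern matches the communities.
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
```

With this separation between in-community and cross-community edge probabilities, the recovered labels essentially coincide with the planted ones.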
Link to syllabus
Slides for the first 2 lectures (Keynote, pptx, pdf with no animations)
Shared code (demos run in class + possibly useful for some homework problems)
Homework 1 (due Feb. 15th), also (and primarily) available on Gradescope.
Homework 2 (due Feb. 28th) available on Gradescope.
Homework 3 (due Mar. 9th) available on Gradescope.
Homework 4 (due Mar. 18th) available on Gradescope.
Project Tracks: contains several suggestions/tracks for possible final projects. Final project report due on May 15th.
References: we will use materials from several lecture notes, books, and papers, including:
 Foundations of Data Science, A. Blum, J. Hopcroft, and R. Kannan.
 High-Dimensional Probability: An Introduction with Applications in Data Science, R. Vershynin.
 High-Dimensional Statistics, P. Rigollet (see also here).
 Universal Algorithms for Learning Theory Part I: Piecewise Constant Functions, Binev et al.
 Estimation in high dimensions: a geometric perspective (part of this material is now part of the textbook above by the same author), R. Vershynin.
 Measure Concentration lecture notes, A. Barvinok.
 Nonlinear Approximation, R. A. DeVore.
 Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science, A. Bandeira.
 Introduction to nonparametric estimation, A. Tsybakov.
 A distribution-free theory of nonparametric regression, L. Györfi, M. Kohler, A. Krzyżak, and H. Walk.
 Lectures on Spectral Graph Theory, F.R.K. Chung.
 Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Coifman et al.
Mathematical and Computational Foundations of Data Science
Location & Time: Spring Semester, Homewood Campus, Maryland 109, Tue–Thu 12:00–1:15 PM
Office hours: Thu, 1:30PM, Krieger 304, starting Feb. 3rd; also via Zoom ID 974 7082 2323.
TA: Quanjun Lang. Office hours: Tue 3–4 pm, Krieger 201 and Zoom (meeting ID 914 1383 3003, usual password).
Final exam: May 16, 9am, in Maryland 109.
All materials for the course will be made available on this page (in particular, there is no Blackboard or other site with materials for this course).
Synopsis:
The course covers several topics in Data Science, focusing on key mathematical and statistical concepts and technical ideas common to many developments in the field, as well as on algorithms and their computational aspects.
The emphasis is on fundamental mathematical ideas (including basic functional analysis and approximation theory, concentration inequalities from a probabilistic and geometric point of view, analysis of and on graphs), core statistical techniques (e.g. linear regression, parametric and nonparametric methods), and machine learning techniques for unsupervised (e.g. clustering, manifold learning), supervised (classification, regression), and semi-supervised learning.
We also cover algorithmic and computational aspects of the above and their foundations, including basics of numerical linear algebra and of linear and nonlinear optimization, needed to implement solutions to the problems above in a computationally efficient fashion. Applications will include statistical signal processing, imaging, inverse problems, graph processing, and problems at the intersection of statistics/machine learning and physical/dynamical systems (e.g. learning interaction kernels in agent-based models, model reduction for stochastic dynamical systems).
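One of the simplest places where the linear regression and numerical linear algebra threads above meet is solving a least squares problem via the QR factorization instead of the (numerically less stable) normal equations. A Python/NumPy sketch with synthetic data (the course demos are in MATLAB; names and sizes here are ours):

```python
import numpy as np

# Linear least squares min ||Ax - b||_2 via QR: A = QR, then solve R x = Q^T b.
rng = np.random.default_rng(4)
A = rng.standard_normal((100, 3))         # design matrix: 100 samples, 3 features
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(100)  # noisy observations

Q, R = np.linalg.qr(A)                    # reduced QR, R upper triangular
x_hat = np.linalg.solve(R, Q.T @ b)       # triangular system for the estimate
```

The estimate recovers the true coefficients up to the noise level; working with R instead of A^T A avoids squaring the condition number.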
Shared code (class demos + perhaps a starting point for some homework problems)
While the course is designed for inperson instruction, with heavy use of blackboard/whiteboard, occasional remote participation is allowed via Zoom with meeting ID 974 7082 2323 (see email(s) for the (usual) password).
Homework 1 (due Feb. 15th), also (and primarily) available on Gradescope.
Homework 2 (due Feb. 28th) available on Gradescope.
Homework 3 (due Mar. 9th) available on Gradescope.
Homework 4 (due Mar. 18th) available on Gradescope.
Linear Algebra and Learning from Data, Gilbert Strang.
Numerical Linear Algebra, L.N. Trefethen and D. Bau.
Finite-Dimensional Vector Spaces, P. Halmos.
The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman (electronic version available at the link above, within JHU network/with JHU login)
A basic introduction to Markov Chains, by J. Rocca
Markov Chains, by J.R. Norris
High-Dimensional Probability: An Introduction with Applications in Data Science, R. Vershynin, for the material regarding concentration inequalities and the geometry of high-dimensional random vectors [advanced]
Foundations of Data Science, A. Blum, J. Hopcroft, and R. Kannan: we discuss some of the material on geometric concentration phenomena in Chp. 2. It can also serve as a reference for SVD/PCA (Chp. 3) [moderately advanced]
Matlab tutorial: from Mathworks here. Many more are available online; just use your favourite search engine to look for “matlab tutorial”. Cleve Moler's books (selected chapters): Numerical Computing with MATLAB and Experiments with MATLAB.
High-Dimensional Approximation, Probability, and Statistical Learning
Location & Time: Spring Semester, Homewood Campus, Bloomberg 274, MW 1:30–2:45 PM; Final exam: Fri May 8th, 2–5 PM
Office hours: Tue 11:40 AM, Whitehead 302D (Feb. 4 cancelled); TAs: Ting Chao, Sichen Yang: Mon 3–4 PM on 2/3, 2/17, 3/2 in Whitehead 205; Fri 12–1 PM on 2/7, 2/21, 3/6 in Whitehead 211D.
Online office hours: times and contact information have been posted on Piazza.
Synopsis: The course covers fundamental mathematical ideas for approximation and statistical learning problems in high dimensions. We start with studying high-dimensional phenomena, through both the lenses of probability (concentration inequalities) and geometry (concentration of measure phenomena). We then consider a variety of techniques and problems in high-dimensional statistics and machine learning, ranging from dimension reduction (random projections, embeddings of metric spaces, manifold learning) to classification and regression (with connections to approximation theory, Fourier analysis and wavelets, Reproducing Kernel Hilbert Spaces, tree-based methods, multiscale methods), estimation of probability measures in high dimensions (density estimation, but also estimation of singular measures, and connections with optimal transport). We then consider graphs and networks, Markov chains, random walks, spectral graph theory, models of random graphs, and applications to clustering. Finally, we discuss problems at the intersection of statistical estimation, machine learning, and dynamical/physical systems, in particular Markov state space models, Hidden Markov Models, and interacting particle/agent systems. Computational aspects will be discussed in all topics above (including, e.g., randomized linear algebra, fast nearest neighbors methods, basic optimization).
Syllabus. Midterm: March 25th. Video Lectures:
Week of 3/23/20 (pdf),
Week of 3/30/20 (pdf),
Week of 4/6/20: 4/6 (pdf), 4/8 (pdf),
Week of 4/13/20 (+Matlab demo video and code).
Week of 4/20/20 (pdf)
Week of 4/27/20: 4/27 (pdf), 4/29 (pdf)
These are links to videos and notes for the lectures. I strongly recommend downloading the videos for watching locally, as the video player embedded in the JHU-provided OneDrive can significantly deteriorate the quality of the videos, depending on JHU's outgoing and your incoming internet speed, to the point where the writing becomes unreadable. I processed the videos to a resolution of 720p, which seemed a good trade-off between quality and download size. If you encounter trouble, let me know.
Homework Sets: 1, 2, 3, 4, 5, 6 (updated), 7, final projects
References: we will use materials from several lecture notes, books, and papers, including:
 Foundations of Data Science, A. Blum, J. Hopcroft, and R. Kannan.
 High-Dimensional Probability: An Introduction with Applications in Data Science, R. Vershynin.
 High-Dimensional Statistics, P. Rigollet (see also here).
 Universal Algorithms for Learning Theory Part I: Piecewise Constant Functions, Binev et al.
 Estimation in high dimensions: a geometric perspective (part of this material is now part of the textbook above by the same author), R. Vershynin.
 Measure Concentration lecture notes, A. Barvinok.
 Nonlinear Approximation, R. A. DeVore.
 Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science, A. Bandeira.
 Introduction to nonparametric estimation, A. Tsybakov.
 A distribution-free theory of nonparametric regression, L. Györfi, M. Kohler, A. Krzyżak, and H. Walk.
 Lectures on Spectral Graph Theory, F.R.K. Chung.
 Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Coifman et al.
Mathematical and Computational Foundations of Data Science
Location & Time: Spring Semester, Homewood Campus, Bloomberg 274, MW 12:00–1:15 PM; Final exam: Tue May 12th, 9–12 AM
Office hours: Tue 10:50 AM, Whitehead 302D (my office) [Feb. 4 cancelled; extra hour on Feb. 24th at 9 AM];
TAs: Luochen Zhao, Mon 9–10 AM, Krieger 201; also available at the help room on Fri 3–5 PM, Krieger 213.
Online office hours: times and contact information have been posted on Piazza.
Week of 3/23/20 (pdf),
Week of 3/30/20 (pdf),
Week of 4/6/20: 04/06/20 (pdf), 04/08/20 (pdf),
Week of 4/13/20, with video lecture on examples and code.
Week of 4/20/20 (pdf)
Week of 4/27/20: 4/27 (pdf), 4/29 (pdf)
Homework Sets: 1, 2, 3, 4, 5, 6, 7, final projects
Linear Algebra and Learning from Data, Gilbert Strang.
Numerical Linear Algebra, L.N. Trefethen and D. Bau.
Finite-Dimensional Vector Spaces, P. Halmos.
The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman (electronic version available at the link above, within JHU network/with JHU login)
High-Dimensional Probability: An Introduction with Applications in Data Science, R. Vershynin, for the material regarding concentration inequalities and the geometry of high-dimensional random vectors.
Foundations of Data Science, A. Blum, J. Hopcroft, and R. Kannan: we discuss some of the material on geometric concentration phenomena in Chp. 2. It can also serve as a reference for SVD/PCA (Chp. 3).
Matlab tutorial: from Mathworks here. Many more are available online; just use your favourite search engine to look for “matlab tutorial”. Cleve Moler's books (selected chapters): Numerical Computing with MATLAB and Experiments with MATLAB.
Introduction to Harmonic Analysis and its Applications [Spring]
Synopsis
Office hours: Krieger 405, on Mon. at 5:15pm, starting on Feb. 4th.
Final exam date: decided by registrar, still not available as of 1/31/19!
We will use materials from several lecture notes, books, and papers, including:
 Fourier Analysis, E. M. Stein and R. Shakarchi
 A wavelet tour of signal processing, S. Mallat
 Real Analysis, G.B. Folland
 Ten Lectures on Wavelets, I. Daubechies
 Harmonic Analysis for Engineers and Applied Scientists, G.S. Chirikjian and A.B. Kyatkin
 Mathematics of Medical Imaging, C. Epstein
Homework sets:
 Homework 1, due 2/13
 Homework 2, due 2/20
 Homework 3, due 3/6
 Homework 4, due 3/13
 Homework 5, due 4/3
High-Dimensional Approximation, Probability, and Statistical Learning [Fall]
Office hours: TBD
Final exam date: Dec. 13th, 9am12pm (decided by the registrar)
We will use materials from several lecture notes, books, and papers, including:
 Avrim Blum, John Hopcroft, and Ravindran Kannan's Foundations of Data Science textbook.
 Roman Vershynin's High-Dimensional Probability: An Introduction with Applications in Data Science, draft textbook.
 Philippe Rigollet's High-Dimensional Statistics lecture notes.
 Binev et al.'s paper Universal Algorithms for Learning Theory Part I: Piecewise Constant Functions.
 Roman Vershynin's Estimation in high dimensions: a geometric perspective (much of this material is now part of the textbook above).
 Alexander Barvinok's Measure Concentration lecture notes.
 Ronald A. DeVore's Nonlinear Approximation.
 Afonso Bandeira's Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science.
 Introduction to nonparametric estimation, A. Tsybakov.
 A distribution-free theory of nonparametric regression, L. Györfi, M. Kohler, A. Krzyżak, and H. Walk.
 F.R.K. Chung's Lectures on Spectral Graph Theory.
 Coifman et al.'s Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps.
Homework sets:
Code snippets:
 Examples from Lecture 1
Introduction to Harmonic Analysis and its Applications [Spring]
Office hours: Krieger 405, on Wed. after 4:15pm, by appointment
Section: Thu 6:006:50pm, Krieger 205
Final exam date: decided by registrar
We will use materials from several lecture notes, books, and papers, including:
 Fourier Analysis, E. M. Stein and R. Shakarchi
 A wavelet tour of signal processing, S. Mallat
 Real Analysis, G.B. Folland
 Ten Lectures on Wavelets, I. Daubechies
 Harmonic Analysis for Engineers and Applied Scientists, G.S. Chirikjian and A.B. Kyatkin
 Mathematics of Medical Imaging, C. Epstein
Homework sets:
 Homework 1, due 2/14
 Homework 2, due 2/21
 Homework 3, due 3/7
 Homework 4, due 4/11
 Review problems
High-Dimensional Approximation, Probability, and Statistical Learning [Fall]
Course number: AS.110.675, EN.553.738
Classroom: Hodson 311, Mon–Wed 1:30–2:45 pm. IMPORTANT: class on 9/20 will be in Hackerman 320.
Office hours: Krieger 405, Tue 4:15 pm (instructor); Wed 4–6 pm in Whitehead 212 with Zachary Lubberts, TA.
Final exam: Thu 12/14, 2–5 pm, usual classroom (as per this page)
Synopsis. We will introduce fundamental techniques for approximation of functions in low and high dimensions, using Fourier, wavelet, and other multiscale techniques. Both linear and nonlinear approximation techniques will be introduced. We will discuss some applications of these techniques to signal processing and imaging. We will then connect these techniques with probability and apply them to fundamental problems in statistical and machine learning theory. Tools such as concentration inequalities and basic analysis of random matrices will be introduced and applied.
We will also discuss other high-dimensional estimation and inverse problems, and their geometric interpretation.
Finally, we will discuss dimension reduction, random projections, manifold learning, and estimation problems in the space of probability measures via optimal transport.
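The linear-versus-nonlinear approximation distinction in the synopsis can be made concrete with the discrete Fourier transform: a nonlinear k-term approximation keeps the k largest-magnitude coefficients (which ones depends on the signal), while a linear scheme keeps a fixed set. A Python/NumPy sketch with a two-tone toy signal (signal and names invented for illustration):

```python
import numpy as np

# Nonlinear vs. linear k-term Fourier approximation of a sparse-spectrum signal.
n = 256
t = np.arange(n) / n
f = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
F = np.fft.fft(f)

def keep(F, idx):
    """Reconstruct keeping only the Fourier coefficients at indices idx."""
    G = np.zeros_like(F)
    G[idx] = F[idx]
    return np.fft.ifft(G).real

k = 4
idx_nonlinear = np.argsort(-np.abs(F))[:k]   # k largest coefficients (adaptive)
idx_linear = np.arange(k)                    # first k coefficients (fixed)
err_nl = np.linalg.norm(f - keep(F, idx_nonlinear))
err_lin = np.linalg.norm(f - keep(F, idx_linear))
```

Since the signal's energy sits at two frequencies (and their conjugates), the adaptive choice of 4 coefficients reconstructs it almost exactly, while the fixed low-frequency choice misses it entirely.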
Prerequisites.
Linear algebra will be used throughout the course, as will multivariable calculus and probability (discrete and continuous random variables).
 Basic functional analysis (e.g. Lebesgue spaces, linear operators), or at the very least a firm grasp of basic properties of vector spaces and linear operators.
Basic experience in programming in C or MATLAB or R or Octave, in order to perform basic numerical experiments.
Grading.
Grade to be based on assignments (roughly biweekly), and a final presentation and report. The latter will be about a paper or topic selected in agreement with the instructor, and depending on the paper/topic it may be worked on by a team rather than a single student.
Students from all areas of science, mathematics and applied mathematics, engineering, computer science, statistics, quantitative biology, and economics who need advanced-level skills in solving problems involving probability and stochastic processes in high dimensions, signal processing, or statistical modeling, often arising from the analysis or modeling of high-dimensional data, are encouraged to enroll.
The Instructor is available during his office hours, in Krieger 405, at a time to be mutually agreed upon at the beginning of the semester.
Our Teaching Assistant, Mr. Zachary Lubberts, is available during his office hours, at a time to be mutually agreed upon at the beginning of the semester.
We will use materials from several lecture notes, books, and papers, including:
 Roman Vershynin's High-Dimensional Probability: An Introduction with Applications in Data Science, draft textbook.
 Philippe Rigollet's High-Dimensional Statistics lecture notes.
 Binev et al.'s paper Universal Algorithms for Learning Theory Part I: Piecewise Constant Functions.
 Roman Vershynin's Estimation in high dimensions: a geometric perspective (much of this material is now part of the textbook above).
 Alexander Barvinok's Measure Concentration lecture notes.
 Avrim Blum, John Hopcroft, and Ravindran Kannan's Foundations of Data Science textbook.
 Ronald A. DeVore's Nonlinear Approximation.
 Afonso Bandeira's Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science.
 Introduction to nonparametric estimation, A. Tsybakov.
 A distribution-free theory of nonparametric regression, L. Györfi, M. Kohler, A. Krzyżak, and H. Walk.
 Fourier Analysis, E. M. Stein and R. Shakarchi.
 A wavelet tour of signal processing, S. Mallat.
 Real Analysis, G.B. Folland.
 Ten Lectures on Wavelets, I. Daubechies.
 Harmonic Analysis for Engineers and Applied Scientists, G.S. Chirikjian and A.B. Kyatkin.
 Mathematics of Medical Imaging, C. Epstein.
Homework sets:
 Homework 1. Due on Fri. Sep. 15th.
 Homework 2. Due on Fri. Sep. 29th.
Possible Topics for Final Project:
The final project consists of a presentation by the team (1–4 members) that worked on the project, with each team member presenting for 6 minutes, together with a short, concise, clearly-written report (6 single-column pages would be a typical length) sent by email, with “High Dimensional Approximation final report” in the subject line. The report should be authored by all team members, each stating their role and contribution to the project and report. Please send me the presentation in PDF or Keynote format in advance so we can proceed quickly through the presentations.
Topics can range from Fourier and wavelet analysis to random matrices, to compressed sensing, for example:
 Random Matrices: the matrix Bernstein inequality, asymptotic distribution of eigenvalues (Wigner’s semicircle law)
 Compressed Sensing: original Fourier, original polytopes, multiscale for imaging, for CT
 Chaining: parts of the corresponding chapter in R. Vershynin’s lecture notes on High Dimensional Probability (see reference above)
 Dvoretzky–Milman’s theorem: section 11.3.2 in R. Vershynin’s lecture notes on High Dimensional Probability (see reference above)
 Spectral Clustering: any or all of the topics in the corresponding chapter of Foundations of Data Science by A. Blum et al. (see reference above)
 Randomized SVD and other linear algebra algorithms, for example following this review paper
 Compressed Sensing from nonlinear observations, for example section 13 in R. Vershynin’s Estimation in High Dimensions, referenced above
 Windowed Fourier and its Application, for example following S. Mallat’s book on Wavelets and Signal Processing, or Daubechies’ book Ten Lectures on Wavelets
Introduction to Statistical Learning, Data Analysis and Signal Processing [Spring]
Course number: AS.110.446, EN.550.416.
A synopsis is available here.
A syllabus is available here.
Office hours: Krieger 405, Wednesday 4:155:15pm (instructor); Whitehead 212, Thursdays 4:306:30pm (Zachary Lubberts, TA).
Midterm exam: Wednesday 3/15. Extra office hours on Tuesday 2–4 pm (Shangsi Wang) and 4–5 pm (Zachary Lubberts).
This wiki contains materials for the lectures, links to references, and data sets. [Not any more]
If you are unable to connect to the wiki, it is most likely because of one of the recurrent JHU network outages in my office, which makes my servers unavailable. I will transition the materials to my home server when I have time. You may try this alternative link (JHU local network only).
Synopsis of course content.
Introduction to high-dimensional data sets: key problems in statistics and machine learning. Geometric aspects. Principal component analysis, linear dimension reduction, random projections. Concentration phenomena: examples and basic inequalities. Metric spaces and embeddings thereof. Kernel methods. Nonlinear dimension reduction, manifold models.
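As a small illustration of the random projections and concentration topics mentioned above, the following Python/numpy sketch (my own toy example, not course code) projects points from a high-dimensional space to a much lower-dimensional one and checks that pairwise distances are approximately preserved, in the spirit of the Johnson–Lindenstrauss lemma:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 300           # number of points, ambient dim, projected dim
X = rng.standard_normal((n, d))

# Gaussian random projection, scaled by 1/sqrt(k) so distances are preserved in expectation
P = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ P.T

def pairwise_dists(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

D, Dp = pairwise_dists(X), pairwise_dists(Y)
iu = np.triu_indices(n, 1)        # each unordered pair once
ratios = Dp[iu] / D[iu]           # distortion of each pairwise distance
# concentration: the ratios cluster tightly around 1, with spread ~ 1/sqrt(k)
```

Shrinking `k` widens the spread of `ratios`, which is exactly the concentration-of-measure tradeoff the course makes precise.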
Regression. Vector spaces of functions, linear operators, projections. Orthonormal bases; Fourier and wavelet bases, and their use in signal processing and time series analysis. Basic approximation theory. Linear models, least squares. Bias and variance tradeoffs, regularization. Sparsity and compressed sensing. Multiscale methods.
Graphs and networks. Random walks on graphs, diffusions, page rank. Block models. Spectral clustering, classification, semisupervised learning.
Algorithmic and computational aspects of the above will be consistently in focus, as will be computational experiments on synthetic and real data.
Prerequisites.
Linear algebra will be used throughout the course, as will multivariable calculus and basic probability (discrete random variables). Basic programming experience in C, MATLAB, R, or Octave is expected. Recommended: more than basic programming experience in MATLAB or R; some more advanced probability (e.g. continuous random variables); some signal processing (e.g. the Fourier transform, discrete and continuous).
Grading.
The grade will be based on weekly assignments, midterm and final exams, and a final team project. The final team project includes a report and the presentation of a poster on the topic of the project (typically involving studying a small number of papers and summarizing them, and/or working on a data set). Weekly problem sets will focus on computational projects and theory.
Students from all areas of science, mathematics and applied mathematics, engineering, computer science, statistics, quantitative biology, economics that need advanced level skills in solving problems related to the analysis of data, signal processing, or statistical modeling are encouraged to enroll.
The Instructor is available during his office hours, in Krieger 405, on Tuesdays from 2pm to 3pm.
Our Teaching Assistant, Mr. Zachary Lubberts, is available during his office hours in Whitehead 212 on Thursdays, 4:30–6:30pm.
The Math help room, in Krieger 213 I believe, is staffed daily, roughly from 9am to 9pm. I found a schedule for last semester at http://www.math.jhu.edu/helproomschedule.pdf; you may ask questions there about any math topic related to the course.
Homework sets:
 Homework 1. Due on Fri. Feb. 3rd.
 Homework 2. Due on Mon. Feb. 13th.
 Homework 3. Due on Mon. Feb. 20th.
 Homework 4. Due on Mon. Feb. 27th.
 Homework 5. Due on Mon. Mar. 6th.
 Homework 6. Due on Mon. Mar. 13th.
 Homework 7. Due on Fri. Apr. 7th.
 Possible topics and questions for final exam.
Math 431 – Advanced Calculus, I [Spring]
Office hours: TBA, Gross Hall.
This course will develop a rigorous theory of elementary mathematical analysis including differentiation, integration, and convergence of sequences and series. Students will learn how to write mathematical proofs, how to construct counterexamples, and how to think clearly and logically. These topics are part of the foundation of all of mathematical analysis and applied mathematics, geometry, ordinary and partial differential equations, probability, and stochastic analysis.
Textbook: Fundamental Ideas of Analysis, by Michael Reed. The course will cover most, but not all, of the material in Chapters 1–6.
Full Synopsis
Homework sets:
Math 561 – Scientific Computing, I [Fall]
Office hours: Mondays at 2:45 in Gross Hall 318
The first part of the course will cover basic numerical linear algebra, in particular matrix factorizations, solution of linear systems, and eigenproblems. The suggested textbook is Trefethen and Bau’s Numerical Linear Algebra. The textbook by Heath, Scientific Computing, may be a useful reference for exercises, of which it contains many.
The second part of the course will cover basic nonlinear optimization (gradient descent, Newton’s method, stochastic gradient descent), and basic notions in linear programming.
The third part of the course will cover the basics of Monte Carlo algorithms.
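As a minimal taste of the Monte Carlo part, the classic estimate of π from uniform random points (a toy sketch of my own, in Python/numpy rather than the course's Matlab):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# sample uniformly in the unit square; the fraction landing inside the
# quarter disk x^2 + y^2 <= 1 approximates pi/4
pts = rng.random((N, 2))
inside = (pts ** 2).sum(axis=1) <= 1.0
pi_hat = 4.0 * inside.mean()
# the Monte Carlo error decays like 1/sqrt(N), independently of dimension
```

The dimension-independent 1/√N error rate is the reason Monte Carlo methods matter in high dimensions, where grid-based quadrature is hopeless.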
A short syllabus is available here, and an extended one is here.
One midterm exam.
Final exam will include a takehome portion (algorithms, coding) and an inclass portion.
Homework: weekly homework.
Homework:
 Homework 1
 Homework 2
 Homework 3
 Homework 4
 Homework 5. Here is the code I used in class for least squares problems (polynomial fitting).
 Homework 6
 Homework 7. This also includes practice questions for the midterm. The homework itself is due the week after the midterm.
 Homework 8
 Homework 9
 Review exercises. Some review exercises on the second part of the course, which you may find useful for reviewing the material (these are not homework).
The code we discussed in our last class on 11/23, which includes solving a discretized linear PDE in various ways as well as computing eigenvalues/eigenvectors, is available here.
Useful links:
Sample code I used in the first class to produce some examples
John Trangenstein’s home page contains a link to his online book on Scientific Computing, as well as several useful links to programming guides for Fortran, C, C++ and Lapack on his page for Math 225. William Allard’s home page also contains useful material, such as notes and links to online guides and materials.
Fortran tutorial: here and here.
Matlab tutorial: from Mathworks here, from the University of Florida here. Many more are available online, just use your favourite search engine to look for “matlab tutorial”.
Cleve Moler book (selected chapters): Numerical computing with MATLAB and Experiments with MATLAB.
Math 690 – Topics in statistical learning theory – [Spring]
Class: MW 10:0511:20, Location: Physics 11
Math 561 – Scientific Computing, I – Fall 2014
Class: MW 1:25–2:40, Location: Gross 304B. Office hours: Tue 12:00–1:00pm, Location: Gross 309.
Here is the synopsis and its extended version in the syllabus.
Review on 12/10 at noon in 304B.
Take-home exam: I will email you a link to the exam on the night of Friday Dec. 5th. You pick a 6-hour period between then and the final in-class exam; at the beginning of the period of your choice you download the exam, and you email me an electronic copy by the end of the 6-hour period. A (matching) paper copy is due at the end of the in-class exam. You can use the textbook (no other books), your notes from class, and your homework. You cannot copy code directly or reuse code from your homework. Needless to say, do not discuss the problems with anybody else.
The first part of the course will cover basic numerical linear algebra, in particular matrix factorizations, solution of linear systems, and eigenproblems. The suggested textbook is Trefethen and Bau’s Numerical Linear Algebra. The textbook by Heath, Scientific Computing, may be a useful reference for exercises, of which it contains many.
Useful links:
John Trangenstein’s home page contains a link to his online book on Scientific Computing, as well as several useful links to programming guides for Fortran, C, C++ and Lapack on his page for Math 225. William Allard’s home page also contains useful material, such as notes and links to online guides and materials.
Fortran tutorial: here and here.
Matlab tutorial: from Mathworks here, from the University of Florida here. Many more are available online, just use your favourite search engine to look for “matlab tutorial”.
Cleve Moler book (selected chapters): Numerical computing with MATLAB and Experiments with MATLAB.
Math 431 – Advanced Calculus, I [Spring]
Office hours: Mondays, 1:30–2:30pm, Gross Hall, Office 318.
This course will develop a rigorous theory of elementary mathematical analysis including differentiation, integration, and convergence of sequences and series. Students will learn how to write mathematical proofs, how to construct counterexamples, and how to think clearly and logically. These topics are part of the foundation of all of mathematical analysis and applied mathematics, geometry, ordinary and partial differential equations, probability, and stochastic analysis.
Textbook: Fundamental Ideas of Analysis, by Michael Reed. The course will cover most, but not all, of the material in Chapters 1–6.
Full Synopsis
Midterms: February 19th, and March 26th.
Homework:
 Homework 1. Office hours moved to 11:30am, instead of 1:30pm, on Monday 13th. Office: 319 Gross Chem.
 Homework 2. Office hours at 1:30pm on Monday 13th, 318 Gross Chem. [Updated pdf on 1/16/14 at 11:30pm: it contained a misspelled word and I corrected office hours in the .pdf]
 Homework 3
 Homework 4
 Homework 5
 Homework 6
 Homework 7
 Homework 8
 Homework 9
Math 790 – Random Matrices, Concentration Inequalities and Applications to Signal Processing and Machine Learning [Spring]
We discuss several related topics and techniques at the intersection between probability, approximation theory, high-dimensional geometry, and machine learning and statistics. Synopsis. Wiki page.
On leave.
Math 224 – Scientific Computing [Fall]
Office hours: Tue 45pm, my office is Room 293 in the Physics Bldg.
Here is the synopsis.
The first part of the course will cover basic numerical linear algebra, in particular matrix factorizations, solution of linear systems and eigenproblems, and nonlinear equations in one dimension. If time permits, we shall discuss recent randomized algorithms in numerical linear algebra.
Useful links:
John Trangenstein’s home page contains a link to his online book on Scientific Computing, as well as several useful links to programming guides for Fortran, C, C++ and Lapack on his page for Math 225. William Allard’s home page also contains useful material, such as notes and links to online guides and materials.
Fortran tutorial: here and here.
Matlab tutorial: from Mathworks here, from the University of Florida here. Many more are available online, just use your favourite search engine to look for “matlab tutorial”.
Cleve Moler book (selected chapters): Numerical computing with MATLAB and Experiments with MATLAB.
Math 288 – Topics in Probability: Geometry, Functions and Learning in High Dimensions [Fall]
Here is a flier for the course.
We discuss several related topics and techniques at the intersection between probability, approximation theory, high-dimensional geometry, and machine learning and statistics.
We build on basic tools in large deviation theory and concentration of measure, and move to problems in nonasymptotic random matrix theory (RMT), such as estimating the spectral properties of certain classes of random matrices.
We then use these tools to study metric properties of certain maps between linear spaces that are near-isometries, such as random projections. We then move to the setting of general metric spaces, introduce multiscale approximation of metric spaces à la Bourgain, discuss tree approximations, and hint at the algorithmic applications of these ideas. We then turn to the realm of function approximation/estimation/machine learning for functions defined on high-dimensional spaces. We discuss Reproducing Kernel Hilbert Spaces and learning with RKHSs, as well as multiscale techniques for function approximation in high dimensions. We also discuss geometric methods, both graph-based (Laplacians, manifold learning) and multiscale-based. Finally, we discuss recent fast randomized algorithms for certain numerical linear algebra computations, which use the nonasymptotic RMT results discussed above.
Requirements: solid linear algebra and basic probability. Of help, but to be introduced in the course: metric spaces, function spaces, matrix factorizations.
A course wiki contains links to lecture notes, papers, and other materials. It may be edited by students in the class.
Math 139 – Real Analysis [Fall]
Office hours: Mon, 1:30pm3:30pm, or by appointment
Textbook: Fundamental Ideas of Analysis, by Michael Reed. The course will cover most, but not all, of the material in Chapters 16.
There will be a midterm exam, a final exam and weekly homework.
Evaluation: There will also be at least one lengthy assignment which challenges you to write carefully constructed proofs.
Your final letter grade will be based on these components weighted as follows: long assignment(s) 1015%, regular homework 2025%, midterm exam 25%, final exam 40%.
Homework is due at the beginning of class, stapled, written legibly, on one side of each page only, and must contain the reaffirmation of the Duke community standard. Otherwise, it will be returned ungraded. The logic of a proof must be completely clear and all steps justified.
The clarity and completeness of your arguments will count as much as their correctness.
Some problems from the homework will reappear on exams. I will go over in detail the solution to any homework problem during office hours.
You may use a computational aid for the homework but I do not recommend it. Calculators and computers will not be allowed on the quizzes and exams.
The lowest homework score will be dropped. No late homework will be accepted. Duke policies apply, with no exceptions, to cases of incapacitating short-term illness or officially recognized religious holidays.
You may, and are encouraged to, discuss issues raised by the class or the homework problems with your fellow students and both offer and receive advice. However all submitted homework must be written up individually without consulting anyone else’s written solution.
Math 338 – Topics in Graph Theory, Random Matrices, and Applications [Spring]
A flier for the course, with summary of some of the topics.
Math 348 – Applied Harmonic and Multiscale Analysis [Fall]
Office hours: by appointment.
Students have access to a wiki and blog with materials for the course and to discuss topics and problems
The overarching theme is applied multiscale harmonic analysis, in various of its forms. Very rough outline (probably overambitious):
 Basic Fourier analysis; Littlewood–Paley theory (a.k.a. how to do multiscale analysis with Fourier); square functions and the Khintchine inequality; applications to signal processing (audio/video).
 Classical multiresolution analysis through wavelets; applications to Calderón–Zygmund integral operators and associated numerical algorithms; applications to signal processing (audio/video).
 Multiscale analysis of random walks on graphs; applications to analysis of high-dimensional data sets, regression and inference.
 A sketch of some techniques in multiscale geometric measure theory, in particular the geometric version of square functions, concentration phenomena for measures in high-dimensional spaces, and results in random matrix theory. Applications to analysis of high-dimensional data sets and inference, as well as numerical linear algebra.
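As a tiny concrete instance of the multiresolution-analysis theme, here is a one-level orthonormal Haar transform sketched in Python/numpy (the helper name is mine; the course itself treats this in far more generality):

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform: split a signal of even
    length into coarse (averages) and fine (differences) coefficients."""
    x = np.asarray(x, dtype=float)
    s = (x[0::2] + x[1::2]) / np.sqrt(2)   # scaling (coarse) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # wavelet (detail) coefficients
    return s, d

x = np.array([4.0, 4.0, 2.0, 6.0])
s, d = haar_step(x)
# the transform is orthonormal, so the l2 norm of (s, d) equals that of x;
# constant stretches of the signal produce zero detail coefficients
```

Recursing `haar_step` on the coarse coefficients `s` yields exactly the multiscale (multiresolution) decomposition sketched in the outline above.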
Math 225 – Scientific Computing II [Spring]
Office hours: by appointment.
Here is the synopsis.

 Homework 1. Due Jan. 22.
 Homework 2. Due Jan. 29.
 Homework 3. Due Feb. 5th.
 Homework 4, and associated data sets. Due Feb. 17th.
 Homework 5. Due Feb. 24th.
 Homework 6. Due Mar. 19th.
 Homework 7. Due Mar. 26th.
Here are the examples that I ran in class.
 No homework for Apr. 2nd.
 Homework 8. Due Apr. 7th.
 Homework 9. Due Apr. 21st.
Cleve Moler book (selected chapters): Numerical computing with MATLAB and Experiments with MATLAB.
Math 133 – Introduction to Partial Differential Equations [Spring]
Office hours: Tuesday 34:30pm, or by appointment.
Here is the synopsis.
First test will on February 17th (usual class time). (Solution)
Reviews session will on April 23rd 4:30pm5:30pm, usual classroom. I will answer your questions.
Material will be added as the course proceeds.
Matlab tutorials are available from Mathworks,
UFL, cyclismo.org, Matlab resources and many others.
Math 378 – Minicourse – Introduction to Spectral Graph Theory and Applications [Spring]
We will discuss the basics of spectral graph theory, which studies random walks on graphs, and related objects such as the Laplacian and its eigenfunctions, on a weighted graph.
This can be thought of as a discrete analogue of spectral geometry, although the geometry of graphs and their discrete nature give rise to issues not generally considered in the continuous, smooth case of Riemannian manifolds.
We will present some classical connections between properties of the random walks and the geometry of the graph.
We will then discuss disparate applications: the solution of sparse linear systems by multiscale methods based on random walks; analysis of large data sets (images, web pages, etc.), in particular how to find systems of coordinates on them, performing dimensionality reduction, and performing multiscale analysis on them; tasks in learning, such as spectral clustering, classification and regression on data sets.
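The spectral clustering idea above can be shown in a few lines: on a graph made of two tight clusters joined by a weak edge, the second eigenvector of the graph Laplacian (the Fiedler vector) changes sign exactly across the bottleneck. This is a minimal Python/numpy sketch of my own, not the course's Matlab demo code:

```python
import numpy as np

# two triangles joined by one weak edge: nodes {0,1,2} and {3,4,5}
n = 6
W = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                 # weak bridge between the clusters

L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian L = D - W
vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
fiedler = vecs[:, 1]                    # eigenvector of the second-smallest eigenvalue
labels = (fiedler > 0).astype(int)      # sign pattern recovers the two clusters
```

The smaller the bridge weight, the smaller the second eigenvalue, which is the spectral expression of the cut-based connection between random walks and graph geometry mentioned above.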
Materials:
 Code for the demo involving drawing a set of points, constructing an associated proximity graph, and computing and displaying eigenvalues/eigenfunctions of the Laplacian. You need to first download and install the general code for Diffusion Geometry, and then download and install this code for running the demo I ran in class, with some images already prepared. After installing the Diffusion Geometry package, please run DGPaths in order to set the Matlab paths correctly. The script for the demo is called GraphEigenFcnsEx_01.m, and it is fairly extensively commented. I will be happy to add your own examples here! The code works best with the Approximate Nearest Neighbor Searching Library by D. Mount and S. Arya. To install this code, simply untar it in a directory and run make; this should produce the file libANN.a in the lib subdirectory. (This file is already included in the Diffusion Geometry package, in the MEX directory, compiled on a Unix machine at Duke Math.) Copy the library libANN.a into the MEX directory under the Diffusion Geometry package directory, and from the Matlab prompt run “mex ANNsearch libANN.a” and “mex ANNrsearch libANN.a”. This will yield two .mexglx files, which are what Matlab will call. (These two files are already included in the Diffusion Geometry package, after compilation on a Unix machine at Duke Math.)
References:
 R. Diestel‘s book on Graph Theory is an excellent general reference. Available online here.
 D. Spielman’s notes for his course on Spectral Graph Theory at Yale; several papers on specific applications, depending on the attendees’ interests.
 D. Spielman’s notes for his course on Graphs and Networks at Yale. Some overlap with the above, but also other references and materials.
 F. Chung‘s book “Spectral Graph Theory”. She also wrote a book on “Complex Graphs and Networks”, mostly on random graphs and their degree distribution properties, and also some spectral results for them. Visit her homepage for lots of interesting material on graphs, spectral graph theory and its applications. In particular see the gallery of graphs.
 S. Lafon‘s web page has some cool tutorials and interactive demos on diffusion geometry.
Math 224 – Scientific Computing – Fall 2007
Office hours: Wed 4:305:30, Thu 1:102:10, or by appointment.
Here is the synopsis.
The first part of the course will cover basic numerical linear algebra, in particular matrix factorizations, solution of linear systems and eigenproblems. The second part of the course will cover nonlinear equations, numerical integration and differentiation, basic techniques for ODEs, and the Fast Fourier Transform.
Useful links:
John Trangenstein’s home page contains a link to his online book on Scientific Computing, as well as several useful links to programming guides for Fortran, C, C++ and Lapack on his page for Math 225. William Allard’s home page also contains useful material, such as notes and links to online guides and materials.
Fortran tutorial: here and here.
Matlab tutorial: from Mathworks here, from the University of Florida here. Many more are available online, just use your favourite search engine to look for “matlab tutorial”.
Homework sets:
1 (solution), 2 (solution), 3 (solution), 4 (solution), 5 (solution), 6 (solution), 7, 8.
Partial solution to test 1.
Math 348 – Harmonic Analysis and Applications – Curr Res in Analysis – Spring 2007
Please find the synopsis here.
I plan to develop lecture notes as the course proceeds. Last update: 1/10/07.
The notes are still in a very preliminary state: they should be downloaded and used by students of the course only, and should not be divulged or replicated except for purposes related to the course.
When a more stable version becomes available, some of these restrictions will be removed.
This link will be updated regularly. Right now the notes are in an extremely preliminary state, and at times they may not even be accessible through the link provided.
A list of topics for presentation suggested for the course (by instructor or students), and the students currently working on them, is available here.
Presentations by students:
 Statistical Approach to Wavelet Shrinkage, Simon Lunagomez
 Semisupervised Learning on Graphs, Chungping Wang
 Markov Decision Processes, Rachel Thomas
 The Fast Multipole Method, Veronica Rozmiarek
 Multiscale Reconstruction of Hyperspectral Data, Kalyani Shivakumar and Cristina Fernandez.
 Stochastic Filtering, Zachary Harmany and William Lee.