Introduction to Data Science: CptS 483-06, Fall 2015

Schedule and Lecture Material

I will use the online portal OSBLE for posting lecture materials, assignments, and class-related announcements, and for handling submissions. On this page, I will maintain an overview of the schedule as the course proceeds.

Legend: DDS = Doing Data Science (O'Neil and Schutt), MLPP = Machine Learning: A Probabilistic Perspective (Murphy), MMDS = Mining of Massive Datasets (Leskovec, Rajaraman and Ullman), ISL = An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie and Tibshirani), NCM = Networks, Crowds and Markets (Easley and Kleinberg).

Mon, Aug 24 Course overview Motivation; Syllabus walk-through; Course work - Reading: Syllabus | Preface of DDS.
Wed, Aug 26 What is Data Science Big Data and Data Science hype and getting past the hype; Why now; Current landscape of perspectives - Reading: Chap 1 of DDS | Slides.
Fri, Aug 28 What is Data Science II Skill sets needed; Data scientist in industry; Data Scientist in Academia - Reading: Chap 1 of DDS | Slides.
- Assignment 1 out, Due Sep 3.
Mon, Aug 31 Statistical Inference Processes and data; Populations and samples; Population/sample vis-a-vis Big Data - Reading: Chap 2 of DDS | Slides.
Wed, Sep 2 Statistical Inference II Statistical Modeling; Probability Distributions; Fitting a Model - Reading: Chap 2 of DDS.
Fri, Sep 4 Power Laws and Lognormal Distributions Power Laws; Generative Models for Power Laws; Lognormal Distributions; Applications - Reading: Slides | Mitzenmacher article (posted).
Mon, Sep 7 No Class Labor Day
Wed, Sep 9 Exploratory Data Analysis EDA: Approach, Tools, Philosophy. Contrast with confirmatory data analysis. - Reading: Slides | Chap 2 of DDS.
Fri, Sep 11 R Graphics; misc topics - Reading: "R-Resources" document (posted).
- Assignment 2 out; Due Sep 16.
Mon, Sep 14 The Data Science Process The process and roles of a data scientist.
Feedback on Assignment 1.
- Reading: Slides | Chapter 2 of DDS.
Wed, Sep 16 Machine Learning Overview Supervised and unsupervised learning; Examples; Real-world Applications - Reading: Slides | Chap 1 of MLPP (posted).
Fri, Sep 18 Linear Regression Least Squares Estimation. - Reading: Chap 3 of DDS | White-board lecture notes.
Mon, Sep 21 Linear Regression II Extending beyond least squares. - Reading: Chap 3 of DDS | White-board lecture notes.
Wed, Sep 23 Linear Regression III - R Lab Worked with baseball dataset from "nutshell";
Simulated dataset exercise from DDS (page 70)
- Reading: Chap 3 of DDS.
Fri, Sep 25 k-Nearest Neighbors General idea; KNN process; distance metrics; evaluation metrics. - Reading: Slides | Chap 3 of DDS.
Mon, Sep 28 k-means Clustering (what, why, applications); k-means as a clustering method (how it works, properties and limitations) - Reading: Slides.
Wed, Sep 30 Class cancelled
Fri, Oct 2 Feedback on Assignment 2 Discussion of solution - Assignment 2 graded.
Mon, Oct 5 Naive Bayes Motivating application (spam filters); Bayes Theorem; Basics of Naive Bayes - Reading: Slides | Chapter 4 of DDS.
Wed, Oct 7 Naive Bayes II Details of Naive Bayes formulation; Laplace smoothing; More applications - Reading: Posted article | Slides | Chap 4 of DDS.
Fri, Oct 9 Data Science Seminar Attended the lecture in the Distinguished Speaker Series in Data Science given by Lorenz Biegler in lieu of the regular class.
Mon, Oct 12 Data Wrangling Data Science Ecosystem (Data Sources, Data Wrangling, and Data applications). - Reading: Slides | The Ecosystem article linked in the slides.
Wed, Oct 14 Data Wrangling II
(with a focus on R)
Data Cleaning, Data Reshaping, Data Integration.
dplyr, tidyr.
- Reading: Slides | Data-wrangling-cheatsheet (posted).
Fri, Oct 16 Extracting Meaning from Data: Feature Generation Background; Motivating Application; Feature Generation. - Reading: Slides | Chap 7 of DDS.
Mon, Oct 19 Feature Selection Filters; Wrappers (best subset selection, stepwise forward selection, stepwise backward selection). - Reading: Slides | Guyon03 Article, sec 1-3 (posted) | Chap 6 of ISL, just sec 6.1 (posted).
Wed, Oct 21 Semester project Finished discussion on wrappers.
Discussed semester project (what it is and how it is set up).
- Assignment 3 graded.
- Project Description went out. Proposal Due Oct 30. Final Report Due Dec 4.
Fri, Oct 23 Decision Trees and Random Forests Decision trees; entropy; bagging; random forests; Decision trees in R. - Reading: Slides | Chap 7 of DDS.
Mon, Oct 26 Recommendation Systems A model for recommendation systems.
Content-based recommendations.
- Reading: Chap 9 of MMDS (posted).
Wed, Oct 28 Recommendation Systems II Collaborative filtering. ML algorithms. - Reading: Slides | Chap 8 of DDS.
Fri, Oct 30 Recommendation Systems III Dimensionality Reduction. SVD, UV-decomposition, Alternating Least Squares algorithm. - Reading: Slides | Chap 9 of MMDS.
- Project Proposal Due.
Mon, Nov 2 Mining Social Networks: Graph Theory Essentials Graphs and networks; network types; degrees and paths. - Reading: Slides | (Optional): Chap 2 of the Network Science book by Barabasi (posted).
Wed, Nov 4 Mining Social Networks II: Graph Theory Continued Connectedness; Components.
Network analysis tools, resources and projects.
- Reading: Slides | Documents and websites linked to under "Resources".
Fri, Nov 6 Mining Social Networks III: Community structures in networks Strong ties and Weak Links; Girvan-Newman community detection algorithm; Modularity. - Reading: Slides | Chap 3 of NCM (posted)
Mon, Nov 9 Mining Social Networks IV: Centrality Centrality measures around distances and neighborhoods (degree centrality, eccentricity, closeness centrality); Centrality measures around shortest paths (betweenness centrality); Feedback centrality (Katz Index and eigenvector centrality). - Reading: Slides
Wed, Nov 11 No Class Veterans Day
Fri, Nov 13 Video: The Joy Of Stats 1 hr BBC Documentary from 2011 featuring Hans Rosling
Mon, Nov 16 Data Visualization Telling Stories with Data. Handling Data. Choosing Tools to Visualize Data. Visualizing Patterns Over Time. - Reading: Slides | Resources linked in the slides.
- Reference/source: Visualize This by Nathan Yau.
Wed, Nov 18 Class cancelled Due to power outage
Fri, Nov 20 Data Visualization II Visualizing Proportions. Visualizing Relationships. - Reading: Slides | Resources linked in the slides.
- Reference/source: Visualize This by Nathan Yau.
- Additional Reading: Chap 9 of DDS.
- Assignment 4 went out. Due Nov 30.
- Project Report Writing Guideline went out.
Nov 23 -- 27 Thanksgiving Break
Mon, Nov 30 Project Presentations 1. Yunshu Du and Bei Peng
2. John Choi
3. Andrew Bates
4. David Hoekman and Luke Darrow
- Assignment 4 Due.
Wed, Dec 2 Project Presentations 1. Gabriel De la Cruz Jr, Matthew Godmere and James Irwin
2. Fuat Arslan and Helen Catanese
3. Armen Abnousi and Esna Ashari Esfahani Zhila
4. Roy Davjeet and Justin Hubbard
5. Lu Hao
Fri, Dec 4 No Class (Traveling) - Project Report Due.
Mon, Dec 7 Project Presentations 1. Alicia Sellsted
2. Mohamed Abdi, Viiresh Duuvvuti and Shivam Goel
3. Shashvath Bhaskar, Rajveer Singh and Sai Vignesh Yallanagari
4. Lin Peng and Yang Liangliang
5. Dylan Liebhauser
Wed, Dec 9 Course Review
Thur, Dec 17 Final Exam 1--3pm, Sloan 163