I will use the online portal OSBLE+ (https://plus.osble.org) for posting lecture materials, assignments, and class-related announcements, and for handling submissions. On this page, I will maintain an overview of the schedule as the course proceeds.

Legend:

DDS = Doing Data Science (O'Neil and Schutt)

ISLR = An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie and Tibshirani)

MLPP = Machine Learning: A Probabilistic Perspective (Murphy)

MMDS = Mining of Massive Datasets (Leskovec, Rajaraman, and Ullman)

Date | Topic | Details | Comments |
---|---|---|---|

Mon, Aug 22 | Course overview | Motivation; Syllabus walk-through; Course work | - Reading: syllabus |

Wed, Aug 24 | What is Data Science? | Big Data and Data Science hype, and getting past the hype; Why now | - Reading: Slides | Chap 1 of DDS. |

Fri, Aug 26 | What is Data Science? Part II | Current landscape of perspectives; skill sets needed; data scientist in industry; data scientist in academia. | - Reading: Slides | Chap 1 of DDS. - Reading: Vasant Dhar's 2013 Communications of the ACM article. - Pre-course survey out, due Aug 31. |

Mon, Aug 29 | Statistical Inference | Processes and data; Populations and samples; Population/sample vis-a-vis Big Data | - Reading: Slides | Chap 2 of DDS. |

Wed, Aug 31 | Lecture on R | R Resources overview; Getting and installing R | - Reading: R-resources document |

Fri, Sep 2 | Lecture on R | R Basics | - Reading: Slides | R scripts - Note: There will also be Shira Broschat's tutorial 3--4:30 (also in Sloan 5). |

Mon, Sep 5 | No Class | Labor Day | |

Wed, Sep 7 | Lecture on R | R Graphics | - Reading: Slides | R scripts and datasets |

Fri, Sep 9 | Statistical modeling | Probability distributions; fitting a model | - Reading: Slides | Chap 2 of DDS. |

Mon, Sep 12 | Power-Law Distributions and Normal Distributions | Properties of power laws; Generative models for power laws; Power-law distributions vs normal distributions. | - Reading: Slides - Further reading 1: Mitzenmacher's article on generative models (Internet Mathematics, 2004). - Further reading 2: Lada Adamic's ranking tutorial on Zipf, Power-laws and Pareto. |

Wed, Sep 14 | Exploratory Data Analysis | EDA: Approach, Tools, Philosophy. Contrast with Confirmatory Data Analysis. | - Reading: Slides | Chap 2 of DDS. |

Fri, Sep 16 | The Data Science Process. | Components of the Data Science Process and how they interrelate; Roles of the Data Scientist. | - Reading: Slides | Chap 2 of DDS. - Note: Assignment 2 is out. Due 9/23 by 6pm. |

Mon, Sep 19 | Machine Learning Overview | Supervised and unsupervised learning; Examples; Real-world applications | - Reading: Slides | Chap 1 of MLPP (posted) |

Wed, Sep 21 | Linear Regression | Simple linear regression; least squares coefficient estimates; assessing the accuracy of the coefficient estimates; assessing the accuracy of the model | - Reading: Slides | Chap 3 of ISLR. |

Fri, Sep 23 | Linear Regression | Multiple linear regression; Qualitative (discrete-valued) predictors; interactions; nonlinear relationships | - Reading: Slides | Chap 3 of ISLR. |

Mon, Sep 26 | Linear Regression | Linear Regression Lab Session | - Reading: R-script posted. |

Wed, Sep 28 | k-Nearest Neighbors | General idea; KNN process; distance metrics; evaluation metrics. | - Reading: Slides | DDS (pages 71--82) |

Fri, Sep 30 | k-Means | Clustering (what, why, applications); k-means as a clustering method (how it works, properties and limitations) | - Reading: Slides. - Note: Assignment 3 went out. Due 10/09/2016 by 6pm. |

Mon, Oct 3 | Hierarchical clustering | Hierarchical clustering idea, algorithms, examples. | - Reading: Slides | Sec 10.3 of ISLR. |

Wed, Oct 5 | Principal Components Analysis | What are principal components; computation of principal components; geometric interpretation; illustration; R-lab session. | - Reading: Slides | Sec 10.2 of ISLR. |

Fri, Oct 7 | Status Review | Review of topics; feedback on assignment 2; preview of upcoming topics | |

Mon, Oct 10 | Data Wrangling I | Data cleaning, data reshaping | - Reading: Slides - Note: Assignment 3 went out. Due 10/15/2016 by 6pm. |

Wed, Oct 12 | Data Wrangling II | Data integration, data reduction | - Reading: Slides |

Fri, Oct 14 | Data Wrangling Lab | dplyr, tidyr | - Reading: posted R code. |

Mon, Oct 17 | Naive Bayes classifier | Basic idea, the algorithm, examples of application | - Reading: Slides | Chapter 4 of DDS. |

Wed, Oct 19 | Semester Project Set Up | Description; Requirements | - Reading: Project Description document. - Note: project proposal went out. Due October 28. |

Fri, Oct 21 | Project Ideas discussion | A set of 10 ideas presented; students' own proposals welcome. | - Reading: Project Ideas document. |

Mon, Oct 24 | Feature Generation | Background (data science competitions, crowdsourcing); General approaches to feature generation (brainstorming, imagination, domain expertise). | - Reading: Slides | Chap 7 of DDS |

Wed, Oct 26 | Feature Selection | Filters; Wrappers (best subset selection, stepwise forward selection, stepwise backward selection) | - Reading: Slides | Chapter 8 of ISLR |

Fri, Oct 28 | Decision Trees and Random Forests | Decision trees; entropy; bagging; random forests; Decision trees in R. | - Reading: Slides | Chapter 8 of ISLR |

Mon, Oct 31 | Recommendation Systems | Motivation; collaborative filtering; ML algorithms | - Reading: Slides | Chap 8 of DDS |

Wed, Nov 2 | Recommendation System II | Dimensionality reduction: SVD and UV decomposition. | - Reading: Slides | Chap 8 of DDS | (Optional: Chap 9 of MMDS) |

Fri, Nov 4 | Data visualization | Telling a story with data; choosing tools to visualize data; visualizing patterns over time. | - Reading: Slides |

Mon, Nov 7 | Course review | for mid-term preparation | - Reading: Study guide |

Wed, Nov 9 | Project consultation | ||

Fri, Nov 11 | No class | Veterans Day | |

Mon, Nov 14 | Mid-Term Exam | ||

Wed, Nov 16 | Data visualization II | Visualizing proportions; visualizing relationships; visualizing text information | - Reading: Slides |

Fri, Nov 18 | Social Network Analysis: Centrality | Motivating examples for various centrality metrics | - Reading: Slides |

Nov 21 -- 25 | Thanksgiving Break | | |

Mon, Nov 28 | Centrality II | Formal metrics: degree centrality; eccentricity; closeness/transmission centrality; betweenness centrality; Katz index | - Reading: Slides |

Wed, Nov 30 | Ethics and course wrap-up | Look-back at topics; next-gen data scientists; a word on ethics. | - Reading: Slides | Chap 16 of DDS. |

Fri, Dec 2 | Project Presentations | 1. Abdu Sayed Chowdhury and Mukti Sharma: Predicting Emotions, Sentiments and Demographics from Tweets 2. Md Kamruzzaman and Siyang Li: Analysis of Crop Phenotypic Behavior 3. Yang Hu, Yang Zheng and Yuan Zhi: Accurate Recovery of Missing Values in PMU Measurements | |

Mon, Dec 5 | Project Presentations | 1. Zachary Allen, Carla De Lira and Siddhant Srivastava: Analysis and Visualization of Hashtags in Twitter Data 2. Ehdieh Khaledian and Anand Raghuraman: Using the Twitter API to capture Tweets 3. Gridhar Manoharan and Aditi Deepak Thuse: Analytics of Bank Marketing Data | |

Wed, Dec 7 | Project Presentations | 1. Keegan Caruso and Adam Skoog: Article Classification 2. Aidan Lancaster and Jared Meade: Article Classification 3. Insun Lee, Kim Nguyen and Chao Zeng: Food clustering | |

Fri, Dec 9 | Project Presentations | 1. Mohammad Hossein Namaki and Keyvan Sasani: Predicting Response Times of Top-k Graph Queries 2. Mario Migliacio and Nehemia Salo: Sport Analysis 3. Dustin Crossman and Kayl Coulston: Modeling and Analysis of Customer Review Data | |

Mon, Dec 12 | Project Report | Due by 2pm. |