  1. I provide a practical introduction to logistic regression for prediction (binary classification), using the Titanic data competition from Kaggle.com as an example. I use models to predict missing values, estimate a logistic regression model on a training data set, and use the estimated model to predict survival on a test data set. The video covers just about everything you need to know to estimate, predict with, and evaluate logistic regression models in R.

    # vimeo.com/69269984 (8,141 plays, 5 comments)
  2. I demonstrate how to use a GBM in R for binary classification (predicting whether an event occurs or not). I also discuss basic model tuning and model inference with GBMs.

    # vimeo.com/71992876 (17.1K plays, 1 comment)
  3. This presentation was given to the NYC Open Statistical Computing Meetup by Hadley Wickham, Assistant Professor of Statistics at Rice University and creator of many of the most popular R packages on CRAN.

    It's often said that 80% of the effort of analysis is spent just getting the data ready to analyse, the process of data cleaning. Data cleaning is not only a vital first step, but it is often repeated multiple times over the course of an analysis as new problems come to light. Despite the amount of time it takes up, there has been little research on how to clean data well. Part of the challenge is the breadth of activities that cleaning encompasses, from outlier checking to date parsing to missing value imputation. To get a handle on the problem, this talk focusses on a small, but important, subset of data cleaning that I call data "tidying": getting the data in a format that is easy to manipulate, model, and visualise.

    In this talk you'll see some of the crazy data sets that I've struggled with over the years, and learn the basic tools for making messy data tidy. I'll also discuss tidy tools, tools that take tidy data as input and return tidy data as output. The idea of a tidy tool is useful for critiquing existing R functions, and will help to explain why some tasks that seem like they should be easy are in fact quite hard. This work ties together reshape2, plyr and ggplot2 with a consistent philosophy of data. Once you master this data format, you'll find it much easier to manipulate, model and visualise your data.

    # vimeo.com/33727555 (12.6K plays, 1 comment)
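
To make the idea of "tidying" in the third description a little more concrete, here is a minimal sketch of reshaping a wide table into a long one, with one row per observation. It is not code from the talk, which works in R with reshape2 and plyr; it uses plain Python, and the data set, the column names, and the `tidy` helper are all hypothetical.

```python
# Hypothetical "messy" table: one row per person, one column per treatment.
# The values are made up for illustration only.
messy = [
    {"name": "Ann", "treatment_a": 4, "treatment_b": 7},
    {"name": "Bob", "treatment_a": None, "treatment_b": 3},
]


def tidy(rows, id_col, var_name, value_name):
    """Reshape wide rows into long form: one row per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {id_col: row[id_col], var_name: col, value_name: value}
            )
    return long_rows


tidy_rows = tidy(messy, "name", "treatment", "result")
for r in tidy_rows:
    print(r)
```

In the long form each variable (name, treatment, result) has its own column, so a missing measurement shows up as an explicit row instead of an empty cell in a wide table.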

Data Science

