Kaggle Talk Series: Top 0.2% Kaggler on Amazon Employee Access Challenge

Posted by Jun Zhao

Updated: Aug 13, 2013

 

Many thanks go to Knewton for providing the space for this event!

Special thanks go to Yibo Chen for giving such a great workshop!

 

Slides:

--------------------------------

Meetup Announcement:

This is a step-by-step, intensive two-hour model-building demonstration session focused on an ongoing Kaggle competition.

Speaker:

Yibo Chen is a data analyst with experience building models such as response models in CRM and credit scorecards. He has recently taken an interest in Kaggle competitions; after participating in several of them, he has learned a good deal about data mining and earned a respectable score (currently ranked 231st of 104,993 data scientists).

Outline:

An introduction to our solution to the Amazon Employee Access Challenge.

  • Feature engineering (extraction and selection)

  • Modeling techniques

    Classifiers used include Gradient Boosting Machine, Random Forest, Regularized Generalized Linear Models, and Support Vector Machine.

  • Ensembles.


We use stacking based on 5-fold cross-validation to combine the predictions of the base learners. The software we use is R (2.15.1) plus several add-on packages, including gbm, randomForest, glmnet, kernlab, and Matrix.
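The talk itself used R (gbm, randomForest, glmnet, kernlab), but the 5-fold stacking idea is the same in any toolkit. The sketch below illustrates it in Python with scikit-learn on synthetic stand-in data; the specific base learners and data are illustrative assumptions, not the speaker's exact pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the competition data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

base_learners = [
    GradientBoostingClassifier(random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
]

# Build out-of-fold predictions: each base learner is trained on 4 folds
# and predicts the held-out fold, so the meta-learner never sees a
# prediction made on a learner's own training data.
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
meta_features = np.zeros((len(y), len(base_learners)))
for train_idx, valid_idx in kf.split(X, y):
    for j, clf in enumerate(base_learners):
        clf.fit(X[train_idx], y[train_idx])
        meta_features[valid_idx, j] = clf.predict_proba(X[valid_idx])[:, 1]

# Level-1 (meta) model: combine the base predictions.
meta_model = LogisticRegression().fit(meta_features, y)
```

For test-set predictions, each base learner is refit on all of the training data, its predictions on the test set are collected into the same column layout, and `meta_model` scores that matrix.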

--------------------------------

Other Useful Links:

Reference:

The Kaggle competition "The Hewlett Foundation: Short Answer Scoring" and the winners' solutions: http://www.kaggle.com/c/asap-sas/details/winners


