Forest Cover Type Machine Learning Study
Posted by Thomas Kolasa
Updated: Mar 22, 2016
Contributed by Aravind Kolumum Raja and Thomas Kolasa. They are currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from January 11th to April 1st, 2016. This post is based on their fourth class project, machine learning (due in the 8th week of the program).
For our machine learning project, we classified forest cover types using data from Roosevelt National Forest in northern Colorado. The data consist of over forty categorical and continuous environmental and cartographic variables. With more than half a million observations, the data set is well suited to machine learning. Using R, we created visualizations and built logistic regressions, neural networks, tree-based methods, and ensembles.
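To give a rough idea of what ensembling can look like, here is a minimal sketch of one common scheme, plurality (majority) voting across several classifiers. It is written in Python for brevity, although the project itself was done in R, and it is not necessarily the ensembling method we used. The function name `majority_vote`, the variable names, and the placeholder labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per observation.

    predictions_per_model: list of equal-length lists, one per model.
    Ties are resolved in favor of the label that appears first among the votes.
    """
    ensembled = []
    # zip(*...) groups the votes that all models cast for the same observation
    for votes in zip(*predictions_per_model):
        ensembled.append(Counter(votes).most_common(1)[0][0])
    return ensembled


# Hypothetical predictions from three models for four observations
# (placeholder labels, not real results from the project)
logit_preds = ["type_1", "type_2", "type_3", "type_2"]
nnet_preds = ["type_1", "type_1", "type_3", "type_2"]
tree_preds = ["type_2", "type_2", "type_3", "type_4"]

combined = majority_vote([logit_preds, nnet_preds, tree_preds])
print(combined)
```

Each observation receives the label that most of the models agree on, so a mistake made by a single model can be outvoted by the others.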
Due to formatting constraints involving three-dimensional plots and the size of the HTML, we published our blog post at the following URL:
After working in econometric consulting, Thomas began learning programming in order to pursue data science: the perfect combination of his interests in computer science, statistics, and business strategy. Thomas earned his B.A. in economics from Harvard University, where he conducted research at the National Bureau of Economic Research (NBER). Although retired from World Cup fencing, he still occasionally spars when he finds free time.