Posted by Vivian Zhang
Updated: Oct 13, 2015
Data science is built around three concepts: programming, statistics, and domain expertise. In preparing this prework guide, we focused on establishing a strong programming foundation that can later be enriched with statistical learning methods, so you can apply them in your own domain of expertise.
The focus of this guide is on R and Python. R is a statistical programming language developed by statisticians, and Python is a general-purpose programming language used across many disciplines to solve a wide range of problems. Our curriculum uses both languages so that the strengths of each offset the other's weaknesses, and so that you are prepared to work as a data scientist in either environment.
We believe in the power of free and open-source content. The tech and data analytics communities are both moving toward more democratic access to technology and learning. The bulk of this guide focuses on free and open-source technologies, with a short section at the end dedicated to proprietary products.
How much time each section requires depends on your experience. We have included time estimates to show how much emphasis you should place on each section. At a minimum, we want bootcamp students to have read "An Introduction to Statistical Learning" and to have some exposure to the command line and Git, along with foundational knowledge of statistics, Python, and R.
You should install R and Python on your computer.
In addition to R, you should also install RStudio, an integrated development environment (IDE) that makes programming in R faster and easier.
We will be using Python 2, since it will take several years for Python 3 to fully replace it. Python 2 is still in use, and will continue to be used across the data science industry until enough libraries have been updated and the new ecosystem has proven stable. The Anaconda distribution of Python contains most of the libraries you will need to get started.
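After installing, a quick check along the following lines can confirm that the interpreter runs and that a few libraries can be imported. This is only a minimal sketch: the helper name check_libraries and the library names numpy, pandas, and matplotlib are illustrative assumptions, not a list taken from this guide. It is written to run unchanged on Python 2.7 or Python 3.

```python
import importlib
import sys


def check_libraries(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Report the interpreter version, then the import status of each library.
print("Python version: %s" % sys.version.split()[0])

# Hypothetical list of libraries to check; adjust it to your own needs.
for name, ok in sorted(check_libraries(["numpy", "pandas", "matplotlib"]).items()):
    print("%s: %s" % (name, "OK" if ok else "missing"))
```

If a library is reported as missing, it can usually be added through your distribution's package manager.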
2 to 4 hours
(The Command Line Crash Course)[http://cli.
2 to 4 hours
Version control is a system that allows you to track changes to files and recover previous versions of them. In the data science and tech communities, Git and GitHub have become the industry-wide standard. In many ways, a GitHub account has become the equivalent of a technical resume, especially for those trying to break into the industry.
5 to 8 hours
Make sure you understand these basic ideas:
Population / Sample
Distributions
Discrete / Continuous
Distribution Functions
Null Hypothesis / Alternative Hypothesis / P-value
Mean / Variance / Skewness / Kurtosis / Percentile / Quantile
T-Test / F-Test / Chi-Square Test / ANOVA / Normality Test
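To make a few of these ideas concrete, here is a small sketch in plain Python that computes a sample mean, a sample variance, and a two-sample (Welch) t statistic by hand. The data values and function names are made up for illustration, and the sketch stops at the test statistic without computing a p-value. In practice you would likely rely on a statistics library rather than hand-written functions.

```python
import math


def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / float(len(xs))


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1); needs at least two values."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / float(len(xs) - 1)


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    standard_error = math.sqrt(
        sample_variance(a) / len(a) + sample_variance(b) / len(b)
    )
    return (mean(a) - mean(b)) / standard_error


# Two made-up samples drawn from hypothetical populations.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4]

print("mean of group_a: %s" % mean(group_a))
print("variance of group_a: %s" % sample_variance(group_a))
print("Welch t statistic: %s" % welch_t(group_a, group_b))
```

A t statistic far from zero is evidence against the null hypothesis that the two population means are equal. Turning it into a p-value requires the t distribution, which is where a statistics library becomes useful.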
If you have more time:
20 to 40 hours
Book Recommendations
20 to 40 hours
We recommend this book above all others. It is available online for free, or you can buy a copy.
As much time as you have, after finishing the prior sections.
Prediction / Inference
Parametric / Nonparametric
Supervised / Unsupervised
Linear Regression / Logistic Regression
Regression / Classification
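As a small illustration of supervised learning with linear regression, the sketch below fits a straight line to a handful of points using ordinary least squares and then uses the fitted line for prediction. The data and the function names fit_line and predict are invented for this example. It is a bare-bones sketch in plain Python, not the approach of any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the sum of cross-deviations divided by the sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted response for a new input x."""
    return slope * x + intercept


# Made-up training data: inputs xs with observed responses ys.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print("slope: %s, intercept: %s" % (slope, intercept))
print("prediction at x = 6: %s" % predict(slope, intercept, 6))
```

Here the fitted slope and intercept are the parameters of a parametric model: inspecting them is inference, while calling predict on a new input is prediction.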
Optional
This section is not required. It is meant solely to give you an idea of the proprietary platforms that exist, should you read about them or hear them mentioned in conversation.
Vivian is the CTO and School Director of NYC Data Science Academy and CTO of SupStat. Through her extensive experience in the data science field, she has developed expertise in multiple programming languages and tools, including R, Python, Hadoop, and Spark. In August 2016, Forbes ranked her as one of nine women leading the pack in data analytics. In 2013, she created the NYC Open Data Meetup group, which stands as one of the largest data science communities, offering meetups, conferences, and a weekly newsletter. In her spare time, Vivian enjoys meeting people and sharing her motivational stories with our students and other professionals.