NYC Data Science Corporate blog

What we learned from our first NYC R programming class

Written by Vivian Zhang | Jan 18, 2014 1:43:24 AM

(The photo is from the first offering of our R class.)

We are going to offer our Data Science by R (beginner level) course again in February. The goal of this class is to get students to a point where they are self-sufficient in R, are proficient at analyzing data, and can take these skills back to their full-time jobs. You can sign up at https://www.meetup.com/NYC-Data-Science-Academy/events/148820532/ or www.nycdatascience.com.

We had a great first round of this course and are going to keep most of it the same this time around.

Our students' feedback made it clear that their favorite parts of the course were the class exercises and the short practice problems we provided. Small problems reinforce learning quickly and are an effective way to introduce new ideas. We demonstrated how to use R to solve real-life problems, such as tracking the availability of Citi Bikes and docks at each New York City station (via a Postgres database API and a real-time XML file), evaluating the performance of the Knicks and other teams, generating local weather reports based on a user's IP address, and scraping web pages using XPath or HTML table structures. Our students also learned how to build compelling visualizations with Shiny and rCharts. We also used Project Euler exercises.

During the 20 hours of our first R course, we covered one topic per class. On Days 1 and 2, we covered R programming basics: data objects (arrays, matrices, data frames, and lists), functions, loops, if-else statements, and vectorized operations. On Day 3, we covered how to extract data from web pages, APIs, and database portals, and how to read Excel files. On Day 4, we focused on data manipulation: basic transformations (sorting and merging, summarizing, subsetting, and string manipulation), reshaping, splitting and combining, and aggregation. On Day 5, we covered visualization with lattice and ggplot2, including maps, scatter plots, and matrix plots, as well as how to polish graphics for publication.

In our upcoming 35-hour course, we plan to make a few adjustments. First, we will add more in-class and homework exercises. As always, our goal is to solidify students' understanding and retention of all key concepts and skills. Second, we will introduce more statistical analysis, including basic statistical testing, regression, and principal component analysis. If we get through the material early, we will cover decision trees, k-means clustering, and other mainstream machine learning and data mining techniques.

After each course, we collect feedback from every student so that we can make improvements where needed. We encourage students to participate on our class dashboard, Piazza, where they can post questions, help answer other students' questions, and share useful resources related to R programming. We use join.me to share our screen with students and to let students share their solutions and interact with one another. Students are always encouraged to ask questions, both general and specific.

The dream is to inspire our students to never stop learning.

If you are interested in learning more about data mining, we strongly recommend taking our Data Science by R (intermediate level) course, which runs from March 8 to April 15: https://www.meetup.com/NYC-Data-Science-Academy/events/152015792/

Vivian encourages all students to join the NYC Open Data Meetup group for supplementary study. This group offers free workshops every Monday and Thursday. All workshop materials (including videos, slides, source code, and attendee lists) are posted on its website, https://www.nycopendata.com. Topics include programming with R, Python, Tableau, Processing, Node.js, D3.js, Angular, GitHub, location data queries, iOS programming, Google Fusion, and Gephi. There are also dedicated talks covering social media analytics, graph theory, social networks, big data visualization, census data processing, data science for social good, health care open data projects, an open data author panel, a young coder panel, the policy in practice talk series, the Kaggler talk series, a Citi Bike hack session, an introduction to the NYC Open Data portal, data networks, interactive and reproducible reporting, and more.

Frequently asked questions:

--What is the philosophy of this class?

We hope to strike a balance between breadth and depth. We will cover a range of topics, but in each one we will focus on intuition: understanding why R behaves the way it does is key to writing robust R code that can be trusted.

--Is this class designed for beginners?

Yes, we hope this class will help beginners get started with R. In addition, we strive to make the in-class and homework exercises challenging enough for those who have some programming background or some R experience. Every student is expected to read one or two introductory R books before the class starts; this is required pre-work.

--If I already know a little R, will this class be helpful?

Yes. If you know a little R, Day 1 will go smoothly for you. However, you will still have to work hard to learn most of the content.

--If I know another programming language, will I pick up R faster?

Yes, and we will give you additional, more challenging in-class exercises and homework.

--What do you expect from a student? And how can I get the most out of this class?

This course is meant to be fast-paced. Students are expected to review the slides and work on exercises between sessions. Students should feel comfortable working in groups and participating in class. We hope you will sign up for the class at least two weeks before it starts and do some pre-work, including reading and working through online R resources.

--How fast is your class? How much work do I need to do?

To cover both breadth and depth, this class moves at a fast pace. We cover around 70 slides each session and ask students to work on exercises between sessions, sometimes including material we weren't able to get to during class. We hand out the slides a week before each class, and you are expected to read them and try the code before class.

--A 4- to 7-hour session is pretty long. How will you help me stay focused and productive?

To keep students engaged, the class alternates between slide presentations and exercises that apply what was just covered. The in-class exercises start with simple modifications of the examples in the slides and build up to more creative tasks.

Students are encouraged to work in groups, which gives them practice working in a team environment. The slides and exercises give them practice with both built-in R datasets and other commonly used datasets, such as the World Bank's World Development Indicators.