Hadoop Workshop I: Configure Your First Hadoop Cluster on Amazon EC2

Posted by Vivian Zhang

Updated: Apr 8, 2014

I was so happy to get many upvotes!


Many thanks go to Conductor Inc (Conductor makes the most widely used SEO platform - empowering enterprise marketers to take control of their search performance.)

Special thanks go to Caitlin Wilterdink, Jon Torodash, and Chris Lee (now Googler) for hosting us and giving us the wonderful space and assistance!


NYC Data Science Academy is offering two related courses:
RSVP Hadoop Beginner level classes
RSVP Hadoop Intermediate level classes

The Intermediate level week 1 slides:


More info about this event is on Meetup. We followed the Tutorial repo during this workshop.

Here is a link with info that will help Windows users connect to EC2 instances using PuTTY for SSH.
(Thanks to Mandy for the Windows PuTTY link.)

You can also watch the videos to learn more.


Meetup announcement:
Speaker: Vivian Zhang, CTO and co-founder of SupStat Inc, organizer of NYC Open Data Meetup, Founder of NYC Data Science Academy. She teaches R and Hadoop.

Her data school hires the best working professionals to teach Python, D3.js and related Data Science skills. All the courses are designed to teach employable skills. We teach the skills and toolkits in class and help students complete projects of their own choice. Students will showcase their projects in this meetup group at the end of their courses.

Outline:
In Hadoop workshop I and II, I will walk you through the steps to configure a Hadoop cluster on Amazon EC2 and run two simple map-reduce jobs on the cluster.
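
For readers who have not seen a map-reduce job before, here is a minimal sketch of the idea in plain Python. It is not the workshop's code and does not use Hadoop at all; word counting is assumed as the example task, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration. It only imitates, in one process, the three stages that a cluster would spread across machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three steps in order on an iterable of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not data from the workshop.
    sample = ["hadoop runs on ec2", "hadoop runs map and reduce"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

The split into three small functions mirrors how the work is divided on a cluster: mappers and reducers each see only their own slice of the data, and the grouping in between is handled by the framework.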

Preparation:
1. Sign up for an Amazon AWS account.
2. Get familiar with basic vi commands (if you don't know them, I can show you quickly. You are welcome to read more before coming.)
3. You don't need to know Java at this moment. If you know Java, you can program in Hadoop quickly in later workshops.

Vivian Zhang

Vivian is the CTO and School Director of NYC Data Science Academy and CTO of SupStat. With extensive experience in the data science field, she has developed expertise in multiple programming languages and tools, including R, Python, Hadoop, and Spark. In August 2016, Forbes ranked her as one of the nine women leading the pack in data analytics. In 2013, she created the NYC Open Data Meetup group, which stands as one of the largest data science communities offering meetups, conferences, and a weekly newsletter. In her spare time, Vivian enjoys meeting people and sharing her motivational stories with our students and other professionals.
