Many thanks go to Thomas Levine for giving such a great workshop!
http://thomaslevine.com/!/data-about-open-data-talk-december-2-2013/
----------------------------------
Thomas Levine has downloaded 100,000 datasets from 100 open data portals, and this is what he learned.
http://thomaslevine.com/open-data
He talked about every aspect of how he did this, and downloading was, of course, a big part of it. Here are two related repositories, in case you would like to link to them. They lack comprehensible documentation, though.
https://github.com/tlevine/socrata-download
https://github.com/tlevine/socrata-analysis
Having played with computers since he was young, Thomas Levine eventually developed back and wrist pain, so he started studying ergonomics and conducting quantitative ergonomics research. Then he realized that he’d accidentally become a data scientist, and his back and wrists now hurt less. He also has a band called CSV Soundsystem that makes music from spreadsheets.
For the first half of the session, he talked about what he did and what he learned.
After that, he talked in more detail about how to conduct an analysis like this. The specifics depended on what interested the participants, but topics could include:
- Planning complicated data workflows/pipelines
- Storing data
- Tricks for making things run faster
He also talked a bit about brainstorming and the six thinking hats. Then people did a couple of exercises:
- Choose an open data catalog. Diagram how a person could manually download all of the datasets. Then change the labels in the diagram so that it describes a computer program that downloads the datasets.
- Select a guideline from one of these lists, and brainstorm ways of testing it.
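The relabeling step in the first exercise can be sketched in code. The snippet below is only an illustration, not the code from the workshop or from the repositories above: it assumes a catalog that lists its datasets page by page, and the names `download_all`, `list_page`, `fetch`, and `save` are hypothetical placeholders for whatever a particular portal actually requires. Each comment pairs a program step with the manual step it replaces.

```python
from typing import Callable, Iterable, List


def download_all(
    list_page: Callable[[int], List[str]],
    fetch: Callable[[str], bytes],
    save: Callable[[str, bytes], None],
    already_saved: Iterable[str] = (),
) -> List[str]:
    """Walk a catalog page by page and save every dataset not yet saved.

    The three callables are hypothetical stand-ins for portal-specific code:
    list_page(n) returns the dataset ids on page n (an empty list when done),
    fetch(dataset_id) returns the raw bytes, save(dataset_id, data) stores them.
    """
    done = set(already_saved)
    downloaded = []
    page = 1
    while True:
        ids = list_page(page)        # manual step: open the next catalog page
        if not ids:                  # manual step: stop when no pages are left
            break
        for dataset_id in ids:
            if dataset_id in done:   # manual step: skip files you already have
                continue
            save(dataset_id, fetch(dataset_id))  # manual step: download and save
            done.add(dataset_id)
            downloaded.append(dataset_id)
        page += 1
    return downloaded


# Usage with an in-memory fake catalog, so nothing touches the network.
FAKE_PAGES = {1: ["a", "b"], 2: ["c"]}
store = {}
result = download_all(
    list_page=lambda n: FAKE_PAGES.get(n, []),
    fetch=lambda dataset_id: f"contents of {dataset_id}".encode(),
    save=store.__setitem__,
    already_saved=["b"],
)
```

Passing the listing, fetching, and saving steps in as functions keeps the loop identical across portals; only the three small functions would need to change for a real catalog.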
----------------------------------
You could try one of the exercises before reading more details about this workshop:
http://thomaslevine.com/%21/data-about-open-data-talk-december-2-2013/#exercises