Stop Thinking, Just Do!

Sung-Soo Kim's Blog

Top 50 Data Science Resources


19 August 2015

Article Source

The Best Blogs, Forums, Videos and Tutorials to Learn All about Data Science

The field of data science is constantly evolving, with new technologies placing more valuable insights in the hands of modern enterprises. More data-driven organizations are hiring data scientists to gather, analyze, and make use of Big Data in valuable ways.

Because the field of data science is so broad and sometimes challenging to navigate, we’ve compiled a list of 50 of the most helpful data science resources on the web. Whether you’re a student or new professional working in the field of data science, these resources are valuable for discovering the latest employment opportunities, finding tutorials for the processes and systems you’re using on a daily basis, learning hacks and tricks to boost your performance, and connecting with other professionals in your field.

Note: The following 50 resources are not ranked or rated in order of importance or value; rather, they are categorized to make it easy for you to locate the resources you need most. Click through to a specific category using the links in the Table of Contents below.

Table of Contents:

Best Data Science Blogs

1. Edwin Chen’s Blog @echen


Edwin Chen is a San Francisco Bay-area data scientist who has worked for companies like Dropbox, Microsoft, and Clarium Capital Management and studied at MIT. Chen blogs on topics of interest to data scientists, including tutorials on crowdsourcing, modeling, moving beyond CTR with human evaluation, and more.

Three posts we like from Edwin Chen: 

2. Machine Learning (Theory)


John Langford, Director of Learning at Microsoft Research, manages this collaborative machine learning blog. Langford shares his knowledge and personal insights on learning theory, covers conferences and related events, and discusses everything from neuroscience to prediction theory, problems, reduction, and of course, machine learning.

Three posts we like from Machine Learning (Theory): 

3. FastML @fastml


FastML is “meant to tackle interesting topics in machine learning while being entertaining and easy to read and understand.” Run by Zygmunt Zając, FastML was born from a frustration with papers and documentation that aren’t easily understood by the average user, who lacks both the time and the interest to become a PhD-level expert in every machine learning topic. In other words, FastML breaks down technical material in an easy-to-understand manner.

Three posts we like from FastML: 

4. Statistical Modeling, Causal Inference, and Social Science @StatModeling


Statistical Modeling, Causal Inference, and Social Science is a blog run by Andrew Gelman, a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. Topics include causal inference, decision theory, multilevel modeling, statistical computing, and statistical graphs, as well as other topics of interest to Gelman such as public health, sociology, and political science.

Three posts we like from Statistical Modeling, Causal Inference, and Social Science: 

5. Walking Randomly @walkingrandomly


Mike Croucher, Head of the EPS IS Applications team at the University of Manchester, shares his knowledge of a variety of programming languages and tools, spanning everything from linear algebra to MATLAB and Python.

Three posts we like from Walking Randomly: 

6. No Free Hunch 


Kaggle is turning data science into a sport with its platform for predictive modeling competitions. Kaggle’s competition and data science blog, No Free Hunch, covers all things related to the sport of data science.

Three posts we like from No Free Hunch: 

7. Data Mining Research @DataMiningBlog


Data Mining Research, originally started in 2006, covers research and applications in data mining. Sandro Saitta first started the blog as a PhD student at EPFL (Ecole Polytechnique Fédérale de Lausanne), Switzerland, at which time he discussed data mining research issues.

Three posts we like from Data Mining Research: 

8. SmartData Collective @SmartDataCo


SmartData Collective is an online community moderated by Social Media Today that provides information on the latest trends in business intelligence and data management. SmartData Collective serves as a platform for recognized, global experts to share their expertise and insights.

Three posts we like from SmartData Collective: 

9. What’s the Big Data? @GilPress


Gil Press has been involved in researching the growth of Big Data for years, and What’s the Big Data? is his blog focused on that very subject. Press spent more than two decades managing research, marketing, and communications projects and programs at NORC, DEC, and EMC. He now runs his own consulting practice and continues to blog at What’s the Big Data?, sharing his knowledge of the world of Big Data with readers.

Three posts we like from What’s the Big Data?: 

10. Mining the Social Web @SocialWebMining


Mining the Social Web is “transforming curiosity into insight.” The blog is a companion to a book by the same name, with the goal of taking social web mining mainstream. Find tutorials, analyses, hacks, excerpts from the book, and more.

Three posts we like from Mining the Social Web: 

11. Hilary Mason @hmason


A former data scientist, Hilary Mason is the Founder of Fast Forward Labs and Data Scientist in Residence at Accel. A self-proclaimed “data scientist and hacker,” Mason blogs about all things data, her experiences speaking and presenting on the subject, and more.

Three posts we like from Hilary Mason: 

12. Steve Miller’s Blog @infomgmt


Steve Miller blogs at Information Management, covering data science, predictive analytics, statistical learning, and the impacts of data science on economics and public policy.

Three posts we like from Steve Miller’s Blog: 

13. Becoming a Data Scientist @BecomingDataSci


Renee documents her path from “SQL Data Analyst pursuing an Engineering Master’s Degree” to “Data Scientist” at Becoming a Data Scientist, providing a valuable resource to those interested in pursuing data science as a career.

Three posts we like from Becoming a Data Scientist:

14. datascience@berkeley @BerkeleyData


The U.C. Berkeley School of Information runs the datascience@berkeley blog, featuring interviews, data science startups, event coverage, and other insights on data science and information technology.

Three posts we like from datascience@berkeley: 

15. John Foreman, Data Scientist @John4man


John Foreman is a data scientist at MailChimp, blogging his thoughts on data science as a profession and the state of the analytics industry as a whole.

Three posts we like from John Foreman, Data Scientist: 

16. FlowingData @flowingdata


FlowingData explores the ways in which data scientists, designers, and statisticians use analysis, visualization, and exploration to understand data and ourselves. Nathan Yau, Ph.D., authors the FlowingData blog, presenting data on concepts that help readers understand the world around them, such as trends in transportation, relationships, and more.

Three posts we like from FlowingData: 

17. Simply Statistics @simplystats


Three biostatistics professors (Jeff Leek, Roger Peng, and Rafa Irizarry) who are fired up about the new era where data are abundant and statisticians are scientists blog at Simply Statistics, where they post ideas on interesting subject matter, contribute to discussion of science and popular writing, share informative articles, and offer advice to up-and-coming statisticians.

Three posts we like from Simply Statistics: 

18. Introduction to Data Science, Columbia University @DSI_Columbia


The Data Science Institute at Columbia University maintains an ongoing blog to reflect on and add to discussions on key topics from the Introduction to Data Science course. Featuring weekly lectures, course topics and themes, thought experiments, and more, the blog is an informative read for both students and professionals alike.

Three posts we like from Introduction to Data Science, Columbia University: 

19. Insight Data Science @InsightDataSci


Insight Data Science is an intensive, six-week post-doctoral fellowship program bridging the gap between academia and data science. The Insight Data Science blog updates readers on the latest happenings with the program, in addition to offering informative data analyses, industry news, and tips for professionals in the data science field.

Three posts we like from Insight Data Science: 

20. Data Science Report @TedOBrien93


The Data Science Report is the official blog and website of Starbridge Partners, an executive search and career advisory firm focused on the data science and Big Data analytics space. The Data Science Report has it all, from case studies and papers to online courses, lectures, and webinars, as well as ongoing news and discussion of happenings in the world of data science.

Three posts we like from Data Science Report: 


Data Science Communities

21. Galvanize @galvanize


Galvanize is a network of life-long learners and educators, a home for developers, data scientists, entrepreneurs, and anyone with an interest in the space. Through Galvanize, you can connect with your peers, get advice from expert mentors, attend workshops, or enroll in a gSchool program.

Three resources we like from Galvanize: 

22. Data Science Central @DataScienceCtrl


Data Science Central is one of the premier online communities for those immersed in the data science culture. Read blog posts from other members, participate in forum discussions, and stay abreast of the latest research on Data Science Central.

Three resources we like from Data Science Central:

23. KDnuggets @kdnuggets


From courses, education, and meetings to news, features, interviews, publications, and webcasts, KDnuggets is a comprehensive resource for anyone with a vested interest in the data science community, whether a student in pursuit of professional goals or a working professional whose role is impacted by data science.

Three resources we like from KDnuggets: 

24. Quora – Data Science @Quora


Quora is a popular question-and-answer site where anyone can ask questions, engage in discussion, and provide expertise on virtually any topic. The Quora Data Science community focuses on “the scientific approach to knowledge extraction from data.”

Three resources we like from Quora – Data Science: 

25. Cross Validated


A question-and-answer site for statistical topics, machine learning, data analysis, data mining, and data visualization, Cross Validated is a free resource for data scientists and those interested in the field.

Three resources we like from Cross Validated: 

26. Data Science Association


A non-profit professional group offering education, professional certification, conferences and meetups, and even a “Data Science Code of Professional Conduct,” the Data Science Association is a valuable community for data science professionals.

Three resources we like from Data Science Association: 


Data Science Educational Resources

27. The Open Source Data Science Masters @clarecorthell


The Open Source Data Science Masters is an open-source curriculum for learning data science. “Foundational in both theory and technologies, the OSDSM breaks down the core competencies necessary to make data useful.”

Three resources we like from The Open Source Data Science Masters: 

28. Learn Data Science @nitin


An open content portal for self-directed learning in data science, Learn Data Science was developed primarily by Nitin Borwankar, a seasoned database professional with more than two decades of experience.

Three resources we like from Learn Data Science: 

29. DataCamp @DataCamp_com


DataCamp is a resource for learning data analysis and R interactively. With DataCamp, you can learn by doing, at your own pace, choosing from a variety of courses.

Three courses we like from DataCamp: 

30. Data Science Academy @ds_ldn


Data Science Academy is a useful portal for finding free educational resources on data science and related topics. From Berkeley to MIT, Columbia and Stanford, you’ll find free data science resources from major educational institutions.

Three resources we like from Data Science Academy:

31. School of Data @SchoolOfData


School of Data “works to empower civil society organizations, journalists, and citizens with the skills they need to use data effectively.” Joining School of Data gives you access to a variety of courses designed for everyone, from the data science newbie to the professional seeking inspiration.

Three resources we like from School of Data:


Data Science Conferences

32. Neural Information Processing Systems Foundation (NIPS) @NipsConference
December 7-12, 2015
Montreal, Quebec, Canada

The Neural Information Processing Systems (NIPS) Foundation, a non-profit corporation, hosts its annual conference to give attendees the opportunity to exchange research on neural information processing systems in the areas of biology, technology, mathematics, and theory. Featuring tutorials on December 7, conference sessions December 7-10, and workshops December 11-12, NIPS 2015 will be part of the foundation’s continuing series of professional meetings and is a highly regarded data science resource.

Cost to Attend: Contact for attendance cost

33. International Conference on Machine Learning (ICML)
July 6-11, 2015
Lille, France

The 32nd International Conference on Machine Learning (ICML), supported by the International Machine Learning Society (IMLS), includes two days of tutorials, two main conference days, and two days of workshops. Invited speakers include Léon Bottou of Microsoft Research New York, Jon Kleinberg of Cornell University, and Susan Murphy of the University of Michigan. A popular and informative event, ICML is a great data science resource.

Cost to Attend: Contact for attendance cost

34. Data Science Meetup
Dates TBD
Locations TBD

The Data Science Forum holds meetings that focus on hot data science topics, machine learning, and business analytics. Free and open to all, the interactive summits join professionals and academics together and provide a place for networking and the sharing of ideas and information.

35. Data & Analytics in Life Sciences Forum
March 9-11, 2015
Boston, Massachusetts

Billed as “the industry’s only forum dedicated to discussing successful integration of your organization’s largest pools of data to maximize research and development efforts and minimize cost,” Data & Analytics in Life Sciences Forum showcases clinical trials data, real world data, communicable diseases data, and genomic/personalized medicine data. This data science resource will help attendees utilize their organizations’ data to enhance their brands.

Cost to Attend: Discounts are available for teams

  • Primary All Access – Main conference plus all 3 workshops: $2,499
  • Primary Main Conference: $1,799
  • Vendor All Access – Main conference plus all 3 workshops: $2,999
  • Vendor Main Conference: $2,299
  • Workshop A: $549
  • Workshop B: $549
  • Workshop C: $549


Data Science Webinars

36. R and Data Science Webinar

Joseph Rickert presents this webinar on R and data science, which suits both beginner and experienced R users. The webinar features code examples and explains the reasons for R's popularity and effectiveness, making it a useful data science resource.

Three key points we like from R-bloggers: R and Data Science Webinar:

  • R makes numerous machine learning and statistical algorithms available
  • R features visualization capabilities
  • R is a rich programming language with several tools for data manipulation

Cost: FREE

37. Statistical Computing – R and Tableau: Data Science at the Speed of Thought

This webinar features Bora Beran, program manager at Tableau Software, and explores how to use Tableau with R to speed up data science projects and reach better, data-driven business decisions. An on-demand webinar, Statistical Computing – R and Tableau: Data Science at the Speed of Thought is 62 minutes in length.

Three key topics we like from Statistical Computing – R and Tableau: Data Science at the Speed of Thought:

  • Connecting R scripts to a wide variety of data files and databases
  • Building interactive slideshows and presentations of your data in just minutes
  • Using dashboards as a front end for R code and allowing viewers to intuitively interact with R models

Cost: Free, with registration form completion

38. Top Data Science Trends for 2015

This webcast features Annika Jimenez, Kaushik Das, and Hulya Farinas, leaders of the Pivotal team, and their insights on the data science industry trends that will be key for 2015. These industry leaders and their insights create a data science resource that is not to be missed.

Three key topics we like from Top Data Science Trends for 2015:

  • New use cases at the vertical level
  • Analytical tool usage trends
  • Implications of the shift in focus to model operationalization

Cost: FREE

39. How Data Science is Changing the Way Companies Do Business

Presented by Colin White, President of BI Research, this webinar explores the idea that data scientists are invaluable to organizations because they are analysts who play the role of data engineer, statistician, and business analyst. In this regard, data scientists are the link between simply doing advanced data analysis and actually using the findings to produce business results aligned with the organization’s goals. This webinar serves as a terrific data science resource.

Three key topics we like from How Data Science is Changing the Way Companies Do Business:

  • The role data science plays in business analytics
  • Data science techniques, technologies, and tools
  • Gaining a competitive advantage through data science

Cost: Contact for registration and viewing fees

40. The Data Science Behind Service Revenue


From MaintenanceNet, a leader in data-driven service revenue generation, this webinar is a data science resource geared toward service revenue. The webinar explores strategies for establishing sound data intelligence practices while growing your services sales business.

Three key topics we like from The Data Science Behind Service Revenue:

  • Transforming installed base data into actionable business intelligence
  • Managing and exchanging data with your channel efficiently
  • Using advanced analytics to close more renewal, cross-sell, and up-sell opportunities

Cost: Contact for registration information

41. Using Data Science Techniques for Automatic Clustering of IT Infrastructure Alerts


Part of the Data Science Central Webinar Series, Using Data Science Techniques for Automatic Clustering of IT Infrastructure Alerts is hosted by Tim Matteson of DSC. The webinar showcases a framework developed to automatically cluster alerts that report on the health of various critical components, making it an informative and useful data science resource.

Three key points we like from Using Data Science Techniques for Automatic Clustering of IT Infrastructure Alerts:

  • The thousands of alerts that are generated per day put a strain on IT departments
  • The cluster framework should be used in conjunction with Pivotal Greenplum Database and draws on approaches from graph theory and hierarchical clustering
  • The framework makes it possible for IT departments to understand and process data more efficiently

Cost: FREE


Data Science Videos

42. The Data Science Revolution

This video, a little over an hour long, features an expert panel exploring the ongoing data science revolution. The future of data science and the ethics of data analytics and enhanced predictive powers are just two of the captivating issues raised in the video, making it an intriguing data science resource.

Three key points we like from The Data Science Revolution:

  • Today’s analytic capabilities are advanced and powerful enough to transform approaches to global challenges and human activities
  • Data scientists’ predictive powers have grown immensely
  • There are important pros and cons associated with data science and its data analytics and predictive powers

Cost: FREE

43. Introduction to Data Science with R Video Workshop

A video from RStudio, which provides “open source and enterprise-ready professional software for the R community,” Introduction to Data Science with R Video Workshop features RStudio Master Instructor Garrett Grolemund. A short video introduction for the larger Introduction to Data Science with R Video Course, this data science resource gives a thorough overview of the course.

Three key topics we like from Introduction to Data Science with R Video Workshop:

  • Data science incorporates three skill sets: computer programming (with R), manipulating data sets, and modeling data with statistical methods
  • R makes it possible to load, save, and transform data, plus generate graphs and fit statistical models to the data
  • R is an alternative to Excel, SAS, and other software

Cost: FREE

  • Contact for a quote for the entire Introduction to Data Science with R Video Course

44. Hadoop for Data Science @mortardata

This data science resource is a detailed talk delivered by Donald Miner to the NYC Data Science Meetup about using Hadoop for data science. The best feature of this video is that Mortar Data has included a time-stamped summary so viewers can skip to specific sections.

Three key topics we like from Hadoop for Data Science:

  • 4 reasons to use Hadoop for data science
  • Evaluating data cleanliness
  • R and Hadoop

Cost: FREE

45. Turning Big Data Into Big Analytics: Data Science

Most individuals in the data science field are familiar with Booz Allen Hamilton, the group that provides management and tech consulting to the government, major corporations and institutions, and non-profit organizations. Their nearly four-and-a-half-minute video focuses on the opportunity their clients have when dealing correctly with their data and serves as a case study for data science professionals.

Three key points we like from Turning Big Data Into Big Analytics: Data Science:

  • Big data yields big insights
  • Data is the most valuable natural resource for companies and organizations
  • Corporations face challenges and opportunities with analyzing their big data and transforming it into analysis and actionable insights

Cost: FREE

46. The Importance of Data Science in Sales @MaintenanceNet


This video showcases an excerpt of a recorded interview with Jason Huling, data scientist at MaintenanceNet. With a central focus on using data science in sales, the video makes the case for having the necessary data, tools, and resources to use data science productively and efficiently.

Three key points we like from The Importance of Data Science in Sales:

  • The overflow of data makes it necessary for organizations to use data science to get value, insights, and opportunity from their existing data
  • With data science, companies get clearer insight into customer behaviors, product usage, and buying trends
  • The great benefit of machine learning is the ability to account for a nearly limitless number of scenarios

Cost: FREE

47. Berkeley Institute for Data Science @UCBIDS

The Berkeley Institute for Data Science is a comprehensive data science resource, because it provides research, various resources, and more than 10 videos relating to data science. Anyone looking for more information about data science is sure to find the Berkeley Institute for Data Science to be a great help, but we think their videos are some of the best choices for data science resources.

Three data science videos we like from the Berkeley Institute for Data Science:


Miscellaneous Data Science Resources

48. Data Science: An Introduction

Made available by Wikibooks, Data Science: An Introduction is a wikibook that offers a basic introduction to data science. Geared toward advanced high school students or college freshmen with a high-school-level understanding of math, science, word processing, and spreadsheets, Data Science: An Introduction does not require a computer science background, making it an extremely accessible data science resource.

Three key points we like from Data Science: An Introduction:

  • The most successful data scientists adopt holistic attitudes toward data science
  • Data science requires proficiency in parallel processing, map-reduce computing, machine learning, advanced statistics, and complexity science, among other advanced skills and knowledge sets
  • Data science is a collaborative effort and is best performed in team situations

Cost: FREE

49. Kaggle – Competitions @kaggle


Kaggle is a platform for predictive modeling competitions and consulting. Consult Kaggle’s Wiki for answers to all your frequently asked questions about data science and Kaggle’s competitions, look for professional opportunities on the job board, and participate in discussions with other users in the forum.

Three resources we like from Kaggle – Competitions: 

50. Data Science Weekly


A free weekly newsletter that features curated news, articles, and data science job openings, Data Science Weekly is a must-receive news source for data scientists and related professionals delivered to your inbox every Thursday.

Three resources we like from Data Science Weekly: 

