poniedziałek, 28 lipca 2014

Introduction to Statistical Learning with Azure Machine Learning

Recently Microsoft announced release a preview version of a Azure Machine Learning service. The announcement appeared around the same time at The Official Microsoft Blog and Machine Learning Blog. Personally, I believe this is an important step forward because it fills the gap between data scientists capable of creating elaborate models and people who want to use these models in production environment.

The service itself is described as:

The problem? Machine learning traditionally requires complex software, high-end computers, and seasoned data scientists who understand it all. For many startups and even large enterprises, it's simply too hard and expensive. Enter Azure Machine Learning, a fully-managed cloud service for predictive analytics. By leveraging the cloud, Azure Machine Learning makes machine learning more accessible to a much broader audience. Predicting future outcomes is now attainable.

Getting started

The service is there, everyone can access it by either using existing Azure subscription or creating free trial account. There are some training materials but they all focus on how to use the system. They make assumption that the user is already familiar with the algorithms and models available in the platform. However that will not be always the case. The number of different models and their parameters is high. Therefore it is important to establish the link between them and the subject matter literature and show a path one can follow to master the platform. In the following posts I will try to do just that.

Introduction to Statistical Learning...

The learning path I would like to present was created by Trevor Hastie and Rob Tibshirani, two professors a Stanford University, who have been teaching statistical learning for many years and recently created an online course at Stanford Online. I highly recommend registering for this course! Not only is it free but additionally students get access to a pdf version of the textbook used in the course - An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013).

This book is a great starting point for learning about machine learning. At the end of each chapter there is a lab section which shows how a newly introduced model can be exercised in R. This hands-on experience is critical to understand how the models behave and how to select the right values for their parameters to get the best results.

with Azure Machine Learning

All the labs in the book above are in R, but the models used are generally available and most of them can also be found in Azure Machine Learning. This gave me an idea for a series of posts which will use the same data and models but a different environment. Because the examples I will present will be covered also in the book it should be easier for the reader follow and get back to specific sections to get a deeper understanding about how specific models work.

References


This post and all the resources are available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master

Brak komentarzy:

Prześlij komentarz