This laboratory was inspired by the book An Introduction to Statistical Learning, with Applications in R, section 3.6.2 Simple Linear Regression, page 110. Please refer to it for a detailed explanation of the models and the nomenclature used in this post.
Previously we've seen how to load the Boston data set from the MASS package.
Now we will look into how we can fit a linear regression model.
We will try to predict the median value of owner-occupied homes in $1000s (medv) based on just a single predictor: the percentage of the population with lower status (lstat).
Fitting a linear regression model in R
In R one can fit a linear regression model using the lm() function.
Its basic syntax is lm(y ~ x, data), where y is the response, x is the predictor and data is the data set.
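For a single predictor, lm() picks the intercept and slope that minimise the residual sum of squares (RSS). In the ISLR nomenclature, the minimisers have a closed form, shown below. lm() itself computes the fit through a QR decomposition, but the resulting estimates are the same.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2,
\qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```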
To fit the model to the Boston data we can call:
> lm.fit = lm(medv~lstat, data=Boston)
For basic information about the model we can type:
> lm.fit
Call:
lm(formula = medv ~ lstat, data = Boston)
Coefficients:
(Intercept)        lstat
      34.55        -0.95
It prints the function call used to create the model as well as the fitted coefficients.
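As a sanity check, the printed coefficients can be reproduced from the closed-form least squares estimates for a single predictor. This is a minimal sketch that assumes the Boston data set comes from the MASS package, as in the R section above. The names x, y, beta0 and beta1 are illustrative only. coef() is the standard accessor that returns the fitted coefficients as a named vector.

```r
library(MASS)  # provides the Boston data set

lm.fit <- lm(medv ~ lstat, data = Boston)

x <- Boston$lstat
y <- Boston$medv

# Least squares estimates for a single predictor
beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0 <- mean(y) - beta1 * mean(x)

# Compare the hand-computed estimates with those stored in the model
rbind(by.hand = c(beta0, beta1), lm = coef(lm.fit))
```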
In order to get more detailed information we can type:
> summary(lm.fit)
Call:
lm(formula = medv ~ lstat, data = Boston)
Residuals:
    Min      1Q  Median      3Q     Max
-15.168  -3.990  -1.318   2.034  24.500
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.55384    0.56263   61.41   <2e-16 ***
lstat       -0.95005    0.03873  -24.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared:  0.5441,	Adjusted R-squared:  0.5432
F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16
This gives us information about the residuals, the standard errors, t-statistics and p-values of the coefficients, as well as overall model statistics such as the residual standard error, R-squared and the F-statistic.
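The numbers in this summary can also be extracted programmatically, and the main ones are easy to recompute by hand. Each t value is the estimate divided by its standard error. The residual standard error is sqrt(RSS / (n - 2)) for a model with one predictor, and R-squared is 1 - RSS / TSS. The sketch below assumes the Boston data from MASS. The variable names (s, coefs, rse, r2 and so on) are illustrative. sigma, r.squared and confint() are standard parts of R's summary.lm / stats API.

```r
library(MASS)  # provides the Boston data set

lm.fit <- lm(medv ~ lstat, data = Boston)
s <- summary(lm.fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(s)
print(coefs)

# Each t value is the estimate divided by its standard error
t.manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Residual standard error: sqrt(RSS / (n - 2)) with one predictor
rss <- sum(residuals(lm.fit)^2)
n <- nrow(Boston)
rse <- sqrt(rss / (n - 2))

# R-squared: proportion of the total sum of squares explained
tss <- sum((Boston$medv - mean(Boston$medv))^2)
r2 <- 1 - rss / tss

c(rse = rse, sigma = s$sigma)
c(r2 = r2, r.squared = s$r.squared)

# 95% confidence intervals for the coefficients
confint(lm.fit)
```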
Fitting a linear regression model in Azure Machine Learning
To repeat the same experiment in Azure Machine Learning, we will start with the modules created last time.
In the first step we need to select the columns we want to work with.
Drag one Project Columns module (Data Transformation -> Manipulation) to the experiment canvas and connect it to the existing Execute R Script module.
In the properties pane, click Launch column selector and select the medv and lstat columns.
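For comparison, the column selection done by Project Columns corresponds to an ordinary data frame subset in R. This is a minimal sketch that assumes the Boston data from MASS. The name boston.subset is illustrative. lm() takes its variables from the formula, so in R this step is optional.

```r
library(MASS)  # provides the Boston data set

# Keep only the response (medv) and the single predictor (lstat),
# mirroring the two columns chosen in the Project Columns module
boston.subset <- Boston[, c("medv", "lstat")]

str(boston.subset)
```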
With the right data we can proceed to fitting the model.
Next, drag one Linear Regression module (Machine Learning -> Initialize Model -> Regression) to the experiment canvas.
To train the model we will also need one Train Model module (Machine Learning -> Train).
Connect all the modules.
Select Train Model and, in the properties pane, click Launch column selector to choose the response column.
This time type only medv, because that's the quantity we want to predict.
The complete experiment should look like this.
Run it to fit the model to the data.
You can visualize the output port of the Train Model module to see the result.
We can see that the coefficient values obtained from Azure Machine Learning are different from what we got in R.
Instead of 34.55 for the intercept (bias) we have 25.80, and the coefficient for lstat changed from -0.95 to -11.43.
The reason for this discrepancy is that Azure Machine Learning uses a more advanced model with a learning rate and regularization, which we will get to in future laboratories when we reach chapter 6 of ISLR, Linear Model Selection and Regularization. For now we will disable these features to reach parity between the two models we've seen so far.
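To illustrate why a learning rate and regularization can change the coefficients, here is a toy R sketch. It is **not** Azure Machine Learning's algorithm. gd_fit() is a hypothetical helper that runs plain batch gradient descent on half the mean squared error plus an L2 (ridge) penalty on the slope. It standardises lstat first so that a fixed learning rate converges quickly. The learning rate, iteration counts and lambda value are arbitrary choices for illustration. Stopping after a few iterations, or adding a penalty, moves the coefficients away from the lm() solution. With no penalty and enough iterations, the two agree.

```r
library(MASS)  # provides the Boston data set

# Hypothetical helper (not Azure's algorithm): batch gradient descent on
#   (1/2) * mean((b0 + b1 * z - y)^2) + (lambda / 2) * b1^2
# where z is x standardised to mean 0 and sd 1. The intercept is not penalised.
gd_fit <- function(x, y, learning_rate = 0.1, iterations = 1000, lambda = 0) {
  mu <- mean(x)
  s <- sd(x)
  z <- (x - mu) / s
  b0 <- 0
  b1 <- 0
  for (i in seq_len(iterations)) {
    residual <- b0 + b1 * z - y
    # Both gradients use the residuals and b1 from the current step
    grad_b0 <- mean(residual)
    grad_b1 <- mean(residual * z) + lambda * b1
    b0 <- b0 - learning_rate * grad_b0
    b1 <- b1 - learning_rate * grad_b1
  }
  # Map the coefficients back to the original (unstandardised) scale of x
  c(intercept = b0 - b1 * mu / s, slope = b1 / s)
}

ols <- coef(lm(medv ~ lstat, data = Boston))
converged <- gd_fit(Boston$lstat, Boston$medv)
stopped_early <- gd_fit(Boston$lstat, Boston$medv, iterations = 10)
penalised <- gd_fit(Boston$lstat, Boston$medv, lambda = 1)

rbind(ols, converged, stopped_early, penalised)
```

The penalty shrinks the slope towards zero. Stopping early leaves both coefficients short of their least squares values because the descent starts from zero.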
Select the Linear Regression module, go to the properties pane and select the following configuration.
Rerun the model and visualize the result.
Now we can see that the coefficient values match what we got in R at the beginning. Just as with R, the model is described by its coefficients, and we need to use other functions to get more information about its performance.
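With regularization switched off, both tools solve an ordinary least squares problem. In matrix form, its solution satisfies the normal equations (X^T X) beta = X^T y. This is a minimal sketch assuming the Boston data from MASS. model.matrix() builds X with an intercept column. solve() is used here for readability, although lm() relies on a QR decomposition, which is numerically more stable.

```r
library(MASS)  # provides the Boston data set

# Design matrix: a column of ones for the intercept and a column for lstat
X <- model.matrix(medv ~ lstat, data = Boston)
y <- Boston$medv

# Solve the normal equations (X^T X) beta = X^T y
beta <- solve(crossprod(X), crossprod(X, y))

lm.fit <- lm(medv ~ lstat, data = Boston)
cbind(normal.equations = drop(beta), lm = coef(lm.fit))
```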
In the next part we will look into evaluating the trained model.
- Housing Values in Suburbs of Boston
- Microsoft Azure Machine Learning (Trial)
- Microsoft Machine Learning Blog
- Statistical Learning course at Stanford Online
- An Introduction to Statistical Learning with Applications in R (Springer, Amazon)
- The Comprehensive R Archive Network
This post and all the resources are available on GitHub: