Sunday, September 7, 2014

3.6.2 Simple Linear Regression - running prediction

This laboratory was inspired by the book An Introduction to Statistical Learning, with Applications in R, section 3.6.2 Simple Linear Regression at page 110. Please refer to this resource for a detailed explanation of the models and the nomenclature used in this post.

In the previous post we've seen how to train a linear regression model. This post explains how to use the model to make predictions on new data.

Running prediction in R

Once we have a trained model we can use the predict function to produce a prediction.

> predict(lm.fit, data.frame(lstat = c(5, 10, 15)))

       1        2        3 
29.80359 25.05335 20.30310 

Alternatively, for lm models we can set the interval parameter to compute prediction intervals.

> predict(
    lm.fit,
    data.frame(lstat = c(5, 10, 15)), interval = "prediction")

       fit       lwr      upr
1 29.80359 17.565675 42.04151
2 25.05335 12.827626 37.27907
3 20.30310  8.077742 32.52846

From these results we can read that, for example, the predicted value of medv for an lstat of 10 is 25.05335 and its 95% prediction interval is (12.827626, 37.27907).
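
As a sanity check, the fitted values are just the regression line evaluated at the new points. A quick sketch (in Python rather than R, using the rounded coefficients 34.55384 and -0.95005 that R reports for this model) reproduces the predictions up to rounding:

```python
# Evaluate the fitted line medv_hat = b0 + b1 * lstat at the new points.
# Coefficients are the rounded values reported by summary(lm.fit) in R.
b0, b1 = 34.55384, -0.95005

def predict_medv(lstat):
    return b0 + b1 * lstat

for x in (5, 10, 15):
    print(x, round(predict_medv(x), 5))
```

The results agree with the predict output above to about four decimal places; R works with the full-precision coefficients internally.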

Running prediction in Azure Machine Learning

The process of running prediction in Azure is slightly different because it is optimized for the web. Instead of calling a function locally we will publish the trained model as an Azure Web Service.

At first it may seem ridiculous to call a web service to run prediction on such a simple model but, on second thought, it is actually great. What we get is a fully operational, scalable service which we can use straight away. Thanks to the nicely factored API we can continue to improve the models without the need to update our clients. Finally, as we will see, the services are instrumented. Information about usage patterns is saved and available through the Azure portal.
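
To make this concrete, the scoring request is plain JSON over HTTPS. Here is a sketch, in Python, of the payload shape used by this experiment's service; the Id and feature names mirror the R sample later in this post, and the endpoint URL and API key are deliberately omitted:

```python
import json

# Build the scoring request body for the published web service.
# The Id and the feature names (lstat, medv) are specific to this
# experiment; medv is sent as a placeholder because it is the value
# the service will predict.
def build_score_request(lstat):
    return json.dumps({
        "Id": "score00001",
        "Instance": {
            "FeatureVector": {"lstat": str(lstat), "medv": "0"},
            "GlobalParameters": {},
        },
    })

print(build_score_request(10))
```

This body would be POSTed with a Content-Type of application/json and a Bearer authorization header, exactly as the R client below does.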

The process of publishing the model has been described well in the documentation.

The first step is to add a Score Model module to the experiment. It has two inputs: a trained model and a data set to score. This module has no configuration because it can infer it from the context.

Score Model

Next we need to click on its data set input and output and select "Set as Publish Input"

Set as Publish Input

and "Set as Publish Output" options accordingly.

Set as Publish Output

Once the experiment has all the inputs and outputs set, you can run it and the "Publish Web Service" command will be enabled.

Publish as Web Service

The system will create a service and redirect you to its management site.

Service management

From there you can test it directly in the browser or select the API help page to see how to access it programmatically.

Service API

At the bottom there are samples in C#, Python and R! Let's copy the code into RStudio. In order to pass server authorization we need to replace the dummy API key with a genuine key from the service site. We will also set the value of lstat to 10.

library("RCurl")
library("RJSONIO")

# Accept SSL certificates issued by public Certificate Authorities
options(RCurlOptions = list(
    cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

h = basicTextGatherer()
req = list(Id="score00001",
 Instance=list(FeatureVector=list(
    "lstat"= "10",
    "medv"= "0"
 ),GlobalParameters=fromJSON('{}')))

body = toJSON(req)
api_key = "abc123" # Replace this with the API key for the web service
authz_hdr = paste('Bearer', api_key, sep=' ')

h$reset()
curlPerform(
    url = "https://ussouthcentral.services.azureml.net/workspaces/...",
    httpheader=c(
        'Content-Type' = "application/json",
        'Authorization' = authz_hdr),
    postfields=body,
    writefunction = h$update,
    verbose = TRUE
    )

result = h$value()
print(result)

This will produce the following output. Please notice that curlPerform is called with verbose = TRUE, so there will be a lot of diagnostic information. It can be very helpful during development, but you will most likely want to suppress it when you create a client library that uses the service.

* About to connect() to ussouthcentral.services.azureml.net port 443 (#0)
*   Trying 191.238.226.212... * connected
* Connected to ussouthcentral.services.azureml.net (191.238.226.212)
    port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: C:/Users/stansw/Documents/R/win-library/3.1/RCurl/CurlSSL/
   cacert.pem CApath: none
* SSL connection using AES128-SHA
* Server certificate:
*      subject: CN=ussouthcentral.services.azureml.net
*      start date: 2014-07-01 19:23:34 GMT
*      expire date: 2016-06-30 19:23:34 GMT
*      subjectAltName: ussouthcentral.services.azureml.net matched
*      issuer: C=US; ST=Washington; L=Redmond; O=Microsoft Corporation;
         OU=Microsoft IT; CN=Microsoft IT SSL SHA2
*      SSL certificate verify ok.
> POST /workspaces/fb65c4e602654cb6a9fe4aae12daf762/services/
    8a8527dd062548e5b600e6023c0a69a0/score HTTP/1.1
Host: ussouthcentral.services.azureml.net
Accept: */*
Content-Type: application/json
Authorization: Bearer abc123
Content-Length: 116

< HTTP/1.1 200 OK
< Content-Length: 28
< Content-Type: application/json; charset=utf-8
< Server: Microsoft-HTTPAPI/2.0
< x-ms-request-id: 44bbb8b4-cf0d-4b70-8ca0-83326c5265f5
< Date: Mon, 08 Sep 2014 05:30:49 GMT
< 
* Connection #0 to host ussouthcentral.services.azureml.net left intact
OK 
 0 

[1] "[\"10\",\"0\",\"25.0533473418032\"]"

The last line is the most interesting bit. It tells us that for an lstat value of 10 the model predicts 25.0533473418032. As expected, this value is precisely what we received when we ran the model inside R.
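
The response body is a small JSON array of strings, which any client can decode. A sketch in Python of pulling the prediction out of the string returned above:

```python
import json

# The service returns a JSON array of strings: the echoed input columns
# (lstat and the medv placeholder) followed by the scored label.
result = '["10","0","25.0533473418032"]'
lstat, medv_in, prediction = json.loads(result)
print(float(prediction))
```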

Summary

  • In this laboratory we saw how to run the prediction both in R and in Azure Machine Learning Studio.
  • Both models returned the same value.
  • When working in R it was very easy to get some statistical information about the prediction such as the 95% intervals.
  • By publishing our experiment we created a fully operational Web Service hosted in Azure.

In the next part

In the next part we will expand the feature space and train a multiple linear regression model.

This post and all the resources are available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master

Saturday, August 2, 2014

3.6.2 Simple Linear Regression - fitting the model

This laboratory was inspired by the book An Introduction to Statistical Learning, with Applications in R, section 3.6.2 Simple Linear Regression at page 110. Please refer to it for a detailed explanation of the models and the nomenclature used in this post.

Previously we've seen how to load the Boston data set from the MASS library. Now we will look into how we can fit a linear regression model. We will try to predict the median value of owner-occupied homes in $1000s (medv) based on just a single predictor: the lower status of the population in percent (lstat).

Fitting linear regression model in R

In R one can fit a linear regression model using the lm() function. Its basic syntax is lm(y~x, data), where y is the response, x is the predictor and data is the data set.

In order to fit the model to Boston data we can call:

> lm.fit = lm(medv~lstat, data=Boston)

For basic information about the model we can type:

> lm.fit

Call:
lm(formula = medv ~ lstat, data = Boston)

Coefficients:
(Intercept)        lstat  
      34.55        -0.95  

It will print the function call used to create the model as well as the fitted coefficients.

In order to get more detailed information we can type:

> summary(lm.fit)

Call:
lm(formula = medv ~ lstat, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.168  -3.990  -1.318   2.034  24.500 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 34.55384    0.56263   61.41   <2e-16 ***
lstat       -0.95005    0.03873  -24.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared:  0.5441,    Adjusted R-squared:  0.5432 
F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16

This gives us information about residuals, p-values and standard errors for the coefficients, as well as statistics for the model.
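
Behind these coefficient estimates are the closed-form least-squares formulas. As an illustration, here is a Python sketch on toy data (not the Boston set):

```python
# Simple linear regression has a closed form:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
def fit_simple_lm(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]            # lies exactly on y = 1 + 2x
print(fit_simple_lm(x, y))  # -> (1.0, 2.0)
```

Applied to the medv and lstat columns, the same formulas should reproduce lm's 34.55 and -0.95.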

Fitting linear regression model in Azure Machine Learning

In order to repeat the same experiment in Azure Machine Learning we will start with the modules created last time.

In the first step we need to select the columns we want to work with. Drag one 'Project Columns' module (Data Transformation -> Manipulation) to the experiment canvas and connect it with existing Execute R Script module:

In the properties pane click on the Launch column selector:

Select columns: medv and lstat.

With the right data we can proceed to fitting the model. Drag the Linear Regression module (Machine Learning -> Initialize Model -> Regression) to the experiment canvas. To train the model we will also need one Train Model (Machine Learning -> Train).

Connect all the modules. Select Train Model and in the properties pane click on Launch column selector to choose the response column. There, select only medv because that's the quantity we want to predict.

The complete model should look like this:

Run it to fit the model to the data.

You can visualize the output port of the Train Model module to see the result.

We can see that the coefficient values obtained from Azure Machine Learning are different from what we got in R. Instead of 34.55 for the intercept (bias) we have 25.80, whereas the coefficient for lstat changed from -0.95 to -11.43.

The reason we observe this discrepancy is that Azure Machine Learning uses a more advanced model with a learning rate and regularization, which we will get to in future laboratories when we reach chapter 6, Linear Model Selection and Regularization, of ISLR. For now we will disable these features to reach parity between the two models we've seen so far.
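
To see how regularization alone can change fitted coefficients, here is a minimal ridge-style sketch in Python. It uses the one-predictor closed form on centered data purely to illustrate the shrinkage effect; it is not Azure ML's actual learning algorithm:

```python
# With an L2 (ridge) penalty, the one-predictor slope on centered data is
#   b1 = Sxy / (Sxx + lambda)
# so a larger penalty pulls the slope toward zero.
def ridge_slope(x, y, lam):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / (sxx + lam)

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
print(ridge_slope(x, y, 0.0))  # ordinary least squares: 2.0
print(ridge_slope(x, y, 5.0))  # shrunk toward zero: 1.0
```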

Select Linear Regression module, go to the properties pane and select the following configuration.

Rerun the model and visualize the result.

Now we can see that the coefficient values match what we got at the beginning. Just as with R, the model is described by its coefficients and we need to use other functions to get more information about its performance.

In the next part

In the next part we will look into evaluating the trained model.

This post and all the resources are available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master

Tuesday, July 29, 2014

3.6.2 Simple Linear Regression - loading data set

This laboratory was inspired by the book An Introduction to Statistical Learning, with Applications in R, section 3.6.2 Simple Linear Regression at page 110. Please refer to it for a detailed explanation of the models and the nomenclature used in this post.

In this laboratory we will use the Boston data set which comes with the MASS library. It contains information about housing values in the suburbs of Boston.

Loading data set in R

In order to load the data set in R we can use the following commands:

> library(MASS)
> fix(Boston)

Then, we can use the summary function to learn more about the data:

> summary(Boston)
      crim                zn             indus            chas        
 Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
 1st Qu.: 0.08204   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
 Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
 Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
 3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
 Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
      nox               rm             age              dis        
 Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
 1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
 Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
 Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
 3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
 Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
      rad              tax           ptratio          black       
 Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
 1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
 Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
 Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
 3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
 Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
     lstat            medv      
 Min.   : 1.73   Min.   : 5.00  
 1st Qu.: 6.95   1st Qu.:17.02  
 Median :11.36   Median :21.20  
 Mean   :12.65   Mean   :22.53  
 3rd Qu.:16.95   3rd Qu.:25.00  
 Max.   :37.97   Max.   :50.00  

Loading data set in Azure Machine Learning

The Boston data set is not available in the predefined set of Saved Datasets. However, it can easily be loaded using the Execute R Script module available under R Language Modules. Drag this module to the experiment canvas and set the following script to be executed:

# Load MASS library
library(MASS);

# Assign data set to the current workspace
data.frame <- Boston;

# Select frame to be sent to the output Dataset port
maml.mapOutputPort("data.frame");

Your experiment canvas should look like this:

Your properties pane should look like this:

Once you save and run the experiment you should be able to right-click on the output port and select Visualize:

This will open a new dialog with the basic information regarding the data:

Using Descriptive Statistics module

The default data set visualization in Azure does not show all the values that are printed by the summary function in R. In particular, the first and third quartiles are missing. In order to get their values one can use the Descriptive Statistics module. Drag it to the experiment surface and connect it with the existing Execute R Script module.
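
For reference, the quartiles reported by R's summary follow the common "type 7" interpolation convention. In Python the same values can be sketched with statistics.quantiles and its inclusive method, which matches that convention:

```python
from statistics import mean, quantiles

# Reproduce the six summary statistics (Min, 1st Qu., Median, Mean,
# 3rd Qu., Max) for a small sample, using type-7 style interpolation.
data = [1, 2, 3, 4, 5]
q1, median, q3 = quantiles(data, n=4, method="inclusive")
print(min(data), q1, median, mean(data), q3, max(data))
```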

Your experiment canvas should look like this:

Now when you visualize the output port of the Descriptive Statistics module you will see more statistics, including the previously missing quartiles.

In the next part

In the next part we will look into selecting data for the one-dimensional regression.

This post and all the resources are available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master

Monday, July 28, 2014

Introduction to Statistical Learning with Azure Machine Learning

Recently Microsoft announced the release of a preview version of the Azure Machine Learning service. The announcement appeared around the same time on The Official Microsoft Blog and the Machine Learning Blog. Personally, I believe this is an important step forward because it fills the gap between data scientists capable of creating elaborate models and people who want to use these models in a production environment.

The service itself is described as:

The problem? Machine learning traditionally requires complex software, high-end computers, and seasoned data scientists who understand it all. For many startups and even large enterprises, it's simply too hard and expensive. Enter Azure Machine Learning, a fully-managed cloud service for predictive analytics. By leveraging the cloud, Azure Machine Learning makes machine learning more accessible to a much broader audience. Predicting future outcomes is now attainable.

Getting started

The service is there; everyone can access it by either using an existing Azure subscription or creating a free trial account. There are some training materials, but they all focus on how to use the system. They make the assumption that the user is already familiar with the algorithms and models available in the platform. However, that will not always be the case. The number of different models and their parameters is high. Therefore it is important to establish the link between them and the subject matter literature and show a path one can follow to master the platform. In the following posts I will try to do just that.

Introduction to Statistical Learning...

The learning path I would like to present was created by Trevor Hastie and Rob Tibshirani, two professors at Stanford University, who have been teaching statistical learning for many years and recently created an online course at Stanford Online. I highly recommend registering for this course! Not only is it free, but additionally students get access to a pdf version of the textbook used in the course - An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013).

This book is a great starting point for learning about machine learning. At the end of each chapter there is a lab section which shows how a newly introduced model can be exercised in R. This hands-on experience is critical to understand how the models behave and how to select the right values for their parameters to get the best results.

with Azure Machine Learning

All the labs in the book above are in R, but the models used are generally available and most of them can also be found in Azure Machine Learning. This gave me an idea for a series of posts which will use the same data and models but a different environment. Because the examples I present will also be covered in the book, it should be easier for the reader to follow and to get back to specific sections for a deeper understanding of how specific models work.

This post and all the resources are available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master

Sunday, June 30, 2013

Syntax highlighting with Markdown and Prettify

In my last post I described how TeX4ht can be used to do syntax highlighting of source code for web publishing. Today I will tell you something about the solution I chose for my blog. In order to keep the text of my posts and the related code in a single place I am using Markdown.

Out of the box Markdown supports a code block syntax which wraps source code in pre and code tags. It looks nice, but colors would definitely improve readability.

class Program
{
    static void Main(string[] args)
    {
        // This is comment
        var text = "This is text";
        var number = 12345;
        Console.WriteLine(text + number.ToString());
    }
}

Google Code Prettify

Google Code Prettify is a project which can help us achieve just that. It is a JavaScript library which, when loaded, rewrites existing code sections and improves their style. One of its best features is that there is no need to explicitly specify the language, as it can be detected automatically. If the snippet is too short for the detection to work you may specify it explicitly, but most of the time it just works.

Load Prettify

In order to turn on Prettify on a site it is necessary to reference the script. It will load the CSS and JavaScript modules and look for the marked code sections to fix.

<script src="https://google-code-prettify.googlecode.com/svn/loader/
    run_prettify.js"></script> 

Mark code sections

Prettify will only touch code sections which were marked. Two different markers are supported.

In the normal HTML it is best to add a prettyprint class to the <pre>, <code> or <xmp> elements:

<pre class="prettyprint">
source code here
</pre>

If you do not have access to the <pre> tag, which is the case for Markdown, there is another way. The code needs to be preceded with a special instruction:

<?prettify?>
<pre>
source code here
</pre>

Markdown and Prettify

According to the documentation, in order to add block-level HTML elements in Markdown they have to be surrounded by blank lines and should not be indented with tabs or spaces. Knowing that, one could try to enable Prettify with the following markup.

<?prettify?>

    source code

Unfortunately <?prettify?> is not recognized by the Markdown translator and the effect is pretty far from what was intended. The prettify instruction gets translated to an HTML section and the left and right angle brackets get escaped. Because the marker is effectively gone, the source code stays plain.

<p>&lt;?prettify?&gt;</p>
<pre><code>source code</code></pre>

We could fix this with some JavaScript which would run on page load and replace <div class="prettify"> tags, which are recognized by Markdown, with <?prettify?>. This wouldn't be too hard, but there is a much easier solution!

In prettify.js, around line 883, there is a very interesting comment about how <?tag?> tags are parsed by HTML 5. The part 'nt === 8' was just what we were looking for. It turns out that in some browsers such a tag can be interpreted as a normal comment node <!--tag-->, thus both nodes have to be treated the same way by the library. This is a huge deal, especially for Markdown, because comments are supported!

    var nt = preceder.nodeType;
    // <?foo?> is parsed by HTML 5 to a comment node (8)
    // like <!--?foo?-->, but in XML is a processing instruction
    var value = (nt === 7 || nt === 8) && preceder.nodeValue;

Conclusions

In order to mark a Markdown code block to be processed by Prettify one can add a <!--?prettify?--> element before the block. Of course there needs to be a single empty line in between for everything to work.

The following markup:

<!--?prettify?-->

    class Program
    {
        static void Main(string[] args)
        {
            // This is comment
            var text = "This is text";
            var number = 12345;
            Console.WriteLine(text + number.ToString());
        }
    }

produces the following result:

class Program
{
    static void Main(string[] args)
    {
        // This is comment
        var text = "This is text";
        var number = 12345;
        Console.WriteLine(text + number.ToString());
    }
}

Although this solution relies on some undocumented features, I think it is reliable. I will use it to write my blog.

This post and all the resources are available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master

Sunday, June 23, 2013

Syntax highlighting with TeX4ht

When I was evaluating different options for blog development I spent some time on TeX4ht. Although I haven't chosen this technology, I found it very interesting and would like to share its goodness.

One of the important aspects of all blogs about programming is how they display source code snippets. As always, there is no single answer for how to do it. Some people just wrap their code in pre and code tags. Others care more about the appearance of their posts and highlight the syntax according to the programming language they use. I wanted the code I share to look good. That's why TeX4ht drew my attention.

Listings package

In LaTeX there is a listings package which can be used to format source code. It offers an environment similar to verbatim but with many parameters to customize the output.

This is an example of how one can add a code block to a LaTeX article.

\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}

\usepackage{listings}
\lstset{
    language=[Sharp]C,
    basicstyle=\ttfamily\small,
    identifierstyle=\sffamily,
    keywordstyle=\sffamily\bfseries,
    commentstyle=\rmfamily,
    stringstyle=\rmfamily\itshape,
    numberstyle=\scriptsize,
    showstringspaces=false,
    tabsize=2,
    numbers=left,
}

\begin{document}
\begin{lstlisting}[float, caption={Sample code}]
class Program
{
    static void Main(string[] args)
    {
        // This is comment
        var text = "This is text";
        var number = 12345;
        Console.WriteLine(text + number.ToString());
    }
}
\end{lstlisting}
\end{document}

Once compiled to PDF it looks very nice. Even though everything is black and white every part of the code has its unique style.

TeX4ht

The LaTeX document presented in the previous listing can be compiled to HTML using TeX4ht with the following command:

>>htlatex Sample.tex

Unfortunately the output produced by default is not as pretty as it was in the PDF. The fonts have their styles but the code is no longer aligned. There is no space between the numbers and the text, and the comments are not aligned with the rest of the code.

Listing 1: Sample code
1class Program 
2{ 
3    static void Main(string[] args) 
4    { 
5        // This is comment 
6        var text = This is text; 
7        var number = 12345; 
8        Console.WriteLine(text + number.ToString()); 
9    } 
10}

The listings package supports four different modes of alignment. By default it uses a fixed mode where a character is a single unit of output and characters are aligned in columns. This mechanism does not port to HTML. In order to achieve a similar effect one should use monospace fonts. However, this has its own problems because in LaTeX this corresponds to the typewriter (\ttfamily) font which cannot be styled.

As I mentioned in the previous post, the best solution I found was at StackExchange.

Instead of trying to force TeX4ht to produce different styles for the listing generated with the listings package, it is easier to override the styles used in the output. For this to work, all the styles used in the listings should be unique (e.g. basicstyle, identifierstyle, ...). If you look at the lstset definition of the first listing, you will see that it satisfies this requirement.

The next step was to define the CSS configuration. In order to do it I used the Internet Explorer Developer Tools to select elements and capture their classes. Then I was able to create a private configuration file for TeX4ht.

\Preamble{html} 
\begin{document} 
  % basicstyle
  \Css{div.lstlisting .cmtt-10 {font-family:monospace; color:DimGray}} 
  % identifierstyle
  \Css{div.lstlisting .cmss-10 {font-family:monospace; color:Black}} 
  % keywordstyle
  \Css{div.lstlisting .cmssbx-10 {font-family:monospace; color:Blue}} 
  % commentstyle
  \Css{div.lstlisting .cmr-10 {font-family:monospace; color:Green}} 
  % stringstyle
  \Css{div.lstlisting .cmti-10 {font-family:monospace; color:DarkRed}} 
  % numberstyle
  \Css{div.lstlisting .cmr-8 {display:inline-block; width:20px}} 
\EndPreamble 

Please notice the custom style for the div.lstlisting block. This hasn't been mentioned on StackExchange but it is required for the line numbering to work.

In order to include the configuration file I used a slightly modified command line:

>>htlatex Sample.tex Sample.cfg

Finally it all worked. The listing produced has line numbering, all elements of the syntax are highlighted and everything is aligned exactly the same way as in the source code.

Listing 1: Sample code
1class Program 
2{ 
3    static void Main(string[] args) 
4    { 
5        // This is comment 
6        var text = This is text; 
7        var number = 12345; 
8        Console.WriteLine(text + number.ToString()); 
9    } 
10}

This post with all the resources is available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master

Sunday, June 16, 2013

Blog development plan

I'm quite new to blogging but I know a lot about software development. Are these two activities that different? They look alike to me:

  • You write down your ideas in a language of your choice.
  • You need to adhere to some rules like grammar.
  • When a post is done it is pushed to a public site.

Hey, that's exactly what software developers do all the time!

Let's have a look at how this idea can be put in use.

Syntax

One of the most important decisions is the syntax used to write posts. Ideally the syntax should be very light so that the writer can focus on the content. Additionally, it should have some capabilities to organize the document.

I've taken into consideration the following options:

  • HTML
  • LaTeX + LaTeX2HTML
  • Markdown

HTML

I know that my posts will need to be converted to HTML at some point in time, so why not just start with it? This language has great tooling with WYSIWYG editors. Moreover, it is the most powerful option. With pure HTML I should be able to write anything I like.

The only problem is that it can sometimes be hard to read, with many different tags obscuring the picture. This is particularly visible when it comes to embedded source code. The problem is even worse because in HTML there are two characters that demand special treatment: < and &. Left angle brackets are used to start tags whereas ampersands denote HTML entities. In order to use them as literal characters, it is necessary to escape them as entities, e.g. &lt; and &amp;. Even if they are inserted by an editor they will exist in the source, thus making it harder to read.
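
For example, Python's standard library shows what that escaping does to even a short line of code:

```python
import html

# The characters < and & must become entities before code can be
# embedded in raw HTML.
snippet = "if (a < b && c < d)"
print(html.escape(snippet))  # -> if (a &lt; b &amp;&amp; c &lt; d)
```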

LaTeX + TeX4ht

An alternative solution, which should be attractive to all academics, is LaTeX. With tools like htlatex it is possible to compile documents to HTML.

I studied at a university where using LaTeX was not mandatory but most of the instructors preferred this format. Therefore, I learned how to write LaTeX articles before I grasped HTML. I still had the right environment, which was a portable distribution of MiKTeX with some of my favourite packages. In order to evaluate this option I created a sample post which contained some source code listings and images.

During that process I found a few resources which were particularly helpful.

Although it worked, it wasn't easy. The biggest problem I had was with the source code formatting. The listings package I used produced beautiful listings in a PDF, but when I ran tex4ht they all looked much worse. Fonts no longer had constant width and nothing was aligned as it should be. It turned out that listings is quite advanced and has its own algorithm to organize everything into columns. That's how it works with any font you use, but this solution didn't work for HTML.

I fixed it by changing the base font to typewriter (\ttfamily). Everything was aligned again but I lost the bold style of the keywords. It made me think about using colors instead. After all, this document won't be printed!

I found a very interesting answer at the StackExchange.

This is the key idea:

Imho a more simple approach is with fonts: if every style is connected to a different font then tex4ht surrounds the chars with classes which you can set through css.

-- Ulrike Fischer

It worked like a charm. I still had to compile the document a few times and inspect the HTML to capture all the class names that I needed to use in the CSS, but it was fast. Eventually I came up with a nicely colored source code listing.

In summary, the experience wasn't bad but I had a feeling that I had to search for solutions and workarounds too often. I decided to look for something different.

Markdown

The third option that I took into consideration was Markdown. Initially I had been using it at StackExchange without knowing its name. I would just write a question or an answer and discover the syntax accidentally by typing and looking at the 'Preview' section. It was possible because of the philosophy it was created with.

Philosophy

Markdown is intended to be as easy-to-read and easy-to-write as is feasible.

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it's been marked up with tags or formatting instructions.

-- John Gruber

One of the biggest advantages of this syntax, which was probably also admired by the creators of StackOverflow, is the code block. You can just paste your code inline, indent it by 4 spaces or 1 tab and it will be properly formatted. No tags, no commands, just indentation - perfect!

Most of the code I write is in C#. Because classes are defined inside a namespace block they are indented by default so there is no extra action needed. The code can be copied as it is and it will be converted into HTML.
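
The whole publishing step can be sketched as a one-liner. Here is a small, hypothetical Python helper that turns any snippet into a Markdown code block by indenting it:

```python
import textwrap

# Markdown treats any text indented by four spaces as a code block,
# so publishing a snippet is just an indent away.
code = 'var text = "This is text";\nConsole.WriteLine(text);'
print(textwrap.indent(code, " " * 4))
```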

This was one of the reasons why I selected Markdown as the language for my blog.

Version Control

I believe that a version control system plays a very important role in every software project. Whenever I start something new and I know that it will stick around for longer than a day, I create a repository. A blog definitely falls into this category. The decision about which VCS to use was quite easy. For all my private work I use Git. I've got all the tools installed and an account on GitHub to back up my repository.

With a public repository there is theoretically a chance that somebody will send me a pull request to fix something in a post, but I don't think this will happen. Not because what I write is flawless, but because a blog is a personal thing. Nevertheless, contributions are more than welcome.

Conclusions

In summary, I will write all my posts using the Markdown syntax. They will be under version control and available in two different places. Sources will be saved on GitHub and the HTML version will be published on Blogger.