Sunday, June 23, 2013

Syntax highlighting with TeX4ht

When I was evaluating different options for developing my blog, I spent some time on TeX4ht. Although I didn't choose this technology in the end, I found it very interesting and would like to share what makes it good.

One of the important aspects of any blog about programming is how it displays source code snippets. As always, there is no single answer to how this should be done. Some people just wrap their code in <pre> and <code> tags. Others care more about the appearance of their posts and highlight the syntax according to the programming language they use. I wanted the code I share to look good, which is why TeX4ht caught my attention.

Listings package

LaTeX has a listings package that can be used to format source code. It offers an environment similar to verbatim, but with many parameters for customizing the output.

This is an example of how one can add a code block to a LaTeX article.

\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}

\usepackage{listings}
\lstset{
    language=[Sharp]C,
    basicstyle=\ttfamily\small,
    identifierstyle=\sffamily,
    keywordstyle=\sffamily\bfseries,
    commentstyle=\rmfamily,
    stringstyle=\rmfamily\itshape,
    numberstyle=\scriptsize,
    showstringspaces=false,
    tabsize=2,
    numbers=left,
}

\begin{document}
\begin{lstlisting}[float, caption={Sample code}]
class Program
{
    static void Main(string[] args)
    {
        // This is comment
        var text = "This is text";
        var number = 12345;
        Console.WriteLine(text + number.ToString());
    }
}
\end{lstlisting}
\end{document}

Compiled to PDF, it looks very nice. Even though everything is black and white, every part of the code has its own distinct style.

TeX4ht

The LaTeX document in the previous listing can be compiled to HTML with TeX4ht using the following command:

>>htlatex Sample.tex

Unfortunately, the default output is not as pretty as the PDF. The fonts keep their styles, but the code is no longer aligned: there is no space between the line numbers and the text, and the comments are not aligned with the rest of the code.

Listing 1: Sample code
1class Program
2{
3    static void Main(string[] args)
4    {
5        // This is comment
6        var text = "This is text";
7        var number = 12345;
8        Console.WriteLine(text + number.ToString());
9    }
10}

The listings package supports four column alignment modes. By default it uses the fixed mode, in which each character occupies one fixed-width unit of output, so the characters line up in columns. This mechanism does not carry over to HTML. To get a similar effect there, one should use a monospace font. However, this brings its own problem: in LaTeX a monospace font corresponds to the typewriter family (\ttfamily), which cannot be styled.
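For reference, here is a minimal sketch of where the alignment mode is chosen. The listings package selects it with the columns key, whose four values are fixed (the default), flexible, fullflexible and spaceflexible. This only illustrates the option and is not part of the fix described in this post; the comments describe fixed and fullflexible only, and the listings manual is the authority on the exact behaviour of the other two.

```latex
% Sketch: choosing the column alignment mode of the listings package.
\lstset{
    % fixed (the default): every character is placed in a cell of the
    % same width, so characters line up in columns as in the PDF
    columns=fixed,
    % fullflexible: characters are typeset at their natural width and
    % the column structure is ignored
    % columns=fullflexible,
    % the two remaining modes are flexible and spaceflexible
}
```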

As I mentioned in the previous post, the best solution I found was on StackExchange.

Instead of trying to force TeX4ht to produce different styles for a listing generated with the listings package, it is easier to override the styles in the HTML output with CSS. For this to work, every style used in the listing (basicstyle, identifierstyle, and so on) must be unique, because TeX4ht tags text with a CSS class named after its font (for example cmtt-10), so two styles that share a font would also share a class. If you look at the \lstset definition of the first listing, you will see that it satisfies this requirement.
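As a hypothetical counterexample (not taken from the setup in this post), the \lstset below gives identifiers and keywords the same font as the basic style. In an 11pt article, \small typewriter text would be expected to come out with the cmtt-10 class, so all three styles would end up sharing that one class, and a CSS rule could no longer colour identifiers or keywords differently from the rest of the code.

```latex
% Hypothetical counterexample: these styles are NOT unique.
% identifierstyle and keywordstyle are applied on top of basicstyle,
% so all three end up in the same typewriter font and size, and
% TeX4ht would tag them with the same CSS class.
\lstset{
    basicstyle=\ttfamily\small,
    identifierstyle=\ttfamily,   % same font as basicstyle
    keywordstyle=\ttfamily,      % same font again
}
```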

The next step was to write the CSS configuration. I used the Internet Explorer Developer Tools to select elements in the generated page and read their classes, and then I created a private configuration file for TeX4ht:

\Preamble{html} 
\begin{document} 
  % basicstyle
  \Css{div.lstlisting .cmtt-10 {font-family:monospace; color:DimGray}} 
  % identifierstyle
  \Css{div.lstlisting .cmss-10 {font-family:monospace; color:Black}} 
  % keywordstyle
  \Css{div.lstlisting .cmssbx-10 {font-family:monospace; color:Blue}} 
  % commentstyle
  \Css{div.lstlisting .cmr-10 {font-family:monospace; color:Green}} 
  % stringstyle
  \Css{div.lstlisting .cmti-10 {font-family:monospace; color:DarkRed}} 
  % numberstyle
  \Css{div.lstlisting .cmr-8 {display:inline-block; width:20px}} 
\EndPreamble 

Note the custom numberstyle rule for the line numbers inside the div.lstlisting block. It was not mentioned on StackExchange, but it is required for the line numbering to display correctly.
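The numberstyle rule works by turning each line number (the cmr-8 class) into an inline block of fixed width, so the code after it starts at the same horizontal position on every line. A possible variation, offered as a sketch that is not part of the original configuration, sizes the box in em and right-aligns the numbers, which may keep the two-digit number 10 in line with the single digits. It would take the place of the numberstyle line inside the \Preamble ... \EndPreamble block.

```latex
% Hypothetical alternative to the numberstyle rule (not from the
% original post): a 2em box with the numbers right-aligned, plus a
% small gap before the code.
\Css{div.lstlisting .cmr-8 {display:inline-block; width:2em; text-align:right; margin-right:0.5em}}
```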

To include the configuration file, I used a slightly modified command:

>>htlatex Sample.tex Sample.cfg

Finally, it all worked. The generated listing has line numbers, every element of the syntax is highlighted, and everything is aligned exactly as in the source code.

Listing 1: Sample code
 1  class Program
 2  {
 3      static void Main(string[] args)
 4      {
 5          // This is comment
 6          var text = "This is text";
 7          var number = 12345;
 8          Console.WriteLine(text + number.ToString());
 9      }
10  }

This post with all the resources is available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master
