niedziela, 30 czerwca 2013

Syntax highlighting with Markdown and Prettify

In my last post I described how TeX4ht can be used to do syntax highlighting of source code for a web publishing. Today I will tell you something about the solution I chose for my blog. In order to keep the text of my posts and the related code in a single place I am using Markdown.

Out of the box Markdown supports code block syntax which wraps up source code with pre and code tags. It looks nice but colors would definitely improve the readability.

class Program
{
    static void Main(string[] args)
    {
        // This is comment
        var text = "This is text";
        var number = 12345;
        Console.WriteLine(text + number.ToString());
    }
}

Google Code Prettify

Google Code Prettify is the project which can help us achieve just that. This is a Javascript library which when loaded, rewrites existing code sections and improves their style. One of the best features is that there is no need to explicitly specify the language as it can be automatically detected. If the snippet is too short for the detection to work you may specify it explicitly but most of the time it just works.

Load Prettify

In order to turn on Prettify on a site it is necessary to reference the script. It will load the CSS, Javascript modules and look for the marked code sections to fix.

<script src="https://google-code-prettify.googlecode.com/svn/loader/
    run_prettify.js"></script> 

Mark code sections

Prettify will only touch code sections which were marked. Two different markers are supported.

In the normal HTML it is best to add a prettyprint class to the <pre>, <code> or <xmp> elements:

<pre class="prettyprint">
source code here
</pre>

If you do not have access to the <pre> tag, which is a case for Markdown, there is another way. The code needs to be preceded with a special instruction:

<?prettify?>
<pre>
source code here
</pre>

Markdown and Prettify

According to the documentation, in order to add block-level HTML elements in Markdown they have to be surrounded with blank lines and they should not be indented with tabs or spaces. Knowing that one could try to enable prettify with the following markup.

<?prettify?>

    source code

Unfortunately <?prettify?> is not recognized by the Markdown translator and the effect is pretty far from what was intended. Prettify instruction gets translated to a HTML section. The left and right angle brackets get escaped. Because the marker is effectively gone, the source code stays plain.

<p>&lt;?prettify?&gt;</p>
<pre><code>source code</code></pre>

We could fix it with some Javascript which would run on the page load and translate <div class="prettify"> tags, which are recognized by Markdown, with <?prettify?>. This wouldn't be too hard, but there is much easier solution!

In the prettify.js, around the line 883 there is a very interesting comment about how the tags <?tag?> are parsed by the HTML 5. The part 'nt === 8' was just what we were looking for. It turns out that in some browsers it can be interpreted as a normal comment node <!--tag-->, thus both nodes have to be treated the same way by the library. This is a huge deal, especially for Markdown because comments are supported!

    var nt = preceder.nodeType;
    // <?foo?> is parsed by HTML 5 to a comment node (8)
    // like <!--?foo?-->, but in XML is a processing instruction
    var value = (nt === 7 || nt === 8) && preceder.nodeValue;

Conclusions

In order to mark a Markdown code block to be processed by a Prettify one can add <!--?prettify?--> element before the block. Of course there needs to be a single empty line for everything to work.

The following markup:

<!--?prettify?-->

    class Program
    {
        static void Main(string[] args)
        {
            // This is comment
            var text = "This is text";
            var number = 12345;
            Console.WriteLine(text + number.ToString());
        }
    }

produces the following result:

class Program
{
    static void Main(string[] args)
    {
        // This is comment
        var text = "This is text";
        var number = 12345;
        Console.WriteLine(text + number.ToString());
    }
}

Although this solution relies on some undocumented features I think that it is a reliable one. I will use it to write my blog.

References


This post and all the resources are available on GitHub:

https://github.com/StanislawSwierc/it-is-not-overengineering/tree/master

Brak komentarzy:

Prześlij komentarz