Wednesday, November 28, 2007

Format Source Code for Web

Introduction

As both a programmer and a blogger, I frequently need to publish source code on my blog. So how should it be done? Being new to the Blogger community, I spent a while looking into the problem. If you only want to display your code plainly, without syntax highlighting, there are two issues:

  1. HTML collapses runs of spaces, tabs, and line breaks into a single space, which destroys the indentation of the code.
  2. Special characters such as &, <, >, and " must be escaped, especially when the code is itself HTML.
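To make the two issues concrete, here is a small JavaScript sketch. `collapseWhitespace` is a hypothetical helper that roughly imitates how a browser renders ordinary (non-<pre>) text by turning every run of whitespace into one space; it is a simplification, not the full CSS whitespace algorithm. `escapeHtml` (also a name chosen only for this illustration) replaces the four special characters in a single pass using a lookup table, so the order of replacement does not matter.

```javascript
// Roughly imitate normal HTML rendering: any run of whitespace
// (spaces, tabs, line breaks) becomes a single space.
// Simplified illustration, not the full CSS white-space rules.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ');
}

// Entity for each special character listed above.
var ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

// Replace every special character in one pass. Each character of the
// input is visited exactly once, so '&' never has to be handled first.
function escapeHtml(text) {
  return text.replace(/[&<>"]/g, function (ch) {
    return ENTITIES[ch];
  });
}

var code = 'if (a < b) {\n\tprint("a & b");\n}';
console.log(collapseWhitespace(code)); // if (a < b) { print("a & b"); }
console.log(escapeHtml(code));
```

The first line of output shows the indentation and line breaks lost; the second shows the text safe to paste into HTML.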

My initial thought was to use sed, since both issues can be solved with simple line-by-line processing, which is exactly what sed is designed for. Later I thought: why not use JavaScript instead? Then I can publish code from anywhere, which mobilizes web authoring in addition to web surfing.

Solution

Below is a tool that converts source code into an HTML fragment for insertion into a blog or an ordinary web page. Simply paste your source code into the first textarea, then copy the HTML from the second.

Note that I style the <pre> element with CSS, so the first line of the HTML output carries the attribute class="code". You may adjust this to your own preference, e.g. by switching to an inline style.
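As a sketch of that tweak, the hypothetical helper `wrapInPre` below wraps already-escaped code either with the class="code" attribute the tool emits or with an inline style attribute. The style string used in the example is only a placeholder; substitute whatever declarations your own stylesheet gives the `code` class.

```javascript
// Wrap already-escaped code in a <pre> element (hypothetical helper).
// With no inlineStyle, emit class="code" as the tool does;
// otherwise emit a style attribute carrying the given declarations.
function wrapInPre(escaped, inlineStyle) {
  var attr = inlineStyle
    ? ' style="' + inlineStyle.replace(/"/g, '&quot;') + '"'
    : ' class="code"';
  // Same layout as formatCode: newline after the start tag and before the end tag.
  return '<pre' + attr + '>\n' + escaped + '\n</pre>\n';
}

console.log(wrapInPre('x &lt; y'));
console.log(wrapInPre('x &lt; y', 'background: #eee; overflow: auto;'));
```

An inline style keeps the fragment self-contained at the cost of repeating the declarations in every post.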


Implementation

The source code for the form above and its script is shown below, formatted with the tool itself.

<form name="converter" action="">
  <table>
    <tr><td>Source code:<br/>
      <!-- closing tag on the same line, so the textarea starts empty -->
      <textarea name="source" onchange="formatCode();" rows="10" cols="80"></textarea>
    </td></tr>
    <tr><td>HTML output:<br/>
      <textarea name="output" rows="10" cols="80"></textarea>
    </td></tr>
  </table>
</form>

<script type="text/javascript">
  //<![CDATA[
  // Escape the pasted code and wrap it in <pre class="code"> ... </pre>.
  function formatCode () {
    var form = document.converter;
    // Escape & first, so the entities added below are not escaped again.
    var s = form.source.value.replace(/&/g, '&amp;');
    s = s.replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
    form.output.value = "<pre class=\"code\">\n" + s + "\n</pre>\n";
    // Select the result so it can be copied right away.
    form.output.select();
  }
  //]]>
</script>

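We use the HTML <pre> element to solve the first issue listed above, since browsers preserve whitespace and line breaks inside it. In the script, the function formatCode solves the second issue by escaping special characters. Note that & must be handled first: otherwise the & in each entity produced for <, >, and " would be escaped again, turning &lt; into &amp;lt;.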
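The reason for escaping & first can be checked directly. In this sketch, `escapeAmpFirst` follows the order used in formatCode, while `escapeAmpLast` is a deliberately wrong variant, written only for illustration, that handles & at the end and therefore re-escapes the ampersands of the entities it has just produced.

```javascript
// Order used by formatCode: '&' before the other characters.
function escapeAmpFirst(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;');
}

// Wrong order: '&' last, so the "&lt;" produced earlier becomes "&amp;lt;".
function escapeAmpLast(s) {
  return s.replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;')
          .replace(/&/g, '&amp;');
}

var input = 'a < b && c';
console.log(escapeAmpFirst(input)); // a &lt; b &amp;&amp; c
console.log(escapeAmpLast(input));  // a &amp;lt; b &amp;&amp; c
```

A browser would display the second result as "a &lt; b && c", showing the entity text instead of the < character.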

Update (2008-1-17): For Emacs users, htmlize may be useful for this task (it has many nice features). Gentoo users can simply emerge htmlize. I stumbled upon this extension while reading the comments on Steve Yegge's nice post.
