An example ENHYPE.KW file

The enhype binary expects precisely one argument - a file describing the language it's supposed to be processing. It looks up a few things in this "enhype keyword" file, then copies standard input to standard output, making a few edits along the way.

The enhype keyword file consists of a number of named "sections". The first one enhype looks for is the "[General]" section, which tells it which HTML tags you'd like to use. The defaults are shown below:

      [General]
      keyword tag = strong
      comment tag = em
      code tag = pre
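
To make the "defaults" idea concrete, here is a rough Python sketch of how a [General] section might be read, with any missing setting falling back to the values shown above. It is not enhype's actual code: the name read_general, and the choice to match section and key names case-insensitively, are assumptions made purely for illustration.

```python
# Hypothetical sketch, not enhype's real parser.
# The defaults are the ones listed in the [General] example above.
DEFAULT_TAGS = {"keyword tag": "strong", "comment tag": "em", "code tag": "pre"}


def read_general(text):
    """Return the tag settings from a keyword file's [General] section."""
    tags = dict(DEFAULT_TAGS)
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Assumption: section names are compared case-insensitively.
            section = line[1:-1].strip().lower()
            continue
        if section == "general" and "=" in line:
            key, _, value = line.partition("=")
            tags[key.strip().lower()] = value.strip()
    return tags


if __name__ == "__main__":
    sample = "[General]\nkeyword tag = b\n"
    # Only "keyword tag" is overridden; the other two keep their defaults.
    print(read_general(sample))
```

A file with no [General] section at all simply yields the three defaults.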
    

After reading the [General] section, enhype then looks for two language-defining sections, one giving the approximate lexical syntax of the language, the other listing the keywords. At heart, enhype thinks all programming languages have the following characteristics:

(These assumptions seem so reasonable that it's perhaps a little surprising that I've yet to encounter any programming language that doesn't break at least one of them.)


By way of example, here are the sections describing C++. First, the characters section, which defines any interesting characters (and character sequences) in the language:

      [characters]
      comment = /* */
      comment = //
      comment = #
      letters = _
      quotes = '"
      escape = \
    

What's this telling us?

Firstly, there are three different comment styles, one with both start- and end- markers, and two of the "until end of line" style. (Note that # doesn't really mark a comment in C++, it's just that I prefer to have preprocessor lines formatted that way.)

By default enhype expects that identifiers contain letters and/or digits. If, as is often the case, your language allows other things, you can define them with a letters= line. Here I just add the underscore.

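The next two lines are related to strings. The quotes= line says which characters start and end strings, and the escape= line says which (if any) hide these characters once you're in a string. In C++ that's the backslash, so the \" inside "say \"hi\"" doesn't end the string.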
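
For readers who think better in code, here is a hedged Python sketch of how a [characters] section like the one above could be turned into a small data structure. The function name read_characters and the shape of the result (a list of start/end pairs for comments, plain strings for the rest) are my own assumptions, not a description of enhype's internals.

```python
# Hypothetical sketch, not enhype's real parser.
def read_characters(text):
    """Collect comment markers, extra identifier letters, quotes and escapes."""
    spec = {"comments": [], "letters": "", "quotes": "", "escape": ""}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_section = line.lower() == "[characters]"
            continue
        if not in_section or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip().lower(), value.strip()
        if key == "comment":
            parts = value.split()
            if parts:
                # One marker means "until end of line"; two mean start and end.
                end = parts[1] if len(parts) > 1 else None
                spec["comments"].append((parts[0], end))
        elif key in ("letters", "quotes", "escape"):
            spec[key] += value
    return spec


# The C++ [characters] section from above, built line by line to avoid
# any confusion over backslashes inside a Python string literal.
CPP_CHARACTERS = "\n".join([
    "[characters]",
    "comment = /* */",
    "comment = //",
    "comment = #",
    "letters = _",
    "quotes = '\"",
    "escape = \\",
])

if __name__ == "__main__":
    print(read_characters(CPP_CHARACTERS))
```

Because comment= may appear several times, the sketch keeps a list rather than a single value.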

OK - on with the keywords list:

      [keywords]
      if else
      do while for 
      break continue
      switch case default
      int float unsigned signed double char void long short
      const volatile
      typedef struct union enum
      static register auto extern
      return
      sizeof
      goto

      friend inline this virtual
      class private protected public
      template operator 
      new delete
      try catch throw
    

That was fairly painless.
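
Putting the two sections together, the following Python sketch shows one way a filter could use them: comments are wrapped in the comment tag, keywords outside comments and strings are wrapped in the keyword tag, and everything else is copied through. This is only an illustration of the idea, not enhype's algorithm. The function name highlight, its parameters, and the HTML-escaping of <, > and & are all assumptions on my part.

```python
import html


# Hypothetical sketch, not enhype's real algorithm.
def highlight(source, keywords, comments, letters="_", quotes="'\"",
              escape="\\", keyword_tag="strong", comment_tag="em"):
    """Mark up keywords and comments in source, leaving strings untouched."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        marker = next(((s, e) for s, e in comments if source.startswith(s, i)), None)
        if marker:
            start, end = marker
            if end is None:
                # "Until end of line" comment: stop before the newline.
                j = source.find("\n", i)
                j = n if j < 0 else j
            else:
                # Delimited comment: include the end marker if there is one.
                j = source.find(end, i + len(start))
                j = n if j < 0 else j + len(end)
            body = html.escape(source[i:j], quote=False)
            out.append(f"<{comment_tag}>{body}</{comment_tag}>")
            i = j
        elif ch in quotes:
            # String: skip to the matching quote, stepping over escaped characters.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] in escape else 1
            j = min(j + 1, n)
            out.append(html.escape(source[i:j], quote=False))
            i = j
        elif ch.isalnum() or ch in letters:
            # Identifier or number: letters, digits, plus any extra "letters".
            j = i
            while j < n and (source[j].isalnum() or source[j] in letters):
                j += 1
            word = source[i:j]
            if word in keywords:
                out.append(f"<{keyword_tag}>{word}</{keyword_tag}>")
            else:
                out.append(word)
            i = j
        else:
            out.append(html.escape(ch, quote=False))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    cpp_comments = [("/*", "*/"), ("//", None), ("#", None)]
    cpp_keywords = {"if", "else", "return"}
    print(highlight('if (x) return "if"; // else\n', cpp_keywords, cpp_comments))
```

Note that the "if" inside the string and the "else" inside the comment are left alone, which is the whole point of telling the tool about quotes and comment markers first.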


The next example is for Rexx, which is case-insensitive. This gives enhype the chance to look clever - I can ask it to display all keywords in upper case, and all variables in lower case. Oh, and, believe it or not, Rexx's comments nest:

      [characters]
      letters = _
      quotes = '"
      comment = /* */ nested
      keyword case = upper
      variable case = lower

      [keywords]
      if then else
      select when otherwise
      do to by for while until forever end leave iterate
      call return
      (etc ad tedium)
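
The two Rexx-specific twists, nested comments and case folding, can be sketched in a few lines of Python. As before this is an assumption-laden illustration rather than enhype's code: the helper names nested_comment_end and apply_case are hypothetical, and the keyword set is assumed to be stored in lower case.

```python
# Hypothetical sketch, not enhype's real code.
def nested_comment_end(source, i, start="/*", end="*/"):
    """Return the index just past the nested comment that opens at source[i]."""
    depth = 0
    n = len(source)
    while i < n:
        if source.startswith(start, i):
            depth += 1
            i += len(start)
        elif source.startswith(end, i):
            depth -= 1
            i += len(end)
            if depth == 0:
                return i
        else:
            i += 1
    return n  # Unterminated comment: it runs to the end of the input.


def apply_case(word, keywords, keyword_case="upper", variable_case="lower"):
    """Fold a word to the case chosen for keywords or for variables."""
    # Case-insensitive language: compare in lower case.
    mode = keyword_case if word.lower() in keywords else variable_case
    if mode == "upper":
        return word.upper()
    if mode == "lower":
        return word.lower()
    return word


if __name__ == "__main__":
    text = "/* outer /* inner */ still outer */ say x"
    stop = nested_comment_end(text, 0)
    print(text[:stop])
    print(apply_case("If", {"if", "then", "else"}), apply_case("Count", {"if"}))
```

A non-nesting scanner would stop at the first */, leaving " still outer */" to be treated as code; counting depth avoids that.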