The enhype binary expects precisely one argument - a file describing the language it's supposed to be processing. It looks up a few things in this "enhype keyword" file, then copies standard input to standard output, making a few edits along the way.
The enhype keyword file consists of a number of named "sections". The first one enhype looks for is the "[General]" section, which tells it which HTML tags you'd like to use. The defaults are shown below:
[General]
keyword tag = strong
comment tag = em
code tag = pre
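To illustrate how a file of named sections with defaults might be read, here is a minimal Python sketch. It is not enhype's own code: the function names (parse_keyword_file, general_settings) are made up for this example, and the parsing rules (case-insensitive section and key names, blank lines ignored) are assumptions rather than documented behaviour. It keeps (key, value) pairs in order, so a key may appear more than once within a section, and it stores lines without an "=" as bare entries.

```python
from collections import defaultdict

# Defaults for the [General] section, as listed above.
GENERAL_DEFAULTS = {"keyword tag": "strong", "comment tag": "em", "code tag": "pre"}


def parse_keyword_file(text):
    """Split keyword-file text into {section name: [(key, value), ...]}.

    Lines without '=' are stored with key None, which suits a bare
    word list. Section and key names are lower-cased (an assumption).
    """
    sections = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            sections[current]  # create the section even if it stays empty
            continue
        if current is None:
            continue  # ignore anything before the first section header
        key, sep, value = line.partition("=")
        if sep:
            sections[current].append((key.strip().lower(), value.strip()))
        else:
            sections[current].append((None, line))
    return dict(sections)


def general_settings(sections):
    """Overlay whatever [General] says on top of the defaults."""
    settings = dict(GENERAL_DEFAULTS)
    for key, value in sections.get("general", []):
        if key is not None:
            settings[key] = value
    return settings
```

With this sketch, a file that only says "keyword tag = b" under [General] would yield b for keywords while keeping em and pre for the other two tags.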
After reading the [General] section, enhype then looks for two language-defining sections, one giving the approximate lexical syntax of the language, the other listing the keywords. At heart, enhype thinks all programming languages have the following characteristics:
(These assumptions seem so reasonable that it's perhaps a little surprising that I've yet to encounter any programming language that doesn't break at least one of them.)
By way of example, here are the sections describing C++. First, the characters section, which defines any interesting characters (and character sequences) in the language:
[characters]
comment = /* */
comment = //
comment = #
letters = _
quotes = '"
escape = \
What's this telling us?
Firstly, there are three different comment styles: one with both start and end markers, and two of the "until end of line" style. (Note that # doesn't really mark a comment in C++; it's just that I prefer to have preprocessor lines formatted that way.)
By default enhype expects that identifiers contain letters and/or digits. If, as is often the case, your language allows other things, you can define them with a letters= line. Here I just add the underscore.
The next two lines are related to strings. We need to say which characters start and end strings, and which (if any) hide these characters once you're in a string.
OK - on with the keywords list:
[keywords]
if else do while for break continue switch case default
int float unsigned signed double char void long short
const volatile typedef struct union enum static register auto extern
return sizeof goto friend inline this virtual class
private protected public template operator new delete
try catch finally
That was fairly painless.
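To make the C++ description concrete, here is a rough Python sketch of the kind of scan those settings imply: comments get wrapped in the comment tag, strings are copied through untouched so that keywords inside them are left alone, and identifiers found in the keyword list get the keyword tag. This is a guess at the general technique, not enhype's actual implementation. The function name highlight and its parameters are hypothetical, and escaping < and & for HTML is my own assumption.

```python
import html


def highlight(source, keywords,
              block_comments=(("/*", "*/"),), line_comments=("//", "#"),
              extra_letters="_", quotes="'\"", escape="\\",
              keyword_tag="strong", comment_tag="em"):
    """Return `source` as HTML with keywords and comments wrapped in tags."""
    out = []
    i, n = 0, len(source)

    def is_letter(ch):
        # Letters and digits by default, plus anything from a letters= line.
        return ch.isalnum() or ch in extra_letters

    def wrap(tag, text):
        return f"<{tag}>{html.escape(text, quote=False)}</{tag}>"

    while i < n:
        ch = source[i]
        # Comment with start and end markers: runs to the end marker.
        block = next(((s, e) for s, e in block_comments
                      if source.startswith(s, i)), None)
        if block:
            end = source.find(block[1], i + len(block[0]))
            j = n if end < 0 else end + len(block[1])
            out.append(wrap(comment_tag, source[i:j]))
            i = j
        # "Until end of line" comment: stops before the newline.
        elif any(source.startswith(s, i) for s in line_comments):
            end = source.find("\n", i)
            j = n if end < 0 else end
            out.append(wrap(comment_tag, source[i:j]))
            i = j
        # String: ends at the matching quote; the escape character
        # hides whatever character follows it.
        elif ch in quotes:
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == escape else 1
            j = min(j + 1, n)
            out.append(html.escape(source[i:j], quote=False))
            i = j
        # Identifier: a run of letters; tag it if it is a keyword.
        elif is_letter(ch):
            j = i
            while j < n and is_letter(source[j]):
                j += 1
            word = source[i:j]
            out.append(wrap(keyword_tag, word) if word in keywords
                       else html.escape(word, quote=False))
            i = j
        else:
            out.append(html.escape(ch, quote=False))
            i += 1
    return "".join(out)
```

Note how the underscore in extra_letters keeps an identifier such as int_x in one piece, so the int inside it is not mistaken for a keyword.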
The next example is to do with Rexx, which has the property of being case-insensitive. This gives enhype the chance to look clever - I can ask it to display all keywords in upper case, and all variables in lower case. Oh, and, believe it or not, Rexx's comments nest:
[characters]
letter = _
quotes = '"
comments = /* */ nested
keyword case = upper
variable case = lower

[keywords]
if then else select when otherwise do to by for while until forever end leave iterate call return
(etc ad tedium)
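The two Rexx-specific features can be sketched in a few lines of Python. Nested comments need a depth counter instead of a simple search for the first end marker, and the case settings mean looking words up without regard to case before rewriting them. As before, this only illustrates the idea: the function names find_nested_comment_end and apply_case are hypothetical, and it is an assumption that the keyword list is held in lower case.

```python
def find_nested_comment_end(text, start, open_mark="/*", close_mark="*/"):
    """Return the index just past the nested comment that opens at `start`.

    Assumes text[start:] begins with open_mark. An unterminated comment
    is taken to run to the end of the input.
    """
    depth = 0
    i = start
    while i < len(text):
        if text.startswith(open_mark, i):
            depth += 1
            i += len(open_mark)
        elif text.startswith(close_mark, i):
            depth -= 1
            i += len(close_mark)
            if depth <= 0:
                return i
        else:
            i += 1
    return len(text)


def apply_case(word, keywords, keyword_case="upper", variable_case="lower"):
    """Rewrite `word` according to the case setting for its kind.

    `keywords` is assumed to be a set of lower-case words, so the
    lookup ignores the case used in the source text.
    """
    mode = keyword_case if word.lower() in keywords else variable_case
    if mode == "upper":
        return word.upper()
    if mode == "lower":
        return word.lower()
    return word  # any other setting leaves the word alone
```

A plain search for the first */ would cut a comment such as /* a /* b */ c */ short after b, which is why the depth counter is needed.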