Getting Started with JavaCC

  1. Download the sample files Exp.jj and Calc1i.jj from our webpage. These will be used as test cases for the JavaCC compiler compiler.  Download them into a directory we'll call Try.
  2. Download the JavaCC compiler compiler from https://javacc.dev.java.net/ .  My zip file was named javacc-3.2.
  3. Extract the files from the zip file.  I extracted them into a directory called C:\javacc
  4. Javacc is run from the command prompt.  In order to make javacc accessible from the command prompt window, you need to know the full path of javacc. On my machine, it is in C:\javacc\javacc-3.2\bin. If you don't know where the system put it, do a Start\Search\For Files or Folders\ to find it. You can test this location by going to the directory containing the Exp.jj file  (Try) and at the dos prompt typing,

C:\javacc\javacc-3.2\bin\javacc Exp.jj

 If this doesn't work, there is something wrong with your installation (or your typing). You want to make this path part of your path variable. There are several ways to do this, depending on the OS you are running. The options are explained below.

  5. If you have done your own install of the JDK, you will recall having set the path variable. You need to add the javacc path to it. You can set it permanently, as directed in the JDK install: right-click My Computer, select Properties, then Advanced, then Environment Variables, and edit the user variables, adding the path to javacc to the end of the current path variable. Alternatively, you can set it temporarily from the command prompt. You just type

set path=directoryToSearch

On my machine, I type

set path=%path%;"C:\javacc\javacc-3.2\bin"

The %path% part makes sure that my previous path settings are not destroyed; I am only concatenating my new path onto the old. To check that it worked, type path from the command prompt. It should echo the full path, including the javacc directory.
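If typing path and reading the output by eye is error-prone, the same check can be sketched in Java. This is only an illustration: the class name PathCheck and the method findOnPath are made up for this handout, and the launcher file names javacc.bat and javacc are an assumption about what the bin directory contains.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class PathCheck {
    // Returns every directory in pathValue that contains a file named fileName.
    static List<String> findOnPath(String pathValue, String fileName) {
        List<String> hits = new ArrayList<>();
        if (pathValue == null) {
            return hits;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            // Entries may be quoted, as in the set command above.
            String clean = dir.replace("\"", "").trim();
            if (clean.isEmpty()) {
                continue;
            }
            if (new File(clean, fileName).isFile()) {
                hits.add(clean);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        // Assumption: the launcher is javacc.bat on Windows and javacc elsewhere.
        for (String name : new String[] {"javacc.bat", "javacc"}) {
            for (String dir : findOnPath(path, name)) {
                System.out.println(name + " found in " + dir);
            }
        }
    }
}
```

If the program prints nothing, no directory on the current path contains a javacc launcher, and the set command needs to be repeated in this command prompt window.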

If I ever forget how to set the path, I do a Google search to find instructions.

  6. From the command prompt, make sure you are in the Try directory.  Run javacc on the grammar input file to generate the Java files that implement the parser and the lexical analyzer (or token manager). It may print more output lines than you were expecting, but they aren't errors.   You type the following:

javacc Exp.jj

 

  7. Now compile the resulting Java programs by typing the following:

              javac *.java

  8. The parser is now ready to use. To run the parser, type:

java Exp

The parser asks you to input an expression, and it reports each identifier, number, and parenthesized expression as it reads them.

Looking at the Exp.jj  code: Compilation Unit

Take a look at the Exp.jj file which was input to javacc.  The first part looks like:

PARSER_BEGIN(Exp)
public class Exp {
  public static void main(String args[]) throws ParseException {
    Exp parser = new Exp(System.in);
    parser.ExpressionList();
  }
}
PARSER_END(Exp)

 

The Java compilation unit is enclosed between "PARSER_BEGIN(name)" and "PARSER_END(name)". This compilation unit can be of arbitrary complexity. The only constraint is that it must define a class called "name", the same as the argument to PARSER_BEGIN and PARSER_END.   In Exp.jj the name is Exp, which is why main contains Exp parser = new Exp(System.in);

This “Exp” is the name that is used as the prefix for the Java files generated by the parser generator. The parser code that is generated is inserted immediately before the closing brace of the class called "name".

In the example, the class in which the parser is generated contains a main program. This main program creates an instance of the parser object (an object of type Exp) by using a constructor that takes one argument of type java.io.InputStream ("System.in" in this case).

The main program then makes a call to the non-terminal in the grammar that it would like to parse - "ExpressionList" in this case. All non-terminals have equal status in a JavaCC generated parser, and hence one may parse with respect to any grammar non-terminal, as a specific start symbol is not identified.

Tokens

The regular expression:

< ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >

creates a new regular expression whose name is ID. This can be referred to anywhere else in the grammar simply as <ID>. What follows in square brackets is a set of allowable characters - in this case it is any of the lower or upper case letters or the underscore. This is followed by 0 or more occurrences of any of the lower or upper case letters, digits, or the underscore. (The ID token in the Exp.jj listing later in this handout is the same except that it leaves out the underscore.)
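For readers more used to Java's own regular expressions, the ID definition above can be approximated with java.util.regex. This is a sketch for comparison only; JavaCC does not use java.util.regex, and the class name IdPattern is made up for this illustration.

```java
import java.util.regex.Pattern;

class IdPattern {
    // A letter or underscore, then any number of letters, digits, or underscores.
    static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*");

    // True only when the whole string is a valid identifier.
    static boolean isId(String s) {
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x", "_tmp1", "9lives", "a-b", ""}) {
            System.out.println("\"" + s + "\" -> " + isId(s));
        }
    }
}
```

The first character class corresponds to the leading [...] in the JavaCC definition, and the starred class corresponds to the ( [...] )* part.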

Other constructs that may appear in regular expressions are:

  ( ... )+     : One or more occurrences of ...
  ( ... )?     : An optional occurrence of ... (Note that in the case
        of lexical tokens, (...)? and [...] are not equivalent)
  ( r1 | r2 | ... ) : Any one of r1, r2, ...

A construct of the form [...] is a pattern that is matched by the characters specified in ... . These characters can be individual characters or character ranges. A ~ before this construct is a pattern that matches any character not specified in ... . Therefore:

  ["a"-"z"] matches all lower case letters
  ~[] matches any character
  ~["\n","\r"] matches any character except the new line characters
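The constructs listed above have close counterparts in java.util.regex, which may help when comparing notations. The sketch below is a rough translation, not what JavaCC does internally; the class name CharClassDemo and the OPT_SIGN pattern are invented for illustration. Note that ~[] matches any character including line breaks, whereas a plain "." in Java does not by default, so the sketch uses [\s\S] instead.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    static final Pattern LOWER = Pattern.compile("[a-z]");           // ["a"-"z"]
    static final Pattern ANY = Pattern.compile("[\\s\\S]");          // ~[]
    static final Pattern NOT_NEWLINE = Pattern.compile("[^\\n\\r]"); // ~["\n","\r"]
    static final Pattern DIGITS = Pattern.compile("[0-9]+");         // ( ["0"-"9"] )+
    static final Pattern OPT_SIGN = Pattern.compile("-?[0-9]+");     // ( "-" )? ( ["0"-"9"] )+

    // True only when the pattern matches the whole string.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("LOWER on q: " + matches(LOWER, "q"));
        System.out.println("ANY on newline: " + matches(ANY, "\n"));
        System.out.println("NOT_NEWLINE on newline: " + matches(NOT_NEWLINE, "\n"));
        System.out.println("DIGITS on 42: " + matches(DIGITS, "42"));
        System.out.println("OPT_SIGN on -7: " + matches(OPT_SIGN, "-7"));
    }
}
```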

You will note that the notation for regular expressions is slightly different from the one we used in class, so take note of the differences.  When a regular expression is used in an expansion, it yields a value of type "Token". The Token class is written to the output directory as "Token.java", alongside the other generated files. In the Exp.jj example, the Factor production declares a variable t of type "Token" and assigns it the value of the matched regular expression (t=<ID>).
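To make the idea of a token value concrete, here is a small hand-written sketch of a token type and a tokenizer. It is not the generated Token.java or token manager: MiniToken, MiniLexer, and the kind codes are hypothetical names, and only the two fields the examples in this handout use (kind and image) are modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated Token class: only kind and image.
class MiniToken {
    static final int ID = 1, NUM = 2, OTHER = 3; // illustrative kind codes
    final int kind;      // which regular expression matched
    final String image;  // the exact text that was matched

    MiniToken(int kind, String image) {
        this.kind = kind;
        this.image = image;
    }
}

class MiniLexer {
    // Leading whitespace is skipped; then an ID, a NUM, or any single other character.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(?:([a-zA-Z][a-zA-Z0-9]*)|([0-9]+)|(\\S))");

    static List<MiniToken> tokenize(String input) {
        List<MiniToken> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.lookingAt()) {
            if (m.group(1) != null) {
                tokens.add(new MiniToken(MiniToken.ID, m.group(1)));
            } else if (m.group(2) != null) {
                tokens.add(new MiniToken(MiniToken.NUM, m.group(2)));
            } else {
                tokens.add(new MiniToken(MiniToken.OTHER, m.group(3)));
            }
            m.region(m.end(), input.length()); // continue after this token
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (MiniToken t : tokenize("x1 + 42")) {
            System.out.println("kind=" + t.kind + " image=" + t.image);
        }
    }
}
```

A production can then ask a token what it matched (image) and which rule matched it (kind), which is how t.image and x.kind are used in the grammar listings in this handout.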

Looking at the Exp.jj code: Productions

The next section consists of a list of productions. In this example, there are four productions, which define the non-terminals ExpressionList, Expression, Term, and Factor. In JavaCC grammars, non-terminals are written and implemented (by JavaCC) as Java methods. When the non-terminal is used on the left-hand side of a production, it is considered to be declared and its syntax follows the Java syntax. On the right-hand side its use is similar to a method call in Java.

Each production names its left-hand side non-terminal, followed by a colon.  Java code (surrounded by braces) is interspersed in the production, which makes it harder to read.  The first pair of braces holds declarations; later braces hold code to be executed as the production is applied during parsing. (In this example there are often no declarations, so the first pair appears as {}.)  When the syntax rules specify actions such as producing output or generating code, we call it syntax directed translation: the syntax directs how the code is translated.

The first production in Exp.jj says that the non-terminal "ExpressionList" expands to zero or more occurrences of the non-terminal "Expression", each followed by a semicolon. The whole thing is followed by EOF (end of file).

The second production in Exp.jj says that the non-terminal "Expression" expands to a "Term" followed by zero or more occurrences of a plus sign followed by another "Term".
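Since each non-terminal becomes a Java method, the productions of Exp.jj can be sketched by hand as a small recursive-descent parser. The code below is not what JavaCC generates; it is a simplified illustration that reads characters from a String, and the class name ExpSketch, its helper methods, and the log field are invented for this handout.

```java
class ExpSketch {
    private final String in;
    private int pos = 0;
    final StringBuilder log = new StringBuilder(); // records what was read

    ExpSketch(String in) {
        this.in = in;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) {
            pos++;
        }
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalStateException("expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // ExpressionList -> ( Expression ";" )* <EOF>
    void expressionList() {
        while (peek() != '\0') {
            expression();
            expect(';');
        }
    }

    // Expression -> Term ( "+" Term )*
    void expression() {
        term();
        while (peek() == '+') {
            pos++;
            term();
        }
    }

    // Term -> Factor ( "*" Factor )*
    void term() {
        factor();
        while (peek() == '*') {
            pos++;
            factor();
        }
    }

    // Factor -> <ID> | <NUM> | "(" Expression ")"
    void factor() {
        char c = peek();
        int start = pos;
        if (c == '(') {
            pos++;
            expression();
            expect(')');
            log.append("Just read a parenthesized expression\n");
        } else if (isLetter(c)) {
            while (pos < in.length() && (isLetter(in.charAt(pos)) || isDigit(in.charAt(pos)))) {
                pos++;
            }
            log.append("Just read a ").append(in, start, pos).append('\n');
        } else if (isDigit(c)) {
            while (pos < in.length() && isDigit(in.charAt(pos))) {
                pos++;
            }
            log.append("Just read a ").append(in, start, pos).append('\n');
        } else {
            throw new IllegalStateException("unexpected input at position " + pos);
        }
    }

    public static void main(String[] args) {
        ExpSketch parser = new ExpSketch("a + 2 * (b + 3);");
        parser.expressionList();
        System.out.print(parser.log);
    }
}
```

Each ( ... )* in a production turns into a while loop, each choice turns into an if/else chain on the next input character, and each use of a non-terminal on the right-hand side turns into a method call.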

Square brackets [...] in a JavaCC input file indicate that the ... is optional.

[...] may also be written as (...)?. These two forms are equivalent. Other structures that may appear in expansions are:

   e1 | e2 | e3 | ... : A choice of e1, e2, e3, etc.
   ( e )+             : One or more occurrences of e
   ( e )*             : Zero or more occurrences of e

Note that these may be nested within each other, so we can have something like:

(( e1 | e2 )* [ e3 ] ) | e4

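After compiling and typing java Exp, type a sequence of expressions, each ending with a semicolon, followed by a return and an end of file (CTRL-D on UNIX machines; CTRL-Z followed by Enter at a Windows command prompt). If this is a problem on your machine, you can create a file and redirect it as input to the generated parser in this manner

java Exp < myfile

Redirection also does not work on all machines. If this is a problem, just replace "System.in" in the grammar file with 'new FileInputStream("testfile")' and place your input inside "testfile". FileInputStream lives in java.io, and its constructor throws java.io.FileNotFoundException, so main must import or fully qualify the class and declare or catch that exception.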
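One way to avoid editing the grammar each time is to choose the input stream at run time. The sketch below shows only the stream selection; InputChoice and open are hypothetical names, and the byte-counting loop merely stands in for the place where a generated parser would be given the stream.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

class InputChoice {
    // Returns a stream over the named file, or System.in when no file name is given.
    static InputStream open(String[] args) throws FileNotFoundException {
        return args.length > 0 ? new FileInputStream(args[0]) : System.in;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = open(args);
        // A generated parser would receive the stream here, for example new Exp(in).
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        System.out.println("Read " + count + " bytes");
    }
}
```

Run it as java InputChoice testfile to read from a file, or with no argument to read from the keyboard until end of file.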


Exp.jj Code

PARSER_BEGIN(Exp)
public class Exp {
  public static void main(String args[]) throws ParseException {
    Exp parser = new Exp(System.in);
    parser.ExpressionList();   
        // Notice this calls the start symbol for the grammar
  }
}
 
PARSER_END(Exp)
SKIP :
{ " " | "\t" | "\n" | "\r" }
 
TOKEN :
{ < ID: ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"] )* >
| < NUM: ( ["0"-"9"] )+ >
}
 
void ExpressionList() :
{ String s; }
{ { System.out.println(
             "Please type in an expression followed by a \";\" or ^D to quit:");
          System.out.println(""); }
  ( Expression() ";" )* <EOF>
}
 
void Expression() :
{ }
{ Term() ( "+" Term() )* }
 
void Term() :
{ }
{ Factor() ( "*" Factor() )* }
 
void Factor() :
{ Token t; String s; }
{ t=<ID>
        { System.out.println("Just read a " +t.image); }
| t=<NUM>
        { System.out.println("Just read a " + t.image); }
| "(" Expression() ")"
        { System.out.println("Just read a parenthesized expression");        }
}

Here is another example called Calc1i.jj (downloadable from our class webpage)

/* This is the basic expression grammar for four function
 * Expressions. The grammar supports the plus (+), minus (-)
 * multiply (*), and divide (/) operations.
 */
options { LOOKAHEAD=1; }
PARSER_BEGIN(Calc1i)
public class Calc1i {
    // The next two declarations are for global variables, usable in
    //any production
    static int total;  // Total value
    
    static java.util.Stack argStack = new java.util.Stack(); 
        // evaluation stack
 
    public static void main(String args[]) throws ParseException {
    Calc1i parser = new Calc1i(System.in);
    while (true) {
        System.out.print("Enter Expression: ");
        System.out.flush();
        try { switch (parser.one_line()){//call to grammar start symbol
              case -1: System.exit(0);
              case 0: break;
              case 1:  // result is stored on top of stack
                int x = ((Integer) argStack.pop()).intValue();
                System.out.println("Total = " + x);
                break;
          }
        } catch (ParseException x) {
        System.out.println("Exiting."); throw x;
        }
    } }
}
PARSER_END(Calc1i)
SKIP :
{ " " |    "\r" |    "\t" }
   // Tokens (terminals) are defined by regular expressions
TOKEN : { < EOL: "\n" > }
TOKEN : /* OPERATORS */
{      < PLUS: "+" >
  |    < MINUS: "-" >
  |    < MULTIPLY: "*" >
  |    < DIVIDE: "/" >
}
 
TOKEN :
{         < CONSTANT: ( <DIGIT> )+ >
|   < #DIGIT: ["0" - "9"] >    // # begins internal definition
                               // (used in rule itself)
}
 
int one_line() :
{}
{    sum() <EOL> { return 1; }
  |  <EOL> { return 0; }
  |  <EOF> { return -1; }
}
 
void sum() : // Production rule: sum -> term ((+|-) term)*
{Token x;}  // local variable to store token which was matched
{ term()( 
        ( x = <PLUS> | x = <MINUS> ) term()
        {
          int a = ((Integer) argStack.pop()).intValue();
          int b = ((Integer) argStack.pop()).intValue();
          if ( x.kind == PLUS )  // query local variable for type
             argStack.push(new Integer(b + a));
          else
             argStack.push(new Integer(b - a));
        }
        )*
}
 
void term() :
{Token x;}
{ unary() ( 
          ( x = <MULTIPLY> | x = <DIVIDE> ) unary()
           {    int a = ((Integer) argStack.pop()).intValue();
                int b = ((Integer) argStack.pop()).intValue();
                if ( x.kind == MULTIPLY )
                   argStack.push(new Integer(b * a));
                else
                   argStack.push(new Integer(b / a));
           }
          )*
}
 
void unary() :
{}
{ <MINUS> element()
    {   int a = ((Integer) argStack.pop()).intValue();
        argStack.push(new Integer(- a));
    }
    | element()  
         // no need to place value on stack as element() has already pushed it
}
 
void element() :
{}
{   <CONSTANT>
    {   try {int x = Integer.parseInt(token.image);
               // token.image contains actual value matched by CONSTANT
            argStack.push(new Integer(x));
        } catch (NumberFormatException ee) {
        argStack.push(new Integer(0));}
    }
    |  "(" sum() ")"
}

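The actions in sum() and term() all follow the same stack discipline: pop the right operand, pop the left operand, push the result. The sketch below isolates that step in plain Java so the operand order is easy to see. It uses java.util.ArrayDeque rather than the raw java.util.Stack of the listing, and StackEval and apply are hypothetical names for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackEval {
    // Pops two operands, applies op, and pushes the result back.
    static void apply(Deque<Integer> stack, char op) {
        int a = stack.pop(); // right operand, pushed last
        int b = stack.pop(); // left operand, pushed first
        switch (op) {
            case '+': stack.push(b + a); break;
            case '-': stack.push(b - a); break;
            case '*': stack.push(b * a); break;
            case '/': stack.push(b / a); break;
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    public static void main(String[] args) {
        // Evaluate 7 - 2 * 3 in the order the parser would reach the actions.
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(7);
        stack.push(2);
        stack.push(3);
        apply(stack, '*'); // term() finishes 2 * 3 first
        apply(stack, '-'); // then sum() computes 7 - 6
        System.out.println("7 - 2 * 3 = " + stack.pop()); // prints 7 - 2 * 3 = 1
    }
}
```

Because the right operand is on top of the stack, the code computes b - a and b / a rather than a - b and a / b; swapping them would give wrong answers for subtraction and division.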