C:\javacc\javacc-3.2\bin. If you don't know where
the system put it, use Start\Search\For Files or Folders\ to find it. You can
test this location by going to the directory containing the Exp.jj file (the Try directory) and typing the following at the DOS prompt:
C:\javacc\javacc-3.2\bin\javacc Exp.jj
If this doesn't work, there is
something wrong with your installation (or your typing). You want to make this
path part of your path variable.
There are several ways to do this, depending on the OS you are running. The
options are explained below.
5.
If you have done your own install of the JDK, you will recall
setting the path variable. You need to add the javacc directory to it. You can
either set it permanently (as directed in the JDK install: right-click My Computer,
select Properties, then Advanced, then Environment Variables, and edit the user
variables, adding the javacc directory to the end of the current path variable) or
set it temporarily from the command prompt. To set it temporarily, you type
set path=directoryToSearch
On my machine, I type
set path=%path%;"C:\javacc\javacc-3.2\bin"
The %path% part makes sure that my previous path settings are not destroyed; I am
only appending the new directory to the old path. To check that it worked, type path at the command prompt. It should echo the full
path, now ending with the javacc directory.
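You can also run this check from Java. The following is a minimal sketch. It assumes a Windows-style PATH, where entries are separated by ";", may be wrapped in double quotes, and are compared case-insensitively. The class and method names are hypothetical and are not part of the JDK or JavaCC.

```java
import java.io.File;
import java.util.regex.Pattern;

/** Hypothetical helper: reports whether a directory appears as an entry of a PATH-style string. */
public class PathCheck {

    // Splits the PATH value on the separator, strips surrounding double quotes
    // (as in set path=%path%;"C:\javacc\javacc-3.2\bin"), and compares each entry
    // case-insensitively, as Windows does. On UNIX you would compare case-sensitively.
    public static boolean onPath(String pathValue, String dir, String separator) {
        for (String entry : pathValue.split(Pattern.quote(separator))) {
            String e = entry.trim();
            if (e.length() >= 2 && e.startsWith("\"") && e.endsWith("\"")) {
                e = e.substring(1, e.length() - 1);
            }
            if (e.equalsIgnoreCase(dir)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH"); // may be null if PATH is unset
        String dir = args.length > 0 ? args[0] : "C:\\javacc\\javacc-3.2\\bin";
        boolean found = path != null && onPath(path, dir, File.pathSeparator);
        System.out.println(dir + (found ? " is on the PATH" : " is NOT on the PATH"));
    }
}
```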
If I ever forget how to set the path, I do a Google search to find
instructions.
From the command prompt, make sure you are in the Try directory. Run javacc on the grammar input file to
generate the Java files that implement the parser and lexical analyzer
(or token manager). JavaCC may print more messages than you expect, but
they aren't errors. Then compile the generated files and run the parser. You type the
following:
javacc Exp.jj
javac *.java
java Exp
The parser asks you to type in expressions and reports each identifier and number
it reads.
PARSER_BEGIN(Exp)
public class Exp {
    public static void main(String args[]) throws ParseException {
        Exp parser = new Exp(System.in);
        parser.ExpressionList();
    }
}
PARSER_END(Exp)
The Java compilation unit is enclosed between
"PARSER_BEGIN(name)" and "PARSER_END(name)". This compilation
unit can be of arbitrary complexity. The only constraint on it is that it
must define a class called "name", the same as the
argument to PARSER_BEGIN and PARSER_END. That is why we see
Exp parser = new Exp(System.in);
This "Exp" is also the name used as the prefix for the Java files generated by the
parser generator (for example, ExpTokenManager.java and ExpConstants.java). The generated parser code is inserted immediately
before the closing brace of the class called "name".
In the example,
the class in which the parser is generated contains a main program. This main
program creates an instance of the parser object (an object of type Exp) by
using a constructor that takes one argument of type java.io.InputStream
("System.in" in this case).
The main program
then calls the method for the grammar non-terminal that it would like to
parse, "ExpressionList" in this case. All non-terminals have equal status
in a JavaCC-generated parser, so one may parse with respect to any
grammar non-terminal; JavaCC does not designate a specific start symbol.
The regular
expression:
< ID:
["a"-"z","A"-"Z","_"] (
["a"-"z","A"-"Z","_","0"-"9"]
)* >
creates a new
regular expression named ID. It can be referred to anywhere else in the
grammar simply as <ID>. The first set in square brackets lists the
allowable first characters: in this case, any lower or upper case
letter or the underscore. This is followed by zero or more occurrences of any
lower or upper case letter, digit, or underscore.
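The sketch below writes the same ID token with java.util.regex, for comparison with Java regular expressions you may already know. The class name IdPattern is hypothetical. The sketch only shows how the JavaCC character lists map to a Java character class. It does not show how JavaCC matches tokens: JavaCC's token manager picks the longest match in the input stream, whereas matches() here tests a single whole string.

```java
import java.util.regex.Pattern;

public class IdPattern {
    // Java regex equivalent of the JavaCC token
    //   < ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >
    // JavaCC lists characters and ranges as quoted strings separated by commas;
    // java.util.regex writes the same character class without quotes or commas.
    static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*");

    static boolean isId(String s) {
        return ID.matcher(s).matches(); // the whole string must match
    }

    public static void main(String[] args) {
        for (String s : new String[] {"count", "_tmp2", "x9y", "9x", "a-b"}) {
            System.out.println(s + " -> " + isId(s));
        }
    }
}
```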
Other constructs
that may appear in regular expressions are:
( ... )+ : One or more occurrences of ...
( ... )? : An optional occurrence of ... (Note that in the case of lexical tokens, (...)? and [...] are not equivalent.)
( r1 | r2 | ... ) : Any one of r1, r2, ...
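Here is a sketch of the same operators in java.util.regex notation. The two tokens are hypothetical examples that do not appear in Exp.jj. A quoted string in JavaCC is always literal. So characters such as ".", "+" and "*" must be escaped when the same pattern is written as a Java regex.

```java
import java.util.regex.Pattern;

public class OperatorPatterns {
    // Hypothetical JavaCC token: < REAL: ("-")? (["0"-"9"])+ ( "." (["0"-"9"])+ )? >
    // ( ... )? becomes ?, ( ... )+ becomes +. The quoted "." is a literal dot
    // in JavaCC, so it is escaped as \. in the Java regex.
    static final Pattern REAL = Pattern.compile("-?[0-9]+(\\.[0-9]+)?");

    // Hypothetical JavaCC token: < OP: ( "+" | "-" | "*" | "/" ) >
    // Alternation keeps the same | notation; + and * must be escaped in Java.
    static final Pattern OP = Pattern.compile("\\+|-|\\*|/");

    public static void main(String[] args) {
        System.out.println(REAL.matcher("-12.5").matches()); // true
        System.out.println(REAL.matcher("12.").matches());   // false: digits must follow "."
        System.out.println(OP.matcher("*").matches());       // true
    }
}
```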
A construct of
the form [...] is a pattern that matches any one of the characters specified in ... .
These can be individual characters or character ranges. A ~ before this construct gives
a pattern that matches any character not specified in ... . Therefore:
["a"-"z"] matches all lower case letters
~[] matches any character
~["\n","\r"] matches any character except the newline characters
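For comparison, the three examples above translate to Java regex character classes as shown below; ~ corresponds to ^ inside the brackets. This is a sketch for comparison only. The LINE_COMMENT token is a hypothetical illustration of a typical use of a negated class.

```java
import java.util.regex.Pattern;

public class CharClassPatterns {
    // ["a"-"z"]     -> [a-z]     any one lower case letter
    static final Pattern LOWER = Pattern.compile("[a-z]");
    // ~["\n","\r"]  -> [^\n\r]   any character except a newline character
    static final Pattern NOT_NEWLINE = Pattern.compile("[^\\n\\r]");
    // ~[]           -> [\s\S]    any character at all, newlines included
    // (a plain . would not match line terminators unless Pattern.DOTALL is set)
    static final Pattern ANY = Pattern.compile("[\\s\\S]");

    // Hypothetical single-line comment token built from the negated class:
    //   < LINE_COMMENT: "//" (~["\n","\r"])* >
    static final Pattern LINE_COMMENT = Pattern.compile("//[^\\n\\r]*");

    public static void main(String[] args) {
        System.out.println(LOWER.matcher("q").matches());             // true
        System.out.println(NOT_NEWLINE.matcher("\n").matches());      // false
        System.out.println(ANY.matcher("\n").matches());              // true
        System.out.println(LINE_COMMENT.matcher("// hi").matches());  // true
    }
}
```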
You will note
that the notation for regular expressions is slightly different from the one we
used in class, so take note of the differences. When a regular expression is used in an
expansion, it has a value of type "Token". This class is generated into
the parser directory as "Token.java". In the Exp.jj example,
Factor declares a variable t of type "Token" and assigns the value of
the regular expression to it (t=<ID>).
The next section
consists of a list of productions. In this example, there are four productions, which
define the non-terminals ExpressionList, Expression, Term, and Factor. In JavaCC grammars, non-terminals are written and implemented
(by JavaCC) as Java methods. When a non-terminal is used on the left-hand
side of a production, it is considered to be declared, and its syntax follows
Java method-declaration syntax. On the right-hand side, its use is similar to a method call in
Java.
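To see what "non-terminals as methods" means, here is a hand-written recursive-descent sketch of the Exp.jj grammar in plain Java. It is not the code JavaCC generates, which uses a token manager and its own internal helper methods. The class ExpSketch, its tiny tokenizer, and the log list are assumptions made so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written parser: one Java method per Exp.jj non-terminal. */
public class ExpSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;
    public final List<String> log = new ArrayList<>(); // lines the grammar actions would print

    public ExpSketch(String input) {
        // Minimal tokenizer standing in for the generated token manager:
        // ID = letter (letter|digit)*, NUM = digit+, any other character is a
        // one-character token, and white space is skipped (like the SKIP section).
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            } else if (isLetter(c)) {
                while (i < input.length() && (isLetter(input.charAt(i)) || isDigit(input.charAt(i)))) i++;
            } else if (isDigit(c)) {
                while (i < input.length() && isDigit(input.charAt(i))) i++;
            } else {
                i++;
            }
            tokens.add(input.substring(start, i));
        }
    }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isIdOrNum(String t) { return isLetter(t.charAt(0)) || isDigit(t.charAt(0)); }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + ", found " + peek());
        pos++;
    }

    // ExpressionList ::= ( Expression ";" )* <EOF>
    public void expressionList() {
        while (isIdOrNum(peek()) || peek().equals("(")) { // loop while an Expression can start
            expression();
            expect(";");
        }
        expect("<EOF>");
    }

    // Expression ::= Term ( "+" Term )*
    public void expression() {
        term();
        while (peek().equals("+")) { expect("+"); term(); }
    }

    // Term ::= Factor ( "*" Factor )*
    public void term() {
        factor();
        while (peek().equals("*")) { expect("*"); factor(); }
    }

    // Factor ::= <ID> | <NUM> | "(" Expression ")"
    public void factor() {
        String t = peek();
        if (isIdOrNum(t)) {
            pos++;
            log.add("Just read a " + t);
        } else if (t.equals("(")) {
            expect("(");
            expression();
            expect(")");
            log.add("Just read a parenthesized expression");
        } else {
            throw new IllegalStateException("unexpected " + t);
        }
    }

    public static void main(String[] args) {
        ExpSketch p = new ExpSketch("x + 2 * (y);");
        p.expressionList();
        p.log.forEach(System.out::println);
    }
}
```

Because every non-terminal is just a method, you could equally call term() directly on the input "a*b". This is the point made earlier about there being no designated start symbol.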
Each production
defines its left-hand side non-terminal followed by a colon. Java code (surrounded by braces) is interspersed in
the production, which makes it harder to read.
The first braced block after the colon holds local declarations; the second holds the expansion itself, and the braced blocks inside it hold code to be executed as the production
is applied during parsing. (In this example, there are often no
declarations, so the first block appears as {}.) When the syntax rules specify actions
such as producing output or generating code, we call it syntax-directed translation: the syntax directs how the code
is translated.
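As a small illustration of syntax-directed translation, the sketch below attaches an output action to each production of the Expression/Term/Factor grammar. Parsing an infix expression then emits its postfix form. The sketch is plain Java rather than a .jj file. It assumes single-character operands and no operators other than + and *. The class name ToPostfix is hypothetical.

```java
/** Hypothetical translator: the action in each production emits postfix output. */
public class ToPostfix {
    private final String in;
    private int pos = 0;
    private final StringBuilder out = new StringBuilder();

    private ToPostfix(String input) { this.in = input.replace(" ", ""); }

    public static String translate(String input) {
        ToPostfix p = new ToPostfix(input);
        p.expression();
        if (p.pos != p.in.length()) throw new IllegalArgumentException("unexpected " + p.in.charAt(p.pos));
        return p.out.toString().trim();
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    // Expression ::= Term ( "+" Term { emit "+" } )*
    private void expression() {
        term();
        while (peek() == '+') { pos++; term(); out.append("+ "); }
    }

    // Term ::= Factor ( "*" Factor { emit "*" } )*
    private void term() {
        factor();
        while (peek() == '*') { pos++; factor(); out.append("* "); }
    }

    // Factor ::= operand { emit operand } | "(" Expression ")"
    private void factor() {
        char c = peek();
        if (Character.isLetterOrDigit(c)) {
            pos++;
            out.append(c).append(' ');
        } else if (c == '(') {
            pos++;
            expression();
            if (peek() != ')') throw new IllegalArgumentException("missing )");
            pos++;
        } else {
            throw new IllegalArgumentException("unexpected " + (c == '\0' ? "end of input" : String.valueOf(c)));
        }
    }

    public static void main(String[] args) {
        System.out.println(translate("a+b*c"));   // a b c * +
        System.out.println(translate("(a+b)*c")); // a b + c *
    }
}
```

The action for "+" runs after the second Term has been parsed. That is why the operator appears after both operands in the output.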
The first
production in Exp.jj says that the non-terminal "ExpressionList"
expands to zero or more occurrences of the non-terminal "Expression", each followed by a
semicolon. The whole thing is followed by EOF (end of file).
The second
production in Exp.jj says that the non-terminal "Expression" expands
to a Term followed by zero or more occurrences of a plus sign followed by a Term.
Square brackets [...] in an expansion in a JavaCC input file
indicate that the ... is optional.
[...] may also be written as (...)?. In expansions, these two forms are
equivalent. Other structures that may appear in expansions are:
e1 | e2 | e3 | ... : A choice of e1, e2, e3, etc.
( e )+ : One or more occurrences of e
( e )* : Zero or more occurrences of e
Note that these
may be nested within each other, so we can have something like:
(( e1 | e2 )* [ e3 ] ) | e4
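When a production becomes a method, each expansion operator corresponds to a simple control-flow construct:

- [ ... ] is an if.
- ( ... )* is a while loop.
- ( ... )+ is a do-while loop.
- A choice e1 | e2 is an if/else (or switch) on the next input symbol.

The sketch below shows this for a hypothetical grammar that is not part of Exp.jj. It is hand-written Java, not JavaCC output, and assumes the input contains no spaces.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical grammar, parsed by hand to show the control flow:
 *   Args ::= "(" [ Arg ( "," Arg )* ] ")"
 *   Arg  ::= ( digit )+ | letter
 */
public class ExpansionSketch {
    private final String in;
    private int pos = 0;

    private ExpansionSketch(String in) { this.in = in; }

    public static List<String> parse(String input) {
        ExpansionSketch p = new ExpansionSketch(input);
        List<String> result = p.args();
        if (p.pos != input.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return result;
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        pos++;
    }

    private boolean startsArg(char c) { return Character.isDigit(c) || Character.isLetter(c); }

    private List<String> args() {
        List<String> result = new ArrayList<>();
        expect('(');
        if (startsArg(peek())) {            // [ ... ] : optional, so a plain if
            result.add(arg());
            while (peek() == ',') {         // ( ... )* : zero or more, so a while loop
                pos++;
                result.add(arg());
            }
        }
        expect(')');
        return result;
    }

    private String arg() {
        int start = pos;
        if (Character.isDigit(peek())) {    // e1 | e2 : choose by the next character
            do {                            // ( ... )+ : one or more, so a do-while
                pos++;
            } while (Character.isDigit(peek()));
        } else if (Character.isLetter(peek())) {
            pos++;
        } else {
            throw new IllegalArgumentException("expected an argument at " + pos);
        }
        return in.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("(12,x,7)")); // [12, x, 7]
        System.out.println(parse("()"));       // []
    }
}
```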
After compiling
and typing java Exp, type a sequence of expressions, each terminated by a semicolon, followed by a return and an
end of file (CTRL-D on UNIX machines, CTRL-Z followed by Enter on Windows). If this is a problem on your machine,
you can create a file and redirect it as input to the generated parser in this
manner:
java Exp < myfile
Redirection also does
not work on all machines. If this is a problem, just replace
"System.in" in the grammar file with
new java.io.FileInputStream("testfile"), add java.io.FileNotFoundException to the throws clause of main, and place your input inside "testfile".
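Instead of editing the grammar each time, main can choose its input at run time. It reads a file when a name is given on the command line, and System.in otherwise. This is a sketch using a hypothetical helper class, InputChooser. In Exp.jj you would write the same expression directly in main, e.g. new Exp(args.length > 0 ? new java.io.FileInputStream(args[0]) : System.in), and add java.io.FileNotFoundException to its throws clause.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

/** Hypothetical helper: picks the parser's input stream from the command line. */
public class InputChooser {
    // With a file name argument, open that file; otherwise fall back to standard input.
    public static InputStream open(String[] args) throws FileNotFoundException {
        return args.length > 0 ? new FileInputStream(args[0]) : System.in;
    }

    public static void main(String[] args) throws FileNotFoundException {
        InputStream in = open(args);
        System.out.println(in == System.in ? "reading standard input" : "reading " + args[0]);
    }
}
```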
PARSER_BEGIN(Exp)
public class Exp {
    public static void main(String args[]) throws ParseException {
        Exp parser = new Exp(System.in);
        parser.ExpressionList(); // Notice this calls the start symbol for the grammar
    }
}
PARSER_END(Exp)

SKIP :
{ " " | "\t" | "\n" | "\r" }

TOKEN :
{
    < ID: ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"] )* >
  | < NUM: ( ["0"-"9"] )+ >
}

void ExpressionList() :
{ String s; }
{
    {
        System.out.println("Please type in an expression followed by a \";\" or ^D to quit:");
        System.out.println("");
    }
    ( Expression() ";" )* <EOF>
}

void Expression() :
{ }
{
    Term() ( "+" Term() )*
}

void Term() :
{ }
{
    Factor() ( "*" Factor() )*
}

void Factor() :
{ Token t; String s; }
{
    t=<ID> { System.out.println("Just read a " + t.image); }
  | t=<NUM> { System.out.println("Just read a " + t.image); }
  | "(" Expression() ")" { System.out.println("Just read a parenthesized expression"); }
}

/* This is the basic expression grammar for four function
 * expressions. The grammar supports the plus (+), minus (-),
 * multiply (*), and divide (/) operations.
 */
options { LOOKAHEAD=1; }

PARSER_BEGIN(Calc1i)
public class Calc1i {
    // The next two declarations are for global variables, usable in any production
    static int total; // Total value
    static java.util.Stack argStack = new java.util.Stack(); // evaluation stack

    public static void main(String args[]) throws ParseException {
        Calc1i parser = new Calc1i(System.in);
        while (true) {
            System.out.print("Enter Expression: ");
            System.out.flush();
            try {
                switch (parser.one_line()) { // call to grammar start symbol
                    case -1:
                        System.exit(0);
                    case 0:
                        break;
                    case 1: // result is stored on top of stack
                        int x = ((Integer) argStack.pop()).intValue();
                        System.out.println("Total = " + x);
                        break;
                }
            } catch (ParseException x) {
                System.out.println("Exiting.");
                throw x;
            }
        }
    }
}
PARSER_END(Calc1i)

SKIP :
{ " " | "\r" | "\t" }

// Tokens (terminals) are defined by regular expressions
TOKEN : { < EOL: "\n" > }

TOKEN : /* OPERATORS */
{
    < PLUS: "+" >
  | < MINUS: "-" >
  | < MULTIPLY: "*" >
  | < DIVIDE: "/" >
}

TOKEN :
{
    < CONSTANT: ( <DIGIT> )+ >
  | < #DIGIT: ["0" - "9"] > // # begins an internal definition, usable only inside other token definitions
}

int one_line() :
{}
{
    sum() <EOL> { return 1; }
  | <EOL> { return 0; }
  | <EOF> { return -1; }
}

void sum() : // Production rule: sum -> term ((+|-) term)*
{ Token x; } // local variable to store the token that was matched
{
    term()
    (
        ( x = <PLUS> | x = <MINUS> ) term()
        {
            int a = ((Integer) argStack.pop()).intValue();
            int b = ((Integer) argStack.pop()).intValue();
            if ( x.kind == PLUS ) // query local variable for its token kind
                argStack.push(new Integer(b + a));
            else
                argStack.push(new Integer(b - a));
        }
    )*
}

void term() :
{ Token x; }
{
    unary()
    (
        ( x = <MULTIPLY> | x = <DIVIDE> ) unary()
        {
            int a = ((Integer) argStack.pop()).intValue();
            int b = ((Integer) argStack.pop()).intValue();
            if ( x.kind == MULTIPLY )
                argStack.push(new Integer(b * a));
            else
                argStack.push(new Integer(b / a));
        }
    )*
}

void unary() :
{}
{
    <MINUS> element()
    {
        int a = ((Integer) argStack.pop()).intValue();
        argStack.push(new Integer(- a));
    }
  | element() // no need to place value on stack, as element() has already pushed it
}

void element() :
{}
{
    <CONSTANT>
    {
        try {
            int x = Integer.parseInt(token.image); // token.image contains the actual value matched by CONSTANT
            argStack.push(new Integer(x));
        } catch (NumberFormatException ee) {
            argStack.push(new Integer(0));
        }
    }
  | "(" sum() ")"
}
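The Calc1i actions evaluate expressions with an explicit stack. Every constant is pushed, and each operator action pops its two operands and pushes the result. The sketch below reproduces that scheme in plain Java so that it can be run without JavaCC. It is a stand-in, not the generated parser, and differs from it in three ways:

- It reads characters directly instead of tokens.
- It uses java.util.ArrayDeque instead of java.util.Stack.
- The class name StackCalc is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical plain-Java version of the Calc1i evaluation-stack scheme. */
public class StackCalc {
    private final String in;
    private int pos = 0;
    private final Deque<Integer> argStack = new ArrayDeque<>(); // evaluation stack

    private StackCalc(String in) { this.in = in.replace(" ", ""); }

    public static int eval(String line) {
        StackCalc c = new StackCalc(line);
        c.sum();
        if (c.pos != c.in.length()) throw new IllegalArgumentException("unexpected input at " + c.pos);
        return c.argStack.pop(); // the result is left on top of the stack
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    // sum ::= term ( ( "+" | "-" ) term )*
    private void sum() {
        term();
        while (peek() == '+' || peek() == '-') {
            char op = in.charAt(pos++);
            term();
            int a = argStack.pop(), b = argStack.pop(); // a was pushed last
            argStack.push(op == '+' ? b + a : b - a);
        }
    }

    // term ::= unary ( ( "*" | "/" ) unary )*
    private void term() {
        unary();
        while (peek() == '*' || peek() == '/') {
            char op = in.charAt(pos++);
            unary();
            int a = argStack.pop(), b = argStack.pop();
            argStack.push(op == '*' ? b * a : b / a);
        }
    }

    // unary ::= "-" element | element
    private void unary() {
        if (peek() == '-') {
            pos++;
            element();
            argStack.push(-argStack.pop());
        } else {
            element(); // element() has already pushed its value
        }
    }

    // element ::= <CONSTANT> | "(" sum ")"
    private void element() {
        if (Character.isDigit(peek())) {
            int start = pos;
            while (Character.isDigit(peek())) pos++;
            try {
                argStack.push(Integer.parseInt(in.substring(start, pos)));
            } catch (NumberFormatException e) {
                argStack.push(0); // Calc1i also pushes 0 for an out-of-range constant
            }
        } else if (peek() == '(') {
            pos++;
            sum();
            if (peek() != ')') throw new IllegalArgumentException("missing )");
            pos++;
        } else {
            throw new IllegalArgumentException("unexpected input at " + pos);
        }
    }

    public static void main(String[] args) {
        System.out.println("Total = " + eval("2*(3+4)-5")); // Total = 9
    }
}
```

Note the operand order. a is popped first because it was pushed last, so b - a and b / a preserve the left-to-right meaning. As a result, "7-2-1" evaluates to 4, not 6.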