Unlike the standard parser, the configurable parser can be configured at run time. It makes the process of adding new operators easier and allows other grammatical rules to be included.
It is straightforward to use the configurable parser rather than the default
standard parser. Just use the setComponent()
method as shown here:
Jep jep = new Jep(); jep.setComponent(new StandardConfigurableParser());
The parser has two main parts: a tokenizer and a grammar analyzer.
The tokenizer breaks the input into a series of tokens and the grammar analyzer reads these tokens and turns them into a tree of nodes. Each type of token is recognized by a TokenMatcher and new TokenMatchers can be added. There are Tokens and corresponding tokenMatchers for numbers, variables, strings, functions operators, white space and comments. New TokenMatchers can be added at run time to allow a configurable syntax.
After tokenizing, a filtering step is performed on the list of tokens. This is mainly used to remove white space and comments, although other operations on the list could be performed.
The final stage is to interpret the tokens and build a tree of nodes. This
stage uses the precedence rules of operators so that the expression 2*3+4
is correctly interpreted as (2*3)+4
and not 2*(3+4)
. The
core of the algorithm is an operator precedence parser using the shunting
yard algorithm, most of the grammar rules are constructed from the operators
specified in the OperatorTable and are build automatically. Additional grammar
rules can be specified by adding GrammarMatchers
to the parser. Such additional rules are used to specify the syntax for functions,
and lists.
There are two main classes StandardConfigurableParser is a parser with the most common configuration already set, and ConfigurableParser is a base parser which allows the full configuration to be specified. A number of methods are available to add different types of behaviour:
The order of these methods is important, the earlier matchers will be called before later ones. It is generally better to add the matchers in the order given above.
Other matchers are available, but these do not have corresponding methods exposed by ConfigurableParser.
For example com.singularsys.jep.configurableparser.matchers.HexNumberTokenMatcher() matches hexadecimal numbers.
This can be added to the parser using addTokenMatcher(new HexNumberTokenMatcher())
.
To allow new lexical elements, a new TokenMatcher should be added to the
list of token matchers used by the parser using the ConfigurableParser.addTokenMatcher()
method. A number of predefined TokenMatchers
are already defined. See the matchers
Javadoc for a list of these.
To create a new TokenMatcher, a new class implementing the TokenMatcher interface should be created. Typically this will sub-class one of the existing TokenMatchers.
package com.singularsys.jep.configurableparser.matchers; public interface TokenMatcher { /** Attempts to match the start of the string. * @param s the string to match against * @return if successful returns the corresponding token, * return null if failed to match */ public abstract Token match(String s); /** Initialize the matcher when the Jep instance is known. */ public void init(Jep j); }In general the match method should return one of the pre-defined tokens listed in tokens javadoc although other token types can be used if there is a corresponding GrammarMatcher.
Once created, the TokenMatcher needs to be added to the list of matchers used by the parser. The order is important as each matcher is called in turn and some input will match more than one type of input. Typically the full set of lists will need to be added in the correct order. See the example below.
Most changes to the syntax will simply consist of adding new operators or changing the symbol of existing operators. A simple example would be:
OperatorTable ot = jep.getOperatorTable(); // create a bit-wise complement operator Operator op = new Operator("~",new bitComp(),Operator.UNARY+Operator.PREFIX); // add it with the same precedence as not ot.addOperator(op, ot.getNot()); // informs the parser and other components about the new operator jep.reinitializeComponents();
Once added, the new operator is ready to be used in the parser. For more details on adding operator see Operators manual page.
New grammatical rules can be implemented by creating a class implementing the
GrammarMatcher interface.
and adding it the the parser using ConfigurableParser.addGrammarMatcher()
.
/** * Interface defining matchers for custom grammatical elements. * GrammarMatchers match syntax elements at the same precedence level * as brackets. */ public interface GrammarMatcher { /** Test whether the input matches this pattern. * @param it An iterator inspecting the input * @param parser the parser to use when evaluating sub-expressions * @return if matched returns a node representing the content, * returns null if does not match * @throws ParseException */ public Node match(Lookahead2Iterator>Token< it,GrammarParser parser) throws ParseException; /** Delayed initialization, this methods is called * whenever components of the Jep instance are changed. * @param jep */ public void init(Jep jep); }
The match method can query the next two tokens from the input using it.peekNext()
and it.nextNext()
if these tokens match the rule then the current position of the input should be advanced using it.consume()
.
If the rule does not match then the match method should return null
before calling it.consume()
.
Further tokens can be read using a combination of it.peekNext()
and it.consume()
.
Various methods of the Token
class can be used to query the type of token; for instance Token.isFunction()
,
Token.isIdentifier()
, Token.isNumber()
. The Token.equals(Object
o)
method can also be used to check the status of tokens.
For functions and lists it may be necessary to parse the arguments or list
elements. These can be parsed using the public Node parseSubExpression()
method of the GrammarParser
interface.
Once the input has been parsed, the resulting node needs to be assembled. Here the NodeFactory methods can be used to construct nodes of the appropriate type. The OperatorTable, FunctionTable, VariableTable and NumberFactory classes can also be used.
New syntactical features may require special symbols, for instance the [
and ]
used to represent lists. These symbols need to be recognized by the Tokenizer stage
and used later by appropriate GrammarMatchers, they are represented by SymbolToken.
The ConfigurableParser.addSymbols()
can be used to add symbols to the parser
and ConfigurableParser.getSymbolToken(String sym)
to return the corresponding SymbolToken.
The SymbolTokens can then be passed
in the constructor of a GrammarMatcher and the token's equals()
method used to test it it the same token
as in the input.
public class myGrammarMatcher { SymbolToken colon; public myGrammarMatcher(SymbolToken colon) { this.colon = colon; } public Node match(Lookahead2Iterator<Token> it,GrammarParser parser) throws ParseException; { Token t = it.peekNext(); // use this way round to avoid problems when t is null if (colon.equals(t)) .... } } // Create a special symbol and add it to the list ConfigurableParser cp = new ConfigurableParser(); ... cp.addSymbol(":"); SymbolToken st = cp.getSymbolToken(":"); ... GrammarMatcher gm = new myGrammarMatcher(st); cp.addGrammarMatcher(gm);
/** * A GrammarMatcher which matches functions in the form 'atan2(y,x)'. * The function must be in the FunctionTable and brackets are required. */ public class FunctionGrammarMatcher implements GrammarMatcher { Token open; // Token representing opening bracket Token close; // Token representing closing bracket Token comma; // Token representing argument separator NodeFactory nf; // The node factory /** * Create a FunctionGrammarMatcher * @param open token representing an opening bracket * @param close token representing a closing bracket * @param comma token representing a list item separator */ public FunctionGrammarMatcher(Token open, Token close, Token comma) { this.open = open; this.close = close; this.comma = comma; } // store the node factory for later use public void init(Jep jep) { nf = jep.getNodeFactory(); } // Try to match the rule public Node match(Lookahead2Iterator<Token> it, GrammarParser parser) throws ParseException { Token t = it.peekNext(); // look at next token if (t == null) return null; if (!t.isFunction()) return null; // return if not a function // is the next token an open bracket? if (!open.equals(it.nextnext())) return null; // input will match 'cos', '(' it.consume(); // advance by two tokens it.consume(); String name = t.getSource(); PostfixMathCommandI pfmc = ((FunctionToken) t).getPfmc(); // if next token is the closing bracket construct node and return if (close.equals(it.peekNext())) { it.consume(); return nf.buildFunctionNode(name, pfmc,new Node[0]); } // function will have one or more arguments List<Node> seq=new ArrayList<Node>(); while (true) { // read next argument Node contents = parser.parseSubExpression(); seq.add(contents); // if next token is a closing bracket? if (close.equals(it.peekNext())) break; else if(comma.equals(it.peekNext())) // is next token a comma it.consume(); // if so advance the input else // syntax error throw new ParseException("Closing bracket not found"); } it.consume(); // advance the input to consume the ')' // Build a node representing the function return nf.buildFunctionNode(name, pfmc, seq.toArray(new Node[seq.size()])); } }
The following code illustrates how a configurable parser could be initialized. This is the same sequence as the StandardConfigurableParser.
ConfigurableParser cp = new ConfigurableParser(); cp.addHashComments(); cp.addSlashComments(); cp.addDoubleQuoteStrings(); cp.addWhiteSpace(); cp.addExponentNumbers(); cp.addOperatorTokenMatcher(); cp.addSymbols("(",")","[","]",","); cp.setImplicitMultiplicationSymbols("(","["); cp.addIdentifiers(); cp.addSemiColonTerminator(); cp.addWhiteSpaceCommentFilter(); cp.addBracketMatcher("(",")"); cp.addFunctionMatcher("(",")",","); cp.addListMatcher("[","]",","); cp.addArrayAccessMatcher("[","]"); // Construct the Jep instance and set the parser jep = new Jep(); jep.setComponent(cp);