Configurable Parser

Unlike the standard parser, the configurable parser can be configured at run time. It makes the process of adding new operators easier and allows other grammatical rules to be included.

Using the configurable parser

It is straightforward to use the configurable parser rather than the default standard parser. Just use the setComponent() method as shown here:

Jep jep = new Jep();
jep.setComponent(new StandardConfigurableParser());

Basic architecture

The parser has two main parts: a tokenizer and a grammar analyzer.

The tokenizer breaks the input into a series of tokens, and the grammar analyzer reads these tokens and turns them into a tree of nodes. Each type of token is recognized by a TokenMatcher. There are Tokens and corresponding TokenMatchers for numbers, variables, strings, functions, operators, white space and comments. New TokenMatchers can be added at run time to allow a configurable syntax.

After tokenizing, a filtering step is performed on the list of tokens. This is mainly used to remove white space and comments, although other operations on the list could be performed.

The final stage is to interpret the tokens and build a tree of nodes. This stage uses the precedence rules of operators so that the expression 2*3+4 is correctly interpreted as (2*3)+4 and not 2*(3+4). The core of the algorithm is an operator precedence parser using the shunting yard algorithm. Most of the grammar rules are constructed from the operators specified in the OperatorTable and are built automatically. Additional grammar rules can be specified by adding GrammarMatchers to the parser. Such additional rules are used to specify the syntax for functions and lists.
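
The three stages can be illustrated with a small stand-alone sketch. It does not use any Jep classes: PipelineSketch and its methods are hypothetical names chosen for this illustration, tokens are plain strings, and a fully bracketed string stands in for the tree of nodes. It only handles integers, the four binary operators and round brackets, so it should be read as an outline of the idea rather than as the parser's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Illustrative three-stage pipeline; none of these names are Jep classes. */
final class PipelineSketch {
    // Binary operators and their precedence (a higher value binds tighter).
    private static final Map<String, Integer> PRECEDENCE =
            Map.of("+", 1, "-", 1, "*", 2, "/", 2);

    /** Stage 1: split the input into number, operator, bracket and white-space tokens. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
            } else if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
            } else if (PRECEDENCE.containsKey(String.valueOf(c)) || c == '(' || c == ')') {
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    /** Stage 2: drop white-space tokens before the grammar stage. */
    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!t.isBlank()) kept.add(t);
        }
        return kept;
    }

    /** Stage 3: shunting yard; a bracketed string stands in for the node tree. */
    static String parse(List<String> tokens) {
        Deque<String> operands = new ArrayDeque<>();
        Deque<String> operators = new ArrayDeque<>();
        for (String t : tokens) {
            if (Character.isDigit(t.charAt(0))) {
                operands.push(t);
            } else if (t.equals("(")) {
                operators.push(t);
            } else if (t.equals(")")) {
                while (!operators.isEmpty() && !operators.peek().equals("(")) {
                    reduce(operands, operators);
                }
                if (operators.isEmpty()) throw new IllegalArgumentException("Unmatched )");
                operators.pop(); // discard the "("
            } else {
                // Left-associative: reduce while the stacked operator binds at least as tightly.
                while (!operators.isEmpty() && !operators.peek().equals("(")
                        && PRECEDENCE.get(operators.peek()) >= PRECEDENCE.get(t)) {
                    reduce(operands, operators);
                }
                operators.push(t);
            }
        }
        while (!operators.isEmpty()) {
            if (operators.peek().equals("(")) throw new IllegalArgumentException("Unmatched (");
            reduce(operands, operators);
        }
        return operands.pop();
    }

    /** Pop one operator and two operands, push the combined sub-expression. */
    private static void reduce(Deque<String> operands, Deque<String> operators) {
        String op = operators.pop();
        String right = operands.pop();
        String left = operands.pop();
        operands.push("(" + left + op + right + ")");
    }

    /** Run all three stages on an input string. */
    static String parse(String input) {
        return parse(filter(tokenize(input)));
    }

    public static void main(String[] args) {
        System.out.println(parse("2*3+4"));   // prints ((2*3)+4)
        System.out.println(parse("2*(3+4)")); // prints (2*(3+4))
    }
}
```

Because * has a higher precedence than +, the multiplication is reduced before the addition is pushed, which is what yields (2*3)+4 for the first input.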

Standard options

There are two main classes: StandardConfigurableParser is a parser with the most common configuration already set, and ConfigurableParser is a base parser which allows the full configuration to be specified. A number of methods are available to add different types of behaviour:

Comments
  • addHashComments() - recognize comments starting with # until end of line.
  • addSlashComments() - recognize java-style // and /* ... */ comments.
Strings
  • addDoubleQuoteStrings() - recognize double quoted strings "...".
  • addSingleQuoteStrings() - recognize single quoted strings '...'.
White-space
  • addWhiteSpace() - recognize standard white-space characters.
  • addWhiteSpaceCommentFilter() - filter out white space and comments before the syntactical stage. Both of these should generally be added.
Numbers
  • addExponentNumbers() - recognize numbers with or without an exponent: 1.2, 1.2e2, 1.2E-2 etc.
  • addSimpleNumbers() - recognize only numbers without exponents.
Symbols
  • addSymbols("(",")","[","]",",") - recognize additional symbols.
  • setImplicitMultiplicationSymbols("(","[") - allows the given symbols to be used on the right hand side of an implicit multiplication, for example 3 (x+1). Note that jep.setImplicitMul(true) must be set to allow implicit multiplication.
Operators
  • addOperatorTokenMatcher() - recognize operators specified by the operator set.
Identifiers
  • addIdentifiers() - recognize java-style variable and function names.
Terminators
  • addSemiColonTerminator() - when a semi-colon is encountered, parsing ends. This allows several equations to be separated by ;.
Grammatical sequences
  • addBracketMatcher("(",")") - match bracketed expressions: 2*(3+4).
  • addFunctionMatcher("(",")",",") - match functions: atan2(y,x).
  • addListMatcher("[","]",",") - matches vectors/arrays: [1,2,3].
  • addArrayAccessMatcher("[","]") - matches array access: a[3].
  • addListOrBracketMatcher("(",")",",") - matches lists: (1,2) or brackets: (1+2), depending on number of arguments.
The arguments of these methods should match those in addSymbols().

The order of these methods is important: earlier matchers will be called before later ones. It is generally better to add the matchers in the order given above.

Other matchers are available, but these do not have corresponding methods exposed by ConfigurableParser. For example com.singularsys.jep.configurableparser.matchers.HexNumberTokenMatcher() matches hexadecimal numbers. This can be added to the parser using addTokenMatcher(new HexNumberTokenMatcher()).

Adding and changing the tokenizer stage

To allow new lexical elements, a new TokenMatcher should be added to the list of token matchers used by the parser using the ConfigurableParser.addTokenMatcher() method. A number of TokenMatchers are predefined. See the matchers Javadoc for a list of these.

To create a new TokenMatcher, a new class implementing the TokenMatcher interface should be created. Typically this will sub-class one of the existing TokenMatchers.

package com.singularsys.jep.configurableparser.matchers;
public interface TokenMatcher {
	/** Attempts to match the start of the string.
	 * @param s the string to match against
	 * @return if successful returns the corresponding token, 
	 *   return null if failed to match
	 */
	public abstract Token match(String s);
	/** Initialize the matcher when the Jep instance is known. */
	public void init(Jep j);
}
	
In general the match method should return one of the pre-defined tokens listed in the tokens Javadoc, although other token types can be used if there is a corresponding GrammarMatcher.

Once created, the TokenMatcher needs to be added to the list of matchers used by the parser. The order is important, as each matcher is called in turn and some input will match more than one type of token. Typically the full set of matchers will need to be added in the correct order. See the example usage at the end of this page.
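
The effect of matcher order can be seen in the following stand-alone sketch. It deliberately avoids the Jep API: MatcherOrderSketch, prefix(), HEX, DECIMAL and IDENTIFIER are hypothetical names, each matcher is simply a function that returns the matched prefix of the input or null, and the regular expressions are assumptions made for the illustration. The only point it makes is that the first matcher to succeed determines the token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration of why matcher order matters; these are not Jep classes. */
final class MatcherOrderSketch {
    /** Build a matcher returning the matched prefix of s, or null if nothing matches at the start. */
    static Function<String, String> prefix(String regex) {
        Pattern p = Pattern.compile(regex);
        return s -> {
            Matcher m = p.matcher(s);
            // lookingAt() anchors the match at the start of the input only
            return m.lookingAt() && m.end() > 0 ? m.group() : null;
        };
    }

    static final Function<String, String> HEX = prefix("0[xX][0-9a-fA-F]+");
    static final Function<String, String> DECIMAL = prefix("[0-9]+");
    static final Function<String, String> IDENTIFIER = prefix("[A-Za-z_][A-Za-z_0-9]*");

    /** Try each matcher in turn at the current position; the first success wins. */
    static List<String> tokenize(String input, List<Function<String, String>> matchers) {
        List<String> tokens = new ArrayList<>();
        String rest = input;
        while (!rest.isEmpty()) {
            String matched = null;
            for (Function<String, String> m : matchers) {
                matched = m.apply(rest);
                if (matched != null) break;
            }
            if (matched == null) {
                throw new IllegalArgumentException("No matcher for: " + rest);
            }
            tokens.add(matched);
            rest = rest.substring(matched.length());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hex matcher first: the whole literal becomes one token.
        System.out.println(tokenize("0x1F", List.of(HEX, DECIMAL, IDENTIFIER))); // prints [0x1F]
        // Decimal matcher first: "0" is taken as a number and "x1F" as an identifier.
        System.out.println(tokenize("0x1F", List.of(DECIMAL, IDENTIFIER, HEX))); // prints [0, x1F]
    }
}
```

In the second call the hexadecimal matcher is never reached, which is the kind of problem that adding matchers in the wrong order can cause.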

Adding new operators

Most changes to the syntax will simply consist of adding new operators or changing the symbol of existing operators. A simple example would be:

OperatorTable ot = jep.getOperatorTable();
// create a bit-wise complement operator
Operator op = new Operator("~",new bitComp(),Operator.UNARY+Operator.PREFIX);
// add it with the same precedence as not
ot.addOperator(op, ot.getNot());
// informs the parser and other components about the new operator
jep.reinitializeComponents();

Once added, the new operator is ready to be used in the parser. For more details on adding operators, see the Operators manual page.

Adding a GrammarMatcher

New grammatical rules can be implemented by creating a class implementing the GrammarMatcher interface and adding it to the parser using ConfigurableParser.addGrammarMatcher().

/**
 * Interface defining matchers for custom grammatical elements.
 * GrammarMatchers match syntax elements at the same precedence level
 * as brackets.
 */
public interface GrammarMatcher {
	/** Test whether the input matches this pattern.
	 * @param it An iterator inspecting the input
	 * @param parser the parser to use when evaluating sub-expressions
	 * @return if matched returns a node representing the content, 
	 *  returns null if does not match
	 * @throws ParseException 
	 */
	public Node match(Lookahead2Iterator<Token> it,GrammarParser parser)
					throws ParseException;
	
	/** Delayed initialization; this method is called 
	 * whenever components of the Jep instance are changed. 
	 * @param jep
	 */
	public void init(Jep jep);
}

The match method can query the next two tokens from the input using it.peekNext() and it.nextnext(). If these tokens match the rule, the current position of the input should be advanced using it.consume(). If the rule does not match, the match method should return null without having called it.consume(). Further tokens can be read using a combination of it.peekNext() and it.consume().
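
The "inspect first, consume only on success" discipline can be sketched without the Jep classes. In the code below, Lookahead2 is a hypothetical stand-in for the iterator passed to match(), tokens are plain strings, and matchCallStart() is an invented helper that recognizes a name followed by an opening bracket. It is an assumption-laden illustration of the pattern, not the library's implementation.

```java
import java.util.Iterator;
import java.util.List;

/** Sketch of two-token lookahead; Lookahead2 is a stand-in, not a Jep class. */
final class LookaheadSketch {
    /** Buffers the next two items of an iterator so they can be inspected without advancing. */
    static final class Lookahead2<E> {
        private final Iterator<E> source;
        private E next;     // item returned by peekNext(), null at end of input
        private E nextnext; // item after that, null if there is none

        Lookahead2(Iterator<E> source) {
            this.source = source;
            next = source.hasNext() ? source.next() : null;
            nextnext = source.hasNext() ? source.next() : null;
        }

        E peekNext() { return next; }

        E nextnext() { return nextnext; }

        /** Advance the input by one item. */
        void consume() {
            next = nextnext;
            nextnext = source.hasNext() ? source.next() : null;
        }
    }

    /**
     * Matches the two-token pattern: name followed by "(".
     * Returns the name on success, or null without consuming anything.
     */
    static String matchCallStart(Lookahead2<String> it) {
        String name = it.peekNext();
        if (name == null || !Character.isLetter(name.charAt(0))) return null;
        // constant on the left so a null lookahead token is handled safely
        if (!"(".equals(it.nextnext())) return null;
        it.consume(); // the name
        it.consume(); // the "("
        return name;
    }

    public static void main(String[] args) {
        Lookahead2<String> call = new Lookahead2<>(List.of("cos", "(", "x", ")").iterator());
        System.out.println(matchCallStart(call)); // prints cos
        System.out.println(call.peekNext());      // prints x

        Lookahead2<String> sum = new Lookahead2<>(List.of("x", "+", "1").iterator());
        System.out.println(matchCallStart(sum));  // prints null
        System.out.println(sum.peekNext());       // prints x
    }
}
```

When the rule fails, the input position is unchanged, so the next matcher in the list sees exactly the same tokens.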

Various methods of the Token class can be used to query the type of token; for instance Token.isFunction(), Token.isIdentifier(), Token.isNumber(). The Token.equals(Object o) method can also be used to check the status of tokens.

For functions and lists it may be necessary to parse the arguments or list elements. These can be parsed using the public Node parseSubExpression() method of the GrammarParser interface.

Once the input has been parsed, the resulting node needs to be assembled. Here the NodeFactory methods can be used to construct nodes of the appropriate type. The OperatorTable, FunctionTable, VariableTable and NumberFactory classes can also be used.

SymbolTokens

New syntactical features may require special symbols, for instance the [ and ] used to represent lists. These symbols need to be recognized by the tokenizer stage and used later by the appropriate GrammarMatchers; they are represented by SymbolToken. The ConfigurableParser.addSymbols() method can be used to add symbols to the parser, and ConfigurableParser.getSymbolToken(String sym) returns the corresponding SymbolToken. The SymbolTokens can then be passed to the constructor of a GrammarMatcher, and the token's equals() method used to test whether it is the same token as in the input.

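public class myGrammarMatcher implements GrammarMatcher {
	SymbolToken colon;
	public myGrammarMatcher(SymbolToken colon) {
		this.colon = colon;
	}
	// no delayed initialization is needed for this matcher
	public void init(Jep jep) { }
	public Node match(Lookahead2Iterator<Token> it,GrammarParser parser)
				throws ParseException
	{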
		Token t = it.peekNext();
		// use this way round to avoid problems when t is null
		if (colon.equals(t))
			....
	}
}

// Create a special symbol and add it to the list 
ConfigurableParser cp = new ConfigurableParser();
...
cp.addSymbols(":");
SymbolToken st = cp.getSymbolToken(":");
...
GrammarMatcher gm = new myGrammarMatcher(st);
cp.addGrammarMatcher(gm);

Example grammar matcher

The following code is an example of matching a function:
/**
 * A GrammarMatcher which matches functions in the form 'atan2(y,x)'.
 * The function must be in the FunctionTable and brackets are required.
 */
public class FunctionGrammarMatcher implements GrammarMatcher {
    Token open; // Token representing opening bracket
    Token close; // Token representing closing bracket
    Token comma; // Token representing argument separator
    NodeFactory nf; // The node factory

    /**
     * Create a FunctionGrammarMatcher
     * @param open token representing an opening bracket
     * @param close token representing a closing bracket
     * @param comma token representing a list item separator 
     */
    public FunctionGrammarMatcher(Token open, Token close, Token comma) {
        this.open = open;
        this.close = close;
        this.comma = comma;
    }

    // store the node factory for later use
    public void init(Jep jep) {
        nf = jep.getNodeFactory();
    }

    // Try to match the rule
    public Node match(Lookahead2Iterator<Token> it, GrammarParser parser)
            throws ParseException {
        Token t = it.peekNext(); // look at next token 
        if (t == null) return null;
        if (!t.isFunction()) return null; // return if not a function

        // is the token after that an open bracket?
        if (!open.equals(it.nextnext())) return null;

        // input will match 'cos', '('
        it.consume(); // advance by two tokens
        it.consume();
        String name = t.getSource();
        PostfixMathCommandI pfmc = ((FunctionToken) t).getPfmc();

        // if next token is the closing bracket construct node and return
        if (close.equals(it.peekNext())) {
            it.consume();
            return nf.buildFunctionNode(name, pfmc, new Node[0]);
        }

        // function will have one or more arguments
        List<Node> seq = new ArrayList<Node>();

        while (true) {
            // read next argument
            Node contents = parser.parseSubExpression();
            seq.add(contents);

            // is the next token a closing bracket?
            if (close.equals(it.peekNext()))
                break;
            else if (comma.equals(it.peekNext())) // is next token a comma
                it.consume(); // if so advance the input
            else // syntax error
                throw new ParseException("Closing bracket not found");
        }

        it.consume(); // advance the input to consume the ')' 

        // Build a node representing the function
        return nf.buildFunctionNode(name, pfmc,
                seq.toArray(new Node[seq.size()]));
    }
}

Example usage

The following code illustrates how a configurable parser could be initialized. This is the same sequence as the StandardConfigurableParser.

ConfigurableParser cp = new ConfigurableParser();
cp.addHashComments();
cp.addSlashComments();
cp.addDoubleQuoteStrings();
cp.addWhiteSpace();
cp.addExponentNumbers();
cp.addSymbols("(",")","[","]",",");
cp.setImplicitMultiplicationSymbols("(","[");
cp.addOperatorTokenMatcher();
cp.addIdentifiers();
cp.addSemiColonTerminator();
cp.addWhiteSpaceCommentFilter();
cp.addBracketMatcher("(",")");
cp.addFunctionMatcher("(",")",",");
cp.addListMatcher("[","]",",");
cp.addArrayAccessMatcher("[","]");

// Construct the Jep instance and set the parser
Jep jep = new Jep();
jep.setComponent(cp);
 	