Configurable Parser

Unlike the standard parser, the configurable parser can be configured at run time. It makes the process of adding new operators easier and allows other grammatical rules to be included.

Using the configurable parser

It is straightforward to use the configurable parser rather than the default standard parser. Just use the setComponent() method as shown here:

Jep jep = new Jep();
jep.setComponent(new ConfigurableParser());

Basic architecture

The parser has two main parts: a tokenizer and a grammar analyzer.

The tokenizer breaks the input into a series of tokens and the grammar analyzer reads these tokens and turns them into a tree of nodes. Each type of token is recognized by a TokenMatcher and new TokenMatchers can be added. There are Tokens and corresponding tokenMatchers for numbers, variables, strings, functions operators, white space and comments. New TokenMatchers can be added at run time to allow a configurable syntax.

After tokenizing, a filtering step is performed on the list of tokens. This is mainly used to remove white space and comments, although other operations on the list could be performed.

The final stage is to interpret the tokens and build a tree of nodes. This stage uses the precedence rules of operators so that the expression 2*3+4 is correctly interpreted as (2*3)+4 and not 2*(3+4). The core of the algorithm is an operator precedence parser using the shunting yard algorithm, most of the grammar rules are constructed from the operators specified in the OperatorTable and are build automatically. Additional grammar rules can be specified by adding GrammerMatchers to the parser. Such additional rules are used to specify the syntax for functions, and lists.

Adding and changing the tokenizer stage

To allow new lexical elements, a new TokenMatcher should be added to the list of token matchers used by the parser. A number of predefined TokenMatchers are already defined. See the matchers javadoc for a list of these.

To create a new TokenMatcher, a new class implementing the TokenMatcher interface should be created. Typically this will sub-class one of the existing TokenMatchers.

package com.singularsys.jep.configurableparser.matchers;
public interface TokenMatcher {
	/** Attempts to match the start of the string.
	 * @param s the string to match against
	 * @return if successful returns the corresponding token, 
	 *   return null if failed to match
	 */
	public abstract Token match(String s);
	/** Initialize the matcher when the Jep instance is known. */
	public void init(Jep j);
}
	
In general the match method should return one of the pre-defined tokens listed in tokens javadoc although other token types can be used if there is a corresponding GrammarMatcher.

Once created, the TokenMatcher needs to be added to the list of matchers used by the parser. The order is important as each matcher is called in turn and some input will match more than one type of input. Typically the full set of lists will need to be added in the correct order. See the example below.

Adding new operators

Most changes to the syntax will simply consist of adding new operators or changing the symbol of existing operators. A simple example would be:

OperatorTable ot = jep.getOperatorTable();
// create a bit-wise complement operator
Operator op = new Operator("~",new bitComp(),Operator.UNARY+Operator.PREFIX);
// add it with the same precedence as not
ot.addOperator(ot.getNumOps(), op, ot.getNot());
// informs the parser and other components about the new operator
jep.reinitializeComponents();

Once added, the new operator is ready to be used in the parser. For more details on adding operator see Operators manual page.

Adding a GrammerMatcher

New grammatical rules can be implemented by creating a class implementing the GrammarMatcher interface.

/**
 * Interface defining matchers for custom grammatical elements.
 * GrammarMatchers match syntax elements at the same precedence level
 * as brackets.
 */
public interface GrammarMatcher {
	/** Test whether the input matches this pattern.
	 * @param it An iterator inspecting the input
	 * @param parser the parser to use when evaluating sub-expressions
	 * @return if matched returns a node representing the content, 
	 *  returns null if does not match
	 * @throws ParseException 
	 */
	public Node match(Lookahead2Iterator it,GrammarParser parser)
					throws ParseException;
	
	/** Delayed initialization, this methods is called 
	 * whenever components of the Jep instance are changed. 
	 * @param jep
	 */
	public void init(Jep jep);
}

The match method can query the next two tokens from the input using it.peekNext() and it.nextNext() if these tokens match the rule then the current position of the input should be advanced using it.consume(). If the rule does not match then the match method should return null before calling it.consume(). Further tokens can be read using a combination of it.peekNext() and it.consume().

Various methods of the Token class can be used to query the type of token; for instance Token.isFunction(), Token.isIdentifier(), Token.isNumber(). The Token.equals(Object o) method can also be used to check the status of tokens.

For functions and lists it may be necessary to parse the arguments or list elements. These can be parsed using the public Node parseSubExpression() method of the GrammarParser interface.

Once the input has been parsed, the resulting node needs to be assembled. Here the NodeFactory methods can be used to construct nodes of the appropriate type. The OperatorTable, FunctionTable, VariableTable and NumberFactory classes can also be used.

SymbolTokens

New syntactical features may require special symbols, for instance the [ and ] used to represent lists. These symbols need to be recognized by the Tokenizer stage and used later by appropriate GrammarMatachers. The SymbolTokenMatchers class can recognize a set of SymbolTokens, which are loaded using its public boolean add(SymbolToken e) method. The same tokens can then be passed in the constructor of a GrammerMatcher and the tokens equal method used to test it it the same token as in the input.

// A matcher for special symbols
SymbolTokenMatcher stm = new SymbolTokenMatcher();
m.add(stm); // add to the list of matchers

// Create a special symbol and add it to the list 
SymbolToken colon = new SymbolToken(":");
stm.add(colon);

...
public class myGrammarMatcher {
	SymbolToken colon;
	public myGrammarMatcher(SymbolToken colon) {
		this.colon = colon;
	}
	public Node match(Lookahead2Iterator<Token> it,GrammarParser parser)
				throws ParseException;
	{
		Token t = it.peekNext();
		// use this way round to avoid probems when t is null
		if (colon.equals(t))
			....
	}

Example grammar matcher

The following code is an example of matching a function
/**
 * A GrammarMatcher which matches functions in the form 'atan2(y,x)'.
 * The function must be in the FunctionTable and brackets are required.
 */
public class FunctionGrammarMatcher implements GrammarMatcher {
Token open; // Token representing opening bracket
Token close; // Token representing closing bracket
Token comma; // Token representing argument separator
NodeFactory nf; // The node factory

/**
 * Create a FunctionGrammarMatcher
 * @param open token representing an opening bracket
 * @param close token representing a closing bracket
 * @param comma token representing a list item separator 
 */
public FunctionGrammarMatcher(Token open, Token close, Token comma) {
	this.open = open;
	this.close = close;
	this.comma = comma;
}

// store the node factory for later use
public void init(Jep jep) {
		nf = jep.getNodeFactory();
	}

// Try to match the rule
public Node match(Lookahead2Iterator<Token> it, GrammarParser parser)
				throws ParseException {
	Token t = it.peekNext(); // look at next token 
	if(t==null) return null;
	if(!t.isFunction()) return null; // return if not a function
	// is the next token an open bracket?
	if(!open.equals(it.nextnext())) return null;
	// input will match 'cos', '('
	it.consume(); // advance by two tokens
	it.consume();
	String name = t.getSource();
	PostfixMathCommandI pfmc = ((FunctionToken) t).getPfmc();
	// if next token is the closing bracket construct node and return
	if(close.equals(it.peekNext())) {
	    it.consume();
	    return nf.buildFunctionNode(name, pfmc,new Node[0]);
	}
	// function will have one or more arguments
	List<Node> seq=new ArrayList<Node>();
	while(true) {
		// read next argument
		Node contents = parser.parseSubExpression();
		seq.add(contents);
		// if next token is a closing bracket?
		if(close.equals(it.peekNext())) 
			break;
		else if(comma.equals(it.peekNext())) // is next token a comma
			it.consume(); // if so advance the input
		else // syntax error
			throw new ParseException("Closing bracket not found");
	}
	it.consume(); // advance the input to consume the ')' 
	// Build a node representing the function
	return nf.buildFunctionNode(name, pfmc,
		seq.toArray(new Node[seq.size()]));
}
}

Example usage

The following code illustrates how a configurable parser could be initialized.

List<TokenMatcher> m = new ArrayList<TokenMatcher>();
List<TokenFilter> filters = new ArrayList<TokenFilter>();
List<GrammerMatcher> gm = new ArrayList<GrammerMatcher>();

// Add token matchers for comments
m.add(CommentTokenMatcher.hashCommentMatcher());
m.add(CommentTokenMatcher.slashSlashCommentMatcher());
m.add(CommentTokenMatcher.slashStarCommentMatcher());
m.add(CommentTokenMatcher.multiLineSlashStarCommentMatcher());

// match different types of string, ' and "
m.add(StringTokenMatcher.doubleQuoteStringMatcher());
//m.add(StringTokenMatcher.singleQuoteStringMatcher());

// match whitespace
m.add(WhiteSpaceTokenMatcher.defaultWhiteSpaceTokenMatcher());

// match numbers
m.add(NumberTokenMatcher.exponentNumberTokenMatcher());

// match variable or function names
m.add(IdentifierTokenMatcher.basicIndetifierMatcher());

// match operators in the OperatorTable
m.add(new OperatorTokenMatcher());

// A matcher for special symbols
SymbolTokenMatcher stm = new SymbolTokenMatcher();
m.add(stm);

// Special symbols used in the grammar
SymbolToken ropen = new OpenToken("(");
SymbolToken rclose = new CloseToken(")");
SymbolToken sopen = new OpenToken("[");
SymbolToken sclose = new CloseToken("]");
SymbolToken comma = new SymbolToken(",");

stm.add(ropen);
stm.add(rclose);
stm.add(sopen);
stm.add(sclose);
stm.add(comma);

// Equations can be separated by a semi-colon
// This matcher causes the tokenizer to stop parsing when a semi-colon
// is encountered.
m.add(TerminalTokenMatcher.SemiColonTerminalMatcher());

// remove comments
filters.add(new WhiteSpaceCommentFilter());

// Handle brackets, this matcher uses round brackets for vectors (2,3)
gm.add(new ListOrBracketGrammarMatcher(ropen,rclose,comma));

// match functions like cos(pi/2)
gm.add(new FunctionGrammerMatcher(ropen,rclose,comma));

// match vectors with square brackets [2,3]
//gm.add(new ListGrammarMatcher(sopen,sclose,comma));

// match the array access operation v[2]
gm.add(new ArrayAccessGrammarMatcher(sopen,sclose));

// Construct the Jep instance and set the parser
jep = new Jep();
jep.setComponent(new ConfigurableParser(m,subs,gm));
 	
top