Parsing JavaScript code using Mozilla Rhino

Last year I blogged “Understanding AST created by Mozilla Rhino parser”, where I explained how to traverse the AST to get all functions and variables. Since then I have received several comments, public and private, asking for sample code. So here is an example of how to parse JavaScript code and get all functions and variables using Mozilla Rhino.

Note that my intention here is not to build a complete symbol table for arbitrary JavaScript code, and I have not run many test cases on this code. The idea is to show how the AST can be traversed and how a hierarchy of symbols (functions and variables) can be built.

So, for example, if you feed the following JavaScript code to this program:

function func1()
{
	var local1 = 10;

	global1 = 20;
}

obj1 = {
	objFunc1 : function()
	{
		var local2 = 30;
	},
	objProp1 : "prop1"
}

You will get the following output:

Function : func1
		local1
	global1
	obj1
	obj1
		objFunc1
		Function : objFunc1
			local2
		objProp1

Yes, I know there are duplicates for the object and the closure. That is because each is first processed as a variable name and then again as an object or closure. This is easy to fix, but I am going to leave it that way to keep the code simple.
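
One possible way to remove such duplicates, as a minimal sketch only: when a named function or object is added to a parent, drop any earlier plain-name sibling with the same name. DedupSketch, Sym, addChild and childNames are hypothetical names made up for this illustration, not classes from the program, and the sketch assumes the name is already known when the symbol is added (in addToParent the name is set after the symbol has been added, so the real code would need to do this check at that point).

```java
import java.util.ArrayList;
import java.util.List;

class DedupSketch {
    // Hypothetical symbol: a leaf is a plain name, a container is a function or object literal.
    static final class Sym {
        final String name;
        final boolean container;
        final List<Sym> children = new ArrayList<>();

        Sym(String name, boolean container) {
            this.name = name;
            this.container = container;
        }
    }

    // Adds child to parent. If child is a container, any earlier leaf sibling
    // with the same name is removed first, so the name is listed only once.
    static void addChild(Sym parent, Sym child) {
        if (child.container) {
            parent.children.removeIf(s -> !s.container && s.name.equals(child.name));
        }
        parent.children.add(child);
    }

    // Lists the direct children, marking containers with a leading '*'.
    static List<String> childNames(Sym parent) {
        List<String> names = new ArrayList<>();
        for (Sym s : parent.children) {
            names.add((s.container ? "*" : "") + s.name);
        }
        return names;
    }
}
```

With this rule, adding the plain name obj1 and then the object literal obj1 to the same parent leaves a single obj1 entry, the object literal.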

I am not going to reproduce the complete source code here, but will explain some important parts of the program.

The program uses the visitor pattern, first to traverse the AST and then to print the symbol hierarchy. It contains the following classes:

JSSymbol : holds a reference to an AST node and its child elements. We will create a hierarchy of objects of this class

JSNodeVisitor : implements Rhino’s NodeVisitor interface and builds the hierarchy of JSSymbol objects

IJSSymbolVisitor : visitor interface for JSSymbol

JSSymbolVisitor : implements IJSSymbolVisitor and prints the hierarchy of JSSymbol objects

JSErrorReporter : implements Rhino’s ErrorReporter interface and prints syntax errors found in the JS script

RhinoDemo : main program
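
As a rough picture of how a symbol class and its visitor might fit together, here is a minimal sketch. It is not the code from the program: SymbolSketch, SymbolSketchVisitor and PrintingVisitor are hypothetical stand-ins for JSSymbol, IJSSymbolVisitor and JSSymbolVisitor, and the symbol holds only a name string instead of a reference to a Rhino AstNode, so that the sketch runs without Rhino on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IJSSymbolVisitor: called once per symbol with its depth in the tree.
interface SymbolSketchVisitor {
    void visit(SymbolSketch symbol, int depth);
}

// Hypothetical stand-in for JSSymbol. It keeps only a name (no AstNode reference).
class SymbolSketch {
    private final String name;
    private final boolean function;
    private final List<SymbolSketch> children = new ArrayList<>();

    SymbolSketch(String name, boolean function) {
        this.name = name;
        this.function = function;
    }

    String getName() { return name; }

    boolean isFunction() { return function; }

    // Appends a child and returns it, so nested symbols can be built step by step.
    SymbolSketch addChild(SymbolSketch child) {
        children.add(child);
        return child;
    }

    // Same idea as the childExist check in addToParent: is this name already a direct child?
    boolean childExist(String childName) {
        for (SymbolSketch c : children) {
            if (childName.equals(c.name)) {
                return true;
            }
        }
        return false;
    }

    // Pre-order walk: visit this symbol, then each child one level deeper.
    void accept(SymbolSketchVisitor visitor, int depth) {
        visitor.visit(this, depth);
        for (SymbolSketch child : children) {
            child.accept(visitor, depth + 1);
        }
    }
}

// Hypothetical stand-in for JSSymbolVisitor: collects one tab-indented line per symbol.
class PrintingVisitor implements SymbolSketchVisitor {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void visit(SymbolSketch symbol, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append('\t');
        }
        if (symbol.isFunction()) {
            out.append("Function : ");
        }
        out.append(symbol.getName()).append('\n');
    }

    String getOutput() { return out.toString(); }
}
```

To try it, build a small tree with addChild, call accept(new PrintingVisitor(), 0) on the root, and print getOutput().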

The parseJS function in the RhinoDemo class parses the script and gets the AstRoot node. It then calls the visit method of the root node, passing a JSNodeVisitor to it. After building the symbol hierarchy, it prints it using JSSymbolVisitor.

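public void parseJS (String filePath) throws Exception
{
	CompilerEnvirons env = new CompilerEnvirons();

	//report syntax errors and keep building the AST instead of throwing
	env.setRecoverFromErrors(true);

	AstRoot rootNode;

	//try-with-resources closes the file even if parsing fails
	try (FileReader strReader = new FileReader(filePath))
	{
		IRFactory factory = new IRFactory(env, new JSErrorReporter());
		rootNode = factory.parse(strReader, null, 0);
	}

	//first pass: walk the AST and build the JSSymbol hierarchy
	JSNodeVisitor nodeVisitor = new JSNodeVisitor();

	rootNode.visit(nodeVisitor);

	//second pass: walk the JSSymbol hierarchy and print it
	nodeVisitor.getRoot().visit(new JSSymbolVisitor());
}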

The visit method of JSNodeVisitor calls the addToParent method, which actually builds the symbol hierarchy:

private void addToParent(AstNode node)
{
	if (root == null)
	{
		root = new JSSymbol(node);
		functionsStack.push(root);
		currentFuncEndOffset = node.getAbsolutePosition() + node.getLength();
		return;
	}

	if (functionsStack.size() == 0)
		return;

	int nodeType = node.getType();

	//we will track only variables and functions
	if (nodeType != Token.FUNCTION && nodeType != Token.VAR && nodeType != Token.OBJECTLIT &&
			!(nodeType == Token.NAME && node.getParent() instanceof ObjectProperty))
	{
		if (isVariableName(node))
		{
			//check if it is in the current function
			String symbolName = ((Name)node).getIdentifier();
			JSSymbol currentSymContainer = functionsStack.peek();
			if (!currentSymContainer.childExist(symbolName))
			{
				//this is a global symbol
				root.addChild(node);
			}
		}
		return;
	}

	if (nodeType == Token.VAR && !(node instanceof VariableInitializer))
		return;

	JSSymbol currSym = null;

	JSSymbol parent = functionsStack.peek();
	if (parent.getNode().getAbsolutePosition() + parent.getNode().getLength() > node.getAbsolutePosition())
	{
		currSym = new JSSymbol(node);
		parent.addChild(currSym);
	}
	else //outside current function boundary
	{
		//pop current parent
		functionsStack.pop();
		addToParent(node);
		return;
	}

	//currSym is already set above
	if (nodeType == Token.FUNCTION || nodeType == Token.OBJECTLIT)
	{
		AstNode parentNode = node.getParent();
		AstNode leftNode = null;
		if (parentNode.getType() == Token.ASSIGN)
		{
			leftNode = ((Assignment)parentNode).getLeft();
		}
		else if (parentNode instanceof ObjectProperty)
		{
			leftNode = ((ObjectProperty)parentNode).getLeft();
		}

		if (leftNode instanceof Name)
			currSym.setName(((Name)leftNode).getIdentifier());

		functionsStack.push(currSym);
		currentFuncEndOffset = node.getAbsolutePosition() + node.getLength();
	}
}

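The above function uses a Stack to keep track of the current parent symbol (function or object). It checks whether the node being processed is a child of the symbol at the top of functionsStack by comparing the node’s absolute position (AstNode.getAbsolutePosition) with the end offset of the current (top) symbol. Because the AST is visited depth first, a function or object is always visited before the nodes nested inside it, so this logic works: once a node starts past the end of the top symbol, that symbol is popped and the check is retried against the next one down.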
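
The offset comparison can be tried on its own, without Rhino. The following is a minimal sketch under stated assumptions: OffsetNestingSketch, Span and assignParents are hypothetical names, a Span only mimics the position and length that an AstNode reports, and the spans are expected in depth-first order. Unlike addToParent, the sketch never pops the root, and it ignores the separate handling of global names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class OffsetNestingSketch {
    // Hypothetical stand-in for an AST node: only the fields the technique needs.
    static final class Span {
        final String name;
        final int start;         // plays the role of AstNode.getAbsolutePosition()
        final int length;        // plays the role of AstNode.getLength()
        final boolean container; // true for functions and object literals

        Span(String name, int start, int length, boolean container) {
            this.name = name;
            this.start = start;
            this.length = length;
            this.container = container;
        }

        int end() { return start + length; }
    }

    // Takes spans in depth-first order and returns one "child in parent" line per span.
    static List<String> assignParents(Span root, List<Span> depthFirst) {
        Deque<Span> stack = new ArrayDeque<>();
        stack.push(root);
        List<String> result = new ArrayList<>();
        for (Span node : depthFirst) {
            // A container that ends at or before this node's start cannot enclose it: pop it.
            while (stack.size() > 1 && stack.peek().end() <= node.start) {
                stack.pop();
            }
            result.add(node.name + " in " + stack.peek().name);
            if (node.container) {
                stack.push(node); // later nodes may be nested inside this one
            }
        }
        return result;
    }
}
```

For a function at offsets 0 to 50 followed by an object literal at 60 to 140, a variable at offset 20 is placed in the function, while the object literal starts past the function's end, so the function is popped and the object literal is placed in the root.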

The remaining code in this program is quite simple. If you use Eclipse, you can download the Eclipse project for this program.

Again, I do not guarantee that the program will create an error-free symbol hierarchy. At best, use this code as a reference and modify it to suit your requirements. It is just one way to create the symbol hierarchy, and there could be better ways.

-Ram Kulkarni

4 Replies to “Parsing JavaScript code using Mozilla Rhino”

  1. I have already parsed my JavaScript using your code above. Now I want to run one of the JavaScript functions listed in the parsing result. How do I do that?

  2. The above method parses the variable and function information, and it works perfectly fine. I also want to parse the values of the variables, the business rules (all if/else/for statements), and the functions called from other functions. Is there any way I can do that?
