Parsing JavaScript code using Mozilla Rhino

Last year I blogged “Understanding AST created by Mozilla Rhino parser”, where I explained how to traverse the AST to get all functions and variables. Since then I have received some comments, public and private, asking for sample code. So here is an example of how to parse JavaScript code and get all functions and variables using Mozilla Rhino.

Note that my intention here is not to build a complete symbol table for any given JavaScript code, and I have not run a lot of test cases on this code. The idea is to show how the AST can be traversed and how a hierarchy of symbols (functions and variables) can be built.

So, for example, if you feed the following JavaScript code to this program:

function func1()
{
	var local1 = 10;

	global1 = 20;
}

obj1 = {
	objFunc1 : function()
	{
		var local2 = 30;
	},
	objProp1 : "prop1"
}

You will get the following output:

Function : func1
		local1
	global1
	obj1
	obj1
		objFunc1
		Function : objFunc1
			local2
		objProp1

Yes, I know there are duplicates for the object and the closure, because they are first processed as variable names and then as an object/closure. This can be easily fixed, but I am going to leave it that way to keep the code simple. Continue reading “Parsing JavaScript code using Mozilla Rhino”

Capturing absolute offsets for JavaCC/JJTree tokens

I use JavaCC for generating parsers in Java, and JJTree to create the AST after parsing. JJTree creates the nodes of the AST, and you can configure JavaCC options to capture tokens in each node, i.e. if you want each node to contain its start and end tokens. The default code generated by JavaCC creates a Token class whose positions are relative to the start of the line. It has fields like beginLine, beginColumn, endLine and endColumn. Here the line numbers are absolute line numbers (starting from 1) and the column fields contain offsets (again starting from 1) within the corresponding lines.
However, you often want to capture the absolute offsets of tokens in the input stream, and not just the relative offset within the line. I wish there was a JavaCC option to enable this, but it is not too complex to do it yourself.
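
To make the relationship between the two kinds of positions concrete, here is a minimal sketch (not necessarily the approach taken in the rest of this post) that converts a 1-based line/column pair into a 0-based absolute offset by recording where each line starts. The class name LineOffsetIndex and its method are hypothetical, made up for illustration. It assumes that every character counts as exactly one column, which may not hold if the character stream expands tabs when computing columns, and that "\n", "\r\n" and a lone "\r" each end a line.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: maps 1-based (line, column) pairs, like the ones
 * stored in a token, to 0-based absolute offsets in the input string.
 * Assumption: one character == one column (no tab expansion).
 */
final class LineOffsetIndex {
    // lineStarts.get(n) is the absolute offset of the first character of line n + 1
    private final List<Integer> lineStarts = new ArrayList<>();

    LineOffsetIndex(String input) {
        lineStarts.add(0); // line 1 always starts at offset 0
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\n') {
                // covers both "\n" and the second half of "\r\n"
                lineStarts.add(i + 1);
            } else if (c == '\r'
                    && (i + 1 >= input.length() || input.charAt(i + 1) != '\n')) {
                // a lone "\r" also ends a line
                lineStarts.add(i + 1);
            }
        }
    }

    /** Returns the 0-based absolute offset for a 1-based line and column. */
    int absoluteOffset(int line, int column) {
        if (line < 1 || line > lineStarts.size() || column < 1) {
            throw new IllegalArgumentException(
                    "Invalid position: line " + line + ", column " + column);
        }
        return lineStarts.get(line - 1) + (column - 1);
    }
}
```

For the input "var a = 1;\nvar b = 2;\n", line 2 starts at offset 11, so a token with beginLine 2 and beginColumn 5 (the identifier b) maps to absolute offset 15. The drawback of this kind of after-the-fact conversion is that it needs the whole input in memory and has to mirror the column rules used by the tokenizer.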

To explain how to do this, I will take a grammar file that is generated by the wizard of the JavaCC Eclipse plugin. This is the default grammar file it generates: Continue reading “Capturing absolute offsets for JavaCC/JJTree tokens”

Handling some of the warnings and errors generated by JavaCC

I am currently building a parser using JavaCC. I have used JavaCC in the past, but whenever I use it after a long gap, I have to relearn a few things about it – particularly handling warnings. So I thought this time I would blog about ways to handle some of the frequent warnings that I have seen.

If you are unfamiliar with JavaCC, it is a parser generator. You create a grammar using EBNF (Extended Backus-Naur Form) and feed it to JavaCC, which then creates Java classes for the parser. I do not want to turn this post into a JavaCC tutorial. There are some very good tutorials available on the JavaCC Documentation page and FAQ. I find the Lookahead MiniTutorial and Token Manager MiniTutorial especially useful. If you use the Eclipse IDE, you will find the JavaCC plugin for Eclipse useful: it provides a wizard to create JavaCC or JJTree files (JJTree creates an AST, Abstract Syntax Tree, after parsing the input), and provides code colorization, an outline, code hyperlinks, syntax checking and compilation. You can also set JavaCC debug options easily using this plugin.

I will use the following tokens, which are generated by default if you use the wizard provided by the JavaCC Eclipse plugin to create a JavaCC grammar file. I have created a .jjt file for the examples in this blog post.

Continue reading “Handling some of the warnings and errors generated by JavaCC”