Last year I blogged “Understanding AST created by Mozilla Rhino parser”, where I explained how to traverse the AST to get all functions and variables. Since then I have received some comments, public and private, asking for sample code. So here is an example of how to parse JavaScript code and get all functions and variables using Mozilla Rhino.
Note that my intention here is not to build a complete symbol table for any given JavaScript code, and I have not run many test cases on this code. The idea is to show how the AST can be traversed and how a hierarchy of symbols (functions and variables) can be built.
So, for example, if you feed the following JavaScript code to this program:

function func1() {
    var local1 = 10;
    global1 = 20;
}

obj1 = {
    objFunc1 : function() {
        var local2 = 30;
    },
    objProp1 : "prop1"
}

You will get the following output:
Function : func1
local1
global1
obj1
obj1
objFunc1
Function : objFunc1
local2
objProp1
Yes, I know there are duplicates for the object and the closure, because they are first processed as variable names and then as an object/closure. This can be easily fixed, but I am going to leave it that way to keep the code simple.
I am not going to reproduce the complete source code here, but I will explain some important parts of the program.
The program uses the visitor pattern, first to traverse the AST and then to print the symbol hierarchy. It contains the following classes:
JSSymbol : This class holds a reference to an AST node and its child elements. We will create a hierarchy of objects of this class.
JSNodeVisitor : implements Rhino’s NodeVisitor interface and builds the hierarchy of JSSymbol objects
IJSSymbolVisitor : visitor interface for JSSymbol
JSSymbolVisitor : implements IJSSymbolVisitor and prints hierarchy of JSSymbol objects
JSErrorReporter : implements ErrorReporter interface of Rhino and prints syntax errors in the JS script
RhinoDemo : main program
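To make the split between the symbol class and its visitor more concrete, here is a minimal sketch of the idea. It is not the code from the project: SymbolSketch, SymbolVisitor and PrintingVisitor are hypothetical stand-ins for JSSymbol, IJSSymbolVisitor and JSSymbolVisitor, and the symbol stores a plain name instead of a Rhino AstNode so that the sketch runs without Rhino on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IJSSymbolVisitor; the real signature is not shown in the post.
interface SymbolVisitor {
    // Return false to skip the children of this symbol.
    boolean visit(SymbolSketch symbol, int depth);
}

// Hypothetical, simplified stand-in for JSSymbol: it keeps a name instead of an AstNode.
class SymbolSketch {
    private final String name;
    private final List<SymbolSketch> children = new ArrayList<>();

    SymbolSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Adds a child and returns it, so nested symbols can be built step by step.
    SymbolSketch addChild(SymbolSketch child) {
        children.add(child);
        return child;
    }

    // True if a direct child with this name exists (grandchildren are not searched).
    boolean childExist(String childName) {
        for (SymbolSketch child : children) {
            if (child.name.equals(childName)) {
                return true;
            }
        }
        return false;
    }

    // Entry point of the traversal: this symbol first, then its children, depth first.
    void visit(SymbolVisitor visitor) {
        visit(visitor, 0);
    }

    private void visit(SymbolVisitor visitor, int depth) {
        if (visitor.visit(this, depth)) {
            for (SymbolSketch child : children) {
                child.visit(visitor, depth + 1);
            }
        }
    }
}

// Collects one line per symbol, indented by two spaces per nesting level.
class PrintingVisitor implements SymbolVisitor {
    final StringBuilder out = new StringBuilder();

    @Override
    public boolean visit(SymbolSketch symbol, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(symbol.getName()).append('\n');
        return true;
    }
}
```

Keeping the printing logic in a visitor means the symbol class only needs to know how to walk its children; a different visitor could, for example, collect only function names without touching the symbol class.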
The parseJS function in the RhinoDemo class parses the script and gets the AstRoot node. It then calls the visit method of the root node, passing a JSNodeVisitor to it. After building the symbol hierarchy, it prints it using JSSymbolVisitor.
public void parseJS (String filePath) throws Exception {
    CompilerEnvirons env = new CompilerEnvirons();
    env.setRecoverFromErrors(true);
    FileReader strReader = new FileReader(filePath);
    IRFactory factory = new IRFactory(env, new JSErrorReporter());
    AstRoot rootNode = factory.parse(strReader, null, 0);
    JSNodeVisitor nodeVisitor = new JSNodeVisitor();
    rootNode.visit(nodeVisitor);
    nodeVisitor.getRoot().visit(new JSSymbolVisitor());
}
The visit method of JSNodeVisitor calls the addToParent method, which actually builds the symbol hierarchy:
private void addToParent(AstNode node) {
    if (root == null) {
        root = new JSSymbol(node);
        functionsStack.push(root);
        currentFuncEndOffset = node.getAbsolutePosition() + node.getLength();
        return;
    }

    if (functionsStack.size() == 0)
        return;

    int nodeType = node.getType();

    //we will track only variables and functions
    if (nodeType != Token.FUNCTION && nodeType != Token.VAR && nodeType != Token.OBJECTLIT
            && !(nodeType == Token.NAME && node.getParent() instanceof ObjectProperty)) {
        if (isVariableName(node)) {
            //check if it is in the current function
            String symbolName = ((Name)node).getIdentifier();
            JSSymbol currentSymContainer = functionsStack.peek();
            if (!currentSymContainer.childExist(symbolName)) {
                //this is a global symbol
                root.addChild(node);
            }
        }
        return;
    }

    if (node.getType() == Token.VAR && node instanceof VariableInitializer == false)
        return;

    JSSymbol currSym = null;
    JSSymbol parent = functionsStack.peek();
    if (parent.getNode().getAbsolutePosition() + parent.getNode().getLength() > node.getAbsolutePosition()) {
        currSym = new JSSymbol(node);
        parent.addChild(currSym);
    }
    else //outside current function boundary
    {
        //pop current parent
        functionsStack.pop();
        addToParent(node);
        return;
    }

    //currSym is already set above
    if (nodeType == Token.FUNCTION || nodeType == Token.OBJECTLIT) {
        AstNode parentNode = node.getParent();
        AstNode leftNode = null;
        if (parentNode.getType() == Token.ASSIGN) {
            leftNode = ((Assignment)parentNode).getLeft();
        }
        else if (parentNode instanceof ObjectProperty) {
            leftNode = ((ObjectProperty)parentNode).getLeft();
        }
        if (leftNode instanceof Name)
            currSym.setName(((Name)leftNode).getIdentifier());
        functionsStack.push(currSym);
        currentFuncEndOffset = node.getAbsolutePosition() + node.getLength();
    }
}
The above function uses a Stack to keep track of the current parent symbol (function or object). It checks whether the node being processed is a child of the symbol at the top of functionsStack by comparing the node’s absolute position (AstNode.getAbsolutePosition) with the end offset of the current (top) symbol. Since the AST is visited depth-first, a node that starts at or beyond that end offset cannot belong to the current symbol, so the symbol is popped and the check is repeated against the next one on the stack.
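The offset comparison can be tried out in isolation, without Rhino. The following is only a sketch of the same idea under simplifying assumptions: Span and OffsetTreeBuilder are hypothetical names, each span carries a start offset and a length that I made up for illustration, and the recursion in addToParent is replaced by an equivalent loop that pops the stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical flat record of a visited node: name, start offset, length,
// and whether it can contain other symbols (function or object literal).
class Span {
    final String name;
    final int start;
    final int length;
    final boolean container;
    final List<Span> children = new ArrayList<>();

    Span(String name, int start, int length, boolean container) {
        this.name = name;
        this.start = start;
        this.length = length;
        this.container = container;
    }

    // Offset just past the last character of this span.
    int end() {
        return start + length;
    }
}

class OffsetTreeBuilder {
    // Spans must arrive in depth-first (source) order; the first one is treated as the root.
    static Span build(List<Span> spansInVisitOrder) {
        Deque<Span> stack = new ArrayDeque<>();
        Span root = null;
        for (Span span : spansInVisitOrder) {
            if (root == null) {
                root = span;
                stack.push(root);
                continue;
            }
            // Pop every container that ends at or before the start of this span.
            while (!stack.isEmpty() && stack.peek().end() <= span.start) {
                stack.pop();
            }
            if (stack.isEmpty()) {
                break; // the span lies beyond the root, so nothing more can be attached
            }
            stack.peek().children.add(span);
            if (span.container) {
                stack.push(span);
            }
        }
        return root;
    }
}
```

With the sample script from the beginning of the post, a span for local2 that starts inside objFunc1 is attached to objFunc1, while objProp1, which starts after objFunc1 ends, causes objFunc1 to be popped and is attached to obj1 instead.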
The remaining code in this program is quite simple. Download the Eclipse project for this program if you use Eclipse.
Again, I do not guarantee that the program will create an error-free symbol hierarchy. At best, use this code as a reference and modify it to suit your requirements. It is just one way to create a symbol hierarchy, and there could be better ways.
-Ram Kulkarni
I have already parsed my JavaScript using your code above. After that, I want to run one of the JavaScript functions listed in the parsing result. How do I do that?
I experimented with these APIs a long time ago and don’t recall now. But I think this link could have the information you are looking for.
The above method parses the variable and function information, which works perfectly fine. I also want to parse the values of the variables, the business rules (all if/else/for statements), and the functions called from other functions. Is there any way I can do that?
Take a look at my reply to Shreeharsha Voonna on my earlier post, Understanding AST created by Mozilla Rhino parser.