{"id":740,"date":"2013-03-31T12:07:08","date_gmt":"2013-03-31T06:37:08","guid":{"rendered":"http:\/\/ramkulkarni.com\/blog\/?p=740"},"modified":"2013-03-31T12:07:08","modified_gmt":"2013-03-31T06:37:08","slug":"understanding-ast-created-by-mozilla-rhino-parser","status":"publish","type":"post","link":"http:\/\/ramkulkarni.com\/blog\/understanding-ast-created-by-mozilla-rhino-parser\/","title":{"rendered":"Understanding AST created by Mozilla Rhino parser"},"content":{"rendered":"<p>For an application I am developing, I needed to get all functions and variables declared in JavaScript code. Because the application I am developing is in Java, I started looking for readily\u00a0available\u00a0JavaScript parser written in Java. I found <a title=\"Mozilla Rhino JS Parser\" href=\"https:\/\/developer.mozilla.org\/en\/docs\/Rhino\" target=\"_blank\">Mozilla Rhino<\/a> to be a good fit for my requirements and decided to use it.<\/p>\n<p>Parsing with Rhino is quite simple and well documented.<\/p>\n<div style=\"background: white; overflow: auto; width: auto; color: black; border: solid gray; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000; font-weight: bold;\">private<\/span> AstRoot <span style=\"color: #0060b0; font-weight: bold;\">parse<\/span><span style=\"color: #303030;\">(<\/span>String src<span style=\"color: #303030;\">,<\/span> <span style=\"color: #303090; font-weight: bold;\">int<\/span> startLineNum<span style=\"color: #303030;\">)<\/span>\n\t<span style=\"color: #008000; font-weight: bold;\">throws<\/span> IOException <span style=\"color: #303030;\">{<\/span>\n\n\tCompilerEnvirons env <span style=\"color: #303030;\">=<\/span> <span style=\"color: #008000; font-weight: bold;\">new<\/span> CompilerEnvirons<span style=\"color: #303030;\">();<\/span>\n\tenv<span style=\"color: #303030;\">.<\/span><span style=\"color: #0000c0;\">setRecoverFromErrors<\/span><span style=\"color: 
#303030;\">(<\/span><span style=\"color: #008000; font-weight: bold;\">true<\/span><span style=\"color: #303030;\">);<\/span>\n\tenv<span style=\"color: #303030;\">.<\/span><span style=\"color: #0000c0;\">setGenerateDebugInfo<\/span><span style=\"color: #303030;\">(<\/span><span style=\"color: #008000; font-weight: bold;\">true<\/span><span style=\"color: #303030;\">);<\/span>\n\tenv<span style=\"color: #303030;\">.<\/span><span style=\"color: #0000c0;\">setRecordingComments<\/span><span style=\"color: #303030;\">(<\/span><span style=\"color: #008000; font-weight: bold;\">true<\/span><span style=\"color: #303030;\">);<\/span>\n\n\tStringReader strReader <span style=\"color: #303030;\">=<\/span> <span style=\"color: #008000; font-weight: bold;\">new<\/span> StringReader<span style=\"color: #303030;\">(<\/span>src<span style=\"color: #303030;\">);<\/span>\n\n\tIRFactory factory <span style=\"color: #303030;\">=<\/span> <span style=\"color: #008000; font-weight: bold;\">new<\/span> IRFactory<span style=\"color: #303030;\">(<\/span>env<span style=\"color: #303030;\">);<\/span>\n\t<span style=\"color: #008000; font-weight: bold;\">return<\/span> factory<span style=\"color: #303030;\">.<\/span><span style=\"color: #0000c0;\">parse<\/span><span style=\"color: #303030;\">(<\/span>strReader<span style=\"color: #303030;\">,<\/span> <span style=\"color: #008000; font-weight: bold;\">null<\/span><span style=\"color: #303030;\">,<\/span> startLineNum<span style=\"color: #303030;\">);<\/span>\n\n<span style=\"color: #303030;\">}<\/span><\/pre>\n<\/div>\n<p><!--more--><br \/>\nAstNode class has methods like getSymbols and getSymbolTable which I presumed would give me all variables and functions declared in the code. However these methods returned only functions and not variables declared. So I had to traverse the tree and get all symbols. 
I thought I would use the visit method of AstRoot to process all nodes.<\/p>\n<div style=\"background: white; overflow: auto; width: auto; color: black; border: solid gray; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<pre style=\"margin: 0; line-height: 125%;\">root<span style=\"color: #303030;\">.<\/span><span style=\"color: #0000c0;\">visit<\/span><span style=\"color: #303030;\">(<\/span><span style=\"color: #008000; font-weight: bold;\">new<\/span> NodeVisitor<span style=\"color: #303030;\">()<\/span> <span style=\"color: #303030;\">{<\/span>\n\n\t<span style=\"color: #505050; font-weight: bold;\">@Override<\/span>\n\t<span style=\"color: #008000; font-weight: bold;\">public<\/span> <span style=\"color: #303090; font-weight: bold;\">boolean<\/span> <span style=\"color: #0060b0; font-weight: bold;\">visit<\/span><span style=\"color: #303030;\">(<\/span>AstNode node<span style=\"color: #303030;\">)<\/span> <span style=\"color: #303030;\">{<\/span>\n\t\t<span style=\"color: #303090; font-weight: bold;\">int<\/span> nodeType <span style=\"color: #303030;\">=<\/span> node<span style=\"color: #303030;\">.<\/span><span style=\"color: #0000c0;\">getType<\/span><span style=\"color: #303030;\">();<\/span>\n\t\t<span style=\"color: #808080;\">\/\/TODO: process the node based on node type<\/span>\n\t\t<span style=\"color: #008000; font-weight: bold;\">return<\/span> <span style=\"color: #008000; font-weight: bold;\">true<\/span><span style=\"color: #303030;\">;<\/span> <span style=\"color: #808080;\">\/\/process children<\/span>\n\t<span style=\"color: #303030;\">}<\/span>\n<span style=\"color: #303030;\">});<\/span><\/pre>\n<\/div>\n<p>However, the above code threw a ClassCastException in the visit method of the ScriptNode class.<\/p>\n<p><span style=\"text-decoration: underline;\">java.lang.ClassCastException<\/span>: org.mozilla.javascript.Node cannot be cast to org.mozilla.javascript.ast.AstNode \u00a0 \u00a0at 
org.mozilla.javascript.ast.ScriptNode.visit(<span style=\"text-decoration: underline;\">ScriptNode.java:312<\/span>)<\/p>\n<p>The test code was a simple JS script:<\/p>\n<div style=\"background: white; overflow: auto; width: auto; color: black; border: solid gray; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000; font-weight: bold;\">function<\/span> test()\n{\n\t<span style=\"color: #008000; font-weight: bold;\">var<\/span> k <span style=\"color: #303030;\">=<\/span> <span style=\"color: #0000d0; font-weight: bold;\">10<\/span>;\n}\n\ni <span style=\"color: #303030;\">=<\/span> <span style=\"color: #0000d0; font-weight: bold;\">100<\/span>;<\/pre>\n<\/div>\n<p>So finally I decided to traverse the AST from the root and process each node myself. That is when I realized that Rhino builds its AST differently from the ASTs I had worked with earlier. Those ASTs stored the children of each node as an array of nodes. If you start with the root and visit each child, you are sure to traverse the entire tree and hence the entire code.<\/p>\n<p>However, Rhino does not create nodes with an array of child nodes. Each node has a method to access its first child (getFirstChild), and you can then traverse the remaining children by following the &#8216;next&#8217; member (getNext). So instead of an array, it creates a linked list of child nodes, which is not such a big problem. 
However, a few things surprised me when I traversed the tree.<\/p>\n<p>I started with the root node, and for each node I followed these rules:<\/p>\n<ol>\n<li>Get the first child and traverse the linked list by calling next until next is null.<\/li>\n<li>Call next on the parent node and repeat the loop.<\/li>\n<\/ol>\n<p>Here is what I found:<\/p>\n<ol>\n<li>When you traverse as above, you will only visit the function name, and not the function body.<\/li>\n<li>If you want to visit the function body, you will have to call the getFunctions (or getFunctionNode) method of ScriptNode. Both AstRoot and FunctionNode are of type ScriptNode.<\/li>\n<li>If you traverse the nodes as I described above (children first and then the next node), you are not guaranteed to visit all AST nodes.<\/li>\n<\/ol>\n<p>For the simple snippet of JS code above, the AST created is as follows:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/138.197.85.232\/blog\/wp-content\/uploads\/2013\/03\/Rhino_AST.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"743\" data-permalink=\"http:\/\/ramkulkarni.com\/blog\/understanding-ast-created-by-mozilla-rhino-parser\/rhino_ast\/\" data-orig-file=\"https:\/\/i0.wp.com\/ramkulkarni.com\/blog\/wp-content\/uploads\/2013\/03\/Rhino_AST.png?fit=463%2C303\" data-orig-size=\"463,303\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Rhino AST\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/ramkulkarni.com\/blog\/wp-content\/uploads\/2013\/03\/Rhino_AST.png?fit=300%2C196\" 
data-large-file=\"https:\/\/i0.wp.com\/ramkulkarni.com\/blog\/wp-content\/uploads\/2013\/03\/Rhino_AST.png?fit=463%2C303\" class=\"alignnone size-full wp-image-743\" alt=\"Rhino AST\" src=\"https:\/\/i0.wp.com\/138.197.85.232\/blog\/wp-content\/uploads\/2013\/03\/Rhino_AST.png?resize=463%2C303\" width=\"463\" height=\"303\" srcset=\"https:\/\/i0.wp.com\/ramkulkarni.com\/blog\/wp-content\/uploads\/2013\/03\/Rhino_AST.png?w=463 463w, https:\/\/i0.wp.com\/ramkulkarni.com\/blog\/wp-content\/uploads\/2013\/03\/Rhino_AST.png?resize=300%2C196 300w\" sizes=\"(max-width: 463px) 100vw, 463px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>As you can see, only the Name node of the function is in the tree. AstRoot has two children: Name (Function) and Node (Expr_Result). The Expr_Result node has one child, Node (SetName). SetName has two children: Name (BindName) and NumberLiteral. This tree represents the entire code, except the body of the function test. As mentioned earlier, you can get the body of the function by calling getFunctionNode on AstRoot.<\/p>\n<p>Now, if you inspect the Name (BindName) node, you will see that its parent is an Assignment node. Assignment has Name (BindName) as its left node and NumberLiteral as its right node. However, Assignment is not part of the tree if you traverse it as described above.<\/p>\n<p>So you also need to consider the type (type constants are declared in the org.mozilla.javascript.Token class) and the parent of each node when processing the AST created by Rhino.<\/p>\n<p>-Ram Kulkarni<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For an application I am developing, I needed to get all functions and variables declared in JavaScript code. Because the application I am developing is in Java, I started looking for readily\u00a0available\u00a0JavaScript parser written in Java. I found Mozilla Rhino to be a good fit for my requirements and decided to use it. 
Parsing with &hellip; <\/p>\n<p class=\"link-more\"><a href=\"http:\/\/ramkulkarni.com\/blog\/understanding-ast-created-by-mozilla-rhino-parser\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Understanding AST created by Mozilla Rhino parser&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"Understanding AST created by Mozilla Rhino parser http:\/\/wp.me\/p2g9O8-bW","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[20,1],"tags":[67],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2g9O8-bW","jetpack-related-posts":[],"_links":{"self":[{"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/posts\/740"}],"collection":[{"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/comments?post=740"}],"version-history":[{"count":0,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/posts\/740\/revisions"}],"wp:attachment":[{"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/media?parent=740"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/categories?
post=740"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/tags?post=740"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}