{"id":921,"date":"2013-07-22T11:03:04","date_gmt":"2013-07-22T05:33:04","guid":{"rendered":"http:\/\/ramkulkarni.com\/blog\/?p=921"},"modified":"2013-07-22T11:03:04","modified_gmt":"2013-07-22T05:33:04","slug":"capturing-absolute-offsets-for-javaccjjtree-tokens","status":"publish","type":"post","link":"http:\/\/ramkulkarni.com\/blog\/capturing-absolute-offsets-for-javaccjjtree-tokens\/","title":{"rendered":"Capturing absolute offsets for JavaCC\/JJTree tokens"},"content":{"rendered":"<p>I use <a href=\"https:\/\/javacc.java.net\/\" target=\"_blank\">JavaCC<\/a> to generate parsers in Java, and <a href=\"https:\/\/javacc.java.net\/doc\/JJTree.html\" target=\"_blank\">JJTree<\/a> to create an AST after parsing. JJTree creates the nodes of the AST, and you can configure JavaCC options to capture tokens in each node, for example if you want each node to hold its start and end tokens. The default code generated by JavaCC creates a Token class whose offsets are relative to the start of the line. It has fields such as beginLine, beginColumn, endLine and endColumn. The line numbers are absolute (starting from 1), and the column fields contain offsets (also starting from 1) within the corresponding\u00a0lines.<br \/>\nHowever, you often want to capture the absolute offsets of tokens in the input stream, not just their offsets within a line. I wish there were a <a href=\"https:\/\/javacc.java.net\/doc\/javaccgrm.html#prod6\" target=\"_blank\">JavaCC option<\/a> to enable this, but it is not too complex to do it yourself.<\/p>\n<p>To explain how to do this, I will use the grammar file generated by the JavaCC wizard of the <a href=\"http:\/\/eclipse-javacc.sourceforge.net\/\" target=\"_blank\">JavaCC Eclipse plugin<\/a>. 
This is the default grammar file it generates &#8211;<!--more--><\/p>\n<pre><span style=\"color: #3f5fbf;\">\/**<\/span>\n<span style=\"color: #7f9fbf; font-weight: bold;\">*<\/span><span style=\"color: #3f5fbf;\"> JJTree file<\/span>\n<span style=\"color: #3f5fbf;\">*\/<\/span>\n\n<span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">options<\/span><span style=\"color: #000000; background: #ffffff;\">{<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0JDK_VERSION <\/span><span style=\"color: #000000; background: #ffffff;\">=<\/span><span style=\"color: #2a00ff; background: #ffffff;\">\"1.5\"<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">}<\/span>\n\n<span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">PARSER_BEGIN<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #000000; background: #ffffff;\">eg2<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span>\n<span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">package<\/span><span style=\"color: #7f0055; background: #ffffff;\"> test<\/span><span style=\"color: #7f0055; background: #ffffff;\">;<\/span>\n\n<span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">public<\/span> <span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">class<\/span><span style=\"color: #000000; background: #ffffff;\"> eg2 <\/span><span style=\"color: #000000; background: #ffffff;\">{<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">public<\/span> <span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">static<\/span> <span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">void<\/span><span style=\"color: #000000; background: #ffffff;\"> main<\/span><span style=\"color: 
#000000; background: #ffffff;\">(<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">String<\/span><span style=\"color: #000000; background: #ffffff;\"> args<\/span><span style=\"color: #000000; background: #ffffff;\">[<\/span><span style=\"color: #000000; background: #ffffff;\">]<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">{<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">System<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">out<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">println<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #2a00ff; background: #ffffff;\">\"Reading from standard input...\"<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">System<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">out<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">print<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #2a00ff; background: #ffffff;\">\"Enter an expression like <\/span><span style=\"color: #2a00ff; background: #ffffff;\">\\\"<\/span><span style=\"color: #2a00ff; background: #ffffff;\">1+(2+3)*var;<\/span><span style=\"color: #2a00ff; background: #ffffff;\">\\\"<\/span><span style=\"color: #2a00ff; background: #ffffff;\"> 
:\"<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0eg2 parser <\/span><span style=\"color: #000000; background: #ffffff;\">=<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">new<\/span><span style=\"color: #000000; background: #ffffff;\"> eg2<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">System<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">in<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">try<\/span><span style=\"color: #000000; background: #ffffff;\">{<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0SimpleNode startNode <\/span><span style=\"color: #000000; background: #ffffff;\">=<\/span><span style=\"color: #000000; background: #ffffff;\"> parser<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">Start<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0startNode<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">dump<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #2a00ff; background: 
#ffffff;\">\"\"<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">System<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">out<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">println<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #2a00ff; background: #ffffff;\">\"Thank you.\"<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #000000; background: #ffffff;\">}<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">catch<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">Exception<\/span><span style=\"color: #000000; background: #ffffff;\"> e<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">{<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">System<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">out<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">println<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #2a00ff; background: #ffffff;\">\"Oops.\"<\/span><span 
style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">System<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">out<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">println<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #000000; background: #ffffff;\">e<\/span><span style=\"color: #000000; background: #ffffff;\">.<\/span><span style=\"color: #000000; background: #ffffff;\">getMessage<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #000000; background: #ffffff;\">}<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0<\/span><span style=\"color: #000000; background: #ffffff;\">}<\/span>\n<span style=\"color: #000000; background: #ffffff;\">}<\/span>\n<span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">PARSER_END<\/span><span style=\"color: #000000; background: #ffffff;\">(<\/span><span style=\"color: #000000; background: #ffffff;\">eg2<\/span><span style=\"color: #000000; background: #ffffff;\">)<\/span>\n\n<span style=\"color: #7f0055; font-weight: bold;\">SKIP <\/span>:\n{\n  <span style=\"color: #2a00ff;\">\" \"<\/span>\n| <span style=\"color: #2a00ff;\">\"<\/span><span style=\"color: #2a00ff;\">\\t<\/span><span style=\"color: #2a00ff;\">\"<\/span>\n| <span style=\"color: #2a00ff;\">\"<\/span><span 
style=\"color: #2a00ff;\">\\n<\/span><span style=\"color: #2a00ff;\">\"<\/span>\n| <span style=\"color: #2a00ff;\">\"<\/span><span style=\"color: #2a00ff;\">\\r<\/span><span style=\"color: #2a00ff;\">\"<\/span>\n| &lt;<span style=\"color: #2a00ff;\">\"\/\/\"<\/span> (~[<span style=\"color: #2a00ff;\">\"<\/span><span style=\"color: #2a00ff;\">\\n<\/span><span style=\"color: #2a00ff;\">\"<\/span>,<span style=\"color: #2a00ff;\">\"<\/span><span style=\"color: #2a00ff;\">\\r<\/span><span style=\"color: #2a00ff;\">\"<\/span>])* (<span style=\"color: #2a00ff;\">\"<\/span><span style=\"color: #2a00ff;\">\\n<\/span><span style=\"color: #2a00ff;\">\"<\/span>|<span style=\"color: #2a00ff;\">\"<\/span><span style=\"color: #2a00ff;\">\\r<\/span><span style=\"color: #2a00ff;\">\"<\/span>|<span style=\"color: #2a00ff;\">\"<\/span><span style=\"color: #2a00ff;\">\\r<\/span><span style=\"color: #2a00ff;\">\\n<\/span><span style=\"color: #2a00ff;\">\"<\/span>)&gt;\n| &lt;<span style=\"color: #2a00ff;\">\"\/*\"<\/span> (~[<span style=\"color: #2a00ff;\">\"*\"<\/span>])* <span style=\"color: #2a00ff;\">\"*\"<\/span> (~[<span style=\"color: #2a00ff;\">\"\/\"<\/span>] (~[<span style=\"color: #2a00ff;\">\"*\"<\/span>])* <span style=\"color: #2a00ff;\">\"*\"<\/span>)* <span style=\"color: #2a00ff;\">\"\/\"<\/span>&gt;\n}\n<span style=\"color: #7f0055; font-weight: bold;\">TOKEN <\/span>: <span style=\"color: #3f7f59;\">\/* LITERALS *\/<\/span>\n{\n  &lt; INTEGER_LITERAL:\n        &lt;DECIMAL_LITERAL&gt; ([<span style=\"color: #2a00ff;\">\"l\"<\/span>,<span style=\"color: #2a00ff;\">\"L\"<\/span>])?\n      | &lt;HEX_LITERAL&gt; ([<span style=\"color: #2a00ff;\">\"l\"<\/span>,<span style=\"color: #2a00ff;\">\"L\"<\/span>])?\n      | &lt;OCTAL_LITERAL&gt; ([<span style=\"color: #2a00ff;\">\"l\"<\/span>,<span style=\"color: #2a00ff;\">\"L\"<\/span>])?\n  &gt;\n|  &lt; #DECIMAL_LITERAL: [<span style=\"color: #2a00ff;\">\"1\"<\/span>-<span style=\"color: #2a00ff;\">\"9\"<\/span>] ([<span 
style=\"color: #2a00ff;\">\"0\"<\/span>-<span style=\"color: #2a00ff;\">\"9\"<\/span>])* &gt;\n|  &lt; #HEX_LITERAL: <span style=\"color: #2a00ff;\">\"0\"<\/span> [<span style=\"color: #2a00ff;\">\"x\"<\/span>,<span style=\"color: #2a00ff;\">\"X\"<\/span>] ([<span style=\"color: #2a00ff;\">\"0\"<\/span>-<span style=\"color: #2a00ff;\">\"9\"<\/span>,<span style=\"color: #2a00ff;\">\"a\"<\/span>-<span style=\"color: #2a00ff;\">\"f\"<\/span>,<span style=\"color: #2a00ff;\">\"A\"<\/span>-<span style=\"color: #2a00ff;\">\"F\"<\/span>])+ &gt;\n|  &lt; #OCTAL_LITERAL: <span style=\"color: #2a00ff;\">\"0\"<\/span> ([<span style=\"color: #2a00ff;\">\"0\"<\/span>-<span style=\"color: #2a00ff;\">\"7\"<\/span>])* &gt;\n}\n<span style=\"color: #7f0055; font-weight: bold;\">TOKEN <\/span>: <span style=\"color: #3f7f59;\">\/* IDENTIFIERS *\/<\/span>\n{\n  &lt; IDENTIFIER: &lt;LETTER&gt; (&lt;LETTER&gt;|&lt;DIGIT&gt;)* &gt;\n|  &lt; #LETTER: [<span style=\"color: #2a00ff;\">\"_\"<\/span>,<span style=\"color: #2a00ff;\">\"a\"<\/span>-<span style=\"color: #2a00ff;\">\"z\"<\/span>,<span style=\"color: #2a00ff;\">\"A\"<\/span>-<span style=\"color: #2a00ff;\">\"Z\"<\/span>] &gt;\n|  &lt; #DIGIT: [<span style=\"color: #2a00ff;\">\"0\"<\/span>-<span style=\"color: #2a00ff;\">\"9\"<\/span>] &gt;\n}\n\nSimpleNode Start():{}\n{\n  Expression() <span style=\"color: #2a00ff;\">\";\"<\/span>\n  { <span style=\"color: #7f0055; font-weight: bold;\">return<\/span> jjtThis; }\n}\n<span style=\"color: #7f0055; font-weight: bold;\">void<\/span> Expression():{ }\n{\n  AdditiveExpression()\n}\n<span style=\"color: #7f0055; font-weight: bold;\">void<\/span> AdditiveExpression():{}\n{\n  MultiplicativeExpression() ( ( <span style=\"color: #2a00ff;\">\"+\"<\/span> | <span style=\"color: #2a00ff;\">\"-\"<\/span> ) MultiplicativeExpression() )*\n}\n<span style=\"color: #7f0055; font-weight: bold;\">void<\/span> MultiplicativeExpression():{}\n{\n  UnaryExpression() ( ( <span style=\"color: 
#2a00ff;\">\"*\"<\/span> | <span style=\"color: #2a00ff;\">\"\/\"<\/span> | <span style=\"color: #2a00ff;\">\"%\"<\/span> ) UnaryExpression() )*\n}\n<span style=\"color: #7f0055; font-weight: bold;\">void<\/span> UnaryExpression():{}\n{\n  <span style=\"color: #2a00ff;\">\"(\"<\/span> Expression() <span style=\"color: #2a00ff;\">\")\"<\/span> | Identifier() | <span style=\"color: #7f0055; font-weight: bold;\">Integer<\/span>()\n}\n<span style=\"color: #7f0055; font-weight: bold;\">void<\/span> Identifier():{}\n{\n  &lt;IDENTIFIER&gt;\n}\n<span style=\"color: #7f0055; font-weight: bold;\">void<\/span> <span style=\"color: #7f0055; font-weight: bold;\">Integer<\/span>():{}\n{\n  &lt;INTEGER_LITERAL&gt;\n}<\/pre>\n<p>I have slightly modified the main function so that it does not use the eg2 class statically. The grammar above parses simple arithmetic expressions such as 1 + 2 * (3 + 4). Now consider the following input, which spans two lines &#8211;<br \/>\n1 +<br \/>\n2 * (3+4);<br \/>\nFor &#8216;1&#8217;, JavaCC generates a Token with beginLine=1 and beginColumn=1. For &#8216;2&#8217;, it generates a Token with beginLine=2 and beginColumn=1. However, you might want to know the absolute position of &#8216;2&#8217; in the input stream, which is 5 (counting from 1 and assuming the first line ends with a single newline character). Note also that the AST nodes generated by the above grammar will not contain start and end tokens; for that you need to set the JavaCC option &#8216;TRACK_TOKENS&#8217; to true.<\/p>\n<p>To capture absolute offsets, we need to modify the SimpleCharStream class to keep track of the total number of chars read, and give the Token class additional fields to store the absolute offsets. 
However, we need to go through a few intermediate steps to do this.<\/p>\n<h2>Setting additional JavaCC options<\/h2>\n<p>Set the following options in the grammar file &#8211;<\/p>\n<pre><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">options<\/span><span style=\"color: #000000; background: #ffffff;\">{<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0JDK_VERSION <\/span><span style=\"color: #000000; background: #ffffff;\">=<\/span><span style=\"color: #2a00ff; background: #ffffff;\">\"1.5\"<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0TRACK_TOKENS <\/span><span style=\"color: #000000; background: #ffffff;\">=<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">true<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0TOKEN_EXTENDS <\/span><span style=\"color: #000000; background: #ffffff;\">=<\/span><span style=\"color: #2a00ff; background: #ffffff;\">\"BaseToken\"<\/span><span style=\"color: #000000; background: #ffffff;\">; \/\/use the fully qualified name if BaseToken is not in the same package as the parser <\/span>\n<span style=\"color: #000000; background: #ffffff;\">\u00a0\u00a0<\/span><span style=\"color: #7f0055; background: #ffffff;\">COMMON_TOKEN_ACTION<\/span><span style=\"color: #000000; background: #ffffff;\">=<\/span><span style=\"color: #7f0055; background: #ffffff; font-weight: bold;\">true<\/span><span style=\"color: #000000; background: #ffffff;\">;<\/span>\n<span style=\"color: #000000; background: #ffffff;\">}<\/span><\/pre>\n<p>We will not modify the Token class directly to add fields for the absolute offsets; instead, we will specify a parent class for Token using the TOKEN_EXTENDS option.<br \/>\nWe also want a hook in the token manager to set the absolute offsets after a new Token is created, so we set COMMON_TOKEN_ACTION. 
When this is set to true, the token manager calls\u00a0CommonTokenAction after creating each new Token. This method must be declared in the\u00a0<a href=\"https:\/\/javacc.java.net\/doc\/javaccgrm.html#prod12\" target=\"_blank\">TOKEN_MGR_DECLS<\/a> section.<\/p>\n<h2>Create Base Class for Token<\/h2>\n<pre><span style=\"color: #7f0055; font-weight: bold;\">public<\/span> <span style=\"color: #7f0055; font-weight: bold;\">class<\/span> BaseToken {\n\t<span style=\"color: #7f0055; font-weight: bold;\">public<\/span> <span style=\"color: #7f0055; font-weight: bold;\">int<\/span> absoluteBeginColumn = 0;\n\t<span style=\"color: #7f0055; font-weight: bold;\">public<\/span> <span style=\"color: #7f0055; font-weight: bold;\">int<\/span> absoluteEndColumn = 0;\n}<\/pre>\n<h2>Modify\u00a0SimpleCharStream<\/h2>\n<p>This class reads input from the input stream (such as a file), buffers it and keeps track of the current token offset. You can specify the initial buffer size for this class. SimpleCharStream stores the data it reads from the input stream in a char buffer (of the size you specified, or the default size). When all the chars in the buffer have been read by the token manager and more data is available, it reads more data into the buffer, expanding the buffer if needed.<\/p>\n<p>We will add two fields to this class &#8211;<\/p>\n<pre><span style=\"color: #7f0055; font-weight: bold;\">protected<\/span> int totalCharsRead = 0;\n<span style=\"color: #7f0055; font-weight: bold;\">protected<\/span> int absoluteTokenBegin = 0;<\/pre>\n<p>As the name suggests, totalCharsRead keeps count of the total number of chars read from the input stream, and absoluteTokenBegin holds the absolute offset of the beginning of the current Token. 
We will add an accessor function for\u00a0absoluteTokenBegin &#8211;<\/p>\n<pre><span style=\"color: #7f0055; font-weight: bold;\">public<\/span> <span style=\"color: #7f0055; font-weight: bold;\">final<\/span> int getAbsoluteTokenBegin() {\n    <span style=\"color: #7f0055; font-weight: bold;\">return<\/span> absoluteTokenBegin;\n}<\/pre>\n<p>We will increment totalCharsRead whenever a character is read from the buffer, and decrement it whenever characters are put back into the buffer (in the backup function). After the modifications, the readChar and backup functions look as below &#8211;<\/p>\n<pre><span style=\"color: #3f5fbf;\">\/** Read a character. *\/<\/span>\n  <span style=\"color: #7f0055; font-weight: bold;\">public<\/span> char readChar() <span style=\"color: #7f0055; font-weight: bold;\">throws<\/span> java.io.IOException\n  {\n    <span style=\"color: #7f0055; font-weight: bold;\">if<\/span> (inBuf &gt; 0)\n    {\n      --inBuf;\n\n      <span style=\"color: #7f0055; font-weight: bold;\">if<\/span> (++bufpos == bufsize)\n        bufpos = 0;\n\n      <strong>totalCharsRead++;<\/strong>\n      <span style=\"color: #7f0055; font-weight: bold;\">return<\/span> buffer[bufpos];\n    }\n\n    <span style=\"color: #7f0055; font-weight: bold;\">if<\/span> (++bufpos &gt;= maxNextCharInd)\n      FillBuff();\n\n    <strong>totalCharsRead++;<\/strong>\n    <span style=\"color: #7f0055; font-weight: bold;\">char<\/span> c = buffer[bufpos];\n\n    UpdateLineColumn(c);\n    <span style=\"color: #7f0055; font-weight: bold;\">return<\/span> c;\n  }\n\n<span style=\"color: #3f5fbf;\">\/** Backup a number of characters. *\/<\/span>\n  <span style=\"color: #7f0055; font-weight: bold;\">public<\/span> void backup(int amount) {\n\n    inBuf += amount;\n    <strong>totalCharsRead -= amount;<\/strong>\n    <span style=\"color: #7f0055; font-weight: bold;\">if<\/span> ((bufpos -= amount) &lt; 0)\n      bufpos += bufsize;\n  }<\/pre>\n<p>Modifications are highlighted in bold. 
Next, we will modify the BeginToken function to set\u00a0absoluteTokenBegin &#8211;<\/p>\n<pre><span style=\"color: #3f5fbf;\">\/** Start. *\/<\/span>\n  <span style=\"color: #7f0055; font-weight: bold;\">public<\/span> char BeginToken() <span style=\"color: #7f0055; font-weight: bold;\">throws<\/span> java.io.IOException\n  {\n    tokenBegin = -1;\n    <span style=\"color: #7f0055; font-weight: bold;\">char<\/span> c = readChar();\n    tokenBegin = bufpos;\n    <strong>absoluteTokenBegin = totalCharsRead;<\/strong>\n\n    <span style=\"color: #7f0055; font-weight: bold;\">return<\/span> c;\n  }<\/pre>\n<h2>Modify TokenManager<\/h2>\n<p>Finally, we will add the\u00a0CommonTokenAction method to the token manager. As mentioned above, this method must be declared in the TOKEN_MGR_DECLS section of the grammar file &#8211;<\/p>\n<pre><span style=\"color: #7f0055; font-weight: bold;\">TOKEN_MGR_DECLS <\/span>:\n{\n\t<span style=\"color: #7f0055; font-weight: bold;\">public<\/span> <span style=\"color: #7f0055; font-weight: bold;\">void<\/span> CommonTokenAction(Token t)\n\t{\n\t\tt.absoluteBeginColumn = getCurrentTokenAbsolutePosition();\n\t\tt.absoluteEndColumn = t.absoluteBeginColumn + t.image.length();\n\t}\n\n\t<span style=\"color: #7f0055; font-weight: bold;\">public<\/span> <span style=\"color: #7f0055; font-weight: bold;\">int<\/span> getCurrentTokenAbsolutePosition()\n\t{\n\t\t<span style=\"color: #7f0055; font-weight: bold;\">if<\/span> (input_stream <span style=\"color: #7f0055; font-weight: bold;\">instanceof<\/span> SimpleCharStream)\n\t\t\t<span style=\"color: #7f0055; font-weight: bold;\">return<\/span> ((SimpleCharStream)input_stream).getAbsoluteTokenBegin();\n\t\t<span style=\"color: #7f0055; font-weight: bold;\">return<\/span> -1;\n\t}\n}<\/pre>\n<p>Because absoluteTokenBegin is recorded right after the first char of the token is read, it starts from 1, and absoluteEndColumn as computed here points one position past the last char of the token. This code also assumes that the parser is generated with STATIC set to false; in static mode (the JavaCC default) the token manager's members are static, so these two methods would have to be declared static as well.<\/p>\n<p>This is not related to the problem we are trying to solve, but if you have to write a large block of Java code in TOKEN_MGR_DECLS that could be delegated to the parser class, then you might want to use 
the TOKEN_MANAGER_USES_PARSER option. If it is set to true, JavaCC generates the TokenManager class with a field, parser, that holds a reference to the parser instance. You can then call methods on the parser object from within\u00a0TOKEN_MGR_DECLS.<\/p>\n<p>You can verify that the absolute offsets are set in each Token by stepping through the code in a debugger, or you can modify the dump method of SimpleNode to print additional token information &#8211;<\/p>\n<pre><span style=\"color: #3f7f59;\">\/* Override this method if you want to customize how the node dumps<\/span>\n<span style=\"color: #3f7f59;\">\u00a0\u00a0\u00a0\u00a0\u00a0out its children. *\/<\/span>\n\n  <span style=\"color: #7f0055; font-weight: bold;\">public<\/span> void dump(String prefix) {\n    <span style=\"color: #7f0055; font-weight: bold;\">System<\/span>.out.println(toString(prefix));\n    printTokenInfo(prefix);\n    <span style=\"color: #7f0055; font-weight: bold;\">if<\/span> (children != <span style=\"color: #7f0055; font-weight: bold;\">null<\/span>) {\n      <span style=\"color: #7f0055; font-weight: bold;\">for<\/span> (<span style=\"color: #7f0055; font-weight: bold;\">int<\/span> i = 0; i &lt; children.length; ++i) {\n        SimpleNode n = (SimpleNode)children[i];\n        <span style=\"color: #7f0055; font-weight: bold;\">if<\/span> (n != <span style=\"color: #7f0055; font-weight: bold;\">null<\/span>) {\n          n.dump(prefix + <span style=\"color: #2a00ff;\">\" \"<\/span>);\n        }\n      }\n    }\n  }\n\n<span style=\"color: #3f7f59;\">\/\/New method added to print token information<\/span>\n  <span style=\"color: #7f0055; font-weight: bold;\">public<\/span> void printTokenInfo(String prefix)\n  {\n\t  prefix += <span style=\"color: #2a00ff;\">\"<\/span><span style=\"color: #2a00ff;\">\\t<\/span><span style=\"color: #2a00ff;\">\"<\/span>;\n\t  <span style=\"color: #7f0055; font-weight: bold;\">System<\/span>.out.println(prefix + <span style=\"color: #2a00ff;\">\"StartCol = \"<\/span> + 
firstToken.beginColumn +\n\t\t\t  <span style=\"color: #2a00ff;\">\" AbsStartCol = \"<\/span> + firstToken.absoluteBeginColumn + <span style=\"color: #2a00ff;\">\" Image - \"<\/span> +\n\t\t\t  firstToken.image);\n\t  <span style=\"color: #7f0055; font-weight: bold;\">System<\/span>.out.println(prefix + <span style=\"color: #2a00ff;\">\"EndCol = \"<\/span> + lastToken.beginColumn +\n\t\t\t  <span style=\"color: #2a00ff;\">\" AbsEndCol = \"<\/span> + lastToken.absoluteBeginColumn + <span style=\"color: #2a00ff;\">\" Image - \"<\/span> +\n\t\t\t  lastToken.image);\n  }<\/pre>\n<p>Now, for the input &#8211;<br \/>\n1 +<br \/>\n2 * (3+4);<br \/>\nthe following output is printed &#8211;<\/p>\n<pre>Start\n\tStartCol = 1 AbsStartCol = 1 Image - 1\n\tEndCol = 10 AbsEndCol = 14 Image - ;\n Expression\n \tStartCol = 1 AbsStartCol = 1 Image - 1\n \tEndCol = 9 AbsEndCol = 13 Image - )\n  AdditiveExpression\n  \tStartCol = 1 AbsStartCol = 1 Image - 1\n  \tEndCol = 9 AbsEndCol = 13 Image - )\n   MultiplicativeExpression\n   \tStartCol = 1 AbsStartCol = 1 Image - 1\n   \tEndCol = 1 AbsEndCol = 1 Image - 1\n    UnaryExpression\n    \tStartCol = 1 AbsStartCol = 1 Image - 1\n    \tEndCol = 1 AbsEndCol = 1 Image - 1\n     Integer\n     \tStartCol = 1 AbsStartCol = 1 Image - 1\n     \tEndCol = 1 AbsEndCol = 1 Image - 1\n   MultiplicativeExpression\n   \tStartCol = 1 AbsStartCol = 5 Image - 2\n   \tEndCol = 9 AbsEndCol = 13 Image - )\n    UnaryExpression\n    \tStartCol = 1 AbsStartCol = 5 Image - 2\n    \tEndCol = 1 AbsEndCol = 5 Image - 2\n     Integer\n     \tStartCol = 1 AbsStartCol = 5 Image - 2\n     \tEndCol = 1 AbsEndCol = 5 Image - 2\n    UnaryExpression\n    \tStartCol = 5 AbsStartCol = 9 Image - (\n    \tEndCol = 9 AbsEndCol = 13 Image - )\n     Expression\n     \tStartCol = 6 AbsStartCol = 10 Image - 3\n     \tEndCol = 8 AbsEndCol = 12 Image - 4\n      AdditiveExpression\n      \tStartCol = 6 AbsStartCol = 10 Image - 3\n      \tEndCol = 8 AbsEndCol = 12 Image - 4\n       
MultiplicativeExpression\n       \tStartCol = 6 AbsStartCol = 10 Image - 3\n       \tEndCol = 6 AbsEndCol = 10 Image - 3\n        UnaryExpression\n        \tStartCol = 6 AbsStartCol = 10 Image - 3\n        \tEndCol = 6 AbsEndCol = 10 Image - 3\n         Integer\n         \tStartCol = 6 AbsStartCol = 10 Image - 3\n         \tEndCol = 6 AbsEndCol = 10 Image - 3\n       MultiplicativeExpression\n       \tStartCol = 8 AbsStartCol = 12 Image - 4\n       \tEndCol = 8 AbsEndCol = 12 Image - 4\n        UnaryExpression\n        \tStartCol = 8 AbsStartCol = 12 Image - 4\n        \tEndCol = 8 AbsEndCol = 12 Image - 4\n         Integer\n         \tStartCol = 8 AbsStartCol = 12 Image - 4\n         \tEndCol = 8 AbsEndCol = 12 Image - 4\nThank you.<\/pre>\n<p>&#8211; Ram Kulkarni<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I use JavaCC for generating parsers in Java. And use JJTree to create AST after parsing. JJTree creates nodes of the AST and you can configure JavaCC options to capture tokens in the node &#8211; i.e. if you want each node to contain start and end tokens. 
The default code generated by JavaCC creates Token &hellip; <\/p>\n<p class=\"link-more\"><a href=\"http:\/\/ramkulkarni.com\/blog\/capturing-absolute-offsets-for-javaccjjtree-tokens\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Capturing absolute offsets for JavaCC\/JJTree tokens&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"Capturing absolute offsets for JavaCC\/JJTree tokens http:\/\/wp.me\/p2g9O8-eR","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[76,1],"tags":[77,39,74],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2g9O8-eR","jetpack-related-posts":[],"_links":{"self":[{"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/posts\/921"}],"collection":[{"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/comments?post=921"}],"version-history":[{"count":0,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/posts\/921\/revisions"}],"wp:attachment":[{"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/media?parent=921"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/
ramkulkarni.com\/blog\/wp-json\/wp\/v2\/categories?post=921"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/ramkulkarni.com\/blog\/wp-json\/wp\/v2\/tags?post=921"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}