I use JavaCC for generating parsers in Java. And use JJTree to create AST after parsing. JJTree creates nodes of the AST and you can configure JavaCC options to capture tokens in the node – i.e. if you want each node to contain start and end tokens. The default code generated by JavaCC creates Token class with offsets that are relative to the starting offset of the line. It has fields like beginLine, beginColumn, endLine and endColumn. Here the line numbers are absolute line numbers (starting from 1) and column fields contain offsets (again starting from 1) within the correspondingĀ lines.
However many times you want to capture absolute offsets of tokens in the input stream, and not just relative offset in the line. I wish there was a JavaCC option to enable this. But it is not too complex if you want to do it yourself.
To explain how to do this, I will take a grammer file that is generated by the JavaCC wizard of JavaCC Eclipse plugin. This is the default grammer file it generates – Continue reading “Capturing absolute offsets for JavaCC/JJTree tokens”