Discussion:
Parse tree difference between ANTLR 3.2 and 3.3/4
Thiruvel Thirumoolan
2012-11-30 14:56:54 UTC
Hello,

I work on Apache Hive, which currently uses ANTLR 3.0.1. We would like to upgrade to ANTLR 3.4 so that it is easier to work with other Apache projects on Hadoop that use ANTLR 3.4. We found that the parse tree generated from Hive.g [1] differs between 3.0.1/3.1/3.2 and 3.3/3.4.

I have stripped down the lengthy grammar and created a smaller version (Insert.g [2]). I have pushed a small Maven 3 project to https://github.com/thiruvel/HiveANTLR34Issue that uses ANTLR the way Hive uses it. Here is the tree difference; the entire output is on GitHub. One can run "mvn test" to reproduce it.

Antlr 3.0.1/3.1/3.2:

( TOK_DESTINATION( TOK_TAB( TOK_TABNAME( TABLE_X))( TOK_PARTSPEC( TOK_PARTVAL( DIM_1)( 'A'))( TOK_PARTVAL( DIM_2)( 'B')))))

Antlr 3.3/3.4:

( TOK_DESTINATION( TOK_TAB))


Are we missing something in the grammar, or is this a bug addressed in v4? I am afraid we can't move to v4, as that would mean moving all the other projects to v4. Are there any workarounds we can use with ANTLR 3.4 to ensure a similar tree is generated?

Any help is greatly appreciated.

Thank You!
Thiruvel

[1] - http://svn.apache.org/repos/asf/hive/branches/branch-0.9/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
[2] - https://github.com/thiruvel/HiveANTLR34Issue/blob/master/src/main/antlr3/com/yahoo/antlr/Insert.g

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Jim Idle
2012-12-03 02:13:40 UTC
With just a quick glance at your sample grammar, I think your issue is
only that some of your rules use rewrite rules ( -> ) on, say, 1 out of
2 alts. IIRC, if you use rewrite rules on one alt of a rule, you must
use them on all the others too.

Jim


Thiruvel Thirumoolan
2012-12-03 12:41:15 UTC
Hi Jim,

Thanks for your response. I guess the formatting was misleading; I have
reformatted it, and this should help.

After some poking around, I found that a small change to the grammar
provides a consistent tree between 3.2 and 3.3. Was this the original
mistake in our grammar?

Suspected rule:

destination:
    ...
    | KW_TABLE tableOrPartition
        -> ^(tableOrPartition)

Modified rule:

destination:
    ...
    | KW_TABLE tableOrPartition
        -> ^(TOK_TAB_OR_PART tableOrPartition)

After adding a token to the rewrite rule, I was able to see a consistent
tree. I will obviously have to change the AST parsing code in Hive, which
is a little involved. But is this the bug in the grammar? I could not find
any incompatible change regarding this in the ANTLR 3.3 release notes [1]
(the only incompatible change listed is debug related).
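
To make the variants concrete, here is a minimal ANTLR 3 sketch. The tableOrPartition rule is an assumption reconstructed from the 3.2 tree shown earlier in the thread (a TOK_TAB root with the table name and partition spec as children). The rule names destinationSuspected, destinationModified and destinationPassThrough are hypothetical, and the fragment assumes a grammar with output=AST and the imaginary tokens declared in a tokens block. The last form, a plain "-> tableOrPartition" without the ^( ) wrapper, is an untested suggestion and not something confirmed in this thread.

```antlr
// Assumed shape of the referenced rule, inferred from the 3.2 tree output.
tableOrPartition
    : tableName partitionSpec?
        -> ^(TOK_TAB tableName partitionSpec?)
    ;

// Suspected form: the rule's result is used as the root of a new tree.
// In the reported 3.3/3.4 output only the TOK_TAB node survives.
destinationSuspected
    : KW_TABLE tableOrPartition
        -> ^(tableOrPartition)
    ;

// Modified form from this message: an explicit root token. The token is
// new, so the Hive code that walks the AST has to change.
destinationModified
    : KW_TABLE tableOrPartition
        -> ^(TOK_TAB_OR_PART tableOrPartition)
    ;

// Untested alternative (assumption): pass the subtree through unchanged.
destinationPassThrough
    : KW_TABLE tableOrPartition
        -> tableOrPartition
    ;
```

If the pass-through form behaves as hoped, the tree under TOK_DESTINATION would stay rooted at TOK_TAB, as in the 3.2 output, and the existing AST handling in Hive might not need to change. That would have to be checked with "mvn test" against both ANTLR versions.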


[1] - http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3.3+Release+Notes


Thanks!
Thiruvel
Cedric Berger
2012-12-03 12:48:49 UTC
Mailman looks dead....

Terence Parr
2012-12-03 16:32:10 UTC
Hi. I'll be killing the list within a week. Not even sure *I* can unsub somebody on that list! ;)
T