how to skip/read next n Characters (n is read from input)

Discussion:

Thomas Ruschival

2012-10-31 16:47:00 UTC

I am a humble EE with little grammar experience, please forgive my
ignorance and give me a hint how professionals would do the trick.

I came up with a grammar for detecting commands ("escape sequences") in an
input text (for a UnifiedPOS printer).
It reads numeric and boolean arguments for escape-sequence commands
from the input stream.
I can read numeric arguments and use them as function parameters, and the
function to be called is determined correctly.
For instance, "ESC|#rF" means "reverse-feed # lines".

The question is how to treat "ESC|#E", which means "send the next #
bytes untreated to the printer". In other words:

How can I use a number N that I detected on the input stream to read
and consume the next N characters
'un-lexed' and 'un-parsed' as string/byte array?

I was thinking of using something like this in a parser action, using the
'input' member of the parser:

for (int i = 0; i < N; i++) {
    output.append(input.LA(1));
    input.consume();
}

But it doesn't seem very professional to me. Furthermore, this gives me
tokens and not plain bytes.
Can you give me a hint?

Thanks in advance
Thomas

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

Juancarlo Añez

2012-11-01 01:19:55 UTC

Thomas,

ANTLR may be overkill or inadequate for what you're doing.

I think you'd be better off with a program with a main loop that dispatches
to different functions based on the escape code. Each function can affect
the input position, or do anything else it pleases. It would be
a handcrafted state machine.

You can do this in Python or any of the friendly languages.

Cheers,

-- Juancarlo
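
A minimal Java sketch of the hand-written dispatch loop suggested above, assuming the escape lead-in is the ESC character (0x1B) followed by '|', as in the examples from the original post. The class name EscapeScanner, the trace format, and the handling of unknown or truncated sequences are illustrative assumptions, not part of UnifiedPOS or ANTLR.

```java
/** Hypothetical hand-written scanner; names and trace format are illustrative only. */
final class EscapeScanner {
    private static final char ESC = '\u001b';

    /** True if an escape sequence (ESC followed by '|') starts at pos. */
    private static boolean startsEscape(String in, int pos) {
        return pos + 1 < in.length() && in.charAt(pos) == ESC && in.charAt(pos + 1) == '|';
    }

    /** Scans the input and returns a trace with one event per line. */
    static String scan(String in) {
        StringBuilder trace = new StringBuilder();
        int pos = 0;
        while (pos < in.length()) {
            if (!startsEscape(in, pos)) {
                // Plain text: collect everything up to the next escape sequence.
                int start = pos;
                while (pos < in.length() && !startsEscape(in, pos)) {
                    pos++;
                }
                trace.append("TEXT[").append(in, start, pos).append("]\n");
                continue;
            }
            pos += 2; // skip ESC and '|'

            // Optional numeric argument (the '#' in "ESC|#E").
            int digitsStart = pos;
            while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') {
                pos++;
            }
            String digits = in.substring(digitsStart, pos);

            if (digits.length() > 0 && in.startsWith("E", pos)) {
                // Pass the next n characters through untouched, whatever they contain.
                int n = Integer.parseInt(digits); // throws NumberFormatException on overflow
                pos++;
                int end = Math.min(pos + n, in.length()); // assumption: truncate at end of input
                trace.append("RAW[").append(in, pos, end).append("]\n");
                pos = end;
            } else if (in.startsWith("rF", pos)) {
                trace.append("REVERSE_FEED(").append(digits).append(")\n");
                pos += 2;
            } else if (in.startsWith("bC", pos)) {
                trace.append("BOLD\n");
                pos += 2;
            } else {
                trace.append("UNKNOWN\n"); // assumption: report and keep scanning
            }
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        // The raw payload deliberately contains an escape sequence that must not be interpreted.
        System.out.print(scan("Hi\u001b|4E\u001b|bC!\u001b|bC\u001b|2rF"));
    }
}
```

Because each branch moves the position itself, the "E" command can step over its payload without that payload ever being tokenized, which is the part that is awkward to express in a grammar.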

Thomas Ruschival

2012-11-02 13:48:26 UTC

Thanks a lot Juancarlo,
I knew I could do it in one of my favourite languages, but thought
'Maybe this is the point where I should start using grammars'....

Best Regards
Thomas

c***@public.gmane.org

2012-11-02 17:51:32 UTC

Thomas, I would use a validating semantic predicate:

readNchars
    : NUM
      (b+=CHAR)+ {$b.size()<=Integer.parseInt($NUM.text)}?
    ;

The idea is from Ter's book The Definitive ANTLR Reference (ANTLR v3).

Regards, Claus-Dieter
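
A note on what the predicate above does, as far as I understand ANTLR v3: a predicate placed after the (...)+ loop is only evaluated once the loop has finished, so it can reject the input but cannot stop the loop after N elements. The plain-Java sketch below contrasts the two behaviours. The class and method names are hypothetical, and this is not code generated by ANTLR.

```java
/** Hypothetical illustration; not ANTLR-generated code. */
final class CountedRead {
    /**
     * Mirrors "(b+=CHAR)+ {b.size() <= n}?": consume everything the loop can match,
     * then validate the count afterwards.
     */
    static String greedyThenValidate(String rest, int n) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < rest.length(); i++) {
            b.append(rest.charAt(i)); // the loop itself knows nothing about n
        }
        if (b.length() > n) {
            throw new IllegalStateException("predicate failed: " + b.length() + " > " + n);
        }
        return b.toString();
    }

    /** Checks the count before each element, so at most n characters are taken. */
    static String gatedLoop(String rest, int n) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < n && i < rest.length(); i++) {
            b.append(rest.charAt(i));
        }
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(gatedLoop("abcdef", 3)); // prints "abc"
        try {
            greedyThenValidate("abcdef", 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "predicate failed: 6 > 3"
        }
    }
}
```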

Thomas Ruschival

2012-11-05 13:05:32 UTC

Thanks Claus-Dieter,
it seems to work, at least in the small test grammar I played around
with. So far I haven't used semantic predicates, mainly because I don't
fully understand them and many people warn about side effects. Anyway, I
will try to integrate it and have our purchasing department get a copy
of Terence's book.

Thanks again
Thomas

Thomas Ruschival

2012-11-14 20:53:00 UTC

Hi,
finally I got a (Kindle-)copy of the ANTLR reference and read some
chapters.
The cited example generates NullPointerExceptions:

Looking through the generated code I realized that 'b' is translated in
Java to a List b_list = null;
b_list is never initialized to be a valid object. (no b_list = new
ArrayList(); anywhere in the readNchars method)

This issue I fixed manually in the generated code...

The second issue came up at runtime: an EarlyExitException is
thrown.
According to the reference, it occurs if "The recognizer did not match
anything for a (..)+ loop."
This is (at least to me) quite odd, since I tried to match
(b+=.)+ as well as (b+=CHAR)+.

Best regards
Thomas

c***@public.gmane.org

2012-11-15 08:46:35 UTC

Hi Thomas,
the sample from the ANTLR v3 reference does not (yet?) run with ANTLR v4.0b3:
(b+=CHAR)+ {$b.size()<=Integer.parseInt($NUM.text)}?

I see that {System.out.println(((StartContext)_localctx).b.size());}
is a workaround for the counter, but the semantic predicate is not working for me either.

It would be nice to get a solution for this from Ter in the final version.

Regards Claus-Dieter

Thomas Ruschival

2012-11-15 20:17:14 UTC

Hi Claus-Dieter,
I am running ANTLR v3 with ANTLRWorks 1.4.3 and Java 1.7.0_1.

It seems the problem has two aspects:
1st) no list object is created, only declared (List b_list = null;)
2nd) nothing matches: not (b+=CHAR)+, not (b+=CHAR|DIGIT|'E')+, not even
(b+=.)+

What really surprised me (remember, I am still a newbie) is that ANTLR
generates lexer tokens for characters from parser rules, and that these
tokens no longer match the generic CHAR lexer rule.
For instance, the 'C' or 'E' from the grammar below does not match if I
replace (val=.) by (val=CHAR) in the rule print.

Best Regards
Thomas

I stripped down my grammar to share:

grammar JPosEscape;
@members{
    private boolean escBool = false;
}

DIGIT :
    ('0'..'9');

CHAR :
    ('\u0000'..'\u7fff');

escapecmd :
    ( bold
    | underline
    | readNchars
    );

bold :
    '!'?
    {
        escBool = false;
    }
    ('b' 'C')
    {
        System.out.println("Selecting Bold:" + escBool);
    };

underline :
    ullines = (DIGIT)*
    '!'?
    {
        escBool = false;
    }
    ('u' 'C')
    {
        System.out.println("Selecting UNDERLINE:" + escBool +
            "using # "+ $ullines.text + "lines");
    };

readNchars :
    NUMBER = DIGIT+ 'E'
    (
        b += ( CHAR|DIGIT|'E' )
    )+
    {$b.size() <= Integer.parseInt($NUMBER.text)}?
    {
        System.out.println("\nreadNChars #" + $b.size());
    };

expr :
    print*;

print :
    ('@' '|') => ('@' '|')
    escapecmd
    | (val = . )
    {
        System.out.print($val.text);
    };

Jim Idle

2012-11-16 02:31:16 UTC

- I suggest that you take a step back and read through all the examples.
- Read the getting started Wiki articles
- antlr.markmail.org may well be your new friend.
- When you are new to the game, I suggest that you do not use 'literals'
in your parser grammars.
- val=. in a parser matches any token, not any character, hence it
is not usually useful.

Jim

Thomas Ruschival

2012-11-18 21:54:33 UTC

Hi Jim,
thanks for the hint about antlr.markmail.org, I wasn't aware of this
user-friendly interface!
I am certainly working my way through Terence's book and the mailing list.

Post by Jim Idle
- When you are new to the game, I suggest that you do not use
'literals' in your parser grammars.

I started off with a textbook approach, defining token rules for the
lexer and using only tokens in the parser rules (by the way, the initial
grammar worked well).
For instance, I had defined:
BOLD : 'b' 'C'; // lexer rule
However, since the lexer is greedy, it will always generate a BOLD token
for the sequence 'bC'. That is not a problem per se, until I was
confronted with the rule to recognize the n following bytes (original post).

Post by Jim Idle
- val=. In a parser will capture every token, not any character, hence
it is not usually useful

I know; my workaround was to have only single characters as tokens, so every
token has a length of 1 character and does not mess up the counter in the
readNchars rule.

Terence shared an ANTLR 4 grammar example (Data.g4). I will look into it,
but since I am bound to ANTLR v3 for the initial real-world problem
(can't use Java 1.6), it serves only my academic interest (and
future projects).

Best regards
Thomas

Terence Parr

2012-11-15 16:34:47 UTC

Here's a similar example from the book:

http://media.pragprog.com/titles/tpantlr2/code/tour/Data.g4
Ter
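
The linked grammar is not reproduced here. As a rough plain-Java sketch of the general technique, assuming input made of whitespace-separated integers in which a leading count says how many values follow, the count read from the input can drive the loop directly. The class name CountedGroups and the output format are hypothetical.

```java
/** Hypothetical illustration of a count-driven loop; not taken from the linked grammar. */
final class CountedGroups {
    /** Groups values by their leading counts, e.g. "2 9 10 3 1 2 3" becomes "(9 10) (1 2 3)". */
    static String group(String input) {
        String trimmed = input.trim();
        if (trimmed.length() == 0) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+");
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < tokens.length) {
            int n = Integer.parseInt(tokens[i++]); // the count read from the input
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append('(');
            for (int k = 0; k < n; k++) { // take exactly n following values
                if (i >= tokens.length) {
                    throw new IllegalArgumentException("group truncated");
                }
                if (k > 0) {
                    out.append(' ');
                }
                out.append(tokens[i++]);
            }
            out.append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(group("2 9 10 3 1 2 3")); // prints "(9 10) (1 2 3)"
    }
}
```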
