[racket] [ragg] omit escaped tokens from syntax

From: Lorenz Köhl (rainbowtwigs at gmail.com)
Date: Wed Jan 23 12:29:18 EST 2013

The idea is to introduce a kind of soft skip for tokens: ragg would accept them in the grammar but leave them out of the syntax object for the production.

I've had the opportunity to discover that it's really hard to parse syslog messages properly, especially since I want to accept both the traditional and the new format.

To be able to write a sensible grammar, I have to include whitespace in the token stream, as well as a couple of other bytes that delimit parts of the message.

header: "<" NUM ">" timestamp SP hostname SP app-name SP [procid] SP [msgid]
date: (STR SP NUM SP) | NUM "-" NUM "-" NUM
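To make "whitespace in the token stream" concrete, here is a minimal lexer sketch in Python (an illustration only, not ragg's or parser-tools' API). The token names NUM, STR and SP come from the grammar above; the regular expressions and the `tokenize` function are assumptions. Spaces become SP tokens, and each delimiter character becomes a token whose type is the character itself, so it lines up with the quoted literals in the grammar.

```python
import re

# Hypothetical lexer for the header grammar above; only the token names
# come from the grammar, the patterns are assumptions for illustration.
# Spaces are kept as explicit SP tokens, and each delimiter character is
# emitted as a token whose type is the character itself ("<", ">", "-", ...).
TOKEN_RE = re.compile(
    r"(?P<SP> )"
    r"|(?P<NUM>[0-9]+)"
    r"|(?P<DELIM>[<>:=\[\].\-+TZ])"
    r"|(?P<STR>[^ 0-9<>:=\[\].\-+TZ]+)"
)


def tokenize(text):
    """Split text into (type, value) pairs without dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Every character matches one of the four alternatives,
        # so match() never returns None here.
        m = TOKEN_RE.match(text, pos)
        value = m.group()
        kind = value if m.lastgroup == "DELIM" else m.lastgroup
        tokens.append((kind, value))
        pos = m.end()
    return tokens


print(tokenize("<34>Oct 11"))
```

Because T and Z are delimiter tokens here (for ISO-style timestamps), a word such as "Tokyo" is split into a T token and an STR token, which is exactly why some fields have to be stitched back together.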

I also have to stitch strings back together, since the delimiting tokens are valid inside some parts of the message:

hostname: (STR|'<'|'>'|':'|'='|'['|']'|'.'|'-'|'+'|'T'|'Z')+
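The stitching step can be sketched in Python as follows, assuming tokens are (type, value) pairs as produced by a lexer like the one above. The function name `parse_hostname` and the type set are illustrative assumptions; the set mirrors the alternation in the hostname rule as written (which does not list NUM).

```python
# Token types allowed inside a hostname, mirroring the rule above.
HOSTNAME_TYPES = {"STR", "<", ">", ":", "=", "[", "]", ".", "-", "+", "T", "Z"}


def parse_hostname(tokens, pos):
    """Consume one or more hostname tokens starting at pos and stitch
    their text back into a single string. Returns (hostname, new_pos)."""
    start = pos
    while pos < len(tokens) and tokens[pos][0] in HOSTNAME_TYPES:
        pos += 1
    if pos == start:
        raise ValueError(f"expected a hostname at token {start}")
    hostname = "".join(value for _, value in tokens[start:pos])
    return hostname, pos


tokens = [("STR", "host"), (".", "."), ("STR", "example"), ("SP", " ")]
print(parse_hostname(tokens, 0))
```

The loop stops at the first SP token, so the caller can continue with the next field from the returned position.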

It would be nice to keep the extraneous tokens out of the syntax objects for these productions. One possibility would be to put an escape character in front of them; I'll use @ here:

header: @"<" NUM @">" timestamp @SP hostname @SP app-name @SP [procid] @SP [msgid]
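The intended behaviour of the @ prefix can be illustrated with a small Python sketch (not how ragg works internally). Each grammar element is a (token type, hidden) pair; hidden elements must still be present in the input, but they are not copied into the result. The helpers `element` and `match_sequence` are hypothetical names.

```python
def element(spec):
    """Turn a grammar element such as '@"<"' or 'NUM' into
    (token_type, hidden), where hidden means it had the @ prefix."""
    hidden = spec.startswith("@")
    name = spec.lstrip("@").strip('"')
    return name, hidden


def match_sequence(elements, tokens, pos=0):
    """Match elements against tokens in order. Hidden elements are
    required but left out of the returned list."""
    result = []
    for token_type, hidden in elements:
        if pos >= len(tokens) or tokens[pos][0] != token_type:
            raise ValueError(f"expected {token_type} at token {pos}")
        if not hidden:
            result.append(tokens[pos])
        pos += 1
    return result, pos


rule = [element(s) for s in ['@"<"', "NUM", '@">"']]
tokens = [("<", "<"), ("NUM", "34"), (">", ">")]
print(match_sequence(rule, tokens))
```

The grammar still constrains the input (a missing ">" is an error), while the result only carries the NUM token.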

For the space case, a #:soft-skip keyword in (token …) would work to always purge that token from the result.
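A per-token-type soft skip can be approximated in Python as a pass over the parse tree, assuming leaves are (token type, text) pairs and interior nodes are (rule name, list of children) pairs. The `SOFT_SKIP` set stands in for the proposed #:soft-skip flag; the names and the post-pass design are assumptions, since the proposal would do this inside the parser itself.

```python
# Hypothetical stand-in for the proposed #:soft-skip flag: token types
# that the grammar still requires but that never appear in the result.
SOFT_SKIP = {"SP"}


def is_skipped(node):
    """A node is skipped if it is a leaf whose token type is soft-skipped."""
    name, rest = node
    return isinstance(rest, str) and name in SOFT_SKIP


def purge(node):
    """Recursively drop soft-skipped leaves from a parse tree.

    A leaf is a (token_type, text) pair; an interior node is a
    (rule_name, [children]) pair."""
    name, rest = node
    if isinstance(rest, list):
        return (name, [purge(child) for child in rest if not is_skipped(child)])
    return node


tree = ("date", [("STR", "Oct"), ("SP", " "), ("NUM", "11"), ("SP", " ")])
print(purge(tree))
```

Unlike the @ prefix, which hides a token at one place in one rule, this removes every SP leaf everywhere, which matches the "always purge" intent for whitespace.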

What do you think about this? I don't know that much about parsing yet, so if there's another way to handle this, I'd be interested to hear it.

Lo

Posted on the users mailing list.