[plt-scheme] parser tools

From: Laurent (laurent.orseau at gmail.com)
Date: Tue Nov 10 02:19:06 EST 2009

2009/11/9 Ivanyi Peter <pivanyi at freemail.hu>

>
> > But I recently made a simple quick-and-dirty text parser tool:
> > http://planet.plt-scheme.org/package-source/orseau/lazy-doc.plt/1/6/planet-docs/manual/simple-parser.html
>
> Thanks, this seems to work.
> Now I do not understand one thing. Maybe this is not Scheme
> related, but from the following code I would expect:
> "\nsomething\n{#[#aa,#bb#]\n}\n"
> but I get:
> "\nsomething\n{\n#[\n#aa,#bb\n#]\n}\n"
>
> I thought the start-keyword would match zero or more newlines
> AND one or more spaces. What am I doing wrong?
>

My bad.
For some reason, I chose to cut the text into lines (so that reading a
file line by line behaves the same as reading it in chunks) and to
separate the end-of-lines from the rest of the text, so in fact:
"
something
{
 [
       aa, bb
 ]
}
"
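is turned into:
'("\n" "something" "\n" "{" "\n" " [" "\n" "       aa, bb" "\n" " ]" "\n"
"}" "\n")
which is parsed chunk by chunk. A pattern therefore never sees a newline
together with the spaces that follow it: in "\n* +" the "\n*" part always
matches zero newlines, and only the " +" part does any work.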
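
To make the difference concrete, here is a minimal sketch that only
imitates the splitting step. It is not the actual lazy-doc code:
split-into-chunks and replace-per-chunk are hypothetical helpers, and
plain regexp-replace* stands in for what the 'block-keywords item does.

```scheme
#lang scheme

;; Hypothetical helper, not part of lazy-doc: cut the text into chunks,
;; keeping every end-of-line as a chunk of its own.
(define (split-into-chunks text)
  (regexp-match* #rx"\n|[^\n]+" text))

;; Apply the start-keyword pattern to each chunk separately, replacing
;; every match with "#", then glue the chunks back together.
(define (replace-per-chunk text)
  (apply string-append
         (for/list ([chunk (in-list (split-into-chunks text))])
           (regexp-replace* #rx"\n* +" chunk "#"))))

(define text "\nsomething\n{\n [\n       aa, bb\n ]\n}\n")

;; Chunk by chunk: a newline is never in the same chunk as the spaces.
(replace-per-chunk text)
;; => "\nsomething\n{\n#[\n#aa,#bb\n#]\n}\n"

;; On the unsplit text, "\n* +" can swallow the newline as well.
(regexp-replace* #rx"\n* +" text "#")
;; => "\nsomething\n{#[#aa,#bb#]\n}\n"
```

The first result is what you got, the second is what you expected.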

Splitting also addresses a speed issue: with big parsers, matching many
regexps in parallel can get quite expensive, especially on long text files,
if the text is not split into lines.

I should at least document that.

Let me know if this makes things difficult for you, and I'll try to find a
workaround. For my own purposes, though, this behavior has not been much of
a problem.

Laurent


>
> Thanks,
>
> Peter Ivanyi
>
> ----------------------------------------------------------
> #lang scheme
>
> (require (planet orseau/lazy-doc:1:6/simple-parser))
>
> (define start-keyword "\n* +")
>
> (let ([block-parser (new-parser #:phase 'block-keywords)])
>   (add-items
>    block-parser
>    ('block-keywords (start-keyword "#")))
>   (parse-text block-parser "
> something
> {
>  [
>        aa, bb
>  ]
> }
> ")
> )
>
>

Posted on the users mailing list.