[racket] parsing methods for character-based invoices?

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Thu May 9 16:38:12 EDT 2013

Even if it isn't that large, you may benefit from PADS, as they have a nice
way to describe the data. (Once you get it parsed, though, you could come
back to Racket if you wanted.)

Robby


On Thu, May 9, 2013 at 3:07 PM, Matthias Felleisen <matthias at ccs.neu.edu> wrote:

>
> If you are talking about really large, really not quite properly formatted
> data sets, you want to look up the PADS project at
>
>  http://www.padsproj.org
>
> It's a product from AT&T Labs (which is a Bell Labs 'baby') and they
> apparently used it on their billing data.
>
> If you are looking at a few megabytes, any of our parser tools will do,
> perhaps starting with 'parser-tools/'.
>
> -- Matthias
>
> On May 9, 2013, at 3:47 PM, David Vanderson <david.vanderson at gmail.com>
> wrote:
>
> > I've got character-based invoices from old systems that look roughly
> > like (but much bigger):
> >
> > DATE        DESC               CREDIT   DEBIT
> > 01/01/2013  SERVICES         $1234.50
> > 01/01/2013  PAYMENT                     $1000.00
> >
> > BALANCE                  $234.50
> >
> >
> > I don't know exactly how they're formatted, so I'm working from
> > examples.  My initial plan was to hand-code a dumb parser with regular
> > expressions, but I suspect there's a better way.  In particular, it'd be
> > nice to have some leeway as to exact positions of data, and hopefully some
> > nice error reporting and recovery abilities.
> >
> > Can anyone point me towards a parsing technique that would lend itself
> > to this problem?
> >
> > Thanks,
> > Dave
> > ____________________
> > Racket Users list:
> > http://lists.racket-lang.org/users
>

Posted on the users mailing list.
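
For invoices in the few-megabyte range, one possible starting point is a small line-by-line parser that takes its column positions from the header row instead of hard-coding them. The sketch below is not from the thread and does not use PADS or parser-tools. It assumes that every row carries a date, a description, and exactly one dollar amount, and that an amount belongs to whichever of the CREDIT or DEBIT titles starts nearest to it. All names (entry, parse-row, parse-invoice, sample) are hypothetical, and the "smudged" line is made-up input to show recovery.

```racket
#lang racket
;; Hypothetical sketch: tolerant, line-by-line parsing of an invoice like the
;; sample in this thread. Amounts are kept as integer cents.

(struct entry (date desc credit debit) #:transparent)

(define row-rx
  #px"^\\s*(\\d{2}/\\d{2}/\\d{4})\\s+(\\S.*?)\\s+\\$(\\d+)\\.(\\d{2})\\s*$")
(define balance-rx #px"^\\s*BALANCE\\s+\\$(\\d+)\\.(\\d{2})\\s*$")

(define (cents dollars hundredths)
  (+ (* 100 (string->number dollars)) (string->number hundredths)))

;; Offset of a column title within the header line, or #f if absent.
(define (column-start header title)
  (define m (regexp-match-positions (regexp-quote title) header))
  (and m (car (car m))))

;; Assumption: a row holds one amount. Assign it to whichever of CREDIT/DEBIT
;; starts nearest to it, so columns may drift by a few characters.
(define (parse-row line credit-col debit-col)
  (match-define (list _ date desc dollars hundredths)
    (regexp-match row-rx line))
  ;; Start offset of the digits that follow the "$".
  (define at (car (fourth (regexp-match-positions row-rx line))))
  (define amount (cents dollars hundredths))
  (if (<= (abs (- at credit-col)) (abs (- at debit-col)))
      (entry date desc amount #f)
      (entry date desc #f amount)))

;; Returns three values: entries, balance in cents (or #f), error messages.
;; A line that cannot be parsed is recorded with its line number and skipped.
(define (parse-invoice text)
  (for/fold ([cols #f] [entries '()] [balance #f] [errors '()]
             #:result (values (reverse entries) balance (reverse errors)))
            ([line (in-lines (open-input-string text))]
             [n (in-naturals 1)])
    (define (fail msg)
      (values cols entries balance
              (cons (format "line ~a: ~a: ~s" n msg line) errors)))
    (cond
      [(regexp-match? #px"^\\s*$" line) (values cols entries balance errors)]
      [(regexp-match? #px"^\\s*DATE\\b" line)
       (define c (column-start line "CREDIT"))
       (define d (column-start line "DEBIT"))
       (if (and c d)
           (values (cons c d) entries balance errors)
           (fail "header lacks CREDIT or DEBIT"))]
      [(regexp-match? row-rx line)
       (if cols
           (values cols
                   (cons (parse-row line (car cols) (cdr cols)) entries)
                   balance errors)
           (fail "row before header"))]
      [(regexp-match balance-rx line)
       => (lambda (m)
            (values cols entries (cents (second m) (third m)) errors))]
      [else (fail "unrecognized line")])))

(define sample
  (string-join
   '("DATE        DESC               CREDIT   DEBIT"
     "01/01/2013  SERVICES         $1234.50"
     "01/01/2013  PAYMENT                     $1000.00"
     ""
     "BALANCE                  $234.50"
     "??? smudged line")
   "\n"))

(define-values (entries balance errors) (parse-invoice sample))
```

Here `entries` holds the two rows, `balance` holds the stated balance in cents, and `errors` holds one message naming the line that could not be parsed. Checking `balance` against the sum of credits minus debits would be a natural next sanity check. If the real invoices turn out to have multi-line records or several amounts per row, this approach probably stops scaling and a grammar-based tool becomes the better fit.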