[racket] parsing methods for character-based invoices?
If you are talking about really large, really not quite properly formatted data sets,
you want to look up the PADS project at
http://www.padsproj.org
It's a product of AT&T Labs (which is a Bell Labs 'baby'), and they apparently used it on their billing data.
If you are looking at a few megabytes, any of our parser tools will do, perhaps starting with 'parser-tools/'.
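For the small case, here is a rough sketch of what a first pass with the lexer from that collection (parser-tools/lex) might look like. It only tokenizes; it does not decide which column an amount belongs to. The token names (DATE, AMOUNT, WORD, NEWLINE, EOF) and the helpers invoice-lexer and tokenize are made up for illustration, and the MM/DD/YYYY and $N.NN shapes are assumptions read off the sample lines, not a known spec.

```racket
#lang racket
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

;; Hypothetical token sets, guessed from the sample invoice.
(define-tokens value-tokens (DATE AMOUNT WORD))
(define-empty-tokens marker-tokens (NEWLINE EOF))

(define invoice-lexer
  (lexer
   [(eof) (token-EOF)]
   [#\newline (token-NEWLINE)]
   ;; Skip runs of spaces/tabs, so exact column positions do not matter.
   [(:+ (:or #\space #\tab)) (invoice-lexer input-port)]
   ;; Assumed date shape: NN/NN/NNNN.
   [(:: (:= 2 numeric) "/" (:= 2 numeric) "/" (:= 4 numeric))
    (token-DATE lexeme)]
   ;; Assumed amount shape: $ digits . two digits; the "#e" prefix keeps it exact.
   [(:: "$" (:+ numeric) "." (:= 2 numeric))
    (token-AMOUNT (string->number (string-append "#e" (substring lexeme 1))))]
   ;; Anything else that is not whitespace becomes a WORD, so odd input
   ;; shows up as a token to report rather than as a lexer failure.
   [(:+ (:~ whitespace)) (token-WORD lexeme)]))

;; Collect all tokens of a string, up to (not including) EOF.
(define (tokenize str)
  (define in (open-input-string str))
  (let loop ([acc '()])
    (define tok (invoice-lexer in))
    (if (eq? (token-name tok) 'EOF)
        (reverse acc)
        (loop (cons tok acc)))))

(define sample (tokenize "01/01/2013 SERVICES $1234.50\n"))
(map token-name sample)
```

Because the catch-all WORD rule sits last, a date or amount wins when both rules match the same text, and anything unexpected still comes out as a WORD token that a later pass can flag with its own error message.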
-- Matthias
On May 9, 2013, at 3:47 PM, David Vanderson <david.vanderson at gmail.com> wrote:
> I've got character-based invoices from old systems that look roughly like (but much bigger):
>
> DATE DESC CREDIT DEBIT
> 01/01/2013 SERVICES $1234.50
> 01/01/2013 PAYMENT $1000.00
>
> BALANCE $234.50
>
>
> I don't know exactly how they're formatted, so I'm working from examples. My initial plan was to hand-code a dumb parser with regular expressions, but I suspect there's a better way. In particular, it'd be nice to have some leeway as to exact positions of data, and hopefully some nice error reporting and recovery abilities.
>
> Can anyone point me towards a parsing technique that would lend itself to this problem?
>
> Thanks,
> Dave
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users