[racket] parsing methods for character-based invoices?

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Thu May 9 16:07:11 EDT 2013

If you are talking about really large data sets that are not quite properly formatted,
you want to look up the PADS project at

 http://www.padsproj.org

It's a product of AT&T Labs (which is a Bell Labs 'baby'), and they apparently used it on their billing data.

If you are looking at a few megabytes, any of our parser tools will do, perhaps starting with the 'parser-tools/' collection.

-- Matthias
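
The 'parser-tools/' collection includes parser-tools/lex, which builds lexers from regular-expression rules, and parser-tools/yacc, which builds LALR parsers over the resulting tokens. The following is a minimal Python sketch of the lexing half, written with the standard re module. It is not the parser-tools API. The token names and patterns are assumptions inferred only from the three sample invoice lines quoted below.

```python
import re
from dataclasses import dataclass

# Illustrative token classes for the invoice sample in this thread; the names
# and patterns are assumptions, not part of parser-tools or any invoice format.
TOKEN_SPEC = [
    ("DATE", r"\d{2}/\d{2}/\d{4}"),
    ("MONEY", r"\$\d+\.\d{2}"),
    ("WORD", r"[A-Za-z]+"),
    ("SPACE", r"[ \t]+"),
]
# One alternation of named groups; m.lastgroup tells us which rule matched.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


@dataclass
class Token:
    kind: str
    text: str
    line: int  # 1-based line number
    col: int   # 0-based column where the token starts


class LexError(Exception):
    pass


def tokenize(text):
    """Yield Tokens with positions; raise LexError at the first unexpected character."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        pos = 0
        while pos < len(line):
            m = MASTER.match(line, pos)
            if m is None:
                raise LexError(f"line {lineno}, col {pos}: unexpected {line[pos]!r}")
            if m.lastgroup != "SPACE":  # whitespace only separates tokens
                yield Token(m.lastgroup, m.group(), lineno, pos)
            pos = m.end()


if __name__ == "__main__":
    for tok in tokenize("01/01/2013  SERVICES         $1234.50"):
        print(tok)
```

Every token carries its line and column. A parser layered on top can therefore report a message such as "expected MONEY at line 7, col 31" instead of failing silently.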

On May 9, 2013, at 3:47 PM, David Vanderson <david.vanderson at gmail.com> wrote:

> I've got character-based invoices from old systems that look roughly like this (but much bigger):
> 
> DATE        DESC               CREDIT   DEBIT
> 01/01/2013  SERVICES         $1234.50
> 01/01/2013  PAYMENT                     $1000.00
> 
> BALANCE                  $234.50
> 
> 
> I don't know exactly how they're formatted, so I'm working from examples. My initial plan was to hand-code a dumb parser with regular expressions, but I suspect there's a better way. In particular, it would be nice to have some leeway in the exact positions of the data, and ideally some decent error reporting and recovery.
> 
> Can anyone point me towards a parsing technique that would lend itself to this problem?
> 
> Thanks,
> Dave
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users
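
In the fixed-width layout from the question, one way to tolerate shifting positions is to read the column headers once. Each amount is then assigned to the nearest header rather than to a fixed offset. The following is a minimal Python sketch of that idea using the standard re and decimal modules. It is not Racket and not any library's API. The regular expressions, the nearest-header rule, and the assumption that BALANCE equals total credits minus total debits are all inferred from the three sample lines only.

```python
import re
from decimal import Decimal

# Patterns inferred only from the three sample lines; real invoices will differ.
HEADER_RE = re.compile(r"^\s*DATE\s+DESC\s+CREDIT\s+DEBIT\s*$")
ROW_RE = re.compile(
    r"^\s*(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<desc>.*?)\s+(?P<amount>\$\d+\.\d{2})\s*$"
)
BALANCE_RE = re.compile(r"^\s*BALANCE\s+\$(?P<amount>\d+\.\d{2})\s*$")
AMOUNT_COLUMNS = ("CREDIT", "DEBIT")


def parse_invoice(text):
    """Return (rows, balance, errors).

    rows: list of (date, desc, column, amount), where column is whichever
    header ("CREDIT" or "DEBIT") lies horizontally closest to the amount,
    so amounts need not start at an exact offset.
    errors: list of (line_number, message); bad lines are recorded and
    skipped rather than aborting the whole parse.
    """
    centers = {}  # column name -> horizontal center of its header text
    rows, balance, errors = [], None, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no data
        if HEADER_RE.match(line):
            for name in AMOUNT_COLUMNS:
                start = line.index(name)
                centers[name] = start + len(name) / 2
            continue
        m = ROW_RE.match(line)
        if m:
            if not centers:
                errors.append((lineno, "data row before header"))
                continue
            mid = (m.start("amount") + m.end("amount")) / 2
            column = min(centers, key=lambda name: abs(centers[name] - mid))
            amount = Decimal(m.group("amount")[1:])  # drop the leading "$"
            rows.append((m.group("date"), m.group("desc"), column, amount))
            continue
        m = BALANCE_RE.match(line)
        if m:
            balance = Decimal(m.group("amount"))
            continue
        errors.append((lineno, f"unrecognized line: {line.strip()!r}"))
    # Assumption: BALANCE = total credits - total debits (true for the sample).
    if balance is not None:
        net = sum(a if col == "CREDIT" else -a for _, _, col, a in rows)
        if net != balance:
            errors.append((None, f"BALANCE {balance} != computed {net}"))
    return rows, balance, errors


SAMPLE = """\
DATE        DESC               CREDIT   DEBIT
01/01/2013  SERVICES           $1234.50
01/01/2013  PAYMENT                     $1000.00

BALANCE                  $234.50
"""

if __name__ == "__main__":
    rows, balance, errors = parse_invoice(SAMPLE)
    for row in rows:
        print(row)
    print("balance:", balance, "errors:", errors)
```

Recording each bad line with its line number and moving on is only a crude form of error recovery. A grammar built with a parser generator allows more targeted handling, at the cost of writing the grammar first.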

Posted on the users mailing list.