[racket] more liberal CSV parsing?
Shriram Krishnamurthi wrote at 07/30/2010 09:53 PM:
> wmic process get commandline /format:csv
>
> on Windows 7. You get lines like
>
> ROUBAIX,"C:\Program Files\Windows Sidebar\sidebar.exe" /autoRun,2940
> ROUBAIX,"C:\Users\sk\Local Settings\Apps\F.lux\flux.exe" /noshow,3048
>
In this particular case, it looks like the quotes are intended to be
part of the value of the field, and the format that "wmic" is writing
does not use any CSV quoting. So you can just turn off quote handling:
    ((make-csv-reader (open-input-string "d1,d2,\"foo\" bar,d3")
                      '((quote-char . #f))))
    ;; ==> ("d1" "d2" "\"foo\" bar" "d3")
I think that's the parse they intend for their format, and it is
actually helpful for parsing the command line separately afterwards. I
suspect that the command-line field with the quotes in it came from a
system call, and that "wmic" simply wrote that string verbatim, with a
comma before and after.
Note that we could change the *default* CSV reader to handle this
particular example, by having it fall back to putting the quotes back
into the value if it sees junk after the end of what it thought was a
CSV-quoted field. But that would be a kludge that would fail in some
other cases. For the "csv" reusable library, I lean towards giving the
programmer a heads-up that the format is really not something that the
default CSV reader is likely to parse reliably.
> Mmph: it looks like the output may be just plain broken. I see
> another line that looks like
>
> ROUBAIX,cthelper 49170 xterm :erase=^?:size=24,80,4064
>
> which looks like it has one field too many....
>
Good catch. I think that parsing this particular output with regexps or
simpler string operations (separating into fields at only the first and
last commas) might give the best results, since it looks like Microsoft
is incorrectly not quoting their CSV fields at all in this case.
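
As a rough sketch of that first-and-last-comma split, something like the
following might do. The name "split-wmic-line" is made up for
illustration and is not part of the "csv" library, and the sketch
assumes each line has exactly three logical fields (node, command line,
process id), with the line terminator already stripped:

```racket
#lang racket/base

;; Hypothetical helper (not part of the "csv" library): split one line of
;; "wmic ... /format:csv" output into three fields, cutting only at the
;; first and last commas. Assumes the first and last fields never contain
;; commas and that any trailing CR/LF has already been removed.
(define (split-wmic-line line)
  ;; [^,]* cannot cross a comma, so group 1 ends at the first comma and
  ;; group 3 starts after the last one; group 2 keeps everything between,
  ;; including any commas and quotes.
  (define m (regexp-match #rx"^([^,]*),(.*),([^,]*)$" line))
  ;; Return the three captured fields, or #f if there are fewer than
  ;; two commas in the line.
  (and m (cdr m)))

(split-wmic-line "ROUBAIX,cthelper 49170 xterm :erase=^?:size=24,80,4064")
;; ==> ("ROUBAIX" "cthelper 49170 xterm :erase=^?:size=24,80" "4064")
```

The middle field comes back verbatim, quotes and commas included, so it
can then be handed to whatever command-line parsing you want to do.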
--
http://www.neilvandyke.org/