[racket] mime/multipart parsing

From: Jordan Schatz (jordan at noionlabs.com)
Date: Sun Jan 8 08:59:47 EST 2012

> Was this input perhaps extracted as a part of an enclosing multi-part
> message? (Maybe not using `net/mime' for that outer message?)
The input was the result of get-pure-port. The following is the same
message, but from get-impure-port (CRLF line endings):

----------------------------------------------------------------------
HTTP/1.1 200 OK
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
Expires: Fri, 06 Jan 2012 02:01:12 GMT
Date: Fri, 06 Jan 2012 01:51:12 GMT
Content-Type: multipart/mixed; boundary=9nbsYRvJBLRyuL4VOuuejw9LcAy
Content-Length: 817


--9nbsYRvJBLRyuL4VOuuejw9LcAy
Content-Type: multipart/mixed; boundary=NdzDrpIQMsJKtfv9VrXmp4YwCPh

--NdzDrpIQMsJKtfv9VrXmp4YwCPh
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fvp9087NYEpkzGNlaGCpPMGXBQA=
Location: /buckets/invoices/keys/RAQpCw8SssXlXVhiGAGYXsVmwvk
Content-Type: application/json
Link: </buckets/invoices>; rel="up"
Etag: 1qS8Wrr2vkTBxkITOjo33K
Last-Modified: Wed, 04 Jan 2012 17:12:32 GMT

{"date": "11/02/2011"}
--NdzDrpIQMsJKtfv9VrXmp4YwCPh--

--9nbsYRvJBLRyuL4VOuuejw9LcAy--

----------------------------------------------------------------------

The code I'm working on is here: 
https://github.com/shofetim/Racket-Riak/blob/master/main.rkt
It's a Racket wrapper for the NoSQL database Riak. Riak returns
multipart/mixed responses for queries that involve multiple entities:
http://wiki.basho.com/HTTP-Link-Walking.html

I've noticed two more "interesting" things. The mime library doesn't know
about several common (IANA-registered, I think) MIME types:
application/json and image/png, for example. Also, it doesn't return the
Content-Type header so that I can judge for myself (a bit of a problem,
because I also use unusual MIME types like text/sexp).
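
For the outer HTTP response, at least, the Content-Type can be read
without going through net/mime. The following is a minimal sketch, assuming
the raw response comes from get-impure-port: purify-port (from net/url)
reads off the status line and headers, and extract-field (from net/head)
pulls out a single field. The response string is a shortened stand-in for
the real one, and the names response-port, header, and content-type are
only for illustration.

```racket
#lang racket

(require net/url    ; purify-port
         net/head)  ; extract-field

;; Shortened stand-in for what get-impure-port returns (CRLF line endings).
(define response-port
  (open-input-string
   (string-append
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: multipart/mixed; boundary=9nbsYRvJBLRyuL4VOuuejw9LcAy\r\n"
    "Content-Length: 817\r\n"
    "\r\n"
    "\r\n--9nbsYRvJBLRyuL4VOuuejw9LcAy\r\n")))

;; purify-port consumes the status line and the headers, up to and
;; including the blank line that ends them, and returns them as a string.
(define header (purify-port response-port))

;; extract-field returns the value of the named field, or #f if absent.
(define content-type (extract-field "Content-Type" header))

(displayln content-type)
```

After purify-port returns, the port is positioned at the start of the
body. Since net/mime expects a header as well as a body, the Content-Type
line would presumably have to be put back in front of that body before
calling mime-analyze; that step is not shown here.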


On Sat, Jan 07, 2012 at 09:13:34PM -0700, Matthew Flatt wrote:
> Looking at this again, I see that `net/mime' expects a complete message
> --- header and body, but no extra prefix --- so there shouldn't be a
> "--9nbsYRvJBLR..." line before the "Content-Type" header. By itself,
> that line is essentially being ignored as an ill-formed header element.
> Adding a blank line at the start of your input makes the header empty,
> instead.
> 
> Was this input perhaps extracted as a part of an enclosing multi-part
> message? (Maybe not using `net/mime' for that outer message?)
> 
> SirMail, which has been my mail client, uses `net/mime', so I'm fairly
> confident that the library works on real messages --- no problems in
> the last decade or so.
> 
> At Sat, 7 Jan 2012 11:41:57 -0700, Jordan Schatz wrote:
> > Thank you Matthew,
> > 
> > The message I was using did have CRLF line endings, but it had one too
> > many:
> > 
> > ----------------------------------------------------------------------
> > #lang racket
> > 
> > (require net/mime)
> > 
> > (define message-string
> >   (let ([sep "\r\n"])
> >     (string-append 
> >      sep ;;This message starts with a \r\n <- EXTRA CRLF
> >      "--9nbsYRvJBLRyuL4VOuuejw9LcAy" sep
> >      "Content-Type: multipart/mixed; boundary=NdzDrpIQMsJKtfv9VrXmp4YwCPh" sep
> >      sep
> >      "--NdzDrpIQMsJKtfv9VrXmp4YwCPh" sep
> >      "X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fvp9087NYEpkzGNlaGCpPMGXBQA=" sep
> >      "Location: /buckets/invoices/keys/RAQpCw8SssXlXVhiGAGYXsVmwvk" sep
> >      "Content-Type: application/json" sep
> >      "Link: </buckets/invoices>; rel='up'" sep
> >      "Etag: 1qS8Wrr2vkTBxkITOjo33K" sep
> >      "Last-Modified: Wed, 04 Jan 2012 17:12:32 GMT" sep
> >      sep
> >      "{ 'date': '11/02/2011' }" sep
> >      "--NdzDrpIQMsJKtfv9VrXmp4YwCPh--" sep)))
> > 
> > (define ip
> >   (open-input-string
> >    message-string))
> > 
> > (let* ([analyzed (mime-analyze ip)] ;; port -> #<message>
> >        [our-entity (message-entity analyzed)] ;; grab #<entity> of this message
> >        [parts (entity-parts our-entity)] ;; #<entity> -> list of (inner) #<message>
> >        [inner-message (first parts)] ;; I only have one, grab it
> >        [inner-entity (message-entity inner-message)] ;; get its #<entity> part
> >        [body-proc (entity-body inner-entity)] ;; create a proc that returns the #<entity> body
> >        [tmp (open-output-string)]) 
> >   (write (message-fields inner-message)) ;;Should be a string of headers? Actual '()
> >   (body-proc tmp) ;; call proc to get message body, it needs an output port
> >   (write (get-output-string tmp))) ;; Should be json data? Actual ""
> > ----------------------------------------------------------------------
> > 
> > It looks like the message I am working with doesn't conform to RFC. But
> > it also seems like a sane thing for the net/mime library to check for and
> > handle? If so I should be able to add it and send a pull request.
> > 
> > Shalom,
> > Jordan
> > 
> > On Sat, Jan 07, 2012 at 06:12:22AM +0100, Matthew Flatt wrote:
> > > I think the main problem is that the input string has LF newlines, and
> > > it needs to have CRLF newlines. You'll also want a terminating CRLF.
> > > 
> > > With those changes, then `(message-fields inner-message)' instead of
> > > `(entity-fields inner-entity)' will get you the headers that include
> > > "X-Riak-Vclock: ...". The result of `(entity-fields inner-entity)'
> > > would correspond to a further multipart document in the inner message.
> > > 
> > > At Fri, 6 Jan 2012 20:17:21 -0700, Jordan Schatz wrote:
> > > > I'm having difficulties parsing mime multipart messages (probably I
> > > > missed something in the docs again). I have this code:
> > > > 
> > > > ----------------------------------------------------------------------
> > > > #lang racket
> > > > 
> > > > (require net/mime)
> > > > 
> > > > (define ip
> > > >   (open-input-string
> > > >    "--9nbsYRvJBLRyuL4VOuuejw9LcAy
> > > > Content-Type: multipart/mixed; boundary=NdzDrpIQMsJKtfv9VrXmp4YwCPh
> > > > 
> > > > --NdzDrpIQMsJKtfv9VrXmp4YwCPh
> > > > X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fvp9087NYEpkzGNlaGCpPMGXBQA=
> > > > Location: /buckets/invoices/keys/RAQpCw8SssXlXVhiGAGYXsVmwvk
> > > > Content-Type: application/json
> > > > Link: </buckets/invoices>; rel='up'
> > > > Etag: 1qS8Wrr2vkTBxkITOjo33K
> > > > Last-Modified: Wed, 04 Jan 2012 17:12:32 GMT
> > > > 
> > > > { 'date': '11/02/2011' }
> > > > --NdzDrpIQMsJKtfv9VrXmp4YwCPh--"))
> > > > 
> > > > (let* ([analyzed (mime-analyze ip)] ;; port -> #<message>
> > > >        [our-entity (message-entity analyzed)] ;; grab #<entity> of this message
> > > >        [parts (entity-parts our-entity)] ;; #<entity> -> list of (inner) #<message>
> > > >        [inner-message (first parts)] ;; I only have one, grab it
> > > >        [inner-entity (message-entity inner-message)] ;; get its #<entity> part
> > > >        [body-proc (entity-body inner-entity)] ;; create a proc that returns the #<entity> body
> > > >        [tmp (open-output-string)]) 
> > > >   (write (entity-fields inner-entity)) ;;Should be a list of string of headers? Actual '()
> > > >   (body-proc tmp) ;; call proc to get message body, it needs an output port
> > > >   (write (get-output-string tmp))) ;; Should be json data? Actual ""
> > > > ----------------------------------------------------------------------
> > > > 
> > > > I thought it would write the headers, and message body, but instead I get
> > > > an empty list, and an empty string. I've been at it for a few hours and I
> > > > don't see what is wrong....
> > > > 
> > > > Thanks : )
> > > > Jordan
> > > > ____________________
> > > >   Racket Users list:
> > > >   http://lists.racket-lang.org/users


Posted on the users mailing list.