[plt-scheme] Help with MIME encodings

From: John Clements (clements at brinckerhoff.org)
Date: Fri May 6 15:12:22 EDT 2005

This may be a bug in the net collection... Then again, it may not be. I  
need help from someone more familiar with MIME conventions than I.

Here's the problem: I have a form with a "browse" field for students to
submit their homework.  Let's say that the file they're submitting is a
StuffIt archive:

burundi:/tmp clements$ od -c foo.sit
0000000    S   t   u   f   f   I   t       (   c   )   1   9   9   7   -
0000020    2   0   0   2       A   l   a   d   d   i   n       S   y   s
0000040    t   e   m   s   ,       I   n   c   .   ,       h   t   t   p
0000060    :   /   /   w   w   w   .   a   l   a   d   d   i   n   s   y
0000100    s   .   c   o   m   /   S   t   u   f   f   I   t   /  \r  \n
0000120  032  \0 005 020  \0  \0  \0 344  \0  \0  \0   r  \0 001  \0  \0
0000140   \0   r   - 215  \r 245 245   R   e   s   e   r   v   e   d 245
0000160  245  \0 245 245 245 245 001  \0  \0   3  \0  \0 276 241   Z 335
0000200  276 241   Z 340  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000220   \0 003   w 361  \0  \0  \0 026  \0  \0  \0 033  \0  \0  \0  \0
0000240  017  \0   f   o   o  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000260   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000300   \0  \0  \0  \0  \0  \0  \0  \0  \0   B 301 325 001   3 264   0
0000320  364 257   D   u 005   G   ! 330 001 313   =   9 255 323   [ 374
0000340  241 177 240 200
0000344

(looks much nicer if you're using a fixed-width font)

If you look carefully, you'll see that there's a CRLF sequence in
there (the \r \n at the end of the row at offset 0000100).  The CGI
script base64-encodes the whole submission, and later I base64-decode
it.  I'm attaching the file that results from this round trip.  Here's
what it looks like in the terminal:

burundi:/tmp clements$ less decoded-stream
Content-type: multipart/form-data; boundary=----------0xKhTmLbOuNdArY

------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="my-email"

blath
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="team-password"

robby
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="assignment-number"

2
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="file"; filename="foo.sit"
Content-Type: application/x-stuffit

StuffIt (c)1997-2002 Aladdin Systems, Inc.,  
http://www.aladdinsys.com/StuffIt/
^Z^@^E^P^@^@^@<E4>^@^@^@r^@^A^@^@^@r- 
<8D>^M<A5><A5>Reserved<A5><A5>^@<A5><A5><A5><A5>^A^@^@3^@^@<BE>
<A1>Z<DD><BE><A1>Z<E0>^@^@^@^@^@^@^@^@^@^@^@^@^@^Cw<F1>^@^@^@^V^@^@^@ESC 
^@^@^@^@^O^@foo^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@B<C1><D5>^A3<B 
4>0<F4><AF>Du^EG!<D8>^A<CB>=9
<AD><D3>[<FC><A1>^?<A0><80>
------------0xKhTmLbOuNdArY--
burundi:/tmp clements$


... of course, in this form you can't see the CRLF sequence in the
contained file, but it's there just fine.  The problem is that when I
decode the stream using functions in the net collection, the CRLF gets
turned into a bare LF.  Here's my code:

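(module net-bug mzscheme
   (require (lib "mime.ss" "net"))

   ;; Parse the saved form submission and write the body of the fourth
   ;; part (index 3, the uploaded file) to /tmp/decoded-file.
   (let* ([port (open-input-file "/tmp/decoded-stream")]
          [parts (entity-parts (message-entity (mime-analyze port)))]
          [file-entity (message-entity (list-ref parts 3))])
     (call-with-output-file
      "/tmp/decoded-file"
      (lambda (out-port)
        ;; as used here, entity-body yields a procedure that is applied
        ;; to an output port
        ((entity-body file-entity) out-port))
      'truncate)))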

... and here's what decoded-file looks like:

burundi:/tmp clements$ od -c decoded-file
0000000    S   t   u   f   f   I   t       (   c   )   1   9   9   7   -
0000020    2   0   0   2       A   l   a   d   d   i   n       S   y   s
0000040    t   e   m   s   ,       I   n   c   .   ,       h   t   t   p
0000060    :   /   /   w   w   w   .   a   l   a   d   d   i   n   s   y
0000100    s   .   c   o   m   /   S   t   u   f   f   I   t   /  \n 032
0000120   \0 005 020  \0  \0  \0 344  \0  \0  \0   r  \0 001  \0  \0  \0
0000140    r   - 215  \r 245 245   R   e   s   e   r   v   e   d 245 245
0000160   \0 245 245 245 245 001  \0  \0   3  \0  \0 276 241   Z 335 276
0000200  241   Z 340  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000220  003   w 361  \0  \0  \0 026  \0  \0  \0 033  \0  \0  \0  \0 017
0000240   \0   f   o   o  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000260   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000300   \0  \0  \0  \0  \0  \0  \0  \0   B 301 325 001   3 264   0 364
0000320  257   D   u 005   G   ! 330 001 313   =   9 255 323   [ 374 241
0000340  177 240 200  \n
0000344

Note that the CRLF was turned into a bare LF, and that a trailing
newline was appended.  This looks like a bug in the net collection to
me, but since I lack intimate knowledge of the MIME standards, it could
be that the file I attach is NOT the right series of bytes to encode
the file I want, which would imply an error earlier in the signal chain.
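
A comparison like the one above can also be done mechanically rather
than by reading od dumps by eye.  What follows is only a sketch:
original.bin and decoded.bin are hypothetical stand-ins built with
printf, not the real foo.sit and decoded-file, and the variable names
are made up for illustration.  It uses cmp -l to find the first
differing byte and tr plus wc to count carriage returns in each file.

```shell
#!/bin/sh
# Hypothetical stand-ins: a short payload with a CRLF in it, and a copy
# in which the CRLF became LF and a trailing newline was appended.
tmp=$(mktemp -d)
printf 'StuffIt/\r\n\032\005payload' > "$tmp/original.bin"
printf 'StuffIt/\n\032\005payload\n' > "$tmp/decoded.bin"

# cmp -l lists every differing byte as "offset octal1 octal2";
# the first field of the first line is the 1-based offset of the
# first difference.
first_diff=$(cmp -l "$tmp/original.bin" "$tmp/decoded.bin" | head -n 1 | awk '{print $1}')
echo "first differing byte: $first_diff"

# Count carriage returns in each file: delete everything except \r,
# then count the bytes that remain.
cr_original=$(tr -cd '\r' < "$tmp/original.bin" | wc -c | tr -d ' ')
cr_decoded=$(tr -cd '\r' < "$tmp/decoded.bin" | wc -c | tr -d ' ')
echo "CRs in original: $cr_original, CRs in decoded: $cr_decoded"

rm -rf "$tmp"
```

Running the same two checks against the real files and against the
file part inside decoded-stream should show whether the CR is already
missing before the MIME decoder runs or only afterwards.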

Any suggestions much appreciated.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: decoded-stream
Type: application/octet-stream
Size: 736 bytes
Desc: not available
URL: <http://lists.racket-lang.org/users/archive/attachments/20050506/17c52f70/attachment.obj>

Posted on the users mailing list.