[plt-scheme] MD5.SS signature (pun intended)

From: Ray Racine (ray.racine at comcast.net)
Date: Sat Sep 15 17:51:08 EDT 2007

The signature of md5 in md5.ss (collects/mzlib) does not match what I
would expect from an MD5 hash, i.e.,

md5: byte stream -> byte vector[16]

(I mean the type signature of the md5 procedure and not the
signature/hash value that it returns which is correct.)

The result of an MD5 hash is a 128-bit value.  The current md5 returns a
byte string holding the 32 hex digits of that value, not the 16 raw
bytes of the value itself.

IMHO the desired behavior is reflected below.

> (md5 (string->bytes/utf-8 ""))
#"\324\35\214\331\217\0\262\4\351\200\t\230\354\370B~" 
;; Note: 16 bytes, 128 bits

> (bytes->hexstring (md5 (string->bytes/utf-8 "")))
"d41d8cd98f00b204e9800998ecf8427e"

Currently DrScheme returns the 256-bit byte string of the hex string of
the 128-bit value (yeah, I know it's confusing).

> (md5 (string->bytes/utf-8 ""))
#"d41d8cd98f00b204e9800998ecf8427e"  ;; 32 bytes => 256 bits
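
With the current behavior, the raw value can still be recovered from the
hex output. A minimal sketch, assuming a hypothetical helper named
hex-bytes->raw-bytes (it is not part of md5.ss) and an input made of an
even number of hex digits:

```scheme
;; hex-bytes->raw-bytes : bytes -> bytes
;; Hypothetical helper, not part of md5.ss.
;; Walks the hex digits two at a time from the end and conses up
;; the decoded bytes, so the result comes out in the original order.
(define (hex-bytes->raw-bytes hex)
  (let loop ((i (- (bytes-length hex) 2)) (acc '()))
    (if (< i 0)
        (apply bytes acc)
        (loop (- i 2)
              (cons (string->number
                     (bytes->string/latin-1 (subbytes hex i (+ i 2)))
                     16)
                    acc)))))

;; The 32 hex digits of the empty-string digest decode to 16 bytes.
(bytes-length
 (hex-bytes->raw-bytes #"d41d8cd98f00b204e9800998ecf8427e")) ; 16
```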

So why the big deal? Well ...

Access to the true 128-bit value is important, for example, to create an
HTTP/1.1 Content-MD5 header, which is:

(base64-encode (md5 <payload>))
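
As a rough illustration of the header construction, here is a sketch that
starts from the raw 16-byte digest of the empty string shown earlier. The
name content-md5-header is hypothetical. It assumes base64-encode from the
net collection, which ends its output with a CRLF, so only the first 24
bytes (the full base64 encoding of 16 bytes) are kept:

```scheme
(require (lib "base64.ss" "net"))

;; Raw 16-byte MD5 digest of the empty string.
(define empty-digest
  #"\324\35\214\331\217\0\262\4\351\200\t\230\354\370B~")

;; content-md5-header : bytes -> string
;; Hypothetical helper: builds the header line from a raw digest.
;; 16 input bytes always encode to 24 base64 characters, so subbytes
;; drops the trailing CRLF that base64-encode appends.
(define (content-md5-header digest)
  (let ((b64 (base64-encode digest)))
    (string-append "Content-MD5: "
                   (bytes->string/latin-1 (subbytes b64 0 24)))))

(content-md5-header empty-digest)
;; "Content-MD5: 1B2M2Y8AsgTpgAmY7PhCfg=="
```

Feeding the 32-byte hex form to base64-encode instead would produce a
different, longer value that does not match the header's definition.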

====================================================================

I did create a modified md5.ss for my own use, but I thought I'd pass on
the suggested change and the code changes.  Though I could see where PLT
might not want to change the current behavior for compatibility or
other reasons.

Note that the bytes->hexstring proc lives in my prelude.scm file, but a
copy is embedded in the md5-test proc.  md5-test passes with my changes.

======= Below Replaces End Of collects/mzlib/md5.ss ===============

  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  ;;; Changed to
  ;;; MD5:: bytes -> bytes[16]
  ;;; MD5 result is now a 128-bit value in a 16-byte byte string.
  ;;; Old Step 5 was returning a bytestring of the hex string of the
  ;;; value of the MD5 result.
  ;;; This precluded things like (base64-encode (md5 <data>)), which one
  ;;; needs to do to create an HTTP/1.1 Content-MD5 header, for example.
  ;;; The old behavior of the MD5 is reproduced by
  ;;; (bytes->hexstring (md5 <data>)).
  ;;; See the commented-out md5-test procedure below for a naive
  ;;; bytes->hexstring.
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  
  ;; Step 5  -  Output
  ;; To finish up, we convert the word to hexadecimal string
  ;; - and make sure they end up in order.
  ;; step5 : (list word word word word) -> string
  
  ;  (define (step5 l)
  ;    (define hex #(48 49 50 51 52 53 54 55 56 57 97 98 99 100 101 102))
  ;    ;; word->bytesl : word -> (list byte),
  ;    ;; returns a little endian result, but each byte is hi half and
  ;    ;; then lo half
  ;    (define (word->bytesl w)
  ;      (let ([byte (lambda (n b) (bitwise-and (arithmetic-shift n (- b)) 15))]
  ;            [lo (cdr w)] [hi (car w)])
  ;        (list (byte lo 4) (byte lo 0) (byte lo 12) (byte lo 8)
  ;              (byte hi 4) (byte hi 0) (byte hi 12) (byte hi 8))))
  ;    (display "--Bytes: ")(display l)(newline)
  ;    (apply bytes (map (lambda (n) (vector-ref hex n))
  ;                      (apply append (map word->bytesl l)))))
  
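  ;; step5 : (list word word word word) -> bytes
  ;; Returns the 16 raw digest bytes instead of their hex encoding.
  (define (step5 l)
    ;; word->bytes : word -> bytes
    ;; A word is a (hi . lo) pair of 16-bit halves; emit lo then hi,
    ;; each little-endian.  The final #f forces little-endian output so
    ;; the result does not depend on the host byte order.
    (define (word->bytes w)
      (let ((lo (integer->integer-bytes (cdr w) 2 #f #f))
            (hi (integer->integer-bytes (car w) 2 #f #f)))
        (bytes-append lo hi)))
    (apply bytes-append (map word->bytes l)))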

;  (define (md5-test)
;    
;    (define (bytes->hexstring bstr)
;      (let ((hex #(48 49 50 51 52 53 54 55 56 57 97 98 99 100 101 102))
;            (umask #b11110000)
;            (lmask #b00001111))      
;        (let loop ((hexbytes '()) (i (bytes-length bstr)))
;          (if (zero? i)
;              (bytes->string/utf-8 (apply bytes hexbytes))
;              (let ((b (bytes-ref bstr (- i 1))))
;                (let ((unibble (vector-ref hex (arithmetic-shift (bitwise-and b umask) -4)))
;                      (lnibble (vector-ref hex (bitwise-and b lmask))))
;                  (loop (cons unibble (cons lnibble hexbytes)) (- i 1))))))))
;    
;    (if (and (equal? (bytes->hexstring (md5 (string->bytes/utf-8 "")))
;                     "d41d8cd98f00b204e9800998ecf8427e")
;             (equal? (bytes->hexstring (md5 (string->bytes/utf-8 "a")))
;                     "0cc175b9c0f1b6a831c399e269772661")
;             (equal? (bytes->hexstring (md5 (string->bytes/utf-8 "abc")))
;                     "900150983cd24fb0d6963f7d28e17f72")
;             (equal? (bytes->hexstring (md5 (string->bytes/utf-8 "message digest")))
;                     "f96b697d7cb7938d525a2f31aaf161d0")
;             (equal? (bytes->hexstring (md5 (string->bytes/utf-8 "abcdefghijklmnopqrstuvwxyz")))
;                     "c3fcd3d76192e4007dfb496cca67e13b")
;             (equal? (bytes->hexstring (md5 (string->bytes/utf-8 "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")))
;                     "d174ab98d277d9f5a5611c2c9f419d9f")
;             (equal? (bytes->hexstring (md5 (string->bytes/utf-8 "12345678901234567890123456789012345678901234567890123456789012345678901234567890")))
;                     "57edf4a22be3c955ac49da2e2107b67a"))
;        'passed
;        'failed))
;  
;  (md5-test)
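
The byte order that the new step5 depends on can be checked by hand on one
word. The sketch below assumes, as the code above does, that a word is a
(hi . lo) pair of 16-bit halves. The sample word is worked out from the
first four bytes of the empty-string digest (d4 1d 8c d9), and the
little-endian flag is passed explicitly:

```scheme
;; word->bytes : word -> bytes
;; Emits the lo half, then the hi half, each as 2 little-endian bytes.
;; Arguments to integer->integer-bytes: value, size, signed?, big-endian?
(define (word->bytes w)
  (bytes-append (integer->integer-bytes (cdr w) 2 #f #f)
                (integer->integer-bytes (car w) 2 #f #f)))

;; hi = #xd98c, lo = #x1dd4 yields the bytes d4 1d 8c d9.
(word->bytes (cons #xd98c #x1dd4)) ; #"\324\35\214\331"
```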



Posted on the users mailing list.