[plt-scheme] reading a whole file

From: Richard Cleis (rcleis at mac.com)
Date: Mon Nov 10 12:22:52 EST 2008

On Sunday, November 09, 2008, at 09:05PM, "Shriram Krishnamurthi" <sk at cs.brown.edu> wrote:
>Fast on the heels of this thread, today I had to insert a bunch of
>JavaScript gobbledygook into each of my HTML files.  Here's the Scheme
>code I wrote:
>(define tracker "... gobbledygook goes here ...")
>(define (go f)
>  (let ([txt (with-input-from-file f
>               (lambda () (read-string (file-size f))))])
>    (let ([new-txt
>           (regexp-replace (regexp "</HEAD>")
>                           txt
>                           (string-append tracker "</HEAD>"))])
>      (if (string=? txt new-txt)
>          (printf "Pattern not found in ~a~n" f)
>          (with-output-to-file "out"
>            (lambda () (write-string new-txt)))))))
>(go (vector-ref (current-command-line-arguments) 0))
>[Orthogonal note: writing the gobbledygook was itself a bit painful,
>and made me better appreciate Python's quoting.]
>I then launched this from a `find' command in the shell that would
>locate the .html files, do a quick check relating "out" to the
>original, and then mv out to override original.
>I eyeballed the output to make sure there were no instances of
>"Pattern not found".  There was one, and I was able to cross-check why
>and make sure there was no problem.
>It sure would have been nice to make the above code both shorter and
>more robust with FILE->STRING and STRING->FILE...
>[This is really just a response to Richard, who seemed to be arguing
>against such primitives.  The "optics" argument is actually stood on
>its head here: my hand-written code is the 80...let's call it 60%
>solution, because it does no error-checking, probably doesn't handle
>Unicode, certainly doesn't care about atomicity due to assumptions
>about myself, etc.  At any rate, given that this is *not* being used
>in a compositional manner, the 80% solution of reading the file into
>memory, processing it, and writing it back out seems to me just the
>right thing.  Counter-argument?]

I was replying on a phone beneath a roller coaster in the middle of nowhere, so I didn't type enough to make my actual claim: the *worst case* in programming is when the 80% solutions are so weakly correlated that three of them used together yield roughly a 50% solution, (* 0.8 0.8 0.8) = 0.512, rather than another 80% solution.  Fortunately, primitives, modules, etc. are organized so that reality can be closer to 80 than 50.
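
A minimal sketch of that arithmetic, assuming the stages' coverage is completely independent so the fractions simply multiply. The name composed-coverage is made up for illustration, and exact rationals are used to avoid floating-point noise:

```scheme
#lang racket/base

;; Fraction of cases handled when several partial solutions are chained,
;; ASSUMING their coverage is independent (so the fractions multiply).
;; composed-coverage is a hypothetical helper, not a library function.
(define (composed-coverage . fractions)
  (apply * fractions))

;; Three "80% solutions" used together:
(composed-coverage 4/5 4/5 4/5)                   ; exact result: 64/125
(exact->inexact (composed-coverage 4/5 4/5 4/5))  ; 0.512
```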

The file->string issue is driven by performance, perhaps because interfaces cannot be generalized across user memory, OS memory, disk memory, and disk media.  Many uses, such as the JavaScript example above, involve data that fits entirely in the disk drive's own memory; I doubt any modern drive has less than several megabytes.  Even if a program asks for one character, much more relevant data has probably already ended up in memory, yet there aren't enough ways to optimize access to it.  The true solution, whatever that means, might need environments that remove a program's need to distinguish memory from media.  But haven't people been making that claim for decades? :)
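
For concreteness, here is a rough sketch of what the FILE->STRING / STRING->FILE pair from the quoted message might look like, with the tracker insertion written on top of it. The names file->string*, string->file*, and insert-before-head are hypothetical (the trailing * is there so they are not mistaken for library functions), and the #lang line assumes a current Racket; a PLT Scheme of this era would say scheme/base instead. It still reads the whole file into memory and does nothing about atomic replacement, so it is the same "80% solution", only shorter:

```scheme
#lang racket/base

;; Hypothetical helper: read the whole file at PATH into one string
;; (ports decode as UTF-8 by default).  An empty file yields "".
(define (file->string* path)
  (call-with-input-file path
    (lambda (in)
      (let loop ([chunks '()])
        (let ([s (read-string 4096 in)])
          (if (eof-object? s)
              (apply string-append (reverse chunks))
              (loop (cons s chunks))))))))

;; Hypothetical helper: write STR to PATH, replacing any existing file.
(define (string->file* str path)
  (call-with-output-file path
    (lambda (out) (write-string str out))
    #:exists 'truncate/replace))

;; Insert TRACKER before the first </HEAD>; return #f if it is absent.
;; A procedure is used as the replacement because regexp-replace gives
;; & and backslashes in a replacement *string* special meaning, which
;; JavaScript text is likely to trip over.
(define (insert-before-head txt tracker)
  (and (regexp-match #rx"</HEAD>" txt)
       (regexp-replace #rx"</HEAD>" txt
                       (lambda (matched) (string-append tracker matched)))))

;; Same shape as the original go, writing the result to "out".
(define (go f tracker)
  (let ([new-txt (insert-before-head (file->string* f) tracker)])
    (if new-txt
        (string->file* new-txt "out")
        (printf "Pattern not found in ~a~n" f))))
```

Keeping insert-before-head as a pure string-to-string function means the pattern check can be tried in the REPL without touching any files.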



Posted on the users mailing list.