[plt-scheme] reading a whole file

From: Robby Findler (robby at cs.uchicago.edu)
Date: Tue Nov 4 11:11:46 EST 2008

Okay, I got curious. Below are some timing tests suggesting that,
when it is possible, reading the file into a string gives more
consistent performance. Matching directly on the port can be much
faster if the match is near the beginning of the file, but much slower
if it is near the end. The code is below; each timing is for 300
matches against the same file, which is 96k in size.

This is the output I get (times in milliseconds, on a 2.8 GHz Intel Mac with a pretty fast filesystem):

Welcome to DrScheme, version 4.1.2.2-svn3nov2008 [3m].
Language: Module; memory limit: 128 megabytes.
port/beginning-of-file
cpu time: 11 real time: 11 gc time: 0
string/beginning-of-file
cpu time: 180 real time: 180 gc time: 30
port/end-of-file
cpu time: 2695 real time: 2715 gc time: 0
string/end-of-file
cpu time: 226 real time: 226 gc time: 10
>

Robby

#lang scheme
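(require scheme/port) ; for copy-port

;; Opens the file afresh on every call and runs the regexp directly
;; on the input port.
(define ((check-matches-port file) r)
  (call-with-input-file file
    (λ (port)
      (regexp-match r port))))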

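;; Reads the whole file into a string once, then returns a matcher
;; that runs each regexp against that same string.
(define (check-matches-string file)
  (let ([s
         (let ([sp (open-output-string)])
           (call-with-input-file file
             (λ (fp)
               (copy-port fp sp)))
           (get-output-string sp))])
    (λ (r)
      (regexp-match r s))))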

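;; Builds a matcher for the framework's private/frame.ss and times
;; 300 calls of it with the regexp reg.
(define (test check-matches reg)
  (time
   (let ([matcher (check-matches (build-path (collection-path "framework")
                                             "private"
                                             "frame.ss"))])
     (let loop ([n 300])
       (unless (zero? n)
         (matcher reg)
         (loop (- n 1)))))))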

'port/beginning-of-file
(test check-matches-port #rx"unit")

'string/beginning-of-file
(test check-matches-string #rx"unit")


'port/end-of-file
(test check-matches-port #rx"pasteboard-mixin open-here%")

'string/end-of-file
(test check-matches-string #rx"pasteboard-mixin open-here%")
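A shorter way to get the file into a string is `port->string` from `scheme/port`, which reads everything remaining on a port into a string. The sketch below assumes `port->string` is available in your PLT Scheme version; `check-matches-string/alt` is a hypothetical name, and it is meant to behave like `check-matches-string` above, not to replace the code that produced the timings.

```scheme
#lang scheme
(require scheme/port) ; for port->string

;; Hypothetical variant of check-matches-string: read the whole file
;; into a string once, then return a matcher that reuses that string.
(define (check-matches-string/alt file)
  (let ([s (call-with-input-file file port->string)])
    (λ (r)
      (regexp-match r s))))
```

It plugs into the same harness, e.g. `(test check-matches-string/alt #rx"unit")`.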


On Tue, Nov 4, 2008 at 11:00 AM, Shriram Krishnamurthi <sk at cs.brown.edu> wrote:
> Yes, if you're doing only one string.  It's nice to be able to say
>
>  (define c ...contents of file...)
>
>  (match ... c)
>  (match ... c)
>
> as you formulate and test hypotheses about the content of the file.
> That is, it's nice to type the code to get the file content just once,
> rather than over and over again.
>
> Shriram
>
>
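One reason a string suits the define-once, match-many style described above is that `regexp-match` consumes input when it is given a port: the matched text and everything before it are read off the port, so a later match cannot see it. A string can be matched any number of times. The sketch below uses a hypothetical `contents` string as a stand-in for the file; the port results are byte strings because the input is a port.

```scheme
#lang scheme
;; Hypothetical stand-in for the contents of a file.
(define contents "frame% unit pasteboard-mixin open-here%")

;; Matching on a port consumes input up to the end of the match.
(define in (open-input-string contents))
(regexp-match #rx"unit" in)       ; => '(#"unit")
(regexp-match #rx"frame" in)      ; => #f, "frame" was already consumed

;; Matching on a string leaves it intact, so it can be reused.
(regexp-match #rx"unit" contents)  ; => '("unit")
(regexp-match #rx"frame" contents) ; => '("frame")
```

This is also why `check-matches-port` has to reopen the file on every call.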

Posted on the users mailing list.