[plt-scheme] Process hangs with big input

From: Robby Findler (robby at cs.uchicago.edu)
Date: Tue Mar 14 10:01:58 EST 2006

At Tue, 14 Mar 2006 06:42:39 -0800 (PST), Noel Welsh wrote:
> Thanks for the tips.  What is the newish port library?  Is it
> that ports are waitable?  Or something else?

Just the stuff described in the mzscheme manual, plus (lib "port.ss").

> I'm averse to using threads in this context as I'd have to
> restructure my code, but if it's the only way I guess I'll just
> have to suck it up and get on with it.
> 
> > What are you doing, in this particular case?
> 
> I'm running svn diff to count the number of lines of code
> students have modified in a project.  This is so we can
> (roughly) assess their contribution to a group project. 
> For my use the code is basically:
> 
>   - for all revisions, get the output of svn diff for the
> revision and its predecessor
>   - regex search for ^- and ^+ (subtractions and additions)
>   - count total number per user for all revisions
> 
> However, you can see the same problem if you simply run
> 
>  (system/output "cat a-big-file")
> 
> I suppose I could run the regexps on the ports, instead of
> collecting the results in a string.  I'll try that first.

That's definitely what I would try. It should be short.
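
A minimal sketch of that approach, assuming the diff arrives in unified
format (what svn diff prints by default). The names count-changes and
header-reg are made up for illustration. The "--- file" and "+++ file"
header lines are skipped so that they are not counted as changes:

```scheme
;; header-reg (hypothetical name): matches the "--- file" / "+++ file"
;; header lines of a unified diff
(define header-reg #rx"^(---|[+][+][+]) ")

;; count-changes : input-port -> (values number number)
;; reads the port one line at a time and returns the number of added
;; and removed lines, without collecting the whole output in a string
(define (count-changes in)
  (let loop ([adds 0] [subs 0])
    (let ([l (read-line in)])
      (cond
        [(eof-object? l) (values adds subs)]
        [(regexp-match header-reg l) (loop adds subs)]
        [(regexp-match #rx"^[+]" l) (loop (+ adds 1) subs)]
        [(regexp-match #rx"^-" l) (loop adds (+ subs 1))]
        [else (loop adds subs)]))))
```

You would pass it the stdout port you get back from process. To try it
without svn, a string port works too:
(count-changes (open-input-string "-old\n+new\n+more\n")) returns 2 and 1.
One caveat: a removed line whose own text starts with "-- " shows up as
"--- ..." in the diff and would be mistaken for a header here.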

Do you know about "svn annotate"? I've put some code below that does
what you're asking, using svn annotate, in case that's helpful.

Robby

(module tmp mzscheme
  (require (lib "process.ss")
           (lib "list.ss")
           (lib "port.ss"))
  
  ;; count : string -> (listof (list string number))
  ;; counts the number of lines each person is responsible for in a file, using
  ;; svn annotate; the result is sorted with the largest count first
  (define (count file)
    (let-values ([(out in pid err proc) (apply values (process (format "svn annotate ~a" file)))])
      (let ([err-thd
             (thread
              (λ () 
                (copy-port err (current-error-port))
                (close-input-port err)))])
        (close-output-port in)
        (let ([ht (count-lines out)])
          (close-input-port out)
          (thread-wait err-thd)
          (proc 'wait)
          (quicksort (hash-table-map ht list)
                     (λ (x y) (>= (cadr x) (cadr y))))))))
  
  ;; count-lines : port -> hash-table[string -o> number]
  (define (count-lines out)
    (let ([ht (make-hash-table 'equal)])
      (let loop ()
        (let ([l (read-line out)])
          (unless (eof-object? l)
            (let ([m (regexp-match id-reg l)])
              (unless m
                (error 'count "cannot parse ~s" l))
              (hash-table-inc! ht (cadr m)))
            (loop))))
      ht))
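  ;; id-reg : regexp
  ;; matches a line of svn annotate output (revision number, then author);
  ;; the first group captures the author name
  (define id-reg #rx"^[^0-9]*[0-9]+[ \t]+([^ ]*) ")

  ;; hash-table-inc! : hash-table[string -o> number] string -> void
  ;; adds one to the count stored under k, starting from 0 if k is absent
  (define (hash-table-inc! ht k)
    (hash-table-put! ht k (+ 1 (hash-table-get ht k (λ () 0)))))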
  
  (printf "~s\n" (count "/Users/robby/svn/plt/collects/mzlib/port.ss")))



Posted on the users mailing list.