[racket] string-strip

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Jan 11 11:41:53 EST 2012

> On 29-12-11 10:02, Marijn wrote:
> >>> What I found was that it is much slower to treat a string as a 
> >>> port and then read-char from that then it is to directly index
> >>>  the string.
> > 
> >> That string input ports are often noticeably slower than string 
> >> indexing is one of the banes of my existence.  Most reading and 
> >> parsing operations you implement, you want to work on both ports 
> >> and strings. But, if you first write a procedure that works on a 
> >> port, and then write a wrapper procedure that works on a string
> >> (by doing an "open-input-string" and calling your procedure that
> >> works on ports), the string one can be noticeably slower than if
> >> you'd handwritten the string one.  But having to write two
> >> separate procedures has big development cost, and I always just
> >> take the performance hit on strings instead, or write a string
> >> procedure and then not have a port procedure when I need it
> >> later.  One approach that might help is to design a macro that
> >> lets people define processing on strings and ports, and expands
> >> to produce two closure definitions -- one that works on ports,
> >> and one on strings, and avoids a lot of port-related overhead in
> >> the string one.
> > 
> > Matthew, any comments on this? Is there a fundamental reason that 
> > treating a string as a port is so much slower than direct indexing
> > or is there something that can be done about it? Or should we look
> > into automatically duplicating code with macros?

The biggest overhead in port access compared to byte/string access
comes from maintaining the port state, an indirection to support
different kinds of ports, and, in the case of comparing strings,
potentially decoding characters from UTF-8. Another effect (equally
large for byte strings) is that `bytes-ref' and `string-ref' are
JIT-inlined, while `read-byte' and `read-char' are not.

See the example program below. The `port-like' variants demonstrate the
work that `read-byte' and `read-char' actually do in the simple case
that line-counting is disabled, no UTF-8 decoding turns out to be
needed, etc.

On large strings:

 'bytes
 cpu time: 78 real time: 77 gc time: 0
 'bytes/no-inline
 cpu time: 220 real time: 220 gc time: 0
 'bytes-port-like
 cpu time: 430 real time: 429 gc time: 0
 'bytes-port
 cpu time: 373 real time: 372 gc time: 0

 'string
 cpu time: 81 real time: 81 gc time: 0
 'string/no-inline
 cpu time: 201 real time: 201 gc time: 0
 'string-port-like
 cpu time: 560 real time: 560 gc time: 17
 'string-port
 cpu time: 524 real time: 523 gc time: 1

The difference in the last case was bigger prior to v5.2, where we cut
the overhead of maintaining port state roughly in half. That
improvement is reflected in the `simple-ok?' field of a `port-like'
structure that guards the simple case. Currently, I don't see how to
improve that further.

As strings get smaller, then the cost of creating a port becomes more
important; in the case of strings, that creation involves a conversion
of the string to UTF-8.

If I shift 3 zeros from N to M in the example program (strings 1/1000th
of the old size, 1000 times as many iterations):

 'bytes
 cpu time: 77 real time: 78 gc time: 0
 'bytes/no-inline
 cpu time: 226 real time: 225 gc time: 0
 'bytes-port-like
 cpu time: 551 real time: 552 gc time: 21
 'bytes-port
 cpu time: 670 real time: 672 gc time: 21

 'string
 cpu time: 101 real time: 102 gc time: 0
 'string/no-inline
 cpu time: 258 real time: 258 gc time: 0
 'string-port-like
 cpu time: 759 real time: 760 gc time: 8
 'string-port
 cpu time: 902 real time: 904 gc time: 22

I think the `port-like' variants fare better here because the actual
implementation has a more complex structure to set up each time. I
found one minor improvement (around 10%) that's reflected above, but I
don't see other easy improvements.

----------------------------------------
#lang racket

(define N 10000)
(define bstr (bytes->immutable-bytes (make-bytes N (char->integer #\x))))
(define str (make-string N #\x))

(define M 1000)

'bytes
(time
 (for ([i (in-range M)])
   (for ([j (in-range N)])
     (bytes-ref bstr j))))

'bytes/no-inline
(define bytes-ref/not-inlined #f)
(set! bytes-ref/not-inlined bytes-ref)
(time
 (for ([i (in-range M)])
   (for ([j (in-range N)])
     (bytes-ref/not-inlined bstr j))))

'bytes-port-like
(struct port-like (proc [simple-ok? #:mutable]))
(define (make-port-like-bytes bstr)
  (define pos 0)
  (port-like
   (lambda ()
     (let ([b (bytes-ref bstr pos)])
       (set! pos (add1 pos))
       b))
   #t))
(define (like-read-byte pl)
  (if (port-like? pl)
      (if (port-like-simple-ok? pl)
          ((port-like-proc pl))
          (error "not implemented"))
      (error "bad")))
(time
 (for ([i (in-range M)])
   (define p (make-port-like-bytes bstr))
   (for ([j (in-range N)])
     (like-read-byte p))))

'bytes-port
(time
 (for ([i (in-range M)])
   (define p (open-input-bytes bstr))
   (for ([j (in-range N)])
     (read-byte p))))


'string
(time
 (for ([i (in-range M)])
   (for ([j (in-range N)])
     (string-ref str j))))

'string/no-inline
(define string-ref/not-inlined #f)
(set! string-ref/not-inlined string-ref)
(time
 (for ([i (in-range M)])
   (for ([j (in-range N)])
     (string-ref/not-inlined str j))))

'string-port-like
(define (make-port-like-string str)
  (define pos 0)
  (define bstr (string->bytes/utf-8 str))
  (port-like
   (lambda ()
     (let ([b (bytes-ref bstr pos)])
       (if (b . > . 127)
           (error "complicated case not implemented")
           (begin
             (set! pos (add1 pos))
             (integer->char b)))))
   #t))
(define (like-read-char pl)
  (if (port-like? pl)
      (if (port-like-simple-ok? pl)
          ((port-like-proc pl))
          (error "not implemented"))
      (error "bad")))
(time
 (for ([i (in-range M)])
   (define p (make-port-like-string str))
   (for ([j (in-range N)])
     (like-read-char p))))

'string-port
(time
 (for ([i (in-range M)])
   (define p (open-input-string str))
   (for ([j (in-range N)])
     (read-char p))))



Posted on the users mailing list.