[racket] string-strip
> On 29-12-11 10:02, Marijn wrote:
> >>> What I found was that it is much slower to treat a string as a
> >>> port and then read-char from that then it is to directly index
> >>> the string.
> >
> >> That string input ports are often noticeably slower than string
> >> indexing is one of the banes of my existence. Most reading and
> >> parsing operations you implement, you want to work on both ports
> >> and strings. But, if you first write a procedure that works on a
> >> port, and then write a wrapper procedure that works on a string
> >> (by doing an "open-input-string" and calling your procedure that
> >> works on ports), the string one can be noticeably slower than if
> >> you'd handwritten the string one. But having to write two
> >> separate procedures has big development cost, and I always just
> >> take the performance hit on strings instead, or write a string
> >> procedure and then not have a port procedure when I need it
> >> later. One approach that might help is to design a macro that
> >> lets people define processing on strings and ports, and expands
> >> to produce two closure definitions -- one that works on ports,
> >> and one on strings, and avoids a lot of port-related overhead in
> >> the string one.
> >
> > Matthew, any comments on this? Is there a fundamental reason that
> > treating a string as a port is so much slower than direct indexing
> > or is there something that can be done about it? Or should we look
> > into automatically duplicating code with macros?
The biggest overhead in port access compared to byte/string access
comes from maintaining the port state, an indirection to support
different kinds of ports, and, in the case of comparing strings,
potentially decoding characters from UTF-8. Another effect (equally
large for byte strings) is that `bytes-ref' and `string-ref' are
JIT-inlined, while `read-byte' and `read-char' are not.
See the example program below. The `port-like' variants demonstrate the
work that `read-byte' and `read-char' actually do in the simple case
that line-counting is disabled, no UTF-8 decoding turns out to be
needed, etc.
On large strings:
'bytes
cpu time: 78 real time: 77 gc time: 0
'bytes/no-inline
cpu time: 220 real time: 220 gc time: 0
'bytes-port-like
cpu time: 430 real time: 429 gc time: 0
'bytes-port
cpu time: 373 real time: 372 gc time: 0
'string
cpu time: 81 real time: 81 gc time: 0
'string/no-inline
cpu time: 201 real time: 201 gc time: 0
'string-port-like
cpu time: 560 real time: 560 gc time: 17
'string-port
cpu time: 524 real time: 523 gc time: 1
The difference in the last case was bigger prior to v5.2, where we cut
the overhead of maintaining port state roughly in half. That
improvement is reflected in the `simple-ok?' field of a `port-like'
structure that guards the simple case. Currently, I don't see how to
improve that further.
As strings get smaller, then the cost of creating a port becomes more
important; in the case of strings, that creation involves a conversion
of the string to UTF-8.
If I shift 3 zeros from N to M in the example program (strings 1/1000th
of the old size, 1000 times as many iterations):
'bytes
cpu time: 77 real time: 78 gc time: 0
'bytes/no-inline
cpu time: 226 real time: 225 gc time: 0
'bytes-port-like
cpu time: 551 real time: 552 gc time: 21
'bytes-port
cpu time: 670 real time: 672 gc time: 21
'string
cpu time: 101 real time: 102 gc time: 0
'string/no-inline
cpu time: 258 real time: 258 gc time: 0
'string-port-like
cpu time: 759 real time: 760 gc time: 8
'string-port
cpu time: 902 real time: 904 gc time: 22
I think the `port-like' variants fare better here because the actual
implementation has a more complex structure to set up each time. I
found one minor improvement (around 10%) that's reflected above, but I
don't see other easy improvements.
----------------------------------------
#lang racket
(define N 10000)
(define bstr (bytes->immutable-bytes (make-bytes N (char->integer #\x))))
(define str (make-string N #\x))
(define M 1000)
'bytes
(time
(for ([i (in-range M)])
(for ([j (in-range N)])
(bytes-ref bstr j))))
'bytes/no-inline
(define bytes-ref/not-inlined #f)
(set! bytes-ref/not-inlined bytes-ref)
(time
(for ([i (in-range M)])
(for ([j (in-range N)])
(bytes-ref/not-inlined bstr j))))
'bytes-port-like
(struct port-like (proc [simple-ok? #:mutable]))
(define (make-port-like-bytes bstr)
(define pos 0)
(port-like
(lambda ()
(let ([b (bytes-ref bstr pos)])
(set! pos (add1 pos))
b))
#t))
(define (like-read-byte pl)
(if (port-like? pl)
(if (port-like-simple-ok? pl)
((port-like-proc pl))
(error "not implemented"))
(error "bad")))
(time
(for ([i (in-range M)])
(define p (make-port-like-bytes bstr))
(for ([j (in-range N)])
(like-read-byte p))))
'bytes-port
(time
(for ([i (in-range M)])
(define p (open-input-bytes bstr))
(for ([j (in-range N)])
(read-byte p))))
'string
(time
(for ([i (in-range M)])
(for ([j (in-range N)])
(string-ref str j))))
'string/no-inline
(define string-ref/not-inlined #f)
(set! string-ref/not-inlined string-ref)
(time
(for ([i (in-range M)])
(for ([j (in-range N)])
(string-ref/not-inlined str j))))
'string-port-like
(define (make-port-like-string str)
(define pos 0)
(define bstr (string->bytes/utf-8 str))
(port-like
(lambda ()
(let ([b (bytes-ref bstr pos)])
(if (b . > . 127)
(error "complicated case not implemented")
(begin
(set! pos (add1 pos))
(integer->char b)))))
#t))
(define (like-read-char pl)
(if (port-like? pl)
(if (port-like-simple-ok? pl)
((port-like-proc pl))
(error "not implemented"))
(error "bad")))
(time
(for ([i (in-range M)])
(define p (make-port-like-string str))
(for ([j (in-range N)])
(like-read-char p))))
'string-port
(time
(for ([i (in-range M)])
(define p (open-input-string str))
(for ([j (in-range N)])
(read-char p))))