[racket] reading null-terminated byte-string?

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Thu Jan 2 15:52:29 EST 2014

I know nothing about the internals and even a compiler writer may not be able to help you with such questions w/o profiling. Throw this one in too:

#lang racket 

(module+ test
  (require rackunit)
  (define Bytes (list->bytes '(102 111 111 0 98 97 114)))
  (check-equal? (with-input-from-bytes Bytes read-nt-string) #"foo")
  (check-equal? (with-input-from-bytes #"" read-nt-string) #""))

;; -> Bytes 
; read a null-terminated string
(define (read-nt-string) 
  (list->bytes 
   (let L ()
     (define next (read-byte))
     (cond
       [(eof-object? next) '()]
       [(= 0 next) '()]
       [else (cons next (L))]))))

It is the moral Racket equivalent of the regexp specialized ("compiled") to the specific expression. 
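
A hedged side note on the regexp version quoted below: its (.*) is greedy, so on input holding several 0 bytes it would match through the last one, and a string pattern matches UTF-8 encoded characters rather than raw bytes. Here is a minimal sketch of a variant that stops at the first 0 byte, assuming a byte regexp is acceptable. The name read-nt-bytes and the optional port argument are made up for illustration; neither comes from a library.

```racket
#lang racket

;; [Input-Port] -> Bytes
;; read the bytes before the next 0 byte and consume that terminator;
;; produce #"" when the port holds no further 0 byte (as the regexp version does)
(define (read-nt-bytes [in (current-input-port)])
  ;; byte regexp: [^\0]* cannot run past a 0 byte, so the match ends at the first one
  (define m (regexp-match #rx#"^([^\0]*)\0" in))
  (if m (second m) #""))

(module+ test
  (require rackunit)
  (define in (open-input-bytes #"foo\0bar\0"))
  (check-equal? (read-nt-bytes in) #"foo")
  (check-equal? (read-nt-bytes in) #"bar")
  (check-equal? (read-nt-bytes (open-input-bytes #"")) #""))
```

One difference to keep in mind: when the port ends without a terminator, the loop above returns whatever it read before eof, while this sketch returns #"" in that case.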



On Jan 2, 2014, at 3:26 PM, David Richards <contactguitarist at gmail.com> wrote:

> Interesting. So Racket's regex processor is highly optimized? I’ve seen people use it for more complex pattern matching, but I dismissed it as too overhead-costly for the simple use-case of detecting a terminating #"\0".
> 
> Maybe I’ll exec profile these two solutions and see how badly one fares against the other. I trust yours will win because of your knowledge of Racket internals.
> 
> Thanks for the pointer to regexp-match.
> 
> :-)
> 
> dr
> 
> 
> On Jan 2, 2014, at 2:17 PM, Matthias Felleisen <matthias at ccs.neu.edu> wrote:
> 
>> 
>> 
>> 
>> No built-in function but easy to define like this: 
>> 
>> #lang racket 
>> 
>> (module+ test
>>   (require rackunit)
>>   (define Bytes (list->bytes '(102 111 111 0 98 97 114)))
>>   (check-equal? (with-input-from-bytes Bytes read-nt-string) #"foo")
>>   (check-equal? (with-input-from-bytes #"" read-nt-string) #""))
>> 
>> ;; -> Bytes 
>> ; read a null-terminated string
>> (define (read-nt-string) 
>>   (define next (regexp-match "(.*)\0" (current-input-port)))
>>   (if (boolean? next) #"" (second next)))
>> 
>> 
>> 
>> 
>> 
>> On Jan 2, 2014, at 1:22 PM, David Richards <contactguitarist at gmail.com> wrote:
>> 
>>> Hi Matthias,
>>> 
>>> Pardon my coding style:
>>> 
>>> (define Input (open-input-bytes (list->bytes '(102 111 111 0 98 97 114))))
>>> 
>>> (define (seek-byte Byte Port)
>>>   (define (_seek-byte Byte Port Pos)
>>>     (if (equal? Byte (peek-byte Port Pos))
>>>         Pos
>>>         (_seek-byte Byte Port (+ 1 Pos))))
>>>   (_seek-byte Byte Port 0))
>>> 
>>> (define (read-nt-string Port) ; read a null-terminated string
>>>   (define Length (seek-byte 0 Port))
>>>   (define Value (read-bytes Length Port))
>>>   (read-bytes 1 Port) ; consume terminator
>>>   Value)
>>> 
>>> (read-nt-string Input) ; => #"foo"
>>> 
>>> So, what built-in procedure is equivalent to “read-nt-string”?
>>> 
>>> “read-bytes-line” only permits (or/c 'linefeed 'return 'return-linefeed 'any 'any-one), not #"\0". It’s only useful for generic 7-bit ASCII text with standard line endings. Not useful at all for general byte streams with 8-bit content.
>>> 
>>> Why not just add the ability to terminate with an arbitrary byte, or even an arbitrary byte-string?
>>> 
>>> Admittedly a “line” typically ends with  (or/c 'linefeed 'return 'return-linefeed 'any 'any-one), so is there another library procedure that addresses this basic operation?
>>> 
>>> Obviously I can solve the problem as above with some ‘cobble code’, but there’s no way I’m going to address buffering, efficiently-sized block reads, vector scans, and all the other ‘inside stuff’ that is likely being done by the library procedures to optimize IO speed. Luckily I didn’t have a large data-set to process. Only about 500 MB, with no real-time demands. Otherwise all those calls to peek-byte would surely have killed me. I’d strongly prefer to use a library procedure, if it exists. And I’d love to know why it doesn’t exist, if it doesn’t exist.
>>> 
>>> Thanks.
>>> 
>>> dr
>>> 
>>> 
>>> 
>>> On Jan 2, 2014, at 9:42 AM, Matthias Felleisen <matthias at ccs.neu.edu> wrote:
>>> 
>>>> 
>>>> There are many ways to read bytes (and I assume you mean 'byte' not 'char' or 'string'). Here is how to read a complete line: 
>>>> 
>>>> Welcome to Racket v6.0.0.1.
>>>>> (read-bytes-line)
>>>> #""
>>>>> (read-bytes-line)"hello world, how is david"
>>>> #"\"hello world, how is david\""
>>>> 
>>>> 
>>>> The definitive reference is at http://docs.racket-lang.org/reference/Byte_and_String_Input.html
>>>> 
>>>> If this is not helpful, try to ask the question again. -- Matthias
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jan 1, 2014, at 12:31 PM, David Richards <contactguitarist at gmail.com> wrote:
>>>> 
>>>>> 
>>>>> How do I read a value-terminated byte-string from an input port (i.e. a null-terminated string)?
>>>>> 
>>>>> dr
>>>>> ____________________
>>>>> Racket Users list:
>>>>> http://lists.racket-lang.org/users

Posted on the users mailing list.