[racket] reading null-terminated byte-string?
I know nothing about the internals, and even a compiler writer may not be able to answer such questions without profiling. Throw this one into the comparison too:
#lang racket

(module+ test
  (require rackunit)
  (define Bytes (list->bytes '(102 111 111 0 98 97 114)))
  (check-equal? (with-input-from-bytes Bytes read-nt-string) #"foo")
  (check-equal? (with-input-from-bytes #"" read-nt-string) #""))

;; -> Bytes
;; read a null-terminated string
(define (read-nt-string)
  (list->bytes
   (let L ()
     (define next (read-byte))
     (cond
       [(eof-object? next) '()]
       [(= 0 next) '()]
       [else (cons next (L))]))))

It is the moral Racket equivalent of the regexp matcher specialized ("compiled") to this specific expression.
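
The same loop generalizes to an arbitrary terminator byte and an explicit port, which is what the original question asks about. What follows is only a sketch under stated assumptions: `read-bytes-until` is a hypothetical name, not a Racket library procedure, and it collects bytes in an output byte port rather than a list. No performance claim is intended; that would need profiling.

```racket
#lang racket

;; Hypothetical helper (not part of Racket's library):
;; read bytes from `in` up to, but not including, the first `terminator`
;; byte, and consume the terminator. If end-of-file comes first, return
;; whatever was read so far (possibly #"").
(define (read-bytes-until terminator [in (current-input-port)])
  (define out (open-output-bytes))
  (let loop ()
    (define next (read-byte in))
    (unless (or (eof-object? next) (= next terminator))
      (write-byte next out)
      (loop)))
  (get-output-bytes out))

;; Usage: two null-terminated strings read back to back from one port.
(define in (open-input-bytes (bytes 102 111 111 0 98 97 114 0)))
(read-bytes-until 0 in) ; => #"foo"
(read-bytes-until 0 in) ; => #"bar"
```

One caveat of this sketch: an empty result does not distinguish an empty string from end-of-file, so a caller that needs to tell them apart would have to check the port with `peek-byte` first.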
On Jan 2, 2014, at 3:26 PM, David Richards <contactguitarist at gmail.com> wrote:
> Interesting. So Racket's regex processor is highly optimized? I’ve seen people use it for more complex pattern matching, but I dismissed it as too costly in overhead for the simple use case of detecting a terminating #"\0".
>
> Maybe I’ll profile these two solutions and see how badly one fares against the other. I trust yours will win because of your knowledge of Racket internals.
>
> Thanks for the pointer to regexp-match.
>
> :-)
>
> dr
>
>
> On Jan 2, 2014, at 2:17 PM, Matthias Felleisen <matthias at ccs.neu.edu> wrote:
>
>>
>>
>>
>> No built-in function but easy to define like this:
>>
>> #lang racket
>>
>> (module+ test
>>   (require rackunit)
>>   (define Bytes (list->bytes '(102 111 111 0 98 97 114)))
>>   (check-equal? (with-input-from-bytes Bytes read-nt-string) #"foo")
>>   (check-equal? (with-input-from-bytes #"" read-nt-string) #""))
>>
>> ;; -> Bytes
>> ;; read a null-terminated string
>> (define (read-nt-string)
>>   ;; byte regexp; [^\0]* stops at the first NUL, whereas a greedy .* would run to the last one
>>   (define next (regexp-match #rx#"([^\0]*)\0" (current-input-port)))
>>   (if (boolean? next) #"" (second next)))
>>
>>
>>
>>
>>
>> On Jan 2, 2014, at 1:22 PM, David Richards <contactguitarist at gmail.com> wrote:
>>
>>> Hi Matthias,
>>>
>>> Pardon my coding style:
>>>
>>> (define Input (open-input-bytes (list->bytes '(102 111 111 0 98 97 114))))
>>>
>>> (define (seek-byte Byte Port)
>>>   (define (_seek-byte Byte Port Pos)
>>>     (define Next (peek-byte Port Pos))
>>>     ;; also stop at end-of-file; otherwise a missing terminator loops forever
>>>     (if (or (eof-object? Next) (equal? Byte Next))
>>>         Pos
>>>         (_seek-byte Byte Port (+ 1 Pos))))
>>>   (_seek-byte Byte Port 0))
>>>
>>> (define (read-nt-string Port) ; read a null-terminated string
>>>   (define Length (seek-byte 0 Port))
>>>   (define Value (read-bytes Length Port))
>>>   (read-bytes 1 Port) ; consume terminator
>>>   Value)
>>>
>>> (read-nt-string Input) ; => #"foo"
>>>
>>> So, what built-in procedure is equivalent to “read-nt-string”?
>>>
>>> “read-bytes-line” only permits (or/c 'linefeed 'return 'return-linefeed 'any 'any-one), not #"\0". It’s only useful for generic 7-bit ASCII text with standard line endings. Not useful at all for general byte streams with 8-bit content.
>>>
>>> Why not just add the ability to terminate with an arbitrary byte, or even an arbitrary byte-string?
>>>
>>> Admittedly a “line” typically ends with (or/c 'linefeed 'return 'return-linefeed 'any 'any-one), so is there another library procedure that addresses this basic operation?
>>>
>>> Obviously I can solve the problem as above with some ‘cobble code’, but there’s no way I’m going to address buffering, efficiently-sized block reads, vector scans, and all the other ‘inside stuff’ that is likely being done by the library procedures to optimize IO speed. Luckily I didn’t have a large data-set to process. Only about 500 MB, with no real-time demands. Otherwise all those calls to peek-byte would surely have killed me. I’d strongly prefer to use a library procedure, if it exists. And I’d love to know why it doesn’t exist, if it doesn’t exist.
>>>
>>> Thanks.
>>>
>>> dr
>>>
>>>
>>>
>>> On Jan 2, 2014, at 9:42 AM, Matthias Felleisen <matthias at ccs.neu.edu> wrote:
>>>
>>>>
>>>> There are many ways to read bytes (and I assume you mean 'byte' not 'char' or 'string'). Here is how to read a complete line:
>>>>
>>>> Welcome to Racket v6.0.0.1.
>>>>> (read-bytes-line)
>>>> #""
>>>>> (read-bytes-line)"hello world, how is david"
>>>> #"\"hello world, how is david\""
>>>>
>>>>
>>>> The definitive reference is at http://docs.racket-lang.org/reference/Byte_and_String_Input.html
>>>>
>>>> If this is not helpful, try to ask the question again. -- Matthias
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 1, 2014, at 12:31 PM, David Richards <contactguitarist at gmail.com> wrote:
>>>>
>>>>>
>>>>> How do I read a value-terminated byte-string from an input port (i.e. a null-terminated string)?
>>>>>
>>>>> dr
>>>>> ____________________
>>>>> Racket Users list:
>>>>> http://lists.racket-lang.org/users
>>>>
>>>
>>
>