[racket] How to write a regexp for the lexer

From: Бомбин Валентин (wwall at yandex.ru)
Date: Thu Sep 26 03:07:36 EDT 2013

Thanks. 

26.09.2013, 06:39, "Evgeny Odegov" <oev-racket at sibmail.com>:
>>  I want to distinguish two kinds of comments. The first is a comment that
>>  begins at the start of a line (the first character of the line is '/', or
>>  the line starts with whitespace followed by '/'). The second is a comment
>>  that starts with '/' after some other, non-whitespace character has
>>  already been read on that line.
>>  The JavaScript regexp for it is "^//.*\n", but I don't know how to write
>>  '^' in a lexer in DrRacket.
>
> The only way I can see is to use a flag for it: remember whether anything
> other than whitespace has been read since the last newline.
>
> #lang racket/base
>
> (require racket/string
>          parser-tools/lex
>          (prefix-in : parser-tools/lex-sre))
>
> (define-tokens my-tokens (BCOMMENT COMMENT OT))
>
> (define-empty-tokens my-empty-tokens (EOF NEWLINE))
>
> (define-lex-abbrevs
>   [%whitespace (:or #\tab #\space #\vtab)]
>   [%newline (:or #\newline (:seq #\return #\newline))]
>   [%other-token  "other-token"]
>   [%comment (:seq (:* %whitespace)
>                   "//"
>                   (:* (:& (char-complement #\return)
>                           (char-complement #\newline))))])
>
> (define my-lexer
>   (let ([bol #true]) ;; beginning of line flag
>     (lexer-src-pos
>      [(:+ %whitespace)
>       (return-without-pos (my-lexer input-port))]
>      [%newline
>       (begin
>         (set! bol #true)
>         #;(token-NEWLINE)
>         ;; if NEWLINE tokens are not required further:
>         (return-without-pos (my-lexer input-port)))]
>      [%comment
>       (let ([r (if bol
>                    (token-BCOMMENT lexeme)
>                    (token-COMMENT lexeme))])
>         (set! bol #false)
>         r)]
>      [%other-token
>       (begin
>         (set! bol #false)
>         (token-OT lexeme))]
>      [(eof)
>       'EOF])))
>
> (define p (open-input-string "//must be full line comment? but don't recognize
> other-token // end to line comment (it's ok)
> //full line comment (it's right)
>   // must be full comment, but %whitespace eat space :("))
> (port-count-lines! p)
>
> (let loop ([result null])
>   (define tok (my-lexer p))
>   (if ((position-token-token tok) . eq? . 'EOF)
>       (reverse (cons tok result))
>       (loop (cons tok result))))
>
>>  Thanks for the answer, but it doesn't help. Here is a simple example:
>>
>>  #lang racket/base
>>
>>  (require racket/string
>>           parser-tools/lex
>>           (prefix-in : parser-tools/lex-sre))
>>
>>  (define-tokens my-tokens (BCOMMENT COMMENT OT))
>>
>>  (define-empty-tokens my-empty-tokens (EOF NEWLINE))
>>
>>  (define-lex-abbrevs
>>    [%whitespace (:or #\tab #\space #\vtab)]
>>    [%newline (:or #\newline (:seq #\return #\newline))]
>>    [%other-token  "other-token"]
>>    [%bComment (:seq %newline "//" (:* (:& (char-complement #\return)
>>                                           (char-complement #\newline))))]
>>    [%comment (:seq "//" (:* (:& (char-complement #\return)
>>                                 (char-complement #\newline))))])
>>
>>  (define my-lexer
>>    (lexer-src-pos
>>     [(:+ %whitespace)
>>      (return-without-pos (my-lexer input-port))]
>>     [%newline
>>      (token-NEWLINE)]
>>     [%bComment
>>      (token-BCOMMENT lexeme)]
>>     [%comment
>>      (token-COMMENT lexeme)]
>>     [%other-token (token-OT lexeme)]
>>     [(eof)
>>      'EOF]))
>>
>>  (define p (open-input-string "//must be full line comment? but don't recognize
>>  other-token // end to line comment (it's ok)
>>  //full line comment (it's right)
>>    // must be full comment, but %whitespace eat space :("))
>>  (port-count-lines! p)
>>
>>  (let loop ([result null])
>>    (define tok (my-lexer p))
>>    (if ((position-token-token tok) . eq? . 'EOF)
>>        (reverse (cons tok result))
>>        (loop (cons tok result))))
>>
>>  Output:
>>
>>  (list
>>   (position-token (token 'COMMENT "//must be full line comment? but don't recognize")
>>                   (position 1 1 0) (position 49 1 48))
>>   (position-token 'NEWLINE (position 49 1 48) (position 50 2 0))
>>   (position-token (token 'OT "other-token")
>>                   (position 50 2 0) (position 61 2 11))
>>   (position-token (token 'COMMENT "// end to line comment (it's ok)")
>>                   (position 62 2 12) (position 94 2 44))
>>   (position-token (token 'BCOMMENT "\n//full line comment (it's right)")
>>                   (position 94 2 44) (position 127 3 32))
>>   (position-token 'NEWLINE (position 127 3 32) (position 128 4 0))
>>   (position-token (token 'COMMENT "// must be full comment, but %whitespace eat space :(")
>>                   (position 130 4 2) (position 183 4 55))
>>   (position-token 'EOF (position 183 4 55) (position 183 4 55)))
>>
>>  I want to distinguish two kinds of comments. The first is a comment that
>>  begins at the start of a line (the first character of the line is '/', or
>>  the line starts with whitespace followed by '/'). The second is a comment
>>  that starts with '/' after some other, non-whitespace character has
>>  already been read on that line.
>>  The JavaScript regexp for it is "^//.*\n", but I don't know how to write
>>  '^' in a lexer in DrRacket.
>>
>>  25.09.2013, 15:10, "Evgeny Odegov" <oev-racket at sibmail.com>:
>>>  Валентин,
>>>  maybe this short example can help:
>>>
>>>  http://pastebin.com/ncZpH49E
>>>>   Hello. I need to write a lexer rule that recognizes a comment beginning
>>>>   in the first column of a line.
>>>>   I have the following regexp for a comment: (:seq "//" (:* (:~ CR LF))).
>>>>   After adding CR/LF to the beginning of the expression, I wrote
>>>>   (:seq LineTerminator (:* (:or #\space TAB FF)) "//" (:* (:~ CR LF))),
>>>>   but this regexp, first, consumes the LineTerminator and, second, does
>>>>   not recognize the comment if it is the first thing in the file.
>>>>   Please help me write this rule.
>>>>   ____________________
>>>>     Racket Users list:
>>>>     http://lists.racket-lang.org/users
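
A possible variation on the flag approach in Evgeny's reply, offered as a sketch rather than a tested recipe: lexer actions in parser-tools can refer to `start-pos`, the position of the first matched character, so once `port-count-lines!` is enabled a comment rule can check whether `(position-col start-pos)` is 0 instead of keeping a mutable `bol` flag. Because the `%comment` abbreviation also swallows leading blanks, and the lexer prefers the longest match, an indented comment at the start of a line still begins in column 0. The names `col-lexer` and `lex-all` are hypothetical, chosen for this illustration.

```racket
#lang racket/base
;; Sketch: classify comments by the column where the match starts,
;; instead of a mutable beginning-of-line flag.
;; col-lexer and lex-all are hypothetical, illustrative names.
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define-tokens my-tokens (BCOMMENT COMMENT OT))
(define-empty-tokens my-empty-tokens (EOF))

(define-lex-abbrevs
  [%whitespace (:or #\tab #\space #\vtab)]
  [%newline (:or #\newline (:seq #\return #\newline))]
  [%other-token "other-token"]
  ;; Leading blanks belong to the comment, so "  // x" at the start of a
  ;; line is matched from column 0 (longest match beats the whitespace rule).
  [%comment (:seq (:* %whitespace) "//" (:* (:~ #\return #\newline)))])

(define col-lexer
  (lexer-src-pos
   [(:+ %whitespace) (return-without-pos (col-lexer input-port))]
   [%newline (return-without-pos (col-lexer input-port))]
   [%comment
    ;; start-pos is the position of the first matched character; its column
    ;; is 0 only when the match begins a line (requires port-count-lines!).
    (if (eqv? (position-col start-pos) 0)
        (token-BCOMMENT lexeme)
        (token-COMMENT lexeme))]
   [%other-token (token-OT lexeme)]
   [(eof) (token-EOF)]))

;; Lex a whole string into (name value) pairs, dropping the positions.
(define (lex-all str)
  (define p (open-input-string str))
  (port-count-lines! p)
  (let loop ([acc '()])
    (define tok (position-token-token (col-lexer p)))
    (if (eq? tok 'EOF)
        (reverse acc)
        (loop (cons (list (token-name tok) (token-value tok)) acc)))))

(lex-all "// a\nother-token // b\n  // c")
;; => '((BCOMMENT "// a") (OT "other-token") (COMMENT " // b") (BCOMMENT "  // c"))
```

Without `port-count-lines!` the column is `#f`, so every comment would come out as COMMENT; the flag version in the thread does not have that dependency.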

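For comparison with the JavaScript pattern "^//.*\n" from the question: Racket's own regexp values do support a line anchor through multi mode, `(?m:...)`, in which `^` can also match right after a newline. The parser-tools SRE notation has no such anchor, which is why the lexer needs a flag or a position check, but for a quick scan outside the lexer a plain regexp may be enough. A small sketch, where `full-line-comments` is a hypothetical helper name:

```racket
#lang racket/base
;; Sketch: find full-line comments with a Racket regexp in multi mode.
;; (?m: ...) lets ^ match at the start of the string or just after "\n";
;; [ \t]* allows indentation, [^\n]* takes the rest of the line
;; (a trailing \r from CRLF input would stay in the match).
;; full-line-comments is a hypothetical, illustrative name.
(define (full-line-comments str)
  (regexp-match* #px"(?m:^[ \t]*//[^\n]*)" str))

(full-line-comments "// a\nother-token // b\n  // c")
;; => '("// a" "  // c")
```
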
Posted on the users mailing list.