[racket] How write regexp for lexer

From: Evgeny Odegov (oev-racket at sibmail.com)
Date: Wed Sep 25 22:39:50 EDT 2013

> I want to distinguish two kinds of comment. The first is a comment that
> begins at the start of a line (the first character of the line is '/', or
> whitespace followed by '/'). The second is a comment that starts with '/'
> after some other character (other than whitespace) has already been read.
> The JavaScript regexp for the first kind is "^//.*\n", but I don't know how
> to write '^' in a lexer in DrRacket.

The only way I see is to use a flag that records whether the lexer is
still at the beginning of a line.


#lang racket/base

(require racket/string
         parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define-tokens my-tokens (BCOMMENT COMMENT OT))

(define-empty-tokens my-empty-tokens (EOF NEWLINE))

(define-lex-abbrevs
  [%whitespace (:or #\tab #\space #\vtab)]
  [%newline (:or #\newline (:seq #\return #\newline))]
  [%other-token  "other-token"]
  [%comment (:seq (:* %whitespace)
                  "//"
                  (:* (:& (char-complement #\return)
                          (char-complement #\newline))))])

(define my-lexer
  (let ([bol #true]) ;; beginning of line flag
    (lexer-src-pos
     [(:+ %whitespace)
      (return-without-pos (my-lexer input-port))]
     [%newline
      (begin
        (set! bol #true)
        #;(token-NEWLINE)
        ;; if NEWLINE tokens are not required further
        (return-without-pos (my-lexer input-port)))]
     [%comment
      (let ([r (if bol
                   (token-BCOMMENT lexeme)
                   (token-COMMENT lexeme))])
        (set! bol #false)
        r)]
     [%other-token
      (begin
        (set! bol #false)
        (token-OT lexeme))]
     [(eof)
      'EOF])))

(define p (open-input-string "//must be full line comment? but don't recognize
other-token // end to line comment (it's ok)
//full line comment (it's right)
  // must be full comment, but %whitespace eat space :("))
(port-count-lines! p)

(let loop ([result null])
  (define tok (my-lexer p))
  (if ((position-token-token tok) . eq? . 'EOF)
      (reverse (cons tok result))
      (loop (cons tok result))))
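
A possible variation on the same idea, given only as a sketch: because
%comment already swallows leading whitespace, the column of start-pos (which
lexer-src-pos binds inside each action) can stand in for the flag. This
assumes port-count-lines! has been called on the port; otherwise the column
is #f and every comment is reported as COMMENT. The names column-lexer,
token-names, comment-tokens and end-tokens are made up for this example.

```racket
#lang racket/base

(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

;; Hypothetical token groups, mirroring the ones in the example above.
(define-tokens comment-tokens (BCOMMENT COMMENT OT))
(define-empty-tokens end-tokens (EOF))

(define-lex-abbrevs
  [%ws (:or #\tab #\space #\vtab)]
  [%nl (:or #\newline (:seq #\return #\newline))]
  ;; Leading whitespace is part of the comment lexeme, so the longest-match
  ;; rule prefers %comment over the plain whitespace rule.
  [%comment (:seq (:* %ws)
                  "//"
                  (:* (:& (char-complement #\return)
                          (char-complement #\newline))))])

(define column-lexer
  (lexer-src-pos
   [(:+ %ws) (return-without-pos (column-lexer input-port))]
   [%nl (return-without-pos (column-lexer input-port))]
   [%comment
    ;; Column 0 means nothing but (possibly) whitespace precedes "//".
    (if (eqv? 0 (position-col start-pos))
        (token-BCOMMENT lexeme)
        (token-COMMENT lexeme))]
   ["other-token" (token-OT lexeme)]
   [(eof) (token-EOF)]))

;; Helper for experimenting: lex a string and collect the token names.
(define (token-names str)
  (define in (open-input-string str))
  (port-count-lines! in) ; required, or position-col returns #f
  (let loop ([acc '()])
    (define name (token-name (position-token-token (column-lexer in))))
    (if (eq? name 'EOF)
        (reverse (cons name acc))
        (loop (cons name acc)))))
```

For example, (token-names "//a\nother-token //b\n  //c") can be used to see
how each comment is classified. The trade-off is that this version keeps no
mutable state, but it depends on line counting being enabled on the port.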



> Thanks for the answer, but it doesn't help. A simple example follows:
>
> #lang racket/base
>
> (require racket/string
>          parser-tools/lex
>          (prefix-in : parser-tools/lex-sre))
>
> (define-tokens my-tokens (BCOMMENT COMMENT OT))
>
> (define-empty-tokens my-empty-tokens (EOF NEWLINE))
>
> (define-lex-abbrevs
>   [%whitespace (:or #\tab #\space #\vtab)]
>   [%newline (:or #\newline (:seq #\return #\newline))]
>   [%other-token  "other-token"]
>   [%bComment (:seq %newline "//" (:* (:& (char-complement #\return)
>                                          (char-complement #\newline))))]
>   [%comment (:seq "//" (:* (:& (char-complement #\return)
>                                (char-complement #\newline))))])
>
> (define my-lexer
>   (lexer-src-pos
>    [(:+ %whitespace)
>     (return-without-pos (my-lexer input-port))]
>    [%newline
>     (token-NEWLINE)]
>    [%bComment
>     (token-BCOMMENT lexeme)]
>    [%comment
>     (token-COMMENT lexeme)]
>    [%other-token (token-OT lexeme)]
>    [(eof)
>     'EOF]))
>
> (define p (open-input-string "//must be full line comment? but don't recognize
> other-token // end to line comment (it's ok)
> //full line comment (it's right)
>   // must be full comment, but %whitespace eat space :("))
> (port-count-lines! p)
>
> (let loop ([result null])
>   (define tok (my-lexer p))
>   (if ((position-token-token tok) . eq? . 'EOF)
>       (reverse (cons tok result))
>       (loop (cons tok result))))
>
>
> Output:
>
>
> (list
>  (position-token
>   (token 'COMMENT "//must be full line comment? but don't recognize")
>   (position 1 1 0) (position 49 1 48))
>  (position-token 'NEWLINE (position 49 1 48) (position 50 2 0))
>  (position-token
>   (token 'OT "other-token")
>   (position 50 2 0) (position 61 2 11))
>  (position-token
>   (token 'COMMENT "// end to line comment (it's ok)")
>   (position 62 2 12) (position 94 2 44))
>  (position-token
>   (token 'BCOMMENT "\n//full line comment (it's right)")
>   (position 94 2 44) (position 127 3 32))
>  (position-token 'NEWLINE (position 127 3 32) (position 128 4 0))
>  (position-token
>   (token 'COMMENT "// must be full comment, but %whitespace eat space :(")
>   (position 130 4 2) (position 183 4 55))
>  (position-token 'EOF (position 183 4 55) (position 183 4 55)))
>>
> I want to distinguish two kinds of comment. The first is a comment that
> begins at the start of a line (the first character of the line is '/', or
> whitespace followed by '/'). The second is a comment that starts with '/'
> after some other character (other than whitespace) has already been read.
> The JavaScript regexp for the first kind is "^//.*\n", but I don't know how
> to write '^' in a lexer in DrRacket.
>
> 25.09.2013, 15:10, "Evgeny Odegov" <oev-racket at sibmail.com>:
>> Валентин,
>> maybe this short example could help
>>
>> http://pastebin.com/ncZpH49E
>>
>>>  Hello. I need to write a lexer rule that recognizes a comment beginning
>>>  in the first column of a line.
>>>  My regexp for a comment is (:seq "//" (:* (:~ CR LF))). After adding
>>>  CR/LF to the beginning of the expression, I wrote (:seq LineTerminator
>>>  (:* (:or #\space TAB FF)) "//" (:* (:~ CR LF))), but this regexp first
>>>  consumes the LineTerminator and second does not recognize a comment
>>>  when it is the first thing in the file.
>>>  Please help me write this rule.
>>>  ____________________
>>>    Racket Users list:
>>>    http://lists.racket-lang.org/users
>



Posted on the users mailing list.