[racket] How write regexp for lexer
Thanks.
26.09.2013, 06:39, "Evgeny Odegov" <oev-racket at sibmail.com>:
>> I want to distinguish two kinds of comment. The first is a comment that
>> begins at the start of a line (the first symbol of the line is '/', or
>> whitespace followed by '/'). The second is a comment that starts with '/'
>> after some other symbol (other than whitespace) has already been read.
>> The JavaScript regexp for the first kind is "^//.*\n", but I don't know
>> how to write '^' in a lexer in DrRacket.
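For comparison, Racket's ordinary regexps (outside parser-tools) do have a counterpart to the JavaScript pattern: the `(?m:...)` mode makes `^` match right after a newline as well as at the start of the input. The following is only a sketch of that behaviour; the helper name `full-line-comments` and the sample string are made up for illustration, and it does not carry over to the lexer in question, where the regexp operators listed in the parser-tools/lex-sre documentation do not include a line-start anchor.

```racket
#lang racket/base

;; Hypothetical helper (not from the thread): collect the comments that
;; start a line, optionally preceded by spaces or tabs.
(define (full-line-comments text)
  ;; (?m: ...) turns on multi-line mode, so ^ also matches after a newline.
  ;; [^\n]* stops the match at the end of the line.
  (regexp-match* #rx"(?m:^[ \t]*//[^\n]*)" text))

;; The comment after "other-token" is skipped because it does not start a line.
(full-line-comments "//a\nother-token // b\n  // c")
```

The match keeps the leading whitespace of the third line, which mirrors the "whitespace followed by '/'" case described above.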
>
> The only way I see is using a flag for it.
>
> #lang racket/base
>
> (require racket/string
>          parser-tools/lex
>          (prefix-in : parser-tools/lex-sre))
>
> (define-tokens my-tokens (BCOMMENT COMMENT OT))
>
> (define-empty-tokens my-empty-tokens (EOF NEWLINE))
>
> (define-lex-abbrevs
>   [%whitespace (:or #\tab #\space #\vtab)]
>   [%newline (:or #\newline (:seq #\return #\newline))]
>   [%other-token "other-token"]
>   [%comment (:seq (:* %whitespace)
>                   "//"
>                   (:* (:& (char-complement #\return)
>                           (char-complement #\newline))))])
>
> (define my-lexer
>   (let ([bol #true]) ;; beginning of line flag
>     (lexer-src-pos
>      [(:+ %whitespace)
>       (return-without-pos (my-lexer input-port))]
>      [%newline
>       (begin
>         (set! bol #true)
>         #;(token-NEWLINE)
>         ;; if NEWLINE tokens are not required further
>         (return-without-pos (my-lexer input-port)))]
>      [%comment
>       (let ([r (if bol
>                    (token-BCOMMENT lexeme)
>                    (token-COMMENT lexeme))])
>         (set! bol #false)
>         r)]
>      [%other-token
>       (begin
>         (set! bol #false)
>         (token-OT lexeme))]
>      [(eof)
>       'EOF])))
>
> (define p (open-input-string "//must be full line comment? but don't recognize
> other-token // end to line comment (it's ok)
> //full line comment (it's right)
>   // must be full comment, but %whitespace eat space :("))
> (port-count-lines! p)
>
> (let loop ([result null])
>   (define tok (my-lexer p))
>   (if ((position-token-token tok) . eq? . 'EOF)
>       (reverse (cons tok result))
>       (loop (cons tok result))))
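
One thing to keep in mind with the flag approach: a flag captured once by a module-level lexer is shared by every port that lexer reads, so a second input would start with whatever value the first one left behind. A minimal sketch of one way around that, assuming it is acceptable to build a fresh lexer per input, is below. The names `make-comment-lexer` and `token-kinds` are hypothetical, and plain `lexer` with pairs is used instead of `lexer-src-pos` and `define-tokens` only to keep the example short.

```racket
#lang racket/base

(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

;; Hypothetical helper: each call returns a lexer with its own
;; beginning-of-line flag, initially #t.
(define (make-comment-lexer)
  (define bol #t)
  (define lex
    (lexer
     ;; A newline means the next token starts a line.
     [(:or #\newline (:seq #\return #\newline))
      (begin (set! bol #t) (lex input-port))]
     ;; Whitespace on its own is skipped and leaves the flag alone.
     [(:+ (:or #\tab #\space)) (lex input-port)]
     ;; A comment, including any whitespace in front of it.
     [(:seq (:* (:or #\tab #\space)) "//" (:* (:~ #\return #\newline)))
      (let ([kind (if bol 'BCOMMENT 'COMMENT)])
        (set! bol #f)
        (cons kind lexeme))]
     ["other-token"
      (begin (set! bol #f) (cons 'OT lexeme))]
     [(eof) 'EOF]))
  lex)

;; Hypothetical helper: lex a string and return only the token kinds.
(define (token-kinds str)
  (define lex (make-comment-lexer))
  (define in (open-input-string str))
  (let loop ([acc '()])
    (define tok (lex in))
    (if (eq? tok 'EOF)
        (reverse acc)
        (loop (cons (car tok) acc)))))

;; Expected: '(BCOMMENT OT COMMENT BCOMMENT)
(token-kinds "//a\nother-token // b\n  // c")
```

Because the comment rule includes the leading whitespace, it wins the longest match over the bare whitespace rule, so an indented `//` at the start of a line is still reported as BCOMMENT.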
>
>> Thanks for the answer, but it doesn't help. A simple example follows:
>>
>> #lang racket/base
>>
>> (require racket/string
>>          parser-tools/lex
>>          (prefix-in : parser-tools/lex-sre))
>>
>> (define-tokens my-tokens (BCOMMENT COMMENT OT))
>>
>> (define-empty-tokens my-empty-tokens (EOF NEWLINE))
>>
>> (define-lex-abbrevs
>>   [%whitespace (:or #\tab #\space #\vtab)]
>>   [%newline (:or #\newline (:seq #\return #\newline))]
>>   [%other-token "other-token"]
>>   [%bComment (:seq %newline "//" (:* (:& (char-complement #\return)
>>                                          (char-complement #\newline))))]
>>   [%comment (:seq "//" (:* (:& (char-complement #\return)
>>                                (char-complement #\newline))))])
>>
>> (define my-lexer
>>   (lexer-src-pos
>>    [(:+ %whitespace)
>>     (return-without-pos (my-lexer input-port))]
>>    [%newline
>>     (token-NEWLINE)]
>>    [%bComment
>>     (token-BCOMMENT lexeme)]
>>    [%comment
>>     (token-COMMENT lexeme)]
>>    [%other-token (token-OT lexeme)]
>>    [(eof)
>>     'EOF]))
>>
>> (define p (open-input-string "//must be full line comment? but don't recognize
>> other-token // end to line comment (it's ok)
>> //full line comment (it's right)
>>   // must be full comment, but %whitespace eat space :("))
>> (port-count-lines! p)
>>
>> (let loop ([result null])
>>   (define tok (my-lexer p))
>>   (if ((position-token-token tok) . eq? . 'EOF)
>>       (reverse (cons tok result))
>>       (loop (cons tok result))))
>>
>> Output:
>>
>> (list
>>  (position-token (token 'COMMENT "//must be full line comment? but don't recognize")
>>                  (position 1 1 0) (position 49 1 48))
>>  (position-token 'NEWLINE (position 49 1 48) (position 50 2 0))
>>  (position-token (token 'OT "other-token") (position 50 2 0) (position 61 2 11))
>>  (position-token (token 'COMMENT "// end to line comment (it's ok)")
>>                  (position 62 2 12) (position 94 2 44))
>>  (position-token (token 'BCOMMENT "\n//full line comment (it's right)")
>>                  (position 94 2 44) (position 127 3 32))
>>  (position-token 'NEWLINE (position 127 3 32) (position 128 4 0))
>>  (position-token (token 'COMMENT "// must be full comment, but %whitespace eat space :(")
>>                  (position 130 4 2) (position 183 4 55))
>>  (position-token 'EOF (position 183 4 55) (position 183 4 55)))
>>
>> I want to distinguish two kinds of comment. The first is a comment that
>> begins at the start of a line (the first symbol of the line is '/', or
>> whitespace followed by '/'). The second is a comment that starts with '/'
>> after some other symbol (other than whitespace) has already been read.
>> The JavaScript regexp for the first kind is "^//.*\n", but I don't know
>> how to write '^' in a lexer in DrRacket.
>>
>> 25.09.2013, 15:10, "Evgeny Odegov" <oev-racket at sibmail.com>:
>>> Valentin,
>>> maybe this short example could help:
>>>
>>> http://pastebin.com/ncZpH49E
>>>> Hello. I need to write a lexer rule that recognizes a comment which
>>>> begins in the first column of a line.
>>>> I have the following regexp for a comment: (:seq "//" (:* (:~ CR LF))).
>>>> After adding CR/LF to the beginning of the expression I wrote
>>>> (:seq LineTerminator (:* (:or #\space TAB FF)) "//" (:* (:~ CR LF))),
>>>> but this regexp, first, consumes the LineTerminator and, second, does
>>>> not recognize a comment if it is the first thing in the file.
>>>> Please help me write this rule.
>>>> ____________________
>>>> Racket Users list:
>>>> http://lists.racket-lang.org/users