[racket] How write regexp for lexer
> I want to distinguish two kinds of comment. The first is a comment that
> begins at the start of a line (the first character of the line is '/', or
> whitespace followed by '/'). The second is a comment that starts with '/'
> after some other symbol (other than whitespace) has already been read.
> The JavaScript regexp for the first kind is "^//.*\n", but I don't know how
> to write '^' in the lexer in DrRacket.
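The only way I see is to use a flag: the lexer remembers whether it is
still at the beginning of a line, and the comment rule checks that flag.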
#lang racket/base

(require racket/string
         parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define-tokens my-tokens (BCOMMENT COMMENT OT))
(define-empty-tokens my-empty-tokens (EOF NEWLINE))

(define-lex-abbrevs
  [%whitespace (:or #\tab #\space #\vtab)]
  [%newline (:or #\newline (:seq #\return #\newline))]
  [%other-token "other-token"]
  ;; A comment may carry its leading whitespace, so the longest-match rule
  ;; prefers %comment over the plain whitespace rule.
  [%comment (:seq (:* %whitespace)
                  "//"
                  (:* (:& (char-complement #\return)
                          (char-complement #\newline))))])

(define my-lexer
  (let ([bol #true]) ;; beginning-of-line flag
    (lexer-src-pos
     [(:+ %whitespace)
      (return-without-pos (my-lexer input-port))]
     [%newline
      (begin
        (set! bol #true)
        ;; Skip the newline if NEWLINE tokens are not required further;
        ;; otherwise return (token-NEWLINE) instead.
        #;(token-NEWLINE)
        (return-without-pos (my-lexer input-port)))]
     [%comment
      (let ([r (if bol
                   (token-BCOMMENT lexeme)
                   (token-COMMENT lexeme))])
        (set! bol #false)
        r)]
     [%other-token
      (begin
        (set! bol #false)
        (token-OT lexeme))]
     [(eof)
      'EOF])))

(define p (open-input-string "//must be full line comment? but don't recognize
other-token // end to line comment (it's ok)
//full line comment (it's right)
  // must be full comment, but %whitespace eat space :("))
(port-count-lines! p)

(let loop ([result null])
  (define tok (my-lexer p))
  (if ((position-token-token tok) . eq? . 'EOF)
      (reverse (cons tok result))
      (loop (cons tok result))))
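
One caveat with the code above: the flag lives in a single closure, so it is
shared by every port the lexer reads and is not reset between runs. Below is
a minimal sketch of one way around that, assuming you are free to build a
lexer per input. The names make-comment-lexer and token-names are made up
for illustration; they are not part of parser-tools.

```racket
#lang racket/base
;; Sketch only: make-comment-lexer and token-names are hypothetical helpers.
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define-tokens my-tokens (BCOMMENT COMMENT OT))
(define-empty-tokens my-empty-tokens (EOF))

(define-lex-abbrevs
  [%whitespace (:or #\tab #\space #\vtab)]
  [%newline (:or #\newline (:seq #\return #\newline))]
  [%comment (:seq (:* %whitespace)
                  "//"
                  (:* (:& (char-complement #\return)
                          (char-complement #\newline))))])

;; Each call returns a lexer with its own beginning-of-line flag.
(define (make-comment-lexer)
  (define bol #t)
  (define lex
    (lexer-src-pos
     [(:+ %whitespace) (return-without-pos (lex input-port))]
     [%newline (begin (set! bol #t)
                      (return-without-pos (lex input-port)))]
     [%comment (let ([tok (if bol
                              (token-BCOMMENT lexeme)
                              (token-COMMENT lexeme))])
                 (set! bol #f)
                 tok)]
     ["other-token" (begin (set! bol #f) (token-OT lexeme))]
     [(eof) (token-EOF)]))
  lex)

;; Lex a whole string and return only the token names, without the EOF.
(define (token-names str)
  (define in (open-input-string str))
  (port-count-lines! in)
  (define lex (make-comment-lexer))
  (let loop ([acc '()])
    (define name (token-name (position-token-token (lex in))))
    (if (eq? name 'EOF)
        (reverse acc)
        (loop (cons name acc)))))

(token-names "//a\nother-token // b\n  // c")
;; => '(BCOMMENT OT COMMENT BCOMMENT)
```

Because every call to token-names builds a fresh lexer, a run that ends in
the middle of a line does not affect the next one. With the single shared
flag, a comment at the very start of a second input could be reported as
COMMENT instead of BCOMMENT.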
> Thanks for the answer, but it doesn't help. A simple example follows:
>
> #lang racket/base
>
> (require racket/string
>          parser-tools/lex
>          (prefix-in : parser-tools/lex-sre))
>
> (define-tokens my-tokens (BCOMMENT COMMENT OT))
>
> (define-empty-tokens my-empty-tokens (EOF NEWLINE))
>
> (define-lex-abbrevs
>   [%whitespace (:or #\tab #\space #\vtab)]
>   [%newline (:or #\newline (:seq #\return #\newline))]
>   [%other-token "other-token"]
>   [%bComment (:seq %newline "//" (:* (:& (char-complement #\return)
>                                          (char-complement #\newline))))]
>   [%comment (:seq "//" (:* (:& (char-complement #\return)
>                                (char-complement #\newline))))])
>
> (define my-lexer
>   (lexer-src-pos
>    [(:+ %whitespace)
>     (return-without-pos (my-lexer input-port))]
>    [%newline
>     (token-NEWLINE)]
>    [%bComment
>     (token-BCOMMENT lexeme)]
>    [%comment
>     (token-COMMENT lexeme)]
>    [%other-token (token-OT lexeme)]
>    [(eof)
>     'EOF]))
>
> (define p (open-input-string "//must be full line comment? but don't recognize
> other-token // end to line comment (it's ok)
> //full line comment (it's right)
>   // must be full comment, but %whitespace eat space :("))
> (port-count-lines! p)
>
> (let loop ([result null])
>   (define tok (my-lexer p))
>   (if ((position-token-token tok) . eq? . 'EOF)
>       (reverse (cons tok result))
>       (loop (cons tok result))))
>
>
> Output:
>
>
> (list
>  (position-token (token 'COMMENT "//must be full line comment? but don't recognize")
>                  (position 1 1 0) (position 49 1 48))
>  (position-token 'NEWLINE (position 49 1 48) (position 50 2 0))
>  (position-token (token 'OT "other-token")
>                  (position 50 2 0) (position 61 2 11))
>  (position-token (token 'COMMENT "// end to line comment (it's ok)")
>                  (position 62 2 12) (position 94 2 44))
>  (position-token (token 'BCOMMENT "\n//full line comment (it's right)")
>                  (position 94 2 44) (position 127 3 32))
>  (position-token 'NEWLINE (position 127 3 32) (position 128 4 0))
>  (position-token (token 'COMMENT "// must be full comment, but %whitespace eat space :(")
>                  (position 130 4 2) (position 183 4 55))
>  (position-token 'EOF (position 183 4 55) (position 183 4 55)))
>
> 25.09.2013, 15:10, "Evgeny Odegov" <oev-racket at sibmail.com>:
>> Valentin,
>> maybe this short example will help:
>>
>> http://pastebin.com/ncZpH49E
>>
>>> Hello. I need to write a lexer rule that recognizes a comment beginning
>>> in the first column of a line.
>>> I have the following regexp for a comment: (:seq "//" (:* (:~ CR LF))).
>>> After adding CR/LF to the beginning of the expression I wrote
>>> (:seq LineTerminator (:* (:or #\space TAB FF)) "//" (:* (:~ CR LF))),
>>> but this regexp, first, consumes the LineTerminator and, second, does not
>>> recognize a comment if it is the first thing in the file.
>>> Please help me write this rule.
>>> ____________________
>>> Racket Users list:
>>> http://lists.racket-lang.org/users
>