[racket] How to write a regexp for a lexer
Thanks for the answer, but it doesn't solve my problem. Here is a simple example:
#lang racket/base
(require racket/string
         parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define-tokens my-tokens (BCOMMENT COMMENT OT))
(define-empty-tokens my-empty-tokens (EOF NEWLINE))

(define-lex-abbrevs
  [%whitespace (:or #\tab #\space #\vtab)]
  [%newline (:or #\newline (:seq #\return #\newline))]
  [%other-token "other-token"]
  ;; %bComment includes the preceding newline, so it swallows that newline
  ;; and can never match a comment on the first line of the input
  [%bComment (:seq %newline "//" (:* (:& (char-complement #\return)
                                           (char-complement #\newline))))]
  [%comment (:seq "//" (:* (:& (char-complement #\return)
                                (char-complement #\newline))))])

(define my-lexer
  (lexer-src-pos
   ;; skip whitespace and lex the next token
   [(:+ %whitespace)
    (return-without-pos (my-lexer input-port))]
   [%newline
    (token-NEWLINE)]
   [%bComment
    (token-BCOMMENT lexeme)]
   [%comment
    (token-COMMENT lexeme)]
   [%other-token (token-OT lexeme)]
   [(eof)
    'EOF]))

(define p (open-input-string "//must be full line comment? but don't recognize
other-token // end to line comment (it's ok)
//full line comment (it's right)
  // must be full comment, but %whitespace eat space :("))
(port-count-lines! p)   ; needed so positions carry line/column numbers
(let loop ([result null])
  (define tok (my-lexer p))
  (if ((position-token-token tok) . eq? . 'EOF)
      (reverse (cons tok result))
      (loop (cons tok result))))
Output:
(list
(position-token (token 'COMMENT "//must be full line comment? but don't recognize") (position 1 1 0) (position 49 1 48))
(position-token 'NEWLINE (position 49 1 48) (position 50 2 0))
(position-token (token 'OT "other-token") (position 50 2 0) (position 61 2 11))
(position-token (token 'COMMENT "// end to line comment (it's ok)") (position 62 2 12) (position 94 2 44))
(position-token (token 'BCOMMENT "\n//full line comment (it's right)") (position 94 2 44) (position 127 3 32))
(position-token 'NEWLINE (position 127 3 32) (position 128 4 0))
(position-token (token 'COMMENT "// must be full comment, but %whitespace eat space :(") (position 130 4 2) (position 183 4 55))
(position-token 'EOF (position 183 4 55) (position 183 4 55)))
I want to distinguish two kinds of comment. The first is a comment that begins at the start of a line (the first character on the line is '/', or only whitespace comes before the '/'). The second is a comment that starts with '/' after some other non-whitespace characters have already been read on that line.
In JavaScript the regexp for the first kind is "^//.*\n", but I don't know how to write '^' in a lexer in DrRacket.
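As far as I know, the regular expressions in parser-tools/lex have no anchor equivalent to '^', so one way to get the same effect is to keep the "beginning of line" information as lexer state instead of putting the newline into the pattern. Below is a minimal sketch under that assumption: a single comment pattern, plus a flag that is true at the start of input and after every newline, is left unchanged by skipped whitespace, and is cleared by any other token. The comment action reads the flag to choose BCOMMENT or COMMENT, so the newline stays a separate NEWLINE token and a comment on the first line is also recognized. The names make-lexer and lex-all are hypothetical helpers, not part of parser-tools.

```racket
#lang racket/base
;; Sketch (assumption): emulate "^" by remembering whether only whitespace
;; has been read since the last newline or since the start of the input.
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define-tokens my-tokens (BCOMMENT COMMENT OT))
(define-empty-tokens my-empty-tokens (EOF NEWLINE))

(define-lex-abbrevs
  [%whitespace (:or #\tab #\space #\vtab)]
  [%newline (:or #\newline (:seq #\return #\newline))]
  [%other-token "other-token"]
  ;; one comment pattern; the action decides BCOMMENT vs COMMENT
  [%comment (:seq "//" (:* (:~ #\return #\newline)))])

;; make-lexer (hypothetical name): returns a lexer with its own flag,
;; so every new input gets fresh line-start state.
(define (make-lexer)
  (define at-line-start? #t) ; start of input counts as a line start
  (define lex
    (lexer-src-pos
     [(:+ %whitespace)
      ;; skipping whitespace leaves the flag unchanged
      (return-without-pos (lex input-port))]
     [%newline
      ;; the newline is its own token, not part of a comment
      (begin (set! at-line-start? #t)
             (token-NEWLINE))]
     [%comment
      (let ([full-line? at-line-start?])
        (set! at-line-start? #f)
        (if full-line?
            (token-BCOMMENT lexeme)
            (token-COMMENT lexeme)))]
     [%other-token
      (begin (set! at-line-start? #f)
             (token-OT lexeme))]
     [(eof) 'EOF]))
  lex)

;; lex-all (hypothetical name): list of token names for a whole string
(define (lex-all str)
  (define p (open-input-string str))
  (port-count-lines! p)
  (define lex (make-lexer))
  (let loop ([acc '()])
    (define tok (position-token-token (lex p)))
    (if (eq? tok 'EOF)
        (reverse acc)
        (loop (cons (token-name tok) acc)))))
```

For example, (lex-all "//a\nother-token // b\n  // c") should give '(BCOMMENT NEWLINE OT COMMENT NEWLINE BCOMMENT): the indented comment on the last line is still a full-line comment because only whitespace precedes it.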
25.09.2013, 15:10, "Evgeny Odegov" <oev-racket at sibmail.com>:
> Valentin,
> maybe this short example will help:
>
> http://pastebin.com/ncZpH49E
>
>> Hello. I need to write a lexer rule that recognizes a comment beginning
>> in the first column of a line.
>> I have the following regexp for a comment: (:seq "//" (:* (:~ CR LF))).
>> After adding CR/LF to the beginning of the expression, I wrote
>> (:seq LineTerminator (:* (:or #\space TAB FF)) "//" (:* (:~ CR LF))),
>> but this regexp, first, consumes the LineTerminator and, second, does not
>> recognize the comment if it is the first thing in the file.
>> Please help me write this rule.
>> ____________________
>> Racket Users list:
>> http://lists.racket-lang.org/users
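For the narrower rule in the quoted question (a comment that starts in the very first column), another option is to check the column of start-pos inside the action rather than putting the line terminator into the regexp. Actions in lexer-src-pos receive start-pos, and with port-count-lines! enabled, position-col is 0 at the start of every line, including the first line of the file, so nothing ahead of the comment is consumed. This is a minimal sketch under those assumptions; col-lexer and lex-names are hypothetical names. Note that a comment preceded by spaces starts in a later column and is classified as COMMENT here, so if leading whitespace must be allowed, the column test alone is not enough.

```racket
#lang racket/base
;; Sketch (assumption): classify "//" comments by the column where they start,
;; using the start-pos position that lexer-src-pos actions receive.
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define-tokens col-tokens (BCOMMENT COMMENT OT))
(define-empty-tokens col-empty-tokens (EOF NEWLINE))

;; col-lexer (hypothetical name)
(define col-lexer
  (lexer-src-pos
   [(:+ (:or #\tab #\space))
    (return-without-pos (col-lexer input-port))]
   [(:or #\newline (:seq #\return #\newline))
    (token-NEWLINE)]
   [(:seq "//" (:* (:~ #\return #\newline)))
    ;; column 0 means the comment starts the line (requires port-count-lines!)
    (if (eqv? (position-col start-pos) 0)
        (token-BCOMMENT lexeme)
        (token-COMMENT lexeme))]
   ["other-token" (token-OT lexeme)]
   [(eof) 'EOF]))

;; lex-names (hypothetical name): list of token names for a whole string
(define (lex-names str)
  (define p (open-input-string str))
  (port-count-lines! p) ; without this, position-col is #f
  (let loop ([acc '()])
    (define tok (position-token-token (col-lexer p)))
    (if (eq? tok 'EOF)
        (reverse acc)
        (loop (cons (token-name tok) acc)))))
```

With this version, (lex-names "//a\nother-token // b\n  // c") should give '(BCOMMENT NEWLINE OT COMMENT NEWLINE COMMENT), since the last comment starts in column 2.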