[plt-scheme] Lexer capture precedence?

From: Stephen De Gabrielle (spdegabrielle at gmail.com)
Date: Tue Nov 4 10:37:12 EST 2008

Another dumb question,

I'm trying to use the lexer to recognise some basic bits of XML, and
I'm having trouble working out what I'm doing wrong.

It seems to greedily capture more than I want it to. Is there a way to
cut the greediness down?

E.g. I want it to find the token "-->" in the string
"kjas kdflkasjfd afkj a-->flkjjasf as", but it grabs
"kjas kdflkasjfd afkj a--" instead of capturing "-->" as a token.

Cheers,

Stephen


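;; Requires assumed for PLT Scheme 4.x; they were not shown in the original post.
(require parser-tools/lex
         (prefix-in re: parser-tools/lex-sre))

(define-lex-abbrevs
  (CR #\015)
  (LF #\012)
  (LineTerminator (re:or CR LF (re:: CR LF)))
  (FF #\014)
  (TAB #\011)
  (WhiteSpace (re:or #\space TAB FF LineTerminator))
  (Tag-set (re:or "<!--" "-->" "</" "/>" "<" ">")))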

;; Print the token fields for debugging, then return them as five values.
(define (syn-val lex a b c d)
  (newline)
  (display (list lex a b (position-offset c) (position-offset d)))
  (newline)
  (values lex a b (position-offset c) (position-offset d)))
(define get-syntax-token
  (lexer
   ((re:+ WhiteSpace)
    (syn-val lexeme 'white-space #f start-pos end-pos))
   ;; Intended to match everything that is not a tag delimiter.
   ((complement Tag-set)
    (syn-val lexeme 'no-color #f start-pos end-pos))
   (Tag-set
    (syn-val lexeme 'keyword (string->symbol lexeme) start-pos end-pos))
   ((eof)
    (syn-val lexeme 'no-color #f start-pos end-pos))))


Posted on the users mailing list.