[racket] A simple lexer question

From: Danny Yoo (dyoo at hashcollision.org)
Date: Fri Jun 22 12:33:54 EDT 2012

On Wed, Jun 20, 2012 at 1:26 AM, Gregory Woodhouse <gregwoodhouse at me.com> wrote:
> I want to write a rule that will recognize strings in a language (MUMPS) that doubles double quotes as a means of escaping them. For example, "The double quote symbol is \"." would be "The double quote symbol is ""." and "\"" would be """". That seems simple enough, except that I need to write a regular expression that matches any printing character (including #\space and #\tab) except, of course, #\". There is the complement operator, but that gives me any character but #\", which is not quite what I want. With a set difference, I suppose I could do something like
>
> DQUOTE (DQUOTE DQUOTE | printing - DQUOTE)* DQUOTE
>
> but again, I'm not quite sure how to express this in the lexer.


Perhaps we can use the character set complement operator.  Let's see...

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#lang racket

(require parser-tools/lex)

(define my-lexer
  (lexer [(concatenation
           "\""
           (repetition 0 +inf.0 (union (char-complement #\")
                                       "\"\""))
           "\"")
          lexeme]))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


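Would this work?  Here's how it behaves on a few examples.  Note that in the last one the lone (undoubled) quote ends the string, so only the prefix up to that quote is matched: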

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (my-lexer (open-input-string "\"hello world\""))
"\"hello world\""
> (my-lexer (open-input-string "\"hello \"\"world\""))
"\"hello \"\"world\""
> (my-lexer (open-input-string "\"hello \"world\""))
"\"hello \""
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
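One caveat: char-complement accepts every character other than #\", including newlines and control characters, which is broader than the "printing" set asked for. A minimal sketch of the set-difference idea follows, assuming that "printing" means the characters matched by the lexer's built-in graphic abbreviation plus #\space and #\tab (that definition is my assumption, as are the names printing and mumps-string-lexer). The difference "printing - DQUOTE" is written as an intersection with the complement:

```racket
#lang racket

(require parser-tools/lex)

;; Assumed definition of "printing": graphic characters plus space and tab.
(define-lex-abbrev printing (union graphic #\space #\tab))

;; DQUOTE (DQUOTE DQUOTE | printing - DQUOTE)* DQUOTE
(define mumps-string-lexer
  (lexer [(concatenation
           "\""
           (repetition 0 +inf.0
                       (union (intersection printing (char-complement #\"))
                              "\"\""))
           "\"")
          lexeme]))

;; (mumps-string-lexer (open-input-string "\"say \"\"hi\"\"\""))
;; returns "\"say \"\"hi\"\"\""
```

The parser-tools/lex-sre library also provides a set-difference operator, which may read closer to the grammar above if you prefer that notation.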

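If the goal is the string's value rather than its source text, the action can decode the lexeme instead of returning it unchanged. Here is a rough sketch, assuming the same rule as my-lexer; decode-mumps-string and string-value-lexer are hypothetical names introduced only for illustration:

```racket
#lang racket

(require parser-tools/lex)

;; Hypothetical helper: drop the surrounding quotes, then collapse
;; each doubled quote "" into a single ".
(define (decode-mumps-string lexeme)
  (string-replace (substring lexeme 1 (sub1 (string-length lexeme)))
                  "\"\""
                  "\""))

;; Same pattern as my-lexer, but the action returns the decoded value.
(define string-value-lexer
  (lexer [(concatenation
           "\""
           (repetition 0 +inf.0 (union (char-complement #\")
                                       "\"\""))
           "\"")
          (decode-mumps-string lexeme)]))

;; (string-value-lexer (open-input-string "\"hello \"\"world\""))
;; returns "hello \"world"
```

Doing the decoding in the action keeps the regular expression itself unchanged, so the two concerns can be adjusted separately.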

Posted on the users mailing list.