[racket] pregexp to detect Japanese characters

From: Geoffrey S. Knauth (geoff at knauth.org)
Date: Tue Sep 16 10:13:38 EDT 2014

Previous message: [racket] pregexp to detect Japanese characters
Next message: [racket] OOPSLA/SPLASH early registration deadline approaching
Messages sorted by: [date] [thread] [subject] [author]

NEVER MIND.  I figured it out:

; contains-japanese-characters? : string -> boolean
; Returns #t if s contains at least one character in the ranges below.
(define (contains-japanese-characters? s)
  (or (regexp-match? #rx"[\u3041-\u3096]" s)  ; Hiragana
      (regexp-match? #rx"[\u30A0-\u30FF]" s)  ; Katakana (Full Width)
      (regexp-match? #rx"[\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A]" s)  ; Kanji
      (regexp-match? #rx"[\u2E80-\u2FD5]" s)  ; Kanji Radicals
      (regexp-match? #rx"[\uFF5F-\uFF9F]" s)  ; Katakana and Punctuation (Half Width)
      (regexp-match? #rx"[\u3000-\u303F]" s)  ; Japanese Symbols and Punctuation
      (regexp-match? #rx"[\u31F0-\u31FF\u3220-\u3243\u3280-\u337F]" s)  ; Misc. Japanese Symbols/Chars
      (regexp-match? #rx"[\uFF01-\uFF5E]" s)))  ; Alphanumeric and Punctuation (Full Width)
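
For anyone finding this later: the \uXXXX escapes work because the Racket reader resolves them as ordinary string escapes before the regexp is compiled, so the character class sees the actual characters. As a rough sketch (not the only way to do it), the same ranges can be merged into one character class so the string is scanned once. The names japanese-rx and contains-japanese? are made up for this illustration, and regexp-match? is used so the result is a plain boolean.

```racket
#lang racket/base

;; Hypothetical single-pattern variant: every range from the function
;; above, combined into one character class.
(define japanese-rx
  (regexp
   (string-append
    "["
    "\u3041-\u3096"                            ; Hiragana
    "\u30A0-\u30FF"                            ; Katakana (Full Width)
    "\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A"  ; Kanji
    "\u2E80-\u2FD5"                            ; Kanji Radicals
    "\uFF5F-\uFF9F"                            ; Katakana and Punctuation (Half Width)
    "\u3000-\u303F"                            ; Japanese Symbols and Punctuation
    "\u31F0-\u31FF\u3220-\u3243\u3280-\u337F"  ; Misc. Japanese Symbols/Chars
    "\uFF01-\uFF5E"                            ; Alphanumeric and Punctuation (Full Width)
    "]")))

;; contains-japanese? : string -> boolean
(define (contains-japanese? s)
  (regexp-match? japanese-rx s))

(contains-japanese? "hello")       ; => #f
(contains-japanese? "こんにちは")  ; => #t
(contains-japanese? "カタカナ")    ; => #t
```

Note that the Kanji ranges also cover characters used in Chinese, so a #t result means "contains a character from these blocks", not necessarily "is Japanese text".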

On Sep 16, 2014, at 09:22 , Geoffrey S. Knauth <geoff at knauth.org> wrote:

> I'm writing a function to detect Japanese characters in a string. I found this page:
>  
> http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
>  
> So, for example, the example Perl regexp [\x{3041}-\x{3096}] would detect Hiragana characters (as would \p{Hiragana}).  How do I express such a Unicode range with Racket regexps?
>  
> I looked at the docs below and it wasn't obvious to me how to do it.  In other languages there might be, for example, a \xnnnn or \uxxxx construct.
>  
> http://docs.racket-lang.org/reference/regexp.html#%28elem._%28rxex._30%29%29
>  
> --
> Geoffrey S. Knauth | http://knauth.org/gsk
>  
> ____________________
>  Racket Users list:
>  http://lists.racket-lang.org/users


Posted on the users mailing list.
