[racket] pregexp to detect Japanese characters
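Never mind, I figured it out. The reader processes \uXXXX escapes inside #rx literals just as it does in ordinary strings, so the Unicode ranges can be written directly: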
; Detects Japanese characters in a string.
; contains-japanese-characters? : string -> boolean
; Uses regexp-match? (not regexp-match) so the result is a boolean, as the contract says.
(define (contains-japanese-characters? s)
  (or (regexp-match? #rx"[\u3041-\u3096]" s) ; Hiragana
      (regexp-match? #rx"[\u30A0-\u30FF]" s) ; Katakana (Full Width)
      (regexp-match? #rx"[\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A]" s) ; Kanji
      (regexp-match? #rx"[\u2E80-\u2FD5]" s) ; Kanji Radicals
      (regexp-match? #rx"[\uFF5F-\uFF9F]" s) ; Katakana and Punctuation (Half Width)
      (regexp-match? #rx"[\u3000-\u303F]" s) ; Japanese Symbols and Punctuation
      (regexp-match? #rx"[\u31F0-\u31FF\u3220-\u3243\u3280-\u337F]" s) ; Misc. Japanese Symbols/Chars
      (regexp-match? #rx"[\uFF01-\uFF5E]" s))) ; Alphanumeric and Punctuation (Full Width)
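
For anyone who finds this later, here is a minimal sketch of the same idea cut down to a single range, so it can be pasted into a file and run on its own. The name contains-hiragana? is just a hypothetical helper for illustration; the range is the Hiragana one from the function above.

```racket
#lang racket/base

;; Hypothetical helper, for illustration only: checks a single range.
;; The reader turns \u3041 and \u3096 into the actual characters before
;; the regexp is compiled, so the character class is an ordinary range.
(define (contains-hiragana? s)
  (regexp-match? #rx"[\u3041-\u3096]" s))

(contains-hiragana? "こんにちは") ; => #t
(contains-hiragana? "hello")      ; => #f
(contains-hiragana? "カタカナ")    ; => #f (Katakana lies outside this range)
```

The same pattern should extend to the other ranges by adding them to the character class or by combining several checks with or.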
On Sep 16, 2014, at 09:22, Geoffrey S. Knauth <geoff at knauth.org> wrote:
> I'm writing a function to detect Japanese characters in a string. I found this page:
>
> http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
>
> So, for example, the example Perl regexp [\x{3041}-\x{3096}] would detect Hiragana characters (as would \p{Hiragana}). How do I express such a Unicode range with Racket regexps?
>
> I looked at the docs below and it wasn't obvious to me how to do it. In other languages there might be, for example, a \xnnnn or \uxxxx construct.
>
> http://docs.racket-lang.org/reference/regexp.html#%28elem._%28rxex._30%29%29
>
> --
> Geoffrey S. Knauth | http://knauth.org/gsk
>
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users