[plt-scheme] Seeing some weird endianness issues on Solaris x86 platform
Hi everyone,
I just got a shiny new Sun Ultra 20 workstation. Unfortunately, since
it's running Solaris, it's useless because it doesn't have PLT Scheme
installed on it. I'm trying to fix that. *grin*
The system is an x86 Opteron system running Solaris 10. I'm running into
what looks like an encoding issue with the functions that go between paths
and strings:
;;;;;;
> (path->string (string->path "hello"))
"\U3F000000\U3F000000\U3F000000\U3F000000\U3F000000"
> (path->bytes (bytes->path (string->bytes/utf-8 "hello")))
#"hello"
;;;;;;
The byte in the high end of each code point is #x3F, which is ASCII "?",
so each character looks like a byte-swapped "?", but there's something
bizarre going on here.
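As a rough illustration of why this smells like a byte-order problem, here
is a minimal C sketch. It is not taken from the mzscheme sources; swap32 and
looks_like_swapped_ascii are hypothetical helpers made up for this example.
Reversing the bytes of #x3F000000 gives #x3F, which is ASCII "?":

```c
#include <assert.h>   /* only needed for the sanity checks mentioned below */
#include <stdint.h>

/* Hypothetical helper (not part of mzscheme): reverse the byte order
   of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Hypothetical helper: returns 1 if v is outside the Unicode range
   (above 0x10FFFF) but its byte-swapped form is plain 7-bit ASCII. */
static int looks_like_swapped_ascii(uint32_t v)
{
    return v > 0x10FFFFu && swap32(v) < 0x80u;
}
```

With these, assert(swap32(0x3F000000u) == '?') holds, and
looks_like_swapped_ascii returns 1 for the code points printed above but 0
for an ordinary 'h' (0x68).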
If I just munge up current-locale to #f, then everything is happy:
######
> (current-locale #f)
> (path->string (string->path "hello"))
"hello"
######
I've been reading the mzscheme source code, and I think that it has
something to do with the way locales are handled on my system, though I
haven't been able to pinpoint it yet. If I kludge file.c
slightly:
Index: src/file.c
===================================================================
--- src/file.c (revision 1343)
+++ src/file.c (working copy)
@@ -598,7 +598,7 @@
}
#endif
- s = scheme_byte_string_to_char_string_locale(p);
+ s = scheme_byte_string_to_char_string(p);
if (!SCHEME_CHAR_STRLEN_VAL(s))
return scheme_make_utf8_string("?");
then things work a little better --- I get the good result from the
experiment with string->path + path->string --- but I know this is the
wrong way to fix this.
My current locale is set to the system default:
######
> (current-locale)
""
######
So I think that there must be some asymmetry between the treatment of
scheme_byte_string_to_char_string_locale and
scheme_char_string_to_byte_string_locale on my system, but my brain is a
little sleep-deprived; I'll stop for the moment and look at this tomorrow.
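One way to test the asymmetry guess outside of mzscheme would be a round
trip through the C library's own locale conversions. This is only a sketch
under the assumption that the standard mbstowcs/wcstombs pair is
representative; the scheme_*_locale functions may well go through a
different conversion path, and locale_round_trips is a hypothetical helper:

```c
#include <assert.h>   /* only needed for the sanity checks mentioned below */
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if `bytes` survives the trip
   bytes -> wide chars -> bytes in the current C locale, 0 otherwise.
   The fixed buffers mean this is only meant for short test strings;
   anything longer than 63 characters is truncated and reported as 0. */
static int locale_round_trips(const char *bytes)
{
    wchar_t wide[64];
    char back[256];
    size_t n, m;

    n = mbstowcs(wide, bytes, 63);          /* decode using the locale */
    if (n == (size_t)-1)
        return 0;                           /* invalid multibyte input */
    wide[n] = L'\0';

    m = wcstombs(back, wide, sizeof back - 1);  /* encode it back */
    if (m == (size_t)-1)
        return 0;                           /* unencodable character */
    back[m] = '\0';

    return strcmp(bytes, back) == 0;
}
```

Calling setlocale(LC_ALL, "") first selects the system default locale, which
is what an empty current-locale string corresponds to on the C side. If
locale_round_trips("hello") returns 1 there, the C library conversions are
symmetric and the problem is more likely in how mzscheme drives them.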
Best of wishes!