[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?

From: Jim Blandy (jimb at redhat.com)
Date: Sat Feb 19 15:16:24 EST 2005

Eli Barzilay <eli at barzilay.org> writes:
> On Feb 18, Jim Blandy wrote:
> > 
> > It's my belief that users want their locale settings respected ---
> > but they also want application writers to be more conscious of
> > locale's effects.  Which is why I'm raising the issue here, and
> > trying to provide background.  But in the end I want to be like John
> > Maynard Keynes: "When the facts change, sir, I change my mind."  If
> > users don't actually like locales, if they find them confusing more
> > often than they find them helpful, then it's hardly worth the
> > trouble to support them.
> > 
> > So what do you think?
> 
> It's impossible to get a meaningful answer to that.  You'll always
> have both kinds of users, and you'll always have cultural differences
> that grow out of whatever.  I can only give Israel as an example --
> where native things are much farther away from English than the
> average European language, but it's still much closer to English than
> the average oriental language...  The result is that for a long time
> nobody invested enough work to make things work, so it just became
> stupid enough that anyone who cared would just use English for
> everything.  Programmers in the company I worked at always used the
> American versions of Windows, and most people my age who did anything
> with computers still remember typing in reverse.  The result is that
> you'll get many people there who will definitely prefer working ASCII
> software over locale-aware-but-hopelessly-broken junk.

Let me try to re-state your point:

    Introducing locale-sensitivity is likely to complicate
    applications to the point that even disabling locale-specific
    behavior by setting LC_ALL=C (or the equivalent) won't get you
    back to a working state.  Since locale-sensitive software is
    likely to be buggy whether locales are enabled or disabled, it's
    better to simply be locale-insensitive.

    If programmers were perfect, then certainly it would be better to
    be locale-sensitive.  But the actual choice we have is between
    software that correctly supports only a single locale, and
    software that tries to support multiple locales and is broken in
    all of them.

Is that essentially correct?

Let's break down the situation a bit:

- Assuming everything is implemented correctly, are locale-sensitive
  semantics helpful?

- How hard is it to correctly implement locale sensitivity?

For the first point, there's a lot I don't really like about the idea
of one big global setting that affects lots of things under the
covers.

For the second point, I think we've implemented many other messier
things just fine --- but they were all things that were important to
enough people that we ground away at them until they actually worked.
If nobody really cares, and monolingual Americans do all the coding,
then it's not going to turn out too well.

So it's an Open Source marketing question: will supporting locale
enlarge the software's audience enough to attract programmers who will
contribute enough to make it work correctly?  How big is that
enlargement?  How many technical people does it have?  You can get
some portion of that audience by hard-coding UTF-8, so locale
sensitivity has to pay for itself with the audience that won't use
UTF-8.  Hmm.
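
To make the "hard-coding UTF-8" alternative concrete, here is a minimal
sketch of the two policies.  It is written in Python rather than
MzScheme, and the function names (decode_fixed, decode_with,
locale_encoding) are made up for illustration; they are not part of
any MzScheme or PLT API.

```python
import locale


def decode_fixed(data: bytes) -> str:
    """Locale-insensitive policy: always treat the bytes as UTF-8."""
    return data.decode("utf-8")


def decode_with(data: bytes, encoding: str) -> str:
    """Locale-sensitive policy: decode with whatever encoding is passed in.

    A real program would pass locale_encoding() here; taking the name as
    a parameter keeps the example reproducible on any machine.
    """
    return data.decode(encoding)


def locale_encoding() -> str:
    """Return the encoding the user's locale asks for (varies per machine)."""
    return locale.getpreferredencoding(False)


# The same four characters, written out by a UTF-8 program.
raw = "café".encode("utf-8")

if __name__ == "__main__":
    print("hard-coded UTF-8:   ", decode_fixed(raw))
    print("as a Latin-1 locale:", decode_with(raw, "latin-1"))
    print("this machine wants: ", locale_encoding())
```

The hard-coded version gives the same answer everywhere.  The
locale-sensitive version gives the right answer only when the bytes
really were written in the locale's encoding, which is exactly the
audience question above.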

Posted on the users mailing list.