[plt-dev] pgsql package

From: Ryan Culpepper (ryanc at ccs.neu.edu)
Date: Wed Jul 29 20:34:03 EDT 2009

Robby Findler wrote:
> Thanks for the reply, Dave!
> 
> The encoding of my database is SQL_ASCII which, IIUC, means that the
> bug really is on the Scheme side, for putting the data in one way
> and expecting it to come out the other way.

The server has a communication encoding separate from the storage 
encodings of the actual databases. Communication should happen in UTF-8, 
and the server should convert data as necessary on storage and retrieval.
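
To make the mismatch concrete, here is a minimal sketch of what happens when text is encoded one way and decoded the other. It is written in Python purely for illustration: `str.encode` and `bytes.decode` stand in for `string->bytes/latin-1` and `bytes->string/utf-8`, and the sample name is made up rather than taken from the database in question.

```python
# Illustration only: Python's encode/decode stand in for the Scheme
# functions string->bytes/latin-1 and bytes->string/utf-8.
name = "José"  # hypothetical sample value

latin1_bytes = name.encode("latin-1")  # 'é' becomes the single byte 0xE9
utf8_bytes = name.encode("utf-8")      # 'é' becomes the two bytes 0xC3 0xA9

# Decoding with the encoding that produced the bytes round-trips.
assert utf8_bytes.decode("utf-8") == name

# Decoding Latin-1 bytes as UTF-8 fails: 0xE9 announces a multi-byte
# sequence that never arrives.
try:
    latin1_bytes.decode("utf-8")
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

# Decoding as Latin-1 never raises, because every byte value maps to
# some character. The result can still be wrong text.
garbled = utf8_bytes.decode("latin-1")
```

This is also why switching the reader to Latin-1 makes the crash go away without showing that the data comes back correctly.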

> I don't suppose you (or anyone!) could recommend how this should work?
> If I change the encoding to latin-1 and then change it to utf-8 and
> then change it back to SQL_ASCII, would that effectively translate the
> entire database to utf-8?

I don't know if that would help anything.

> ---
> 
> Ryan said earlier that the line numbers I sent don't match up, but I'm
> actually using version (2 3) of spgsql.plt so I was a bit surprised
> about that. I'm using this package, specifically:
> 
>   http://planet.plt-scheme.org/display.ss?package=spgsql.plt&owner=schematics
> 
> and I can see the utf-8 call in this file:
> 
> http://planet.plt-scheme.org/package-source/schematics/spgsql.plt/2/3/private/io.ss
> 
> and the latin-1 call in this file:
> 
> http://planet.plt-scheme.org/package-source/schematics/spgsql.plt/2/3/private/sql-data.ss
> 
> so that's another mystery that it would be nice to have resolved.

The occurrence of 'string->bytes/latin-1' in that file is for parsing 
PostgreSQL's bytea ("byte array") type. Seems like it's probably not 
getting that far before the error, though.

Is the contributor name stored as a bytea? Depending on how the server 
sends back binary data, that might be the problem.
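
For context on why a Latin-1 conversion shows up in bytea parsing at all: Latin-1 maps each character to exactly one byte, which is convenient when turning a textual representation back into raw bytes. The sketch below assumes the server returns bytea in its escape text format (non-printable bytes as three-digit octal `\ooo`, a literal backslash as `\\`). It is written in Python for illustration, the function name is hypothetical, and it is not the code in sql-data.ss.

```python
def decode_bytea_escape(text: str) -> bytes:
    """Decode PostgreSQL's bytea escape text format into raw bytes.

    Assumes every unescaped character has a code point below 256,
    i.e. the text was read one byte per character.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ord(ch))                    # plain byte, copied as-is
            i += 1
        elif text[i + 1] == "\\":
            out.append(0x5C)                       # "\\" is one literal backslash
            i += 2
        else:
            out.append(int(text[i + 1:i + 4], 8))  # "\ooo" is one octal-coded byte
            i += 4
    return bytes(out)


# The escaped text for the bytes: a, b, c, NUL, backslash, d
decoded = decode_bytea_escape(r"abc\000\\d")
```

If the contributor name were stored as bytea and came back in some other representation, a parser of this shape would not produce the original bytes.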

Ryan


> On Wed, Jul 29, 2009 at 10:26 AM, Dave Gurnell<d.j.gurnell at gmail.com> wrote:
>> Robby wrote:
>>> Hi all (Ryan?): I've got a question about pgsql. From what I can tell,
>>> string data is stored in the database in the latin-1 encoding
>>> (sql-data.ss line 191), but is then retrieved from the database in the
>>> utf-8 encoding (io.ss line 205). Am I getting that right?
>>>
>>> This doesn't mean much, but I changed planet's copy of io.ss to use
>>> bytes->string/latin-1 instead of bytes->string/utf-8, and I was able
>>> to avoid crashing (but the latin-1 encoding might not have any
>>> unencodable octets, so that isn't really saying too too much).
>> PostgreSQL lets you specify character encodings on a per-database and
>> per-client basis. The shell command:
>>
>>  psql -l
>>
>> will show you your setting for each database on your server. The psql
>> command:
>>
>>   \encoding
>>
>> will show you your client encoding for your current session. You can set
>> your client encoding using the psql command:
>>
>>    \encoding UTF8
>>
>> I'm guessing backslash commands can also be sent over an SPGSQL
>> connection... you could try doing:
>>
>>    (send conn exec "\\encoding UTF8")
>>
>> as soon as you connect. If that works, perhaps it's something that could be
>> rolled into SPGSQL's "connect" procedure?
>>
>> Hope this helps,
>>
>> -- Dave
>>
>> _________________________________________________
>>  For list-related administrative tasks:
>>  http://list.cs.brown.edu/mailman/listinfo/plt-dev
>>



Posted on the dev mailing list.