[plt-scheme] v299: process* file args, paths, strings, and byte-strings

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Oct 20 09:35:01 EDT 2004

At Tue, 19 Oct 2004 21:55:27 -0400, John Clements wrote:
> IIUC, MzScheme should 'know' when it gets a path that can't be 
> expressed in the locale's encoding, right?  So it should be possible to 
> have a 'path->string/conservative' that signals an error if the path 
> can't be expressed as a string?

Here's an implementation:

 (define (path->string/conservative p)
   (bytes->string/locale (path->bytes p)))
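
As a rough usage sketch (untested here; it assumes a UTF-8 locale, and
that the decoding failure shows up as an exception accepted by
`exn:fail?'), the conversion goes through for ordinary bytes and signals
an error otherwise:

```scheme
;; The definition from above, repeated so that the sketch stands alone.
(define (path->string/conservative p)
  (bytes->string/locale (path->bytes p)))

;; Plain ASCII bytes decode in any reasonable locale.
(path->string/conservative (bytes->path #"hello.txt"))

;; Byte 255 never appears in valid UTF-8, so under a UTF-8 locale the
;; conversion should raise an exception instead of returning a string.
;; (Assumption: the exception satisfies `exn:fail?'.)
(with-handlers ([exn:fail? (lambda (exn) 'no-string-for-this-path)])
  (path->string/conservative (bytes->path (bytes 104 105 255))))
```

The handler is only there to show the failure case; a caller could just
as well let the exception propagate.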

> I admit that this is a peculiar corner case, but it _would_ seem 
> possible to have two independent file-system entities (A & B, say) 
> whose paths mapped to the same string, a string that is interpreted by 
> a string-expecting system call to refer to B.  So references to A would 
> wind up (in system call) affecting B instead. Is my reading correct?

This is not a problem for system calls under Unix or OS X, because
those system calls take bytes, and a path's bytes are passed along as-is.

It's also not a problem for Windows system calls, which take strings.
You can construct a bad path using `bytes->path' where the bytes are
not a valid encoding, but that's where a new MzScheme hack takes over.
When MzScheme converts a path to a UTF-16 string for a Windows system
call, invalid encoding bytes are converted to "\t" --- which is not
allowed in a Windows path, so the file definitely won't exist. (The
UTF-16 is only for the system call, so "\t" doesn't show up in any
MzScheme error message.)
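
To get a feel for the substitution, here is a sketch that mimics it at
the Scheme level. It is only an analogy: the real conversion happens
inside MzScheme and produces UTF-16. The sketch assumes that the path
bytes are meant as UTF-8 and that `bytes->string/utf-8' accepts a
replacement character as an optional second argument; the helper name is
made up for illustration.

```scheme
;; Hypothetical helper, not part of MzScheme: decode path bytes as UTF-8,
;; putting a tab wherever a byte is not part of a valid encoding.
(define (path-bytes->string/tab bs)
  (bytes->string/utf-8 bs #\tab))

;; Valid bytes come through unchanged.
(path-bytes->string/tab #"C:\\ok.txt")

;; The stray byte 255 is replaced by a tab, so the resulting name
;; cannot refer to any existing Windows file.
(path-bytes->string/tab (bytes 97 255 98))
```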

That's why byte strings are a reliable representation of paths within
MzScheme, and why I didn't follow Java's lead by representing paths as
strings. When you try to talk to another application through strings,
though, a lot can go wrong (for example, a path's bytes may have no
decoding in the current locale), and MzScheme does the best that I can
figure out.

Matthew



Posted on the users mailing list.