[racket-dev] Symlink trouble

From: Tobias Hammer (tobias.hammer at dlr.de)
Date: Thu Apr 18 12:24:36 EDT 2013

Tried it and it works perfectly. Thanks!

On Wed, 17 Apr 2013 16:39:22 +0200, Matthew Flatt <mflatt at cs.utah.edu> wrote:

> Yes, I think Racket should use PWD --- if the expansion of soft links
> produces the same path as getcwd(), which seems to be what "/bin/pwd"
> does.
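
The check described above can be sketched outside Racket. The following Python snippet is only an illustration under stated assumptions, not Racket's implementation: the function name `logical_cwd` is hypothetical, and a POSIX system is assumed. It trusts PWD only when expanding its soft links yields the same directory that getcwd() reports.

```python
import os

def logical_cwd(environ=None):
    """Return PWD if it names the current directory, else fall back to getcwd().

    `logical_cwd` is a hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    physical = os.getcwd()  # path as reported by the OS, soft links resolved
    pwd = environ.get("PWD")
    # Trust PWD only if it is absolute and expands to the same directory.
    if pwd and os.path.isabs(pwd) and os.path.realpath(pwd) == physical:
        return pwd
    return physical
```

A stale or user-altered PWD fails the comparison, so the sketch then falls back to the getcwd() result.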
> Should Racket also set PWD (optionally, but by default) when it creates
> a subprocess? I think probably so.
> To make sure we're all on the same page:
> The general problem is that there can be more than one filesystem path
> that reaches a file. It would be great if we could normalize every path
> to a canonical form, but path normalization in general seems to be
> intractable due to the possibilities of soft links, hard links,
> multiple mount points, case-sensitivity choices, and probably other
> twists that I'm forgetting. We have therefore settled on different
> definitions of "same file", depending on the context.
> For module paths, "same file" involves only syntactic normalizations of
> the pathname (e.g., no checking for soft links). Various pieces of the
> system are carefully implemented to be consistent with syntactic
> normalization. For example, suppose that PLTCOLLECTS is set to
> "/home/mflatt/plt", but "/home/mflatt" is a symlink to "/Users/mflatt";
> pathnames associated with modules that are accessed via a collection will
> consistently use "/home/mflatt", and not somehow hop over to
> "/Users/mflatt". As long as a user is similarly consistent when
> supplying paths, it all works out.
> Unfortunately, `current-directory' is a place where you don't get to
> choose the path. You might say "/home/mflatt/plt" to get to a Racket
> installation, but to initialize `current-directory', the path gets
> turned into an inode and back to a path via getcwd() --- exactly the
> sort of thing that breaks a syntactic view of "same".
> The PWD environment variable addresses the problem with getcwd(): nice
> shells set PWD based on a syntactic derivation of the current
> directory, instead of an inode-based derivation.
> So, Racket should take advantage of the information that nice shells
> provide. Probably it should also act as a nice shell by default.
> (As it happens, I use "csh" on Mac OS X, and it's not nice in the above
> sense. That helps explain why I never got PWD vs. getcwd() before.)
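
Acting as a "nice shell" when creating a subprocess could look roughly like the following Python sketch. It is an illustration under assumptions, not how Racket does or would do it: the helper name `run_in` is hypothetical. The child is started in the given directory, and PWD is set from the path as supplied, without resolving soft links.

```python
import os
import subprocess

def run_in(directory, argv):
    """Run argv in `directory`, exporting the unresolved path in PWD.

    `run_in` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    # abspath() is purely syntactic: soft links in `directory` are kept.
    # (A relative `directory` is completed with getcwd(), which is physical.)
    env["PWD"] = os.path.abspath(directory)
    return subprocess.run(argv, cwd=directory, env=env,
                          capture_output=True, text=True, check=True)
```

A child that reads PWD then sees the path the parent used, while its own getcwd() call still reports the symlink-free path.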
> At Wed, 17 Apr 2013 12:06:29 +0200, Tobias Hammer wrote:
>> Hi,
>> I am currently implementing an application that relies heavily on
>> Racket's great serialize functionality to exchange data between Racket
>> processes on different computers. That worked well until I stumbled over
>> some very confusing behavior in Racket's filesystem and module path
>> resolution.
>> I will first explain what I observed and then why it causes some
>> trouble:
>> * relative (module) paths are resolved with something like
>>   (or (current-load-directory) (current-directory))
>> * collection paths are resolved with
>>   (find-executable-path (find-system-path 'exec-file)
>>                         (find-system-path 'collects-dir))
>>   for the system collection and with the given path for the others
>> * you can require a module both relatively and via a collection; if
>>   they resolve to the same name, there is no error
>> serialize stores the module path and symbol where the deserialize
>> function can be found. It's interesting how this module path is
>> determined:
>> * If the file containing the deserialize identifier (whether
>>   implemented by hand or the file where e.g. serializable-struct is
>>   used) is loaded via a collection, then the serialized stream contains
>>   a collection path (determined via identifier binding and mpi magic)
>> * If this file is loaded relatively, the fallback method with
>>   current-(load)-directory is used
>> Nothing special so far, but the fun starts with how current-directory is
>> initialized. It uses (on *nix systems) getcwd(), but this function
>> returns the path with all symbolic links resolved (getcwd is only a thin
>> OS wrapper, and the OS provides nothing else).
>> This little detail can easily break the serialization framework (and
>> maybe other things too).
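
The getcwd() behavior described above is easy to reproduce. The following Python sketch is an illustration only, assuming a POSIX system where soft links can be created; the function name and the directory names "more" and "symlink" are made up for the example. It enters a directory through a soft link and returns both the path that was used and the path that getcwd() reports.

```python
import os
import tempfile

def cwd_after_entering_via_symlink():
    """Enter a directory through a soft link; return (link path, getcwd() result).

    Hypothetical demonstration helper; all names are illustrative.
    """
    previous = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)          # avoid soft links in the temp root
        target = os.path.join(base, "more")
        link = os.path.join(base, "symlink")
        os.mkdir(target)
        os.symlink(target, link)
        try:
            os.chdir(link)                    # the path the user typed
            return link, os.getcwd()          # the path the OS reports
        finally:
            os.chdir(previous)                # leave before the cleanup runs
```

The two returned paths differ: the first ends in "symlink", while the second is the resolved target ending in "more".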
>> The scenario is a file that is in a path containing a symlink and that
>> is in the current collections, e.g.
>>   /abc/symlink/more/def/file.rkt
>> and PLTCOLLECTS="/abc/symlink/more:"
>> and file.rkt contains a serializable-struct definition.
>> Now one Racket process loads "file.rkt" relatively, serializes a struct
>> instance and sends it to another Racket process. The other process loads
>> def/file via a collection and deserializes the struct. The receiver now
>> has a struct that is of a different type and that it can't access.
>> This fails because the serialized data contains the absolute
>> symlink-free path, which differs from the path the receiver used to load
>> file.rkt (because symlinks are not resolved for collection dirs).
>> The same happens, of course, when the data is sent to another computer
>> that has a symlink in the path to file.rkt, even if they both load it
>> the same way.
>> The confusing thing is that from the user's point of view everything is
>> consistent. The working directory and collections all point to the same
>> location.
>> It is clear that this behavior is by far not limited to Racket, as
>> nearly all programming languages use getcwd internally. A quick Google
>> search for getcwd and symlinks gives a lot of results...
>> I came up with a few solutions, but I would like to get some feedback on
>> them. They all more or less rely on the shell keeping track of the 'real'
>> (better: visible) working directory. Most *nix shells set 'PWD' in the
>> environment, but it is not guaranteed and can of course be altered by the
>> user.
>> - The quick and very dirty hack is to set the current-directory before
>> any user code is executed:
>>   racket -e '(current-directory (or (getenv "PWD") (current-directory)))' program.rkt
>> Too ugly to really use it...
>> - A better fix would be to change how the current-directory parameter is
>> initialized during startup. It could be some heuristic that tries to
>> use the environment variable if it is a complete and existing path and
>> falls back to getcwd otherwise. As far as I can tell this won't break
>> anything, because after this one time at startup the C side's cwd and
>> Racket's parameter are completely decoupled.
>> - A more conservative solution would be a command-line argument to
>> racket to set the initial value for current-directory. One could then
>> populate it with env's PWD or from `pwd` or whatever suits.
>> I would appreciate any feedback on how I can work around this behavior
>> (except don't use symlinks ...) or whether I missed something obvious.
>> If not, would either of the two real solutions be viable? They shouldn't
>> be too hard to implement; I could create a patch if one of them seems OK.
>> Tobias
>> --
>> ---------------------------------------------------------
>> Tobias Hammer
>> DLR / Robotics and Mechatronics Center (RMC)
>> Muenchner Str. 20, D-82234 Wessling
>> Tel.: 08153/28-1487
>> Mail: tobias.hammer at dlr.de
>> _________________________
>>   Racket Developers list:
>>   http://lists.racket-lang.org/dev

Tobias Hammer
DLR / Robotics and Mechatronics Center (RMC)
Muenchner Str. 20, D-82234 Wessling
Tel.: 08153/28-1487
Mail: tobias.hammer at dlr.de

Posted on the dev mailing list.