[plt-scheme] (Slightly Off) Looking for a Sorting Algorithm

From: Erich Rast (erich at snafu.de)
Date: Mon Nov 21 05:04:58 EST 2005

Many thanks to all the people who have given me advice. Your comments 
have been very helpful for me to narrow down the problem, which is more 
complicated than I initially thought. Noel Welsh suspected that it 
might be a combinatorial optimisation problem, but hopefully it's 
simpler than that. To satisfy your curiosity, I'm working on a generic 
syntax-semantics interface, i.e. the transformation from natural 
language syntax to a formal system, which in this case is just 
predicate logic with S-expression syntax. Consider my first example, 
but with other labels:

(((NP) (every sailor)) ((VP) (loves)) ((NP) (a woman)))

Solution 1: (((NP) (a woman)) ((NP) (every sailor)) ((VP) (loves)))
Solution 2: (((NP) (every sailor)) ((NP) (a woman)) ((VP) (loves)))

So given that NPs are binders, quantifier scope ambiguity is solved by 
one rule NP<VP (within sentence boundary). However, I've realized that 
I need two kinds of rules. One that gives me the solution without 
further reordering, i.e. Peter gives Mary the book==> Peter Mary the 
book gives, and one for getting all the 'admitted' permutations, like 
the above sailor example. So I need < and <+ that give me all solutions 
and  .< and .<+ that give me the 'minimal' solution respectively. In 
the above, NP.<VP would only yield solution 2, and (((a) 1) ((b) 2) 
((c) 3) ((c) 4)) with (.< c b) should only yield (((a) 1) ((c) 3) ((c) 
4) ((b) 2)).

>> Example: e is ((a 1) (b 2) (a 3) (b 4) (a 5) (c 6) ((a d) 7)), c is 
>> ((< a b) (<+ d c))
>> Solutions: any permutation of thist of a-entries followed by (b 2) 
>> (b4) (c 6) i.e. the second rule is actually cancelled out by the 
>> first one
>> Example: e is ((a 1) (b 2) (b 3) (a 4)), c is ((<+ a b))
>> Solution: ((a 1) (b 2) (a 4) (b 3))
> Your problem still doesn't make complete sense to me.
> 1) does the initial ordering of the elements affect the set of legal 
> solutions?

Yes. Any elements to which no rule applies must maintain the ordering 
relative to each other that they have in the input. That's why the 
sorting algorithm in my original approach had to be stable.

> 2) in your second example you have ... (c 6) ((a d) 7).... That is, 
> some of the elements have a symbol, and some have a list of symbols.  
> Is (c 6) an abbreviation for ((c) 6) ?

Sorry, yes. I should have been more precise. I took (c 6) as a shortcut 
for ((c) 6). Payloads can have more than one label and even no label at 
all. If there's no label, no rule applies.

> 3) your use of the word "moved" in the <+ definition is worrisome.  Do 
> you mean simply that "a must occur immediately before b"?  Also, in 
> the presence of multiple 'a' labels, what does this mean?  Your third 
> example would seem to suggest that it means "every element containing 
> an 'a' label must be followed by an element with a 'b' label.  Is this 
> correct?

(3a) I initially also took a<+b to mean "a must occur immediately 
before b", but there two problems. First, suppose I have  (((a) 1) ((c) 
2) ((d) 3)) and rules (< d a) and (<+ d c), then (((d) 3) ((a) 1) ((c) 
2) should be a valid solution. Yet, (((e) 1) ((c) 2) ((d) 3)) would 
yield (((e) 1) ((d) 3) ((c) 2)). That's why I've been talking about 
"movement", which is indeed troublesome, since a naive movement 
approach might not terminate. Second, a<+b rather means "a must occur 
immediately before a*b, where a* is zero or more occurrences of a".In 
case of the new .<+ the order of the a-occurrences must furthermore be 
preserved, i.e. a sequence ba_1 x ... a_n y is mapped to the sequence 
a_1 ... a_n b x y, where x and y are any sequences of non-a's.

(3c) Yes, (((b) 1) ((a) 2) ((a) 3)) with rule (< a b) should yield 
(((a) 2) ((a) 3) ((b) 1)) and (((a) 3) ((a) 2) ((b) 1)).

> 4) along the same lines, does (< a b) mean that all a's must occur 
> before all b's, or that every a must occur before a b?

Yes, all a's must occur before all b's. The  brute force approach (not 
yet implemented) seems to be to compile all rules into one conjunctive 
rule, then do a stable sort that preserves the order (for .< and .<+), 
and then permute the entries that are allowed to be permuted for < and 
<+. But I'm unsure how to deal with rule conflicts like (((a d) 1) ((c 
b) 2)) with rules (< a b) and (< c d), and I really wonder if there 
isn't a smarter way to do it.

Does perhaps someone know this problem from another domain?

Best regards,

Erich Rast

P.S. Sorry for the longish post. Anyone who is interested in this 
problem may also contact me off-list. 

Posted on the users mailing list.