[plt-scheme] (Slightly Off) Looking for a Sorting Algorithm
Many thanks to all the people who have given me advice. Your comments
have been very helpful for me to narrow down the problem, which is more
complicated than I initially thought. Noel Welsh suspected that it
might be a combinatorial optimisation problem, but hopefully it's
simpler than that. To satisfy your curiosity, I'm working on a generic
syntax-semantics interface, i.e. the transformation from natural
language syntax to a formal system, which in this case is just
predicate logic with S-expression syntax. Consider my first example,
but with other labels:
(((NP) (every sailor)) ((VP) (loves)) ((NP) (a woman)))
Solution 1: (((NP) (a woman)) ((NP) (every sailor)) ((VP) (loves)))
Solution 2: (((NP) (every sailor)) ((NP) (a woman)) ((VP) (loves)))
So given that NPs are binders, quantifier scope ambiguity is solved by
one rule NP<VP (within sentence boundary). However, I've realized that
I need two kinds of rules. One that gives me the solution without
further reordering, i.e. Peter gives Mary the book==> Peter Mary the
book gives, and one for getting all the 'admitted' permutations, like
the above sailor example. So I need < and <+ that give me all solutions
and .< and .<+ that give me the 'minimal' solution respectively. In
the above, NP.<VP would only yield solution 2, and (((a) 1) ((b) 2)
((c) 3) ((c) 4)) with (.< c b) should only yield (((a) 1) ((c) 3) ((c)
4) ((b) 2)).
>> Example: e is ((a 1) (b 2) (a 3) (b 4) (a 5) (c 6) ((a d) 7)), c is
>> ((< a b) (<+ d c))
>>
>> Solutions: any permutation of thist of a-entries followed by (b 2)
>> (b4) (c 6) i.e. the second rule is actually cancelled out by the
>> first one
>>
>> Example: e is ((a 1) (b 2) (b 3) (a 4)), c is ((<+ a b))
>>
>> Solution: ((a 1) (b 2) (a 4) (b 3))
>
> Your problem still doesn't make complete sense to me.
>
> 1) does the initial ordering of the elements affect the set of legal
> solutions?
Yes. Any elements to which no rule applies must maintain the ordering
relative to each other that they have in the input. That's why the
sorting algorithm in my original approach had to be stable.
> 2) in your second example you have ... (c 6) ((a d) 7).... That is,
> some of the elements have a symbol, and some have a list of symbols.
> Is (c 6) an abbreviation for ((c) 6) ?
Sorry, yes. I should have been more precise. I took (c 6) as a shortcut
for ((c) 6). Payloads can have more than one label and even no label at
all. If there's no label, no rule applies.
> 3) your use of the word "moved" in the <+ definition is worrisome. Do
> you mean simply that "a must occur immediately before b"? Also, in
> the presence of multiple 'a' labels, what does this mean? Your third
> example would seem to suggest that it means "every element containing
> an 'a' label must be followed by an element with a 'b' label. Is this
> correct?
(3a) I initially also took a<+b to mean "a must occur immediately
before b", but there two problems. First, suppose I have (((a) 1) ((c)
2) ((d) 3)) and rules (< d a) and (<+ d c), then (((d) 3) ((a) 1) ((c)
2) should be a valid solution. Yet, (((e) 1) ((c) 2) ((d) 3)) would
yield (((e) 1) ((d) 3) ((c) 2)). That's why I've been talking about
"movement", which is indeed troublesome, since a naive movement
approach might not terminate. Second, a<+b rather means "a must occur
immediately before a*b, where a* is zero or more occurrences of a".In
case of the new .<+ the order of the a-occurrences must furthermore be
preserved, i.e. a sequence ba_1 x ... a_n y is mapped to the sequence
a_1 ... a_n b x y, where x and y are any sequences of non-a's.
(3c) Yes, (((b) 1) ((a) 2) ((a) 3)) with rule (< a b) should yield
(((a) 2) ((a) 3) ((b) 1)) and (((a) 3) ((a) 2) ((b) 1)).
> 4) along the same lines, does (< a b) mean that all a's must occur
> before all b's, or that every a must occur before a b?
>
Yes, all a's must occur before all b's. The brute force approach (not
yet implemented) seems to be to compile all rules into one conjunctive
rule, then do a stable sort that preserves the order (for .< and .<+),
and then permute the entries that are allowed to be permuted for < and
<+. But I'm unsure how to deal with rule conflicts like (((a d) 1) ((c
b) 2)) with rules (< a b) and (< c d), and I really wonder if there
isn't a smarter way to do it.
Does perhaps someone know this problem from another domain?
Best regards,
Erich Rast
P.S. Sorry for the longish post. Anyone who is interested in this
problem may also contact me off-list.