[racket] Looking for feedback on code style

From: Will M. Farr (wmfarr at gmail.com)
Date: Thu Sep 9 13:27:17 EDT 2010

Previous message: [racket] Looking for feedback on code style
Next message: [racket] Looking for feedback on code style
Messages sorted by: [date] [thread] [subject] [author]

On Sep 9, 2010, at 11:37 AM, Prabhakar Ragde wrote:

> And then what? Are we justified in saying T(N) = T(3N/4) + O(N)? Really, we have T(N) = T(X) + O(N), where X is a random variable uniformly distributed over [0..N-1]. With a suitable induction hypothesis, we can dig ourselves out of this, but it's not simple, unfortunately. --PR

I don't think even this is quite right (I'm really reaching above my pay grade here, so I apologize if I'm making stupid mistakes):

We know that each iteration spends O(N) time for the partition, plus whatever time is spent on the chosen half of the partition.  So, we definitely have 

T(N) = T(X) + O(N),

where X is the length of the chosen partition, which must lie between 1 and N, since it must contain the desired element and could contain all the elements if the pivot is the smallest element from the list.  What is the distribution for X?  The distribution of N-lt (the length of the set of elements smaller than the pivot) is uniform random on [0...N-1] because the pivot is chosen uniformly at random from the list.  (We are assuming that the distribution of elements in the list is uniform, and that the set of possible list elements has cardinality much greater than the length of the list; if there are likely to be lots of repeated elements, then N-lt is biased toward smaller values, since repeated pivot elements wind up in the greater-or-equal list by default.)

But, even assuming N-lt ~ U(0,N-1), it's not obvious to me that the selection of the less-than or greater-or-equal lists is 50/50.  It seems like the longer lists are more likely to be chosen.  So, X should have a distribution something like

P(X) = P(N-lt = X) P(lt chosen) + P(N-gte = X) P(gte chosen) = P(N-lt = X)P(lt chosen) + P(N-lt = N-X) P(gte chosen) = 1/N X/N + 1/N X/N = 2X/N^2,

except that there is one case where this formula breaks down: when X = N, we cannot have N-lt = N since at least the pivot element must be found in the greater-or-equal list.  So 

P(X) = 2X/N^2 if 1 <= X < N, and 1/N if X = N.

(Note that this sums to 1 = sum(2X/N^2, X = 1..N-1) + 1/N, so it at least passes that simple check.)  Now we can arrive at a recursion relation for < T(N) >:

< T(N) > = < T(X) > + O(N) = 1/N (sum(2X/N^2 T(X), X=1..N-1) + 1/N T(N)) + O(N)

I'm not exactly sure where to go from here, however; as you say, it's tricky to analyze.  Nevertheless, it sure feels like it's O(N) (I've experimented quite a bit with timing tests, and the simple argument about averages I gave before "feels right").

Will

Posted on the users mailing list.

Previous message: [racket] Looking for feedback on code style
Next message: [racket] Looking for feedback on code style
Messages sorted by: [date] [thread] [subject] [author]