[plt-scheme] please help with generating random permutation

From: Will M Farr (farr at MIT.EDU)
Date: Thu Sep 10 15:46:48 EDT 2009

By the way, you might like to check out http://www.random.org .  I'm  
pretty sure they have the facilities for this sort of drawing, and I  
know they will provide a link where people can "observe" the results  
of the drawing after-the-fact.  They use random atmospheric noise to  
generate their random bits, which may help re-assure any skeptical  
employees.

I second what was said about 20 samples being *way* too small for  
testing whether the drawing is random.

Some math: the number of times a given person will be selected in this  
type of drawing is probably Poisson-distributed.  In this case, with  
20 drawings, and 8 people, the mean number of times somebody gets  
selected is 20/8 ~ 2, and the standard deviation is sqrt(2).  Thus, in  
your sample, number 0 has been drawn approximately 2 + 2*sqrt(2)  
times, which is a "2-sigma" deviation, and is expected to occur about  
5% of the time.  Over 8 different people, one would expect one 2-sigma  
deviation in the group about 45% of the time.  All others people are  
within one sigma of the mean number of drawings, which is also  
typical.  All this is to say: the results you show for 20 drawings are  
*completely* consistent with the shuffle being random.

Now, for 10000 drawings, you would expect that each person would be  
chosen about 1000 times, with a standard deviation of about sqrt(1000)  
~ 30.  So, anybody appearing in the list more than 1100, or fewer than  
900 times would represent a "3-sigma" deviation (which has about one  
chance in 10 to occur "naturally"), and deviations much larger than  
this would imply that there is, in fact, a problem with the shuffle  
algorithm.  For 10^6 drawings, the situation is even better;  
generally, if the expected number of appearances of a person is N, the  
standard deviation of this number is sqrt(N), so the fractional change  
from one standard deviation is sqrt(N)/N = 1/sqrt(N), so more samples  
decreases the relative amount of fluctuation expected by sqrt(N).

So, give it more draws before you conclude that there is a problem  
with the shuffle.

Will

P.S.---This might give you pause about how "fair" a random drawing can  
be.  Over one year of 250 working days, the typical variation in days  
worked from person to person will be 16, or 6%!  I don't think I would  
like a "random" 6% variation in my salary, for example....

On Sep 10, 2009, at 1:49 PM, keydana at gmx.de wrote:

> Hi,
>
> I am in a rather uncomfortable situation after writing a script to  
> select people for the (not too popular) rotating job of "duty  
> person" in our department...
> One colleague complains he is always chosen on Monday. Up till now  
> the sample is very small, which would make "proving" this hypothesis  
> impossible, but it's not a nice situation all the same, so I would  
> like to make "as sure as possible" that the distribution is random.
>
> The part of the script in question works on a weekly basis,  
> generating a different "duty person" for every day of a week. It  
> gets any known "unavailability events" for the week from our  
> calendar and filters the input list accordingly. Then, for every  
> day, it divides the list of people who are available into two parts,  
> those who did duty the week before (list A) and those who did not  
> (list B). It then uses the amb operator to select a person for every  
> day (separately), preferredly from list B but if necessary also from  
> list A. In the end, it checks that the 5 people selected are unique,  
> else it will backtrack.
>
> Now because it was clear that amb would backtrack in a foreseeable  
> way, when the script starts it first shuffles the input list. For  
> the shuffle, I am using the function P. Ragde posted here (seehttp://groups.google.com/group/plt-scheme/browse_thread/thread/2b0de444dc77e11f 
> , at the end), which was said to have a good random distribution (I  
> also can confirm this when I run it e.g. 1000 times in a row).
> However, in my script I run the shuffle only once per execution, and  
> when I e.g. run the whole script 20 times and look who is first in  
> the shuffled list, I get the following distribution (the numbers  
> stand for the position of the collegue in the input list):
>
> 0 xxxxx
> 1 x
> 2 xxx
> 3 xx
> 4 xx
> 5 xx
> 6 x
> 7
> 8 x
>
> Unfortunately this looks like a correlation, and the aforementioned  
> colleague is the one above :-;
>
> However, I do not understand this, as the mentioned shuffle function  
> uses random, and as I read in the documentation random is seeded  
> with current-milliseconds... So it really should get a different  
> seed at every execution.
>
> I think that somewhere I must be making a mistake here... I would be  
> very grateful for any hints.
> I attach a bit of the code, hopefully  as a clarification of how the  
> selection works, although it is not runnable without the rest and  
> the data provider of course.
>
> Thanks a lot
> Sigrid
>
>
> (define *morning-duty-people* (shuffle '(AG MB SK MM AK RA AV CM HM)))
>
> (define make-morning-duty-list
>  (lambda (start-date-string type)
>     (with-output-to-file (string-append logs-dir (duty-type-logfile  
> type)) #:exists 'append
>       (lambda ()
>         (printf "~nGoing to choose duty of type ~a for week starting  
> from ~a~n" (duty-type-name type) start-date-string)
>         (let ((from (string-to-date start-date-string)))
>           (if (not (= (date-week-day from) 1))
>               (error "Start date has to be a Monday.")
>               (let ((to (get-date-for-interval from 4))
>                     (previous-from (get-date-for-interval from -7)))
>                 (let ((previous-to (get-date-for-interval previous- 
> from 4)))
>                   (let ((known-events-list (get-events from to))
>                         (previous-week-list (get-events previous- 
> from previous-to)))
>                     (let ((already-selected (extract-morning-duties  
> known-events-list from to))
>                           (unavailables (extract-unavailables known- 
> events-list from to))
>                           (previous-week-victims (flatten (extract- 
> morning-duties previous-week-list previous-from previous-to))))
>                       (if (all-true? (map (lambda (days-list) (not  
> (null? days-list))) already-selected))
>                           (let ((week-of-year (get-week-of-year  
> from)))
>                             (printf "Already chosen for morning  
> duty: ~a~n"  (map car already-selected))
>                             (send-email *SERVER* *MAIL-TO* *SENDER*  
> (if dry-run (format "DRY RUN: ~a ~a" (duty-type-mail-subject type)  
> week-of-year) (format "~a ~a" (duty-type-mail-subject type) week-of- 
> year)) (insert-variables (duty-type-mail-body-exists type) week-of- 
> year (date-to-string from 'short) (date-to-string to 'short) (map  
> car already-selected))))
>                           (let ((availables (map (lambda (lst)  
> (complement *morning-duty-people* lst)) unavailables))
>                                 (previous-week-spared (complement  
> *morning-duty-people* previous-week-victims)))
>                             (when debug (printf "Going to determine  
> morning duties for period from ~a to ~a:~nUnavailable are:  
> ~s~nAlready entered are: ~s~nChoosable are: ~s~nPrevious week's  
> victims were: ~s~n" (date->string from "~Y-~m-~d") (date->string to  
> "~Y-~m-~d") unavailables already-selected availables previous-week- 
> victims))
>                             (let ((possible-duties (choose-possible- 
> duties availables already-selected previous-week-spared)))
>                               (when debug (printf "Choose-possible- 
> duties output: ~s~n" possible-duties))
>                               (check-unique possible-duties)
>                               (when debug (printf "Unique duties  
> are: ~s~n" possible-duties))
>                               (if dry-run
>                                   (printf "Not sending events to  
> calendar 'cause dry-run~n")
>                                   (let ((result (build-morning-duty- 
> events possible-duties from)))
>                                     (let ((new-only (filter (lambda  
> (event) (not (event-exists? known-events-list event))) result)))
>                                       (store-events new-only))))
>                               (let ((week-of-year (get-week-of-year  
> from)))
>                                 (send-email *SERVER* *MAIL-TO*  
> *SENDER* (if dry-run (format "DRY RUN: ~a ~a" (duty-type-mail- 
> subject type) week-of-year) (format "~a ~a" (duty-type-mail-subject  
> type) week-of-year)) (insert-variables (duty-type-mail-body type)  
> week-of-year (date-to-string from 'short) (date-to-string to 'short)  
> possible-duties))))))))))))))))
>
>
> (define choose-possible-duties
>  (lambda (availables-list already-selected-list preferred-list)
>    (when debug (printf "Choose-possible-duties called with  
> input:~nAvailable are: ~s~nAlready registered are: ~s~n" availables- 
> list already-selected-list))
>    (let loop ((availables availables-list) (selected already- 
> selected-list) (result '()))
>      (if (null? availables)
>          (reverse result)
>          (let ((current-date-availables (car availables)) (current- 
> date-selected (car selected)))
>            (if (not (null? current-date-selected))
>                (let ((already-entered (car current-date-selected)))
>                  (loop (cdr availables) (cdr selected) (cons already- 
> entered result)))
>                (let-values (((preferred&available  
> availablebutrathernot) (partition (lambda (person) (member person  
> preferred-list)) current-date-availables)))
>                  (let ((amb-result (grouped-list-amb  
> preferred&available availablebutrathernot)))
>                    (loop (cdr availables) (cdr selected) (cons amb- 
> result result))))))))))
>
>
>
> _________________________________________________
>  For list-related administrative tasks:
>  http://list.cs.brown.edu/mailman/listinfo/plt-scheme

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 203 bytes
Desc: This is a digitally signed message part
URL: <http://lists.racket-lang.org/users/archive/attachments/20090910/7eca8705/attachment.sig>

Posted on the users mailing list.