[plt-scheme] please help with generating random permutation
By the way, you might like to check out http://www.random.org . I'm
pretty sure they have the facilities for this sort of drawing, and I
know they will provide a link where people can "observe" the results
of the drawing after-the-fact. They use random atmospheric noise to
generate their random bits, which may help re-assure any skeptical
employees.
I second what was said about 20 samples being *way* too small for
testing whether the drawing is random.
Some math: the number of times a given person will be selected in this
type of drawing is probably Poisson-distributed. In this case, with
20 drawings, and 8 people, the mean number of times somebody gets
selected is 20/8 ~ 2, and the standard deviation is sqrt(2). Thus, in
your sample, number 0 has been drawn approximately 2 + 2*sqrt(2)
times, which is a "2-sigma" deviation, and is expected to occur about
5% of the time. Over 8 different people, one would expect one 2-sigma
deviation in the group about 45% of the time. All others people are
within one sigma of the mean number of drawings, which is also
typical. All this is to say: the results you show for 20 drawings are
*completely* consistent with the shuffle being random.
Now, for 10000 drawings, you would expect that each person would be
chosen about 1000 times, with a standard deviation of about sqrt(1000)
~ 30. So, anybody appearing in the list more than 1100, or fewer than
900 times would represent a "3-sigma" deviation (which has about one
chance in 10 to occur "naturally"), and deviations much larger than
this would imply that there is, in fact, a problem with the shuffle
algorithm. For 10^6 drawings, the situation is even better;
generally, if the expected number of appearances of a person is N, the
standard deviation of this number is sqrt(N), so the fractional change
from one standard deviation is sqrt(N)/N = 1/sqrt(N), so more samples
decreases the relative amount of fluctuation expected by sqrt(N).
So, give it more draws before you conclude that there is a problem
with the shuffle.
Will
P.S.---This might give you pause about how "fair" a random drawing can
be. Over one year of 250 working days, the typical variation in days
worked from person to person will be 16, or 6%! I don't think I would
like a "random" 6% variation in my salary, for example....
On Sep 10, 2009, at 1:49 PM, keydana at gmx.de wrote:
> Hi,
>
> I am in a rather uncomfortable situation after writing a script to
> select people for the (not too popular) rotating job of "duty
> person" in our department...
> One colleague complains he is always chosen on Monday. Up till now
> the sample is very small, which would make "proving" this hypothesis
> impossible, but it's not a nice situation all the same, so I would
> like to make "as sure as possible" that the distribution is random.
>
> The part of the script in question works on a weekly basis,
> generating a different "duty person" for every day of a week. It
> gets any known "unavailability events" for the week from our
> calendar and filters the input list accordingly. Then, for every
> day, it divides the list of people who are available into two parts,
> those who did duty the week before (list A) and those who did not
> (list B). It then uses the amb operator to select a person for every
> day (separately), preferredly from list B but if necessary also from
> list A. In the end, it checks that the 5 people selected are unique,
> else it will backtrack.
>
> Now because it was clear that amb would backtrack in a foreseeable
> way, when the script starts it first shuffles the input list. For
> the shuffle, I am using the function P. Ragde posted here (seehttp://groups.google.com/group/plt-scheme/browse_thread/thread/2b0de444dc77e11f
> , at the end), which was said to have a good random distribution (I
> also can confirm this when I run it e.g. 1000 times in a row).
> However, in my script I run the shuffle only once per execution, and
> when I e.g. run the whole script 20 times and look who is first in
> the shuffled list, I get the following distribution (the numbers
> stand for the position of the collegue in the input list):
>
> 0 xxxxx
> 1 x
> 2 xxx
> 3 xx
> 4 xx
> 5 xx
> 6 x
> 7
> 8 x
>
> Unfortunately this looks like a correlation, and the aforementioned
> colleague is the one above :-;
>
> However, I do not understand this, as the mentioned shuffle function
> uses random, and as I read in the documentation random is seeded
> with current-milliseconds... So it really should get a different
> seed at every execution.
>
> I think that somewhere I must be making a mistake here... I would be
> very grateful for any hints.
> I attach a bit of the code, hopefully as a clarification of how the
> selection works, although it is not runnable without the rest and
> the data provider of course.
>
> Thanks a lot
> Sigrid
>
>
> (define *morning-duty-people* (shuffle '(AG MB SK MM AK RA AV CM HM)))
>
> (define make-morning-duty-list
> (lambda (start-date-string type)
> (with-output-to-file (string-append logs-dir (duty-type-logfile
> type)) #:exists 'append
> (lambda ()
> (printf "~nGoing to choose duty of type ~a for week starting
> from ~a~n" (duty-type-name type) start-date-string)
> (let ((from (string-to-date start-date-string)))
> (if (not (= (date-week-day from) 1))
> (error "Start date has to be a Monday.")
> (let ((to (get-date-for-interval from 4))
> (previous-from (get-date-for-interval from -7)))
> (let ((previous-to (get-date-for-interval previous-
> from 4)))
> (let ((known-events-list (get-events from to))
> (previous-week-list (get-events previous-
> from previous-to)))
> (let ((already-selected (extract-morning-duties
> known-events-list from to))
> (unavailables (extract-unavailables known-
> events-list from to))
> (previous-week-victims (flatten (extract-
> morning-duties previous-week-list previous-from previous-to))))
> (if (all-true? (map (lambda (days-list) (not
> (null? days-list))) already-selected))
> (let ((week-of-year (get-week-of-year
> from)))
> (printf "Already chosen for morning
> duty: ~a~n" (map car already-selected))
> (send-email *SERVER* *MAIL-TO* *SENDER*
> (if dry-run (format "DRY RUN: ~a ~a" (duty-type-mail-subject type)
> week-of-year) (format "~a ~a" (duty-type-mail-subject type) week-of-
> year)) (insert-variables (duty-type-mail-body-exists type) week-of-
> year (date-to-string from 'short) (date-to-string to 'short) (map
> car already-selected))))
> (let ((availables (map (lambda (lst)
> (complement *morning-duty-people* lst)) unavailables))
> (previous-week-spared (complement
> *morning-duty-people* previous-week-victims)))
> (when debug (printf "Going to determine
> morning duties for period from ~a to ~a:~nUnavailable are:
> ~s~nAlready entered are: ~s~nChoosable are: ~s~nPrevious week's
> victims were: ~s~n" (date->string from "~Y-~m-~d") (date->string to
> "~Y-~m-~d") unavailables already-selected availables previous-week-
> victims))
> (let ((possible-duties (choose-possible-
> duties availables already-selected previous-week-spared)))
> (when debug (printf "Choose-possible-
> duties output: ~s~n" possible-duties))
> (check-unique possible-duties)
> (when debug (printf "Unique duties
> are: ~s~n" possible-duties))
> (if dry-run
> (printf "Not sending events to
> calendar 'cause dry-run~n")
> (let ((result (build-morning-duty-
> events possible-duties from)))
> (let ((new-only (filter (lambda
> (event) (not (event-exists? known-events-list event))) result)))
> (store-events new-only))))
> (let ((week-of-year (get-week-of-year
> from)))
> (send-email *SERVER* *MAIL-TO*
> *SENDER* (if dry-run (format "DRY RUN: ~a ~a" (duty-type-mail-
> subject type) week-of-year) (format "~a ~a" (duty-type-mail-subject
> type) week-of-year)) (insert-variables (duty-type-mail-body type)
> week-of-year (date-to-string from 'short) (date-to-string to 'short)
> possible-duties))))))))))))))))
>
>
> (define choose-possible-duties
> (lambda (availables-list already-selected-list preferred-list)
> (when debug (printf "Choose-possible-duties called with
> input:~nAvailable are: ~s~nAlready registered are: ~s~n" availables-
> list already-selected-list))
> (let loop ((availables availables-list) (selected already-
> selected-list) (result '()))
> (if (null? availables)
> (reverse result)
> (let ((current-date-availables (car availables)) (current-
> date-selected (car selected)))
> (if (not (null? current-date-selected))
> (let ((already-entered (car current-date-selected)))
> (loop (cdr availables) (cdr selected) (cons already-
> entered result)))
> (let-values (((preferred&available
> availablebutrathernot) (partition (lambda (person) (member person
> preferred-list)) current-date-availables)))
> (let ((amb-result (grouped-list-amb
> preferred&available availablebutrathernot)))
> (loop (cdr availables) (cdr selected) (cons amb-
> result result))))))))))
>
>
>
> _________________________________________________
> For list-related administrative tasks:
> http://list.cs.brown.edu/mailman/listinfo/plt-scheme
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 203 bytes
Desc: This is a digitally signed message part
URL: <http://lists.racket-lang.org/users/archive/attachments/20090910/7eca8705/attachment.sig>