From: Will M Farr (farr at MIT.EDU) Date: Thu Sep 10 15:46:48 EDT 2009 Previous message: [plt-scheme] please help with generating random permutation Next message: [plt-scheme] Doc pointer? for foreign library memory management Messages sorted by: [date] [thread] [subject] [author]

```By the way, you might like to check out http://www.random.org .  I'm
pretty sure they have the facilities for this sort of drawing, and I
know they will provide a link where people can "observe" the results
of the drawing after-the-fact.  They use random atmospheric noise to
generate their random bits, which may help re-assure any skeptical
employees.

I second what was said about 20 samples being *way* too small for
testing whether the drawing is random.

Some math: the number of times a given person will be selected in this
type of drawing is probably Poisson-distributed.  In this case, with
20 drawings, and 8 people, the mean number of times somebody gets
selected is 20/8 ~ 2, and the standard deviation is sqrt(2).  Thus, in
your sample, number 0 has been drawn approximately 2 + 2*sqrt(2)
times, which is a "2-sigma" deviation, and is expected to occur about
5% of the time.  Over 8 different people, one would expect one 2-sigma
deviation in the group about 45% of the time.  All others people are
within one sigma of the mean number of drawings, which is also
typical.  All this is to say: the results you show for 20 drawings are
*completely* consistent with the shuffle being random.

Now, for 10000 drawings, you would expect that each person would be
~ 30.  So, anybody appearing in the list more than 1100, or fewer than
900 times would represent a "3-sigma" deviation (which has about one
chance in 10 to occur "naturally"), and deviations much larger than
this would imply that there is, in fact, a problem with the shuffle
algorithm.  For 10^6 drawings, the situation is even better;
generally, if the expected number of appearances of a person is N, the
standard deviation of this number is sqrt(N), so the fractional change
from one standard deviation is sqrt(N)/N = 1/sqrt(N), so more samples
decreases the relative amount of fluctuation expected by sqrt(N).

So, give it more draws before you conclude that there is a problem
with the shuffle.

Will

P.S.---This might give you pause about how "fair" a random drawing can
be.  Over one year of 250 working days, the typical variation in days
worked from person to person will be 16, or 6%!  I don't think I would
like a "random" 6% variation in my salary, for example....

On Sep 10, 2009, at 1:49 PM, keydana at gmx.de wrote:

> Hi,
>
> I am in a rather uncomfortable situation after writing a script to
> select people for the (not too popular) rotating job of "duty
> person" in our department...
> One colleague complains he is always chosen on Monday. Up till now
> the sample is very small, which would make "proving" this hypothesis
> impossible, but it's not a nice situation all the same, so I would
> like to make "as sure as possible" that the distribution is random.
>
> The part of the script in question works on a weekly basis,
> generating a different "duty person" for every day of a week. It
> gets any known "unavailability events" for the week from our
> calendar and filters the input list accordingly. Then, for every
> day, it divides the list of people who are available into two parts,
> those who did duty the week before (list A) and those who did not
> (list B). It then uses the amb operator to select a person for every
> day (separately), preferredly from list B but if necessary also from
> list A. In the end, it checks that the 5 people selected are unique,
> else it will backtrack.
>
> Now because it was clear that amb would backtrack in a foreseeable
> way, when the script starts it first shuffles the input list. For
> , at the end), which was said to have a good random distribution (I
> also can confirm this when I run it e.g. 1000 times in a row).
> However, in my script I run the shuffle only once per execution, and
> when I e.g. run the whole script 20 times and look who is first in
> the shuffled list, I get the following distribution (the numbers
> stand for the position of the collegue in the input list):
>
> 0 xxxxx
> 1 x
> 2 xxx
> 3 xx
> 4 xx
> 5 xx
> 6 x
> 7
> 8 x
>
> Unfortunately this looks like a correlation, and the aforementioned
> colleague is the one above :-;
>
> However, I do not understand this, as the mentioned shuffle function
> uses random, and as I read in the documentation random is seeded
> with current-milliseconds... So it really should get a different
> seed at every execution.
>
> I think that somewhere I must be making a mistake here... I would be
> very grateful for any hints.
> I attach a bit of the code, hopefully  as a clarification of how the
> selection works, although it is not runnable without the rest and
> the data provider of course.
>
> Thanks a lot
> Sigrid
>
>
> (define *morning-duty-people* (shuffle '(AG MB SK MM AK RA AV CM HM)))
>
> (define make-morning-duty-list
>  (lambda (start-date-string type)
>     (with-output-to-file (string-append logs-dir (duty-type-logfile
> type)) #:exists 'append
>       (lambda ()
>         (printf "~nGoing to choose duty of type ~a for week starting
> from ~a~n" (duty-type-name type) start-date-string)
>         (let ((from (string-to-date start-date-string)))
>           (if (not (= (date-week-day from) 1))
>               (error "Start date has to be a Monday.")
>               (let ((to (get-date-for-interval from 4))
>                     (previous-from (get-date-for-interval from -7)))
>                 (let ((previous-to (get-date-for-interval previous-
> from 4)))
>                   (let ((known-events-list (get-events from to))
>                         (previous-week-list (get-events previous-
> from previous-to)))
> known-events-list from to))
>                           (unavailables (extract-unavailables known-
> events-list from to))
>                           (previous-week-victims (flatten (extract-
> morning-duties previous-week-list previous-from previous-to))))
>                       (if (all-true? (map (lambda (days-list) (not
>                           (let ((week-of-year (get-week-of-year
> from)))
>                             (printf "Already chosen for morning
> duty: ~a~n"  (map car already-selected))
>                             (send-email *SERVER* *MAIL-TO* *SENDER*
> (if dry-run (format "DRY RUN: ~a ~a" (duty-type-mail-subject type)
> week-of-year) (format "~a ~a" (duty-type-mail-subject type) week-of-
> year)) (insert-variables (duty-type-mail-body-exists type) week-of-
> year (date-to-string from 'short) (date-to-string to 'short) (map
>                           (let ((availables (map (lambda (lst)
> (complement *morning-duty-people* lst)) unavailables))
>                                 (previous-week-spared (complement
> *morning-duty-people* previous-week-victims)))
>                             (when debug (printf "Going to determine
> morning duties for period from ~a to ~a:~nUnavailable are:
> ~s~nAlready entered are: ~s~nChoosable are: ~s~nPrevious week's
> victims were: ~s~n" (date->string from "~Y-~m-~d") (date->string to
> "~Y-~m-~d") unavailables already-selected availables previous-week-
> victims))
>                             (let ((possible-duties (choose-possible-
>                               (when debug (printf "Choose-possible-
> duties output: ~s~n" possible-duties))
>                               (check-unique possible-duties)
>                               (when debug (printf "Unique duties
> are: ~s~n" possible-duties))
>                               (if dry-run
>                                   (printf "Not sending events to
> calendar 'cause dry-run~n")
>                                   (let ((result (build-morning-duty-
> events possible-duties from)))
>                                     (let ((new-only (filter (lambda
> (event) (not (event-exists? known-events-list event))) result)))
>                                       (store-events new-only))))
>                               (let ((week-of-year (get-week-of-year
> from)))
>                                 (send-email *SERVER* *MAIL-TO*
> *SENDER* (if dry-run (format "DRY RUN: ~a ~a" (duty-type-mail-
> subject type) week-of-year) (format "~a ~a" (duty-type-mail-subject
> type) week-of-year)) (insert-variables (duty-type-mail-body type)
> week-of-year (date-to-string from 'short) (date-to-string to 'short)
> possible-duties))))))))))))))))
>
>
> (define choose-possible-duties
>    (when debug (printf "Choose-possible-duties called with
> input:~nAvailable are: ~s~nAlready registered are: ~s~n" availables-
>    (let loop ((availables availables-list) (selected already-
> selected-list) (result '()))
>      (if (null? availables)
>          (reverse result)
>          (let ((current-date-availables (car availables)) (current-
> date-selected (car selected)))
>            (if (not (null? current-date-selected))
>                  (loop (cdr availables) (cdr selected) (cons already-
> entered result)))
>                (let-values (((preferred&available
> availablebutrathernot) (partition (lambda (person) (member person
> preferred-list)) current-date-availables)))
>                  (let ((amb-result (grouped-list-amb
> preferred&available availablebutrathernot)))
>                    (loop (cdr availables) (cdr selected) (cons amb-
> result result))))))))))
>
>
>
> _________________________________________________