[plt-scheme] more possible planet functions

From: Joshua Zucker (joshua.zucker at gmail.com)
Date: Wed Feb 7 19:48:33 EST 2007

John and all,
Could you recommend some good reading on breadth-first searches for
me, or good references on implementations of them in DrScheme?

One problem I'm currently working on seems to require more about BFS
than I know or understand.

Specifically: the breadth of the full search is WAY too big (factorial
size in n, with n arbitrarily large ...).

So, to search for "pretty good" nodes of the tree, my idea is to keep
the best k at each step and search one more node from there, then keep
the best k of those, and so on until eventually I've found the best
items I can.

But if I allow the breadth to get too big (with the sorting needed to
keep the best k -- or is there a smarter way to keep the best k
instead of sorting and then keeping the first k?  See, I'm lazy, and
sort! is built-in which sounded like a good idea for memory use) ...
well, anyway, when the breadth gets too big it starts spending 90+% of
its time on swapping instead of actually using the CPU to evaluate
more nodes of the search tree.

More specifically:
I have a moderately-cheap "rough guess at quality" function.
I have a tree where I start with some number of possible start-nodes
(there are O(n!) of them too, for n pretty big, but there are some
that are clearly a lot better than others so it's easy to
heuristically pick the good ones here).
Then each node has O(n) descendants, roughly speaking.
My goal is to find the "farthest' node in the tree -- that is, the one
that's the most generations from one of the starting points (so I
guess it's really a forest, not a tree ... but I can pick just one
starting point at a time).
So I pretty quickly get up to several thousand nodes in my search as
I'm hunting .. then I keep the "best' several thousand (based on the
rough guess at quality), and go one more node from there, and repeat.
Eventually the branching factor starts to die down and things go a
little faster, and ultimately I'm left with just one "deepest" node.

But I know I'm not finding the optimal spot.  One approach is to work
on improving my rough guess at quality, but my attempts there have led
nowhere -- in fact to fancier, slower, and less effective guesses.

So the other approach is to understand BFS better -- how to keep more
than just several thousand, say several million, without dying in an
agony of swap.  Or ways of dividing up the search in other ways ...

I'm sure I can get O(1) improvements in lots of ways, like the data
structure I use to store each node (right now a vector of small
integers, but I bet it uses less memory if it were bytes instead of
vector, for example?) but I think I can get more improvement from
doing smarter things than sorting and keeping the first k at each step
down.  But I'm too ignorant to know what those things ought to be.

Fortunately, ignorance is curable!

--Joshua Zucker

Posted on the users mailing list.