[racket-dev] Replacing the splay tree token-tree% with an rb-tree?

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Tue Nov 13 22:12:11 EST 2012

You know I'm reminded that I did some (simpler) experiments on those
trees too and found that hundreds of thousands of cons cells (I think,
assuming I didn't count wrong) were being implemented when you put the
insertion point just before the first double quote character in
unit.rkt and then type an open quote (and then wait for the colorer to
finish (check the parens in the bottom right of the drracket window)
and over and over).

That's probably another stressful type thing to try for performance evaluation.

Robby

On Tue, Nov 13, 2012 at 5:21 PM, Danny Yoo <dyoo at hashcollision.org> wrote:
> Hi Robby,
>
> I've been looking at a profile of DrRacket when opening/indenting large
> files.  Using the statistical profiler, I've measured a large chunk of time
> (around 40%) is dedicated to the search operation of the existing
> syntax-color token-tree.  That seems too high, so I've been trying to
> understand what's going on there.
>
> I suspect that the access pattern that's applied on the tree during
> indentation resembles the worst-case scenario of splay trees: if we search
> each node in sequential order, the structure of a splay tree will linearize,
> ruining the dynamic balance.  Indentation uses a repeated, sequential search
> across the tokens.
>
> Concretely, if I add the following diagnostic to the token-tree% class:
>
> --------------------------------------------------
> +  (define (height node)
> +    (cond
> +     ((not node) 0)
> +     (else
> +      (add1 (max (height (node-left node))
> +                 (height (node-right node)))))))
> +
>        ;; Moves the node at key-position to the root
>        (define/public (search! key-position)
>          (when root
> -          (set! root (internal-search root key-position))))
> +          (set! root (internal-search root key-position))
> +          (printf "debug: size: ~a, height: ~a\n" (size root 0) (height
> root)))
> +
> --------------------------------------------------
>
> and open up a large file like collects/drracket/private/unit.rkt, I can see
> that the tree is much deeper than it deserves to be.  During a tabify-all,
> the tree does appear to get huge; it swings between a height in the hundreds
> into the thousands thousands.  That's much too tall, considering that there
> are only about 40000 tokens in the tree.
>
>
> I believe that a balanced BST should give us better performance.  I have
> been writing an alternative implementation of the token tree based on
> red-black trees.
>
>     https://github.com/dyoo/new-token-tree
>
> It is unfortunately missing the necessary splitting operations.  But I plan
> to implement split and concat with the algorithm described in:
>
>     http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109.4875
>
> so I don't anticipate fundamental problems.
>
>
> I wanted to surprise you by getting this all working by mid-week, but it's
> taking longer than I thought... :)  So I might as well run it by you to make
> sure the idea is sound before I go further on this track.

Posted on the dev mailing list.