[racket-dev] Replacing the splay tree token-tree% with an rb-tree?

From: Danny Yoo (dyoo at hashcollision.org)
Date: Tue Nov 13 18:21:41 EST 2012

Hi Robby,

I've been looking at a profile of DrRacket when opening/indenting large
files.  Using the statistical profiler, I've measured a large chunk of time
(around 40%) is dedicated to the search operation of the existing
syntax-color token-tree.  That seems too high, so I've been trying to
understand what's going on there.

I suspect that the access pattern that's applied on the tree during
indentation resembles the worst-case scenario of splay trees: if we search
each node in sequential order, the structure of a splay tree will
linearize, ruining the dynamic balance.  Indentation uses a repeated,
sequential search across the tokens.

Concretely, if I add the following diagnostic to the token-tree% class:

--------------------------------------------------
+  (define (height node)
+    (cond
+     ((not node) 0)
+     (else
+      (add1 (max (height (node-left node))
+                 (height (node-right node)))))))
+
       ;; Moves the node at key-position to the root
       (define/public (search! key-position)
         (when root
-          (set! root (internal-search root key-position))))
+          (set! root (internal-search root key-position))
+          (printf "debug: size: ~a, height: ~a\n" (size root 0) (height
root)))
+
--------------------------------------------------

and open up a large file like collects/drracket/private/unit.rkt, I can see
that the tree is much deeper than it deserves to be.  During a tabify-all,
the tree does appear to get huge; it swings between a height in the
hundreds into the thousands thousands.  That's much too tall, considering
that there are only about 40000 tokens in the tree.


I believe that a balanced BST should give us better performance.  I have
been writing an alternative implementation of the token tree based on
red-black trees.

    https://github.com/dyoo/new-token-tree

It is unfortunately missing the necessary splitting operations.  But I plan
to implement split and concat with the algorithm described in:

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109.4875

so I don't anticipate fundamental problems.


I wanted to surprise you by getting this all working by mid-week, but it's
taking longer than I thought... :)  So I might as well run it by you to
make sure the idea is sound before I go further on this track.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/dev/archive/attachments/20121113/9cf38bbf/attachment.html>

Posted on the dev mailing list.