<br><br><div class="gmail_quote">On Sun, Nov 18, 2012 at 4:24 PM, Pierpaolo Bernardi <span dir="ltr"><<a href="mailto:olopierpa@gmail.com" target="_blank">olopierpa@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
How does compare to builtin mutable hashes?<br></blockquote><div><br></div><div> </div><div>The following code represents a rough hashtable equivalent of what my rb code would be enabling (quick search for word by position):</div>
<div><br></div><div><br></div><div><div> ;; We might be curious as to the overhead of the tree structure.</div><div> ;; (of course, it's worth it because we are dealing with a dynamic set here.)</div><div> ;; Still, let's compare with inserting into a native hash:</div>
<div> (printf "just for comparison: inserting in a native hash:\n")</div><div> (let ([ht (make-hash)])</div><div> (time</div><div> (for/fold ([acc 0]) ([word (in-list (force all-words))])</div>
<div> (hash-set! ht acc word)</div><div> (+ acc (string-length word)))))</div></div><div><br></div><div><br></div><div>It's also useful to compare this vs the existing splay tree approach in syntax-color/token-tree:</div>
<div><br></div><div><div> (printf "just for comparison: inserting in the original token tree:\n")</div><div> (let ([t (new token-tree%)])</div><div> (time</div><div> (for ([word (in-list (force all-words))])</div>
<div> (insert-last-spec! t (string-length word) word))))</div></div><div><br></div><div><br></div><div>Here's the output of the insertion benchmark:</div><div><br></div><div> ;; (from the rb-tree insertion)</div>
<div> inserting 235886 words at the end...</div><div><div> cpu time: 204 real time: 205 gc time: 0</div><div><br></div><div> just for comparison: inserting in a native hash:</div><div> cpu time: 108 real time: 107 gc time: 0</div>
<div><br></div></div><div><div> just for comparison: inserting in the original token tree:</div><div> cpu time: 51 real time: 51 gc time: 0</div></div><div><br></div><div><br></div><div>So compared to the rb-tree version, insertions into the hashtable are about twice as fast. And as one might expect, the splay tree bulk insertion is the fastest: it doesn't deal with balance at insertion time and can it delay that work until we start searching the structure.</div>
<div><br></div><div>The rb-tree (as well as the original splay code) allows for much more flexible searches and dynamic updates into the sequence structure than the hash, so it's definitely worth the extra complexity. My conjecture is that the non-allocating nature of the rb-tree, as well as its always-balanced structure, will be worth the cost of the extra insertion time vs splays. I just hope I can show it! :) I'll see tomorrow when I code up a token-tree% implementation and start measuring times in DrRacket.</div>
<div><br></div><div>I just got done with the fundamental rb-tree data structures this evening. Thank goodness Robby strongly recommended me to write a randomized testing approach. It caught a lot of things I hadn't even considered.</div>
</div>