[racket-dev] gc vs assignment
Here's a program that tries to expose various costs.
On my machine, the output is:
'cons-of-cXr+barrier-set!
cpu time: 13137 real time: 13206 gc time: 552
'cons-of-cXr+free-set!
cpu time: 12832 real time: 12995 gc time: 541
'cons-of-cXr
cpu time: 10023 real time: 10103 gc time: 526
'cons-of-unsafe-cXr
cpu time: 9363 real time: 9456 gc time: 514
'cons-of-consts
cpu time: 9026 real time: 9115 gc time: 538
'set-mcXr-of-mcXr
cpu time: 7777 real time: 7772 gc time: 0
'set-mcXr-of-unsafe-mcXr
cpu time: 6855 real time: 6852 gc time: 0
'set-mcXr-of-const
cpu time: 5630 real time: 5650 gc time: 0
'unsafe-set-mcXr-of-const
cpu time: 4637 real time: 4634 gc time: 0
'nothing
cpu time: 3715 real time: 3713 gc time: 0
My overall conclusions:
* When the GC is well-tuned, it's difficult to slow a program down by
using mutation --- especially relative to all the other ways you can
slow a program down.
* If the mutation versus allocation choice is about arriving at the
same data structure in the end, mutation tends to win. Even if
allocation were completely free, you can usually mutate to the
desired structure by writing fewer bytes than if you have to
initialize an object from scratch.
I tried experimenting with other systems, and to the best of my ability
to try them, I didn't find significantly different results.
So, I don't think you have to worry about especially counter-intuitive
performance effects from mutation. (There are plenty of worries left
for mutation, of course.)
Meanwhile, our GC could probably auto-tune better; try commenting out
the definition of `space'. The default configuration is really tuned
for a larger old-generation heap.
----------------------------------------
#lang racket/base
(require ffi/unsafe
racket/unsafe/ops)
;; Effectively tunes the GC to have a larger nursery:
(define space (make-bytes (* 32 (expt 2 20))))
(define iters 1000000000)
(define (make-barrier-bytes len)
(malloc len 'nonatomic))
(define-syntax-rule (timeit expr)
(begin
(collect-garbage)
(collect-garbage)
(time (void expr))))
(define bsize 4096)
;; Try to expose the cost of a write barrier, including its
;; effect on GC time:
'cons-of-cXr+barrier-set!
(timeit
(let ([b (make-barrier-bytes bsize)])
(for/fold ([l (cons #f #f)]) ([i (in-range iters)])
(unsafe-bytes-set! b 0 10)
(cons (car l) (cdr l)))))
;; For comparion with the first one, write to a normal byte string
;; that holds atomic data (and therefore doesn't trigger a write barrier)
'cons-of-cXr+free-set!
(timeit
(let ([b (make-bytes bsize)])
(for/fold ([l (cons #f #f)]) ([i (in-range iters)])
(unsafe-bytes-set! b 0 10)
(cons (car l) (cdr l)))))
;; Allocate on every iteration, also using checked car & cdr
'cons-of-cXr
(timeit
(for/fold ([l (cons #f #f)]) ([i (in-range iters)])
(cons (car l) (cdr l))))
;; For compariosn, use unchecked car & cdr
'cons-of-unsafe-cXr
(timeit
(for/fold ([l (cons #f #f)]) ([i (in-range iters)])
(cons (unsafe-car l) (unsafe-cdr l))))
;; For comparison, used constants instead of looking in a pair
'cons-of-consts
(timeit
(for/fold ([l (cons #f #f)]) ([i (in-range iters)])
(cons 1 2)))
;; Mutate the same cell every time
'set-mcXr-of-mcXr
(timeit
(for/fold ([m (mcons 1 2)]) ([i (in-range iters)])
(set-mcar! m (mcdr m))
(set-mcdr! m (mcar m))
m))
;; For comparion, extarct new cell values using unchecked ops
'set-mcXr-of-unsafe-mcXr
(timeit
(for/fold ([m (mcons 1 2)]) ([i (in-range iters)])
(set-mcar! m (unsafe-mcdr m))
(set-mcdr! m (unsafe-mcar m))
m))
;; For comparison, use constants for new values
'set-mcXr-of-const
(timeit
(for/fold ([m (mcons 1 2)]) ([i (in-range iters)])
(set-mcar! m 1)
(set-mcdr! m 2)
m))
;; For comparison, use unchecked mutation
'unsafe-set-mcXr-of-const
(timeit
(for/fold ([m (cons 1 2)]) ([i (in-range iters)])
(unsafe-set-mcar! m 1)
(unsafe-set-mcdr! m 2)
m))
;; For comparsion, do nothing
'nothing
(timeit
(for/fold ([m (mcons 1 2)]) ([i (in-range iters)])
m))