[racket] help me speed up string split?

From: Ryan Davis (zenspider at gmail.com)
Date: Wed Jun 18 00:27:54 EDT 2014

I've got the following code that is stumping me. My original code was in Ruby and took only 3.9 seconds. I wrote the equivalent in Racket and was surprised when it came in at 25 seconds. With some help on IRC I got it down to ~12 seconds by cheating and using read. I'm guessing I'm missing something. All timing was done in Emacs with racket-mode's REPL; I figure variations under 2 seconds are due to GC or trivial differences. Suggestions?

#lang racket

(require (only-in 2htdp/batch-io read-words))

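;; Read the whole file into a single string.
(define (fast-read path)
  (with-input-from-file path (lambda () (port->string))))

;; Read the whole file into a single byte string (skips character decoding).
(define (fast-bytes path)
  (with-input-from-file path (lambda () (port->bytes))))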

(define path (path->string (expand-user-path "~/Desktop/X_train.txt")))

(time (take (map string->number (read-words path)) 10))                                ; 25154 ms
(time (take (map string->number (string-split (fast-read path))) 10))                  ; 27195 ms
(time (take (read (open-input-string (string-append "(" (fast-read path)   ")"))) 10)) ; 13683 ms
(time (take (read (open-input-bytes  (bytes-append #"(" (fast-bytes path) #")"))) 10)) ; 11930 ms

;; 66,006,256 bytes in the file
(string-length (fast-read path))
;; 4,124,472 floats in ascii format
(length (read (open-input-bytes (bytes-append #"(" (fast-bytes path) #")"))))
;; File contents look like:
;;  2.8858451e-001 -2.0294171e-002 -1.3290514e-001 -9.9527860e-001 -9.8311061e-001

;; CHICKEN Scheme
;; (require 'utils)
;; (time (length (read (open-input-string (string-append "(" (read-all path) ")")))))
;; 11.9 s

;; Ruby
;; % time ruby -e 'p File.read("X_train.txt").split(/\s+/).map(&:to_f).size'
;; real 0m3.882s
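
For anyone who wants to try the two approaches without the 66 MB file, here is a minimal sketch that runs both on a single line in the same format. The sample string is copied from the file excerpt above; the names sample, via-split, and via-read are made up for this illustration and are not part of the timed code.

```racket
#lang racket

;; One line in the same format as X_train.txt (from the excerpt above).
;; Hypothetical stand-in for the real file contents.
(define sample
  " 2.8858451e-001 -2.0294171e-002 -1.3290514e-001 -9.9527860e-001 -9.8311061e-001\n")

;; Approach 1: split on whitespace, then convert each token to a number.
(define via-split
  (map string->number (string-split sample)))

;; Approach 2: wrap the text in parens and let the reader parse it
;; as one list of numbers.
(define via-read
  (read (open-input-string (string-append "(" sample ")"))))

;; Compare the two results and show how many numbers were parsed.
(equal? via-split via-read)
(length via-split)
```

This only checks that the two approaches agree on a small input; it says nothing about where the time goes on the full file.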



Posted on the users mailing list.