[racket-dev] Extflonum type for windows

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Mon Mar 18 10:53:01 EDT 2013

At Mon, 4 Mar 2013 19:06:32 +0300, Michael Filonenko wrote:
> The following pull request provides long double type (extflonum) on
> win32: https://github.com/plt/racket/pull/265

Merged --- with some changes, as usual...

> It seems that RacketCGC is supposed to be built without any
> third-party DLLs (longdouble.dll being one of them), so the
> following building process seems natural: [...]

I changed the way that "longdouble.dll" is loaded and linked so that
`extflonum-available?' returns #f if "longdouble.dll" isn't found.
Since extflonums are not needed to build Racket 3m, that solves the
build-order problem.

> 1. There is currently a problem with the code generation in
> foreign.rktc. [...]
> Probably there is a need for an option in foreign.rktc declarations
> to turn off casting in this particular case. Matthew, is it possible
> to add such an option?

Yes, done.

> 2. xform.rkt contains a long list of all long double arithmetic
> [non-]functions.

Instead of changing "xform.rkt", I added XFORM_NONGCING annotations
to the function prototypes.

> 3. In numstr.c I could not avoid separating the parsing of
> double and long double values.

That looks ok.


Another problem is that Windows seems unhappy with changing the
precision mode. The _control87() function apparently ignores an attempt
to change the mode, and when I set the mode using a FLDCW instruction,
some library function resets it back.

After a brief and unsuccessful attempt to track down where the mode is
reset, I changed the DLL and JIT to set the mode just before performing
extflonum arithmetic, and then set it back afterward.
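For readers unfamiliar with the technique, here is a minimal C sketch of the save/set/restore pattern. It assumes GCC or Clang on x86, where `long double` is the x87 80-bit format, and it uses the FNSTCW/FLDCW instructions directly rather than `_control87()`. The helper names (`get_cw`, `set_cw`, `extfl_add`) and the `PC_*` constants are hypothetical and are not taken from Racket's DLL or JIT. On targets without an x87 unit, the helpers do nothing.

```c
#include <assert.h>

/* Hypothetical helpers, not Racket's actual code.
   The x87 control word holds the precision-control (PC) field in
   bits 8-9: 00 = 24-bit, 10 = 53-bit (double), 11 = 64-bit (extended). */
typedef unsigned short fpu_cw_t;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define PC_MASK     0x0300
#define PC_DOUBLE   0x0200
#define PC_EXTENDED 0x0300

/* Read the current control word (FNSTCW). */
static fpu_cw_t get_cw(void) {
  fpu_cw_t cw;
  __asm__ __volatile__("fnstcw %0" : "=m"(cw) : : "memory");
  return cw;
}

/* Load a new control word (FLDCW). */
static void set_cw(fpu_cw_t cw) {
  __asm__ __volatile__("fldcw %0" : : "m"(cw) : "memory");
}
#else
/* No x87 unit: there is no precision mode to switch. */
#define PC_MASK     0
#define PC_DOUBLE   0
#define PC_EXTENDED 0
static fpu_cw_t get_cw(void) { return 0; }
static void set_cw(fpu_cw_t cw) { (void)cw; }
#endif

/* Extended-precision addition that does not trust the ambient mode:
   switch to 64-bit precision, add, then restore the caller's mode. */
static long double extfl_add(long double a, long double b) {
  /* volatile keeps the compiler from moving the addition outside the
     window in which the extended mode is active */
  volatile long double va = a, vb = b, r;
  fpu_cw_t saved = get_cw();
  set_cw((fpu_cw_t)((saved & ~PC_MASK) | PC_EXTENDED));
  /* Check that the mode change actually took effect. */
  assert((get_cw() & PC_MASK) == PC_EXTENDED);
  r = va + vb;
  set_cw(saved);
  return r;
}
```

With the precision field set to 53 bits (the Windows default), `extfl_add(1.0L, 0x1p-60L)` on x86 still returns a value distinct from `1.0L`, whereas an unwrapped `long double` addition performed in that mode would round the sum back to `1.0L`.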

Of course, there can be a cost to changing the mode at such a fine
granularity. When I run the program below in Mac OS X 64-bit
mode, I get

 'flonum
 cpu time: 483 real time: 483 gc time: 0
 1.0
 cpu time: 474 real time: 474 gc time: 0
 1.0
 cpu time: 789 real time: 787 gc time: 0
 1.0
 'extflonum
 cpu time: 641 real time: 640 gc time: 0
 1.0t0
 cpu time: 885 real time: 884 gc time: 0
 1.0t0
 cpu time: 959 real time: 958 gc time: 0
 1.0t0

but if I force the JIT to set the control word on every extflonum
operation, I get

 ....
 'extflonum
 cpu time: 806 real time: 806 gc time: 0
 1.0t0
 cpu time: 1054 real time: 1053 gc time: 0
 1.0t0
 cpu time: 959 real time: 957 gc time: 0
 1.0t0

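It looks like division is slow enough to mask the overhead of setting
the mode (959 ms cpu time either way). For addition and subtraction,
the overhead (roughly 165 ms per 100 million iterations) seems to be on
the order of the cost of switching from flonums to extflonums in the
first place (641 ms vs. 483 ms for the first loop).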

The JIT could be improved to avoid switching the mode back and forth
between consecutive extflonum operations, but does the cost of this
approach look reasonable as a start?
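The JIT improvement mentioned above could look roughly like this hypothetical C sketch. It mirrors the first extflonum benchmark loop, `(extfl- (extfl+ v v) v)`, and compares switching the mode around every operation with switching it once around a run of consecutive operations. It counts control-word loads instead of measuring time. All names are illustrative rather than Racket's, and it assumes GCC or Clang on x86 for the FNSTCW/FLDCW path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; names are illustrative, not Racket's JIT. */
typedef unsigned short fpu_cw_t;

/* Number of control-word loads performed, to compare the strategies. */
static unsigned long cw_loads = 0;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define PC_MASK     0x0300  /* x87 precision-control field, bits 8-9 */
#define PC_EXTENDED 0x0300  /* 64-bit significand */

static fpu_cw_t get_cw(void) {
  fpu_cw_t cw;
  __asm__ __volatile__("fnstcw %0" : "=m"(cw) : : "memory");
  return cw;
}

static void set_cw(fpu_cw_t cw) {
  ++cw_loads;
  __asm__ __volatile__("fldcw %0" : : "m"(cw) : "memory");
}
#else
/* No x87 unit: keep the counting so both strategies can be compared. */
#define PC_MASK     0
#define PC_EXTENDED 0
static fpu_cw_t get_cw(void) { return 0; }
static void set_cw(fpu_cw_t cw) { (void)cw; ++cw_loads; }
#endif

/* Switch to extended precision and return the caller's control word. */
static fpu_cw_t enter_extended(void) {
  fpu_cw_t saved = get_cw();
  set_cw((fpu_cw_t)((saved & ~PC_MASK) | PC_EXTENDED));
  assert((get_cw() & PC_MASK) == PC_EXTENDED);
  return saved;
}

/* Strategy 1: every operation sets the mode and then restores it. */
static long double add_per_op(long double a, long double b) {
  volatile long double va = a, vb = b, r;
  fpu_cw_t saved = enter_extended();
  r = va + vb;
  set_cw(saved);
  return r;
}

static long double sub_per_op(long double a, long double b) {
  volatile long double va = a, vb = b, r;
  fpu_cw_t saved = enter_extended();
  r = va - vb;
  set_cw(saved);
  return r;
}

/* Same shape as (extfl- (extfl+ v v) v): four loads per iteration. */
static long double loop_per_op(long double v, size_t n) {
  for (size_t i = 0; i < n; i++)
    v = sub_per_op(add_per_op(v, v), v);
  return v;
}

/* Strategy 2: set the mode once around the whole run of operations. */
static long double loop_batched(long double v, size_t n) {
  volatile long double acc = v;
  fpu_cw_t saved = enter_extended();
  for (size_t i = 0; i < n; i++)
    acc = (acc + acc) - acc;
  set_cw(saved);
  return acc;
}
```

In this sketch the batched loop performs two control-word loads regardless of the iteration count, versus four per iteration for the per-operation version. How much of the timing difference above that would recover is something only a benchmark can show.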

----------------------------------------

#lang racket
(require racket/flonum
         racket/extflonum)

'flonum
(time
 (for/fold ([v 1.0]) ([i (in-range 100000000)])
   (fl- (fl+ v v) v)))
(time
 (for/fold ([v 1.0]) ([i (in-range 100000000)])
   (fl- (fl+ v v) 1.0)))
(time
 (for/fold ([v 1.0]) ([i (in-range 100000000)])
   (fl/ (fl* v v) v)))

'extflonum
(time
 (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
   (extfl- (extfl+ v v) v)))
(time
 (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
   (extfl- (extfl+ v v) 1.0t0)))
(time
 (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
   (extfl/ (extfl* v v) v)))


Posted on the dev mailing list.