[racket-dev] Extflonum type for windows
Matthew, thank you very much.
It seems that switching the precision mode on every function call,
as your changes do, is not required on 32-bit Windows. I have prepared
a small pull request that fixes this. All tests pass on my VirtualBox
machines (both 32-bit and 64-bit), and our own FFI-related tests pass too.
https://github.com/plt/racket/pull/280
A small note about the extflonum FFI on Windows:
long double can be used on Windows with the gcc compilers (mingw,
mingw-w64), and a DLL compiled that way can be used through the Racket
FFI. The one caveat is that long double data must be 16-byte aligned,
so on win32 you should pass the gcc command-line option
-m128bit-long-double. On win64 the alignment is 16 bytes by default.
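
To make the alignment note concrete, here is a minimal sketch of a C
source file that could be compiled into such a DLL. The file name
extsum.c and the functions ext_sum and ext_long_double_size are
hypothetical names chosen for illustration; they are not part of Racket
or of the pull request. On win32 I would expect to build it with
something like "gcc -shared -m128bit-long-double -o extsum.dll extsum.c"
(the -m128bit-long-double flag can be dropped on win64).

```c
/* extsum.c: hypothetical example, not part of Racket. */
#include <stddef.h>

/* Sum n long double values. With mingw on win32, build with
   -m128bit-long-double so each element occupies 16 bytes. */
long double ext_sum(const long double *xs, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++)
        acc += xs[i];
    return acc;
}

/* Report the storage size of long double as compiled, so a caller
   can check that the DLL was built with the expected layout. */
size_t ext_long_double_size(void)
{
    return sizeof(long double);
}
```

Calling ext_long_double_size first is a cheap way to confirm that the
DLL really was built with 16-byte long doubles before passing any
extflonum data to it.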
Please let me know if I can help further with testing or documentation.
2013/3/18 Matthew Flatt <mflatt at cs.utah.edu>:
> At Mon, 4 Mar 2013 19:06:32 +0300, Michael Filonenko wrote:
>> The following pull request provides long double type (extflonum) on
>> win32: https://github.com/plt/racket/pull/265
>
> Merged --- with some changes, as usual...
>
>> It seems that RacketCGC is supposed to be built without any
>> third-party DLLs (longdouble.dll being one of them), so the
>> following building process seems natural: [...]
>
> I changed the way that "longdouble.dll" is loaded and linked so that
> `extflonum-available?' returns #f if "longdouble.dll" isn't found.
> Since extflonums are not needed to build Racket 3m, that solves the
> build-order problem.
>
>> 1. There is currently a problem with the code generation in
>> foreign.rktc. [...]
>> Probably there is a need for an option in foreign.rktc declarations
>> to turn off casting in this particular case. Matthew, is it possible
>> to add such an option?
>
> Yes, done.
>
>> 2. xform.rkt contains a long list of all long double arithmetic
>> [non-]functions.
>
> Instead of changing "xform.rkt", I added XFORM_NONGCING annotations
> to the function prototypes.
>
>> 3. In numstr.c I could not avoid separating the parsing of
>> double and long double values.
>
> That looks ok.
>
>
> Another problem is that Windows seems unhappy with changing the
> precision mode. The _control87() function apparently ignores an attempt
> to change the mode, and when I set the mode using a FLDCW instruction,
> some library function resets it back.
>
> After a brief and unsuccessful attempt to track down where the mode is
> reset, I changed the DLL and JIT to set the mode just before performing
> extflonum arithmetic, and then set it back afterward.
>
> Of course, there can be a cost to changing the mode at such a fine
> granularity. When I run the first program below in Mac OS X 64-bit
> mode, I get
>
> 'flonum
> cpu time: 483 real time: 483 gc time: 0
> 1.0
> cpu time: 474 real time: 474 gc time: 0
> 1.0
> cpu time: 789 real time: 787 gc time: 0
> 1.0
> 'extflonum
> cpu time: 641 real time: 640 gc time: 0
> 1.0t0
> cpu time: 885 real time: 884 gc time: 0
> 1.0t0
> cpu time: 959 real time: 958 gc time: 0
> 1.0t0
>
> but if I force the JIT to set the control word on every extflonum
> operation, I get
>
> ....
> 'extflonum
> cpu time: 806 real time: 806 gc time: 0
> 1.0t0
> cpu time: 1054 real time: 1053 gc time: 0
> 1.0t0
> cpu time: 959 real time: 957 gc time: 0
> 1.0t0
>
> It looks like division is slow enough to mask the overhead of setting
> the mode. For addition and subtraction, the overhead seems to be on the
> order of the cost of switching flonums to extflonums.
>
> The JIT could be improved to avoid switching between consecutive
> operations, but does the cost of this approach look reasonable as a
> start?
>
> ----------------------------------------
>
> #lang racket
> (require racket/flonum
>          racket/extflonum)
>
> 'flonum
> (time
>  (for/fold ([v 1.0]) ([i (in-range 100000000)])
>    (fl- (fl+ v v) v)))
> (time
>  (for/fold ([v 1.0]) ([i (in-range 100000000)])
>    (fl- (fl+ v v) 1.0)))
> (time
>  (for/fold ([v 1.0]) ([i (in-range 100000000)])
>    (fl/ (fl* v v) v)))
>
> 'extflonum
> (time
>  (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
>    (extfl- (extfl+ v v) v)))
> (time
>  (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
>    (extfl- (extfl+ v v) 1.0t0)))
> (time
>  (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
>    (extfl/ (extfl* v v) v)))
>
>
--
With best regards, Michael Filonenko