[racket-dev] long double for racket

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sun Dec 23 08:10:14 EST 2012

Thanks!

I can work with this, though I'd like to start merging after the next
release branch on January 7.

Much of the cut-and-paste is difficult to abstract over, but I worry
about the amount of cut-and-paste in the JIT. The inlined arithmetic
functions, like scheme_generate_arith(), seem like too much to
duplicate. I'm willing to work on replacing cut-and-paste with
abstraction when I merge the changes, but anything you can do to reduce
the cut-and-paste would be appreciated.

I don't think we should use "l" in literal numbers to mean extfl,
because those literals are currently numbers. In other words, using "l"
for extfls would be a backward-incompatible change. I think we may need
to leave "l" as `double' and pick a different letter to indicate `long
double'.

At Sat, 22 Dec 2012 20:24:15 +0300, Michael Filonenko wrote:
> Hi all.
> 
> Modern FPUs can accelerate three types of floating-point arithmetic:
> single (32 bit), double (64 bit), long double (80 bit).
> 
> Currently Racket supports single and double precisions (flonum) and
> is able to JIT operations on them.
> 
> The current task is adding a long double type (hereinafter extflonum)
> and its arithmetic to Racket, along with the
> corresponding vector type (hereinafter extflvector).
> 
> Here we go:
> 
> "Long double" support requires modifying three parts of Racket:
>   - C core;
>   - JIT;
>   - Racket library.
> 
> Also, long double arithmetic requires setting the "extended mode" flag on
> the FPU, which forces the FPU to use 80-bit registers. The side effect of
> that flag is that the FPU gives slightly different (more accurate, but
> not IEEE-compliant) results for 64-bit operations. That is usually
> not a problem on machines that have SSE2 (introduced with the Pentium 4 in
> 2000). In the presence of SSE2, Racket performs 64-bit operations solely
> on 64-bit SSE2 registers (see MZ_USE_JIT_SSE and -mfpmath=sse), so the
> results are IEEE-compliant. 80-bit operations are done on the FPU anyway,
> as SSE2 cannot do them. Therefore, by setting the "extended mode" on
> the FPU, we introduce a subtle difference in ordinary flonums, but only on
> old machines that do not have SSE2. Also, on PowerPC machines
> extflonum support will be disabled, because they
> do not have 80-bit registers.
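
A minimal C sketch of the precision gap discussed above, assuming a binary floating-point format. The function names are made up for illustration and are not part of the patch. It adds a value smaller than half an ulp of 1.0 (in double) and checks whether the result can still be told apart from 1.0 once it is stored as a double or as a long double.

```c
#include <float.h>

/* Nonzero when long double has more significand bits than double
   (true for the 80-bit x87 format, false where long double == double). */
int long_double_is_wider(void)
{
  return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Computes 2^-(DBL_MANT_DIG + 1) by repeated halving (no libm needed).
   This is below half an ulp of 1.0 in double precision. */
static double tiny_step(void)
{
  double t = 1.0;
  int i;
  for (i = 0; i < DBL_MANT_DIG + 1; i++)
    t /= 2.0;
  return t;
}

/* Nonzero if 1.0 + tiny differs from 1.0 after rounding to double.
   The volatile store forces the value out of any wider register. */
int tiny_survives_in_double(void)
{
  volatile double sum = 1.0 + tiny_step();
  return sum != 1.0;
}

/* Same check, but the sum is kept as a long double. */
int tiny_survives_in_long_double(void)
{
  volatile long double sum = 1.0L + (long double)tiny_step();
  return sum != 1.0L;
}
```

With an 80-bit long double and the FPU in extended mode, the second check is expected to succeed while the first fails. If the FPU control word limits precision to 53 bits, the long double check can fail as well, which is why the "extended mode" flag matters.
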
> 
> As for the C core: extended floating-point arithmetic is supported by
> the gcc compiler on Linux, so the build scripts for Linux do not require any
> change. Windows is another story. MSVC, commonly used to build Racket
> for Windows, does not support anything beyond double precision. So we
> are forced to use gcc for the Windows build, too. Cygwin's gcc is not a
> good option for us, because it rules out using the standard
> Windows GUI libraries etc. The other options are mingw (32-bit only)
> and mingw-w64 (both 32 and 64 bit). Many thanks to Matthew Flatt for
> his effort to port Racket to mingw.
> (Yet another option is the Intel compiler, but I have not looked into it yet.)
> 
> Extflonums are tested on Linux x86_64, and Windows 7 x86 (VirtualBox).
> 
> I try to keep my modifications separate from other code. That requires
> much copy-paste, but hopefully makes my code easier to understand.
> 
> === Miscellaneous notes:
> 
> * An extflonum has a text representation with an "l" suffix
> (similar to the "f" suffix for single flonums).
> 
> * Extflonums are not integrated into the existing Racket arithmetic (so (+
> 123.0l0 513.0l0) is not possible). Extflonums have their own set of
> functions: extfl+, extfl-, extfl*, unsafe-extfl+, unsafe-extfl-, etc.
> (similarly to flonums). The only Racket functions that were modified
> are the reader, the printer, and "equal?" (see below).
> 
> * The macro that guards the extflonum code is MZ_LONG_DOUBLE.
> The config definition is MZ_USE_LONG_DOUBLE, which enables MZ_LONG_DOUBLE.
> The configuration scripts were not modified.
> 
> * The macros MZ_LONG_DOUBLE_DISABLED and USE_EXTFLONUM_UNBOXING should be
> left undefined; they are for the unboxing optimization, which is planned for the future.
> 
> Changes:
> 
> * The C core was extended with the following types and constants:
>   C structs:
>     Scheme_Long_Double (extflonum)
>     Scheme_Long_Double_Vector (extflvector)
>   constants:
>     scheme_long_double_type
>     scheme_extflvector_type
> 
> * The Racket reader and printer were modified to read and print extflonums (with
> suffix "l0"). The Racket printer was also modified to print extflvectors
> (with "#extfl" prefix). Racket's "equal?" function was modified to
> support extflonums (I needed this so that rackunit works with
> extflonums).
> 
> * The xform compiler was extended to handle long double functions,
> such as cosl, sinl, floorl, etc.
> 
> * GNU lightning was extended with explicit JIT FPU operations: fp-extfpu.h
> 
> * The Racket collections were extended with the racket/extflonum.rkt module,
> which exports both safe and unsafe functions for extflonums and
> extflvectors.
> 
> === Notes on JIT changes
> 
> The JIT contains two optimizations for flonums.
>   The first is compiling Racket code with inlined flonum functions.
>   The second is unboxing flonums to temporary storage when
>   possible, avoiding the overhead of a Scheme_Double object.
> 
> I have added only the first optimization for extflonums, by
> copy-pasting and modifying the original flonum
> code. Unboxing extflonums is not implemented yet.
> 
> With gcc, long double occupies 12 bytes on x86 and 16 bytes on x86_64.
> That matters for the vector accessors that the JIT generates for extflvectors.
> The element size is defined by the following code:
> 
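>   #ifdef MZ_LONG_DOUBLE
>   # ifdef MZ_USE_JIT_X86_64
>   /* 16-byte elements: the index can be scaled with a shift */
>   #  define JIT_LOG_LONG_DOUBLE_SIZE 4
>   #  define JIT_LONG_DOUBLE_SIZE (1 << JIT_LOG_LONG_DOUBLE_SIZE)
>   # else
>   /* 12 is not a power of two: the index is scaled with a multiply */
>   #  define JIT_LOG_LONG_DOUBLE_SIZE not_implemented
>   #  define JIT_LONG_DOUBLE_SIZE 12
>   # endif
>   #endif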
> 
> With that, the JIT code that scales the index is:
> 
>   #ifdef MZ_USE_JIT_X86_64
>       jit_lshi_ul(JIT_V1, JIT_V1, JIT_LOG_LONG_DOUBLE_SIZE);
>   #else
>       jit_muli_ui(JIT_V1, JIT_V1, JIT_LONG_DOUBLE_SIZE);
>   #endif
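
The shift-or-multiply choice above can be sketched in plain C. This is only an illustration under the assumption that elements are packed at sizeof(long double) intervals; extfl_element_offset is a hypothetical helper, not a function from the patch or from GNU lightning.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset of element `i` in a packed array of
   long double, mirroring the shift-or-multiply choice in the JIT code. */
size_t extfl_element_offset(size_t i)
{
  const size_t size = sizeof(long double);

  if ((size & (size - 1)) == 0) {
    /* Power of two (16 on x86_64): find log2(size) and shift. */
    unsigned log_size = 0;
    while (((size_t)1 << log_size) < size)
      log_size++;
    return i << log_size;
  }

  /* Not a power of two (12 on 32-bit x86 with gcc): multiply instead. */
  return i * size;
}
```

Both branches give the same offset as i * sizeof(long double); the shift is just the cheaper encoding when the size allows it.
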
> 
> The JIT sometimes retains flonums in a special buffer.
> I reuse it in the following not-so-nice way:
> 
>   #ifdef MZ_LONG_DOUBLE
>   long double *scheme_mz_retain_long_double(mz_jit_state *jitter, long double ld)
>   {
>     /* TODO dirty hack to save long double into two cells of double */
>     void *p;
>     if (jitter->retain_start)
>       memcpy(&jitter->retain_double_start[jitter->retained_double], &ld,
>              sizeof(long double));
>     p = jitter->retain_double_start + jitter->retained_double;
>     jitter->retained_double++;
>     jitter->retained_double++;
>     return p;
>   }
>   #endif
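
The same idea, storing a long double across double-sized cells, can be sketched outside the JIT. The retain_buffer type and both functions below are hypothetical stand-ins, not the JIT's real data structures; the cell count is computed from the type sizes instead of being fixed at two.

```c
#include <stddef.h>
#include <string.h>

/* Number of double-sized cells needed to hold one long double
   (2 with gcc on x86 and x86_64, where sizeof(long double) is 12 or 16). */
#define CELLS_PER_LONG_DOUBLE \
  ((sizeof(long double) + sizeof(double) - 1) / sizeof(double))

/* Hypothetical buffer, standing in for the JIT's retained-double area. */
typedef struct {
  double cells[8];
  size_t used;      /* number of cells already taken */
} retain_buffer;

/* Copies `ld` into the next free cells.
   Returns the index of the first cell, or (size_t)-1 when the buffer is full. */
size_t retain_long_double(retain_buffer *buf, long double ld)
{
  size_t at = buf->used;
  if (at + CELLS_PER_LONG_DOUBLE > sizeof(buf->cells) / sizeof(buf->cells[0]))
    return (size_t)-1;
  memcpy(&buf->cells[at], &ld, sizeof(long double));
  buf->used += CELLS_PER_LONG_DOUBLE;
  return at;
}

/* Reads a long double back; memcpy avoids relying on the alignment of the cells. */
long double retained_long_double(const retain_buffer *buf, size_t at)
{
  long double ld;
  memcpy(&ld, &buf->cells[at], sizeof(long double));
  return ld;
}
```

Reading the value back through memcpy, instead of through a long double pointer into the double array, sidesteps the question of whether the cells are aligned strictly enough for long double.
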
> 
> I have modified the place where a flonum is unboxed from the Scheme_Double
> structure. I used the jitter->unbox_extflonum flag for this.
> 
>   #ifdef MZ_LONG_DOUBLE
>     if (jitter->unbox_extflonum) {
>       fpr0 = JIT_FPU_FPR_0(jitter->unbox_depth);
>       jit_fpu_ldxi_ld_fppush(fpr0, target,
>                              &((Scheme_Long_Double*)0x0)->long_double_val);
>       jitter->unbox_depth++;
>     } else
>   #endif
>     {
>       fpr0 = JIT_FPR_0(jitter->unbox_depth);
>       jit_ldxi_d_fppush(fpr0, target, &((Scheme_Double *)0x0)->double_val);
>       jitter->unbox_depth++;
>     }
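
The &((T *)0x0)->field expression in the snippet above yields the byte offset of the field, which the load then adds to the base pointer. A minimal sketch of the same computation with the standard offsetof macro follows; boxed_extfl is a hypothetical struct, and the real Scheme_Long_Double layout may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical boxed value; the real Scheme_Long_Double layout may differ. */
typedef struct {
  short type_tag;
  long double long_double_val;
} boxed_extfl;

/* Standard equivalent of the &((T *)0)->field idiom. */
size_t boxed_extfl_payload_offset(void)
{
  return offsetof(boxed_extfl, long_double_val);
}

/* Loads the payload from base pointer + offset, as the generated code would. */
long double load_boxed_extfl(const void *box)
{
  long double v;
  memcpy(&v, (const char *)box + boxed_extfl_payload_offset(), sizeof v);
  return v;
}
```
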
> 
> === Tests
> 
> There are tests for extflonums and extflvectors, done by
> copy-pasting the tests for flonums and flvectors:
> 
>   collects/racket/extflonum.rkt
>   collects/tests/racket/extfl-unsafe.rktl
> 
> The patch is attached; I think it can be applied to the development
> branch of Racket.
> 
> === Documentation
> 
> There is documentation in the file
> collects/scribblings/reference/extflonums.scrbl,
> which is mostly copy-paste from flonum.scrbl with small
> adaptations for extflonums.


Posted on the dev mailing list.