[racket-dev] long double for racket

From: Michael Filonenko (mfilonenko at ipa.nw.ru)
Date: Sat Dec 22 12:24:15 EST 2012

Hi all.

Modern FPUs can accelerate three types of floating-point arithmetic:
single (32 bit), double (64 bit), long double (80 bit).
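
To make the difference between the last two formats concrete, here is a minimal sketch in plain C. It is not part of the patch; the function names are hypothetical and it assumes a C99 compiler (for hexadecimal floating-point literals) and an IEEE 64-bit double.

```c
#include <float.h>

/* Extra significand bits that long double has over double on this
   platform: 11 for the x87 80-bit format (64 vs 53), and 0 where the
   compiler maps long double to double. */
int extfl_extra_bits(void)
{
  return LDBL_MANT_DIG - DBL_MANT_DIG;
}

/* Returns 1 if adding 2^-60 to 1.0 changes the value in double.
   volatile forces every intermediate to be stored as a real double
   instead of being kept in a wider register. */
int double_keeps_tiny(void)
{
  volatile double one = 1.0, tiny = 0x1p-60;
  volatile double sum = one + tiny;
  return sum != 1.0;
}

/* Same experiment in long double: the increment survives only if the
   significand has at least 61 bits. */
int long_double_keeps_tiny(void)
{
  volatile long double one = 1.0L, tiny = 0x1p-60L;
  volatile long double sum = one + tiny;
  return sum != 1.0L;
}
```

With gcc on x86, double_keeps_tiny() should report that the increment is lost, while long_double_keeps_tiny() should report that it is kept.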

Currently Racket supports single and double precisions (flonum) and
is able to JIT operations on them.

The work currently in progress is adding long double arithmetic
(hereinafter extflonum) to Racket, along with the
corresponding vector type (hereinafter extflvector).

Here we go:

"Long double" support requires modifying three parts of Racket:
  - C core;
  - JIT;
  - Racket library.

Also, long double arithmetic requires setting the "extended mode" flag on
the FPU, which forces the FPU to use 80-bit registers. A side effect of
that flag is that the FPU gives slightly different (more accurate, but
not IEEE-compliant) results for 64-bit operations. That is usually
not a problem on machines that have SSE2 (introduced with the Pentium 4 in
2000). In the presence of SSE2, Racket performs 64-bit operations solely
on 64-bit SSE2 registers (see MZ_USE_JIT_SSE and -mfpmath=sse), so the
results are IEEE-compliant. 80-bit operations are done on the FPU anyway,
as SSE2 cannot do them. Therefore, by setting the "extended mode" on
the FPU, we introduce a subtle difference in ordinary flonums, but only on
old machines that do not have SSE2. Also, on PowerPC machines
extflonum support will be disabled entirely, because they
do not have 80-bit registers.
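
For readers unfamiliar with the "extended mode" flag: it is the precision-control field (bits 8 and 9) of the x87 control word. The following is a minimal sketch of reading and setting it with gcc-style inline assembly. It is not the code from the patch; all names here are hypothetical, and on anything other than a gcc-compatible x86 build the functions just report that the feature is unavailable.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# define HAVE_X87 1
#else
# define HAVE_X87 0
#endif

#define X87_PC_MASK     0x0300  /* precision-control field, bits 8-9 */
#define X87_PC_SINGLE   0x0000  /* 00b: 24-bit significand */
#define X87_PC_DOUBLE   0x0200  /* 10b: 53-bit significand */
#define X87_PC_EXTENDED 0x0300  /* 11b: 64-bit significand */

#if HAVE_X87
static unsigned short x87_get_cw(void)
{
  unsigned short cw;
  __asm__ __volatile__ ("fnstcw %0" : "=m" (cw));
  return cw;
}

static void x87_set_cw(unsigned short cw)
{
  __asm__ __volatile__ ("fldcw %0" : : "m" (cw));
}
#endif

/* Significand width currently selected on the x87 unit: 24, 53 or 64,
   or -1 if it cannot be determined (reserved value or not an x86 build). */
int x87_precision_bits(void)
{
#if HAVE_X87
  switch (x87_get_cw() & X87_PC_MASK) {
  case X87_PC_SINGLE:   return 24;
  case X87_PC_DOUBLE:   return 53;
  case X87_PC_EXTENDED: return 64;
  }
#endif
  return -1;
}

/* Select extended precision. Returns 1 on success, 0 if unsupported. */
int x87_use_extended_precision(void)
{
#if HAVE_X87
  x87_set_cw((unsigned short)(x87_get_cw() | X87_PC_EXTENDED));
  return 1;
#else
  return 0;
#endif
}
```

The control word is per-thread state, so a real implementation would have to set it in every thread that performs extflonum arithmetic.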

As for the C core: extended floating-point arithmetic is supported by
the gcc compiler on Linux, so the build scripts for Linux do not require any
change. Windows is another story. MSVC, commonly used to build Racket
for Windows, does not support anything beyond double precision. So we
are forced to use gcc for the Windows build, too. Cygwin's gcc is not a
good option for us, because it rules out using the standard
Windows GUI libraries etc. The other options are mingw (32-bit only)
and mingw-w64 (both 32 and 64 bit). Many thanks to Matthew Flatt for
his effort to port Racket to mingw.
(Yet another option is the Intel compiler, but I have not looked into it yet.)

Extflonums have been tested on Linux x86_64 and Windows 7 x86 (VirtualBox).

I try to keep my modifications separate from other code. That requires
a lot of copy-paste, but hopefully makes my code easier to understand.

=== Miscellaneous notes:

* Extflonums have a text representation with an "l" suffix
(similar to the "f" suffix for single flonums).

* Extflonums are not integrated into existing Racket arithmetic (so
(+ 123.0l0 513.0l0) is not possible). Extflonums have their own set of
functions: extfl+, extfl-, extfl*, unsafe-extfl+, unsafe-extfl-, etc.
(similarly to flonums). The only Racket functions that were modified
are the reader, the printer, and "equal?" (see below).

* The macro that guards the extflonum code is MZ_LONG_DOUBLE.
The config definition is MZ_USE_LONG_DOUBLE, which enables MZ_LONG_DOUBLE.
The configuration scripts were not modified.

* The macros MZ_LONG_DOUBLE_DISABLED and USE_EXTFLONUM_UNBOXING should be
left undefined; they are for the unboxing optimization, which will be added
in the future.
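
The "l" notation from the first note can be pictured with a small C sketch. This is not the reader code from the patch: parse_extfl_literal is a hypothetical helper, and it assumes that "l" plays the role of the exponent marker in the same way "e" does for ordinary flonums, so that the rest of the work can be delegated to the standard strtold.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse text such as "123.0l0" into a long double.
   Returns 1 on success and stores the value in *out; returns 0 if the
   text has no 'l' marker, is too long, or is not a valid number
   (in which case *out should not be used). */
int parse_extfl_literal(const char *text, long double *out)
{
  char buf[64];
  char *mark, *end;
  size_t n = strlen(text);

  if (n == 0 || n >= sizeof buf)
    return 0;
  memcpy(buf, text, n + 1);      /* copy including the terminating NUL */

  mark = strchr(buf, 'l');
  if (!mark)
    return 0;                    /* an ordinary flonum, not an extflonum */
  *mark = 'e';                   /* let strtold see a normal exponent */

  *out = strtold(buf, &end);
  return end != buf && *end == '\0';
}
```

Under these assumptions "123.0l0" parses to 123.0 and "5.0l2" to 500.0, while "123.0" is rejected because it lacks the marker.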

Changes:

* The C core was extended with the following types and constants:
  C structs:
    Scheme_Long_Double (extflonum)
    Scheme_Long_Double_Vector (extflvector)
  constants:
    scheme_long_double_type
    scheme_extflvector_type

* The Racket reader and printer were modified to handle extflonums (with
suffix "l0"). The Racket printer was modified to print extflvectors
(with the "#extfl" prefix). The Racket "equal?" function was modified to
support extflonums (I needed this so that rackunit would work with
extflonums).

* The xform compiler was extended to handle long double functions,
such as cosl, sinl, floorl, etc.

* GNU lightning was extended with explicit JIT FPU operations: fp-extfpu.h

* The Racket collections were extended with the racket/extflonum.rkt module,
which exports both safe and unsafe functions for extflonums and
extflvectors.

=== Notes on JIT changes

The JIT contains two optimizations for flonums.
  The first is compiling Racket code with inlined flonum functions.
  The second is unboxing flonums to temporary storage when
  possible, avoiding the overhead of a Scheme_Double object.

I have added only the first optimization for extflonum, by
copy-pasting and modifying the original flonum
code. Unboxing extflonums is not implemented yet.

A long double occupies 12 bytes on x86 and 16 bytes on x86_64.
That is important for the vector accessors generated by the JIT for
extflvectors. It is implemented by the following code:

  #ifdef MZ_LONG_DOUBLE
  # ifdef MZ_USE_JIT_X86_64
  #  define JIT_LOG_LONG_DOUBLE_SIZE 4
  #  define JIT_LONG_DOUBLE_SIZE (1 << JIT_LOG_LONG_DOUBLE_SIZE)
  # else
  #  define JIT_LOG_LONG_DOUBLE_SIZE not_implemented
  #  define JIT_LONG_DOUBLE_SIZE 12
  # endif
  #endif

The JIT code generation then becomes:

  #ifdef MZ_USE_JIT_X86_64
      jit_lshi_ul(JIT_V1, JIT_V1, JIT_LOG_LONG_DOUBLE_SIZE);
  #else
      jit_muli_ui(JIT_V1, JIT_V1, JIT_LONG_DOUBLE_SIZE);
  #endif
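
The reason for the two code paths is that an index can be scaled with a shift only when the element size is a power of two (16 on x86_64), whereas a size of 12 needs a multiplication. A minimal sketch of the same decision in plain C, outside the JIT (the function names are hypothetical and are not part of the patch):

```c
#include <stddef.h>

/* log2(n) if n is a power of two, otherwise -1 */
static int log2_if_power_of_two(size_t n)
{
  int shift = 0;
  if (n == 0 || (n & (n - 1)) != 0)
    return -1;
  while ((n >>= 1) != 0)
    shift++;
  return shift;
}

/* Byte offset of element `index` in an array of long double. */
size_t extfl_element_offset(size_t index)
{
  int shift = log2_if_power_of_two(sizeof(long double));
  if (shift >= 0)
    return index << shift;              /* e.g. size 16: shift by 4 */
  return index * sizeof(long double);   /* e.g. size 12: multiply */
}

/* Read element `index` using the computed byte offset. */
long double extfl_element_ref(const long double *els, size_t index)
{
  const char *base = (const char *)els;
  return *(const long double *)(base + extfl_element_offset(index));
}
```

Either path must give the same offset as ordinary array indexing; only the instruction used to compute it differs.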

The JIT sometimes retains a flonum in a special buffer.
I reuse that buffer for extflonums in the following, not very nice, way:

  #ifdef MZ_LONG_DOUBLE
  long double *scheme_mz_retain_long_double(mz_jit_state *jitter, long double ld)
  {
    /* TODO dirty hack to save long double into two cells of double */
    void *p;
    if (jitter->retain_start)
      memcpy(&jitter->retain_double_start[jitter->retained_double], &ld,
             sizeof(long double));
    p = jitter->retain_double_start + jitter->retained_double;
    jitter->retained_double++;
    jitter->retained_double++;
    return p;
  }
  #endif
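
The hack relies on a long double fitting into two double-sized cells: 12 bytes on x86 and 16 bytes on x86_64 both fit into 2 * 8 bytes. A standalone sketch of the same idea, with the slot count derived from sizeof instead of being hard-coded (the names below are hypothetical and the buffer is a plain array rather than the jitter state):

```c
#include <stddef.h>
#include <string.h>

/* Number of double-sized cells needed to hold one long double,
   rounded up: 2 for both the 12-byte and the 16-byte layout. */
enum {
  LD_SLOTS = (sizeof(long double) + sizeof(double) - 1) / sizeof(double)
};

/* Copy `ld` into the cell buffer starting at index `next` and return
   the index of the first free cell after it. The caller must make
   sure the buffer has at least LD_SLOTS cells left. */
size_t store_long_double(double *cells, size_t next, long double ld)
{
  memcpy(&cells[next], &ld, sizeof ld);
  return next + LD_SLOTS;
}

/* Read back a long double stored at cell index `at`. memcpy avoids
   dereferencing a long double pointer that may be under-aligned. */
long double load_long_double(const double *cells, size_t at)
{
  long double ld;
  memcpy(&ld, &cells[at], sizeof ld);
  return ld;
}
```

Note that a cell address that is only 8-byte aligned may not satisfy the 16-byte alignment of long double on x86_64, which is one reason to read the value back with memcpy in a sketch like this.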

I have modified the place where flonum is unboxed from Scheme_Double
structure. I used jitter->unbox_extflonum flag for this.

  #ifdef MZ_LONG_DOUBLE
    if (jitter->unbox_extflonum) {
      fpr0 = JIT_FPU_FPR_0(jitter->unbox_depth);
      jit_fpu_ldxi_ld_fppush(fpr0, target,
                             &((Scheme_Long_Double *)0x0)->long_double_val);
      jitter->unbox_depth++;
    } else
  #endif
    {
      fpr0 = JIT_FPR_0(jitter->unbox_depth);
      jit_ldxi_d_fppush(fpr0, target, &((Scheme_Double *)0x0)->double_val);
      jitter->unbox_depth++;
    }
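
The expression &((Scheme_Long_Double *)0x0)->long_double_val is the traditional way of computing the offset of a field inside a struct, which the JIT then uses as a displacement from the object pointer. A small sketch of the same idea using the standard offsetof macro follows. The struct layout here is a hypothetical stand-in, not the real Scheme_Long_Double definition.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the object header and the boxed value. */
typedef struct {
  short type;
  short keyex;
} Demo_Header;

typedef struct {
  Demo_Header so;
  long double long_double_val;
} Demo_Long_Double;

/* Displacement of the payload from the start of the boxed object. */
size_t demo_val_offset(void)
{
  return offsetof(Demo_Long_Double, long_double_val);
}

/* Unbox by adding the displacement to the object pointer, which is
   what the generated load instruction does. */
long double demo_unbox(const void *boxed)
{
  const char *base = (const char *)boxed;
  return *(const long double *)(base + demo_val_offset());
}
```

Because of the alignment of long double, the displacement is usually larger than the size of the header itself.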

=== Tests

There are tests for extflonums and extflvectors, done by
copy-pasting the tests for flonums and flvectors:

  collects/racket/extflonum.rkt
  collects/tests/racket/extfl-unsafe.rktl

The patch is attached; I think it can be applied to the development
branch of Racket.

=== Documentation

The documentation is in the file
collects/scribblings/reference/extflonums.scrbl,
which is mostly copy-pasted from flonum.scrbl with small
adaptations for extflonums.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: long-double-support.diff.tar.gz
Type: application/x-gzip
Size: 66179 bytes
Desc: not available
URL: <http://lists.racket-lang.org/dev/archive/attachments/20121222/9714bdcd/attachment-0001.gz>

Posted on the dev mailing list.