Mac Pro - no expandable memory per Gurman

dada_dave · Mar 12, 2023

Yoused said:
10E ±38? That is PF large/small. The observable universe is ~9E+26 meters; the Planck length is 1.6E-35 meters. If you need values exceeding those, wth are you calculating? FP32 is weak in mantissa, but its range is pretty damn wide.

Very small probabilities that need to be integrated over a wide range of possibilities … the difference between 10E-39 and 0 becomes important.

dada_dave · Mar 12, 2023

dada_dave said:
Very small probabilities that need to be integrated over a wide range of possibilities … the difference between 10E-39 and 0 becomes important.

I should clarify that the actual probabilities being integrated over were bigger than 10E-39 (well, most of the time), but during the calculation of those probabilities various sub-calculations could be smaller, and the normal trick of taking logs to avoid this wouldn't really work given the structure of the calculations.
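A minimal sketch of that kind of underflow, for anyone who wants to see it happen. It assumes FP32 arithmetic can be imitated by rounding FP64 results to FP32 through the standard `struct` module; `to_f32` and `mul_f32` are hypothetical helper names for illustration, not anything from real GPU code.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mul_f32(a: float, b: float) -> float:
    """Multiply two values and round the result to FP32 (imitates an FP32 multiply)."""
    return to_f32(to_f32(a) * to_f32(b))

p = 1e-25               # small, but comfortably inside FP32's range
print(p * p)            # FP64 still holds the product (about 1e-50)
print(mul_f32(p, p))    # the FP32-rounded product underflows to 0.0
```

Each factor is fine on its own in FP32; it is the intermediate product that vanishes.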

theorist9 · Mar 12, 2023

Yoused said:
10E ±38? That is PF large/small. The observable universe is ~9E+26 meters; the Planck length is 1.6E-35 meters. If you need values exceeding those, wth are you calculating? FP32 is weak in mantissa, but its range is pretty damn wide.

If you'd like some toy examples showing how you can get such small numbers (which may have nothing to do with GPU computing, but are at least mathematically concrete), consider:

g(x) = x^6
g(10^–6) = 10^–36

f(x) = x^2/2 + cos(x) – 1
f(10^–9) ≈ 4 * 10^–38
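As a rough illustration of why the second example is awkward in practice, here is a sketch comparing a direct FP64 evaluation of f with its leading Taylor terms, x^4/24 – x^6/720. The names `f_naive` and `f_series` are made up for this post, and the series version is only an assumption about how one might work around the cancellation, not the only way.

```python
import math
from fractions import Fraction

def f_naive(x: float) -> float:
    """Evaluate x^2/2 + cos(x) - 1 directly in FP64."""
    return x**2 / 2 + math.cos(x) - 1

def f_series(x: Fraction) -> float:
    """First two nonzero Taylor terms of f: x^4/24 - x^6/720, in exact rationals."""
    return float(x**4 / 24 - x**6 / 720)

print(f_naive(1e-9))                    # cancellation wipes out the answer
print(f_series(Fraction(1, 10**9)))     # close to 4.17e-38
```

In the direct version, x^2/2 is far below the spacing of FP64 values near 1, so it is lost when added to cos(x) and the subtraction leaves nothing.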

dada_dave · Mar 12, 2023

theorist9 said:
If you'd like some toy examples showing how you can get such small numbers (which may have nothing to do with GPU computing, but are at least mathematically concrete), consider:

f(x) = x^2/2 + cos(x) – 1
f(10^–9) ≈ 4 * 10^–38

g(x) = x^6
g(10^–6) = 10^–36

However, if f and g are intermediate values, a good trick can be to calculate the log instead and then reconvert when you have a larger number. There are various versions of this trick and it won’t always be practical depending on what you’re doing, but it can help with both precision and range for very small and very large numbers.
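One common version of the trick, sketched under the assumption that the quantities are probabilities being multiplied or summed. `log_product` and `log_sum_exp` are illustrative names, and as noted this only helps when the calculation has a structure that allows it.

```python
import math

def log_product(probs):
    """Log of a product, computed as a sum of logs so nothing underflows."""
    return sum(math.log(p) for p in probs)

def log_sum_exp(log_values):
    """Log of a sum of exponentials, shifted by the maximum to stay in range."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))

probs = [1e-5] * 100
print(math.prod(probs))      # the direct FP64 product underflows to 0.0
print(log_product(probs))    # about -1151.29, easily representable

# Adding two probabilities that are each exp(-1000), without ever forming them:
print(log_sum_exp([-1000.0, -1000.0]))   # -1000 + ln(2)
```

You only convert back with `exp` once the combined value is large enough to be representable.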

theorist9 · Mar 12, 2023

dada_dave said:
However, if f and g are intermediate values, a good trick can be to calculate the log instead and then reconvert when you have a larger number. There are various versions of this trick and it won’t always be practical depending on what you’re doing, but it can help with both precision and range for very small and very large numbers.

Sure. I imagine that's what Mathematica does when it calculates, say, 10^(–10^9) * 10^(–10^9) = 10^–(2*10^9). I just wanted to give a couple of toy examples showing how even simple calculations can give very small numbers.

leman · Mar 13, 2023

Yeah, if your computation involves values that small, you are pretty much screwed with the standard FP math

That requires some advanced solutions and exact understanding of numerics

theorist9 · Mar 13, 2023

To put this into context for those reading this who may not be familiar with FP32, FP64, etc.:

You may get the (mis)impression from this discussion that the FP64 data type corresponds to ultra-high-precision numbers needed only for the most abstruse or specialized types of calculations, and that the ability to do computations natively with such precision is found only in specialized data center*/workstation hardware.

That's actually not the case at all. In fact, FP64 (equivalent to a "double" in C/C++) is the default capability of most bog-standard CPUs, and translates to only about 16 digits of decimal precision. [FP64 has 64 bits and carries 53 bits of precision in the "significand" (52 stored bits plus an implicit leading bit), which is the number in front of the exponent (e.g., "1.2345..." in "1.2345... x 10^-12"). And 53 binary digits is equivalent to 53 x ln(2)/ln(10) = 15.954... ≈ 16 decimal digits.]
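If you want to check the bits-to-digits conversion yourself, here is a small sketch using only the Python standard library. It assumes your Python's `float` is an IEEE 754 FP64, which is the case on essentially all current platforms; `decimal_digits` is just a helper name for this post.

```python
import math
import sys

def decimal_digits(bits: int) -> float:
    """Convert a count of binary significand digits to decimal digits."""
    return bits * math.log10(2)

print(sys.float_info.mant_dig)    # significand precision of float, in bits
print(decimal_digits(53))         # a bit under 16 decimal digits

# A concrete consequence: above 2^53, FP64 can no longer represent every integer.
big = float(2**53)
print(big + 1.0 == big)           # True: the +1 is rounded away
```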

[I've read some Intel CPUs also offer FP80, which provides 64 bits for the significand (equivalent to a "long double" in C/C++), but I'm not personally familiar with this.]

So it's not that FP64 is particularly high in precision, but rather that it's high relative to what's typical for GPUs, which are specialized to do many lower-precision calculations very fast.

Indeed, when using Mathematica, I sometimes run into problems that will give unacceptable errors when run using FP64 (what Mathematica calls "Machine Precision") calculations. Fortunately, with Mathematica, you have a choice of either:
(a) emulating whatever precision you please in all intermediate calculations (up to practical run-time limits); if you do this Mathematica will actually track the precision loss as it does the calculation and give you its estimate of the precision of the final number;
(b) (for some numeric functions) setting a precision goal for the final value, in which case Mathematica will use whatever precision emulation it needs to achieve this (unless it needs more than you've allotted it, in which case it will say you need to give it more).
(c) (with some exceptions) doing calculations with exact, symbolic values (my default preference), so that there is no precision loss (you can then convert the final result to whatever precision you like).

The tradeoff is that using high-precision emulation is much slower than just running with Machine Precision, i.e., "on the hardware". [Though for certain operations the symbolic approach can be faster than working in Machine Precision.]
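For those without Mathematica, a loosely analogous sketch of software-emulated precision using Python's standard `decimal` module. This is an assumption-laden stand-in: `decimal` lets you pick a working precision but, unlike what I described for Mathematica, it does not track precision loss for you. `tiny_increment` is a made-up name.

```python
from decimal import Decimal, localcontext

def tiny_increment(prec: int) -> Decimal:
    """Compute (1 + 1e-20) - 1 using `prec` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = prec
        return (Decimal(1) + Decimal("1e-20")) - Decimal(1)

print((1.0 + 1e-20) - 1.0)    # FP64 on the hardware: the increment is lost
print(tiny_increment(16))     # 16 digits of emulation loses it too
print(tiny_increment(50))     # 50 digits recovers 1E-20
```

The emulated arithmetic runs in software, which is where the speed penalty comes from.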

Here's a fun pathological example known as Rump's function, after its designer, Siegfried Rump:
f(a,b) = 333.75 * b^6 + a^2 * (11 * a^2 * b^2 – b^6 – 121 * b^4 – 2 ) + 5.5 * b^8 + a/(2b)
Compute f(a,b) where a = 77617 and b = 33096

We need to carry 44 decimal digits (nearly three times the 16 digits offered by FP64) through the calculation to obtain an answer good to only six digits:

Exact result: –(54767/66192)
Exact result numericized to six decimal digits: –0.827396
Machine Precision (FP64) result: +1.18059 * 10^21 (!!)
Result with 40 decimal digit emulation (2.5 times the digits of FP64): –0.827 (Mathematica is only able to provide the result to three digits)
Result with 44 decimal digit emulation (minimum needed to give correct result to six figures): –0.827396
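You can reproduce the gist of this outside Mathematica. The sketch below evaluates the same expression once in exact rational arithmetic (Python's `fractions.Fraction`) and once in plain FP64. The function `rump` and its `num` parameter are my own illustrative choices, and the exact wrong value FP64 produces depends on evaluation order, so it need not match the Mathematica figure above.

```python
from fractions import Fraction

def rump(a, b, num):
    """Rump's function; `num` builds the constants in the number type in use."""
    return (num("333.75") * b**6
            + a**2 * (11 * a**2 * b**2 - b**6 - 121 * b**4 - 2)
            + num("5.5") * b**8
            + a / (2 * b))

exact = rump(Fraction(77617), Fraction(33096), Fraction)   # exact rationals
naive = rump(77617.0, 33096.0, float)                      # plain FP64

print(exact)          # -54767/66192
print(float(exact))   # about -0.827396
print(naive)          # wildly wrong in FP64
```

The individual terms are around 10^36 in size and cancel almost perfectly, so FP64's roughly 16 digits leave nothing meaningful behind.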

Here's a neat real-world (and non-pathological) example of an error caused by insufficient numerical precision:

The Patriot Missile Failure

*I've read the IBM POWER9 CPU is capable of native FP128, which carries 113 bits of precision in the significand (112 stored bits plus an implicit leading bit), equivalent to 113 x ln(2)/ln(10) = 34.016... ≈ 34 decimal digits. That, I think, is properly considered high precision. But I don't know how often that is used in practice.

mr_roboto · Mar 14, 2023

theorist9 said:
[I've read some Intel CPUs also offer FP80, which provides 64 bits for the significand (equivalent to a "long double" in C/C++), but I'm not personally familiar with this.]

It's still there in all Intel CPUs, but actual use is rare now. 80-bit FP is only supported by the x87 FPU ISA, and x87 is almost deprecated. SSE/SSE2 is the modern replacement; it supports scalar FP calculations (not just vector) and performs better. So, modern x86 software doesn't use the 80-bit extended precision lurking in a dusty legacy corner.

(When Apple transitioned the Mac to x86, they chose to design their ABIs and compilers and so forth to be SSE-only. This decision is paying off with the transition to Arm; they didn't have to do anything crazy trying to make 80-bit FP work fast in Rosetta.)

Like most things in x86, the 80-bit extended precision was always a bit awkward to work with:

Intermediate Floating-Point Precision

Riddle me this Batman: how much precision are these calculations evaluated at? If you answered ‘double’ and ‘float’ then you score one point for youthful idealism, but zero points for correctness. …

randomascii.wordpress.com
