The references are two postings from the sci.math newsgroup.
In short: compute R(x) = ( 1 - cdfN(x) )/pdfN(x) through its Taylor series (which converges well and also gives
error estimates) at x = a for a = 1, ..., 10, using externally pre-computed values R(a). Then multiply by pdfN(x) = exp(
-x*x/2 - 0.91893... ) to get cdfN. For higher accuracy take a finer grid; George Marsaglia explains why, see below.
For larger x the classical asymptotics would be better.
The math is easier than a rational approximation like Moro's; the problem for good precision here _is_ Excel, and that is
what i wanted to know. One can use the data type Decimal, but the guys from M$ quietly suggest it only for |values|
below ~ 7, so i re-arranged the code accordingly. To get cdfN by the multiplication one additionally has to set up a
routine for exp (the built-in cuts off after 16 digits). The sheet contains a function localExp for negative arguments
using the same Taylor technique, which for arguments between -1 and 0 should give an accuracy better than 10^-30
(in theory; compute the Lagrange error term).
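The localExp routine itself is not reproduced here; a minimal sketch of the same technique in C (long double standing in for Excel's Decimal type, and local_exp being a name of my choosing, not the sheet's) might look like:

```c
#include <math.h>

/* Taylor series for exp(x), intended for -1 <= x <= 0, summed until
   adding the next term no longer changes the result at this precision.
   long double stands in for Excel's Decimal type used in the sheet. */
long double local_exp(long double x)
{
    long double term = 1.0L, sum = 1.0L;
    for (int n = 1; n < 60; n++) {
        term *= x / n;              /* next term x^n / n! */
        long double old = sum;
        sum += term;
        if (sum == old) break;      /* converged at this precision */
    }
    return sum;
}
```

On [-1, 0] the terms alternate in sign and shrink fast, so the Lagrange remainder after n terms is bounded by 1/(n+1)!; that is how the 10^-30 claim can be checked for a type carrying enough digits.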
In practice it is not clear how the ugly data type behaves, so i did some checking: i tested the string output of myN(x) in
the range -10 <= x <= 10, using steps of 1/1000, against Maple (there setting the significant digits to 60 for the
computation and pre-defining all error values as 9.99... to be overwritten by the correct ones). Maple needs about
3 minutes to spit out the statistics:
count(errorList); 20001
range(errorList); -.31669918176968306653738544000586364777397 * 10^(-19) ..
.31669918176968306653738544000586364777397 * 10^(-19)
mean(errorList); .8243012849357532123393830308484575760162 * 10^(-26)
That says: all 20001 values have successfully been processed (otherwise 9.99... would appear in the range of the error
list), and even at the 'boundaries' a - 0.001 (a = 1, 2, ..., 10) the errors stay below 0.32 * 10^(-19).
A second test would look more closely at these ranges. But i am too lazy for that; a check with Maple shows that the
theoretical bound holds.
-------------------------------------------------------------------------
Subject: Re: Approx of Normal Distribution and its invers
Date: Thu, 10 Jan 2002 21:29:33 +0100
From: Hugo Pfoertner
Organization: Talkline Internet Division http://www.tli.de
Newsgroups: sci.math
George Marsaglia wrote:
>
> Axel Vogt wrote in message
news:3C20F835.DE905AE5@axelvogt.de...
> > Hi.
> >
> > In the 'encyclopedic dictionary' i find some polynomial approximations
> > of the normal distribution and one for its inverse with a maximal
> > error of 10^-4.
> >
> > I remember (?) it has something to do with diverging series.
> >
> > Are there any further results (available on the web) with smaller
> > error (may be not using rational functions and exp, but for the
> > 'center' and not only far outside)?
>
> Let Phi be the normal distribution function and f its density.
> If you write 1-Phi(x) = R(x)*f(x)
> then terms for the Taylor series of R are easily expressed recursively.
> Thus with a table of 20 values of 1-Phi and 8 terms of the Taylor series
> you can get Phi accurate to about 12 places, and with a table of
> 120 you get Phi to 15 places.
>
> I have had a Fortran version for some 30 years, if anyone would like a copy,
> with a C version of a few years ago.
>
> George Marsaglia
>
> On second thought, since people such as Harris can clutter up the Internet
> with trash, why not include a copy of the 20-table Fortran version?
> function Phi(z)
> implicit real*8(a-h,o-z)
> real*8 r(0:19)
> data r/1.253314137005841, 1.175451907906121, 1.101176439464860,
> & 1.030105984709400, .9618781946289486, .8961465252562088,
> & .8325768132417638, .7708440227285019, .7106292052927357,
> & .6516167774466720, .593492322760325, .535941292274201,
> & .478649243294768, .421304670042080, .363606084920330,
> & .305275792412258, .246083511077006, .185882720934051,
> & .124658929655036, .0625777004443072/
> x=abs(z)
> j=20.-31.91538/(x+sqrt(x*x+2.546479))
> v=(1-j/20d0)*r(0)
> yj=1/v-v*.6366197723675813
> H=dABS(X)-yj
> F=r(J)
> F1=F*x-1d0
> F2=(F+x*F1)*.5d0
> F3=(F1+x*F2)*.333333333333333D0
> F4=(F2+x*F3)*.25D0
> F5=(F3+x*F4)*.2D0
> F6=(F4+x*F5)*.166666666666667D0
> F7=(F5+x*F6)*.14285714285714286D0
> F8=(F6+x*F7)*.125D0
> TAYLOR=F+H*(F1+H*(F2+H*(F3+H*(F4+H*(F5+H*(F6+H*(F7+H*F8)))))))
> Phi=TAYLOR*dEXP(-.5d0*X*X-.918938533204672D0)
> if(z.lt.0d0) Phi=1d0-Phi
> return
> end
I have tested this program, and it only works correctly if the following
change is made:
Replace the part
F1=F*x-1d0
F2=(F+x*F1)*.5d0
F3=(F1+x*F2)*.333333333333333D0
F4=(F2+x*F3)*.25D0
F5=(F3+x*F4)*.2D0
F6=(F4+x*F5)*.166666666666667D0
F7=(F5+x*F6)*.14285714285714286D0
F8=(F6+x*F7)*.125D0
by
F1=F*yj-1d0
F2=(F+yj*F1)*.5d0
F3=(F1+yj*F2)*.333333333333333D0
F4=(F2+yj*F3)*.25D0
F5=(F3+yj*F4)*.2D0
F6=(F4+yj*F5)*.166666666666667D0
F7=(F5+yj*F6)*.14285714285714286D0
F8=(F6+yj*F7)*.125D0
The Taylor expansion is made at the left boundary of the intervals,
which is yj. George Marsaglia also sent me the version with a table of
120 phi values at equally spaced abscissas. This 120-table version worked
correctly.
Hugo Pfoertner
------------------------------------------------------------------------
Subject: Re: Integrating a Bell Curve
Date: Tue, 20 Aug 2002 11:29:38 GMT
From: "George Marsaglia"
Organization: Giganews.Com - Premium News Outsourcing
Newsgroups: sci.math
...
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
Attached are extracted portions of a LaTeX file that will provide
the file cPhi.ps for the next edition of
The Marsaglia Random Number CDROM
with
The Diehard Battery of Tests of Randomness
It may be of interest to those considering methods
for "integrating a bell curve"
\newcommand{\cPhi}{\mbox{cPhi}}
\newcommand{\RR}{\mbox{R}}
\begin{title}{Evaluating the Normal Distribution Function}
Let $\phi(x)$ be the standard normal density function,
$\phi(x)=e^{-x^2/2}/\sqrt{2\pi}$ and let $\Phi(x)$ and $\cPhi(x)$ be
the corresponding distribution and complementary distribution
functions:
$$\Phi(x)=\int^x_{-\infty} \phi(t)\,dt,\qquad
\cPhi(x)=1-\Phi(x)=\int_x^\infty\phi(t)\,dt .$$
This note advocates a simple method for direct evaluation
of $\cPhi(x)$ for $x>0$ to the
accuracy of the floating point system involved---single, double or
extended precision. Unlike most of the available methods, which
evaluate the error function at $x/\sqrt{2}$ using often mysterious
methods for different ranges of $x$---continued fractions, rational or
Chebychev approximation, asymptotic expansions---the method here is
a straightforward application of a Taylor series whose terms arise from
a simple recursion.
It is an unfortunate accident of history that the `error function',
named for the way that sample values differ from the assumed true value
and obviously based on the normal distribution, was set up
for $e^{-t^2}$ rather than for $e^{-t^2/2}$, with proper exponent
belonging to the standard normal density. So the overwhelming
majority of users have to put up with transforming
$\mbox{erf}(x)=(2/\sqrt{\pi})\int_0^xe^{-t^2}\,dt$ to the standard
normal distribution function by means of
$$\Phi(x)=(1/\sqrt{2\pi})\int_{-\infty}^xe^{-t^2/2}\,dt=
\frac{1}{2}+\frac{1}{2}\mbox{erf}(x/\sqrt{2}).$$
Curses on that unknown scientist (a physicist? an astronomer?)
who inflicted that onerous transformation on us. (To the
argument that the error function is really defined as a solution
to the differential equation $y''(z)+2zy'(z)=0$, one can reply that
$y''(z)+zy'(z)=0$ is simpler and has solution $\Phi(z)$.)
Now define the function $\RR(x)$ by means of
$$\RR(x)\phi(x)=\int_x^\infty \phi(t)\,dt,
\mbox{ that is, } \RR(x)=\cPhi(x)/\phi(x).$$
The ratio $\RR(x)=\cPhi(x)/\phi(x), x\ge0$
is a well behaved, convex function. It starts at\\
$\RR(0)=\sqrt{\pi/2}=1.2533\ldots$ then drops
steadily toward zero.
The graph of $y=\RR(x)$ looks much like that of $y=2/(x+\sqrt{x^2+8/\pi})$;
both are plotted in
Figure 1.\\[2mm]
Because $\RR(x)$ has an easily-developed Taylor expansion,
one can use a few exact values, say $\RR(1)$,$\RR(2)$,
$\RR(3)$,$\RR(4),\ldots$,
then use the Taylor expansion to get R at intermediate points.
From $\RR(x)$ one easily obtains $\cPhi(x)$ by multiplication:
$\cPhi(x)=\RR(x)\phi(x)$.
To get the Taylor expansion of $\RR$, differentiate
$\RR(x)\phi(x)=\int_x^{\infty} \phi(t)\,dt$ on both sides to get
$$ \RR'(x)\phi(x)-x\RR(x)\phi(x)=-\phi(x),\mbox{ that is, } \RR'=x\RR-1.$$
Then
$$\RR''=x\RR'+\RR;\qquad \RR'''=x\RR''+2\RR';\qquad
\RR''''=x\RR'''+3\RR'';$$
and in general, if $\RR^{[k]}$ means the $k$'th derivative of $\RR$,
then for $k\ge1$:
$$\RR^{[k+1]}(x)=x\RR^{[k]}(x)+k\RR^{[k-1]}(x).$$
This leads to a simple way to evaluate $\RR(z+h)$, given the exact
value $a=\RR(z)$, (using C expressions):
\begin{verbatim}
b=z*a-1;
pwr=1;
s=a+h*b;
for(i=2;;i+=2) {
a=(a+z*b)/i;
b=(b+z*a)/(i+1);
pwr=pwr*h*h;
t=s;
s=s+pwr*(a+h*b);
if(s==t) break; }
\end{verbatim}
One expects the worst errors will be when $h$ is nearly -1, that is,
when $x$ just exceeds one of the tabled values,
say $x=.05,1.05,2.05,\ldots,14.05$. Here
is a list of the values returned for those $x$'s,
followed by the exact values, all to the nearest 15 digits:
\begin{verbatim}
cPhi( .05)=.480061194161628 cPhi( 8.05)=.413970181627315e-15
480061194161628 413970181627315
cPhi(1.05)=.146859056375896 cPhi( 9.05)=.714841701126973e-19
146859056375896 714841701126975
cPhi(2.05)=.201822154057044e-1 cPhi(10.05)=.459337105561296e-23
201822154057044 459337105561297
cPhi(3.05)=.114420683102270e-2 cPhi(11.05)=.109607441199641e-27
114420683102270 109407441199642
cPhi(4.05)=.256088164740415e-4 cPhi(12.05)=.969749658013101e-33
256088164740414 969749658013097
cPhi(5.05)=.220905032269544e-6 cPhi(13.05)=.317737014468943e-38
220905032269543 317737014468942
cPhi(6.05)=.724229170513765e-9 cPhi(14.05)=.385170178073225e-44
724229170513760 385170178073226
cPhi(7.05)=.894588955876993e-12
894588955876992
\end{verbatim}
One rarely needs values of the normal CDF $\Phi(x)$ or its complement
$\cPhi(x)$ accurate to as many as 15 digits, or for values $x$ beyond 6
or so, but the easily implemented Taylor series for
$\RR(x)=\cPhi(x)/\phi(x)$ makes such accuracy available with only a
few stored values of $\RR(x)$ and a few lines of code.
Here is a listing of a C function that evaluates cPhi to that 15 digit
accuracy:
\begin{verbatim}
double cPhi(double x){
long double v[]={0.,.65567954241879847154L,
.42136922928805447322L,.30459029871010329573L,.23665238291356067062L,
.19280810471531576488L,.16237766089686746182L,.14010418345305024160L,
.12313196325793229628L,.10978728257830829123L,
.99028596471731921395e-1L,.90175675501064682280e-1L,
.82766286501369177252e-1L,.76475761016248502993e-1L,
.71069580538852107091e-1L,.66374235823250173591e-1L};
long double h,a,b,z,t,sum,pwr;
int i,j;
if(x>15.) return (0.);
if(x<-15.) return (1.);
j=fabs(x)+1.;
z=j;
h=fabs(x)-z;
a=v[j];
b=z*a-1.;
pwr=1.;
sum=a+h*b;
for(i=2;i<60;i+=2){
a=(a+z*b)/i;
b=(b+z*a)/(i+1);
pwr=pwr*h*h;
t=sum;
sum=sum+pwr*(a+h*b);
if(sum==t) break; }
sum=sum*exp(-.5*x*x-.91893853320467274178L);
if(x<0.) sum=1.-sum;
return ((double) sum);
}
\end{verbatim}
A Fortran version is virtually the same, except a {\tt goto}
may be needed to exit the loop when the new term is too small
to affect the sum.
George Marsaglia