## Need faster Exponentiate

General FreeBASIC programming questions.
Provoni
Posts: 380
Joined: Jan 05, 2014 12:33
Location: Belgium

### Re: Need faster Exponentiate

Thanks allot Richard!
Provoni
Posts: 380
Joined: Jan 05, 2014 12:33
Location: Belgium

### Re: Need faster Exponentiate

Richard wrote:and here is the faster single routine using the same approximations.

This one rocks. I will be using it for my program: viewtopic.php?f=8&t=23188

Do you want to be credited?
Richard
Posts: 3026
Joined: Jan 15, 2007 20:44
Location: Australia

### Re: Need faster Exponentiate

I'm happy that you find them useful. You can credit me as “Richard Freebasic”.

I am interested in the particular n-gram techniques you are using for the solution of substitution ciphers. Do you have any references to papers or books that cover n-gram based solution.

These routines were written as “proof of concept”. Regarding speed and accuracy trade-off, I wanted to see what could be done in the CPU arithmetic pipeline faster than the intrinsic functions. I expect speeds will vary depending on the particular CPU version, and the relationship between it's floating point and integer pipelines.

Working in the primitive braided space between integer and floating point makes this an interesting mental challenge. The “octaves” of the floating point format, with the “harmonic” periodicity of the polynomial approximation error functions are fascinating. The variable multiplier b convolutes the error functions in interesting ways to produce beats between the polynomial efs that appear in the final error function. Here is test code that shows how the resultant variable error functions seem to remain within about ±0.5% bounds.

Code: Select all

`'=======================================================================' Approximate the power function' Show that the error function {for b < 3.5} is less than 0.5% of the output' Use only primitive +, -, *, logic or shift' Avoid division, intrinsics and decisions that change the instruction order' https://www.freebasic.net/forum/viewtopic.php?f=3&t=27600''=======================================================================Function raise( Byval x As Single, Byval b As Single ) As Single    ' IEEE 754 Single   Seeeeeeeefffffffffffffffffffffff  ' single format    #define expo_bias &b00111111000000000000000000000000uL ' bias = 126 * 2^23    #define mant_mask &b00000000011111111111111111111111uL ' positive mantissa    #define mant_bits 23 ' number of bits used to store mantissa    #define bias 126     ' the bias that puts mantissa in range 0.5 to 0.999    Dim As Long Ptr fp = Cptr( Long Ptr, @x )   ' make ptr to bit pattern x    Dim As Long expo = ( *fp Shr mant_bits ) - bias  ' extract exponent    *fp And= mant_mask  ' kill the exponent bits    *fp Or=  expo_bias  ' bias exponent so value is between 0.5000 and 0.9999    x = expo - 3.153635 + x * ( 6.095832 + x * ( -4.207588 + x * 1.266028 ) ) ' Log2(x)    x *= b          ' raise Log to power b,  x = b * Log2(x)    expo = Int( x ) ' integer part will become exponent of antilog2(x)    x -= expo       ' fractional part, makes x range 0.0000 to 0.9999    x = 1.00 + x * ( 0.6557133 + x * 0.3442867 )   ' quick 2^x over 0 to 1    *fp += ( expo Shl mant_bits )  ' add the integer to biased log exponent    Return x        ' approximate x ^ bEnd Function        ' maximum error is about 0.5 percent of output' Warning; zero, infinite and denormalised values are not be handled correctly'=======================================================================' set up graphics screenDim As Integer sx, sy, szScreeninfo sx, sy, szsx -= 32 ' shrinksy -= 24sz = 4   ' 16 coloursScreenres sx, sy, sz'-----------------------------------------------------------------------' graphics scaleDim As Double lhs = 0.90001, rhs = 4.1, lo = -0.5, hi = 0.5Window ( lhs, lo ) - ( rhs, hi )Line( lhs,  0 )-( rhs,  0 ), 7Line( lhs, lo )-( lhs, hi ), 7Line( rhs, lo )-( rhs, hi ), 7'=======================================================================Dim As Double x, y, z, t, qDim As Integer i, kPrintPrint " Power =",For i = 0 To 7    k = i + 8   ' 16 colours    q = i * 0.5    If q < .3 Then q = 0.1    Color k    Print q,    For x = lhs To rhs Step ( rhs - lhs ) / ( sx * 2 )        y = x^q        z = raise( x, q )        t =  z - y        Pset( x, 100 * t /  y ), k    Next xNext iFor x = 0 To 4    Draw String ( x+.01, -0.005), Str(x), 15Next x'=======================================================================Sleep'=======================================================================`
albert
Posts: 5704
Joined: Sep 28, 2006 2:41
Location: California, USA

### Re: Need faster Exponentiate

You can use my multiplier... It does a million digits in under a minute..

Code: Select all

`function mul_loop_7( num1 as string , num2 as string ) as string       dim as string number1 = num1    dim as string number2 = num2       'make numbers equal multiple of 7 bytes    dim as string str1    dim as longint dec1    do        str1 = str(len(number1)/7)        dec1 = instr(1,str1,".")        if dec1 <> 0 then number1 = "0" + number1    loop until dec1 = 0    do        str1 = str(len(number2)/7)        dec1 = instr(1,str1,".")        if dec1 <> 0 then number2 = "0" + number2    loop until dec1 = 0           'convert the numeric strings to use pointers    'convert number1    dim as string n1 = string(len(number1)*8,chr(0))    dim as ulongint ptr ulp1 = cptr(ulongint ptr,strptr(n1))    dim as longint valu1    dim as longint len_1 = 0    for a as longint = 0 to len(number1)-1 step 7        valu1  = (number1[a     ]-48)*1e6        valu1+= (number1[a+1]-48)*1e5        valu1+= (number1[a+2]-48)*1e4        valu1+= (number1[a+3]-48)*1e3        valu1+= (number1[a+4]-48)*1e2        valu1+= (number1[a+5]-48)*1e1        valu1+= (number1[a+6]-48)'*1        *ulp1 = valu1        ulp1+=1        len_1+=8    next    number1 = left(n1,len_1)    n1=""       'convert the numeric strings to use pointers    'convert number2    dim as string n2 = string(len(number2)*8,chr(0))    dim as ulongint ptr ulp2 = cptr(ulongint ptr,strptr(n2))    dim as longint valu2    dim as longint len_2 = 0    for a as longint = 0 to len(number2)-1 step 7        valu2 =  (number2[a     ]-48)*1e6        valu2+= (number2[a+1]-48)*1e5        valu2+= (number2[a+2]-48)*1e4        valu2+= (number2[a+3]-48)*1e3        valu2+= (number2[a+4]-48)*1e2        valu2+= (number2[a+5]-48)*1e1        valu2+= (number2[a+6]-48)'*1        *ulp2 = valu2        ulp2+=1        len_2+=8    next    number2 = left(n2,len_2)    n2=""       'create accumulator    dim as string answer = string( len(number1) + len(number2) , chr(0) )    dim as ulongint outblocks = ( len(answer) \ 8 )    dim as ulongint ptr outplace = cptr(ulongint ptr , strptr(answer)) + (outblocks - 1 )    dim as ulongint stops = ( (len(number1)\8) + (len(number2)\8) )    dim as ulongint value = 0    dim as longint hold = -1    dim as longint locat = 0    dim as longint vals = 0    dim as ulongint high1 = ( len(number1)  \ 8 ) - 1    dim as ulongint high2 = ( len(number2)  \ 8 ) - 1    dim as longint ptr num1_ptr = cptr( ulongint ptr , strptr(number1) ) + high1    dim as longint ptr num2_ptr = cptr( ulongint ptr , strptr(number2) ) + high2    do        hold+=1        vals = hold        locat = 0        if vals > high2 then vals = high2 : locat = (hold - high2)        num1_ptr = cptr( ulongint ptr , strptr(number1) ) + high1 - locat        num2_ptr = cptr( ulongint ptr , strptr(number2) ) + high2 - vals        do            value+= *( num1_ptr ) * *( num2_ptr )            num1_ptr-=1            num2_ptr+=1            if num1_ptr = cptr( ulongint ptr , strptr(number1) ) - 1 then goto done            if num2_ptr > cptr( ulongint ptr , strptr(number2) ) + high2  then goto done        loop        Done:        *outplace = value mod 1e7         outplace-= 1         value = value \ 1e7    loop until hold = stops-2        *outplace = value mod 1e7           'convert answer back to ascii   dim as string outtext=""   outplace = cptr( ulongint ptr , strptr(answer) )   for a as ulongint = 1 to outblocks step 1       value = *outplace       outplace+=1       outtext+= right("0000000" + str(value),7)    next         outtext = ltrim(outtext,"0")      return outtext   end function`
Provoni
Posts: 380
Joined: Jan 05, 2014 12:33
Location: Belgium

### Re: Need faster Exponentiate

Richard wrote:I'm happy that you find them useful. You can credit me as “Richard Freebasic”.

I am interested in the particular n-gram techniques you are using for the solution of substitution ciphers. Do you have any references to papers or books that cover n-gram based solution.

Okay, will do. My algorithm is mostly original and I have no books to share, have been at it for several years though. I got the concept of using letter n-grams from zkdecrypto. For example a 5-gram such as "SHARE" has a score equal to the logarithm of its frequency in a corpus.

A simple explanation of how the score formula works:

All the 5-grams scores of the solution plain text are summed and divided by the total amount of 5-grams in the solution plain text to normalize the score. This divided sum is then multiplicated by the 1-gram entropy of the solution plain text to keep the solver from converging on "ETAION" heavy stuff. I use a weight on the entropy depending on the n-gram size: entropy^weight. Here is where your speedy function comes into play. On a fast computer my solver can do many tens of millions of iterations of this n-gram summing per second so it is important that the exponentiate function is fast so that it does not cause much of a slow down.

Richard wrote:These routines were written as “proof of concept”. Regarding speed and accuracy trade-off, I wanted to see what could be done in the CPU arithmetic pipeline faster than the intrinsic functions. I expect speeds will vary depending on the particular CPU version, and the relationship between it's floating point and integer pipelines.

It is beyond my level of education. I am thankful that you came up with it. :)
Richard
Posts: 3026
Joined: Jan 15, 2007 20:44
Location: Australia

### Re: Need faster Exponentiate

This more mature version is faster now that several instructions have been removed, merged or modified.

The error function has unfortunately increased slightly, but is much better behaved since I have polished the polynomial coefficients. The step changes and reflections in the error function when crossing octave boundaries have been removed. That required even order (quadratic) polynomials, to get odd (cubic) error functions.
The maximum percentage error is now = 0.54 * ( Abs( power ) + 0.49 ).
Typically; 0.5% for square roots; 1.24% for squares and 1.65% for cubes.
See table in code block.

Code: Select all

`'=======================================================================' Approximate the power function; Written by Richard, Freebasic.'=======================================================================' Approximate x ^ b, for x > 0.   Balanced error without octave steps Function raise( Byval x As Single, Byval b As Single ) As Single    ' IEEE 754 Single   Seeeeeeeefffffffffffffffffffffff ' Sign_Log_Linear format    #define expo_bias &b00111111000000000000000000000000uL ' bias = 126 * 2^23    #define frac_mask &b00000000011111111111111111111111uL ' positive fraction    #define frac_bits 23 ' bits used to store fraction, ignoring implicit msb    #define bias 126 ' exponent needed to make single to range from 0.5 to 0.999    Dim As Long Ptr fp = Cptr( Long Ptr, @x ) ' pointer to bit pattern of x    '-------- approximate Log2(x) --------------------------------------    Dim As Long expo = ( *fp Shr frac_bits ) - bias  ' extract exponent    *fp = ( *fp And frac_mask ) Or expo_bias  ' bias x into range 0.5 to 0.999    x = b * ( expo - 2.693111 + x * ( 4.079332 + x * -1.386221 ) ) ' log2 over 0.5 to 1    ' to raise x to power b, y = x^b, y = Exp2( b * Log2( x ) )    '-------- approximate 2^x, is antiLog base 2 ( x ) -----------------    expo = Int( x ) ' integer part will later adjust exponent of Alog2(x)    x -= expo   '  reduced to fraction in range 0.0 thru 0.999    x = 1.00 + x * ( 0.6557133 + x * 0.3442867 ) ' approx 2^x over range 0 to 1    *fp += ( expo Shl frac_bits ) ' restore early integer to biased log exponent    Return xEnd Function    ' Warning; avoid x <= 0, infinite or denormalised values'-----------------------------------------------------------------------' maximum error rises as magnitude of power rises' Maximum percentage error = ( pwr + 0.49 ) * 0.54'     abs    maximum'     pwr     % error  '     0.5,   0.560634'     1.0,   0.852009'     1.5,   1.057730'     2.0,   1.245606'     2.5,   1.612362'     3.0,   1.638780'     3.5,   2.136535'     4.0,   2.317659'     4.5,   2.646873'=======================================================================' set up graphics screenDim As Integer sx, sy, szScreeninfo sx, sy, szsx -= 32 ' shrinksy -= 24sz = 4   ' 16 coloursScreenres sx, sy, sz'--------------------------------------' graphics scaleDim As Double lhs = 0.90, rhs = 4.1, lo = -0.5, hi = 0.5Window ( lhs, lo ) - ( rhs, hi )Line( lhs,  0 )-( rhs,  0 ), 7Line( lhs, lo )-( lhs, hi ), 7Line( rhs, lo )-( rhs, hi ), 7'=======================================================================Dim As Double x, y, z, t, q, max = -100, min = 100, scale = 15Dim As Integer i, k = 7PrintPrint " Power =",'--------------------------------------For q = -5.5 to 5.51 step .25    k += 1                  ' 16 colours    if k > 15 then k = 8    Color k    Print using "####.##"; q;    '----------------------------------    For x = lhs To rhs Step ( rhs - lhs ) / ( sx * 2 )        y = x^q        z = raise( x, q )        t = ( z - y ) / y         ' this is the error        If min > t Then min = t        If max < t Then max = t        Pset( x, scale * t ), k    Next xNext q'=======================================================================For x = 0 To 4    Draw String ( x+.01, -0.005), Str(x), 15Next xcolor 15printprintprint using " Error bounds  ####.###### %  ####.###### %"; max * 100; min * 100'=======================================================================Sleep'=======================================================================`
Provoni
Posts: 380
Joined: Jan 05, 2014 12:33
Location: Belgium

### Re: Need faster Exponentiate

Thanks for your updated function, error table and plotter Richard. Much appreciated!
Richard
Posts: 3026
Joined: Jan 15, 2007 20:44
Location: Australia

### Re: Need faster Exponentiate

I have reduced error dependency on the power b, by increasing the order and accuracy of the Log2 approximation.
It is also now less kinky and better behaved.
Use this version when higher power errors are more of a problem than speed.
Maximum error is < ± 0.30% for b < ± 2.5
Maximum error is < ± 0.35% for b < ± 7.0
Maximum error is < ± 0.50% for b < ± 22.
Maximum error is < ± 0.92% for b < ± 60.

Code: Select all

`'=======================================================================' Approximate the power function; written by Richard.'=======================================================================' Approximate x ^ b, for x > 0; |b| < 60. Avoid infinite or denormalised x valuesFunction raise( Byval x As Single, Byval b As Single ) As Single    ' to raise x to power b, y = x^b, y = Exp2( b * Log2( x ) )    ' IEEE 754 Single   Seeeeeeeefffffffffffffffffffffff ' Sign_Log_Linear format    #define expo_bias &b00111111000000000000000000000000uL ' bias = 126 * 2^23    #define frac_mask &b00000000011111111111111111111111uL ' positive fraction    #define frac_bits 23 ' bits used to store fraction, ignoring implicit msb    #define bias 126 ' exponent that puts single into range from 0.500 to 0.999    Dim As Long Ptr fp = Cptr( Long Ptr, @x ) ' pointer to bit pattern of x    '-------- approximate Log2(x) --------------------------------------    Dim As Long expo = ( *fp Shr frac_bits ) - bias  ' extract exponent    *fp = ( *fp And frac_mask ) Or expo_bias  ' bias x into range 0.5 to 0.999    #Define a0 -3.5353058e0     ' log2 polynomial approximation    #Define a1  8.2616852e0     '   over the range from 0.5 to 1.0    #Define a2 -8.7235965e0     '   maximum error is 1.54e-4 = 0.000154    #Define a3  5.3685806e0     ' Zeros are located at    #Define a4 -1.3713634e0     '   0.500, 0.575, 0.720, 0.885, 1.000    x = b * ( Csng( expo ) + a0 + ( a1 + ( a2 + ( a3 + a4 * x ) * x ) * x  ) * x )    '-------- approximate 2^x, is antiLog2( x ) ------------------------    expo = Int( x ) ' this integer part adjusts the exponent of Alog2(x)    x -= expo   ' x reduced to fraction in range 0.000 thru 0.999    x = 1e0 + ( 0.6607687e0 + 0.3392313e0 * x ) * x  ' approx 2^x over range 0 to 1    *fp += ( expo Shl frac_bits ) ' restore early integer to biased log exponent    Return xEnd Function'    Maximum error is < ± 0.30% for b < ± 2.5'    Maximum error is < ± 0.35% for b < ± 7.0'    Maximum error is < ± 0.50% for b < ± 22.'    Maximum error is < ± 0.92% for b < ± 60.'    Beyond which the exponent overflows, so it all falls apart''=======================================================================' set up graphics screenDim As Integer sx, sy, szScreeninfo sx, sy, szsx -= 32 ' shrinksy -= 24sz = 4   ' 16 coloursScreenres sx, sy, sz'--------------------------------------' graphics scaleDim As Double x, y, z, t, q Dim As Double lhs = 0.8, rhs = 4.2, lo = -0.5, hi = 0.5Window ( lhs, lo ) - ( rhs, hi )Line( lhs,  0 )-( rhs,  0 ), 1For x = 1 To 4    Line( x, lo )-( x, hi ), 1Next x'=======================================================================' plot the error curvesDim As Double max = -1e6, min = 1e6, scale = 100Dim As Integer i, k = 7PrintPrint " Powers =  ";'--------------------------------------For q = -2.5 To 2.51 Step 0.1    k += 1                  ' 16 colours    If k > 15 Then k = 8    Color k    Print Using "####.##"; q;    '----------------------------------    For x = lhs To rhs Step ( rhs - lhs ) / ( sx * 2 )        y = x^q             ' correct        z = raise( x, q )   ' approx        t = ( z - y ) / y   ' t is the error        If min > t Then min = t        If max < t Then max = t        Pset( x, scale * t ), k    Next xNext qLine( lhs,  max * scale )-( rhs,  max * scale ), 1Line( lhs,  min * scale )-( rhs,  min * scale ), 1'=======================================================================For x = 0 To 4    Draw String ( x+.01, -0.005), Str(x), 15Next xColor 15PrintPrintPrint Using " Error bounds ####.### % ####.### %"; max * 100; min * 100'=======================================================================Sleep'=======================================================================`
Provoni
Posts: 380
Joined: Jan 05, 2014 12:33
Location: Belgium

### Re: Need faster Exponentiate

Thanks Richard, I have just released a new version this morning and included a mention of your code in the Readme.txt in the base directory of the program.

viewtopic.php?f=8&t=23188

I have tested both your functions with the programs functionality in mind and prefer the faster version though the difference is very small. Is it possible to further optimize the function if for example the b variable (the power raised to) will never exceed 5?
Richard
Posts: 3026
Joined: Jan 15, 2007 20:44
Location: Australia

### Re: Need faster Exponentiate

The problem with approximating diadic functions like a^b is that they cover an area rather than a line, so usually require two, or possibly three, approximations to get one answer.

There are a few possible ways to speed it up further.

1. Implement it as macro “inline code” to remove the function call overhead. Use static variables.

2a. Code it in ASM to use only the extended registers inside the CPU. It would probably run at twice the speed, but would then be processor dependent.
2b. Implement four of the approximations at one time, so four packed singles can be processed in parallel in mmx registers and the fpu pipeline.

3. Define constraints to the a and b input ranges, or quantise those values, so lookup table(s) or minimised polynomials could be used.

4. Look at replacing the Int() function call with inline code.

5. Do a larger part of the surrounding math in a logarithmic number system, not in the log_linear exponent_significand floating point system. That will depend on the surrounding code.

6. Use single rather than double throughout the code wherever possible. That only needs to move half the data bits through the processor-memory bottlenecks.

7. Identify if and when a^b is computed more than once with the same value for a. The earlier value of Log2(a) could then be saved as static rather than recomputed.

8. Buy a faster computer or use a GPU for parallel processing.
OSchmidt
Posts: 49
Joined: Jan 01, 2016 12:27
Contact:

### Re: Need faster Exponentiate

FWIW,
below is a routine which allows to adjust precision (the amount of iterations)...

This can be done via the optional i_0_11 param,
which defaults to 6 - and then ensures a max-error of +-1% (over the whole range of exponents).

With that default-value of i_0_11 = 6 (+-1% error), the routine is about 25% faster than "raise()" -
with i_0_11 = 8 (+- 0.25%) it has about the same speed on my machine.

As for performance...
- I've tested on a (4 year old) intel-mobile-cpu
- fbc 32bit, version 1.06 (with -fpu SSE as the only compiler-switch)

There might be further optimizations possible in the iteration-loop,
(where x is calculated as x = Sqr(x) "kinda recursively") -
but I'll leave that for the true math-experts... ;)

Code: Select all

`'an approach which applies the (usually quite fast) runtime-Sqr-function (multiple times)'the optional i_0_11 allows to adjust the amount of iterations - and thus the precision... 'with its default: i_0_11=6  the error is +-1% ... with 7 it is half that, a.s.o. up to 11Function FastPow(ByVal x As Single, ByVal n As Single, ByVal i_0_11 As Long=6) As Single  Dim i as Long, r As Single = 1    Static L(0 To 11) As Single   If L(0) = 0 Then 'fill the Lookup-Table (once, because it's static)    For i = 0 To 11: L(i) = 1 / 2 ^ (i + 1): Next  End If   If x = 0 Then return 0 'early exit for 0-values of x   If n > 0 Then 'handle positive exponents      Do Until n < 1: r *= x: n -= 1: Loop        For i = 0 To i_0_11      x = Sqr(x)      If n >= L(i) Then r *= x: n -= L(i)       If n <= L(i_0_11) Then Exit For 'early exit (n can't get any smaller)    Next      ElseIf n < 0 Then 'handle the case of neg. exponents    r = 1 / FastPow(x, -n, i_0_11)  End If    return rEnd Function`

HTH

Olaf
Provoni
Posts: 380
Joined: Jan 05, 2014 12:33
Location: Belgium

### Re: Need faster Exponentiate

Richard wrote:The problem with approximating diadic functions like a^b is that they cover an area rather than a line, so usually require two, or possibly three, approximations to get one answer.

There are a few possible ways to speed it up further.

1. Implement it as macro “inline code” to remove the function call overhead. Use static variables.

2a. Code it in ASM to use only the extended registers inside the CPU. It would probably run at twice the speed, but would then be processor dependent.
2b. Implement four of the approximations at one time, so four packed singles can be processed in parallel in mmx registers and the fpu pipeline.

3. Define constraints to the a and b input ranges, or quantise those values, so lookup table(s) or minimised polynomials could be used.

4. Look at replacing the Int() function call with inline code.

5. Do a larger part of the surrounding math in a logarithmic number system, not in the log_linear exponent_significand floating point system. That will depend on the surrounding code.

6. Use single rather than double throughout the code wherever possible. That only needs to move half the data bits through the processor-memory bottlenecks.

7. Identify if and when a^b is computed more than once with the same value for a. The earlier value of Log2(a) could then be saved as static rather than recomputed.

8. Buy a faster computer or use a GPU for parallel processing.

Thanks for your performance tips Richard, at least I will implement it as a macro. I have not used macro's before and it will be a good start. Point 7 should be applicable. Some of the other points are above my skill level.
Provoni
Posts: 380
Joined: Jan 05, 2014 12:33
Location: Belgium

### Re: Need faster Exponentiate

I have tested it and it is a bit slower than Richard's for my program. As you say it could probably be optimized a little bit further and have not tried altering the i_0_11 parameter yet.
OSchmidt
Posts: 49
Joined: Jan 01, 2016 12:27
Contact:

### Re: Need faster Exponentiate

As said, it gets faster with decreasing values in i_0_11.

With:
- i_0_11=5 instead of the default 6, you'll save one iteration - and the error will be doubled to about +-2%
- i_0_11=4 instead of the default 6, you'll save two iterations - and the error will be about +-4%

Would like to hear, how it performs when you try it with i_0_11=4 (since this roughly matches the err-interval of the "prior raise()".
Did you ensure the compiler-swich: -fpu SSE already?

Regards,

Olaf
Richard
Posts: 3026
Joined: Jan 15, 2007 20:44
Location: Australia

### Re: Need faster Exponentiate

Code: Select all

`' Compare the single precision  x^b  with algorithm under test'  reference  ref time   test      test time      ratio     Error bounds'  single_pwr   69.8ns   FastPow11  154.7ns      221.67%    0.03125%'  single_pwr   69.8ns   FastPow10  141.5ns      202.68%    0.0625%'  single_pwr   69.7ns   FastPow9   127.0ns      182.23%    0.125%'  single_pwr   69.7ns   FastPow8   112.3ns      161.07%    0.25%'  single_pwr   69.7ns   FastPow7    99.8ns      143.11%    0.5%'  single_pwr   69.7ns   FastPow6    87.6ns      125.68%    1.%'  single_pwr   69.7ns   FastPow5    75.5ns      108.31%    2.%'  single_pwr   69.7ns   FastPow4    63.3ns       90.81%    4.%'  single_pwr   69.8ns   FastPow3    52.0ns       74.42%    8.%'  single_pwr   69.8ns   FastPow2    41.4ns       59.36%   16.%'  single_pwr   69.9ns   FastPow1    32.5ns       46.52%   32.%'  single_pwr   69.8ns   Raise       41.7ns       59.72%    0.5%`

Note that, Unlike Raise(), the FastPow() error bounds increase with x. I tested to x = 4

Later model CPUs map the old external x87 register stack onto the internal mmx register block. Access time is therefore the same for each system. Compiler option -fpu (sse/x87) no longer makes a consistent difference.