Need faster Exponentiate

General FreeBASIC programming questions.
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need faster Exponentiate

Post by Provoni »

Thanks allot Richard!
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need faster Exponentiate

Post by Provoni »

Richard wrote: and here is the faster single routine using the same approximations.
This one rocks. I will be using it for my program: viewtopic.php?f=8&t=23188

Do you want to be credited?
Richard
Posts: 3096
Joined: Jan 15, 2007 20:44
Location: Australia

Re: Need faster Exponentiate

Post by Richard »

I'm happy that you find them useful. You can credit me as “Richard Freebasic”.

I am interested in the particular n-gram techniques you are using for the solution of substitution ciphers. Do you have any references to papers or books that cover n-gram based solution.

These routines were written as “proof of concept”. Regarding speed and accuracy trade-off, I wanted to see what could be done in the CPU arithmetic pipeline faster than the intrinsic functions. I expect speeds will vary depending on the particular CPU version, and the relationship between it's floating point and integer pipelines.

Working in the primitive braided space between integer and floating point makes this an interesting mental challenge. The “octaves” of the floating point format, with the “harmonic” periodicity of the polynomial approximation error functions are fascinating. The variable multiplier b convolutes the error functions in interesting ways to produce beats between the polynomial efs that appear in the final error function. Here is test code that shows how the resultant variable error functions seem to remain within about ±0.5% bounds.

Code: Select all

'=======================================================================
' Approximate the power function
' Show that the error function {for b < 3.5} is less than 0.5% of the output
' Use only primitive +, -, *, logic or shift
' Avoid division, intrinsics and decisions that change the instruction order
' https://www.freebasic.net/forum/viewtopic.php?f=3&t=27600
'
'=======================================================================
Function raise( Byval x As Single, Byval b As Single ) As Single
    ' IEEE 754 Single   Seeeeeeeefffffffffffffffffffffff  ' single format
    #define expo_bias &b00111111000000000000000000000000uL ' bias = 126 * 2^23
    #define mant_mask &b00000000011111111111111111111111uL ' positive mantissa
    #define mant_bits 23 ' number of bits used to store mantissa
    #define bias 126     ' the bias that puts mantissa in range 0.5 to 0.999
    Dim As Long Ptr fp = Cptr( Long Ptr, @x )   ' make ptr to bit pattern x
    Dim As Long expo = ( *fp Shr mant_bits ) - bias  ' extract exponent
    *fp And= mant_mask  ' kill the exponent bits
    *fp Or=  expo_bias  ' bias exponent so value is between 0.5000 and 0.9999
    x = expo - 3.153635 + x * ( 6.095832 + x * ( -4.207588 + x * 1.266028 ) ) ' Log2(x)
    x *= b          ' raise Log to power b,  x = b * Log2(x)
    expo = Int( x ) ' integer part will become exponent of antilog2(x)
    x -= expo       ' fractional part, makes x range 0.0000 to 0.9999
    x = 1.00 + x * ( 0.6557133 + x * 0.3442867 )   ' quick 2^x over 0 to 1
    *fp += ( expo Shl mant_bits )  ' add the integer to biased log exponent
    Return x        ' approximate x ^ b
End Function        ' maximum error is about 0.5 percent of output
' Warning; zero, infinite and denormalised values are not be handled correctly

'=======================================================================
' set up graphics screen
Dim As Integer sx, sy, sz
Screeninfo sx, sy, sz
sx -= 32 ' shrink
sy -= 24
sz = 4   ' 16 colours
Screenres sx, sy, sz

'-----------------------------------------------------------------------
' graphics scale
Dim As Double lhs = 0.90001, rhs = 4.1, lo = -0.5, hi = 0.5
Window ( lhs, lo ) - ( rhs, hi )
Line( lhs,  0 )-( rhs,  0 ), 7
Line( lhs, lo )-( lhs, hi ), 7
Line( rhs, lo )-( rhs, hi ), 7

'=======================================================================
Dim As Double x, y, z, t, q
Dim As Integer i, k
Print
Print " Power =",
For i = 0 To 7
    k = i + 8   ' 16 colours
    q = i * 0.5
    If q < .3 Then q = 0.1
    Color k
    Print q,
    For x = lhs To rhs Step ( rhs - lhs ) / ( sx * 2 )
        y = x^q
        z = raise( x, q )
        t =  z - y
        Pset( x, 100 * t /  y ), k
    Next x
Next i

For x = 0 To 4
    Draw String ( x+.01, -0.005), Str(x), 15
Next x

'=======================================================================
Sleep
'=======================================================================
albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Need faster Exponentiate

Post by albert »

You can use my multiplier... It does a million digits in under a minute..

Code: Select all

function mul_loop_7( num1 as string , num2 as string ) as string
   
    dim as string number1 = num1
    dim as string number2 = num2
   
    'make numbers equal multiple of 7 bytes
    dim as string str1
    dim as longint dec1
    do
        str1 = str(len(number1)/7)
        dec1 = instr(1,str1,".")
        if dec1 <> 0 then number1 = "0" + number1
    loop until dec1 = 0
    do
        str1 = str(len(number2)/7)
        dec1 = instr(1,str1,".")
        if dec1 <> 0 then number2 = "0" + number2
    loop until dec1 = 0
       
    'convert the numeric strings to use pointers
    'convert number1
    dim as string n1 = string(len(number1)*8,chr(0))
    dim as ulongint ptr ulp1 = cptr(ulongint ptr,strptr(n1))
    dim as longint valu1
    dim as longint len_1 = 0
    for a as longint = 0 to len(number1)-1 step 7
        valu1  = (number1[a     ]-48)*1e6
        valu1+= (number1[a+1]-48)*1e5
        valu1+= (number1[a+2]-48)*1e4
        valu1+= (number1[a+3]-48)*1e3
        valu1+= (number1[a+4]-48)*1e2
        valu1+= (number1[a+5]-48)*1e1
        valu1+= (number1[a+6]-48)'*1
        *ulp1 = valu1
        ulp1+=1
        len_1+=8
    next
    number1 = left(n1,len_1)
    n1=""
   
    'convert the numeric strings to use pointers
    'convert number2
    dim as string n2 = string(len(number2)*8,chr(0))
    dim as ulongint ptr ulp2 = cptr(ulongint ptr,strptr(n2))
    dim as longint valu2
    dim as longint len_2 = 0
    for a as longint = 0 to len(number2)-1 step 7
        valu2 =  (number2[a     ]-48)*1e6
        valu2+= (number2[a+1]-48)*1e5
        valu2+= (number2[a+2]-48)*1e4
        valu2+= (number2[a+3]-48)*1e3
        valu2+= (number2[a+4]-48)*1e2
        valu2+= (number2[a+5]-48)*1e1
        valu2+= (number2[a+6]-48)'*1
        *ulp2 = valu2
        ulp2+=1
        len_2+=8
    next
    number2 = left(n2,len_2)
    n2=""
   
    'create accumulator
    dim as string answer = string( len(number1) + len(number2) , chr(0) )
    dim as ulongint outblocks = ( len(answer) \ 8 )
    dim as ulongint ptr outplace = cptr(ulongint ptr , strptr(answer)) + (outblocks - 1 )
    dim as ulongint stops = ( (len(number1)\8) + (len(number2)\8) )
    dim as ulongint value = 0
    dim as longint hold = -1
    dim as longint locat = 0
    dim as longint vals = 0
    dim as ulongint high1 = ( len(number1)  \ 8 ) - 1
    dim as ulongint high2 = ( len(number2)  \ 8 ) - 1
    dim as longint ptr num1_ptr = cptr( ulongint ptr , strptr(number1) ) + high1
    dim as longint ptr num2_ptr = cptr( ulongint ptr , strptr(number2) ) + high2
    do
        hold+=1
        vals = hold
        locat = 0
        if vals > high2 then vals = high2 : locat = (hold - high2)
        num1_ptr = cptr( ulongint ptr , strptr(number1) ) + high1 - locat
        num2_ptr = cptr( ulongint ptr , strptr(number2) ) + high2 - vals
        do
            value+= *( num1_ptr ) * *( num2_ptr )
            num1_ptr-=1
            num2_ptr+=1
            if num1_ptr = cptr( ulongint ptr , strptr(number1) ) - 1 then goto done
            if num2_ptr > cptr( ulongint ptr , strptr(number2) ) + high2  then goto done
        loop
        Done:
        *outplace = value mod 1e7
         outplace-= 1
         value = value \ 1e7
    loop until hold = stops-2
   
     *outplace = value mod 1e7
       
    'convert answer back to ascii
   dim as string outtext=""
   outplace = cptr( ulongint ptr , strptr(answer) )
   for a as ulongint = 1 to outblocks step 1
       value = *outplace
       outplace+=1
       outtext+= right("0000000" + str(value),7)
    next   
   
   outtext = ltrim(outtext,"0")
   
   return outtext
   
end function

Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need faster Exponentiate

Post by Provoni »

Richard wrote: I'm happy that you find them useful. You can credit me as “Richard Freebasic”.

I am interested in the particular n-gram techniques you are using for the solution of substitution ciphers. Do you have any references to papers or books that cover n-gram based solution.
Okay, will do. My algorithm is mostly original and I have no books to share, have been at it for several years though. I got the concept of using letter n-grams from zkdecrypto. For example a 5-gram such as "SHARE" has a score equal to the logarithm of its frequency in a corpus.

A simple explanation of how the score formula works:

All the 5-grams scores of the solution plain text are summed and divided by the total amount of 5-grams in the solution plain text to normalize the score. This divided sum is then multiplicated by the 1-gram entropy of the solution plain text to keep the solver from converging on "ETAION" heavy stuff. I use a weight on the entropy depending on the n-gram size: entropy^weight. Here is where your speedy function comes into play. On a fast computer my solver can do many tens of millions of iterations of this n-gram summing per second so it is important that the exponentiate function is fast so that it does not cause much of a slow down.
Richard wrote: These routines were written as “proof of concept”. Regarding speed and accuracy trade-off, I wanted to see what could be done in the CPU arithmetic pipeline faster than the intrinsic functions. I expect speeds will vary depending on the particular CPU version, and the relationship between it's floating point and integer pipelines.
It is beyond my level of education. I am thankful that you came up with it. :)
Richard
Posts: 3096
Joined: Jan 15, 2007 20:44
Location: Australia

Re: Need faster Exponentiate

Post by Richard »

This more mature version is faster now that several instructions have been removed, merged or modified.

The error function has unfortunately increased slightly, but is much better behaved since I have polished the polynomial coefficients. The step changes and reflections in the error function when crossing octave boundaries have been removed. That required even order (quadratic) polynomials, to get odd (cubic) error functions.
The maximum percentage error is now = 0.54 * ( Abs( power ) + 0.49 ).
Typically; 0.5% for square roots; 1.24% for squares and 1.65% for cubes.
See table in code block.

This is the latest version, wrapped in an error function plotter.

Code: Select all

'=======================================================================
' Approximate the power function; Written by Richard, Freebasic.
'=======================================================================
' Approximate x ^ b, for x > 0.   Balanced error without octave steps 
Function raise( Byval x As Single, Byval b As Single ) As Single
    ' IEEE 754 Single   Seeeeeeeefffffffffffffffffffffff ' Sign_Log_Linear format
    #define expo_bias &b00111111000000000000000000000000uL ' bias = 126 * 2^23
    #define frac_mask &b00000000011111111111111111111111uL ' positive fraction
    #define frac_bits 23 ' bits used to store fraction, ignoring implicit msb
    #define bias 126 ' exponent needed to make single to range from 0.5 to 0.999
    Dim As Long Ptr fp = Cptr( Long Ptr, @x ) ' pointer to bit pattern of x
    '-------- approximate Log2(x) --------------------------------------
    Dim As Long expo = ( *fp Shr frac_bits ) - bias  ' extract exponent
    *fp = ( *fp And frac_mask ) Or expo_bias  ' bias x into range 0.5 to 0.999
    x = b * ( expo - 2.693111 + x * ( 4.079332 + x * -1.386221 ) ) ' log2 over 0.5 to 1
    ' to raise x to power b, y = x^b, y = Exp2( b * Log2( x ) )
    '-------- approximate 2^x, is antiLog base 2 ( x ) -----------------
    expo = Int( x ) ' integer part will later adjust exponent of Alog2(x)
    x -= expo   '  reduced to fraction in range 0.0 thru 0.999
    x = 1.00 + x * ( 0.6557133 + x * 0.3442867 ) ' approx 2^x over range 0 to 1
    *fp += ( expo Shl frac_bits ) ' restore early integer to biased log exponent
    Return x
End Function    ' Warning; avoid x <= 0, infinite or denormalised values

'-----------------------------------------------------------------------
' maximum error rises as magnitude of power rises
' Maximum percentage error = ( pwr + 0.49 ) * 0.54
'     abs    maximum
'     pwr     % error  
'     0.5,   0.560634
'     1.0,   0.852009
'     1.5,   1.057730
'     2.0,   1.245606
'     2.5,   1.612362
'     3.0,   1.638780
'     3.5,   2.136535
'     4.0,   2.317659
'     4.5,   2.646873

'=======================================================================
' set up graphics screen
Dim As Integer sx, sy, sz
Screeninfo sx, sy, sz
sx -= 32 ' shrink
sy -= 24
sz = 4   ' 16 colours
Screenres sx, sy, sz

'--------------------------------------
' graphics scale
Dim As Double lhs = 0.90, rhs = 4.1, lo = -0.5, hi = 0.5
Window ( lhs, lo ) - ( rhs, hi )
Line( lhs,  0 )-( rhs,  0 ), 7
Line( lhs, lo )-( lhs, hi ), 7
Line( rhs, lo )-( rhs, hi ), 7

'=======================================================================
Dim As Double x, y, z, t, q, max = -100, min = 100, scale = 15
Dim As Integer i, k = 7
Print
Print " Power =",
'--------------------------------------
For q = -5.5 to 5.51 step .25
    k += 1                  ' 16 colours
    if k > 15 then k = 8
    Color k
    Print using "####.##"; q;
    '----------------------------------
    For x = lhs To rhs Step ( rhs - lhs ) / ( sx * 2 )
        y = x^q
        z = raise( x, q )
        t = ( z - y ) / y         ' this is the error
        If min > t Then min = t
        If max < t Then max = t
        Pset( x, scale * t ), k
    Next x
Next q

'=======================================================================
For x = 0 To 4
    Draw String ( x+.01, -0.005), Str(x), 15
Next x
color 15
print
print
print using " Error bounds  ####.###### %  ####.###### %"; max * 100; min * 100

'=======================================================================
Sleep
'=======================================================================
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need faster Exponentiate

Post by Provoni »

Thanks for your updated function, error table and plotter Richard. Much appreciated!
Richard
Posts: 3096
Joined: Jan 15, 2007 20:44
Location: Australia

Re: Need faster Exponentiate

Post by Richard »

Latest version is slower by two multiplies and two adds.
I have reduced error dependency on the power b, by increasing the order and accuracy of the Log2 approximation.
It is also now less kinky and better behaved.
Use this version when higher power errors are more of a problem than speed.
Maximum error is < ± 0.30% for b < ± 2.5
Maximum error is < ± 0.35% for b < ± 7.0
Maximum error is < ± 0.50% for b < ± 22.
Maximum error is < ± 0.92% for b < ± 60.

Code: Select all

'=======================================================================
' Approximate the power function; written by Richard.
'=======================================================================
' Approximate x ^ b, for x > 0; |b| < 60. Avoid infinite or denormalised x values
Function raise( Byval x As Single, Byval b As Single ) As Single
    ' to raise x to power b, y = x^b, y = Exp2( b * Log2( x ) )
    ' IEEE 754 Single   Seeeeeeeefffffffffffffffffffffff ' Sign_Log_Linear format
    #define expo_bias &b00111111000000000000000000000000uL ' bias = 126 * 2^23
    #define frac_mask &b00000000011111111111111111111111uL ' positive fraction
    #define frac_bits 23 ' bits used to store fraction, ignoring implicit msb
    #define bias 126 ' exponent that puts single into range from 0.500 to 0.999
    Dim As Long Ptr fp = Cptr( Long Ptr, @x ) ' pointer to bit pattern of x
    '-------- approximate Log2(x) --------------------------------------
    Dim As Long expo = ( *fp Shr frac_bits ) - bias  ' extract exponent
    *fp = ( *fp And frac_mask ) Or expo_bias  ' bias x into range 0.5 to 0.999
    #Define a0 -3.5353058e0     ' log2 polynomial approximation
    #Define a1  8.2616852e0     '   over the range from 0.5 to 1.0
    #Define a2 -8.7235965e0     '   maximum error is 1.54e-4 = 0.000154
    #Define a3  5.3685806e0     ' Zeros are located at
    #Define a4 -1.3713634e0     '   0.500, 0.575, 0.720, 0.885, 1.000
    x = b * ( Csng( expo ) + a0 + ( a1 + ( a2 + ( a3 + a4 * x ) * x ) * x  ) * x )
    '-------- approximate 2^x, is antiLog2( x ) ------------------------
    expo = Int( x ) ' this integer part adjusts the exponent of Alog2(x)
    x -= expo   ' x reduced to fraction in range 0.000 thru 0.999
    x = 1e0 + ( 0.6607687e0 + 0.3392313e0 * x ) * x  ' approx 2^x over range 0 to 1
    *fp += ( expo Shl frac_bits ) ' restore early integer to biased log exponent
    Return x
End Function

'    Maximum error is < ± 0.30% for b < ± 2.5
'    Maximum error is < ± 0.35% for b < ± 7.0
'    Maximum error is < ± 0.50% for b < ± 22.
'    Maximum error is < ± 0.92% for b < ± 60.
'    Beyond which the exponent overflows, so it all falls apart
'
'=======================================================================
' set up graphics screen
Dim As Integer sx, sy, sz
Screeninfo sx, sy, sz
sx -= 32 ' shrink
sy -= 24
sz = 4   ' 16 colours
Screenres sx, sy, sz

'--------------------------------------
' graphics scale
Dim As Double x, y, z, t, q 
Dim As Double lhs = 0.8, rhs = 4.2, lo = -0.5, hi = 0.5
Window ( lhs, lo ) - ( rhs, hi )

Line( lhs,  0 )-( rhs,  0 ), 1
For x = 1 To 4
    Line( x, lo )-( x, hi ), 1
Next x

'=======================================================================
' plot the error curves
Dim As Double max = -1e6, min = 1e6, scale = 100
Dim As Integer i, k = 7

Print
Print " Powers =  ";
'--------------------------------------
For q = -2.5 To 2.51 Step 0.1
    k += 1                  ' 16 colours
    If k > 15 Then k = 8
    Color k
    Print Using "####.##"; q;
    '----------------------------------
    For x = lhs To rhs Step ( rhs - lhs ) / ( sx * 2 )
        y = x^q             ' correct
        z = raise( x, q )   ' approx
        t = ( z - y ) / y   ' t is the error
        If min > t Then min = t
        If max < t Then max = t
        Pset( x, scale * t ), k
    Next x
Next q

Line( lhs,  max * scale )-( rhs,  max * scale ), 1
Line( lhs,  min * scale )-( rhs,  min * scale ), 1

'=======================================================================
For x = 0 To 4
    Draw String ( x+.01, -0.005), Str(x), 15
Next x
Color 15
Print
Print
Print Using " Error bounds ####.### % ####.### %"; max * 100; min * 100

'=======================================================================
Sleep
'=======================================================================
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need faster Exponentiate

Post by Provoni »

Thanks Richard, I have just released a new version this morning and included a mention of your code in the Readme.txt in the base directory of the program.

viewtopic.php?f=8&t=23188

I have tested both your functions with the programs functionality in mind and prefer the faster version though the difference is very small. Is it possible to further optimize the function if for example the b variable (the power raised to) will never exceed 5?
Richard
Posts: 3096
Joined: Jan 15, 2007 20:44
Location: Australia

Re: Need faster Exponentiate

Post by Richard »

The problem with approximating diadic functions like a^b is that they cover an area rather than a line, so usually require two, or possibly three, approximations to get one answer.

There are a few possible ways to speed it up further.

1. Implement it as macro “inline code” to remove the function call overhead. Use static variables.

2a. Code it in ASM to use only the extended registers inside the CPU. It would probably run at twice the speed, but would then be processor dependent.
2b. Implement four of the approximations at one time, so four packed singles can be processed in parallel in mmx registers and the fpu pipeline.

3. Define constraints to the a and b input ranges, or quantise those values, so lookup table(s) or minimised polynomials could be used.

4. Look at replacing the Int() function call with inline code.

5. Do a larger part of the surrounding math in a logarithmic number system, not in the log_linear exponent_significand floating point system. That will depend on the surrounding code.

6. Use single rather than double throughout the code wherever possible. That only needs to move half the data bits through the processor-memory bottlenecks.

7. Identify if and when a^b is computed more than once with the same value for a. The earlier value of Log2(a) could then be saved as static rather than recomputed.

8. Buy a faster computer or use a GPU for parallel processing.
OSchmidt
Posts: 49
Joined: Jan 01, 2016 12:27
Contact:

Re: Need faster Exponentiate

Post by OSchmidt »

FWIW,
below is a routine which allows to adjust precision (the amount of iterations)...

This can be done via the optional i_0_11 param,
which defaults to 6 - and then ensures a max-error of +-1% (over the whole range of exponents).

With that default-value of i_0_11 = 6 (+-1% error), the routine is about 25% faster than "raise()" -
with i_0_11 = 8 (+- 0.25%) it has about the same speed on my machine.

As for performance...
- I've tested on a (4 year old) intel-mobile-cpu
- fbc 32bit, version 1.06 (with -fpu SSE as the only compiler-switch)

There might be further optimizations possible in the iteration-loop,
(where x is calculated as x = Sqr(x) "kinda recursively") -
but I'll leave that for the true math-experts... ;)

Code: Select all

'an approach which applies the (usually quite fast) runtime-Sqr-function (multiple times)
'the optional i_0_11 allows to adjust the amount of iterations - and thus the precision... 
'with its default: i_0_11=6  the error is +-1% ... with 7 it is half that, a.s.o. up to 11
Function FastPow(ByVal x As Single, ByVal n As Single, ByVal i_0_11 As Long=6) As Single
  Dim i as Long, r As Single = 1
  
  Static L(0 To 11) As Single 
  If L(0) = 0 Then 'fill the Lookup-Table (once, because it's static)
    For i = 0 To 11: L(i) = 1 / 2 ^ (i + 1): Next
  End If
 
  If x = 0 Then return 0 'early exit for 0-values of x
 
  If n > 0 Then 'handle positive exponents
  
    Do Until n < 1: r *= x: n -= 1: Loop
    
    For i = 0 To i_0_11
      x = Sqr(x)
      If n >= L(i) Then r *= x: n -= L(i) 
      If n <= L(i_0_11) Then Exit For 'early exit (n can't get any smaller)
    Next
    
  ElseIf n < 0 Then 'handle the case of neg. exponents
    r = 1 / FastPow(x, -n, i_0_11)
  End If
  
  return r
End Function
HTH

Olaf
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need faster Exponentiate

Post by Provoni »

Richard wrote:The problem with approximating diadic functions like a^b is that they cover an area rather than a line, so usually require two, or possibly three, approximations to get one answer.

There are a few possible ways to speed it up further.

1. Implement it as macro “inline code” to remove the function call overhead. Use static variables.

2a. Code it in ASM to use only the extended registers inside the CPU. It would probably run at twice the speed, but would then be processor dependent.
2b. Implement four of the approximations at one time, so four packed singles can be processed in parallel in mmx registers and the fpu pipeline.

3. Define constraints to the a and b input ranges, or quantise those values, so lookup table(s) or minimised polynomials could be used.

4. Look at replacing the Int() function call with inline code.

5. Do a larger part of the surrounding math in a logarithmic number system, not in the log_linear exponent_significand floating point system. That will depend on the surrounding code.

6. Use single rather than double throughout the code wherever possible. That only needs to move half the data bits through the processor-memory bottlenecks.

7. Identify if and when a^b is computed more than once with the same value for a. The earlier value of Log2(a) could then be saved as static rather than recomputed.

8. Buy a faster computer or use a GPU for parallel processing.
Thanks for your performance tips Richard, at least I will implement it as a macro. I have not used macro's before and it will be a good start. Point 7 should be applicable. Some of the other points are above my skill level.
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need faster Exponentiate

Post by Provoni »

Thanks OSchmidt for your function,

I have tested it and it is a bit slower than Richard's for my program. As you say it could probably be optimized a little bit further and have not tried altering the i_0_11 parameter yet.
OSchmidt
Posts: 49
Joined: Jan 01, 2016 12:27
Contact:

Re: Need faster Exponentiate

Post by OSchmidt »

As said, it gets faster with decreasing values in i_0_11.

With:
- i_0_11=5 instead of the default 6, you'll save one iteration - and the error will be doubled to about +-2%
- i_0_11=4 instead of the default 6, you'll save two iterations - and the error will be about +-4%

Would like to hear, how it performs when you try it with i_0_11=4 (since this roughly matches the err-interval of the "prior raise()".
Did you ensure the compiler-swich: -fpu SSE already?

Regards,

Olaf
Richard
Posts: 3096
Joined: Jan 15, 2007 20:44
Location: Australia

Re: Need faster Exponentiate

Post by Richard »

Code: Select all

' Compare the single precision  x^b  with algorithm under test
'  reference  ref time   test      test time      ratio     Error bounds
'  single_pwr   69.8ns   FastPow11  154.7ns      221.67%    0.03125%
'  single_pwr   69.8ns   FastPow10  141.5ns      202.68%    0.0625%
'  single_pwr   69.7ns   FastPow9   127.0ns      182.23%    0.125%
'  single_pwr   69.7ns   FastPow8   112.3ns      161.07%    0.25%
'  single_pwr   69.7ns   FastPow7    99.8ns      143.11%    0.5%
'  single_pwr   69.7ns   FastPow6    87.6ns      125.68%    1.%
'  single_pwr   69.7ns   FastPow5    75.5ns      108.31%    2.%
'  single_pwr   69.7ns   FastPow4    63.3ns       90.81%    4.%
'  single_pwr   69.8ns   FastPow3    52.0ns       74.42%    8.%
'  single_pwr   69.8ns   FastPow2    41.4ns       59.36%   16.%
'  single_pwr   69.9ns   FastPow1    32.5ns       46.52%   32.%
'  single_pwr   69.8ns   Raise       41.7ns       59.72%    0.5%
Note that, Unlike Raise(), the FastPow() error bounds increase with x. I tested to x = 4

Later model CPUs map the old external x87 register stack onto the internal mmx register block. Access time is therefore the same for each system. Compiler option -fpu (sse/x87) no longer makes a consistent difference.
Post Reply