Fast GCD algorithm

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
Post Reply
Quadrescence
Posts: 10
Joined: Jul 30, 2008 17:28
Location: Minnesota, USA
Contact:

Fast GCD algorithm

Post by Quadrescence »

A fast binary GCD. I'm sure someone like Mysoft could ASM-ize it and make it faster. Note that it uses pretty "processor-basic" instructions. :)

For those who don't know, GCD(u,v) is the greatest common denominator between u and v. That is, the largest number that goes into both of them.

A good application is simplifying fractions. Suppose we have a fraction a/b that is not simplified. Then we could easily simplify by calculating c = GCD(a,b), and then setting x = a/c and y = b/c, which gives us the fraction x/y, which is indeed equal to (albeit simpler than) a/b.

Anyway, onto the code.

Code: Select all

'binary GCD function
function gcd(u as uinteger, v as uinteger) as uinteger
	dim as uinteger sft=0, tmp=0

	if (u*v = 0) then return 0
    
	while (((u or v) and 1) = 0) 
		sft += 1
		u shr= 1
		v shr= 1
	wend

	while ((u and 1) = 0)
		u shr= 1
	wend

	do
		while ((v and 1) = 0)
			v shr= 1
		wend

		if (u < v) then
			v -= u
		else
			tmp = u - v
			u = v
			v = tmp
		end if
        
		v shr= 1
	loop while (v <> 0)

	return (u shl sft)
end function
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Post by MichaelW »

I made a faster version the easy way, by using a C source and having GCC optimize it. A faster ASM version should be possible, but probably not without substantially more effort and time.

This is the C source (compiled with gcc -O3 -c gcd_c.c):

Code: Select all

/************************************************************************
http://en.wikipedia.org/wiki/Binary_GCD_algorithm#Implementation_in_C
************************************************************************/

unsigned int gcd_c(unsigned int u, unsigned int v)
{
    int shift;

    /* GCD(0,x) := x */
    if (u == 0 || v == 0)
      return u | v;

    /* Let shift := lg K, where K is the greatest power of 2
       dividing both u and v. */
    for (shift = 0; ((u | v) & 1) == 0; ++shift) {
        u >>= 1;
        v >>= 1;
    }

    while ((u & 1) == 0)
      u >>= 1;

    /* From here on, u is always odd. */
    do {
        while ((v & 1) == 0)  /* Loop X */
          v >>= 1;

        /* Now u and v are both odd, so diff(u, v) is even.
           Let u = min(u, v), v = diff(u, v)/2. */
        if (u < v) {
            v -= u;
        } else {
            unsigned int diff = u - v;
            u = v;
            v = diff;
        }
        v >>= 1;
    } while (v != 0);

    return u << shift;
}
And this is the FB test app (include -a gcd_c.o on the FBC command line to link in the C object module).

Code: Select all

'====================================================================
#include "windows.bi"
#include "counter.bas"
'====================================================================
'' Counter.bas is available here:
''
'' http://www.freebasic.net/forum/viewtopic.php?t=4221
'====================================================================

declare function gcd_c cdecl alias "gcd_c"( byval as uinteger, _
                                            byval as uinteger ) _
                                            as uinteger

function gcd(u as uinteger, v as uinteger) as uinteger
  dim as uinteger sft=0, tmp=0

  ''if (u*v = 0) then return 0
  if u = 0 or v = 0 then return u or v

  while (((u or v) and 1) = 0)
    sft += 1
    u shr= 1
    v shr= 1
  wend

  while ((u and 1) = 0)
    u shr= 1
  wend

  do
    while ((v and 1) = 0)
      v shr= 1
    wend

    if (u < v) then
      v -= u
    else
      tmp = u - v
      u = v
      v = tmp
    end if

    v shr= 1
  loop while (v <> 0)

  return (u shl sft)

end function

dim as uinteger i, e, u, v, r

for i = 1 to 1000000
  u = rnd * 1000000
  v = rnd * 1000000
  r = gcd( u, v )
  if gcd_c( u, v ) <> r then
    print "error at ";u;",";v
    e = 1
  end if
next
if e = 0 then print "OK"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
  gcd( 123, 456 )
counter_end
print counter_cycles; " cycles"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
  gcd_c( 123, 456 )
counter_end
print counter_cycles; " cycles"

sleep
And these are the results on my P3 system:

Code: Select all

OK
222 cycles
79 cycles
I did not attempt to verify that the returned GCDs are correct, I simply modified the FB source so in my tests it would return the same values as the C version.
Mysoft
Posts: 836
Joined: Jul 28, 2005 13:56
Location: Brazil, Santa Catarina, Indaial (ouch!)
Contact:

Post by Mysoft »

here it is my attempts...
a simple asm conversion... and then some 2... or 3 optimizations
i think this code in especial can be better optimized by aligning...
but i didnt tried that except from one jump that was badly misaligned :P (got 7 cycles there!)

Code: Select all

#include "Counter.bas"

Function gcd(u As Uinteger, v As Uinteger) As Uinteger  
  Dim As Uinteger sft=0, tmp=0
  
  If (u*v = 0) Then Return 0
  
  While (((u Or v) And 1) = 0) 
    sft += 1
    u Shr= 1
    v Shr= 1
  Wend
  
  While ((u And 1) = 0)
    u Shr= 1
  Wend
  
  Do
    While ((v And 1) = 0)
      v Shr= 1
    Wend
    
    If (u < v) Then
      v -= u
    Else
      tmp = u - v
      u = v
      v = tmp
    End If
    
    v Shr= 1
  Loop While (v <> 0)
  
  Return (u Shl sft)
End Function

' ******************************************************
' ******************************************************
' ******************************************************

Function gcd_asm(u As Uinteger, v As Uinteger) As Uinteger  
  
  asm
    mov eax,[U]         ' \
    or eax,eax          ' |
    jz _GCD_END_        ' | if (u*v=0) then return
    mov ebx,[V]         ' |
    or ebx,ebx          ' |
    jz _GCD_END_        ' /
    
    xor ecx,ecx         ' Dim As Uinteger sft=0
    
    _WHILE1_BEGIN_:     ' \
    mov edx,eax         ' |
    or edx,ebx          ' | While (((u Or v) And 1) = 0)
    test edx,1          ' |
    jnz _WHILE1_END_    ' /
    inc ecx                ' sft += 1
    shr eax,1              ' u Shr= 1
    shr ebx,1              ' v Shr= 1
    jmp _WHILE1_BEGIN_  ' \
    _WHILE1_END_:       ' / Wend
    
    _WHILE2_BEGIN_:     ' \
    test eax,1          ' | While ((u And 1) = 0)
    jnz _WHILE2_END_    ' /
    shr eax,1              ' u Shr= 1
    jmp _WHILE2_BEGIN_  ' \
    _WHILE2_END_:       ' / 
     
    _DO1_BEGIN:        ' do
    
    _WHILE3_BEGIN_:       ' \ while ((v And 1) = 0)
    test ebx,1             ' |
    jnz _WHILE3_END_      ' /
    shr ebx,1                 ' v Shr= 1
    jmp _WHILE3_BEGIN_    ' \
    _WHILE3_END_:         ' / wend
    
    cmp eax,ebx            ' \ If (u < v) Then
    jae _IF_ELSE1_         ' /
    sub ebx,eax               ' \
    jmp _IF_END1_             ' / v -= u
    _IF_ELSE1_:           ' Else
    mov edx,eax               ' \
    sub edx,ebx               ' / tmp = u - v
    mov eax,ebx               ' u = v
    mov ebx,edx               ' v = tmp
    _IF_END1_:            ' end if
    
    shr ebx,1              ' v Shr= 1
    
    jnz _ODO1_BEGIN     ' Loop While (v <> 0)   
    
    shl eax,cl          ' \
    mov [FUNCTION],eax  ' | Return (u Shl sft)
    _GCD_END_:          ' /
    
  end asm    
  
End Function

' **********************************************************************
' **********************************************************************
' **********************************************************************

Function gcd_asm_optimized(u As Uinteger, v As Uinteger) As Uinteger  
  
  asm
    
    mov eax,[U]         ' \
    or eax,eax          ' |
    jz _OGCD_END_       ' | if (u*v=0) then return
    mov ebx,[V]         ' |
    or ebx,ebx          ' |
    jz _OGCD_END_       ' /
    
    xor ecx,ecx         ' Dim As Uinteger sft=0
    mov edx,eax
    or edx,ebx
     _OWHILE1_BEGIN_:    ' \
    test edx,1           ' |
    jnz _OWHILE1_ENDZ_   ' | While (((u Or v) And 1) = 0)
    test edx,2           ' |
    jnz _OWHILE1_END_    ' | 
    add ecx,2              ' sft += 1
    shr edx,2              
    shr eax,2              ' u Shr= 1
    shr ebx,2              ' v Shr= 1
    jmp _OWHILE1_BEGIN_  ' \
    _OWHILE1_END_:       ' / Wend
    shr eax,1
    shr ebx,1
    inc ecx    
    _OWHILE1_ENDZ_:    
    
    _OWHILE2_BEGIN_:    ' \
    test eax,1          ' | While ((u And 1) = 0)
    jnz _OWHILE2_ENDZ_  ' /
    test eax,2          ' | While ((u And 1) = 0)
    jnz _OWHILE2_END_   ' /
    shr eax,1              ' u Shr= 1
    jmp _OWHILE2_BEGIN_ ' \
    nop
    nop
    nop
    nop
    nop
    _OWHILE2_END_:      ' / 
    shr eax,1
    _OWHILE2_ENDZ_:
    
    _ODO1_BEGIN:        ' do
    
    _OWHILE3_BEGIN_:       ' \ while ((v And 1) = 0)
    test ebx,1             ' |
    jnz _OWHILE3_ENDZ_     ' /
    test ebx,2             ' |
    jnz _OWHILE3_END_      ' /
    shr ebx,2                 ' v Shr= 1
    jmp _OWHILE3_BEGIN_    ' \
    _OWHILE3_END_:         ' / wend
    shr ebx,1
    _OWHILE3_ENDZ_:
    
    mov edx,ebx            ' \ If (u < v) Then
    sub ebx,eax               ' \
    jns _OIF_END1_             ' / v -= u
    '_OIF_ELSE1_:           ' Else
    mov ebx,eax               ' \
    sub ebx,edx               ' / tmp<v> = u - v
    mov eax,edx               ' u = v
    _OIF_END1_:            ' end if
        
    shr ebx,1              ' v Shr= 1
    jnz _ODO1_BEGIN        ' Loop While (v <> 0)   
    
    shl eax,cl          ' \
    mov [FUNCTION],eax  ' | Return (u Shl sft)
    _OGCD_END_:         ' /
    
  end asm    
  
End Function

dim as integer RESU

'--------------------------------------------------
#define Counter_Start Counter_Begin( 1000, HIGH_PRIORITY_CLASS )
'#define PARM &h22000,&hFF0000
#define PARM 123,456

Counter_Start
  RESU = gcd( PARM )
Counter_End
Print counter_cycles; " cycles",,RESU

Counter_Start
  RESU = gcd_asm( PARM )
Counter_End
Print counter_cycles; " cycles (asm)",RESU

Counter_Start
  RESU = gcd_asm_optimized( PARM )
Counter_End
Print counter_cycles; " cycles (asm optimized)",RESU

sleep

and so... my results here:

Code: Select all

128 cycles                   3
62 cycles (asm)              3
51 cycles (asm optimized)    3
DrV
Site Admin
Posts: 2116
Joined: May 27, 2005 18:39
Location: Midwestern USA
Contact:

Post by DrV »

Seems a bit excessive to me.. have you tried a straightforward implementation of the Euclidean algorithm? I would bet the remainder would be faster than the repeated subtraction, especially for larger numbers.

Code: Select all

Function gcd2(u As Uinteger, v As Uinteger) As Uinteger
     do while v <> 0
         var t = v
         v = u mod v
         u = t
     loop
     return u
end function
Mysoft
Posts: 836
Joined: Jul 28, 2005 13:56
Location: Brazil, Santa Catarina, Indaial (ouch!)
Contact:

Post by Mysoft »

isnt subtraction.... is "shr"

also... each mod will be 20 cycles at least...
so i dont think it will be faster....

i got 314 cycles with your code DrV :P
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Post by MichaelW »

Despite the slow DIVs, the Euclidean algorithm is much faster for large numbers. In my tests the break-even point between an ASM version and the optimized C code was about 7 decimal digits.

BTW, on my P3 system gcd_asm_optimized runs in 71 cycles, versus 79 cycles for the optimized C version.
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Post by MichaelW »

I came across a small-code version that a member had posted to MASM Forum and played with it a bit.
Here is the FB version:

Code: Select all

function gcd_sub(u as uinteger, v as uinteger) as uinteger
  if u = 0 or v = 0 then return u or v
  while u <> v
    if u > v then u -= v else v -= u
  wend
  return u
end function
And a C version:

Code: Select all

unsigned int gcd_sub_c(unsigned int u, unsigned int v)
{
  if( u == 0 || v == 0)
    return u | v;
  while( u != v )
  {
    if( u > v )
      u -= v;
    else
      v -= u;
  }
  return u;
}
And my ASM version:

Code: Select all

.intel_syntax noprefix
.global _GCD_SUB_ASM@8
.data
.text
.balign 16
_GCD_SUB_ASM@8:
    mov edx, [esp+8]  /* v */
    mov eax, [esp+4]  /* u */
    test eax, eax
    jnz L0
    or eax, edx
    ret 8
  L0:
    test edx, edx
    jnz L1
    or eax, edx
    ret 8
  L1:
    cmp eax, edx
    jz  L3
    jb  L2
    sub eax, edx
    jmp L1
  L2:
    sub edx, eax
    jmp L1
  L3:
    ret 8
The cycle counts on my P3, using the previous test values, were 192, 109, and 81.
Mysoft
Posts: 836
Joined: Jul 28, 2005 13:56
Location: Brazil, Santa Catarina, Indaial (ouch!)
Contact:

Post by Mysoft »

not bad... for a simple subtraction... i will be looking on that a bit more for fun... i think i can achieve the goal of <60 cycles

(in your P3)
Mysoft
Posts: 836
Joined: Jul 28, 2005 13:56
Location: Brazil, Santa Catarina, Indaial (ouch!)
Contact:

Post by Mysoft »

here it is... my last version on this topic :P
i dont think i will be able to do better than this... maybe thinking in a better general algorithm (similar what you did with subtraction)

the code is a little mess...
but whatever... now its a naked function and aligned code
it also includes DrV basic code...

Code: Select all

#include "Counter.bas"

dim gcd_asm_mysoft_optimized as function (u As Uinteger, v As Uinteger) As Uinteger
asm mov dword ptr [gcd_asm_mysoft_optimized],offset GcdAsmMysoftOptimized

Goto _Test_Begin_
' ******************************************************
' ******************************************************
' ******************************************************
Function gcd_quad(u As Uinteger, v As Uinteger) As Uinteger  
  Dim As Uinteger sft=0, tmp=0
  
  If (u or v) = 0 Then Return 0
  
  While (((u Or v) And 1) = 0) 
    sft += 1
    u Shr= 1
    v Shr= 1
  Wend
  
  While ((u And 1) = 0)
    u Shr= 1
  Wend
  
  Do
    While ((v And 1) = 0)
      v Shr= 1
    Wend
    
    If (u < v) Then
      v -= u
    Else
      tmp = u - v
      u = v
      v = tmp
    End If
    
    v Shr= 1
  Loop While (v <> 0)
  
  Return (u Shl sft)
End Function

' *******************************************************
' *******************************************************
' *******************************************************

'function gcd_asm_mysoft_optimized (u As Uinteger, v As Uinteger) As Uinteger

asm
  .align 16
  GcdAsmMysoftOptimized:
  mov eax,[ESP+4]     ' \ u
  mov ebx,[ESP+8]     ' | v
  mov edx,eax
  mov cl,31
  or edx,ebx
  jz _OGCD_END_       ' | if (u*v=0) then return
  xor ecx,ecx         ' Dim As Uinteger sft=0
  
  test dl,1
  jnz _OWHILE2_BEGIN_
  .align 4
  _OWHILE1_BEGIN_:     ' \
  shr edx
  jc _OWHILE1_ENDZ_    ' | While (((u Or v) And 1) = 0)
  shr edx              ' " u Shr= 1 / v Shr= 1 "
  jc _OWHILE1_END_     ' | 
  add cl,2            ' sft += 1
  jmp _OWHILE1_BEGIN_  ' \
  .align 16
  _OWHILE1_END_:       ' / Wend
  inc cl 'add cl,1
  _OWHILE1_ENDZ_:
  shr eax,cl
  shr ebx,cl
  
  .align 4
  _OWHILE2_BEGIN_:    ' \
  test al,1           ' | While ((u And 1) = 0)
  jnz _OWHILE2_ENDZ_  ' /
  test al,2           ' | While ((u And 1) = 0)
  jnz _OWHILE2_END_   ' /
  shr eax,2              ' u Shr= 1
  jmp _OWHILE2_BEGIN_ ' \
  .align 16
  _OWHILE2_END_:      ' / 
  shr eax,1  
  _OWHILE2_ENDZ_:
    
  _ODO1_BEGIN:        ' do
  
  .align 8
  _OWHILE3_BEGIN_:       ' \ while ((v And 1) = 0)
  test bl,1              ' |
  jnz _OWHILE3_ENDZ_     ' /
  test bl,2              ' |
  jnz _OWHILE3_END_      ' /
  shr ebx,2                 ' v Shr= 1
  jmp _OWHILE3_BEGIN_    ' \
  .align 8
  _OWHILE3_END_:         ' / wend
  shr ebx,1
  _OWHILE3_ENDZ_:
  
  mov edx,ebx            ' \ If (u < v) Then
  sub ebx,eax               ' \
  jns _OIF_END1_             ' / v -= u
  '_OIF_ELSE1_:           ' Else
  mov ebx,eax               ' \
  sub ebx,edx               ' / tmp<v> = u - v
  mov eax,edx               ' u = v
  .align 4
  _OIF_END1_:            ' end if
  
  shr ebx                ' v Shr= 1
  jnz _ODO1_BEGIN        ' Loop While (v <> 0)   
    
  'mov [FUNCTION],eax  ' | Return (u Shl sft)
  _OGCD_END_:         ' \
  shl eax,cl          ' /
  ret 8
  
end asm    

'End Function

' ************************************************
' ***********************************************
' **************************************************

Function gcd_DrV(u As Uinteger, v As Uinteger) As Uinteger
  Do While v <> 0
    var t = v
    v = u Mod v
    u = t
  Loop
  Return u
End Function

_Test_Begin_:

dim as integer RESU

'--------------------------------------------------
#define Counter_Start Counter_Begin( 10000, HIGH_PRIORITY_CLASS )
'#define PARM &h22000,&hFF0000
#define PARM 123,456

dim as integer R0=1000,R1=1000,R2=1000

do
  
  locate 1,1,0

Counter_Start
RESU = gcd_DrV( PARM )
Counter_End
if counter_cycles < R0 then R0 = counter_cycles
Print R0; " cycles (basic DrV)",RESU

sleep 10

Counter_Start
RESU = gcd_Quad( PARM )
Counter_End
if counter_cycles < R1 then R1 = counter_cycles
Print R1; " cycles (basic quad.)",RESU

sleep 10

Counter_Start
RESU = gcd_asm_mysoft_optimized( PARM )
Counter_End
if counter_cycles < R2 then R2 = counter_cycles
Print R2; " cycles (asm optimized)",RESU

sleep 10

locate 8,1
print "Any key to end"

loop until inkey$ <> ""
and My results here:

Code: Select all

 314 cycles (basic DrV)      3
 127 cycles (basic quad.)    3
 40 cycles (asm optimized)   3
i hope that everything is fine... and im hoping that maybe that will achieve ~65cycles there... even that im optimizing on AMD, you're running an intel hahah ;P
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Post by MichaelW »

The AMD processors and the P3 are reasonably similar - the P4 is the oddball one.

Code: Select all

 353 cycles (basic DrV)      3
 163 cycles (basic quad.)    3
 46 cycles (asm optimized)   3
If you precede your first test with a 3-4 second sleep, to allow the system activities associated with launching an application to subside, this should eliminate most of the run to run variation.
srvaldez
Posts: 3379
Joined: Sep 25, 2005 21:54

Post by srvaldez »

I get strange timings which may be due to running Windows in a virtual machine, the numbers start at some high number like 1000 or 512 but rapidly diminish to 0 or 8.
Mysoft
Posts: 836
Joined: Jul 28, 2005 13:56
Location: Brazil, Santa Catarina, Indaial (ouch!)
Contact:

Post by Mysoft »

hum... as far i know Virtual Machnes... like that doesnt emulate ASM (just add layers)... but it seems that it's emulating RTDSC with a timer... (caching) and then retrieving... you probabily want to test with an accumulation timer (like how many "ns" it takes to execute the sub 10k or 100k times) to then get some comparsion...

but probabily Michael will have more info about that... :P
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Post by MichaelW »

To return an accurate count, the cycle-count code needs to have exclusive use of the CPU for a full time slice, at least once during the test. That is the point of increasing the priority class and starting a new time slice for each counting loop. If the VM does not accurately duplicate the system behaviors that Windows expects, then the cycle counts will be more or less inaccurate. It might be instructive to replace HIGH_PRIORITY_CLASS with REALTIME_PRIORITY_CLASS, but note that this normally entails some risk of Window hanging if the code under test contains bugs. Ultimately, the only way I can see that the cycle-count code could work well in a VM would be if the TSC (Time Stamp Counter) were virtualized.
Post Reply