Need speedup

New to FreeBASIC? Post your questions here.
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Need speedup

Post by dodicat »

I put
dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)
into the outer loop with a flag.
Also compared select case with ordinary branching.

(Set up the arrays with some values so the code runs).
Tried to keep out of Case 0 To 10 (only 7 passes through this)

Code: Select all

 
Const nmass=10000

Redim Shared As Single x(nmass),y(nmass),mc(nmass),vy(nmass),vx(nmass)',kcnt=-7000,dt=.9
Redim Shared As Long x_ERR_m(nmass),x_ERR_n(nmass)
Dim Shared As Single kcnt=-7000,dt=.9
For n As Long=0 To nmass
    x(n)=n*n
    y(n)=n*n
    vx(n)=n
    vy(n)=n
Next
Dim Shared As Long NUM_ERROR
Dim Shared As Long n_ERR_part,m_ERR_part 


Sub vals1 ()
    Dim As Long n,m,flag
    Dim As Single dx,dy ,a,df,dfy,dfx,xm,ym
    Dim As Single Rglue ,Rgquadro
    For n  =0 To Nmass
        dfx=0
        dfy=0
        flag=0
        For m =0 To Nmass
            If n<>m Then 
                flag=1
                dx=x(m)-x(n):dy=y(m)-y(n)
                Rgquadro=(dx*dx+dy*dy): Rglue=Sqr (Rgquadro)
                Select Case Rglue
                Case Is >140 
                    df=9512/((Rgquadro))                             
                Case 10 To 140
                    df=((-1e12 /Rgquadro)/Rgquadro/Rgquadro)+.4
                Case 0 To 10 
                    df=0 : NUM_ERROR=NUM_ERROR+1
                    x_ERR_m(m) =x(m):x_ERR_m(m) =y(m):
                    x_ERR_n(n) =x(n):x_ERR_n(n) =y(n):
                    n_ERR_part =n
                    m_ERR_part =m
                End Select
                
                df=df*mc(m)
                 'dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)                             
            End If
        Next m
        
        If flag Then dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)  
        dfy=dfy-y(n)*Kcnt:dfx=dfx-x(n) *Kcnt   rem   Centering power
        
        vy(n)=vy(n)*.99999:vx(n)=vx(n)*.99999         rem dissipation
        
        vy(n)=vy(n)+dfy*dt:vx(n)=vx(n)+dfx*dt           
        y(n)=y(n)+vy(n)*dt   :   x(n)=x(n)+vx(n)*dt
        dx=x(m)-x(n):dy=y(m)-y(n) 
    Next n
End Sub

Sub vals2 ()
    Dim As Long n,m,flag
    Dim As Single dx,dy ,a,df,dfy,dfx,xm,ym
    Dim As Single Rglue ,Rgquadro
    For n  =0 To Nmass
        dfx=0
        dfy=0
        flag=0
        For m =0 To Nmass
            If n<>m Then 
                flag=1
                dx=x(m)-x(n):dy=y(m)-y(n)
                Rgquadro=(dx*dx+dy*dy): Rglue=Sqr (Rgquadro)
                If rglue>140  Then    df=9512/((Rgquadro)):Goto skip 
                
                If rglue>=10 Then
                    If rglue<=140 Then
                         'df=(-1e12 /(Rgquadro*Rgquadro*Rgquadro))+.4:goto skip 'Rock's idea
                        df=((-1e12 /Rgquadro)/Rgquadro/Rgquadro)+.4 :Goto skip
                    End If
                End If
                
                If rglue >=0 Then
                    If rglue<=10 Then
                        df=0 : NUM_ERROR=NUM_ERROR+1
                        x_ERR_m(m) =x(m):x_ERR_m(m) =y(m):
                        x_ERR_n(n) =x(n):x_ERR_n(n) =y(n):
                        n_ERR_part =n
                        m_ERR_part =m
                    End If
                End If
                
                skip:
                df=df*mc(m)
                ' dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)                             
            End If

        Next m
        
        If flag Then dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)  
        dfy=dfy-y(n)*Kcnt:dfx=dfx-x(n) *Kcnt   rem   Centering power
        
        vy(n)=vy(n)*.99999:vx(n)=vx(n)*.99999         rem dissipation
        
        vy(n)=vy(n)+dfy*dt:vx(n)=vx(n)+dfx*dt           
        y(n)=y(n)+vy(n)*dt   :   x(n)=x(n)+vx(n)*dt
        dx=x(m)-x(n):dy=y(m)-y(n) 
    Next n
End Sub

print "warming up . . ."

vals1


dim as double t
for k as long=1 to 5
    
Redim x(nmass),y(nmass),mc(nmass),vy(nmass),vx(nmass)',kcnt=-7000,dt=.9
Redim x_ERR_m(nmass),x_ERR_n(nmass),vy(nmass),vx(nmass)  
 kcnt=-7000:dt=.9
NUM_ERROR=0
n_ERR_part=0:m_ERR_part =0
For n As Long=0 To nmass
    x(n)=n*n
    y(n)=n*n
    vx(n)=n
    vy(n)=n
Next   
    
 t=Timer
vals1
Print Timer-t;" method 1",
Print "error ";num_error


Redim x(nmass),y(nmass),mc(nmass),vy(nmass),vx(nmass)',kcnt=-7000,dt=.9
Redim x_ERR_m(nmass),x_ERR_n(nmass),vy(nmass),vx(nmass)
kcnt=-7000:dt=.9
NUM_ERROR=0
n_ERR_part=0:m_ERR_part =0
For n As Long=0 To nmass
    x(n)=n*n
    y(n)=n*n
    vx(n)=n
    vy(n)=n
Next
t=Timer
vals2
Print Timer-t;" method 2",
Print "error ";num_error
print
next k
print "done"

Sleep

 
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need speedup

Post by Provoni »

Provoni wrote: @12val12newakk

You may want to try to replace:

Code: Select all

Select Case Rglue
with:

Code: Select all

Select Case Rgquadro
and change the select case values accordingly. Removing the Rglue dependency may improve the CPU's out of order processing and speed up the code.
Like so:

Code: Select all

Select Case rgquadro 'Rglue
	case Is >19600
     	df=9512/((Rgquadro))
 	Case 100 To 19600
     	df=((-1e12 /Rgquadro)/Rgquadro/Rgquadro)+.4
 	Case 0 To 100
     	df=0 : NUM_ERROR=NUM_ERROR+1
     	x_ERR_m(m) =x(m):x_ERR_m(m) =y(m):
     	x_ERR_n(n) =x(n):x_ERR_n(n) =y(n):
     	n_ERR_part =n
     	m_ERR_part =m
end select
It's faster (tested).
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Need speedup

Post by dodicat »

Tested, yes faster with select case Rgquadro
Also rglue isn't needed in the inner loop so it can be moved to the outer loop:-
. . .
Next m

If flag Then Rglue=Sqr (Rgquadro):dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)
. . .
12val12newakk
Posts: 35
Joined: Nov 14, 2019 17:04

Re: Need speedup

Post by 12val12newakk »

this worked

Code: Select all

  sub Dr_Peters ( dt as single)  
         dim as single df,dx,dy,df_N,df_M
         dim as single dfx_N,dfy_N
         dim as single dfx_M,dfy_M
         dim as single Rg ,Rgquadro,Rglue
        dim as  single reciprocQDR,reciprocSQR
         dim as single dyR,dxR
         dfMAX=0
         
         for n=0 to Nmass-1
                 for m = n+1  to Nmass
                     df_N=0:df_M=0 :
                     dfx_N=0:dfy_N=0
                     dfx_M=0:dfy_M=0
                               dx=x(m)-x(n):dy=y(m)-y(n)
                                  Rgquadro=(dx*dx+dy*dy)
                                  reciprocQDR=1/Rgquadro
                                  reciprocSQR=sqr (reciprocQDR)
                                  Rglue=1/reciprocSQR
                                  
                         Select Case   Rglue
                    Case is >140   
                        df=9512*reciprocQDR                             
                    Case 5 to 140  
                       df=(-1e12 *reciprocQDR*reciprocQDR*reciprocQDR )+.39
                       Case 0 to 5  
                       df=0 : NUM_ERROR=NUM_ERROR+1 
                     End Select
                     
                            df_N= df*mc(m)
                    if  (mc(m)=mc(n)) THEN df_M=-df_N    else    df_M=-df*mc(n)
                    
                    if (ABS(df_N)>dfMAX ) then dfMAX =ABS(df_N)
                    if (ABS(df_M)>dfMAX ) then dfMAX =ABS(df_M)
                    
                dyR= dy*reciprocSQR :dxR= dx*reciprocSQR
                
                dfy_N=dfy_N+df_N*dyR:dfx_N=dfx_N+df_N*dxR  
                dfy_M=dfy_M+df_M*dyR:dfx_M=dfx_M+df_M*dxR   
                
                dfy_N=dfy_N-y(n)*Kcnt:  dfx_N=dfx_N-x(n)*Kcnt
                dfy_M=dfy_M-y(m)*Kcnt:  dfx_M=dfx_M-x(M)*Kcnt
           
              vy(n)=vy(n)+dfy_N*dt :    vx(n)=vx(n)+dfx_N*dt
              vy(m)=vy(m)+dfy_M*dt :    vx(m)=vx(m)+dfx_M*dt
                          
              y(n)=y(n)+vy(n)*dt     :    x(n)=x(n)+vx(n)*dt      
              y(m)=y(m)+vy(m)*dt     :    x(m)=x(m)+vx(m)*dt 
            
         next m  
         next n
         for n=0 to Nmass  rem  dampfer 
              vx(n)=vx(n)*(1-dt*.125)   
              vx(n)=vx(n)*(1-dt*.125)   
         next n        
    end sub
    
For t =0 to 7777 step dtt
Array_to_Rezerv ()

Dr_Peters ( dtt)
if dfMAX*dtt > .5 then rem step back
t=t-dtt : Rezerv_to_Array ()
dtt=dtt*.125
else
dtt=dtt*1.1
end if
next t

Code: Select all



how to split SUB on threads  .. (and + AVX)
srvaldez
Posts: 3373
Joined: Sep 25, 2005 21:54

Re: Need speedup

Post by srvaldez »

if your program is doing a lot of floating-point divisions and square roots then reducing the fpu precision might give some speed increase, example of reducing the fpu precision to single precision

Code: Select all

'place at top of your code
dim as ushort oldcw, cwdouble=&h27F, cwsingle=&h7F

asm
  fstcw word ptr [oldcw]
  fldcw word ptr [cwsingle] 'set FPU precision to single
end asm
then the rest of your code follows, and just before exiting your program you restore the fpu

Code: Select all

asm
  fldcw word ptr [oldcw] 'restore control word
end asm
please note that this only works if using the fpu and then only the functions division and square root are faster
D.J.Peters
Posts: 8586
Joined: May 28, 2005 3:28
Contact:

Re: Need speedup

Post by D.J.Peters »

D.J.Peters wrote:compare yours :-(

Code: Select all

for n = first to last
   for m = first to last
    if (n<>m) then calc_forces_gravitiy_with_cell(n,m)
  next
next
with this :-)

Code: Select all

for n = first to last-1
  for m = n+1 to last
    calc_forces_gravitiy_with_cell(n,m)
  next
next
12val12newakk wrote:D.J.Peters your code wrong
D.J.Peters wrote:may be you don't understand my hint in scope of speed !
12val12newakk wrote:this worked

Code: Select all

for n=0 to Nmass-1
  for m = n+1  to Nmass
  ...          
  next m  
next n
I know I have done it more than 50 times in the last three decades ;-)

Joshy
12val12newakk
Posts: 35
Joined: Nov 14, 2019 17:04

Re: Need speedup

Post by 12val12newakk »

Code: Select all

rem=======================================  
rem datalist  input  output from sub      ( all  single (float32))
rem  dt ,dfmax  , Rmin
rem  global   x (16384) :dim shared  as single   y (16384)
rem  global  vx (16384) :dim shared  as single  vy (16384)
rem  global  ax (16384) :dim shared  as single  ay (16384)  

sub Lannard_1  ( dt as single)  
         dim as single  df,dx,dy,df_N,df_M
         dim as single  Rg ,Rgquadro,Rglue
         dim as single RCPRglue ,Ri6
         dim as single dyR,dxR
         dim as single  dtl
         dim as  UShort CNTR 
                dfMAX=0: Rmin =99999
     for n=0 to Nmass-1
              for m = n+1  to Nmass
                      dx=x(m)-x(n)
                     if ABS(dx) > 30  then continue  for  rem  cutoff distance lennard jones                        
                      dy=y(m)-y(n)
                     if ABS(dy) > 30  then continue  for  rem  cutoff distance lennard jones    
                      Rgquadro=(dx*dx+dy*dy)
                     if   Rgquadro >900 then continue for rem  cutoff distance lennard jones    
                     Rglue=sqr(Rgquadro)
                   if ( Rglue< Rmin)  then Rmin =Rglue
                   RCPRglue=1/Rglue
                   Ri6 =(Rgquadro*Rgquadro)*(Rgquadro *.0000005)
                   df=(1/Ri6)-(1/(Ri6*Ri6)) -0.00273         rem  lennard jones    with shift
                                   df_N= df*mc(m)   
                                   df_M=-df*mc(n)               
                                  dyR= dy*RCPRglue   :dxR= dx*RCPRglue   
                                 ay(n)=ay(n)+df_N*dyR  :  ax(n)=ax(n)+df_N*dxR  
                                 ay(m)=ay(m)+df_M*dyR  :  ax(M)=ax(M)+df_M*dxR     
         next m  
         next n
         for n=0 to Nmass     
            rem   if (fmax(n)>dfMAX ) then dfMAX =fmax(n)
                     vy(n)=vy(n)+ay(n)*dt :   ay(n)=0 
                     vx(n)=vx(n)+ax(n)*dt :   ax(n)=0  
                     y(n)=y(n)+vy(n)*dt   :   x(n)=x(n)+vx(n)*dt                
                                rem    vx(n)=vx(n)-x(n)*Kcnt   rem centering
                                rem    vy(n)=vy(n)-y(n)*Kcnt
              vx(n)=vx(n)*(1-dt*.0125)    rem  dampfer 
              vy(n)=vy(n)*(1-dt*.0125)         
         next n      
end sub
Please, help
translate only one run-time critical routine to opencl / cuda
  It is necessary somehow to transfer data from the host to gpu and after calculations back.
the formalities of this process are unknown to me
if someone knows well, show an example
12val12newakk
Posts: 35
Joined: Nov 14, 2019 17:04

Re: Need speedup

Post by 12val12newakk »

D.J.Peters you know how to do it
from me public gratitude on all network resources
badidea
Posts: 2586
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Need speedup

Post by badidea »

I have no experience with this, but I found this code:
https://github.com/arpytanshu/Discrete- ... Simulation
Someone's master project. Seems well documented.
Not in freeBASIC language, but it might be a good starting point for learning.
Post Reply