Need speedup

New to FreeBASIC? Post your questions here.
dodicat
Posts: 6488
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Need speedup

Postby dodicat » Nov 15, 2019 23:14

I put
dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)
into the outer loop with a flag.
Also compared select case with ordinary branching.

(Set up the arrays with some values so the code runs).
Tried to keep out of Case 0 To 10 (only 7 passes through this)

Code: Select all

 
Const nmass=10000

Redim Shared As Single x(nmass),y(nmass),mc(nmass),vy(nmass),vx(nmass)',kcnt=-7000,dt=.9
Redim Shared As Long x_ERR_m(nmass),x_ERR_n(nmass)
Dim Shared As Single kcnt=-7000,dt=.9
For n As Long=0 To nmass
    x(n)=n*n
    y(n)=n*n
    vx(n)=n
    vy(n)=n
Next
Dim Shared As Long NUM_ERROR
Dim Shared As Long n_ERR_part,m_ERR_part


Sub vals1 ()
    Dim As Long n,m,flag
    Dim As Single dx,dy ,a,df,dfy,dfx,xm,ym
    Dim As Single Rglue ,Rgquadro
    For n  =0 To Nmass
        dfx=0
        dfy=0
        flag=0
        For m =0 To Nmass
            If n<>m Then
                flag=1
                dx=x(m)-x(n):dy=y(m)-y(n)
                Rgquadro=(dx*dx+dy*dy): Rglue=Sqr (Rgquadro)
                Select Case Rglue
                Case Is >140
                    df=9512/((Rgquadro))                             
                Case 10 To 140
                    df=((-1e12 /Rgquadro)/Rgquadro/Rgquadro)+.4
                Case 0 To 10
                    df=0 : NUM_ERROR=NUM_ERROR+1
                    x_ERR_m(m) =x(m):x_ERR_m(m) =y(m):
                    x_ERR_n(n) =x(n):x_ERR_n(n) =y(n):
                    n_ERR_part =n
                    m_ERR_part =m
                End Select
               
                df=df*mc(m)
                 'dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)                             
            End If
        Next m
       
        If flag Then dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue) 
        dfy=dfy-y(n)*Kcnt:dfx=dfx-x(n) *Kcnt   rem   Centering power
       
        vy(n)=vy(n)*.99999:vx(n)=vx(n)*.99999         rem dissipation
       
        vy(n)=vy(n)+dfy*dt:vx(n)=vx(n)+dfx*dt           
        y(n)=y(n)+vy(n)*dt   :   x(n)=x(n)+vx(n)*dt
        dx=x(m)-x(n):dy=y(m)-y(n)
    Next n
End Sub

Sub vals2 ()
    Dim As Long n,m,flag
    Dim As Single dx,dy ,a,df,dfy,dfx,xm,ym
    Dim As Single Rglue ,Rgquadro
    For n  =0 To Nmass
        dfx=0
        dfy=0
        flag=0
        For m =0 To Nmass
            If n<>m Then
                flag=1
                dx=x(m)-x(n):dy=y(m)-y(n)
                Rgquadro=(dx*dx+dy*dy): Rglue=Sqr (Rgquadro)
                If rglue>140  Then    df=9512/((Rgquadro)):Goto skip
               
                If rglue>=10 Then
                    If rglue<=140 Then
                         'df=(-1e12 /(Rgquadro*Rgquadro*Rgquadro))+.4:goto skip 'Rock's idea
                        df=((-1e12 /Rgquadro)/Rgquadro/Rgquadro)+.4 :Goto skip
                    End If
                End If
               
                If rglue >=0 Then
                    If rglue<=10 Then
                        df=0 : NUM_ERROR=NUM_ERROR+1
                        x_ERR_m(m) =x(m):x_ERR_m(m) =y(m):
                        x_ERR_n(n) =x(n):x_ERR_n(n) =y(n):
                        n_ERR_part =n
                        m_ERR_part =m
                    End If
                End If
               
                skip:
                df=df*mc(m)
                ' dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)                             
            End If

        Next m
       
        If flag Then dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue) 
        dfy=dfy-y(n)*Kcnt:dfx=dfx-x(n) *Kcnt   rem   Centering power
       
        vy(n)=vy(n)*.99999:vx(n)=vx(n)*.99999         rem dissipation
       
        vy(n)=vy(n)+dfy*dt:vx(n)=vx(n)+dfx*dt           
        y(n)=y(n)+vy(n)*dt   :   x(n)=x(n)+vx(n)*dt
        dx=x(m)-x(n):dy=y(m)-y(n)
    Next n
End Sub

print "warming up . . ."

vals1


dim as double t
for k as long=1 to 5
   
Redim x(nmass),y(nmass),mc(nmass),vy(nmass),vx(nmass)',kcnt=-7000,dt=.9
Redim x_ERR_m(nmass),x_ERR_n(nmass),vy(nmass),vx(nmass) 
 kcnt=-7000:dt=.9
NUM_ERROR=0
n_ERR_part=0:m_ERR_part =0
For n As Long=0 To nmass
    x(n)=n*n
    y(n)=n*n
    vx(n)=n
    vy(n)=n
Next   
   
 t=Timer
vals1
Print Timer-t;" method 1",
Print "error ";num_error


Redim x(nmass),y(nmass),mc(nmass),vy(nmass),vx(nmass)',kcnt=-7000,dt=.9
Redim x_ERR_m(nmass),x_ERR_n(nmass),vy(nmass),vx(nmass)
kcnt=-7000:dt=.9
NUM_ERROR=0
n_ERR_part=0:m_ERR_part =0
For n As Long=0 To nmass
    x(n)=n*n
    y(n)=n*n
    vx(n)=n
    vy(n)=n
Next
t=Timer
vals2
Print Timer-t;" method 2",
Print "error ";num_error
print
next k
print "done"

Sleep

 
Provoni
Posts: 377
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Need speedup

Postby Provoni » Nov 16, 2019 6:38

Provoni wrote:@12val12newakk

You may want to try to replace:

Code: Select all

Select Case Rglue

with:

Code: Select all

Select Case Rgquadro

and change the select case values accordingly. Removing the Rglue dependency may improve the CPU's out of order processing and speed up the code.

Like so:

Code: Select all

Select Case rgquadro 'Rglue
   case Is >19600
        df=9512/((Rgquadro))
    Case 100 To 19600
        df=((-1e12 /Rgquadro)/Rgquadro/Rgquadro)+.4
    Case 0 To 100
        df=0 : NUM_ERROR=NUM_ERROR+1
        x_ERR_m(m) =x(m):x_ERR_m(m) =y(m):
        x_ERR_n(n) =x(n):x_ERR_n(n) =y(n):
        n_ERR_part =n
        m_ERR_part =m
end select

It's faster (tested).
dodicat
Posts: 6488
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Need speedup

Postby dodicat » Nov 16, 2019 11:56

Tested, yes faster with select case Rgquadro
Also rglue isn't needed in the inner loop so it can be moved to the outer loop:-
. . .
Next m

If flag Then Rglue=Sqr (Rgquadro):dfy=dfy+df*(dy/Rglue):dfx=dfx+df*(dx/Rglue)
. . .
12val12newakk
Posts: 15
Joined: Nov 14, 2019 17:04

Re: Need speedup

Postby 12val12newakk » Nov 16, 2019 14:48

this worked

Code: Select all

  sub Dr_Peters ( dt as single) 
         dim as single df,dx,dy,df_N,df_M
         dim as single dfx_N,dfy_N
         dim as single dfx_M,dfy_M
         dim as single Rg ,Rgquadro,Rglue
        dim as  single reciprocQDR,reciprocSQR
         dim as single dyR,dxR
         dfMAX=0
         
         for n=0 to Nmass-1
                 for m = n+1  to Nmass
                     df_N=0:df_M=0 :
                     dfx_N=0:dfy_N=0
                     dfx_M=0:dfy_M=0
                               dx=x(m)-x(n):dy=y(m)-y(n)
                                  Rgquadro=(dx*dx+dy*dy)
                                  reciprocQDR=1/Rgquadro
                                  reciprocSQR=sqr (reciprocQDR)
                                  Rglue=1/reciprocSQR
                                 
                         Select Case   Rglue
                    Case is >140   
                        df=9512*reciprocQDR                             
                    Case 5 to 140 
                       df=(-1e12 *reciprocQDR*reciprocQDR*reciprocQDR )+.39
                       Case 0 to 5 
                       df=0 : NUM_ERROR=NUM_ERROR+1
                     End Select
                     
                            df_N= df*mc(m)
                    if  (mc(m)=mc(n)) THEN df_M=-df_N    else    df_M=-df*mc(n)
                   
                    if (ABS(df_N)>dfMAX ) then dfMAX =ABS(df_N)
                    if (ABS(df_M)>dfMAX ) then dfMAX =ABS(df_M)
                   
                dyR= dy*reciprocSQR :dxR= dx*reciprocSQR
               
                dfy_N=dfy_N+df_N*dyR:dfx_N=dfx_N+df_N*dxR 
                dfy_M=dfy_M+df_M*dyR:dfx_M=dfx_M+df_M*dxR   
               
                dfy_N=dfy_N-y(n)*Kcnt:  dfx_N=dfx_N-x(n)*Kcnt
                dfy_M=dfy_M-y(m)*Kcnt:  dfx_M=dfx_M-x(M)*Kcnt
           
              vy(n)=vy(n)+dfy_N*dt :    vx(n)=vx(n)+dfx_N*dt
              vy(m)=vy(m)+dfy_M*dt :    vx(m)=vx(m)+dfx_M*dt
                         
              y(n)=y(n)+vy(n)*dt     :    x(n)=x(n)+vx(n)*dt     
              y(m)=y(m)+vy(m)*dt     :    x(m)=x(m)+vx(m)*dt
           
         next m 
         next n
         for n=0 to Nmass  rem  dampfer
              vx(n)=vx(n)*(1-dt*.125)   
              vx(n)=vx(n)*(1-dt*.125)   
         next n       
    end sub
   


For t =0 to 7777 step dtt
Array_to_Rezerv ()

Dr_Peters ( dtt)
if dfMAX*dtt > .5 then rem step back
t=t-dtt : Rezerv_to_Array ()
dtt=dtt*.125
else
dtt=dtt*1.1
end if
next t
[code]


how to split SUB on threads .. (and + AVX)
srvaldez
Posts: 2425
Joined: Sep 25, 2005 21:54

Re: Need speedup

Postby srvaldez » Nov 17, 2019 0:48

if your program is doing a lot of floating-point divisions and square roots then reducing the fpu precision might give some speed increase, example of reducing the fpu precision to single precision

Code: Select all

'place at top of your code
dim as ushort oldcw, cwdouble=&h27F, cwsingle=&h7F

asm
  fstcw word ptr [oldcw]
  fldcw word ptr [cwsingle] 'set FPU precision to single
end asm

then the rest of your code follows, and just before exiting your program you restore the fpu

Code: Select all

asm
  fldcw word ptr [oldcw] 'restore control word
end asm

please note that this only works if using the fpu and then only the functions division and square root are faster
D.J.Peters
Posts: 8024
Joined: May 28, 2005 3:28
Contact:

Re: Need speedup

Postby D.J.Peters » Nov 17, 2019 5:37

D.J.Peters wrote:compare yours :-(

Code: Select all

for n = first to last
   for m = first to last
    if (n<>m) then calc_forces_gravitiy_with_cell(n,m)
  next
next
with this :-)

Code: Select all

for n = first to last-1
  for m = n+1 to last
    calc_forces_gravitiy_with_cell(n,m)
  next
next
12val12newakk wrote:D.J.Peters your code wrong
D.J.Peters wrote:may be you don't understand my hint in scope of speed !
12val12newakk wrote:this worked

Code: Select all

for n=0 to Nmass-1
  for m = n+1  to Nmass
  ...         
  next m 
next n
I know I have done it more than 50 times in the last three decades ;-)

Joshy
12val12newakk
Posts: 15
Joined: Nov 14, 2019 17:04

Re: Need speedup

Postby 12val12newakk » Dec 17, 2019 16:03

Code: Select all

rem======================================= 
rem datalist  input  output from sub      ( all  single (float32))
rem  dt ,dfmax  , Rmin
rem  global   x (16384) :dim shared  as single   y (16384)
rem  global  vx (16384) :dim shared  as single  vy (16384)
rem  global  ax (16384) :dim shared  as single  ay (16384) 

sub Lannard_1  ( dt as single) 
         dim as single  df,dx,dy,df_N,df_M
         dim as single  Rg ,Rgquadro,Rglue
         dim as single RCPRglue ,Ri6
         dim as single dyR,dxR
         dim as single  dtl
         dim as  UShort CNTR
                dfMAX=0: Rmin =99999
     for n=0 to Nmass-1
              for m = n+1  to Nmass
                      dx=x(m)-x(n)
                     if ABS(dx) > 30  then continue  for  rem  cutoff distance lennard jones                       
                      dy=y(m)-y(n)
                     if ABS(dy) > 30  then continue  for  rem  cutoff distance lennard jones   
                      Rgquadro=(dx*dx+dy*dy)
                     if   Rgquadro >900 then continue for rem  cutoff distance lennard jones   
                     Rglue=sqr(Rgquadro)
                   if ( Rglue< Rmin)  then Rmin =Rglue
                   RCPRglue=1/Rglue
                   Ri6 =(Rgquadro*Rgquadro)*(Rgquadro *.0000005)
                   df=(1/Ri6)-(1/(Ri6*Ri6)) -0.00273         rem  lennard jones    with shift
                                   df_N= df*mc(m)   
                                   df_M=-df*mc(n)               
                                  dyR= dy*RCPRglue   :dxR= dx*RCPRglue   
                                 ay(n)=ay(n)+df_N*dyR  :  ax(n)=ax(n)+df_N*dxR 
                                 ay(m)=ay(m)+df_M*dyR  :  ax(M)=ax(M)+df_M*dxR     
         next m 
         next n
         for n=0 to Nmass     
            rem   if (fmax(n)>dfMAX ) then dfMAX =fmax(n)
                     vy(n)=vy(n)+ay(n)*dt :   ay(n)=0
                     vx(n)=vx(n)+ax(n)*dt :   ax(n)=0 
                     y(n)=y(n)+vy(n)*dt   :   x(n)=x(n)+vx(n)*dt               
                                rem    vx(n)=vx(n)-x(n)*Kcnt   rem centering
                                rem    vy(n)=vy(n)-y(n)*Kcnt
              vx(n)=vx(n)*(1-dt*.0125)    rem  dampfer
              vy(n)=vy(n)*(1-dt*.0125)         
         next n     
end sub


Please, help
translate only one run-time critical routine to opencl / cuda
  It is necessary somehow to transfer data from the host to gpu and after calculations back.
the formalities of this process are unknown to me
if someone knows well, show an example
12val12newakk
Posts: 15
Joined: Nov 14, 2019 17:04

Re: Need speedup

Postby 12val12newakk » Dec 17, 2019 17:14

D.J.Peters you know how to do it
from me public gratitude on all network resources
badidea
Posts: 2050
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Need speedup

Postby badidea » Dec 17, 2019 18:49

I have no experience with this, but I found this code:
https://github.com/arpytanshu/Discrete- ... Simulation
Someone's master project. Seems well documented.
Not in freeBASIC language, but it might be a good starting point for learning.

Return to “Beginners”

Who is online

Users browsing this forum: No registered users and 4 guests