-O 3 Optimization Query

General FreeBASIC programming questions.
3622
Posts: 13
Joined: Mar 14, 2015 23:53

-O 3 Optimization Query

Postby 3622 » Mar 15, 2015 21:53

Hi All,
Whilst trying to make a program faster I have been trying out different algorithms for a procedure which my program calls frequently. To do this I set up a timer and call the procedure a few million times. This has never been a problem in the past and with better hardware and 64bit OS and Freebasic 64 I have succeeded in what I set out to do.
However, I came across -O 3 optimization whilst reading a recent post and so I tried this and it did indeed speed up my program. But, when I tried to time my procedures it always gives me a time of zero and the procedure returns the correct result. So I just tried timing a For/Next loop and got the same time of zero. The same happens with nested loops.
I should also add that I am using PCLinuxOS 64. The same happens on another PC that runs Linux Mint 64. When I boot the first PC into Vista 64 this does not happen - I have never waited for the program to finish executing! I enclose the code below.
So my question is what is happening? The code seems to be returning the correct results only very nearly instantaneously.

screen 19

dim i,j,k,l as ulongint
dim z as double

print"Press any key to start"
while inkey$="":wend

z=timer

for i=1 to 1000000000000000000
'for j=1 to 1000000000000000000
'for k=1 to 1000000000000000000
'for l=1 to 1000000000000000000
' I call my procedure here
'next
'next
'next
next

print timer-z
print i',j,k,l

sleep
end
marcov
Posts: 2963
Joined: Jun 16, 2005 9:45
Location: Eindhoven, NL
Contact:

Re: -O 3 Optimization Query

Postby marcov » Mar 16, 2015 8:36

Make sure your procedure does something real each iteration. If it does nothing, or something that the compiler can see is invariant, gcc can optimize the whole loops away.
3622
Posts: 13
Joined: Mar 14, 2015 23:53

Re: -O 3 Optimization Query

Postby 3622 » Mar 16, 2015 22:00

Thanks marcov for your speedy reply.
My procedure (popcount) does do something real but it is the same for each iteration so I can appreciate your explanation. The optimization routine appears to be quite smart and I can only assume that it is peculiar to Linux only. Thanks again.
dodicat
Posts: 6493
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: -O 3 Optimization Query

Postby dodicat » Mar 16, 2015 22:39

PCLinuxOS is also my preferred system (When I do use Linux).
It the only Linux version (I have found) which has the drivers for my usb modem.
But these days I am using Windows.
With -gen gcc -Wc -O3 the loop is optimzed out.
VIZ:
Press any key to start
4.437662166623113e-007
1000000000000000001

So marcov is on the button here.
3622
Posts: 13
Joined: Mar 14, 2015 23:53

Re: -O 3 Optimization Query

Postby 3622 » Mar 19, 2015 21:40

Thanks dodicat,
However I cannot replicate what you have done. Each time I receive error 80 Invalid command-line option "wc". Even though I am using Wc. I am trying this on Vista 64 (same PC) and have used both Freebasic 1.00 and 1.01 win64.
3622
Posts: 13
Joined: Mar 14, 2015 23:53

Re: -O 3 Optimization Query

Postby 3622 » Mar 19, 2015 21:49

My apologies. I have just found a post about FBIde converting command line parameters to lower case. I will try the work around and dig out my copy of FBEdit. Thanks again to you both.
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Re: -O 3 Optimization Query

Postby MichaelW » Mar 20, 2015 6:13

This test code is 64-bit specific:

Code: Select all

''------------------------------------------------------------------------------
'' POPCNT instruction supported if for CPUID Function 1, bit 23 of ECX is set.
'' Note that CPUID will trash the callee-save register RBX.
''------------------------------------------------------------------------------

function HavePopCnt naked() as integer   
    asm
        ".intel_syntax noprefix"       
        "mov    eax, 1"
        "push   rbx"
        "cpuid"
        "pop    rbx"
        "xor    rax, rax"
        "bt     ecx, 23"
        "jnc    0f"       
        "mov    rax, -1"
        "0:"       
        "ret"
        ".att_syntax prefix"
    end asm
end function

''------------------------------------------------------------
'' This code adapted from the Wikipedia Hamming_weight page:
''------------------------------------------------------------

#define m1  &h5555555555555555
#define m2  &h3333333333333333
#define m4  &h0f0f0f0f0f0f0f0f
#define h01 &h0101010101010101

function PopCount(byval x as uinteger) as integer
    ''---------------------------------------------------------------
    '' This uses fewer arithmetic operations than any other known 
    '' implementation on machines with fast multiplication.
    '' It uses 12 arithmetic operations, one of which is a multiply.
    ''---------------------------------------------------------------
    x -= (x shr 1) and m1               '' put count of each 2 bits into those 2 bits
    x = (x and m2) + ((x shr 2) and m2) '' put count of each 4 bits into those 4 bits
    x = (x + (x shr 4)) and m4          '' put count of each 8 bits into those 8 bits
    return ((x * h01) shr 56)           '' returns left 8 bits of x+(x<<8)+(x<<16)+(x<<24)+...     
end function

function PopCnt naked (byval x as uinteger) as integer
    asm
        ".intel_syntax noprefix"       
        "popcnt rax, rcx"
        "ret"
        ".att_syntax prefix"       
    end asm
end function

dim as double t
dim as integer res1, res2, res3

for i as integer = 1 to 10
    print PopCount(i);chr(9);
next
print
if HavePopCnt() then
    for i as integer = 1 to 10
        print PopCnt(i);chr(9);
    next   
end if
print

t = timer
for i as integer = 1 to 1000000000   
    PopCount(i)
next
print "PopCount (no return)                :";timer-t

t = timer
for i as integer = 1 to 1000000000   
    res1 += PopCount(i)
next
print "PopCount (return to sum)            :";timer-t

t = timer
for i as integer = 1 to 1000000000   
    res2 += PopCount(i)
next
print "PopCount (return to sum and display):";timer-t
print res2

if HavePopCnt() then
    t = timer
    for i as integer = 1 to 1000000000   
        PopCnt(i)
    next
    print "PopCnt (no return)                  :";timer-t
    t = timer
    for i as integer = 1 to 1000000000   
        res2 = PopCnt(i)
    next   
    print "PopCnt (return)                     :";timer-t
end if

sleep

I compiled with Version 1.02.0 (01-12-2015), built for win64 (64bit).

Results compiled without optimization:

Code: Select all

 1       1       2       1       2       2       3       1       2       2

 1       1       2       1       2       2       3       1       2       2

PopCount (no return)                : 6.430998902535066
PopCount (return to sum)            : 7.395839434466325
PopCount (return to sum and display): 7.396075153723359
 14846928141
PopCnt (no return)                  : 2.422068938147277
PopCnt (return)                     : 2.802914366475307

Timing results compiled with O 1 optimization:

Code: Select all

PopCount (no return)                : 0.3619673564098775
PopCount (return to sum)            : 0.3701911893440411
PopCount (return to sum and display): 3.01097651431337
 14846928141
PopCnt (no return)                  : 1.673372065648437
PopCnt (return)                     : 1.67269672465045

The no return and return to sum calls were probably optimized away, and note the effect on the PopCnt timings.

Timing results compiled with O 2 optimization:

Code: Select all

PopCount (no return)                : 5.679158493876457e-005
PopCount (return to sum)            : 5.337037146091461e-005
PopCount (return to sum and display): 1.67982988525182
 14846928141
PopCnt (no return)                  : 1.651597622665577
PopCnt (return)                     : 1.798132925061509

The no return and return to sum calls were obviously optimized away.


Timing results compiled with O 3 optimization:

Code: Select all

PopCount (no return)                : 6.500247400254011e-005
PopCount (return to sum)            : 9.921425953507423e-005
PopCount (return to sum and display): 1.340862438431941
 14846928141
PopCnt (no return)                  : 1.67324582405854
PopCnt (return)                     : 1.672604352817871

The no return and return to sum calls were obviously optimized away, and I did not have time to determine how the compiler-generated code could be faster than a 2-instruction assembly-code procedure.

Return to “General”

Who is online

Users browsing this forum: No registered users and 8 guests