faster MC for AndAlso and OrElse

General FreeBASIC programming questions.
frisian
Posts: 249
Joined: Oct 08, 2009 17:25

faster MC for AndAlso and OrElse

Postby frisian » Apr 24, 2015 11:48

For the Devs and all others who understand machine code.

When testing a program where the “And” and “Or” statements where replaced by “AndAlso” and “OrElse” I discovered that the program was slower than before. A GCC compiled version on the other hand was much faster than it (GCC) predecessor. A look at the assembler listing revealed why.

Little program used for generating the assembler listing.

Code: Select all

Dim As Integer a, b, c, r

Randomize 1
a = Rnd * 100
b = Rnd * 100
c = Rnd * 100

Print Using " ###"; a; b; c

If a > b And b > c Then r = 1
If a > b Andalso b > c Then r = 2
If b > a And c > b Then r = 3
If b > a AndAlso c > b Then r = 4

If a > b Or b > c Then r = 5
If a > b OrElse b > c Then r = 6
If b > a Or c > b Then r = 7
If b > a OrElse c > b Then r = 8

Print r
Print
Print "Done"
Sleep
End


Using the “And” statement.

Code: Select all

##If a > b And b > c Then r = 1
   mov ebx, dword ptr [ebp-8]
   cmp ebx, dword ptr [ebp-12]
   setg bl
   shr ebx, 1
   sbb ebx, ebx
   mov eax, dword ptr [ebp-12]
   cmp eax, dword ptr [ebp-16]
   setg al
   shr eax, 1
   sbb eax, eax
   and ebx, eax
   je .Lt_0009
.Lt_000A:
   mov dword ptr [ebp-20], 1
.Lt_0009:

Nothing wrong with this piece of code, does what it needs to do

Using the “AndAlso” statement

Code: Select all

##If a > b Andalso b > c Then r = 2
   mov eax, dword ptr [ebp-8]
   cmp eax, dword ptr [ebp-12]
   setg al
   shr eax, 1
   sbb eax, eax
   test eax, eax
   je .Lt_000C
   mov eax, dword ptr [ebp-12]
   cmp eax, dword ptr [ebp-16]
   setg al
   shr eax, 1
   sbb eax, eax
   test eax, eax
   setne al
   shr eax, 1
   sbb eax, eax
   mov dword ptr [ebp-24], eax
   jmp .Lt_0037
.Lt_000C:
   mov dword ptr [ebp-24], 0
.Lt_0037:
   cmp dword ptr [ebp-24], 0
   je .Lt_000F
.Lt_0010:
   mov dword ptr [ebp-20], 2
.Lt_000F:

It works but it has some code that is not needed, no wonder that it is slower than the “And” code.

Lets look at that piece of code again with some comments.
ZF = Zero Flag

Code: Select all

##If a > b Andalso b > c Then r = 2
   mov eax, dword ptr [ebp-8]   
   cmp eax, dword ptr [ebp-12]   
   setg al   
   shr eax, 1   
   sbb eax, eax   'EAX = 0 -> EAX= 0:ZF=1 , EAX <> 0 -> EAX = FFFFFFFF :ZF=0
   test eax, eax   'test the same register as the previous instruction, does not alter the ZF [redundant]
   je .Lt_000C  ' jump if equal (ZF = 1 meaning EAX = 0)
   mov eax, dword ptr [ebp-12]   
   cmp eax, dword ptr [ebp-16]   
   setg al   
   shr eax, 1   
   sbb eax, eax   
   test eax, eax   
   setne al   'does the same thing as before [redundant]
   shr eax, 1  'does the same thing as before [redundant]
   sbb eax, eax  'does the same thing as before [redundant]
   mov dword ptr [ebp-24], eax  'moves the contents of EAX  to a memory location
   jmp .Lt_0037  'jumps over the next statement
.Lt_000C:
   mov dword ptr [ebp-24], 0   'load memory location with 0
.Lt_0037:
   cmp dword ptr [ebp-24], 0   'check the memory location
   je .Lt_000F  'If ZF = 1 then jump
.Lt_0010:
   mov dword ptr [ebp-20], 2 
.Lt_000F:

There is no need for use of a memory location, EAX holds the same information.

My suggestion is to change the code for a “AndAlso” statement in

Code: Select all

##If a > b Andalso b > c Then r = 2
   mov eax, dword ptr [ebp-8]   
   cmp eax, dword ptr [ebp-12] 
   setg al   
   shr eax, 1 
   sbb eax, eax   
   je .Lt_000C   
   mov eax, dword ptr [ebp-12]   
   cmp eax, dword ptr [ebp-16]   
   setg al   
   shr eax, 1   
   sbb eax, eax
.Lt_000C:
.Lt_0037:   
   je .Lt_000F  'Flags are not affected by a jump
.Lt_0010:
   mov dword ptr [ebp-20], 2
.Lt_000F:

Now the code looks nearly identical to the code for “And” but with a jump (short cut).

Using a “Or” statement.

Code: Select all

##If a > b Or b > c Then r = 5
   mov ebx, dword ptr [ebp-8]
   cmp ebx, dword ptr [ebp-12]
   setg bl
   shr ebx, 1
   sbb ebx, ebx
   mov eax, dword ptr [ebp-12]
   cmp eax, dword ptr [ebp-16]
   setg al
   shr eax, 1
   sbb eax, eax
   or ebx, eax
   je .Lt_001D
.Lt_001E:
   mov dword ptr [ebp-20], 5
.Lt_001D:


Using a “OrElse” statement.

Code: Select all

##If a > b OrElse b > c Then r = 6
   mov eax, dword ptr [ebp-8]
   cmp eax, dword ptr [ebp-12]
   setg al
   shr eax, 1
   sbb eax, eax
   test eax, eax
   jne .Lt_0020
   mov eax, dword ptr [ebp-12]
   cmp eax, dword ptr [ebp-16]
   setg al
   shr eax, 1
   sbb eax, eax
   test eax, eax
   setne al
   shr eax, 1
   sbb eax, eax
   mov dword ptr [ebp-32], eax
   jmp .Lt_0039
.Lt_0020:
   mov dword ptr [ebp-32], -1
.Lt_0039:
   cmp dword ptr [ebp-32], 0
   je .Lt_0023
.Lt_0024:
   mov dword ptr [ebp-20], 6
.Lt_0023:


My suggestion is to change the code for a “Orelse” statement in

Code: Select all

##If a > b OrElse b > c Then r = 6
   mov eax, dword ptr [ebp-8]
   cmp eax, dword ptr [ebp-12]
   setg al
   shr eax, 1
   sbb eax, eax
   jne .Lt_0020
   mov eax, dword ptr [ebp-12]
   cmp eax, dword ptr [ebp-16]
   setg al
   shr eax, 1
   sbb eax, eax
.Lt_0020:
.Lt_0039:
    je .Lt_0023   
.Lt_0024:
   mov dword ptr [ebp-20], 6
.Lt_0023:


All the changes can be made by removing a number of lines with code, no alteration of the remaining code is necessary.

If I have overlooked something let me know.
fxm
Posts: 9997
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: faster MC for AndAlso and OrElse

Postby fxm » Apr 24, 2015 16:30

Have you seen the already existing feature request: #246 Execution speed optimization of 'Andalso' and 'Orelse' ?
You can add text inside 'discussion' (need to login)!

The included test program provides the following execution times on my PC:

Code: Select all

and   0.653s
andalso   0.595s
nested   0.465s

or   0.684s
orelse   0.594s
nested   0.509s

Weird, the results order is inverted between 'and' and 'andalso' when I compile with option '-exx':

Code: Select all

and   0.645s
andalso   0.682s
nested   0.523s

or   0.676s
orelse   0.637s
nested   0.521s
dkl
Site Admin
Posts: 3212
Joined: Jul 28, 2005 14:45
Location: Germany

Re: faster MC for AndAlso and OrElse

Postby dkl » Apr 25, 2015 15:04

Hi,

this inspired me to take a closer look at how AndAlso/OrElse are implemented. For example, AndAlso:

Code: Select all

l andalso r
iif(l <> 0, r <> 0, 0)

a < b andalso b < c
iif((a < b) <> 0, (b < c) <> 0, 0)


and indeed, the <> 0 checks are redundant in the latter case, and were not optimized out by fbc.
So I implemented this optimization:
ast: Add x <> 0 => x BOP optimization
and also another one for good measure (inspired by OrElse's use of = 0 checks):
ast: Optimize logical negations on comparison BOPs

and that seemed to improve the AndAlso code noticably:

Code: Select all

dim as integer a, b, c
print a < b andalso b < c

diff old vs. new -gen gas code:
--- 1.asm   2015-04-25 16:53:35.532485047 +0200
+++ 1-new.asm   2015-04-25 16:53:30.284658402 +0200
@@ -12,22 +12,14 @@
 mov dword ptr [ebp-8], 0
 mov dword ptr [ebp-12], 0
 push 1
-mov eax, dword ptr [ebp-4]
-cmp eax, dword ptr [ebp-8]
-setl al
-shr eax, 1
-sbb eax, eax
-test eax, eax
-je .Lt_0004
+mov eax, dword ptr [ebp-8]
+cmp dword ptr [ebp-4], eax
+jge .Lt_0004
 mov eax, dword ptr [ebp-8]
 cmp eax, dword ptr [ebp-12]
 setl al
 shr eax, 1
 sbb eax, eax
-test eax, eax
-setne al
-shr eax, 1
-sbb eax, eax
 mov dword ptr [ebp-16], eax
 jmp .Lt_0006
 .Lt_0004:


It should be noted that this is merely due to "trivial" AST-level optimizations... no messing with the actual ASM code generation. That means it's relatively easy to do, and will affect all backends (not just -gen gas), but of course the real interesting optimization for AndAlso/OrElse, of IIf() in general, is to put the temp var into a register, and that's something fbc can't do with its AST/IR or ASM backend. But hey, that's one of the reasons why we have C/LLVM backends...
frisian
Posts: 249
Joined: Oct 08, 2009 17:25

Re: faster MC for AndAlso and OrElse

Postby frisian » Apr 30, 2015 18:26

fxm wrote:Have you seen the already existing feature request: #246 Execution speed optimization of 'Andalso' and 'Orelse' ?
You can add text inside 'discussion' (need to login)!


Forgot to check the forum and the feature-requests.

I had written my one timing program, to get timings. The timings with -exx differ from the timing without -exx.

Code: Select all

#Define max  4000000000

Dim As Integer a, b, c, d, r
Dim As UInteger x
Dim As Double t

t = Timer
For x = 0 To max
Next
Print Using "time used by the for next loop ###.## sec."; Timer - t

Randomize 1 : a = Rnd * 100
Print "value of a = ";a

Randomize 1 : a = Rnd * 100 : r = 0
t = Timer
For x = 0 To max
  If 1 > a And 2 > a Then r = 11
  If 1 > a And 2 > a Then r = 12
  If 1 > a And 2 > a Then r = 13
  If 1 > a And 2 > a Then r = 14
  If 1 > a And 2 > a Then r = 15
Next
Print
Print Using "And, conditions false     ###.## sec. : value of r =###"; Timer - t ;r

Randomize 1 : a = Rnd * 100 : r = 0
t = Timer
For x = 0 To max
  If 1 > a AndAlso 2 > a Then r = 21
  If 1 > a AndAlso 2 > a Then r = 22
  If 1 > a AndAlso 2 > a Then r = 23
  If 1 > a AndAlso 2 > a Then r = 24
  If 1 > a AndAlso 2 > a Then r = 25
Next
Print Using "AndAlso, conditions false ###.## sec. : value of r =###"; Timer - t ;r
Randomize 1 : a = Rnd * 100 : r = 0

t = Timer
For x = 0 To max
  If 1 < a And 2 < a Then r = 31
  If 1 < a And 2 < a Then r = 32
  If 1 < a And 2 < a Then r = 33
  If 1 < a And 2 < a Then r = 34
  If 1 < a And 2 < a Then r = 35
Next
Print
Print Using "And, conditions true      ###.## sec. : value of r =###"; Timer - t ;r

Randomize 1 : a = Rnd * 100 : r = 0
t = Timer
For x = 0 To max
  If 1 < a AndAlso 2 < a Then r = 41
  If 1 < a AndAlso 2 < a Then r = 42
  If 1 < a AndAlso 2 < a Then r = 43
  If 1 < a AndAlso 2 < a Then r = 44
  If 1 < a AndAlso 2 < a Then r = 45
Next
Print Using "AndAlso, conditions true  ###.## sec. : value of r =###"; Timer - t ;r

Print "Done"
Sleep
End


Timings
fbc_win32_mingw_0281_2015-04-22 (old)

Code: Select all

Time for empty loop  8.30 sec.
Both conditions False
And       30.91 sec.
AndAlso   37.57 sec.
Both conditions True
And       30.71 sec.
AndAlso   58.62 sec.

With -exx
Time for empty loop  8.10 sec   
Both conditions False
And       27.65 sec.
AndAlso   43.12 sec.
Both conditions True
And       30.22 sec.
AndAlso   62.01 sec.



fbc_win32_mingw_0281_2015-04-26 (new)

Code: Select all

Time for empty loop   8.34 sec.
Both conditions False
And        30.72 sec.
AndAlso    33.21 sec.
Both conditions True
And        31.11 sec.
AndAlso    34.10 sec.

With -exx
Time for empty loop  8.10 sec   
Both conditions False
And        27.66 sec.
AndAlso    36.66 sec.
Both conditions True
And        31.11 sec.
AndAlso    34.10 sec.


The timing for the new code is nearly as fast as the timing for "And".
The difference in timing between without and with -exx is a little odd since the code is identical, only the labels are different. I think it has to do with alignment since -exx adds code and the different label names.

If have also used a program to get timings.

fbc_win32_mingw_0281_2015-04-22 (old)
with And/Or : 3158 sec.
with AndAlso/OrElse : 3256 sec.

fbc_win32_mingw_0281_2015-04-26 (new)
with And/Or : 3151 sec.
with AndAlso/OrElse 3091 sec.

I'm considering to change the And/Or into nested If Then's, to get more speed increase.

@dkl
The changes in the code for “AndAlso” and “OrElse” have them made faster. Thanks.

You used as an example “Print a > b andalso b > c” which gives the following code.

Code: Select all

##Print a > b andalso b > c
   push 1
   mov ecx, dword ptr [ebp-12]
   cmp dword ptr [ebp-8], ecx
   jle .Lt_0005
   mov ecx, dword ptr [ebp-12]
   cmp ecx, dword ptr [ebp-16]
   setg cl
   shr ecx, 1
   sbb ecx, ecx
   mov dword ptr [ebp-20], ecx
   jmp .Lt_0008
.Lt_0005:
   mov dword ptr [ebp-20], 0
.Lt_0008:
   push dword ptr [ebp-20]
   push 0
   call _fb_PrintInt@12


I find it rather odd to use memory to hold the Boolean since there’s no need to save it state, after pushing it on the stack it's no longer needed. Why not do it the same way as the following piece of code generated when “AndAlso” is replaced by “And”.

Code: Select all

####Print a > b And b > c
   push 1
   mov eax, dword ptr [ebp-8]
   cmp eax, dword ptr [ebp-12]
   setg al
   shr eax, 1
   sbb eax, eax
   mov ebx, dword ptr [ebp-12]
   cmp ebx, dword ptr [ebp-16]
   setg bl
   shr ebx, 1
   sbb ebx, ebx
   and eax, ebx
   mov ecx, eax
   push ecx
   push 0
   call _fb_PrintInt@12


Thus changing the code into this

Code: Select all

   mov dword ptr [ebp-20], ecx   ''<- drop this line
   jmp .Lt_0008
.Lt_0005:
   mov dword ptr [ebp-20], 0    '  <- change this to Xor Reg, Reg or Sub Reg, Reg where Reg is the register holding the Boolean.
.Lt_0008:
   push dword ptr [ebp-20]  '  <-push dword ptr Reg
   push 0
   call _fb_PrintInt@12


One remark about the code using “And” .
mov ecx, eax
push ecx

You can simply do a push eax there is no need for moving it to ecx.
Seems to me that it's a leftover from older/different code.

EDIT: text cleanup, alignment timings. (hit the submit button instead of the preview button)
dkl
Site Admin
Posts: 3212
Joined: Jul 28, 2005 14:45
Location: Germany

Re: faster MC for AndAlso and OrElse

Postby dkl » May 01, 2015 14:56

With this change in Git, the ASM backend's optimization which allows reusing operand registers for the results should now eliminate the extra register in that case.

Return to “General”

Who is online

Users browsing this forum: No registered users and 7 guests