Duffs Device & similar

General FreeBASIC programming questions.
Manpcnin
Posts: 46
Joined: Feb 25, 2007 1:43
Location: N.Z.

Re: Duffs Device & similar

Post by Manpcnin »

dodicat wrote:Here is the original wanted, with C (loosly similar to switch) type code, cascading through the cases, visiting each one in turn
Except the control flow graph of your program is completely different. The cmp3() function now gets executed every time an alternate loop is selected. What if the function had side effects?

And the purpose of the construct, is to reduce the number of branches taken in a performance critical core loop.
It reduces it from 2 branches (+ an exit condition) to: between 1 and 2 (+ exit) depending on the data, saving half a branch and improving the PBU predictions too.
Calling the SELECT CASE again makes it pointless, that's even more expensive.

As for GOTOs as a language feature. I do avoid them when reasonable. Probably 95% of my programs are "goto" free. But sometimes the control flow graph you need, doesn't fit into nice: nested loops, nested branches, or combination thereof; In which case: you need them.

And yes I know, Branchless programming is a thing, "MOV" is Turing-Complete, and technically any program can be converted to a linear series of instructions, But the cost is abysmal speed and lack of readability that puts the worst Spaghetti code to shame! Fewer branches are not always more readable than more. (I mean, the MOVfuscator was conceived as a (joke) anti-reverse-engineering tool after all.)
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Duffs Device & similar

Post by dodicat »

Well, not interfering with your structure, and using fb select case as it should be, but instead of all the internal goto's and if's e.t.c. , use an external while loop to govern the process.
your first code:

Code: Select all

function cmp3(a as single,b as single,c as single) as integer
    return (4 and a<b) or (2 and b<c) or (1 and c<a) 'Make SIMD later
end function

dim as single  x = 0.10, y = 0.20, z = 0.30
dim as single dx = 0.31,dy = 0.12,dz = 0.43
dim as long count

while x<10 and y<10 and z<10
select case cmp3(x,y,z)
    'XMIN:if y<z then
        case &b110 :do
            print x,y,z:x+=dx
            'if x >= 10 then goto break
        loop while x<=y
       ' goto YMIN
    'else
        case &b100:do
            print x,y,z:x+=dx
           ' if x >= 10 then goto break
        loop while x<=z
       ' goto ZMIN
    'end if
    'YMIN:if x < z then
        case &b010 :do
            print x,y,z:y+=dy
            'if y >= 10 then goto break
        loop while y<=x
        'goto XMIN
   ' else
        case &b011:do
            print x,y,z:y+=dy
           ' if y >= 10 then goto break
        loop while y<=z
        'goto ZMIN
    'end if
    'ZMIN:if x < y then
        case &b101 :do
            print x,y,z:z+=dz
           ' if z >= 10 then goto break
        loop while z<=x
        'goto XMIN
    'else
        case else:do
            print x,y,z:z+=dz
           ' if z >= 10 then goto break
        loop while z<=y
        'goto YMIN
    'end if
end select
count+=1
wend
print
print count;"  iterations"
sleep:end 
Manpcnin
Posts: 46
Joined: Feb 25, 2007 1:43
Location: N.Z.

Re: Duffs Device & similar

Post by Manpcnin »

Code: Select all

return (4 and a<b) or (2 and b<c) or (1 and c<a) 
'...
while x<10 and y<10 and z<10
Aren't each of these Compares being translated into a JL (Jump if Less than) instruction? So this code now has 6 extra branches in it, to save one.

Look; I've done a lot of performance testing on this code. The whole block is running about 25,000,000 times a second, so each branch inside the block costs me dearly, and has to be fought over.

This way is too slow.

*EDIT*

Oh I forgot another code branch also uses this algorithm almost identically. Make that 50,000,000 times a second.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Re: Duffs Device & similar

Post by marcov »

Manpcnin wrote:
dodicat wrote:Here is the original wanted, with C (loosly similar to switch) type code, cascading through the cases, visiting each one in turn
Except the control flow graph of your program is completely different. The cmp3() function now gets executed every time an alternate loop is selected. What if the function had side effects?
Does every theoretical use case deserve special syntax?
As for GOTOs as a language feature. I do avoid them when reasonable. Probably 95% of my programs are "goto" free. But sometimes the control flow graph you need, doesn't fit into nice: nested loops, nested branches, or combination thereof; In which case: you need them.
Yup, I have a handful, mostly to jump out of nested IF statements in quite heavily calculating code, because the compiler that I use does a poor job with exiting from nested IFs.
And yes I know, Branchless programming is a thing, "MOV" is Turing-Complete, and technically any program can be converted to a linear series of instructions, But the cost is abysmal speed and lack of readability that puts the worst Spaghetti code to shame! Fewer branches are not always more readable than more. (I mean, the MOVfuscator was conceived as a (joke) anti-reverse-engineering tool after all.)
I do some SSE/AVX programming now and then, and when you operate on more values at the same time, doing ifs per value defeats the purpose of SSE. Then you learn a bit about branchless programming. Fairly simple stuff like recording the result of an operation in a register and multiplying or ANDing with it later on.
Manpcnin
Posts: 46
Joined: Feb 25, 2007 1:43
Location: N.Z.

Re: Duffs Device & similar

Post by Manpcnin »

marcov wrote:I do some SSE/AVX programming now and then, and when you operate on more values at the same time, doing ifs per value defeats the purpose of SSE. Then you learn a bit about branchless programming. Fairly simple stuff like recording the result of an operation in a register and multiplying or ANDing with it later on.
Oh yeah, I've looked at making this branch-less with SSE/AVX. I think it needs a version of PHMINPOSUW that operates on SINGLES instead, to be practical. (Why isn't that a thing? INTEL get on it!)
I haven't given up yet, but getting a a SIMD version that is more efficient is not trivial.

And because I need to select one of three values to operate on in any given iteration, The instruction width of the SIMD version will be 3x-4x higher.
IE, Three out of four bytes operated on is thrown away/masked out.
Even it's it IS technically faster, those wide instruction are more power hungry, and I'm already using all the real & virtual cores on my CPU.
If I gain 10% in speed, but start thermally throttling, I may end up worse in the end.

Still: A SIMD, branch-less version is being worked on. Theorizing is useless unless you actually use it to TEST.

(And when I said "Abysmal Speed" I meant indiscriminate branch removal. Obviously judiciously making a core loop branch-less can speed things up quite a bit by removing pipeline stalls.)
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Duffs Device & similar

Post by dodicat »

Save a little
use perhaps
#define cmp3(a,b,c) (4 and a<b) or (2 and b<c) or (1 and c<a)
instead of a function.

select case as const is usually disappointing, I have never found it any faster.
andalso versus and has to be timed, in your code it makes little difference.
I make the while code slightly faster than your goto code, not much though.
coderJeff
Site Admin
Posts: 4326
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Duffs Device & similar

Post by coderJeff »

If the structures and scopes getting in your way and really want something a bit more low level ... On expr Goto ... lets you build your own jump table.

Something like:

Code: Select all

on cmp3(x,y,z) goto ZYX, YXZ, YZX, XZY, ZXY, XYZ 

default: '' cmp3(x, y, z) = 0 or 7
	goto after

XYZ: '' cmp3(x, y, z) = 6
XZY: '' cmp3(x, y, z) = 4
YXZ: '' cmp3(x, y, z) = 2
YZX: '' cmp3(x, y, z) = 3
ZXY: '' cmp3(x, y, z) = 5
ZYX: '' cmp3(x, y, z) = 1

after:
And for some help with 'break', can try this:

Code: Select all

'' requires fb 1.08.0

#macro jumptable ? ( expr, labels... )
	__fb_uniqueid_push__( jumptable_exit )
	#if "" = __FB_QUOTE__( expr )
	#  error "expected expression"
	#elseif __FB_ARG_COUNT__( labels ) = 0
	#  error "expected labels"
	#else
		'' "on goto ... creates a jump table"
		on expr goto labels
	#endif
#endmacro

#macro exit_jumptable
	goto __fb_uniqueid__( jumptable_exit )	
#endmacro

#macro end_jumptable
	__fb_uniqueid__( jumptable_exit ):	
	__fb_uniqueid_pop__( jumptable_exit )
#endmacro

dim i as integer
input "enter an integer value: ", i

'' 
''           1  2  3  4  5  6  7
jumptable i, A, B, C, D, C, C, A 

	print "default"

	A:
		print "A"
		goto C
	B:
		print "B"
		exit_jumptable
	C:
		print "C"
	D:
		print "D"

end_jumptable

sleep
Take care, labels have procedure scope. So there is some effort required to translate over from C's switch statement; for example values are far apart, many values, or nest switch statements.
Manpcnin
Posts: 46
Joined: Feb 25, 2007 1:43
Location: N.Z.

Re: Duffs Device & similar

Post by Manpcnin »

@coderJeff
Thank you! Thank you!
This is perfect!

How did I not know about this language feature?! I've been messing around in FreeBasic since 2006!

The fact that "On ... Gosub" is also supported implies that it's been in the language since BASIC.
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: Duffs Device & similar

Post by caseih »

Correct. It was part of most BASICs back in the 80s or earlier. Commodore 64 had it actually.
Post Reply