Why is there a Known Compiler Bug from 2012 still in?

Manpcnin · Post by **Manpcnin** » Jan 10, 2021 8:22

I just wasted a stupid amount of time trying to track down a bug in my code, only to find that it was a (known) compiler bug.

Pretty sure it's bug #572: https://sourceforge.net/p/fbc/bugs/572

I know FBC is free software by volunteers, but it's still pretty frustrating.
Anyway, doing my bit to help, here is the offending code reduced to some easy (not minimal) test cases:

Code:

'Compile with -fpu sse -vec 2
'#DEFINE TRY_NAMESPACE 'A namespace doesn't trigger the issue. Seems to be specific to UDTs
#Ifdef TRY_NAMESPACE
    namespace X
        dim as single h0,h1,h2
    end namespace
#ELSE
    type vec
        as single h0,h1,h2
    end type
    dim as vec X
#endif
dim as single f0,f1,f2
dim as double d0

X.h0 = 1f:X.h1 = 2f:X.h2 =3f

f0 = X.h0+X.h1+X.h2
f1 = X.h0+X.h2
f2 = X.h2+X.h1+X.h0
d0 = X.h0+X.h1+X.h2
print X.h0;" +";X.h1;" +";X.h2;" =",f0    'in_order Expected:6, Got: 1
print X.h0;" +";X.h2;" =",f1                'Skipped Expected:4, Got: 4
print X.h2;" +";X.h1;" +";X.h0;" =",f2    'inverted Expected:6, Got: 6
print X.h0;" +";X.h1;" +";X.h2;" =",d0    'Double Expected:6, Got: 1
print X.h0;" +";X.h1;" +";X.h2;" =",X.h0+X.h1+X.h2 'in_order_but_passed_to_a_function Expected:6, Got: 6
sleep:end

What's really bad about this bug is how silent it is. Unless you are doing FP math to calculate pointers or some other shenanigans that might segfault (texture lookup might actually do that, so it's not that far-fetched), you might never even notice that something is wrong. It'll just give you the wrong result. You'd have to check your computer's math. And how many people do that?

Oh yeah, tested with fbc-1.04.0-win32 & fbc-1.07.1-win64. The changelog for 1.07.2/1.07.3 doesn't mention a vector optimization bug fix, and issue 572 is still open on SourceForge, so I didn't bother downloading them to test.

Post by **counting_pine** » Jan 11, 2021 11:35

Sorry for the time this bug has cost you.
The vectorisation changes were added a long time ago by Bryan Stoeberl, and I don't think he's been around for some time.

We should perhaps deprecate the -vec functionality and mark it as unstable/unmaintained.
It's perhaps obsolete now with the gcc emitter, which should, I believe, do vectorisation optimisations with -gen gcc -O3.

Post by **coderJeff** » Jan 11, 2021 19:20

@Manpcnin, thanks for the post with additional information. Yeah, the original author has not been around in 10+ years, and we're not experts in everything. Usually, when I go back to those old bugs, I have to do a lot of self-teaching before I can even begin to understand what's going on. I think I understand what the algorithm is trying to do / is supposed to do.

The vectorisation changes predate the gcc/llvm backends. However, the vectorisation should only be applied to -gen gas -fpu sse. It makes no sense for gcc / llvm. This -vec N AST optimization should never be applied to gcc / llvm. Neither of those backends knows what to do with it, and so bad code gets generated even in gcc. (Internally the AST node is an AST_OP_HADD, which is only implemented for gas + x86 + sse.)

I have a couple fixes to apply later today:
- disable vectorisation option for gcc/llvm, they don't do anything with it.
- I think I found a fix for the -vec 2 bug. I have written many notes in the source; after 15+ hrs it's like 2 changed lines, lol.
- Expand the tests in ./tests/optimizations/vector.bas (still todo)

@counting_pine, I agree,
- mark it in the documentation as unmaintained, only applies to gas + x86 + SSE
- remove from 'fbc -help', and move to 'fbc -help -v' with the unmaintained warning.

Because, even using only -fpu sse (without -vec), there are other bugs I notice with SSE code gen. There appears to be a loss of precision in some conversions, maybe only to ulongint? Not 100% sure. The SSE code emitter uses the x87 processor for conversions. I fear it will take many, many hours, and I feel like my volunteer hours could be better spent on other areas.

Manpcnin · Post by **Manpcnin** » Jan 11, 2021 21:35

Removing support for -vec 2 would solve it, I guess, if it's too much work to fix.
If the new backend is better, why bother.
Maybe it's time for me to switch to gcc? :(
I've been using GAS for a really long time. I get really poor performance from GCC. I don't know why.
I just tested the code I'm working on right now, and a performance critical part went from 80ms to 530ms. OUCH!

Any ideas on why the GCC emitted code would perform so much worse?

*EDIT*

Ok I tested with the 32 bit version ( 1.07.2 -gen gcc [-o 3] ) and it's nowhere near as slow as "-gen gcc" 64bit. 115ms. *ONLY* 40% slower, instead of 560% slower.
And you may be right about other sse optimization bugs still being in, even without -vec 2.
I don't have a test case for it (other than my entire program, lol) at this point; but I got noticeable artifacts in my output when compiled with "-gen gcc -fpu sse -vec 1"

MrSwiss · Post by **MrSwiss** » Jan 11, 2021 22:08

Try it with optimisations: -gen gcc -O 2 (capitalized "O", doesn't work with FBIDE!)
Don't use -O 3 (or higher) because then, -vec 2 is back 'in play'.

caseih · Post by **caseih** » Jan 11, 2021 22:20

Is there a problem with GCC's -O3 optimization? I ran his test code with GCC and -O 3 and it works fine. Maybe I'm not reading this correctly but the -vec 2 issue only applies to the gas backend, does it not?

MrSwiss · Post by **MrSwiss** » Jan 11, 2021 22:24

Not really certain except: on GCC's website it states that only up to -O 2 is considered 'production code ready'.
Whatever that may mean to say ...
AFAIK, FB-devs also use -O 2 (to compile FB's internal libraries).

Post by **coderJeff** » Jan 11, 2021 22:26

caseih wrote: Is there a problem with GCC's -O3 optimization? I ran his test code with GCC and -O 3 and it works fine. Maybe I'm not reading this correctly but the -vec 2 issue only applies to the gas backend, does it not?

The problem occurs when using '-fpu sse -vec 2' command line options with any backend. The compiler incorrectly tries to do vector optimizations on all backends when '-vec 2' should only be applied to gas+x86+sse. So when using '-gen gcc -fpu sse -vec 2' there's bad code generation.

'-vec N' should have no effect on '-gen gcc' and is a completely separate option from '-O N' (optimize level N).
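
Until a fixed release is out, a project that cannot tolerate the bad code generation could refuse to build when vectorisation is switched on. The following is only a sketch: it assumes the intrinsic define `__FB_VECTORIZE__` (documented as holding the -vec level, 0 to 2) is available in the fbc version in use, and the error text is made up for illustration.

```freebasic
' Hypothetical build guard, placed at the top of the main module.
' __FB_VECTORIZE__ is assumed to hold the -vec level (0 to 2) given on the command line.
#if __FB_VECTORIZE__ > 0
    #error Do not build with -vec 1 or -vec 2 (see fbc bug 572)
#endif

Print "built without automatic vectorisation"
```

With this in place, 'fbc -fpu sse -vec 2 file.bas' should stop at the #error line, while a plain 'fbc file.bas' compiles as usual.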

Post by **coderJeff** » Jan 11, 2021 22:31

Manpcnin wrote: And you may be right about other sse optimization bugs still being in, even without -vec 2.

I think warning the user in docs and '-help -v' is still not a bad idea.

To not have it completely dropped, the bug reports really do help. And if you want to help out the project, narrowing down the test case is helpful. If you happen to understand what's supposed to happen and can read the assembly to narrow it down, even better.

I just finished writing the new tests for the single precision horizontal add optimization and they all pass, so that's a good sign.

Manpcnin · Post by **Manpcnin** » Jan 11, 2021 23:00

@MrSwiss Thanks for the tip about capital O and FBIDE! I was using that. Compiling from command line doesn't help though. It just reveals another bug in "-O 2" & "-O 3" optimization... OTL

I'm getting bad code with "-gen gcc -O 2" and no other parameters. "-gen gcc -O 1" seems fine. No test cases yet.

"fbc-1.07.2-win32 file.bas -gen gcc -O 1"

"fbc-1.07.2-win32 file.bas -gen gcc -O 2"

Something's definitely #%$@. (And I'm sure it's bad optimization of my vector cross product code. All the calculated surface normals are dead. ( the water just loads a vec3 "UP" constant.))

Post by **counting_pine** » Jan 13, 2021 14:47

I don’t want to make any assumptions here, but it’s possible that errors with -O2 could be that something in the program is causing Undefined Behaviour, which works as desired in -O1 but not in -O2.
Something harder to detect, like a buffer overrun clobbering an important part of memory.
(Just speculating/offering an alternative theory here..)

dodicat · Post by **dodicat** » Jan 13, 2021 16:05

I have some cross products here.

Code:

Type v3
    As Single x,y,z
    As Ulong col
    flag As Long
    Declare Function length As Single
    Declare Function unit As v3
End Type

Type Line
    As v3 v1,v2
End Type
#define cross ^
#define dot *
Operator + (Byref v1 As v3,Byref v2 As v3) As v3
Return Type(v1.x+v2.x,v1.y+v2.y,v1.z+v2.z)
End Operator
Operator -(Byref v1 As v3,Byref v2 As v3) As v3
Return Type(v1.x-v2.x,v1.y-v2.y,v1.z-v2.z)
End Operator
Operator * (Byval f As Single,Byref v1 As v3) As v3
Return Type(f*v1.x,f*v1.y,f*v1.z)
End Operator
Operator * (Byref v1 As v3,Byref v2 As v3) As Single 
Return v1.x*v2.x+v1.y*v2.y+v1.z*v2.z
End Operator
Operator ^ (Byref v1 As v3,Byref v2 As v3) As v3 
Return Type(v1.y*v2.z-v2.y*v1.z,-(v1.x*v2.z-v2.x*v1.z),v1.x*v2.y-v2.x*v1.y)
End Operator
Operator <>(Byref v1 As V3,Byref v2 As V3) As Integer
Return (v1.x<>v2.x) Or (v1.y<>v2.y)
End Operator

Function v3.length As Single
    Return Sqr(x*x+y*y+z*z)
End Function

Function v3.unit As v3
    Dim n As Single=length
    If n=0 Then n=1e-20
    Return Type(x/n,y/n,z/n)
End Function

Type _float As V3

Dim Shared As Const v3 eyepoint=Type(512,768\2,600)
#define map(a,b,x,c,d) ((d)-(c))*((x)-(a))/((b)-(a))+(c)
#define incircle(cx,cy,radius,x,y) (cx-x)*(cx-x) +(cy-y)*(cy-y)<= radius*radius
'<><><><><><><><><><><> Quick SORT <><><><><><><><><><>
#define up <,>
#define down >,<
#macro SetQsort(datatype,fname,b1,b2,dot)
Sub fname(array() As datatype,begin As Long,Finish As Ulong)
    Dim As Long i=begin,j=finish 
    Dim As datatype x =array(((I+J)\2))
    While  I <= J
        While array(I)dot b1 X dot:I+=1:Wend
        While array(J)dot b2 X dot:J-=1:Wend
        If I<=J Then Swap array(I),array(J): I+=1:J-=1
    Wend
    If J > begin Then fname(array(),begin,J)
    If I < Finish Then fname(array(),I,Finish)
End Sub
#endmacro
        
        Sub GetCircle(xm As Single, ym As Single,zm As Single, r As Integer,p() As v3)
            #define CIRC(r)  ( ( Int( (r)*(1 + Sqr(2)) ) - (r) ) Shl 2 )
            Dim As Long x = -r, y = 0, e = 2 - r Shl 1,count
            Redim p(1 To CIRC(r)+4 )
            Do
                count+=1:p(count)=Type<v3>(xm-x, ym+y,zm)
                count+=1:p(count)=Type<v3>(xm-y, ym-x,zm)
                count+=1:p(count)=Type<v3>(xm+x, ym-y,zm)
                count+=1:p(count)=Type<v3>(xm+y, ym+x,zm)
                r = e
                If r<=y Then
                    y+=1
                    e+=y Shl 1+1
                End If
                If r>x Or e>y Then
                    x+=1
                    e+=x Shl 1+1
                End If
            Loop While x<0
            Redim Preserve p(1 To count-1)
        End Sub
        
        Sub RotateArray(wa() As V3,result() As V3,angle As _float,centre As V3,flag As Long=0)
            Dim As Single dx,dy,dz,w
            Dim As Single SinAX=Sin(angle.x)
            Dim As Single SinAY=Sin(angle.y)
            Dim As Single SinAZ=Sin(angle.z)
            Dim As Single CosAX=Cos(angle.x)
            Dim As Single CosAY=Cos(angle.y)
            Dim As Single CosAZ=Cos(angle.z)
            Redim result(Lbound(wa) To Ubound(wa))
            For z As Long=Lbound(wa) To Ubound(wa)
                dx=wa(z).x-centre.x
                dy=wa(z).y-centre.y
                dz=wa(z).z-centre.z
                Result(z).x=((Cosay*Cosaz)*dx+(-Cosax*Sinaz+Sinax*Sinay*Cosaz)*dy+(Sinax*Sinaz+Cosax*Sinay*Cosaz)*dz)+centre.x
                result(z).y=((Cosay*Sinaz)*dx+(Cosax*Cosaz+Sinax*Sinay*Sinaz)*dy+(-Sinax*Cosaz+Cosax*Sinay*Sinaz)*dz)+centre.y
                result(z).z=((-Sinay)*dx+(Sinax*Cosay)*dy+(Cosax*Cosay)*dz)+centre.z
                #macro perspective()
                w = 1 + (result(z).z/eyepoint.z)
                result(z).x = (result(z).x-eyepoint.x)/w+eyepoint.x 
                result(z).y = (result(z).y-eyepoint.y)/w+eyepoint.y 
                result(z).z = (result(z).z-eyepoint.z)/w+eyepoint.z
                #endmacro
                If flag Then: perspective():End If
                result(z).col=wa(z).col
                result(z).flag=wa(z).flag
            Next z
        End Sub
        
        Sub inc(a() As v3,b() As v3,clr As Ulong) 'increment an array
            Var u=Ubound(a)
            Redim Preserve a(1 To u+ Ubound(b)) 
            For n As Long=1 To Ubound(b)
                b(n).col=clr
                a(u+n)= b(n)
            Next n
        End Sub
        
        Sub createdisc(xc As Single,yc As Single,zc As Single,rad As Long,d() As v3)'ends
            Redim d(1 To 4*rad^2)
            Dim As Long ctr
            For x As Long=xc-rad To xc+rad
                For y As Long=yc-rad To yc+rad  
                    If incircle(xc,yc,rad,x,y) Then
                        ctr+=1
                        d(ctr)=Type(x,y,zc,0,1)
                    End If
                Next y
            Next x
            Redim Preserve d(1 To ctr)     
        End Sub
        
        Function segment_distance( l As Line, p As v3, ip As v3=Type(0,0,0)) As Single
            Var s=l.v1,f=l.v2
            Dim As Single linelength=(s-f).length
            Dim As Single dist= ((1/linelength)*((s-f) cross (p-s))).length
            Dim As Single lpf=(p-f).length,lps=(p-s).length
            If lps >= lpf Then
                Var temp=Sqr(lps*lps-dist*dist)/linelength
                If temp>=1 Then temp=1:dist=lpf
                ip=s+(temp)*(f-s)
                Return dist
            Else
                Var temp=Sqr(lpf*lpf-dist*dist)/linelength
                If temp>=1 Then temp=1:dist=lps
                ip=f+(temp)*(s-f)
                Return dist
            End If
            Return dist
        End Function
        
        Function Regulate(Byval MyFps As Long,Byref fps As Long=0) As Long
            Static As Double timervalue,_lastsleeptime,t3,frames
            Var t=Timer
            frames+=1
            If (t-t3)>=1 Then t3=t:fps=frames:frames=0
            Var sleeptime=_lastsleeptime+((1/myfps)-T+timervalue)*1000
            If sleeptime<1 Then sleeptime=1
            _lastsleeptime=sleeptime
            timervalue=T
            Return sleeptime
        End Function
        '======================== set up ============= 
        
        Screen 20,32
        
        Dim As Any Ptr i=Imagecreate(1024,768)
         For n As Long=0 To 768
        Var red=map(768,0,n,0,255)
        Var green=map(768,0,n,0,255)
        Var blue=map(768,0,n,100,255)
        Line i,(0,n)-(1024,n),Rgb(red,green,blue)
        Next
        
        Redim As v3 e1(),e2() 'ends
        Redim As v3 c(),a(0)  'cylinder
        
        For z As Long=-200 To 200 'fill cylinder
            getcircle(512,768\2,z,20,c())
            inc(a(),c(),Rgb(0,200,0))
        Next
        
        createdisc(512,768\2,-201,18,e1()) 'ends
        createdisc(512,768\2, 201,18,e2())
        inc(a(),e1(),Rgb(155,50,0))  'add them to the array
        inc(a(),e2(),Rgb(0,50,155))
        Dim As v3 L(1 To 2)={Type(512,768\2,-205),Type(512,768\2,205)}'ends of central axis
        inc(a(),L(),0) 'add them to array
        
        SetQsort(V3,QsortZ,down,.z)'initiate quicksort
        
        Redim As v3 result()'working array
        Dim As Single ang
        Dim As Single r,g,b,rad,dt
        Dim As v3 light=Type(512,-1000,0)
        Dim As v3 ip 
        Dim As Line ln
        Dim As Long fps
        Do
            ang+=.015
            RotateArray(a(),result(),Type<_float>(1.2*ang,2*ang,ang),Type(512,768\2,0),1)
            Qsortz(result(),Lbound(result),Ubound(result)-2)
            Screenlock
            Cls
            put(0,0),i,pset
            Draw String(20,20),"FPS " &fps,0
            For n As Long=Lbound(result) To Ubound(result)-2
                If result(n).flag=0 Then 'curved bit shader
                    Dim As v3 d=Type(result(n).x-light.x,result(n).y-light.y,result(n).z-light.z)'point to light
                    ln=Type<Line>(result(Ubound(result)-1),result(Ubound(result))) 'the central cylinder axis (line)
                    segment_distance(ln,result(n),ip) 'need ip (intercept of central axis)
                    Dim As v3 c=Type(result(n).x-ip.x,result(n).y-ip.y,result(n).z-ip.z)  'cylinder normals at point
                    Var q=c.unit dot d.unit        'shade by dot product
                    dt=map(-1,1,q,1,0)             'map dot product to [1,0]    
                    r=Cast(Ubyte Ptr,@result(n).col)[2]*dt
                    g=Cast(Ubyte Ptr,@result(n).col)[1]*dt
                    b=Cast(Ubyte Ptr,@result(n).col)[0]*dt
                Else 'ends
                    dt=map(600,200,result(n).y,.3,1) 'shade by .y
                    r=Cast(Ubyte Ptr,@result(n).col)[2]*dt
                    g=Cast(Ubyte Ptr,@result(n).col)[1]*dt
                    b=Cast(Ubyte Ptr,@result(n).col)[0]*dt  
                End If
                
                rad=map(-200,200,result(n).z,2,1) 
                Circle(result(n).x,result(n).y),rad,Rgb(r,g,b),,,,f
            Next n
            
            Screenunlock
            Sleep regulate(60,fps)
        Loop Until Inkey=Chr(27)
        imagedestroy i
        Sleep

My results

Code:


32 bit compiler
-gen gcc 21 fps
gcc O1 26 fps
gcc O2 28 fps
gcc O3 36 fps
fpu sse -vec2 28 fps
-gen gas 27 fps



64 bit compiler
-gen gcc 29 fps
gcc O1 39 fps
gcc O2 37 fps
gcc O3 48 fps
fpu sse -vec2 30 fps  GRAPHICS FAIL ON CYLINDER!

Why is your -O2 slower than your -O1??
Or is your elapsed time not a measure of speed?

Win 10 64 bits.

D.J.Peters · Post by **D.J.Peters** » Jan 14, 2021 9:43

@Manpcnin
Do you use double or float (single)?
Do you run a 32-bit exe on 64-bit Windows?
Is the 3D data created on the fly or loaded from a file?
Can you upload the code?

Joshy

TeeEmCee · Post by **TeeEmCee** » Jan 18, 2021 0:40

The -O 2 build giving faulty results and also running slower may be because denormals, NaNs or infinities are creeping into the calculations; these can greatly slow down some instructions by forcing the CPU onto a slow path. Maybe this is happening due to differences in rounding or precision rather than undefined behaviour or a compiler bug.

But gcc's -O 2 doesn't enable -ffast-math (non-standards-compliant calculation optimisations), which can often be blamed for such problems.

I see that Jeff commented on sf.net bug #458 about double vs long double precision, maybe because he was thinking about this. One difference between the gcc and gas backends is that the backend can change how intermediate results are rounded, because they will be at 80-bit precision unless spilled to memory. To test whether this is the problem, you can pass "-fpu sse" and see whether the difference between "-gen gcc -O 1" and "-gen gcc -O 2" persists.
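
A cheap way to check whether NaNs or infinities are actually creeping in is to scan the computed values for them. A minimal sketch, assuming the data are Singles held in a one-dimensional array; `SingleBits`, `is_nonfinite` and `count_nonfinite` are made-up names for illustration. It inspects the IEEE-754 bit pattern instead of relying on floating-point comparisons, so the result should not depend on how a backend compares NaNs.

```freebasic
' Overlay a Single with a 32-bit unsigned integer to read its raw bits.
Union SingleBits
    As Single f
    As ULong  u
End Union

' Returns -1 when v is NaN or +/-Inf, 0 otherwise.
Function is_nonfinite(ByVal v As Single) As Long
    Dim As SingleBits b
    b.f = v
    ' An exponent field (bits 23..30) of all ones marks NaN or infinity.
    Return ((b.u And &h7F800000) = &h7F800000)
End Function

' Counts the NaN/Inf entries in a one-dimensional array of Singles.
Function count_nonfinite(a() As Single) As Long
    Dim As Long n = 0
    For i As Integer = LBound(a) To UBound(a)
        If is_nonfinite(a(i)) Then n += 1
    Next i
    Return n
End Function
```

Calling count_nonfinite on the output of the suspect routine, once for the "-O 1" build and once for the "-O 2" build, would show whether the two builds differ in the number of bad values.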

Manpcnin · Post by **Manpcnin** » Jan 21, 2021 1:05

Sorry I got too busy to focus on coding this week, but I finally dove into my program and found a small piece of code that can replicate some of the weirdness.

Code: Select all

const size = 20
dim shared as single hMap(size,size),nMap(size,size)

for j as integer = 0 to size
    for i as integer = 0 to size
        hmap(i,j)=rnd*10
    next i
next j

dim as single l,j0,j1,j2,i0,i1,i2

j0 = 0
for j1 = 1 to size-1
    i0 = 0:j2 = j1+1
    for i1 = 1 to size-1
        i2 = i1+1
        nMap(i1,j1) = hMap(i0,j0)+hMap(i1,j1) + hMap(i2,j2)            'Error is probably generated from this code.
        'nMap(i1,j1) = hMap(i1-1,j1-1)+hMap(i1,j1) + hMap(i2+1,j2+1)    '<= This, instead of the above, generates working code under gen gcc.
        'print nMap(i1,j1)                                              '<= un-commenting the print ALSO prevents the weird behavior
        i0 = i1
    next i1
    j0 = j1
next j1
for j1 = 1 to size step 5 'print a sample of the array
    for i1 = 1 to size step 5
        print nMap(i1,j1)
    next i1
next j1
print "done"
sleep: end

Yes, there's no reason to use singles for those variables, and I've switched to ints for my actual program, which fixes the problem. BUT there's still a problem here. There is no undefined behavior, but this code fails and crashes under "-gen gcc -O 1" (and -O 2 / -O 3), and was obviously the reason NaNs and INFs were messing up my previous code.
And adding a print REALLY shouldn't affect the emitted ASM in any meaningful way, yet it does.
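
For reference, a sketch of what the integer-index variant of the loop might look like. This is an assumed rewrite of the test case above, not the code from the actual program; the neighbour indices are derived directly from the loop counters rather than carried along in separate variables.

```freebasic
Const size = 20
Dim Shared As Single hMap(size, size), nMap(size, size)

' Fill the height map with pseudo-random values, as in the test case.
For j As Integer = 0 To size
    For i As Integer = 0 To size
        hMap(i, j) = Rnd * 10
    Next i
Next j

' Integer loop counters: i0/j0 are the previous index, i2/j2 the next one.
For j1 As Integer = 1 To size - 1
    Dim As Integer j0 = j1 - 1, j2 = j1 + 1
    For i1 As Integer = 1 To size - 1
        Dim As Integer i0 = i1 - 1, i2 = i1 + 1
        nMap(i1, j1) = hMap(i0, j0) + hMap(i1, j1) + hMap(i2, j2)
    Next i1
Next j1

' Print a sample of the result.
For j1 As Integer = 1 To size Step 5
    For i1 As Integer = 1 To size Step 5
        Print nMap(i1, j1)
    Next i1
Next j1
```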
