Possible bug? Undefined behavior in C emitted code?

General discussion for topics related to the FreeBASIC project or its community.
Post Reply
angros47
Posts: 2326
Joined: Jun 21, 2005 19:04

Possible bug? Undefined behavior in C emitted code?

Post by angros47 »

From this discussion: viewtopic.php?p=300503#p300503

The code:

Code: Select all

Dim shared As long bDimMancala(13)


Sub DrawNumbers()

	For i As Long = 0 To 5
		
		bDimMancala(13 - i) = i
		
	next

End Sub
Produces, in the C intermediate source, the line:

Code: Select all

*(int32*)(((int32)(int32*)BDIMMANCALA$ + (-I$1 << (2 & 31))) + 52) = I$1;
(or (int64), depending on the platform)

The code " (-I$1 << (2 & 31))", specifically, is supposed to produce undefined results, because according to C specifications, the left and right shift operators have a definite behavior only for unsigned numbers (source: https://stackoverflow.com/questions/841 ... d-behavior )

GCC seems not to have problems with that, but clang (used in Emscripten) seems to have issues with it.
Is that a bug of the compiler?

Is the code perhaps supposed to be " (-(I$1 << (2 & 31)))" ?
adeyblue
Posts: 300
Joined: Nov 07, 2019 20:08

Re: Possible bug? Undefined behavior in C emitted code?

Post by adeyblue »

It recently changed to be well defined, so that for instance -1 << 1 = -2 is now guaranteed
https://stackoverflow.com/questions/554 ... t-overflow

So I guess it now depends on how old the compiler. Although I suspect the reason this change was accepted into the standard along with the mandatory use of two's complement is that this is what the overwhelming number of modern C compilers for various architectures were already doing anyway.

TL;DR it didn't used to be technically fine, it's now technically fine, but in practice it's probably always been fine.
TeeEmCee
Posts: 375
Joined: Jul 22, 2006 0:54
Location: Auckland

Re: Possible bug? Undefined behavior in C emitted code?

Post by TeeEmCee »

Yes, this is a bug in fbc. Normally I would have expected that bitshifting a negative in this way would just cause a warning, if anything, that could be ignored, but C and C++ compilers do sometimes really punish you hard for technically undefined behaviour.
adeyblue wrote: Sep 26, 2023 19:10 It recently changed to be well defined, so that for instance -1 << 1 = -2 is now guaranteed
https://stackoverflow.com/questions/554 ... t-overflow
I hadn't heard that C++20 now requires two's complement signed integers. (About time.) However, that's C++. C has made the same change only in C23, which isn't going to be published until next year. As the emscripten problem shows, we can't rely on such a modern standard.

Incidentally there are plenty of interesting changes in C23, such as "variadic functions no longer need a named argument before the ellipsis and the va_start macro no longer needs a second argument", picking up more C++ features such as [[..]] attribute syntax and auto for type inference, native char8_t for UTF-8, and #elifdef and #elifndef convenience preprocessor directives, which is something I very much want to see added to FB.
angros47
Posts: 2326
Joined: Jun 21, 2005 19:04

Re: Possible bug? Undefined behavior in C emitted code?

Post by angros47 »

TeeEmCee wrote: Sep 27, 2023 2:05 As the emscripten problem shows, we can't rely on such a modern standard.
Actually, with further investigation, it seems the problem in Emscripten is not caused by that
TeeEmCee
Posts: 375
Joined: Jul 22, 2006 0:54
Location: Auckland

Re: Possible bug? Undefined behavior in C emitted code?

Post by TeeEmCee »

Looking at the issue you filed. I'm really surprised to see the bug seems to be in really far down in LLVM. Nice work narrowing down the original problem and reporting it.

But still, should fbc be emitting C code which left-shifts negatives? We might hope to support other compilers aside from GCC and Clang. I have no idea whether it might actually cause a problem anywhere (due to over-aggressive optimisation, rather than due to a hypothetical non-two's complement machine, of course).
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: Possible bug? Undefined behavior in C emitted code?

Post by caseih »

The problem with taking advantage of new features in C23 is that most C programs were intended to be portable, and so programmers tend to prefer the lowest common denominator. It's going to take years for them to be widely adopted and supported by the majority of compilers. Although I suppose some esoteric projects like the Linux kernel will adopt these sort of features the quickest.

I expect FBC will be very conservative in the C features the code it emits requires. This even if shifting a negative number were officially defined, I would prefer that FBC emit code that does not require the behavior.
angros47
Posts: 2326
Joined: Jun 21, 2005 19:04

Re: Possible bug? Undefined behavior in C emitted code?

Post by angros47 »

TeeEmCee wrote: Sep 27, 2023 11:19 I have no idea whether it might actually cause a problem anywhere (due to over-aggressive optimisation, rather than due to a hypothetical non-two's complement machine, of course).
Neither I have; on the other hand, I know one thing: the C emitter is around since several years, by now, and that specific behavior has never caused issues, as far as I know (the issue with Emscripten, in the end, was unrelated with that: previous versions of FreeBasic used the same trick and it caused no issues with Emscripten). So, I would say "If it's not broken, don't fix it"; at the moment it causes no problems anywhere, and since that behavior is now defined, it means more and more compilers will accept it in the future, so the likelihood of related issues is going to decrease even more in the future

caseih wrote: Sep 27, 2023 14:38 The problem with taking advantage of new features in C23 is that most C programs were intended to be portable, and so programmers tend to prefer the lowest common denominator
......
I would prefer that FBC emit code that does not require the behavior.
Well, GCC and clang are the most portable compilers I have ever seen, since they seem to support almost every modern platform, and a lot of old ones.
Personally, the only situation I can imagine where a different, older compiler might be needed would be to emit code for processors older than 80386, but this would not work anyway because of the different memory model
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: Possible bug? Undefined behavior in C emitted code?

Post by caseih »

angros47 wrote: Sep 27, 2023 16:12and since that behavior is now defined, it means more and more compilers will accept it in the future, so the likelihood of related issues is going to decrease even more in the future
It's defined in C++20, but to my knowledge not in ISO C. Am I incorrect? Can you point me at the ISO C standard where this has now been defined?
angros47
Posts: 2326
Joined: Jun 21, 2005 19:04

Re: Possible bug? Undefined behavior in C emitted code?

Post by angros47 »

I thought I read somewhere it was planned to be defined in C 23, but I can't find that page anymore, so I might be wrong
nehakakar
Posts: 7
Joined: Jul 04, 2023 6:42
Location: India
Contact:

Re: Possible bug? Undefined behavior in C emitted code?

Post by nehakakar »

angros47 wrote: Sep 24, 2023 17:11 From this discussion: viewtopic.php?p=300503#p300503

The code:

Code: Select all

Dim shared As long bDimMancala(13)


Sub DrawNumbers()

	For i As Long = 0 To 5
		
		bDimMancala(13 - i) = i
		
	next

End Sub
Produces, in the C intermediate source, the line:

Code: Select all

*(int32*)(((int32)(int32*)BDIMMANCALA$ + (-I$1 << (2 & 31))) + 52) = I$1;
(or (int64), depending on the platform)

The code " (-I$1 << (2 & 31))", specifically, is supposed to produce undefined results, because according to C specifications, the left and right shift operators have a definite behavior only for unsigned numbers (source: https://stackoverflow.com/questions/841 ... d-behavior )

GCC seems not to have problems with that, but clang (used in Emscripten) seems to have issues with it.
Is that a bug of the compiler?

Is the code perhaps supposed to be " (-(I$1 << (2 & 31)))" ?
You can try this..

Code: Select all

*(int32*)(((int32)(int32*)BDIMMANCALA$ + (-I$1 << 2)) + 52) = I$1;
coderJeff
Site Admin
Posts: 4326
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Possible bug? Undefined behavior in C emitted code?

Post by coderJeff »

From other topic:
viewtopic.php?p=300547#p300547
VANYA wrote: Sep 25, 2023 3:01 It turns out that I saved the previous compiler (1.08) in my archives. And there are no problems with him. Here is the difference in code generation:

1.09:

Code: Select all

fb_IntToStr( (int32)*(int8*)(((int64)(struct $8TMANCALA*)BDIMMANCALA$ + (-(int64)I$2 * 49ll)) + 637ll) );
1.08:

Code: Select all

fb_IntToStr( (int32)*(int8*)(((uint8*)BDIMMANCALA$ + (-I$2 * 49)) + 637) );
So the problem is in the FB compiler.
It was only very recently (with TeeEmCee's help) I was able to start building the test-suite for Emscripten port. And there are still many failures. If can get the test-suite working properly, then should avoid these kinds of changes in future.

Reminder to myself: need to narrow down why this was changed (i.e. find the commits that caused this change instead of just comparing the fbc versions).
coderJeff
Site Admin
Posts: 4326
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Possible bug? Undefined behavior in C emitted code?

Post by coderJeff »

Forgot all about this. And forget where this was left off. Sorry.

Which post / fragment of fbc code showed the exact problem? Doing a few tests here and the emitted code I'm currently generating isn't quite matching up with the fragments in this thread.

Thanks
angros47
Posts: 2326
Joined: Jun 21, 2005 19:04

Re: Possible bug? Undefined behavior in C emitted code?

Post by angros47 »

The question is for me or for VANYA?
coderJeff
Site Admin
Posts: 4326
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Possible bug? Undefined behavior in C emitted code?

Post by coderJeff »

angros47 wrote: Nov 24, 2023 7:39 The question is for me or for VANYA?
oh, anyone that can answer will be fine with me. I think I found the working folder where I was looking at this issue last but I have absolutely lost track of where I left off with it.
Post Reply