Executables and Compiling

coderJeff
Site Admin
Posts: 4326
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Executables and Compiling

Post by coderJeff »

I had been thinking about this wiki page for a while: Executables
While testing a bug, I got confused by some of fbc's command line options, and that helped me work out what I might want to say to explain executables.

Compiling an Executable, in general

fbc is a compiler that takes FreeBASIC source code and transforms it into a file that can be loaded and executed (run) by the operating system. fbc doesn't do this all on its own. It uses some intermediate files and other tools to complete this transformation.

For simplicity we'll leave a few ideas for later, or another time:
- static libraries are collections of object files (compiled code, but not executable)
- dynamic link libraries are a kind of executable that's loaded and used from another executable
- module constructors are special code that gets executed before the "main" function (more on the main function below)
- emscripten and llvm backends


The "main" entry point of an executable

An executable needs a starting point. This starting point, which we will call the "main" function or "main" entry point, needs to be recorded in the executable so that when the operating system loads the executable file, it knows where to begin execution of the program.

By default, the "main" function or starting point will be the first line of the first BASIC source file on the command line.
$ fbc program.bas module1.bas module2.bas
"program" becomes the main module because it is first, and fbc will generate an implicit "main" function that will be executed first when the executable is loaded.

This default can be overridden with the '-m module' option to specify a main module that is not the first source file given on the command line.
$ fbc -m program module1.bas module2.bas program.bas
The "-m program" option tells fbc to use "program.bas" as the main module, even though "program.bas" is not listed first.

If no other option is given that will affect the compile process, this "main" function is generated implicitly by fbc.

There can be only one "main" function for an executable. It's not possible to have more than one "main" function.
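
The selection rule above can be written down as a small model. The following C sketch only illustrates the rule as described in this post (the first .bas file wins unless '-m' names another module). It is not fbc's actual option parser, and the function name main_module_of and the buffer size are made up for this example.

```c
#include <assert.h>
#include <string.h>

enum { MODULE_NAME_SIZE = 64 };   /* arbitrary limit for this sketch */

/* Returns the main module name for a space-separated fbc argument list,
   or "" when there is neither a .bas file nor a -m option.
   The result lives in a static buffer that the next call overwrites. */
static const char *main_module_of(const char *args)
{
    static char result[MODULE_NAME_SIZE];
    char first_bas[MODULE_NAME_SIZE] = "";
    char explicit_main[MODULE_NAME_SIZE] = "";
    int expect_name = 0;              /* set right after a "-m" token */
    const char *p = args;

    assert(args != NULL);
    while (*p != '\0') {
        size_t len = strcspn(p, " ");             /* length of next token */
        if (len > 0 && len < MODULE_NAME_SIZE) {
            if (expect_name) {
                memcpy(explicit_main, p, len);    /* name given to -m */
                explicit_main[len] = '\0';
                expect_name = 0;
            } else if (len == 2 && strncmp(p, "-m", 2) == 0) {
                expect_name = 1;
            } else if (len > 4 && first_bas[0] == '\0'
                       && strncmp(p + len - 4, ".bas", 4) == 0) {
                memcpy(first_bas, p, len - 4);    /* drop ".bas" suffix */
                first_bas[len - 4] = '\0';
            }
        }
        p += len;
        p += strspn(p, " ");                      /* skip separators */
    }
    /* An explicit -m always wins over the position on the command line. */
    strcpy(result, explicit_main[0] != '\0' ? explicit_main : first_bas);
    return result;
}
```

For the two command lines above, main_module_of("program.bas module1.bas module2.bas") and main_module_of("-m program module1.bas module2.bas program.bas") both yield "program".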

Re: Executables and Compiling

Post by coderJeff »

Compile process for an executable

When fbc compiles BASIC source code, it translates the source into another format that can be used by other tools that eventually create an executable. By default, fbc will use these other tools automatically.
To see all the steps that fbc uses, specify '-v' on the command line.
For example, on win32:

Code: Select all

$ fbc a.bas -v
FreeBASIC Compiler - Version 1.08.0 (2021-01-24), built for win32 (32bit)
Copyright (C) 2004-2021 The FreeBASIC development team.
standalone
target:       win32, 486, 32bit
backend:      gas
compiling:    a.bas -o a.asm (main module)
assembling:   D:\fb.git\bin\win32\as.exe --32 --strip-local-absolute "a.asm" -o "a.o"
linking:      D:\fb.git\bin\win32\ld.exe -m i386pe -o "a.exe" -subsystem console
"D:\fb.git\lib\win32\fbextra.x" --stack 1048576,1048576 -s -L "D:\fb.git\lib\win32" 
-L "." "D:\fb.git\lib\win32\crt2.o" "D:\fb.git\lib\win32\crtbegin.o" "D:\fb.git\lib\win32\fbrt0.o"
"a.o" "-(" -lfb -lgcc -lmsvcrt -lkernel32 -luser32 -lmingw32 -lmingwex -lmoldname -lgcc_eh "-)"
"D:\fb.git\lib\win32\crtend.o"
Tools:
- [ fbc ] compiler translates *.bas into *.a64 or *.asm or *.c files
- [ gcc ] compiler translates *.c files into *.asm files
- [ as ] assembler translates *.asm/*.a64 files into *.o object files
- [ ld ] linker joins *.o files (and other files) into executable files
- emscripten backend has other tools
- llvm backend has other tools


GNU assembler 32-bit backend (-gen gas)

*.bas => [ fbc ] => *.asm     compile (first stage) to assembly (-r or -rr, -R or -RR)
*.asm => [ as ]  => *.o       assemble to object file (-c, -C)
*.o   => [ ld ]  => *[.exe]   link to executable

GNU assembler 64-bit backend (-gen gas64)

*.bas => [ fbc ] => *.a64     compile (first stage) to assembly (-r or -rr, -R or -RR)
*.a64 => [ as ]  => *.o       assemble to object file (-c, -C)
*.o   => [ ld ]  => *[.exe]   link to executable

GCC compiler backend (-gen gcc)

*.bas => [ fbc ] => *.c       compile (first stage) to C (-r, -R)
*.c   => [ gcc ] => *.asm     compile (second stage) to assembly (-rr, -RR)
*.asm => [ as ]  => *.o       assemble to object file (-c, -C)
*.o   => [ ld ]  => *[.exe]   link to executable


Options controlling compile / assemble / link stages:

There are a few options that can control what fbc does with the intermediate files and at what point the process may be stopped early.

-r, -rr, -c : stop the compile / assemble process sometime before the link stage
-R, -RR, -C : keep intermediate files at compile / assemble stages then continue to next stage

Compiler Option -r : compile up to first stage, keep file (*.asm/*.a64/*.c), and stop
Compiler Option -rr : compile up to second stage, keep file (*.asm), and stop
Compiler Option -c : compile up to assembly stage, keep file (*.o), and stop

Compiler Option -R : don't delete compile (first stage) intermediate file (*.asm/*.a64/*.c)
Compiler Option -RR : don't delete compile (second stage) intermediate file (*.asm)
Compiler Option -C : don't delete assemble stage intermediate file (*.o)

-r : overrides -rr, -RR, -c, -C
-rr : overrides -c, -C
-r and -rr : behave the same if there is only one compile stage
-R and -RR : behave the same if there is only one compile stage

-r, -rr, -c : override the default behaviour of creating an implicit "main" entry point, so no "main" function is created by default. To have a "main" entry point when using the -r, -rr, or -c options, the '-m module' option needs to be used to indicate which module should have a "main" function.
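
The precedence rules above can be summarised in a few lines of code. This C sketch is a model of the behaviour described in this post, not code taken from fbc. The names last_stage, stage_name and gets_main are invented for the illustration, and each flag argument is 1 when the option is present on the command line.

```c
#include <assert.h>

/* Stages of the build, in the order they run. */
typedef enum {
    STAGE_COMPILE_FIRST,   /* *.bas -> *.asm / *.a64 / *.c */
    STAGE_COMPILE_SECOND,  /* *.c -> *.asm (only the gcc backend has it) */
    STAGE_ASSEMBLE,        /* *.asm / *.a64 -> *.o */
    STAGE_LINK             /* *.o -> executable */
} Stage;

/* Last stage that runs: -r beats -rr, and -rr beats -c.
   With a single compile stage (gas, gas64) the first and second
   compile stages are the same thing, so -r and -rr coincide. */
static Stage last_stage(int opt_r, int opt_rr, int opt_c)
{
    if (opt_r)  return STAGE_COMPILE_FIRST;
    if (opt_rr) return STAGE_COMPILE_SECOND;
    if (opt_c)  return STAGE_ASSEMBLE;
    return STAGE_LINK;
}

/* Human-readable stage name, for printing. */
static const char *stage_name(Stage stage)
{
    static const char *const names[] = {
        "compile (first stage)", "compile (second stage)", "assemble", "link"
    };
    assert((int)stage >= 0 && (int)stage <= (int)STAGE_LINK);
    return names[stage];
}

/* 1 if some module gets a "main": either the build runs through to the
   link stage (implicit main) or -m names the main module explicitly. */
static int gets_main(int opt_r, int opt_rr, int opt_c, int opt_m)
{
    return opt_m || last_stage(opt_r, opt_rr, opt_c) == STAGE_LINK;
}
```

For example, last_stage(1, 1, 1) gives the first compile stage because -r wins, and gets_main(0, 0, 1, 0) is 0, which matches the "-c without -m" case.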

-dll and -lib options
In general, the above for -r, -R, -rr, -RR, -c, -C should hold true for -lib and -dll, however default behaviours for the implicit main are probably different? I haven't verified for myself yet....
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Re: Executables and Compiling

Post by marcov »

(Main is not the entry point of the program. That is usually the symbol "_start", which resides in cprt0.o or something similar.

_start transforms some of its parameters to prepare the argc/argv symbols (if needed), initializes stack frames and initializes the libraries (ctors).)

Re: Executables and Compiling

Post by coderJeff »

I think the goal here should be a few concepts that can help build an understanding:
- executable programs need a "main" point of entry
- fbc may or may not create an implicit main function, depending on options or method of building an executable
- some fbc options / methods of compiling will prevent an implicit main from being created, consequently allowing for the definition of an explicit main / winmain, etc.
- module constructors run before main, module destructors run after main
marcov wrote:(Main is not the entry point of the program. That is usually the symbol "_start", which resides in cprt0.o or something similar.
_start transforms some of its parameters to prepare the argc/argv symbols(if needed), initializes stack frames and initializes the libraries (ctors)
)
I know what you're getting at, but realistically when should a user care about main() vs __main() vs *mainCRTStartup() vs whatever load/init mechanisms exist? With Compiler Option -nodeflibs the actual start-up and initializing code will come into play as we are no longer linking against the C runtime. But for talking or explaining about executables at a relatable level to general users, I don't know how to be more precise with the idea of a program starting with respect to basic user code without muddying it up with a lot of platform and tool chain specific details.
SARG
Posts: 1765
Joined: May 27, 2005 7:15
Location: FRANCE

Re: Executables and Compiling

Post by SARG »

Good work Jeff

Some typos and omissions:

emscripten backed has other tools <--- missing n

- [ as ] assembler translate *.asm files in to *.o object files <--- missing *.a64

-R, -RR, -C : keep imtermediate files at compile / assemble stages then continue to next stage <---- m instead n

Compiler Option -R : don't delete compile (first stage) intermediate file (*.asm/*.c) <--- missing *.a64


-R, -r etc. are really not user-friendly. Each time before using them I need to read the documentation carefully. :-)

Re: Executables and Compiling

Post by coderJeff »

coderJeff wrote:With Compiler Option -nodeflibs the actual start-up and initializing code will come into play as we are no longer linking against the C runtime.
Need to correct myself here: fbc still pulls in some of the init/exit code with some crt?.o, crtbegin.o, crtend.o. I'd have to look through each platform / profiling option to confirm the exact names.

Re: Executables and Compiling

Post by marcov »

coderJeff wrote:I think the goal here should be a few concepts that can help build an understanding:
I know what you're getting at, but realistically when should a user care about main() vs __main() vs *mainCRTStartup() vs whatever load/init mechanisms exist? With Compiler Option -nodeflibs the actual start-up and initializing code will come into play as we are no longer linking against the C runtime. But for talking or explaining about executables at a relatable level to general users, I don't know how to be more precise with the idea of a program starting with respect to basic user code without muddying it up with a lot of platform and tool chain specific details.
The details can be left out, and were meant as an illustration. The core point, however, was not to misrepresent the Main symbol as the actual entry point. Describe it as the first user code to be run by the startup framework or so. No details, concise, and still correct.

Note that if you don't go for details, then maybe also leave out the MAIN() symbol name altogether? That is then an internal symbol only, and users don't need to know that, unless it really has a place in the language as a "sub main()" or so.

Re: Executables and Compiling

Post by coderJeff »

marcov wrote:Describe it as the first user code to be run by the startup framework or so. No details, concise, and still correct.
That sounds workable. Yeah, "entry point" on its own or even coupled with "user code entry point" could still be confusing due to the specific meaning for linkers and executables.
marcov wrote:Note that if you don't go for details, then maybe also leave out the MAIN() symbol name altogether? That is then an internal symbol only, and users don't need to know that, unless it really has a place in the language as a "sub main()" or so.
Hmm... We need to call it something though. I feel like the term 'implicit main' has been in the fbc vocabulary for years.

---

If I do not use "main function" but only "main module" then I think it could be laid out something like this:

main module : the user source file identified as having the starting point for user code, there can be only one main module in an executable
other module : any module that is not the main module
module-level code : user code that is not inside any procedure (sub, function, member procedure, etc)
user code : any executable (not declarations) code written by the user
module-level code in main module : beginning of user code
module-level code in other module : module constructor user code
module constructor : user code for initializing the module and executes before the main module user code

I really feel like I want to call "main module user code" => "main" for brevity, lol.

fbc under the hood:
- a "main()" function will typically(*) be generated by default. The exact name of the symbol can depend on compile options. (* I don't see many users using makefiles or separating the compile / link stages in batch files, etc.)
- there are some compile options which cause "main()" not to be generated
- to create an explicit "main()", the definition has to follow what's expected by the startup framework
- if a "main()" function is generated (implicit or explicit), it can be used to set the initial break point when debugging (after all module constructors) without having to find the actual starting point by line number

I think I'll do a couple of examples to expand on the main() details. It doesn't necessarily have to be in the wiki topic, though. I feel like I did something similar on the forum before with argv[] handling.

---
Just noticed this question in the Beginners forum, "Main line code in modules that are not the main module", which I think is related to some of what we're trying to explain and document here.
fxm
Moderator
Posts: 12107
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Executables and Compiling

Post by fxm »

I still have a doubt about what exactly happens when the program starts, if a module other than the main module has its own module constructor, module destructor and module-level code.

When and how are these different codes of other module executed?

Is there a difference if:
  • the other module is compiled at the same time as the main module,
  • the other module is used as a static library,
  • the other module is used as shared library (DLL):
    • either statically loaded,
    • or dynamically loaded.

Re: Executables and Compiling

Post by marcov »

coderJeff wrote: module-level code in main module : beginning of user code
module-level code in other module : module constructor user code
That is what I thought. So the difference is only the position on the cmdline, with no difference in syntax between the program (main) module and aux modules that could be used to generate a warning or error if an aux module is accidentally specified first?

Then IMHO talking about main is even more confusing, if you can't really tell from a piece of source if it is a (implicit) main module or not?

Btw I miss module destructors and maybe also class constructors/destructors in the list. Maybe FB doesn't have them? (Class constructors are called for the class type as a kind of ctor if the class is actually used (not smartlinked out). They are a way of decentralizing ctor/dtor support in a per-class/OOP manner, avoiding linking in the class just because some initialization for it was done in the module ctor.)
I really feel like I want to call "main module user code" => "main" for brevity, lol.
We call it program (for an exe) or library (for a dll/.so), while auxiliary modules are "module" or "unit", depending on the dialect (Unit=Borland like, Module is the mostly forgotten second Pascal standard)
But more importantly, all those are the first token in each piece of source code, so that makes it easy and unambiguous to explain. I can recommend it if you ever implement a non-quirks mode :-)

But QB iirc had the weird convention to have the main program first, with subs following ?
- to create an explicit "main()", the definition has to follow what's expected by the startup framework
(which can be funny with, iirc, e.g. SDL2, which wants its own main)

Re: Executables and Compiling

Post by coderJeff »

I think this will get too complicated if we jump right into the DLL stuff. If I remember correctly there are some differences between Windows and Linux.
fxm wrote:Is there a difference if:
the other module is compiled at the same time as the main module,
the other module is used as a static library,
There's no difference if the other module is compiled at the same time or is an object file specified on the command line.

Static libraries are containers for object files with one major difference. To have an object file from a library be included in an executable, the main module needs to reference it either directly or indirectly (I think I'll make a separate post on that).

Consider this example where we have two modules, both with module level code:
r1.bas

Code: Select all

print __FILE__ & ":" & __FUNCTION__
r2.bas

Code: Select all

print __FILE__ & ":" & __FUNCTION__
1) Compile in one step:

- r1.bas is first on the command line, so it is the main module
- r2.bas is after, so it is some other module
- We are passing -gen gcc and -R to keep the *.c files. It might be easier to see what's going on in C rather than assembly.

$ fbc r1.bas r2.bas -gen gcc -R

Having a look at the resulting intermediate r1.c and r2.c files...

in r1.c (r1.bas main module):
- we have: int32 main( int32 __FB_ARGC__$0, char** __FB_ARGV__$0 )
- which is the implicit main function from the module level user code (I don't think I can totally avoid the use of "main" function)
- in an executable, there can be only one "main" symbol, which links to the startup framework
- the main function is how the startup framework knows where to start the main module user code

in r2.c (r2.bas other module):
- we have: __attribute__(( constructor )) static void fb_ctor__r2( void )
- which is an implicit module constructor from the module level user code
- there can be any number of these module constructors
- the addresses of these constructors are stored in a special place by the linker
- when the startup framework runs, it will execute all of the module constructors before calling the "main" function

When we run r1[.exe] we get this output:
$ ./r1
r2.bas:__FB_MODLEVELPROC__
r1.bas:__FB_MAINPROC__

What you should be able to see from this is that:
- implicit main is determined by which module is the main module
- implicit module constructor is determined by *not* being the main module
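
The same ordering can be reproduced in plain C with GCC or Clang, which may help when reading the generated r1.c and r2.c. The sketch below is a single-file illustration, not fbc output: other_module_ctor stands in for the implicit constructor of r2.bas and main_module_code for the module-level code of r1.bas. Both names, and the log format, are made up for this example, and it assumes a compiler that supports __attribute__((constructor)).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Records the order in which the pieces of start-up code ran. */
static char startup_log[64];

/* Appends one step to the log, comma-separated, never overflowing it. */
static void log_step(const char *step)
{
    assert(step != NULL);
    if (startup_log[0] != '\0')
        strncat(startup_log, ",", sizeof startup_log - strlen(startup_log) - 1);
    strncat(startup_log, step, sizeof startup_log - strlen(startup_log) - 1);
}

/* Stand-in for the module-level code of a non-main module (r2.bas).
   The startup code calls it before main() is entered. */
__attribute__((constructor)) static void other_module_ctor(void)
{
    log_step("r2:ctor");
}

/* Stand-in for the module-level code of the main module (r1.bas).
   Call it as the first statement of main(); it prints and returns the log. */
static const char *main_module_code(void)
{
    log_step("r1:main");
    puts(startup_log);
    return startup_log;
}
```

Called as the first statement of main, main_module_code() is expected to print r2:ctor,r1:main, mirroring the r2-before-r1 output shown above.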

2) Compile in separate steps and run:

Compile r1.bas and r2.bas separately to object modules:

$ fbc -c -m r1 r1.bas
$ fbc -c r2.bas

The -c option tells fbc to compile to an object module then stop. And when using -c option, there is no main module by default, so '-m r1' is added to indicate that the main module is r1.bas.

Link the two object modules to an executable and run r1.exe:
$ fbc r1.o r2.o
$ ./r1
r2.bas:__FB_MODLEVELPROC__
r1.bas:__FB_MAINPROC__

A subtle point here is that if the object modules are reversed,
$ fbc r2.o r1.o
- r1.o is still the main module
- r2.o is still the other module
- the output filename is r2[.exe] by default only because it was the first filename specified, but if we were to run the new r2[.exe], it's the same program because the contents (code) of r1.o and r2.o don't change when the order is changed:
$ ./r2
r2.bas:__FB_MODLEVELPROC__
r1.bas:__FB_MAINPROC__

3) Compile other module in to a static library and run

Compile r2.bas to a static library
$ fbc -lib r2.bas

Compile r1.bas to an executable and allow libr2.a to be linked.
$ fbc r1.bas -l r2
$ ./r1
r1.bas:__FB_MAINPROC__

Even though r2.o is in the libr2.a library, there's nothing in r1.bas that needs anything from r2.o. Only if there was some declaration or function in r2.o that was needed by r1.o would it get pulled in automatically.
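
Here is a rough model of that archive rule, for readers who want to see the logic spelled out: a linker extracts a member from a static library only when it defines a symbol that is still undefined. The C sketch below is a deliberate simplification (one symbol per object file), not how ld is implemented, and the symbol name r2_helper is hypothetical since the r2.bas in this thread exports nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One object file inside a static library, reduced to one exported symbol. */
typedef struct {
    const char *object_name;   /* e.g. "r2.o" */
    const char *defines;       /* symbol the object provides (hypothetical) */
} ArchiveMember;

/* Sample data: libr2.a holds r2.o, which would provide "r2_helper". */
static const ArchiveMember r2_member = { "r2.o", "r2_helper" };

/* Symbols left undefined by a variant of r1.o that calls r2_helper(). */
static const char *const r1_calling_helper[] = { "r2_helper" };

/* Returns 1 if the member is extracted from the archive, i.e. it defines
   one of the `count` symbols that are still undefined; 0 otherwise. */
static int is_pulled_in(const ArchiveMember *member,
                        const char *const undefined[], size_t count)
{
    assert(member != NULL && member->defines != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(undefined[i], member->defines) == 0)
            return 1;
    }
    return 0;
}
```

With no undefined symbols, as in the r1.bas of this thread, is_pulled_in(&r2_member, NULL, 0) is 0, so r2.o stays out of the executable and its module-level code never runs.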

Re: Executables and Compiling

Post by coderJeff »

marcov wrote:Btw I miss module destructors and maybe also class constructors/destructors in the list. Maybe FB doesn't have them? (Class constructors are called for the class type as a kind of ctor if the class is actually used (not smartlinked out). They are a way of decentralizing ctor/dtor support in a per-class/OOP manner, avoiding linking in the class just because some initialization for it was done in the module ctor.)
That's all there too.... like 95% confidence. I may have to verify some linking subtleties .... this may take a while to unpack.
But QB iirc had the weird convention to have the main program first, with subs following ?
I think they had the convention of ignoring module level code in all other modules except for error handling (a label). Otherwise, yes I think main module was specified first in the .mak file to indicate main module.
Julcar
Posts: 141
Joined: Oct 19, 2010 18:52
Contact:

Re: Executables and Compiling

Post by Julcar »

Let's say I have a header file where I declare an array

modules.bi

Code: Select all

DIM MyArray(3) AS STRING
then in module2.bas I initialize the array

Code: Select all

#include "modules.bi"
MyArray(0) = "option1" : MyArray(1) = "option2" : MyArray(2) = "option3"
and finally I call it from main module1.bas

Code: Select all

#include "modules.bi"
For i as ubyte = 0 to ubound(MyArray)
  print MyArray(i)
next i
Is the array really initialized before it is used in the main module?

Re: Executables and Compiling

Post by SARG »

Module 1 is the main module, so put it first in the list; however, module 2 will be executed first.

Code: Select all

fbc module1.bas module2.bas
Then you need to define the array as common in both modules and redim it in one of them (as it's an array). See the manual for the Common statement.
And don't forget that with dim/redim MyArray(3) there are 4 elements (indices 0 to 3).


Module 1

Code: Select all

common MyArray() AS string
print "inside module 1"
For i as ubyte =lbound(MyArray) to ubound(MyArray)
  print i;"   ";MyArray(i)
next i
sleep
Module 2

Code: Select all

common MyArray() AS string
redim myarray(3)
MyArray(0) = "option1" : MyArray(1) = "option2" : MyArray(2) = "option3"
print "inside module 2"

Re: Executables and Compiling

Post by fxm »

SARG wrote:Module 1 is the main module put it first in the list, however module 2 will be the first executed.
Because the module-level code in module 2 is put in an implicit module constructor which is consequently executed before the module-level code in module 1.