The war on DJGPP bloat again | Hello world sizes | how to save
Bloat test, compiling with FreeBASIC 0.18.5, COFF (unless stub specified) sizes in KiB, always aligned to 1/2 KiB :
76.5 - Empty or a few ASM
85.5 - Yeah ... ' ? "Hello world" ' - PRINT costs 9 KiB
87.5 - As above, with "standard" "ERROR: no DPMI" stub
107.5 - As above, with CWSDPMI stub, probably the FB Hello world size :-\
120.5 - As above, only theoretical, HDPMI stub
87 - Added just 2 UBYTE's and one STRING, optionally also ALLOC + COMMAND$(1) (no or 1/2 KiB diff)
112.5 - Added CRT.BI + FOPEN/FREAD/FCLOSE - 26 KiB cost !!!
106.5 - Disabled globbing, :WOW: - 6 KiB saved !!!
At least, now I have my own file I/O code with cost considerably lower than 26 KiB and considerably better support for > 4 GiB files ;-)
Anyway, considering that disabling globbing saves 6 KiB and even prevents bugs, are there maybe other DJGPP "features" that can be disabled ? The exception handling maybe ?
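To make the arithmetic behind the size list explicit, here is a minimal C sketch. It keeps every size in half-KiB units (76.5 KiB becomes 153), so it needs no floats. The helper names and the choice of which entry is the baseline for each delta are assumptions for illustration, not something the post states.

```c
#include <stdio.h>

/* Sizes are stored in half-KiB units because the list is aligned to 1/2 KiB:
   76.5 KiB -> 153, 85.5 KiB -> 171, 87 KiB -> 174, and so on.
   Hypothetical helper: returns the size change in half-KiB units. */
int cost_half_kib(int before_half_kib, int after_half_kib)
{
    return after_half_kib - before_half_kib;
}

/* Print a delta given in half-KiB units as a signed decimal KiB value. */
void print_cost(const char *label, int half_kib)
{
    const char *sign = half_kib < 0 ? "-" : "+";
    int magnitude = half_kib < 0 ? -half_kib : half_kib;
    printf("%-20s %s%d.%d KiB\n", label, sign, magnitude / 2, (magnitude % 2) * 5);
}

/* Recompute the deltas from the list (the baselines are assumed). */
void report_costs(void)
{
    print_cost("PRINT", cost_half_kib(153, 171));             /* 76.5 -> 85.5   */
    print_cost("CRT.BI file I/O", cost_half_kib(174, 225));   /* 87   -> 112.5  */
    print_cost("globbing disabled", cost_half_kib(225, 213)); /* 112.5 -> 106.5 */
}
```

With the 87 KiB entry as baseline, the file I/O delta works out to 25.5 KiB, which the list rounds to 26 KiB.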
From: http://www.delorie.com/djgpp/v2faq/faq8_14.html
( Why are DJGPP .exe files so large? )
The benefits of all that "bloat" can't be realized with small useless "Hello world" programs. However, here's an example using the tips in that FAQ.
To go smaller means serious messing around in the DJGPP crt startup, FB startup code, and/or writing your own EXE loader and run-time support.
Code:
'' fbc-dos-0.18.5 should compile
'' this to 63,488 bytes

#include "dos/conio.bi"

''
'' don't care about environment variables
''
sub __crt0_load_environment_file cdecl _
    alias "__crt0_load_environment_file" _
    ( _
        byval progname as zstring ptr _
    )
end sub

''
'' don't care about command line arguments
''
sub __crt0_setup_arguments cdecl _
    alias "__crt0_setup_arguments" _
    ( _
    )
end sub

'' ----------------
'' main
'' ----------------

'' write directly to console
cputs( !"Hello world\n" )
coderJeff wrote: ( Why are DJGPP .exe files so large? ) The benefits of all that "bloat" can't be realized with small useless "Hello world" programs. However, here's an example using the tips in that FAQ. (snip) To go smaller, means serious messing around in the DJGPP crt startup, FB startup code, and/or writing your own EXE loader and run-time support.
Well, first of all, FreeBASIC/DOS's LIBC.A is 20k larger than DJGPP 2.04's version, so it must've been recompiled and/or patched. (Temporarily swap in 2.03p2's LIBC.A if you want to save 17k in your "Hello, world!" app, heh.)
Secondly, you may as well use UPX if you really want smaller binaries! ("--ultra-brute" basically does "--best --lzma --all-filters" for DJGPP stuff, but doing the latter is a lot faster)
Thirdly, 386+ code is always larger than 16-bit code. (The largest 8086 instruction takes only six bytes.)
Fourthly, unlike GCC, you don't save any extra alignment space by targeting a plain 386 (instead of 486), at least not with a simple "Hello, world!" program.
Fifthly, don't forget that DJGPP does a lot of stuff, including handling LFNs transparently, FPU emulation (for ye olde 486s), and its own variant of full symlink support (as of 2.04). All of that takes space.
Sixthly, did you check the generated .ASM file? (Only 683 bytes in my "Hello, world!" test.) Obviously all of the overhead is caused by the startup and libraries, not the actual code generated itself. If PRINT takes 9k, don't use it if you don't need it. Try inline asm (yeah, maybe a pain for complex stuff, but somebody can help, I'm sure.)
Hope this helps! ;-)
EDIT: Yes, apparently, FreeBASIC's libc.a was built on March 15, 2007 by GCC 3.4.4. I'm betting you could shrink it if you recompiled it with "-Os -fomit-frame-pointer -march=i386", but don't quote me on that! ;-)
EDIT #2: Globbing is very useful for char ranges "*.[a-c]*" and recursion (".../*.txt"), but obviously this simplistic example doesn't need it, so yeah, turn it off.
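For reference, the DJGPP FAQ section linked earlier in the thread turns globbing off by supplying your own `__crt0_glob_function` that returns a null pointer. Below is a minimal C sketch of that hook. It assumes the DJGPP 2.x startup code and the prototype from DJGPP's `<crt0.h>`, and it has not been checked against the libc that ships with FreeBASIC 0.18.5.

```c
#include <stddef.h>

/* DJGPP's startup code calls this hook to expand wildcards in each
   command-line argument. Returning NULL leaves the argument unexpanded,
   so the real globbing code does not need to be linked in.
   Assumption: prototype as declared in DJGPP's <crt0.h>. */
char **__crt0_glob_function(char *argument)
{
    (void)argument; /* unused on purpose */
    return NULL;
}
```

In FreeBASIC the same idea would presumably follow the pattern of the empty `__crt0_load_environment_file` stub shown earlier: a cdecl function with the matching alias that simply returns 0.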
I just have a quick question?
Are you running this on a 386 with 2M RAM and a 40M harddrive?
100K for an executable is not unreasonable for DOS, at least not when you take some considerations into account:
Mostly that it's providing you with a complete 32-bit layer to access the common hardware (hdd, fdd, etc) which will be a good chunk of that 100k. All those micro-drivers that it provides or at very least, the pmode->rmode->pmode->memcpy gateway that it must do is not trivial or small.
If you are looking to be dumped into raw 32-bit pmode, that can be done in a couple hundred bytes but you won't be able to access anything except the RAM and the CPU. You'd need to write your own interface to the hardware. The BIOS is inaccessible because the BIOS is only 16-bit, with the notable exception of the SVGA VESA BIOS (version 3.0 and later) which has a pmode access point, but you must be in rmode to get that access point first.
The DOS Extender is doing exactly that: extending. It's not just a simple pmode switcher, it's providing common APIs to common or standard hardware, again, mostly the floppy and hard drives, but it also contains a memory manager, which may include a virtual memory manager depending on the host, and other DOS ties to standard hardware (CDROM, NE2000, etc). There is a reason why Windows, Linux, etc, aren't just a couple of meg.
1000101 wrote: I just have a quick question? Are you running this on a 386 with 2M RAM and a 40M harddrive?
On a good day. On a bad day, one of the 1MB modules probably fails, and the upper 20M of the HD has bad sectors ( :-) )
1000101 wrote: 100K for an executable is not unreasonable for DOS, at least not when you take some considerations into account:
Bloat! (Not that it makes sense, but it is so fun to yell it randomly.)
1000101 wrote: Mostly that it's providing you with a complete 32-bit layer to access the common hardware (hdd, fdd, etc) which will be a good chunk of that 100k. All those micro-drivers that it provides or at very least, the pmode->rmode->pmode->memcpy gateway that it must do is not trivial or small.
Isn't most of that stuff in the separate go32.exe binary?
1000101 wrote: ... mostly the floppy and hard drives but also contains a memory manager which may include a virtual memory manager depending on the host and other DOS ties to standard hardware (CDROM, NE2000, etc). There is a reason why Windows, Linux, etc, aren't just a couple of meg.
P.s. you might want to make a faq about this. You're welcome to copy from http://wiki.freepascal.org/Size_Matters and then especially the embedded paragraph:
http://wiki.freepascal.org/Size_Matters#Embedded
In short: people really strapped for size don't use prebuilt RTLs, but customize their own runtime anyway.
1000101 wrote: Mostly that it's providing you with a complete 32-bit layer to access the common hardware (hdd, fdd, etc) which will be a good chunk of that 100k. All those micro-drivers that it provides or at very least, the pmode->rmode->pmode->memcpy gateway that it must do is not trivial or small.
marcov wrote: Isn't most of that stuff in the separate go32.exe binary?
No, the only thing a (go32v2) DJGPP program (or FB DOS program) needs is a DPMI extender; everything needed from the runtime library is linked into the executable.
coderJeff wrote: From: http://www.delorie.com/djgpp/v2faq/faq8_14.html ( Why are DJGPP .exe files so large? )
OK ... ages old, broken link to UPX :-D
coderJeff wrote: The benefits of all that "bloat" can't be realized with small useless "Hello world" programs.
The "linux emulation at source level" is probably very useful when porting sophisticated stuff from Linux (MPLAYER, *L*NKS, WGET, ...) , OTOH obviously useless for new development from scratch.
coderJeff wrote: however, here's an example using the tips in that FAQ.
Thanks. :-) Can I disable the exception handling also ? It's highly redundant. And the FPU emulation. Will FB executables still run then, if I avoid floats of course ? And the "symlink"'s ...
coderJeff wrote: and/or writing your own EXE loader
2 KiB ... not much space for saving.
Rugxulo wrote: may as well use UPX if you really want smaller binaries --ultra-brutal
We know ... UPX 4.xx will support PAQ ... you're making Japheth happy :-D
But I want to delete useless stuff, not compress it.
Rugxulo wrote: Thirdly, 386+ code is always larger than 16-bit code.
I know, but that's not the main problem.
Rugxulo wrote: lot of stuff, including handling LFNs , FPU , symlink
OK ... not badly needed for me. At least, I got rid of the 26 KiB of file I/O now :-D
Rugxulo wrote: did you check the generated .ASM file?
Yes. Full of PTR's ...
Rugxulo wrote: Obviously all of the overhead is caused by the startup and libraries, not the actual code generated itself.
Indeed.
Rugxulo wrote: Try inline asm
Already done, see my "76.5 - Empty or a few ASM" item ;-)
Rugxulo wrote: maybe a pain for complex stuff, but somebody can help, I'm sure.
No need ;-)
1000101 wrote: ... (much)
OK ... known stuff.
1000101 wrote: that can be done in a couple hundred bytes
I have my FASM examples.
marcov wrote: you might want to make a faq about this.
Already exists (but needs some more info).
marcov wrote: You're welcome to copy from http://wiki.freepascal.org/Size_Matters and then specially the
I know this text. It's famous beyond FP community. :-D But, as you can guess, I definitely disagree.
marcov wrote: Isn't most of that stuff in the separate go32.exe binary?
Used to be 15 years ago with "GO32V1", no CWSDPMI yet ?
DrV wrote: everything needed from the runtime library is linked into the executable.
Stuff see above: Yes
INT $31 / $0300 stuff: Yes.
DPMI host: No.
coderJeff wrote: The benefits of all that "bloat" can't be realized with small useless "Hello world" programs.
DOS386 wrote: The "linux emulation at source level" is probably very useful when porting sophisticated stuff from Linux (MPLAYER, *L*NKS, WGET, ...) , OTOH obviously useless for new development from scratch.
It's not really "linux" emulation; it is also just plain C standard library stuff (which the FB runtime relies heavily on for portability). If you want to get a significantly smaller executable, you'd need to rewrite the FB rtlib to avoid using libc, at which point you've thrown away all portability of the rtlib code, greatly increasing effort required to port to a new platform (or at least making the DOS port itself harder to maintain, as it wouldn't get any new improvements or fixes from updating the (very large) shared portion of the rtlib code that currently uses the standard C library).
Also, the FPU emulation code is not linked in the main executable; it is in EMU387.DXE, which is loaded dynamically if needed. I do not have (and have never had in the last 10 or more years) an x86 machine without an FPU, so I have no use for this or way of testing this works, but I assume it did at one point (this is a DJGPP feature, not related to FB in specific). Of course, there's still a small overhead of the DXE loading code in every executable, but hopefully this is much smaller than if the 387 emulation code itself was linked in.
Thanks.
DrV wrote: not really "linux" emulation; it is also just plain C standard library stuff
OK, nevertheless some people refer to "Linux emulation" : the "globbing" fakes behavior of Linux, "SIGILL" seems to originate from Linux rather than from Intel, ...
DrV wrote: If you want to get a significantly smaller executable, you'd need to rewrite the FB rtlib to avoid using libc, at which point you've thrown away all portability of the rtlib code, greatly increasing effort required to port to a new platform (or at least making the DOS port itself harder to maintain, as it wouldn't get any new improvements or fixes from updating the (very large) shared portion of the rtlib code that currently uses the standard C library).
OK ... known facts ... the "libc" of DJGPP is bloated and inefficient but there is no trivial way to fix it :-(
DrV wrote: FPU emulation code is not linked in the main executable; it is in EMU387.DXE, which is loaded dynamically if needed. I do not have (and have never had in the last 10 or more years) an x86 machine without an FPU, so I have no use for this or way of testing this works, but I assume it did at one point (this is a DJGPP feature, not related to FB in specific). Of course, there's still a small overhead of the DXE loading code in every executable, but hopefully this is much smaller than if the 387 emulation code itself was linked in.
Exactly as I also assumed ... still, a small piece of unnecessary code. Would FB work without any FPU and without EMU387 if I avoid floats ?
What about the exception code ? Can it be easily removed / barred out from linking ?
Tested CoderJeff's code, indeed works, 60 KiB COFF :-) Nevertheless, when I delete the "conio" stuff also, it doesn't shrink even more, it grows by 5 KiB !!! Why this ?
DOS386 wrote: Would FB work without any FPU and without EMU387 if I avoid floats ?
Not without modifications; at the very least, the rtlib initialization sets the FPU rounding mode and precision. There are other places that use floating-point parameters which might not be obvious, like graphics functions (PUT, for example), so you would have to be careful to avoid these, but otherwise it should "just work" even with no FPU if you remove the FPU setup stuff.
Thanks ...
DrV wrote: Not without modifications; at the very least, the rtlib initialization sets the FPU rounding mode and precision.
Sad ... is this problem new to FB or is it present in DJGPP and FreePASCAL also ?
DrV wrote: are other places that use floating-point parameters which might not be obvious, like graphics functions (PUT, for example)
Parameters ? I see integers only ... Or use floats internally only ?
As a good example, one can point to the DCT (+IDCT) algo: theory is floated, but has integer implementation ... just a question of will ;-)
Any idea about the conio removal problem from post above ?
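The "float theory, integer implementation" remark can be illustrated with the basic building block such integer code usually rests on: a fixed-point multiply. The sketch below is not a DCT. It only scales a sample by cos(pi/4) in 16.16 fixed point without touching the FPU. The type and helper names are hypothetical, and the constant 46341 is 0.70710678 * 65536 rounded to the nearest integer.

```c
#include <stdint.h>

typedef int32_t fixed16_16;          /* 16 integer bits, 16 fraction bits */

#define FIXED_ONE        65536       /* 1.0 in 16.16 */
#define FIXED_COS_PI_4   46341       /* cos(pi/4) ~ 0.70710678, scaled by 65536 */

/* Convert a plain integer to 16.16 fixed point. */
fixed16_16 int_to_fixed(int value)
{
    return (fixed16_16)(value * FIXED_ONE);
}

/* Multiply two 16.16 values; the 64-bit intermediate avoids overflow. */
fixed16_16 fixed_mul(fixed16_16 a, fixed16_16 b)
{
    return (fixed16_16)(((int64_t)a * (int64_t)b) / FIXED_ONE);
}

/* Convert back to an integer, rounding half away from zero. */
int fixed_to_int_round(fixed16_16 value)
{
    if (value >= 0)
        return (int)((value + FIXED_ONE / 2) / FIXED_ONE);
    return (int)((value - FIXED_ONE / 2) / FIXED_ONE);
}
```

For example, `fixed_to_int_round(fixed_mul(int_to_fixed(100), FIXED_COS_PI_4))` gives 71, the rounded value of 100 * 0.7071, using integer instructions only.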
DrV wrote: Not without modifications; at the very least, the rtlib initialization sets the FPU rounding mode and precision.
DOS386 wrote: Sad ... is this problem new to FB or is it present in DJGPP and FreePASCAL also ?
The initialization is in the FB rtlib, not the DJGPP-provided stuff, so it's nothing to do with DJGPP or FreePASCAL. (Perhaps the DJGPP startup code does its own FP init stuff, but you'd have to check the source.)
DOS386 wrote: Parameters ? I see integers only ... Or use floats internally only ?
http://freebasic.net/wiki/KeyPgPutGraphics doesn't specify the types, but you can see them in the gfxlib2 sources ( http://fbc.svn.sourceforge.net/viewvc/f ... iew=markup ):
FBCALL int fb_GfxPut(void *target, float fx, float fy, unsigned char *src, int x1, int y1, int x2, int y2, int coord_type, int put_mode, PUTTER *putter, int alpha, BLENDER *blender, void *param)
No idea about the conio stuff.
DrV wrote: The initialization is in the FB rtlib, not the DJGPP-provided stuff, so it's nothing to do with DJGPP or FreePASCAL. (Perhaps the DJGPP startup code does its own FP init stuff, but you'd have to check the source.)
If there was a good way to disable it ...
DrV wrote: FBCALL int fb_GfxPut(void *target, float fx, float fy, unsigned char *src, int x1, int y1, int x2, int y2, int coord_type, int put_mode, PUTTER *putter, int alpha, BLENDER *blender, void *param)
What's the point of passing the coordinates in floats ?
DrV wrote: No idea about the conio stuff.
If anyone else has, please answer ;-)
DrV wrote: FBCALL int fb_GfxPut(void *target, float fx, float fy, unsigned char *src, int x1, int y1, int x2, int y2, int coord_type, int put_mode, PUTTER *putter, int alpha, BLENDER *blender, void *param)
DOS386 wrote: What's the point of passing the coordinates in floats?
Normally the screen coordinates are measured in integers, but if you use the WINDOW command, then you can recalibrate the screen mapping system to anything, e.g. (-1,-1)-(1,1). When you do this, floats are obviously needed to access non-integer coordinates.
IIRC, there's quite a nice example on the WINDOW wiki page. Looks like I haven't got around to adding a screenshot though...
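To show why a remapped coordinate system needs non-integer inputs, here is a minimal C sketch of a window-to-pixel mapping along the x axis. The function name and the rounding rule are assumptions for illustration. gfxlib2's actual mapping code may differ, and the y axis (which WINDOW can flip) is left out.

```c
/* Map a user-space x coordinate, as set up by something like
   WINDOW (x1, y1)-(x2, y2), to a pixel column in 0 .. width-1.
   Hypothetical helper; only meant for coordinates inside the window. */
int window_to_pixel_x(float wx, float win_x1, float win_x2, int width)
{
    /* 0.0 at the left window edge, 1.0 at the right window edge */
    float t = (wx - win_x1) / (win_x2 - win_x1);

    /* scale to the pixel range and round to the nearest column */
    return (int)(t * (float)(width - 1) + 0.5f);
}
```

With a window of (-1,-1)-(1,1) on a 640-pixel-wide screen, the only integer x coordinates available are -1, 0 and 1. Every other column can only be addressed with a fractional value such as 0.5.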
counting_pine wrote: coordinates are measured in integers, but if you use the WINDOW command, then you can recalibrate the screen mapping system to anything, e.g. (-1,-1)-(1,1). ... the WINDOW wiki page.
Yeah ... RTFM ... thanks :-)
Quote: Differences from QB: * None
QB did so FB must also ... but with FB GFX as-is it doesn't break too much anyway since it won't work on 80486 or 80386 with FPU either because of lack of performance :-\