The war on DJGPP bloat again | Hello world sizes | how to save

DOS386
Posts: 798
Joined: Jul 02, 2005 20:55

Post by DOS386 »

Bloat test, compiling with FreeBASIC 0.18.5, COFF (unless stub specified) sizes in KiB, always aligned to 1/2 KiB :

76.5 - Empty or a few ASM

85.5 - Yeah ... ' ? "Hello world" ' - PRINT costs 9 KiB
87.5 - As above, with "standard" "ERROR: no DPMI" stub
107.5 - As above, with CWSDPMI stub, probably the FB Hello world size :-\
120.5 - As above, only theoretical, HDPMI stub

87 - Added just 2 UBYTEs and one STRING, optionally also ALLOC + COMMAND$(1) (no or 1/2 KiB difference)

112.5 - Added CRT.BI + FOPEN/FREAD/FCLOSE - 26 KiB cost !!!

106.5 - Disabled globbing, :WOW: - 6 KiB saved !!!

At least, now I have my own file I/O code with cost considerably lower than 26 KiB and considerably better support for > 4 GiB files ;-)

Anyway, considering that disabling globbing saves 6 KiB and even prevents bugs, are there maybe other DJGPP "features" that can be disabled? The exception handling maybe?
coderJeff
Site Admin
Posts: 4386
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada

Post by coderJeff »

From: http://www.delorie.com/djgpp/v2faq/faq8_14.html
( Why are DJGPP .exe files so large? )

The benefits of all that "bloat" can't be realized with small useless "Hello world" programs. However, here's an example using the tips in that FAQ.

Code:

'' fbc-dos-0.18.5 should compile
'' this to 63,488 bytes

#include "dos/conio.bi"

''
'' don't care about environment variables
''
sub __crt0_load_environment_file cdecl _
  alias "__crt0_load_environment_file" _
  ( _
    byval progname as zstring ptr _
  )
end sub

''
'' don't care about command line arguments
''
sub __crt0_setup_arguments cdecl _
  alias "__crt0_setup_arguments" _
  ( _
  )
end sub

'' ----------------
'' main
'' ----------------

'' write directly to console
cputs( !"Hello world\n" )

Going smaller means seriously messing around with the DJGPP crt startup code and the FB startup code, and/or writing your own EXE loader and run-time support.
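
For comparison, the same hooks can be written in C. This is a minimal sketch, assuming a DJGPP v2 toolchain whose startup code looks for these symbols. The third hook is the usual way to switch off wildcard expansion, and is presumably what "disabling the globe" earlier in the thread refers to. On any other compiler these are just ordinary, unused functions.

```c
/* Sketch only: empty replacements for optional DJGPP v2 startup hooks.
 * Under DJGPP, defining them in your own program keeps the library
 * versions (and whatever they pull in) out of the link. */

/* Skip loading the DJGPP environment file: the program does not
 * care about those settings. */
void __crt0_load_environment_file(char *progname)
{
    (void)progname;
}

/* Skip building argv: the program ignores its command line. */
void __crt0_setup_arguments(void)
{
}

/* Skip wildcard expansion: returning a null pointer tells the startup
 * code to pass the argument through unexpanded. */
char **__crt0_glob_function(char *arg)
{
    (void)arg;
    return 0;
}
```

If `__crt0_setup_arguments` is already stubbed out, the glob hook should never be reached, so the third stub only matters for programs that still want their arguments.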
rugxulo
Posts: 221
Joined: Jun 30, 2006 5:31
Location: Usono (aka, USA)

Post by rugxulo »

coderJeff wrote:
> ( Why are DJGPP .exe files so large? )
>
> The benefits of all that "bloat" can't be realized with small useless "Hello world" programs. However, here's an example using the tips in that FAQ.
>
> (snip)
>
> Going smaller means seriously messing around with the DJGPP crt startup code and the FB startup code, and/or writing your own EXE loader and run-time support.

Well, first of all, FreeBASIC/DOS's LIBC.A is 20k larger than DJGPP 2.04's version, so it must've been recompiled and/or patched. (Temporarily swap in 2.03p2's LIBC.A if you want to save 17k in your "Hello, world!" app, heh.)

Secondly, you may as well use UPX if you really want smaller binaries! ("--ultra-brute" basically does "--best --lzma --all-filters" for DJGPP stuff, but doing the latter is a lot faster)

Thirdly, 386+ code is always larger than 16-bit code. (The largest 8086 instruction takes only six bytes.)

Fourthly, unlike GCC, you don't save any extra alignment space by targeting a plain 386 (instead of 486), at least not with a simple "Hello, world!" program.

Fifthly, don't forget that DJGPP does a lot of stuff, including handling LFNs transparently, FPU emulation (for ye olde 486s), and its own variant of full symlink support (as of 2.04). All of that takes space.

Sixthly, did you check the generated .ASM file? (Only 683 bytes in my "Hello, world!" test.) Obviously all of the overhead is caused by the startup and libraries, not the actual code generated itself. If PRINT takes 9k, don't use it if you don't need it. Try inline asm (yeah, maybe a pain for complex stuff, but somebody can help, I'm sure.)

Hope this helps! ;-)

EDIT: Yes, apparently, FreeBASIC's libc.a was built on March 15, 2007 by GCC 3.4.4. I'm betting you could shrink it if you recompiled it with "-Os -fomit-frame-pointer -march=i386", but don't quote me on that! ;-)

EDIT #2: Globbing is very useful for char ranges "*.[a-c]*" and recursion (".../*.txt"), but obviously this simplistic example doesn't need it, so yeah, turn it off.
1000101
Posts: 2556
Joined: Jun 13, 2005 23:14
Location: SK, Canada

Post by 1000101 »

I just have a quick question:

Are you running this on a 386 with 2M RAM and a 40M harddrive?

100K for an executable is not unreasonable for DOS, at least not when you take some considerations into account:

Mostly that it's providing you with a complete 32-bit layer to access the common hardware (hdd, fdd, etc.), which will be a good chunk of that 100K. All those micro-drivers that it provides, or at the very least the pmode->rmode->pmode->memcpy gateway that it must do, are not trivial or small.

If you are looking to be dumped into raw 32-bit pmode, that can be done in a couple hundred bytes, but you won't be able to access anything except the RAM and the CPU. You'd need to write your own interface to the hardware. The BIOS is inaccessible because the BIOS is only 16-bit, with the notable exception of the SVGA VESA BIOS (version 3.0 and later), which has a pmode access point, but you must be in rmode to get that access point first.

The DOS extender is doing exactly that: extending. It's not just a simple pmode switcher; it's providing common APIs to common or standard hardware, again mostly the floppy and hard drives, but it also contains a memory manager, which may include a virtual memory manager depending on the host, and other DOS ties to standard hardware (CDROM, NE2000, etc.). There is a reason why Windows, Linux, etc. aren't just a couple of meg.
marcov
Posts: 3503
Joined: Jun 16, 2005 9:45
Location: Netherlands

Post by marcov »

1000101 wrote:
> I just have a quick question:
>
> Are you running this on a 386 with 2M RAM and a 40M harddrive?

On a good day. On a bad day, one of the 1MB modules probably fails, and the upper 20M of the HD has bad sectors ( :-) )

> 100K for an executable is not unreasonable for DOS, at least not when you take some considerations into account:

Bloat! (Not that it makes sense, but it is so fun to yell it randomly.)

> Mostly that it's providing you with a complete 32-bit layer to access the common hardware (hdd, fdd, etc.), which will be a good chunk of that 100K. All those micro-drivers that it provides, or at the very least the pmode->rmode->pmode->memcpy gateway that it must do, are not trivial or small.

Isn't most of that stuff in the separate go32.exe binary?

> [...] mostly the floppy and hard drives, but it also contains a memory manager, which may include a virtual memory manager depending on the host, and other DOS ties to standard hardware (CDROM, NE2000, etc.). There is a reason why Windows, Linux, etc. aren't just a couple of meg.

P.S. You might want to make a FAQ about this. You're welcome to copy from
http://wiki.freepascal.org/Size_Matters, especially the "Embedded" paragraph:

http://wiki.freepascal.org/Size_Matters#Embedded

In short: people really strapped for size don't use prebuilt RTLs, but customize their own runtime anyway.
DrV
Site Admin
Posts: 2116
Joined: May 27, 2005 18:39
Location: Midwestern USA

Post by DrV »

marcov wrote:
> > Mostly that it's providing you with a complete 32-bit layer to access the common hardware (hdd, fdd, etc.), which will be a good chunk of that 100K. All those micro-drivers that it provides, or at the very least the pmode->rmode->pmode->memcpy gateway that it must do, are not trivial or small.
>
> Isn't most of that stuff in the separate go32.exe binary?

No, the only thing a (go32v2) DJGPP program (or FB DOS program) needs is a DPMI extender; everything needed from the runtime library is linked into the executable.
DOS386
Posts: 798
Joined: Jul 02, 2005 20:55

Post by DOS386 »

coderJeff wrote:
> From: http://www.delorie.com/djgpp/v2faq/faq8_14.html ( Why are DJGPP .exe files so large? )

OK ... ages old, broken link to UPX :-D

> The benefits of all that "bloat" can't be realized with small useless "Hello world" programs.

The "Linux emulation at source level" is probably very useful when porting sophisticated stuff from Linux (MPLAYER, *L*NKS, WGET, ...), OTOH obviously useless for new development from scratch.

> However, here's an example using the tips in that FAQ.

Thanks. :-) Can I disable the exception handling also? It's highly redundant. And the FPU emulation. Will FB executables still run then, if I avoid floats of course? And the "symlinks" ...

> and/or writing your own EXE loader

2 KiB ... not much space for saving.

rugxulo wrote:
> may as well use UPX if you really want smaller binaries ("--ultra-brute")

We know ... UPX 4.xx will support PAQ ... you're making Japheth happy :-D

But I want to delete useless stuff, not compress it.

> Thirdly, 386+ code is always larger than 16-bit code.

I know, but that's not the main problem.

> lot of stuff, including handling LFNs, FPU, symlink

OK ... not badly needed for me. At least, I got rid of the 26 KiB of file I/O now :-D

> did you check the generated .ASM file?

Yes. Full of PTRs ...

> Obviously all of the overhead is caused by the startup and libraries, not the actual code generated itself.

Indeed.

> Try inline asm

Already done, see my "76.5 - Empty or a few ASM" item ;-)

> maybe a pain for complex stuff, but somebody can help, I'm sure.

No need ;-)

1000101 wrote:
> ... (much)

OK ... known stuff.

> that can be done in a couple hundred bytes

I have my FASM examples.

marcov wrote:
> you might want to make a faq about this.

Already exists (but needs some more info).

> You're welcome to copy from http://wiki.freepascal.org/Size_Matters and then specially the

I know this text. It's famous beyond the FP community. :-D But, as you can guess, I definitely disagree.

> Isn't most of that stuff in the separate go32.exe binary?

Used to be, 15 years ago with "GO32V1", no CWSDPMI yet?

DrV wrote:
> everything needed from the runtime library is linked into the executable.

Stuff see above: Yes.
INT $31 / $0300 stuff: Yes.
DPMI host: No.
DrV
Site Admin
Posts: 2116
Joined: May 27, 2005 18:39
Location: Midwestern USA

Post by DrV »

DOS386 wrote:
> > The benefits of all that "bloat" can't be realized with small useless "Hello world" programs.
>
> The "linux emulation at source level" is probably very useful when porting sophisticated stuff from Linux (MPLAYER, *L*NKS, WGET, ...) , OTOH obviously useless for new development from scratch.

It's not really "linux" emulation; it is also just plain C standard library stuff (which the FB runtime relies heavily on for portability). If you want to get a significantly smaller executable, you'd need to rewrite the FB rtlib to avoid using libc, at which point you've thrown away all portability of the rtlib code, greatly increasing effort required to port to a new platform (or at least making the DOS port itself harder to maintain, as it wouldn't get any new improvements or fixes from updating the (very large) shared portion of the rtlib code that currently uses the standard C library).

Also, the FPU emulation code is not linked into the main executable; it is in EMU387.DXE, which is loaded dynamically if needed. I do not have (and have never had in the last 10 or more years) an x86 machine without an FPU, so I have no use for this or way of testing that it works, but I assume it did at one point (this is a DJGPP feature, not related to FB specifically). Of course, there's still a small overhead from the DXE loading code in every executable, but hopefully this is much smaller than if the 387 emulation code itself were linked in.
DOS386
Posts: 798
Joined: Jul 02, 2005 20:55

Post by DOS386 »

Thanks.
DrV wrote:
> not really "linux" emulation; it is also just plain C standard library stuff

OK, nevertheless some people refer to "Linux emulation": the "globbing" fakes the behavior of Linux, "SIGILL" seems to originate from Linux rather than from Intel, ...

> If you want to get a significantly smaller executable, you'd need to rewrite the FB rtlib to avoid using libc, at which point you've thrown away all portability of the rtlib code, greatly increasing effort required to port to a new platform (or at least making the DOS port itself harder to maintain, as it wouldn't get any new improvements or fixes from updating the (very large) shared portion of the rtlib code that currently uses the standard C library).

OK ... known facts ... the "libc" of DJGPP is bloated and inefficient but there is no trivial way to fix it :-(

> FPU emulation code is not linked in the main executable; it is in EMU387.DXE, which is loaded dynamically if needed. I do not have (and have never had in the last 10 or more years) an x86 machine without an FPU, so I have no use for this or way of testing this works, but I assume it did at one point (this is a DJGPP feature, not related to FB in specific). Of course, there's still a small overhead of the DXE loading code in every executable, but hopefully this is much smaller than if the 387 emulation code itself was linked in.

Exactly as I also assumed ... still, a small piece of unnecessary code. Would FB work without any FPU and without EMU387 if I avoid floats?

What about the exception code? Can it be easily removed / barred from linking?

Tested coderJeff's code, it indeed works, 60 KiB COFF :-) Nevertheless, when I also delete the "conio" stuff, it doesn't shrink even more, it grows by 5 KiB !!! Why is that?
DrV
Site Admin
Posts: 2116
Joined: May 27, 2005 18:39
Location: Midwestern USA

Post by DrV »

DOS386 wrote:
> Would FB work without any FPU and without EMU387 if I avoid floats ?

Not without modifications; at the very least, the rtlib initialization sets the FPU rounding mode and precision. There are other places that use floating-point parameters which might not be obvious, like graphics functions (PUT, for example), so you would have to be careful to avoid these, but otherwise it should "just work" even with no FPU if you remove the FPU setup stuff.
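
To make "sets the FPU rounding mode and precision" a bit more concrete: on x86 both settings are bit fields in the x87 control word. The sketch below only computes control-word values. It is not the FB rtlib code, and the helper name `x87_control_word` is made up for illustration. Actually loading the value takes an FLDCW instruction, which is the step that needs an FPU (or an emulator) to be present.

```c
#include <assert.h>
#include <stdint.h>

/* x87 control word fields (Intel layout). */
enum { PC_SINGLE = 0, PC_DOUBLE = 2, PC_EXTENDED = 3 };            /* bits 8-9   */
enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNCATE = 3 };  /* bits 10-11 */

#define X87_PC_SHIFT 8
#define X87_RC_SHIFT 10
#define X87_PC_MASK  (3u << X87_PC_SHIFT)
#define X87_RC_MASK  (3u << X87_RC_SHIFT)

/* Hypothetical helper: return `cw` with new precision and rounding
 * fields, leaving the exception mask bits (0-5) untouched. */
static uint16_t x87_control_word(uint16_t cw, unsigned pc, unsigned rc)
{
    assert(pc <= 3 && rc <= 3);
    cw = (uint16_t)(cw & ~(X87_PC_MASK | X87_RC_MASK));
    cw = (uint16_t)(cw | (pc << X87_PC_SHIFT) | (rc << X87_RC_SHIFT));
    return cw;
}
```

Starting from 0x037F, the value the FPU holds after FNINIT (extended precision, round to nearest), `x87_control_word(0x037F, PC_DOUBLE, RC_NEAREST)` gives 0x027F.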
DOS386
Posts: 798
Joined: Jul 02, 2005 20:55

Post by DOS386 »

Thanks ...
DrV wrote:
> Not without modifications; at the very least, the rtlib initialization sets the FPU rounding mode and precision.

Sad ... is this problem new to FB or is it present in DJGPP and FreePASCAL also?

> There are other places that use floating-point parameters which might not be obvious, like graphics functions (PUT, for example)

Parameters? I see integers only ... Or are floats used only internally?

As a good example, one can point to the DCT (+IDCT) algo: the theory is in floats, but there are integer implementations ... just a question of will ;-)
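
The trick behind integer DCT implementations is fixed-point arithmetic: each floating-point constant is pre-scaled to an integer, and products are shifted back down afterwards. Here is a minimal sketch of that single step, not a full DCT. The 16-bit fraction and the name `fix_mul` are choices made here for illustration.

```c
#include <stdint.h>

#define FIX_SHIFT 16                    /* 16 fractional bits */
#define FIX_ONE   (1L << FIX_SHIFT)

/* cos(pi/4) = 0.70710678..., pre-scaled by 2^16 and rounded. */
#define FIX_COS_PI_4 46341

/* Multiply an integer sample by a fixed-point constant, rounding to
 * nearest. The 64-bit intermediate avoids overflow. For negative
 * products this relies on an arithmetic right shift, which GCC provides. */
static int32_t fix_mul(int32_t sample, int32_t constant)
{
    int64_t product = (int64_t)sample * constant;
    return (int32_t)((product + (FIX_ONE / 2)) >> FIX_SHIFT);
}
```

For example, `fix_mul(1000, FIX_COS_PI_4)` yields 707, the same as 1000 * 0.7071 rounded, without any floating-point operation in the source.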

Any idea about the conio removal problem from post above ?
DrV
Site Admin
Posts: 2116
Joined: May 27, 2005 18:39
Location: Midwestern USA

Post by DrV »

DOS386 wrote:
> > Not without modifications; at the very least, the rtlib initialization sets the FPU rounding mode and precision.
>
> Sad ... is this problem new to FB or is it present in DJGPP and FreePASCAL also ?

The initialization is in the FB rtlib, not the DJGPP-provided stuff, so it's nothing to do with DJGPP or FreePASCAL. (Perhaps the DJGPP startup code does its own FP init stuff, but you'd have to check the source.)

> Parameters ? I see integers only ... Or use floats internally only ?

http://freebasic.net/wiki/KeyPgPutGraphics doesn't specify the types, but you can see them in the gfxlib2 sources ( http://fbc.svn.sourceforge.net/viewvc/f ... iew=markup ):
FBCALL int fb_GfxPut(void *target, float fx, float fy, unsigned char *src, int x1, int y1, int x2, int y2, int coord_type, int put_mode, PUTTER *putter, int alpha, BLENDER *blender, void *param)
No idea about the conio stuff.
DOS386
Posts: 798
Joined: Jul 02, 2005 20:55

Post by DOS386 »

DrV wrote:
> The initialization is in the FB rtlib, not the DJGPP-provided stuff, so it's nothing to do with DJGPP or FreePASCAL. (Perhaps the DJGPP startup code does its own FP init stuff, but you'd have to check the source.)

If only there was a good way to disable it ...

> FBCALL int fb_GfxPut(void *target, float fx, float fy, unsigned char *src, int x1, int y1, int x2, int y2, int coord_type, int put_mode, PUTTER *putter, int alpha, BLENDER *blender, void *param)

What's the point of passing the coordinates as floats?

> No idea about the conio stuff.

If anyone else has an idea, please answer ;-)
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Post by counting_pine »

DOS386 wrote:
> > FBCALL int fb_GfxPut(void *target, float fx, float fy, unsigned char *src, int x1, int y1, int x2, int y2, int coord_type, int put_mode, PUTTER *putter, int alpha, BLENDER *blender, void *param)
>
> What's the point of passing the coordinates in floats?

Normally the screen coordinates are measured in integers, but if you use the WINDOW command, then you can recalibrate the screen mapping system to anything, e.g. (-1,-1)-(1,1). When you do this, floats are obviously needed to access non-integer coordinates.
IIRC, there's quite a nice example on the WINDOW wiki page. Looks like I haven't got around to adding a screenshot though...
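
A rough sketch of why float parameters fall out of WINDOW: after WINDOW (-1,-1)-(1,1), user coordinates have to be scaled to pixels before anything is drawn. This is not the gfxlib2 code. The struct, the function names, and the rounding are assumptions for illustration, and the y axis is flipped on the assumption that plain WINDOW (without the SCREEN keyword) makes y grow upwards, as in QB.

```c
#include <assert.h>

/* Hypothetical description of a WINDOW mapping onto a pixel screen. */
struct view_window {
    float x1, y1, x2, y2;   /* user coordinates of opposite corners */
    int width, height;      /* screen size in pixels */
};

/* Map a user x coordinate to a pixel column. The +0.5f rounds to
 * nearest, which is valid for points inside the view (non-negative). */
static int window_to_pixel_x(const struct view_window *w, float x)
{
    assert(w->x2 != w->x1);
    float t = (x - w->x1) / (w->x2 - w->x1);   /* 0..1 across the view */
    return (int)(t * (float)(w->width - 1) + 0.5f);
}

/* Map a user y coordinate to a pixel row. y grows upwards in the view
 * but downwards on screen, so the axis is flipped. */
static int window_to_pixel_y(const struct view_window *w, float y)
{
    assert(w->y2 != w->y1);
    float t = (w->y2 - y) / (w->y2 - w->y1);   /* 0 at the top edge */
    return (int)(t * (float)(w->height - 1) + 0.5f);
}
```

With a 320x200 screen and WINDOW (-1,-1)-(1,1), the user point (0.5, 0) lands on pixel (239, 100). Integer-only parameters could not express that point at all.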
DOS386
Posts: 798
Joined: Jul 02, 2005 20:55

Post by DOS386 »

counting_pine wrote:
> coordinates are measured in integers, but if you use the WINDOW command, then you can recalibrate the screen mapping system to anything, e.g. (-1,-1)-(1,1). ... the WINDOW wiki page.

Yeah ... RTFM ... thanks :-)

> Differences from QB:
> * None

QB did so, so FB must also ... but with FB GFX as-is it doesn't break too much anyway, since it won't work on an 80486, or an 80386 with FPU, either, because of lack of performance :-\