Problem writing 8GB array to disk

General FreeBASIC programming questions.
cbruce
Posts: 163
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas

Re: Problem writing 8GB array to disk

Post by cbruce »

Some helpful information about the Microsoft fwrite large data issue from:

https://social.msdn.microsoft.com/Forum ... =vcgeneral
It seems nobody ever tested fwrite with more than 4GB of data, because Microsoft's code loops forever.

MSVC 2008, 64-bit project:

fwrite( p, sizeof(int), num, fout );

num is 1024*1024*1024
sizeof(int) is 4

fwrite locks the stream and calls
size_t __cdecl _fwrite_nolock

There is a nice loop in there, where (bufsize is 4096)

nbytes = ( bufsize ? (unsigned)(count - count % bufsize) : (unsigned)count );

count at this point is 4*1024*1024*1024

so
nbytes = (unsigned)(4*1024*1024*1024) = 0
It tries to write 0 bytes, subtracts that from count (no change, of course), so it spins in an infinite tight loop.

if I'm trying to write not 4GB of data but say 5GB then

nbytes = ( bufsize ? (unsigned)(count - count % bufsize) : (unsigned)count );

nbytes is now 1GB; it writes 1GB, then count is 5GB - 1GB = 4GB, and (unsigned)4GB is 0: infinite loop again.

So no matter what the size is, if it's above 4GB (I originally tried to write 10GB), it writes out whatever is above the closest multiple of 4GB (in the case of 10GB, it writes the 2GB above 8GB). Then count is left as an exact multiple of 4GB (say 8GB) and it does

nbytes = (unsigned)(8*1024*1024*1024) =0

and tight loop forever.

I suspect that fread may have the same issue....
grigdn, Monday, November 30, 2009 5:02 AM
badidea
Posts: 2586
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Problem writing 8GB array to disk

Post by badidea »

I did some tests on my Linux laptop with 8GB memory.

With the code below and FreeBASIC 64-bit, I can write 9GB (> 8GB) at once.

With freebasic 32-bit:
1 GB: OK
2 GB: Fail, Put returns error
3 GB: Fail, Put returns error
4 GB: 0 byte file, no error
5 GB: 1 GB file, no error
6 GB: Fail, Put returns error
etc.
So it seems FreeBASIC 32-bit internally uses a long (-2GB to +2GB) for the put size / file pointer.

Code:

const as ulongint KB = 1024
const as ulongint MB = KB * KB
const as ulongint GB = KB * MB

function writeData(fileName as string, pData as ubyte ptr, numBytes as ulongint) as integer
	dim as integer fileNum = freefile
	if open(fileName, for binary, as fileNum) <> 0 then
		return -1
	else
		if put(fileNum, , *pData, numBytes) <> 0 then
			close(fileNum)
			return -2
		end if 
		close(fileNum)
		return 0
	end if
end function

const as ulongint NUM_GB = 9
redim shared bigdata(NUM_GB-1, GB-1) as ubyte
dim as ulongint numBytes = NUM_GB * GB * sizeof(bigdata)
print "numBytes: "; numBytes

dim as integer result
print "Writing data ..."
result = writeData("bigdata.bin", @bigdata(0, 0), numBytes)
print "result: "; result
I am happy that # is not needed for the function versions of open, put and close.
Why we have to dereference the data pointer in put is weird (having programmed in C for years), but I guess it's for backward compatibility with older BASIC versions.
cbruce
Posts: 163
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas

Re: Problem writing 8GB array to disk

Post by cbruce »

@fxm... I believe the @grigdn article I posted above explains why I can't write an exact 4GB array to a file.

An array that is exactly 4GB in size will overflow the fwrite size parameter, setting it to zero and, therefore, writing nothing.

When I take my array down to (4GB - 2 bytes), I still have a valid fwrite buffer size parameter - so then the write works.

Does that look correct to you?
cbruce
Posts: 163
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas

Re: Problem writing 8GB array to disk

Post by cbruce »

@counting_pine... I was incorrect earlier about being able to GET an 8GB array.

Yes... it did return a successful read.

No... it failed... the data in the receiving array was all zeros!
cbruce
Posts: 163
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas

Re: Problem writing 8GB array to disk

Post by cbruce »

Just tried a bunch of other sizes and offsets over 4GB... fwrite and fread are totally broken for any I/O using a size of exactly 4GB or greater.

It looks like writing and reading files is limited to GETs and PUTs of less than 4GB at once!

It's CHUNK'ing TIME !!!

Thanks!
Bruce
marcov
Posts: 3455
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Re: Problem writing 8GB array to disk

Post by marcov »

AFAIK the Windows API has such limits: the common Win32/64 write-block-to-file function, WriteFile, also takes a DWORD as its size argument, i.e. 32-bit.

Under *nix, the core write routine is write(2), which takes a size_t argument; that is implementation-defined, and 64-bit on Linux/64.

Of course the FreeBASIC wrapper could do partial writes to even out this difference, but it probably doesn't (yet).
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Problem writing 8GB array to disk

Post by jj2007 »

marcov wrote:AFAIK the Windows API has such limits: the common Win32/64 write-block-to-file function, WriteFile, also takes a DWORD as its size argument, i.e. 32-bit.
Indeed. Interestingly enough, even WriteFileEx has that limit. There is an easy workaround, though: MapViewOfFile.
marcov
Posts: 3455
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Re: Problem writing 8GB array to disk

Post by marcov »

jj2007 wrote:
marcov wrote:AFAIK the Windows API has such limits: the common Win32/64 write-block-to-file function, WriteFile, also takes a DWORD as its size argument, i.e. 32-bit.
Indeed. Interestingly enough, even WriteFileEx has that limit. There is an easy workaround, though: MapViewOfFile.
AFAIK the native WinNT I/O interface uses completion ports; Win32 is a compatibility layer on top.
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Problem writing 8GB array to disk

Post by dodicat »

I cannot test (only 2GB RAM).
How do the CRT functions fare?
(Prime generator to test functionality.)

Code:

 

#include "crt.bi"


Sub save(filename As String,a() As Ulong)
    Dim As FILE Ptr f = fopen(filename, "wb")
    Var size=(Ubound(a)-Lbound(a)+1)*Sizeof(a)
    fwrite(@a(Lbound(a)), size,1, f)
    fclose(f)
End Sub

Sub load(filename As String,a() As Ulong)
    Dim As FILE Ptr f = fopen(filename, "rb")
    Var size=(Ubound(a)-Lbound(a)+1)*Sizeof(a)
    fread(@a(Lbound(a)), size,1, f)
    fclose(f)
End Sub

Sub sieve(array() As Ulong,limit As Integer)
    Dim flags(limit) As Integer
    Dim ct As Integer
    Redim array(1 To limit/2)
    For n As Integer = 2 To Sqr(limit)
        If flags(n) = 0 Then
            For k As Integer = n*n To limit Step n
                flags(k) = 1
            Next k
        End If
    Next n
    For n As Integer = 2 To limit
        If flags(n)=0 Then ct+=1:array(ct)=n
    Next n
    Redim Preserve array(1 To ct)
End Sub


Dim As Ulong limit=10000000

Redim As Ulong s()
sieve(s(),limit)

For n As Long = Ubound(s)-50 To Ubound(s): Print s(n);" ";: Next
Print
Print
save("test.dat",s())

Redim As Ulong res(Lbound(s) To Ubound(s))
load("test.dat",res())

For n As Long = Ubound(res)-50 To Ubound(res): Print res(n);" ";: Next
Print

Sleep
Remember to delete the file afterwards.
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Problem writing 8GB array to disk

Post by MrSwiss »

@dodicat,

The file is small, only 2.53 MB; otherwise, all okay. (FBC 64-bit, 1.06.0, Win standalone)
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: Problem writing 8GB array to disk

Post by counting_pine »

I've drafted some drop-in replacement fread/fwrite functions for the rtlib:

Code:

static __inline__ size_t fread_large( void *ptr, size_t size, size_t nmemb, FILE *stream )
{
	size_t total = 0, nread, nchunk, shift;

	if (size < 1 || nmemb < 1) return 0;

	/* read at least 1MB of items per chunk */
	for (shift = 0; (size << shift) < 1048576; shift++);
	nchunk = 1 << shift;

	while (nmemb > nchunk) {
		/* read chunk */
		nread = fread( ptr, size, nchunk, stream );
		total += nread;

		/* stop early if too few items read */
		if (nread < nchunk) {
			return total;
		}

		ptr += size * nchunk;
		nmemb -= nchunk;
	}

	if (nmemb > 0) {
		/* read last chunk */
		nread = fread( ptr, size, nmemb, stream );
		total += nread;
	}

	return total;
}

static __inline__ size_t fwrite_large( const void *ptr, size_t size, size_t nmemb, FILE *stream )
{
	size_t total = 0, nwritten, nchunk, shift;

	if (size < 1 || nmemb < 1) return 0;

	/* write at least 1MB of items per chunk */
	for (shift = 0; (size << shift) < 1048576; shift++);
	nchunk = 1 << shift;

	while (nmemb > nchunk) {
		/* write chunk */
		nwritten = fwrite( ptr, size, nchunk, stream );
		total += nwritten;

		/* stop early if too few items written */
		if (nwritten < nchunk) {
			return total;
		}

		ptr += size * nchunk;
		nmemb -= nchunk;
	}

	if (nmemb > 0) {
		/* write last chunk */
		nwritten = fwrite( ptr, size, nmemb, stream );
		total += nwritten;
	}

	return total;
}

#define fread fread_large
#define fwrite fwrite_large
The functions read/write a given number of elements (of size size), rather than bytes. So the effective chunk size is a multiple of size.
The functions choose a chunk size that gives at least 1MB of items per chunk, preferring powers of two for better alignment.
(For sane element sizes, it will be 1-2MB; for powers of 2, it will be exactly 1MB.)

Note: I've not really tested yet. So far it's in an "it compiles, let's ship it" state.

I'm not sure where it's best to put them, but I'm currently thinking of inlining them into fb.h and either using #defines to effectively bulk-replace existing calls, or replacing them manually on a case-by-case basis.