CSV to fixlength data files

General FreeBASIC programming questions.
srvaldez
Posts: 3373
Joined: Sep 25, 2005 21:54

Re: CSV to fixlength data files

Post by srvaldez »

badidea wrote: Maybe I should do the tests with an external USB mechanical disk.
I would be interested to see the timings.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: CSV to fixlength data files

Post by jj2007 »

badidea wrote: I think that the OS is fooling me (or I made a mistake). Loading the whole file with one binary Get # takes 0.14 seconds. This is ~10x faster than the SSD read speed.
I measure roughly the same on my machine. The file is in the cache, i.e. in RAM, and RAM is faster than the SSD. But I am not an expert on this hardware stuff...
badidea
Posts: 2586
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: CSV to fixlength data files

Post by badidea »

On Linux, this clears the file (page) cache:

Code:

sudo sh -c "sync; echo 1 > /proc/sys/vm/drop_caches"
With the cache cleared, a single binary read call takes 0.59 seconds, which corresponds to 460 MB/s. This could very well be the maximum SSD read speed.
Edit: Found the disk specs: Samsung 850 EVO 500 GB, sequential read speed up to 540 MB/s.
Additional code for this:

Code:

function loadProductDB(inFileName as string, byref pData as ubyte ptr) as integer
	dim as integer inFile = freefile()
	dim as ulong fileSize
	'Open() used as a function returns 0 on success
	if Open(inFileName For Binary, Access Read, As #inFile) <> 0 then return -1
	fileSize = lof(inFile)
	pData = allocate(fileSize)
	if pData = 0 then
		Close #inFile 'don't leave the file open on failure
		return -1
	end if
	get #inFile, , *pData, fileSize 'note FB is weird, dereference pointer first
	Close #inFile
	return fileSize
end function

dim as double t, dt
dim as ubyte ptr pData
dim as integer fileSize 'integer, so the -1 error value stays negative

sleep 1,1
t = timer
fileSize = loadProductDB("products2.csv", pData)
if fileSize < 0 then print "open or allocate failed" : end 1
print fileSize; " bytes,"; int(fileSize / (1024*1024)); " MB"
deallocate(pData)
dt = timer - t
print dt
print int((fileSize / (1024*1024)) / dt); " MB/s"
So now a routine to search in this big binary blob for the right barcode is needed...
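A minimal sketch of such a search routine, assuming the barcode is the first field of every line (possibly wrapped in double quotes) and that lines end in LF or CRLF. It works on a String instead of the ubyte ptr buffer, so InStr can do the scanning; lineStartingWith and findLine are made-up names. Matching "start of line + barcode + comma" avoids false hits where one barcode is only a prefix of another, or where the digits appear inside a description.

```freebasic
' Sketch: look up one barcode in a buffer that holds the whole CSV file.
' Assumptions (not from the thread): barcode is the first field of each
' line, optionally in double quotes; lines end in LF or CRLF.

' Returns the 1-based index where a line starting with key begins, or 0.
function lineStartingWith(byref blob as string, byval key as string) as integer
	if left(blob, len(key)) = key then return 1        ' the very first line
	dim as integer p = instr(blob, chr(10) & key)      ' any later line
	if p > 0 then return p + 1
	return 0
end function

' Returns the whole matching line (without CR/LF), or "" when not found.
function findLine(byref blob as string, byval barcode as string) as string
	dim as integer p = lineStartingWith(blob, barcode & ",")
	if p = 0 then p = lineStartingWith(blob, chr(34) & barcode & chr(34) & ",")
	if p = 0 then return ""
	dim as integer e = instr(p, blob, chr(10))
	if e = 0 then e = len(blob) + 1                    ' last line, no trailing LF
	dim as string ln = mid(blob, p, e - p)
	if right(ln, 1) = chr(13) then ln = left(ln, len(ln) - 1)  ' drop CR of CRLF
	return ln
end function

' Small self-contained demo. A real file can be read into a string with
' one Get #:   s = space(lof(f)) : get #f, , s
dim as string db = "111,Milk,1.00" & chr(13) & chr(10) & _
                   chr(34) & "222" & chr(34) & ",Bread,0.80" & chr(10) & _
                   "333,Eggs,2.10"
print findLine(db, "222")              ' "222",Bread,0.80
print findLine(db, "333")              ' 333,Eggs,2.10
print "[" & findLine(db, "11") & "]"   ' [] : "11" is only a prefix of "111"
```

Once the line is found, only that one line still needs to be split into its fields.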
Also found an external USB 2.0 160 GB disk. It's heavy. Let's see if it still works...
Edit: Judging by its weight, I expected a nuclear power plant inside, but no, I had to go and look for the power adapter. It still works...
Edit: 14 MB/s now, much better :-)
Last edited by badidea on Aug 12, 2018 21:52, edited 1 time in total.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: CSV to fixlength data files

Post by jj2007 »

badidea wrote: ... which corresponds to 460 MB/s. This could very well be the max. SSD read speed.
I googled a bit and found this here:
Even if you're already rocking a fast SSD (one of the best upgrades you can make), you can still improve your computer's performance by adding more memory and turning it into a RAM disk, which can be as much as 70 times faster than a regular hard drive or 20 times faster than an SSD.
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: CSV to fixlength data files

Post by caseih »

The data has to get into the RAM disk from the slower disk, and back to the disk at the end. So simply having your program load the database into memory, and save it back out at the end, is going to be essentially the same speed as a RAM disk. If the database fits in a RAM disk, it would certainly fit into memory. On the other hand, if several programs or processes need the database files, a RAM disk could be a fast way to share them. On modern OSes, though, files shared between processes are cached in memory (and often memory-mapped) anyway, so the effect is the same.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: CSV to fixlength data files

Post by jj2007 »

Gablea wrote: I have a data file that has over 7,000 lines in it and it is extremely slow at reading data from the product.dat file (this is a CSV file)
Btw what is "extremely slow"? How many seconds, typically?
badidea
Posts: 2586
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: CSV to fixlength data files

Post by badidea »

External USB disk results (3M records, cache cleared before each run), a bit puzzling:

fbc64 csv.bas
* binary get: 18.5 sec
* input find: 18.1 sec

fbc32 -w all -exx csv.bas
* binary get: 18.3 sec
* input find: 31.0 sec

My conclusion from this: follow MrSwiss' advice and buy SSDs? Install a modern OS? At least under Linux with the 64-bit build, reading the whole file at once gives no speed gain. Probably the OS is smart enough to read more data than is needed for one "Input #ProductFileNumber, etc." call. Not sure how FreeDOS handles this.

For reference, without clearing the cache:
fbc64 csv.bas
* binary get: 0.17 sec (1560 MB/s)
* input find: 11.7 sec

If the OS does not cache, then do it yourself: read all data at startup (or after a button press), and when adding or updating an item, write it both to the cache (memory) and to the file. That becomes a problem if multiple clients need the same data; then the changes have to be tracked somehow.
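A minimal sketch of that "cache it yourself" idea. The names (cacheLoad, cacheFind, cacheAdd) are made up, and it assumes one record per line with the barcode in the first field. It only covers adding new items; updating an existing record inside the file, and the multi-client problem, are left out.

```freebasic
' Sketch: load every line once, answer lookups from memory, and write
' additions to memory AND the file. Hypothetical names; assumes one
' record per line with the barcode as the first CSV field.

dim shared cacheBarcode() as string   ' barcode of each cached line
dim shared cacheLine() as string      ' the full CSV line
dim shared cacheCount as integer = 0

' First CSV field with surrounding quotes/spaces removed.
function firstField(byval ln as string) as string
	dim as integer c = instr(ln, ",")
	dim as string f = ln
	if c > 0 then f = left(ln, c - 1)
	return trim(f, any chr(34) & " ")
end function

sub cacheAddToMemory(byval ln as string)
	cacheCount += 1
	if cacheCount > ubound(cacheLine) then      ' grow in big steps, not per line
		dim as integer newSize = iif(cacheCount < 1024, 1024, cacheCount * 2)
		redim preserve cacheBarcode(1 to newSize)
		redim preserve cacheLine(1 to newSize)
	end if
	cacheBarcode(cacheCount) = firstField(ln)
	cacheLine(cacheCount) = ln
end sub

' Read the whole file once; returns the number of records, or -1.
function cacheLoad(byval fileName as string) as integer
	dim as integer f = freefile()
	if open(fileName for input as #f) <> 0 then return -1
	dim as string ln
	do until eof(f)
		line input #f, ln
		if len(ln) > 0 then cacheAddToMemory(ln)
	loop
	close #f
	return cacheCount
end function

' Index of the record with this barcode, or 0 if it is not cached.
function cacheFind(byval barcode as string) as integer
	barcode = trim(barcode)
	for i as integer = 1 to cacheCount
		if cacheBarcode(i) = barcode then return i
	next
	return 0
end function

' New item: add to memory first, then append the same line to the file.
sub cacheAdd(byval fileName as string, byval ln as string)
	cacheAddToMemory(ln)
	dim as integer f = freefile()
	if open(fileName for append as #f) = 0 then
		print #f, ln
		close #f
	end if
end sub

' Demo with a throw-away file
dim as integer f = freefile()
open "cache_demo.csv" for output as #f
print #f, "111,Milk,1.00"
print #f, chr(34) & "222" & chr(34) & ",Bread,0.80"
close #f

print cacheLoad("cache_demo.csv"); " records loaded"
dim as integer i = cacheFind("222")
if i > 0 then print cacheLine(i)
cacheAdd("cache_demo.csv", "333,Eggs,2.10")
print cacheFind("333")
```

For 7,000 records a linear scan over strings in memory should be quick; for millions of records, sorting the barcodes once and using a binary search would be the natural next step.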
Gablea
Posts: 1104
Joined: Apr 06, 2010 0:05
Location: Northampton, United Kingdom

Re: CSV to fixlength data files

Post by Gablea »

jj2007 wrote:
Gablea wrote: I have a data file that has over 7,000 lines in it and it is extremely slow at reading data from the product.dat file (this is a CSV file)
Btw what is "extremely slow"? How many seconds, typically?
So far it has taken up to a minute to "look" for an item, and then it errors out saying "Item not found", BUT when I search for it manually inside the actual data file, I can find it.

This is the updated code that I am using:

Code:

Public Sub FindProductinDatabase(ByVal BarcodeNumber As String)

	If RecipitClear = 1 Then
		CreateRecipitHeadder
		CreateRecipitFooter
		'ResetPoSdataForNewSale
	End If
 
	Dim ProductFound 										As Integer = 0
	
	CloseAllFiles
	
	
	Open PathToProductDatabase For Input As #ProductFileNumber
		Do until EOF(ProductFileNumber)
			Input #ProductFileNumber, Product_barcodenumber, Product_posdescription, Product_salelocation, Product_agerestricted, Product_agelimit, Product_pricetype, Product_retailprice, Product_vatcode, Product_print_guarantee_message, Product_print_guarantee_code, Product_DisplayMessage, Product_DisplayMessage_code, Product_sendtoppr, Product_requestserial, Product_ItemNotAllowed, Product_ItemNotAllowedReason, Product_RestrictProductQty, Product_RestrictProductAllowed, Product_DiscountNotAllowed, Product_RefundNotAllowed, Product_AskQtyBeforeSelling, Product_HelhtlyStartVoucherOK

			If Trim(BarcodeNumber) = Trim(Product_barcodenumber) Then
				ProductFound = 1
				CloseAllFiles
				Exit Do
			End If		
		Loop
		
		CloseAllFiles
	
		Select Case ProductFound
			Case 0 ' Nothing Found
				CloseAllFiles
				ItemNotFound(BarcodeNumber)
			
			Case 1  'Item was found in data file
				Select Case Product_ItemNotAllowed
					Case 1 'Item is NOT allowed to be sold
						ItemNotallowedScreen
				
					Case 0 'Item is allowed to be sold
						CloseAllFiles
						Dim LocalProduct As String = Product_posdescription

						Product_posdescription = strReplace(LocalProduct, "''", """")
						SaleLocationNumber = Trim(Product_salelocation)
						PriceType = Trim(Product_pricetype)
						PriceCheck = Val(Trim(Product_retailprice))
						PriceCheck = (PriceCheck * 100)
						
						Select Case Product_agerestricted
							Case 0
								'PriceTypeCheck													'Normal Item
									
							Case 1
								'AgeLimitDisplay(Val(Trim(Product_agelimit)))			'Age limited Item		
						End Select
					Exit Sub
				End Select
		End Select
	Exit Sub
End Sub
As you can see, for now I have commented out some of the functions. All of the fields starting with Product_ are declared at the start of the module so they can be accessed throughout it (this module is called Database.bi).

I would like to find a solution for this, as my NPoS application (the one designed to run on Linux and Windows) also uses CSV as its data format, so I would like a better solution there too (I am looking into direct MySQL support for that version, since it runs on Linux and can access the MySQL server).
grindstone
Posts: 862
Joined: May 05, 2015 5:35
Location: Germany

Re: CSV to fixlength data files

Post by grindstone »

Gablea wrote: So far it has taken up to a minute to "look" for an item, and then it errors out saying "Item not found", BUT when I search for it manually inside the actual data file, I can find it.
That sounds more like a bug than a bottleneck.
caseih
Posts: 2157
Joined: Feb 26, 2007 5:32

Re: CSV to fixlength data files

Post by caseih »

7000 records is very small. Even if you opened the file and read it one record at a time every time you needed to query it, it would still take only a fraction of a second (disk caching would essentially make that an in-memory operation). So something isn't quite right with your algorithm.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: CSV to fixlength data files

Post by jj2007 »

One minute for that tiny file is roughly a factor of 2000 too slow. Rewrite this loop completely:

Code:

      Do until EOF(ProductFileNumber)
         Input #ProductFileNumber, Product_....
         If Trim(BarcodeNumber) = Trim(Product_barcodenumber) Then
            ProductFound = 1
            CloseAllFiles
            Exit Do
         End If      
      Loop
Do the following:
- read one full line at a time
- check if the string contains the barcode number
- then parse the string to get the other items
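A minimal sketch of these three steps, assuming the barcode is the first CSV field. splitCsv and findProduct are hypothetical helper names; the splitter keeps commas inside double quotes but does not handle doubled "" escapes. The InStr test on the raw line is only a cheap filter: the barcode field is still compared after parsing, so digits that happen to appear in another line's description do not count as a match.

```freebasic
' Sketch: Line Input, cheap InStr test, parse only on a hit.
' Hypothetical names; assumes the barcode is the first CSV field.

' Split one CSV line into fields(1..n); commas inside "..." are kept.
' (Doubled "" escapes are not handled in this sketch.)
function splitCsv(byref ln as string, fields() as string) as integer
	dim as integer n = 0, inQuotes = 0
	dim as string cur = ""
	for i as integer = 1 to len(ln)
		dim as string ch = mid(ln, i, 1)
		if ch = chr(34) then
			inQuotes = not inQuotes
		elseif ch = "," andalso inQuotes = 0 then
			n += 1 : redim preserve fields(1 to n) : fields(n) = cur : cur = ""
		else
			cur &= ch
		end if
	next
	n += 1 : redim preserve fields(1 to n) : fields(n) = cur
	return n
end function

' Returns 1 and fills fields() when found, 0 if not found, -1 on open error.
function findProduct(byval path as string, byval barcode as string, fields() as string) as integer
	dim as integer f = freefile(), found = 0
	if open(path for input as #f) <> 0 then return -1
	dim as string ln
	barcode = trim(barcode)
	do until eof(f)
		line input #f, ln                     ' step 1: one full line
		if instr(ln, barcode) > 0 then        ' step 2: cheap text test
			splitCsv(ln, fields())            ' step 3: parse only this line
			if trim(fields(1)) = barcode then found = 1 : exit do
		end if
	loop
	close #f
	return found
end function

' Demo with a throw-away file
dim as integer f = freefile()
open "scan_demo.csv" for output as #f
print #f, "111,Milk,1.00"
print #f, "222," & chr(34) & "Bread, white" & chr(34) & ",0.80"
close #f

dim fields() as string
if findProduct("scan_demo.csv", "222", fields()) = 1 then
	print fields(2); " costs "; fields(3)
end if
```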

That will work in under a second. If the parsing looks too complicated,
- open the file in binary mode
- store the current position before each Line Input #1, someString
- if the line contains the barcode, Seek #1, oldposition, then do the Input #ProductFileNumber, Product_.... etc
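A sketch of this variant, assuming only three fields per line (barcode, description, price) instead of the real 22; findBySeek is a made-up name. One deviation from the recipe: the file is opened For Input rather than For Binary, because Input # is a text-file statement, and the sketch assumes Seek(f) and Seek #f behave on that file as they do in Binary mode. The position after each line is remembered as well, so a false hit (the barcode digits inside a description) can be skipped cleanly.

```freebasic
' Sketch: remember the position, Seek back, then Input # the fields.
' Assumptions: 3 fields per line (barcode, description, price); file
' opened For Input; Seek(f) / Seek #f used on that Input-mode file.
function findBySeek(byval path as string, byval barcode as string, _
                    byref code as string, byref desc as string, _
                    byref price as double) as integer
	dim as integer f = freefile(), found = 0
	if open(path for input as #f) <> 0 then return -1
	dim as string ln
	dim as longint lineStart, nextLine
	barcode = trim(barcode)
	do until eof(f)
		lineStart = seek(f)                  ' position before Line Input
		line input #f, ln
		nextLine = seek(f)                   ' position of the following line
		if instr(ln, barcode) > 0 then
			seek #f, lineStart               ' back to the start of this line
			input #f, code, desc, price      ' parse fields only for this line
			if trim(code) = barcode then found = 1 : exit do
			seek #f, nextLine                ' false hit, e.g. in a description
		end if
	loop
	close #f
	return found
end function

' Demo with a throw-away file; line 1 contains "222" only in its description
dim as integer f = freefile()
open "seek_demo.csv" for output as #f
print #f, "111,Milk 222ml,1.00"
print #f, "222,Bread,0.80"
close #f

dim as string code, desc
dim as double price
if findBySeek("seek_demo.csv", "222", code, desc, price) = 1 then
	print code; " "; desc; price
end if
```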
Gablea
Posts: 1104
Joined: Apr 06, 2010 0:05
Location: Northampton, United Kingdom
Contact:

Re: CSV to fixlength data files

Post by Gablea »

I thought it was slow, but you see, the code is based on the code I’m using in my NPoS app (the first code posted here), and that does it in milliseconds on the same data file.

I’m using the same basic code and NPoS finds products, so why would a simpler version of it not work? Unless I’ve done something really stupid.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: CSV to fixlength data files

Post by jj2007 »

Reading a bunch of fields for every line is indeed very slow; that's why I proposed the other solution above. It would be good to analyse where your code is slow, but it throws so many build errors that my enthusiasm to dig deeper is very low.
Gablea
Posts: 1104
Joined: Apr 06, 2010 0:05
Location: Northampton, United Kingdom

Re: CSV to fixlength data files

Post by Gablea »

If you like, jj, I can upload the source code to my server so you can view the whole project (I can upload both NPoS and KPoS with the data files so you can “download” them).


I appreciate all the help you are all offering me.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: CSV to fixlength data files

Post by jj2007 »

Gablea wrote: If you like, jj, I can upload the source code
I have tested the "updated source code" above; the problem is that it throws many errors:

Code:

... using \AllBasics\FreeBasic\fbc.exe -t 8000 -s console "\AllBasics\FreeBasic\tmp\TmpFile.bas"
____________________
\AllBasics\FreeBasic\tmp\TmpFile.bas(3) error 41: Variable not declared, RecipitClear
\AllBasics\FreeBasic\tmp\TmpFile.bas(3) error 3: Expected End-of-Line, found 'RecipitClear'
\AllBasics\FreeBasic\tmp\TmpFile.bas(4) error 41: Variable not declared, CreateRecipitHeadder
\AllBasics\FreeBasic\tmp\TmpFile.bas(5) error 41: Variable not declared, CreateRecipitFooter
\AllBasics\FreeBasic\tmp\TmpFile.bas(7) error 124: Expected 'END SUB', found 'End'
\AllBasics\FreeBasic\tmp\TmpFile.bas(11) error 41: Variable not declared, CloseAllFiles
\AllBasics\FreeBasic\tmp\TmpFile.bas(14) error 41: Variable not declared, PathToProductDatabase
\AllBasics\FreeBasic\tmp\TmpFile.bas(15) error 41: Variable not declared, ProductFileNumber
\AllBasics\FreeBasic\tmp\TmpFile.bas(16) error 9: Expected expression, found 'ProductFileNumber'
\AllBasics\FreeBasic\tmp\TmpFile.bas(18) error 9: Expected expression, found 'Product_barcodenumber'
\AllBasics\FreeBasic\tmp\TmpFile.bas(18) error 132: Too many errors, exiting