CSV to fixlength data files

Postby srvaldez » Aug 12, 2018 15:22

badidea wrote: Maybe I should do the tests with an external USB mechanical disk.

I would be interested to see the timings.

Postby jj2007 » Aug 12, 2018 19:14

badidea wrote: I think that the OS is fooling me (or I made a mistake). Loading the whole file in 1 binary get() takes 0.14 seconds. This is ~10x faster than the SSD read speed.
This is roughly what I also measure on my machine. The file is in the cache, i.e. in RAM, and that's probably faster than the SSD. But I am not an expert on this hardware stuff...

Postby badidea » Aug 12, 2018 20:59

On Linux, this seems to clear the file cache:

Code:

sudo sh -c "sync; echo 1 > /proc/sys/vm/drop_caches"

Then I get 0.59 seconds for a binary read call, which corresponds to 460 MB/s. This could very well be the max. SSD read speed.
Edit: Found the disk specs: Samsung 850 EVO 500GB, sequential read speeds up to 540 MB/s.
Additional code for this:

Code:

function loadProductDB(inFileName as string, byref pData as ubyte ptr) as longint
   dim as integer inFile = freefile()
   'Open() returns 0 on success, so bail out early if the file cannot be opened
   if open(inFileName for binary access read as #inFile) <> 0 then return -1
   dim as longint fileSize = lof(inFile)
   pData = allocate(fileSize)
   if pData = 0 then
      close #inFile
      return -1
   end if
   get #inFile, , *pData, fileSize 'note FB is weird, dereference pointer first
   close #inFile
   return fileSize
end function

dim as double t, dt
dim as ubyte ptr pData
dim as longint fileSize 'signed, so the -1 error code survives the assignment

sleep 1,1
t = timer
fileSize = loadProductDB("products2.csv", pData)
dt = timer - t 'stop the clock before printing and freeing
if fileSize < 0 then
   print "load error"
   end 1
end if
print fileSize; " bytes,"; int(fileSize / (1024*1024)); " MB"
print dt
print int((fileSize / (1024*1024)) / dt); " MB/s"
deallocate(pData)

So now a routine is needed to search this big binary blob for the right barcode...
I also found an external USB 2.0 160 GB disk. It's heavy. Let's see if it still works...
Edit: Judging by its weight, I would expect a nuclear power plant inside it, but no, I had to go look for the power adapter. It still works...
Edit: 14 MB/s now, much better :-)
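
A minimal sketch of such a search routine, not taken from the thread: it walks the blob line by line and returns the first line whose first field equals the barcode. The name findLineInBlob is hypothetical, and the sketch assumes that the barcode is the first CSV field (optionally in double quotes) and that lines end in LF or CRLF.

```freebasic
' Hypothetical helper (not from the thread): return the CSV line whose first
' field equals barcode, or "" if there is none.
' Assumptions: barcode is the first field, lines end in LF or CRLF.
function findLineInBlob(byval pData as ubyte ptr, byval blobSize as integer, _
                        byref barcode as string) as string
   dim as integer keyLen = len(barcode)
   dim as integer lineStart = 0
   if keyLen = 0 then return ""
   while lineStart < blobSize
      ' locate the end of the current line (LF = 10)
      dim as integer lineEnd = lineStart
      while lineEnd < blobSize andalso pData[lineEnd] <> 10
         lineEnd += 1
      wend
      ' skip an optional opening double quote (34) around the first field
      dim as integer p = lineStart
      if p < lineEnd andalso pData[p] = 34 then p += 1
      ' compare the key byte by byte
      dim as integer i = 0
      while i < keyLen andalso p + i < lineEnd andalso pData[p + i] = barcode[i]
         i += 1
      wend
      if i = keyLen then
         ' the field must end right here: end of line, quote, comma (44) or CR (13)
         dim as integer q = p + keyLen
         if q >= lineEnd orelse pData[q] = 34 orelse pData[q] = 44 orelse pData[q] = 13 then
            dim as integer n = lineEnd - lineStart
            if n > 0 andalso pData[lineEnd - 1] = 13 then n -= 1 ' drop trailing CR
            dim as string foundLine = space(n)
            for k as integer = 0 to n - 1
               foundLine[k] = pData[lineStart + k]
            next
            return foundLine
         end if
      end if
      lineStart = lineEnd + 1
   wend
   return ""
end function

' Usage with the loader above (the barcode value is made up):
' dim as string hit = findLineInBlob(pData, fileSize, "5012345678900")
' if len(hit) > 0 then print hit else print "not found"
```

This is a plain linear scan, so each lookup still touches the whole blob in the worst case; it only avoids the disk and the per-field parsing of Input #.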
Last edited by badidea on Aug 12, 2018 21:52, edited 1 time in total.

Postby jj2007 » Aug 12, 2018 21:40

badidea wrote:... which corresponds to 460 MB/s. This could very well be the max. SSD read speed.
I googled a bit and found this here:
Even if you're already rocking a fast SSD (one of the best upgrades you can make), you can still improve your computer's performance by adding more memory and turning it into a RAM disk, which can be as much as 70 times faster than a regular hard drive or 20 times faster than an SSD.

Postby caseih » Aug 12, 2018 21:57

The data has to get into the RAM disk from the slower disk, and get back to the disk at the end. So having your program load the database into memory and save it back out at the end is going to be the same speed as a RAM disk. If the database fits in a RAM disk, it would certainly fit into memory. On the other hand, if you had several programs or processes that wanted to use the database files, a RAM disk could be a fast way to do that. Although on modern OSes, any files shared between processes are cached and memory mapped anyway, so the effect is the same.

Postby jj2007 » Aug 12, 2018 22:04

Gablea wrote: I have a data file that has over 7,000 lines in it and it is extremely slow at reading data from the product.dat file (this is a csv file)
Btw what is "extremely slow"? How many seconds, typically?

Postby badidea » Aug 12, 2018 22:11

External USB disk results (3M records, cache cleared before each run), a bit puzzling:

fbc64 csv.bas
* binary get: 18.5 sec
* input find: 18.1 sec

fbc32 -w all -exx csv.bas
* binary get: 18.3 sec
* input find: 31.0 sec

My conclusion from this is: Follow MrSwiss' advice and buy SSDs? Install a modern OS? At least under Linux, there is no speed gain when reading the whole file at once. Probably the OS is smart enough to read more data than is needed for one "Input #ProductFileNumber, etc." call. Not sure how FreeDOS handles this.

For reference, without clearing the cache:
fbc64 csv.bas
* binary get: 0.17 sec (1560 MB/s)
* input find: 11.7 sec

If the OS does not cache, then do it yourself. Read all data at start or after a button press. When adding/updating an item, write it to both the cache (memory) and the file. This becomes a problem if multiple clients need the same data; you would then have to keep track of the changes somehow.
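
A rough sketch of the do-it-yourself cache idea, under stated assumptions: the cache simply holds the raw CSV lines in a dynamic string array, and adding an item appends to the file first and then to the array. All names here (cacheLines, cacheCount, loadCache, addRecord) are hypothetical; updating an existing item and coordinating several clients are not covered.

```freebasic
' Hypothetical in-memory cache of raw CSV lines (all names are illustrative).
dim shared as string cacheLines()
dim shared as integer cacheCount = 0

' Read every line of the file into memory; returns the line count or -1 on error.
function loadCache(byref fileName as string) as integer
   dim as integer f = freefile()
   if open(fileName for input as #f) <> 0 then return -1
   cacheCount = 0
   redim cacheLines(0 to 1023)
   do until eof(f)
      ' grow the array by doubling when it is full
      if cacheCount > ubound(cacheLines) then
         redim preserve cacheLines(0 to 2 * cacheCount - 1)
      end if
      line input #f, cacheLines(cacheCount)
      cacheCount += 1
   loop
   close #f
   return cacheCount
end function

' Write-through add: append to the file first, then to the cache,
' so the cache never holds a record that failed to reach the disk.
function addRecord(byref fileName as string, byref csvLine as string) as integer
   dim as integer f = freefile()
   if open(fileName for append as #f) <> 0 then return -1
   print #f, csvLine
   close #f
   if cacheCount > ubound(cacheLines) then
      redim preserve cacheLines(0 to 2 * cacheCount + 1)
   end if
   cacheLines(cacheCount) = csvLine
   cacheCount += 1
   return 0
end function
```

Calling loadCache once at program start (or on a "reload" button press) keeps every later lookup in memory; calling it again is the simplest way to pick up changes made by another client.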

Postby Gablea » Aug 12, 2018 23:28

jj2007 wrote:
Gablea wrote: I have a data file that has over 7,000 lines in it and it is extremely slow at reading data from the product.dat file (this is a csv file)
Btw what is "extremely slow"? How many seconds, typically?


So far it has taken up to a minute to "look" for an item, but then it errors out and says Item not found, BUT when I search for it manually inside the actual data file I can find it.

This is the updated code that I am using:

Code:

Public Sub FindProductinDatabase(ByVal BarcodeNumber As String)

   If RecipitClear = 1 Then
      CreateRecipitHeadder
      CreateRecipitFooter
      'ResetPoSdataForNewSale
   End If
 
   Dim ProductFound                               As Integer = 0
   
   CloseAllFiles
   
   
   Open PathToProductDatabase For Input As #ProductFileNumber
      Do until EOF(ProductFileNumber)
         Input #ProductFileNumber, Product_barcodenumber, Product_posdescription, Product_salelocation, Product_agerestricted, Product_agelimit, Product_pricetype, Product_retailprice, Product_vatcode, Product_print_guarantee_message, Product_print_guarantee_code, Product_DisplayMessage, Product_DisplayMessage_code, Product_sendtoppr, Product_requestserial, Product_ItemNotAllowed, Product_ItemNotAllowedReason, Product_RestrictProductQty, Product_RestrictProductAllowed, Product_DiscountNotAllowed, Product_RefundNotAllowed, Product_AskQtyBeforeSelling, Product_HelhtlyStartVoucherOK

         If Trim(BarcodeNumber) = Trim(Product_barcodenumber) Then
            ProductFound = 1
            CloseAllFiles
            Exit Do
         End If      
      Loop
      
      CloseAllFiles
   
      Select Case ProductFound
         Case 0 ' Nothing Found
            CloseAllFiles
            ItemNotFound(BarcodeNumber)
         
         Case 1  'Item was found in data file
            Select Case Product_ItemNotAllowed
               Case 1 'Item is NOT allowed to be sold
                  ItemNotallowedScreen
            
               Case 0 'Item is allowed to be sold
                  CloseAllFiles
                   Dim LocalProduct As String = Product_posdescription
       
                  Product_posdescription = strReplace(LocalProduct, "''", """")
                        SaleLocationNumber = Trim(Product_salelocation)
                               PriceType = Trim(Product_pricetype)
                              PriceCheck = Val(Trim(Product_retailprice))
                              PriceCheck =(PriceCheck * 100)                        
                  
                  Select Case Product_agerestricted
                     Case 0
                        'PriceTypeCheck                                       'Normal Item
                           
                     Case 1
                        'AgeLimitDisplay(Val(Trim(Product_agelimit)))         'Age limited Item      
                  End Select
               Exit Sub
            End Select
      End Select
   Exit Sub
End Sub


As you can see, for now I have commented out some of the functions. All of the fields starting with Product_ have been declared at the start of the module so they can be accessed all through it (this module is called Database.bi).

I would like to find a solution for this, as my NPoS application (the one that is designed to run on Linux and Windows) also uses CSV as its data structure, so I would like to find a better solution (I am looking into direct MySQL support for the version that runs on Linux so it can access the MySQL server).

Postby grindstone » Aug 12, 2018 23:46

Gablea wrote: So far it has taken up to a minute to "look" for an item, but then it errors out and says Item not found, BUT when I search for it manually inside the actual data file I can find it.
Sounds more like a bug than a bottleneck.

Postby caseih » Aug 13, 2018 2:59

7000 records is very small. Even if you opened the file and read it in one record at a time each time you needed to query it, it would still be a fraction of a second (disk caching would essentially make that an in-memory operation). So something isn't quite right with your algorithm.

Postby jj2007 » Aug 13, 2018 6:40

One minute for that tiny file is roughly a factor of 2000 too slow. Rewrite this loop completely:

Code:

      Do until EOF(ProductFileNumber)
         Input #ProductFileNumber, Product_....
         If Trim(BarcodeNumber) = Trim(Product_barcodenumber) Then
            ProductFound = 1
            CloseAllFiles
            Exit Do
         End If     
      Loop

Do the following:
- read one full line at a time
- check if the string contains the barcodenumber
- then parse the string to get the other items

That will work in under a second. If the parsing looks too complicated,
- open the file in binary mode
- store the position before the Line Input #1, string
- if the line contains the barcode, seek #1 oldposition, then do the Input #ProductFileNumber, Product_.... etc
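
The first variant of the steps above (read a full line, check it for the barcode, parse only on a hit) could look roughly like this. It is a sketch under assumptions, not the code from the thread: findProduct and splitCsvLine are hypothetical names, the barcode is taken to be the first field (as in the Input # statement earlier in the thread), and the parser handles double-quoted fields but not doubled quotes inside them.

```freebasic
' Hypothetical helper: split one CSV line into fields().
' Double quotes (34) toggle "inside a quoted field"; commas (44) inside quotes
' are kept. Doubled quotes inside a quoted field are not handled in this sketch.
sub splitCsvLine(byref s as string, fields() as string)
   dim as integer count = 0, inQuotes = 0
   dim as string cur
   redim fields(0 to 31)
   for i as integer = 0 to len(s) - 1
      dim as ubyte c = s[i]
      if c = 34 then
         inQuotes = 1 - inQuotes
      elseif c = 44 andalso inQuotes = 0 then
         if count > ubound(fields) then redim preserve fields(0 to 2 * count - 1)
         fields(count) = cur
         cur = ""
         count += 1
      else
         cur &= chr(c)
      end if
   next
   ' store the last field and trim the array to the exact field count
   redim preserve fields(0 to count)
   fields(count) = cur
end sub

' Scan the file line by line; parse only lines that mention the barcode.
' Returns 1 and fills fields() on a hit, 0 otherwise.
function findProduct(byref fileName as string, byref barcode as string, _
                     fields() as string) as integer
   dim as integer f = freefile(), found = 0
   dim as string textLine
   if open(fileName for input as #f) <> 0 then return 0
   do until eof(f)
      line input #f, textLine
      if instr(textLine, barcode) > 0 then        ' cheap pre-filter
         splitCsvLine(textLine, fields())
         if trim(fields(0)) = trim(barcode) then  ' confirm on the first field
            found = 1
            exit do
         end if
      end if
   loop
   close #f
   return found
end function
```

The InStr test is only a pre-filter, since the same digits could appear in another field or inside a longer barcode; that is why the first field is compared again after parsing.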

Postby Gablea » Aug 13, 2018 7:38

I thought it was slow, but you see, the code is based on the code I’m using in my NPoS app (first code posted here), and that does it in milliseconds on the same data file.

I’m using the same basic code and NPoS finds products, so why would a simpler version of it not work? Unless I’ve done something really stupid.

Postby jj2007 » Aug 13, 2018 8:07

Reading a bunch of parameters for every line is indeed very slow; that's why I proposed the other solution above. It would be good to analyse where your code is slow, but it throws so many build errors that my enthusiasm to dig deeper is very low.

Postby Gablea » Aug 13, 2018 10:24

If you like, jj, I can upload the source code to my server and you can view the whole project (I can upload both NPoS and KPoS with data files so you can “download” them).

I appreciate all the help you are all offering me.

Postby jj2007 » Aug 13, 2018 11:40

Gablea wrote: If you like, jj, I can upload the source code
I have tested the "updated source code" above; the problem is that it throws many errors:

Code:

... using \AllBasics\FreeBasic\fbc.exe -t 8000 -s console "\AllBasics\FreeBasic\tmp\TmpFile.bas"
____________________
\AllBasics\FreeBasic\tmp\TmpFile.bas(3) error 41: Variable not declared, RecipitClear
\AllBasics\FreeBasic\tmp\TmpFile.bas(3) error 3: Expected End-of-Line, found 'RecipitClear'
\AllBasics\FreeBasic\tmp\TmpFile.bas(4) error 41: Variable not declared, CreateRecipitHeadder
\AllBasics\FreeBasic\tmp\TmpFile.bas(5) error 41: Variable not declared, CreateRecipitFooter
\AllBasics\FreeBasic\tmp\TmpFile.bas(7) error 124: Expected 'END SUB', found 'End'
\AllBasics\FreeBasic\tmp\TmpFile.bas(11) error 41: Variable not declared, CloseAllFiles
\AllBasics\FreeBasic\tmp\TmpFile.bas(14) error 41: Variable not declared, PathToProductDatabase
\AllBasics\FreeBasic\tmp\TmpFile.bas(15) error 41: Variable not declared, ProductFileNumber
\AllBasics\FreeBasic\tmp\TmpFile.bas(16) error 9: Expected expression, found 'ProductFileNumber'
\AllBasics\FreeBasic\tmp\TmpFile.bas(18) error 9: Expected expression, found 'Product_barcodenumber'
\AllBasics\FreeBasic\tmp\TmpFile.bas(18) error 132: Too many errors, exiting
