Change string to ordered index ?

General FreeBASIC programming questions.
ppf
Posts: 88
Joined: Oct 10, 2017 6:41

Change string to ordered index ?

Post by ppf »

Hi,

to save disk space, I am interested in whether there is some generic solution/formula for converting a string (e.g. string*18) to an integer value (index, i.e. position) in incoming order, 1 to N.
Say something produces a string*18.
Then this happens:
a) if the string is empty or null, do nothing
b) if it is already known, only increment its occurrence count
c) otherwise, compute its ordered index and store it as a UDT of 3 members (string*18 + its incoming ordered index + occurrence count)

Thank you for any hints
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Change string to ordered index ?

Post by MrSwiss »

Sorry, but I fail to see, how this is supposed to save any Disk/SSD resources.

Please explain the basic idea, behind the scenes. What do you want to "get done"?

Especially the Type (UDT) doesn't make any sense (in the given context).
(if I'm correctly guessing: "just save a string once")
badidea
Posts: 2591
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Change string to ordered index ?

Post by badidea »

Something like this?

Code: Select all

type data_type
	dim as string text
	dim as integer count
	declare operator cast () as string
end type

operator data_type.cast () as string
  return str(count) + " x " + text
end operator

'-------------------------------------------------------------------------------

type simple_list
	private:
	dim as data_type myData(any)
	public:
	declare function update(newText as string) as integer
	declare sub printAll()
end type

function simple_list.update(newText as string) as integer
	if newText = "" then return -1
	dim as integer index = -1
	for i as integer = 0 to ubound(myData)
		if myData(i).text = newText then
			index = i
			exit for
		end if
	next
	if index >= 0 then 'found
		myData(index).count += 1
	else 'not listed, add
		dim as integer ub = ubound(myData) + 1
		redim preserve myData(ub) 'increase array size
		myData(ub).text = newText
		myData(ub).count = 1
	end if
	return 0
end function

'loop all and print
sub simple_list.printAll()
	for i as integer = 0 to ubound(myData)
		print i, myData(i)
	next
end sub

'-------------------------------------------------------------------------------

dim as simple_list list

list.update("test123")
list.update("test123")
list.update("ABC")
list.update("test123")
list.update("ABC")
list.update("EDF")
list.update("test123")
list.printAll()

Note: For large data sets 'redim preserve' can slow things down. Other list or tree types could be considered.
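A minimal sketch of one way to soften the 'redim preserve' cost mentioned in the note, assuming it is acceptable to keep some unused slots: the array capacity is doubled whenever it runs out, so the number of reallocations grows only logarithmically with the number of entries. The names entry_type, growing_list and add_or_count are hypothetical, not part of the code above, and the lookup is still a plain linear search. The function returns the 1-based incoming order of the string.

```freebasic
'Hypothetical sketch: entry_type, growing_list and add_or_count are illustrative names.
type entry_type
	as string text
	as long index   '1-based order of first arrival
	as long count   'number of times the text was seen
end type

type growing_list
	as entry_type items(any)
	as long used   'number of slots actually filled (starts at 0)
	declare function add_or_count(newText as string) as long
end type

'returns the 1-based ordered index of newText, or 0 for an empty string
function growing_list.add_or_count(newText as string) as long
	if len(newText) = 0 then return 0
	for i as long = 0 to used - 1
		if items(i).text = newText then 'already known: only count it
			items(i).count += 1
			return items(i).index
		end if
	next
	'not listed yet: grow only when the capacity is exhausted
	if used > ubound(items) then
		dim as long newCap = iif(used = 0, 16, used * 2)
		redim preserve items(0 to newCap - 1)
	end if
	items(used).text = newText
	items(used).index = used + 1
	items(used).count = 1
	used += 1
	return used
end function

dim as growing_list glist
print glist.add_or_count("test123") 'new string, first index
print glist.add_or_count("test123") 'known string, same index again
print glist.add_or_count("ABC")     'new string, next index
print glist.add_or_count("")        'empty string is ignored
```

The returned index can be stored in place of the string itself, which is the re-coding into a dense 1 to N range asked about at the top of the thread.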
ppf
Posts: 88
Joined: Oct 10, 2017 6:41

Re: Change string to ordered index ?

Post by ppf »

Hi MrSwiss,
due to memory troubles on a 32-bit distro, I must reduce the amount of data saved to disk/ramdisk.
The complete range of generated data is 1 to 17M. There I know the formula to get the index and can compute all the needed things with pleasure.

I have 3M for now, which is impossible to handle.
Moreover, most of it is garbage: empty, null strings.
The number of valuable strings obtained is 100k. So I need to know/recode the index in the new range 1 to 100k.
The reduced data loses 2 parameters, unimportant for now. We'll see..
ppf
Posts: 88
Joined: Oct 10, 2017 6:41

Re: Change string to ordered index ?

Post by ppf »

Hi badidea

very helpful code, looks closer to what I am exactly looking for, thank you very much !
badidea
Posts: 2591
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Change string to ordered index ?

Post by badidea »

What do you mean by 17M? 17 Megabyte? That is close to nothing.
ppf
Posts: 88
Joined: Oct 10, 2017 6:41

Re: Change string to ordered index ?

Post by ppf »

Sure. But the associated next steps, e.g. online finalizing sorts of a sheet of 17M rows x 50 cols and storing that to (temporary) archive files, are harder.
I would like to see it working on 64-bit: what the computing speed is there, and a possible way to design the flow.
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: Change string to ordered index ?

Post by Lost Zergling »

Hello ppf. Consider this one: viewtopic.php?f=8&t=26533 (adapted for larger data sets, 5000 entries+)
One example here: viewtopic.php?f=2&t=27568&p=261028#p261028 Some documentation here: viewtopic.php?f=9&t=26551 You are welcome to try it. Have fun.
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Change string to ordered index ?

Post by MrSwiss »

Hi ppf,

when using a UDT (Type) for writing/reading to/from a file, it's important NOT to use
the data type 'Integer' inside the UDT, because it doesn't have a fixed size.
For your specified sizes, I'd opt for a ULong (unsigned 32 bit).
This makes the data files compatible with the .exe, whether it's compiled as 32 or 64 bit
(NOT so with Integer, whose size differs!).
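A small sketch of the size difference described above. The record layouts and the names record_fixed and record_int are hypothetical; `field = 1` is used here only to remove padding so the byte counts are easy to follow, and a ubyte array stands in for the 18-character text.

```freebasic
'Hypothetical record layouts; record_fixed and record_int are illustrative names.
type record_fixed field = 1
	as ubyte text(0 to 17)   '18 raw bytes
	as ulong index           'always 4 bytes
	as ulong count           'always 4 bytes
end type

type record_int field = 1
	as ubyte text(0 to 17)   '18 raw bytes
	as integer index         '4 bytes on 32-bit targets, 8 bytes on 64-bit targets
	as integer count
end type

print "sizeof(ulong)        = "; sizeof(ulong)
print "sizeof(integer)      = "; sizeof(integer)
print "sizeof(record_fixed) = "; sizeof(record_fixed) 'same on 32-bit and 64-bit
print "sizeof(record_int)   = "; sizeof(record_int)   'depends on the target
```

A file written as record_int by a 32-bit build would therefore not line up with the records expected by a 64-bit build, while record_fixed keeps the same size in both.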
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: Change string to ordered index ?

Post by Lost Zergling »

@MrSwiss : you mean for binary files ? (or non ascii)
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Change string to ordered index ?

Post by MrSwiss »

Look at the UDT as a 'record', which must always work, with the same number of bytes.
Aka: The UDT's size (in bytes) remains equal, independent of the compiler's bitness.
(from here onwards, you can surely work it out, on your own)
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: Change string to ordered index ?

Post by Lost Zergling »

OK, got it. The tool was originally designed to work with strings, not with UDTs as records, and the Integers were mostly intended for internal counting (or pointer casting, etc., afterwards) rather than for data. Indeed, you're right: as long as you consider UDTs as 'records', Integers in types should be changed to ULong (and if so, check that no -1 value is used).
PS Addendum: it is a somewhat inconsistent tool in that it is supposed to be for beginners and also for more advanced users (though seasoned programmers may prefer to do without it or to adapt it). It is thus a beta version, so we can still expect some bugs in 64 bits, especially if we try to push the tool to its limits. But until the next release, it seems to meet many expectations.
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Change string to ordered index ?

Post by MrSwiss »

Well, I don't agree with your conclusions (so far). The reason is simple:

One cannot expect a tool to evolve by adding more new functionality
without also accepting the downsides, like more complexity in handling it.

There is almost always a certain amount of tradeoffs to live with.
(Btw. you can expect the very same bugs in the 32 bit version!)
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: Change string to ordered index ?

Post by Lost Zergling »

Almost all the features added so far were thought of at the design stage, which explains their integration. Neither the handling nor the complexity of the basic instruction set has been affected. On the other hand, the more advanced functionalities add a complexity of use that comes with the increase in functional possibilities. Exception handling is excluded from the parser; the only possible place left is the execution context, so exceptions are handled around the parser(s). Adding functionalities increases the complexity of the context because the ways of using the instruction set are multiplied. Bug fixes must sometimes be thought of in a global way. If you precisely identify new bugs in the tool, I'm listening, and will fix them quickly or in the next version, or take note of them, depending on the severity. Users are the judges of technical tradeoffs. This tool should be evaluated according to the use it brings, what users want it to do, and not only on the beauty of the code. Nevertheless I understand your point of view, even though I do not share it.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Change string to ordered index ?

Post by dodicat »

Regarding saving a udt to a file.
If anybody can save and reload an array of udts with a fixed length string field (as perhaps required in this thread), then please demonstrate.
(similar to a previous topic)
I have tried every which way, but the data back from the file is not correct.
So I convert a udt with a fixed length string to a udt with a ubyte array.
This way the data is recovered.

Code: Select all

 


#include "file.bi"

type stringtype  'cannot save and reload this type efficiently from disk
    as string * 18  value
    as long index
    as long occurrency
    declare operator cast() as string
end type

type arraytype   'this can be saved and reloaded from disk.
    as ubyte value(1 to 18)
    as long index
    as long occurrency
end type

operator stringtype.cast() as string 'print out the stringtype results
print "'";value;"'"
print index
print occurrency
return ""
end operator

sub convert overload(a() as arraytype,s() as stringtype) 'arraytype to stringtype
    for n as long=lbound(a) to ubound(a)
        dim as string tmp="" 'build the text here; += on the field relies on it starting empty
        for m as long=1 to 18: tmp+= chr(a(n).value(m)):next
        s(n).value=tmp
        s(n).index=a(n).index
        s(n).occurrency=a(n).occurrency
    next
end sub

sub convert overload(s() as stringtype,a() as arraytype) 'stringtype to arraytype
    for n as long=lbound(a) to ubound(a)
   for m as long=0 to 17: a(n).value(m+1)= (s(n).value[m]):next
   a(n).index=s(n).index
   a(n).occurrency=s(n).occurrency
   next
end sub

sub loadfile(file as string,b() as arraytype)
   If FileExists(file)=0 Then Print file;" not found":Sleep:end
   var  f=freefile
    Open file For Binary Access Read As #f
    If Lof(f) > 0 Then
      Get #f, , b()
    End If
    Close #f
end sub

Sub savefile(filename As String,p() As arraytype)
    Dim As Integer n
    n=Freefile
    If Open (filename For Binary Access Write As #n)=0 Then
        Put #n,,p()
        Close #n 'close only this file, not every open file
    Else
        Print "Unable to save " + filename
    End If
End Sub

dim as string z="AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz"

  #define range(f,l) Int(Rnd*((l+1)-(f))+(f))

dim as stringtype st(1 to 10)  'or 1 to whatever
dim as arraytype at(1 to ubound(st))
'create some instances of stringtype
for n as long=1 to ubound(st)
    with st(n)
        .value=mid(z,range(1,(52-18)),18)
        .index=n
        .occurrency=range(5,15)
    end with
next n
print "Save this data to file:"
print
for n as long=lbound(st) to ubound(st) 'show them
    print st(n)
    next
print "____________________________"
print

convert(st(),at())'convert internal string data to ubyte array
savefile("text.txt",at()) 'must save arraytype

erase st,at
'all saved to drive, erase all arrays
'===================================================


'reload from drive

var lngth=filelen("text.txt")\sizeof(arraytype) 'get incoming array dimension
dim as arraytype x(1 to lngth)
dim as stringtype y(1 to lngth)
loadfile("text.txt",x()) 'must load to arraytype
convert(x(),y())  'convert to stringtype
print "Returned data:"
print
for n as long=lbound(y) to ubound(y) 'show them
    print y(n)
    next
sleep
 