## Instr()

General FreeBASIC programming questions.
albert
Posts: 5676
Joined: Sep 28, 2006 2:41
Location: California, USA

### Instr()

@Coders

Is there a way to speed up the instr() function...

Building a 4 byte dictionary of a input of 10,000,000 bytes takes hours...

chrs = 10,000,000 bytes input
if instr( 1 , dict , ( next 4 chrs ) ) = 0 then dict+= next 4 chrs

Searching the dict , to try to find the 4 chrs takes over a half hour... I opened Linux "TOP" and killed it after 40 minutes.
caseih
Posts: 1504
Joined: Feb 26, 2007 5:32

### Re: Instr()

Might help if you describe what you are trying to do and sought for input on how to create an algorithm to do it effectively. Showing a complete, working example of your present code might help also.
albert
Posts: 5676
Joined: Sep 28, 2006 2:41
Location: California, USA

### Re: Instr()

@caseih

Try the below code , it takes forever.....

Code: Select all

`screen 19    dim as string str1    for a as longint = 0 to 10000000        str1+= chr( int( rnd * 256 )    next        dim as string dict = ""    dim as string n1    for a as longint = 1 to len( str1 ) step 4                n1 = mid( str1 , a , 4 )                if instr( 1 , dict , n1 ) = 0 then dict+= n1                if a mod 1000000 = 0 then print len( str1 ) - a            next    print len( dict )sleepend`
Munair
Posts: 886
Joined: Oct 19, 2017 15:00
Location: 't Zand, NL
Contact:

### Re: Instr()

It is not just the Instr() function. You use mid() and you create new strings like str1+= chr( int( rnd * 256 ) and dict+= n1. That isn't fast code by definition.

A faster way would be to use predefined string buffers to avoid string allocation with each iteration. And if you want to make it really fast, access string positions with pointers.
jj2007
Posts: 1464
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

### Re: Instr()

Munair is correct. You need to create the strings differently, like this:

Code: Select all

`    dim as string str1    str1=space(10000000)    for a as longint = 1 to 10000000        mid(str1, a, 1)= chr( int( rnd * 256 ))    next`
fxm
Posts: 9637
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

### Re: Instr()

You are both right, and therefore rather for speed:

Code: Select all

`    dim as string str1    str1 = space(10000000)    for a as longint = 0 to 10000000-1        str1[a] = int( rnd * 256 )    next`
grindstone
Posts: 721
Joined: May 05, 2015 5:35
Location: Germany

### Re: Instr()

Double post, erased. (Sorry)
Last edited by grindstone on Mar 02, 2020 14:00, edited 1 time in total.
grindstone
Posts: 721
Joined: May 05, 2015 5:35
Location: Germany

### Re: Instr()

Double post, erased.
Last edited by grindstone on Mar 02, 2020 13:59, edited 1 time in total.
grindstone
Posts: 721
Joined: May 05, 2015 5:35
Location: Germany

### Re: Instr()

Just for pleasure, the snippet from above with pointer access:

Code: Select all

`Dim As String str1For a As LongInt = 0 To 10000000   str1+= Chr( Int( Rnd * 256 ))Next'str1 = "abcdefghijklmnopqrstuvwx"Dim As String dict = String(Len(str1) + 10, 0)Dim As String n1, n2Dim As Long Ptr lp = Cast(Long Ptr, StrPtr(str1))Dim As Long Ptr dp = Cast(Long Ptr, StrPtr(dict))Dim As Long b, c, dictlen? Len( str1 )?For a As LongInt = 1 To Len( str1 ) Step 4   n1 = Mid( str1 , a , 4 )   b = (a-1)/4   n2 = Mki(*(lp + b))      Dim As Long n1v = cvi(n1)   Do      For c = 0 To dictlen / 4         If *(dp + c) = n1v Then            Exit Do         EndIf      Next      c -= 1      *(dp + c) = n1v      dictlen += 4   Loop until -1      'If a Mod 1000000 = 0 Then Print Len( str1 ) - a   If a/4 Mod 10000 = 0 Then Print Len( str1 ) - aNext?'? dictPrint Len( dict )? dictlendict = Left(dict, dictlen)'? dict;"*"Print Len( dict )SleepEnd`

It doesn't seem to be much faster.
grindstone
Posts: 721
Joined: May 05, 2015 5:35
Location: Germany

### Re: Instr()

Double post, erased. (Sorry)
jimg
Posts: 24
Joined: Jan 16, 2020 19:43
Location: Oregon

### Re: Instr()

albert-

with your instr instruction, you are testing for any occurrence of a four byte string. Did you mean for the match to occur anywhere within the string, or only at four byte boundaries?

e.g.
if str1 looks like-

Code: Select all

`abcdefghijklmnopbcdefghi1   2   3   4   5   6    7   8   9   `

and you pick up the fifth group (bcde) as n1 to test, it matches the second character of str1.
Was that your intention, or was it only supposed to match if it started on a four character boundary?
If it was only on a four character boundary, the search time could be shortened quite a bit I think.
albert
Posts: 5676
Joined: Sep 28, 2006 2:41
Location: California, USA

### Re: Instr()

The dictionary is on a 4 byte boundary...

It steps through the input string by 4 bytes and tests the dictionary , to see if it's in the dictionary..
If it's not in the dictionary , then it adds the 4 bytes to the dictionary..

Dictionary searches , have to be on a 4 byte boundary.. location mod 4 = 1
So you can divide the dict location by 4 , output = dict( location \ 4 )

I'd like to somehow to determine if a dict location , occurs more than once in the input string....
So you can shorten the dictionary to just bytes that occur more than a certain number of times..( not sure?? )

Small lengths of input , would obviously expand...
But Gigabyte size inputs , might have several of a 4 byte set?? and might compress??
dodicat
Posts: 6497
Joined: Jan 10, 2006 20:30
Location: Scotland

### Re: Instr()

Here are a couple of functions Albert.

Code: Select all

`Function tally overload(somestring As String,partstring As String,arr() As Integer) As Integer    Dim As Integer i,j,ln,lnp,count,num    Dim As boolean filler=false    ln=Len(somestring)    lnp=Len(partstring)    start:    count=0    i=-1    Do        i+=1        If somestring[i] <> partstring Then continue do        If somestring[i] = partstring Then            For j=0 To lnp-1                 If somestring[j+i]<>partstring[j] Then continue do            Next j        End If        count+=1        If filler = true Then arr(count)=i+1         i=i+lnp-1    Loop Until i>=ln-1     If filler = false Then Redim arr(count) 'size is now known, repeat the operation to fil arr()    arr(0)=count                            '  save tally in (0)    num=count    If filler=true Then Goto _return    filler=true    Goto start    _return:    Return numEnd FunctionFunction tally overload(somestring As String,partstring As String) As Integer    Dim As Integer i,j,ln,lnp,count,num    ln=Len(somestring)    lnp=Len(partstring)    count=0    i=-1    Do        i+=1        If somestring[i] <> partstring Then continue do        If somestring[i] = partstring Then            For j=0 To lnp-1                 If somestring[j+i]<>partstring[j] then continue do            Next j        End If        count+=1        i=i+lnp-1    Loop Until i>=ln-1     Return countEnd Function' //=========== Trial =========== //Dim shared As String s:s="abababababab"For i As Long=1 To 20     s+=sNextprint "string length ";len(s)Redim As Integer arr()Dim As Double t1,t2print tally(s,"aba")," number of hits"print "now with array"t1=TimerVar num=tally(s,"aba",arr())t2=TimerPrint num," number of hits"Print t2-t1,"time taken"Printprint "positions of first 400 hits:"Dim As String commaFor i As Long=1 To 400  'the tally is also held in arr(0)    If i<arr(0) Then comma ="," Else comma =""    Print arr(i);comma;Next iPrintsleep  `
jimg
Posts: 24
Joined: Jan 16, 2020 19:43
Location: Oregon

### Re: Instr()

If I understood you correctly, then your instr is not the right thing to use. As I showed in my example, it will find a match at byte two, which is not on a 4-byte boundary. If everything is to be in units of four bytes, then it should be faster to do everything in four-byte integers (longs), rather than characters.
jimg
Posts: 24
Joined: Jan 16, 2020 19:43
Location: Oregon

### Re: Instr()

Using integers would look something like this. Still not terribly fast, but I don't see how you could run any faster using brute force code.
To get any more speed, you would have to do some kind of hashing. Perhaps make 256 dictionaries, where all words in a dictionary would start with the same letter, so you would have much less to search for duplicates.

Anyway, this works-

Code: Select all

`   Dim as string str1   Dim As LongInt a   Dim As Double tt,lt,st      Randomize 0  ' get same string for comparison while testing   str1 = space(10000000)   For a = 0 to 10000000-1     str1[a] = Int(Rnd*95+32)   ' get printable characters for testing, normally int( rnd * 256 )   Next   Mid(str1,5,4)="abcd"    ' ** for testing   Mid(str1,33,4)="abcd"   ' ** for testing, make fake duplicate to delete   Print "len(str1)=";Len(str1)   Print "str1="   Print Left(str1,80)   print   Dim As Long b = 0   Dim as string dict = ""   Dim as string n1   Dim As Long d = 1   Dim As ZString Ptr sptr,dptr   Dim As Long dcount   dict=Space(10000000)   sptr=StrPtr(str1)   dptr=StrPtr(dict)   'Print sptr,dptr      tt=Timer:lt=tt   dcount=0   n1=Left(str1,4)  ' always do first one   Mid(dict,1,4)=n1   sptr=sptr+4   dcount=1   For a = 1 to len(str1)/4      'n1 = mid( str1 , a , 4 )      Asm         mov eax,[sptr]  ' address of next 4byte item in str1 wanted         mov eax,[eax] 'pick up 4-byte string wanted         mov edx,[dptr]  ' start at beginning of Dict         mov ecx,[dcount]         lpstrt:            cmp eax,[edx]            je  dfound     ' found it, skip out and set up next one            add edx,4            dec ecx            jnz lpstrt  ' go try next one            mov [edx],eax  ' didn't find, save it in dict            mov ecx,[dcount]            inc ecx            mov [dcount],ecx         dfound:      End Asm      sptr=sptr+4            b=b+1      If b=20000 Then    ' take a peek at results every 20,000 items         lt=Timer-lt         If a<21000 Then  ' print out test val first time            Print "dict="            Print Left(dict,80)  ' test print            Print "check to see if second abcd is skipped in dict (col 32)"            print            'Sleep         EndIf         Print a;" dict count=";dcount;" t=";lt;" tt=";timer-tt         b=0         lt=Timer      EndIf   Next   Print "final count=";dcount   sleep   sleepEnd`