Search binary bytes in the Big arrays

General FreeBASIC programming questions.
VANYA
Posts: 1370
Joined: Oct 24, 2010 15:16
Location: Ярославль
Contact:

Search binary bytes in the Big arrays

Postby VANYA » Jul 08, 2015 4:37

Hi All!

I need a example of a quick search in a huge array of bytes, having any of the characters (even 0). I found one example, but it is very slow. Can anyone have a quick solution?

Code: Select all

'/* Program for Bad Character Heuristic of Boyer Moore String Matching Algorithm */

#include "crt.bi"

#Define NO_OF_CHARS 256

'// A utility function to get maximum of two integers
function max (a As integer, b As integer) As Integer
   return IIf(a > b, a, b)
End Function

'// The preprocessing function for Boyer Moore's bad character heuristic
sub badCharHeuristic( str_ As Ubyte Ptr , size As integer, badchar() As Integer)

   Dim As Integer i

   '// Initialize all occurrences as -1
   for i = 0 To NO_OF_CHARS-1
      badchar(i) = -1
   Next


   '// Fill the actual value of last occurrence of a character
   for i = 0 To size-1
      badchar(str_[i]) = i
   Next

End Sub


'/* A pattern searching function that uses Bad Character Heuristic of
'   Boyer Moore Algorithm */
function search( txt As UByte ptr,txtLen As integer , pat As Ubyte Ptr, patLen As integer) As Integer

   Dim As Integer m = patLen
   Dim As Integer n = txtLen

   Dim As Integer badchar(NO_OF_CHARS)

   '/* Fill the bad character array by calling the preprocessing
   '   function badCharHeuristic() for given pattern */
   badCharHeuristic(pat, m, badchar())

   Dim As Integer s = 0  '// s is shift of the pattern with respect to text

   while(s <= (n - m))

      Dim As Integer j = m-1

      '/* Keep reducing index j of pattern while characters of
      '   pattern and text are matching at this shift s */
      While (j >= 0 and (pat[j] = txt[s+j]))
         j-=1
      Wend

      '/* If the pattern is present at current shift, then index j
      '   will become -1 after the above loop */
      if (j < 0) Then
         
         Return s+1
         '/* Shift the pattern so that the next character in text
         '   aligns with the last occurrence of it in pattern.
         '   The condition s+m < n is necessary for the case when
         '   pattern occurs at the end of text */
         s += IIF((s+m) < n, m-badchar(txt[s+m]) , 1)

      else
         '/* Shift the pattern so that the bad character in text
         '   aligns with the last occurrence of it in pattern. The
         '   max function is used to make sure that we get a positive
         '   shift. We may get a negative shift if the last occurrence
         '   of bad character in pattern is on the right side of the
         '   current character. */
         s += max(1, j - badchar(txt[s+j]))

      EndIf
   Wend
End Function


Dim Shared As UByte ub(&h6fffffff)
ub(&h6fffffff) = 65
ub(&h6ffffffE) = 65
ub(&h6ffffffD) = 65
ub(&h6ffffffC) = 65
ub(&h6ffffffB) = 65
ub(&h6ffffffA) = 65

Dim As String sz = "AAAAAA"
Dim As UBYTE Ptr szPat = SAdd(sz)

Var t = Timer
? "Position=";search(@ub(0),&h6fffffff+1, szpat,2)
? "Time =";Timer -t
Sleep
VANYA
Posts: 1370
Joined: Oct 24, 2010 15:16
Location: Ярославль
Contact:

Re: Search binary bytes in the Big arrays

Postby VANYA » Jul 08, 2015 4:52

Even a simple enumeration is faster:

Code: Select all

function search( txt As UByte ptr,txtLen As integer , pat As Ubyte Ptr, patLen As integer) As Integer
   For i As Integer = 0 To txtLen
      If txt[i] = pat[0] Then
         For j As Integer = 1 To patLen
            If txt[i+j] <> pat[j] Then
               Return 0
            EndIf
         Next
         Return i           
      EndIf
   Next
End Function


Dim Shared As UByte ub(&h6fffffff)
ub(&h6fffffff) = 65
ub(&h6ffffffE) = 65
ub(&h6ffffffD) = 65
ub(&h6ffffffC) = 65
ub(&h6ffffffB) = 65
ub(&h6ffffffA) = 65

Dim As String sz = "AAAAAA"
Dim As UBYTE Ptr szPat = SAdd(sz)

Var t = Timer
? "Position=";search(@ub(0),&h6fffffff+1, szpat,2)
? "Time =";Timer -t
Sleep


Added last:

Code bit fixed:

Code: Select all

function search( txt As UByte ptr,txtLen As integer , pat As Ubyte Ptr, patLen As integer) As Integer
   Dim As Integer Nofind
   If txt = 0 or pat = 0 Or txtLen = 0 Or patLen = 0 Or txtLen<patLen Then Return 0
   For i As Integer = 0 To txtLen-1
      If txt[i] = pat[0] Then
         For j As Integer = 1 To patLen-1
            If txt[i+j] <> pat[j] Then
               Nofind = 1
               Exit For            
            EndIf
         Next
         If Nofind = 0 Then
            Return i+1
         Else
            Nofind = 0
         EndIf                   
      EndIf
   Next
End Function


Dim Shared As UByte ub(&h6fffffff)
ub(&h6fffffff) = 65
ub(&h6ffffffE) = 65
ub(&h6ffffffD) = 65
ub(&h6ffffffC) = 65
ub(&h6ffffffB) = 65
ub(&h6ffffffA) = 65

Dim As String sz = "AAAAAA"
Dim As UBYTE Ptr szPat = SAdd(sz)

Var t = Timer
? "Position=";search(@ub(0),&h6fffffff+1, szpat,6)
? "Time =";Timer -t
Sleep
grindstone
Posts: 726
Joined: May 05, 2015 5:35
Location: Germany

Re: Search binary bytes in the Big arrays

Postby grindstone » Jul 08, 2015 23:52

Hi VANYA!

With a simple trick you can use the 'InStr' - function to search for those byte patterns. It's about 5 times faster than your Boyer Moore search function.

Code: Select all

Type tStrDescr
   txtPtr As UByte Ptr
   txtLen As UInteger
   txtMem As UInteger
End Type

Dim Shared As UByte ub(&h6fffffff)

ub(&h6fffffff) = 65
ub(&h6ffffffE) = 65
ub(&h6ffffffD) = 65
ub(&h6ffffffC) = 65
ub(&h6ffffffB) = 65
ub(&h6ffffffA) = 65

Dim As String sz = "AAAAAA"

'make FB to think of the array as a string
Dim As String dummyString = "" 'create string descriptor
Dim As tStrDescr Ptr dummyStringPtr = Cast(tStrDescr Ptr,@dummyString) 'make the string descriptor of 'dummyString' accessable
dummyStringPtr->txtPtr = Cast(UByte Ptr,@ub(0)) 'set the text pointer of 'dummyString' to the array
dummyStringPtr->txtLen = &h6fffffff+1 'set the text length

Var t = Timer
? "Position="; InStr(dummyString,sz)
? "Time =";Timer -t
?

Sleep


Regards
grindstone
VANYA
Posts: 1370
Joined: Oct 24, 2010 15:16
Location: Ярославль
Contact:

Re: Search binary bytes in the Big arrays

Postby VANYA » Jul 09, 2015 3:49

Wow! Looking for a fast, thanks for the advice!
caseih
Posts: 1504
Joined: Feb 26, 2007 5:32

Re: Search binary bytes in the Big arrays

Postby caseih » Jul 09, 2015 4:49

Very cool hack. Does the fact that you're overwriting the string's descriptor information cause any leaks? Will the runtime library attempt to free the ub memory inappropriately when the dummyString goes out of scope? Maybe it would be a good idea to save the string descriptor information and then restore it after your operation is done, just so things don't leak or get freed at the wrong time?
grindstone
Posts: 726
Joined: May 05, 2015 5:35
Location: Germany

Re: Search binary bytes in the Big arrays

Postby grindstone » Jul 09, 2015 9:00

@VANYA: I'm pleased you like my little snippet.

@caseih: Your concerns are reasonable! There shouldn't be performed any write operations on the dummy string while the pointer is redirected (at least unless you know exactly what you're doing). Especially any operation that changes the length of the string should be avoided. To ensure safety the original pointer and length value should be memorized and restored after the search is done.

Regards
grindstone
fxm
Posts: 9700
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Search binary bytes in the Big arrays

Postby fxm » Jul 09, 2015 11:59

I often use this hacking.
For an initially empty string, reset to 0 the descriptor fields before that the string goes out of scope is sufficient for a safe termination:

Code: Select all

Type tStrDescr
    txtPtr As UByte Ptr = 0
    txtLen As UInteger = 0
    txtMem As UInteger = 0
End Type

.....

Sleep
dummyStringPtr->Constructor()
fxm
Posts: 9700
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Search binary bytes in the Big arrays

Postby fxm » Jul 09, 2015 12:19

Perhaps a little better formulated:

Code: Select all

Type tStrDescr
    Declare Constructor (Byval address As Ubyte Ptr, Byval size As uinteger)
    txtPtr As UByte Ptr
    txtLen As UInteger
    txtMem As UInteger 'unused
End Type
Constructor tStrDescr (Byval address As Ubyte Ptr, Byval size As uinteger)
    This.txtPtr = address
    This.txtLen = size
End Constructor

Dim Shared As UByte ub(&h6fffffff)

ub(&h6fffffff) = 65
ub(&h6ffffffE) = 65
ub(&h6ffffffD) = 65
ub(&h6ffffffC) = 65
ub(&h6ffffffB) = 65
ub(&h6ffffffA) = 65

Dim As String sz = "AAAAAA"

'make FB to think of the array as a string
Dim As String dummyString = "" 'create string descriptor
'construct a tStrDescr object at memory address of the dummyString descriptor
Dim As tStrDescr Ptr dummyStringPtr = New(Cast(Any Ptr, @dummyString)) _
                                         tstrDescr(@ub(Lbound(ub)), Ubound(ub)-Lbound(ub)+1)

Var t = Timer
? "Position="; InStr(dummyString,sz)
? "Time =";Timer -t
?

Sleep
dummyStringPtr->Constructor(0, 0)
caseih
Posts: 1504
Joined: Feb 26, 2007 5:32

Re: Search binary bytes in the Big arrays

Postby caseih » Jul 10, 2015 4:32

No trying to be pedantic here, but it appears that your code snippet leaks the new tstrDescr object that's assigned to dummyStringPtr. Am I correct in this? EDIT. No it appears I'm not correct. The syntax of the New() line threw me off.

Here's an attempt at making this hack more self-contained and more of a "normal" type:

Code: Select all

' Would be in the .bi file
type tStrDescr
    str_ptr As UByte Ptr
    str_len As UInteger
    str_mem As UInteger 'unused
end type

type BufferString
private:
    temp_string as string
    temp_string_desc as tStrDescr ptr

public:
    declare constructor(buffer as any ptr, length as uinteger)
    declare destructor()

    declare operator cast() byref as string

end type

declare operator len (byref bs as BufferString) as uinteger

' would be in the implementation .bas file
constructor BufferString(buffer as any ptr, length as uinteger)
    'temp_string is already empty
    temp_string_desc = Cast (any ptr, @temp_string)

    temp_string_desc->str_ptr = buffer
    temp_string_desc->str_len = length

    print len(temp_string)

end constructor

destructor BufferString()
    temp_string_desc->str_ptr = 0
    temp_string_desc->str_len = 0
end destructor

operator BufferString.cast() byref as string
    return temp_string
end operator

operator len (byref bs as BufferString) as uinteger
    return len(cast(string, bs))
end operator

'testing here
dim a(0 to 49) as ubyte

a(38) = 65
a(39) = 65
a(40) = 65
a(49) = 66

dim b as BufferString = BufferString(@a(0),50)
print len(cast(string,b)) 'normal built-in len
print len(b) 'overridden custom len
print cast(string,b) 'explicit cast
print b  'implicit cast

print instr(b, "AAA")


Ends up keeping the string descriptor UDT outside of the class, so that the class can encapsulate the logic of working with the dummy string, which is passed out by reference for use by any function that wants a string.
Last edited by caseih on Jul 10, 2015 16:45, edited 1 time in total.
fxm
Posts: 9700
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Search binary bytes in the Big arrays

Postby fxm » Jul 10, 2015 7:08

caseih wrote:No trying to be pedantic here, but it appears that your code snippet leaks the new tstrDescr object that's assigned to dummyStringPtr. Am I correct in this? EDIT. No it appears I'm not correct. The syntax of the New() line threw me off.

I have used the operator Operator Placement New which constructs an object at a specified memory address, without any specific memory allocation.
The only memory (for the descriptor) is allocated by 'Dim As String dummyString = ""'.
The constructor only initializes the values of the descriptor.
At end, there is not operator 'Delete()' to call, and the descriptor memory will be deallocated when 'dummyString' will go out the scope.

Remark about your above code:
'dim a(50)' is equivalent to 'dim a(0 to 50)' so the array has 51 elements from a(0) to a(50).
grindstone
Posts: 726
Joined: May 05, 2015 5:35
Location: Germany

Re: Search binary bytes in the Big arrays

Postby grindstone » Jul 10, 2015 9:35

Hi guys!

I'm using this hack for several purposes, just because strings are easy to handle, but I've never thought of making an object of it. That's a very interesting idea, maybe in this way it can be used to do more complex operations.

But for now - thanks to your inspiration - I walked the other direction and managed to do it without an auxiliary structure, accessing the string descriptor immediately, with only 3 lines of code:

Code: Select all

Dim Shared As UByte ub(&h6fffffff)

ub(&h6fffffff) = 65
ub(&h6ffffffE) = 65
ub(&h6ffffffD) = 65
ub(&h6ffffffC) = 65
ub(&h6ffffffB) = 65
ub(&h6ffffffA) = 65

Dim As String sz = "AAAAAA"

''make FB to think of the array as a string
Dim As String dummyString = "" 'create string descriptor
*Cast(UInteger Ptr,@dummyString) = Cast(UInteger,@ub(0)) 'set the pointer
*(Cast(UInteger Ptr,@dummyString)+1) = &h6ffffffff+1 'set the length value

Var t = Timer
? "Position="; InStr(dummyString,sz)
? "Time =";Timer -t
?

Sleep


Regards
grindstone

EDIT: I inserted the missing f (see below) so the array size is correct now.
Last edited by grindstone on Jul 11, 2015 8:41, edited 1 time in total.
dodicat
Posts: 6547
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Search binary bytes in the Big arrays

Postby dodicat » Jul 10, 2015 12:06

I took me ages to discover why grindstone's second offering was so fast, the array has been shortened by an f.
The final positions were way out, that should have been a clue right enough, but I had to fudge yet another way, just as well, or I would be sitting here wasting more time.

Nevertheless, good idea Grindstone, changing a ubyte array into a string.
Here was my own fudge:

Code: Select all


Dim Shared As UByte ub(&h6ffffff) 'Shortened from original by an f
ub(&h6ffffff) = 65
ub(&h6fffffE) = 65
ub(&h6fffffD) = 65
ub(&h6fffffC) = 65
ub(&h6fffffB) = 65
ub(&h6fffffA) = 65

dim as double start=timer
'get rid of 0's (for a while)
for n as integer=0 to ubound(ub)
if ub(n)=0 then ub(n)=-1
next n

Dim As String sz = "AAAAAA"
dim as zstring ptr w=cptr(zstring ptr,@ub(0))

print instr(*w,sz)
print ubound(ub)-lbound(ub)+1 -5
'restore array
for n as integer=0 to ubound(ub)
if ub(n)=-1 then ub(n)=0
next n
print "Restored, time taken ";timer-start
sleep
Roland Chastain
Posts: 859
Joined: Nov 24, 2011 19:49
Location: France
Contact:

Re: Search binary bytes in the Big arrays

Postby Roland Chastain » Jul 10, 2015 13:31

@dodicat

Interesting example. But the ubyte cannot be set to -1!

Code: Select all

dim ub as ubyte = -1
print(ub) ' 255
print(ub = -1) ' 0
caseih
Posts: 1504
Joined: Feb 26, 2007 5:32

Re: Search binary bytes in the Big arrays

Postby caseih » Jul 10, 2015 13:50

Roland Chastain wrote:Interesting example. But the ubyte cannot be set to -1!

Sure it can. 255 and -1 are the same value. In fact setting unsigned values to -1 is a great way to set them to their max value without having to explicitly deal with the size of the type. -1 works for 8, 16, 32, or 64 bits. That's the beauty of two's complement.

grindstone wrote:But for now - thanks to your inspiration - I walked the other direction and managed to do it without an auxiliary structure, accessing the string descriptor immediately, with only 3 lines of code:

Yes but unless you reset the descriptor back to zero pointer and zero length, when the string goes out of scope, the fB runtime will free the memory associated with your array, which almost certainly isn't what you want. Especially when FB will also try to free the array when it goes out of scope. This would end up in a double free() situation, which will cause a crash. That's the reason for having the object-oriented wrapper to make sure that cleanup is correct.

fxm wrote:'dim a(50)' is equivalent to 'dim a(0 to 50)' so the array has 51 elements from a(0) to a(50).

Ahh yes true enough. You can tell I'm a C programmer.
grindstone
Posts: 726
Joined: May 05, 2015 5:35
Location: Germany

Re: Search binary bytes in the Big arrays

Postby grindstone » Jul 11, 2015 8:35

dodicat wrote:I took me ages to discover why grindstone's second offering was so fast, the array has been shortened by an f.
I confess myself guilty. At the moment I have not enough memory (only 2GB) to run the programme with its original array size, so I deleted one f. Alas, I forgot to reinsert it before posting the code. I apologise for the inconvenience.

Regards
grindstone

Return to “General”

Who is online

Users browsing this forum: No registered users and 10 guests