Insert/Replace String procedures

Post your FreeBASIC tips and tricks here. Please don’t post your code without including an explanation.
Munair
Posts: 836
Joined: Oct 19, 2017 15:00
Location: 't Zand, NL
Contact:

Insert/Replace String procedures

Postby Munair » Dec 08, 2018 20:48

While it seems simple and straight forward to replace specific occurrences of a string using the InStr() function, a loop can easily get stuck by repeating the same replacement again and again, possibly creating a string until memory runs out.

Here is example code of a Replace and ReplaceAll procedure. Also included are overloaded InsertStr() procedures that can be used to insert and insert/replace (part of) a string.

Code: Select all

sub InsertStr overload (byref s1 as string, byref s2 as const string, _
byval start as uinteger)
   ' in <s1> insert <s2> at <start>
   s1 = left(s1, start - 1) + s2 + mid(s1, start)
end sub

sub InsertStr overload (byref s1 as string, byref s2 as const string, _
byval start as uinteger, byval count as uinteger)
   ' in <s1> insert <s2> at <start> and replace <count> characters
   s1 = left(s1, start - 1) + s2 + mid(s1, start + count)
end sub

sub Replace(byref s1 as string, byref s2 as const string, _
byref s3 as const string)
   ' in <s1> replace <s2> by <s3>
   dim p as uinteger
   
   if s3 <> s2 then
      p = instr(s1, s2)
      if p then
         InsertStr(s1, s3, p, len(s2))
      end if
   end if
end sub

sub ReplaceAll(byref s1 as string, byref s2 as const string, _
byref s3 as const string)
   ' in <s1> replace all occurrences of <s2> by <s3>
   dim p as uinteger
   dim q as uinteger
   
   if s3 <> s2 then
      p = instr(s1, s2)
      if p then
         q = len(s3)
         if q = 0 then q = 1
         do
            InsertStr(s1, s3, p, len(s2))
            p = instr(p + q, s1, s2)
         loop until p = 0
      end if
   end if
end sub

dim s as string = "I want apples and oranges and pears and berries."
print s
Replace(s, " want ", " want to eat ")
print s
ReplaceAll(s, " and ", " or ")
print s
Last edited by Munair on Dec 09, 2018 21:37, edited 1 time in total.
jj2007
Posts: 1215
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Postby jj2007 » Dec 09, 2018 10:08

Hi Munair,

The idea is nice, and it works fine for small files. But I made a test with mshtmlc.bi (2.5 MB), and that became a little bit slow:

Code: Select all

Dim Content As String=LoadFile("mshtmlc.bi")
So I tried to create a faster version, and what I got is below. I spent the whole night fighting with FB syntax and compilation errors, and now it works to a certain degree but there is still a bug, clearly marked below. There is also a little assembly hack for which I need help - therefore 32-bit compilation, sorry. Grateful if you or somebody else could have a look to see what's wrong...

Code: Select all

see post of Dec 10, 2018 0:15 below
Last edited by jj2007 on Dec 10, 2018 0:01, edited 3 times in total.
Munair
Posts: 836
Joined: Oct 19, 2017 15:00
Location: 't Zand, NL
Contact:

Re: Insert/Replace String procedures

Postby Munair » Dec 09, 2018 11:49

I wouldn't use my procedures on large texts (files). I would probably use different buffers and append/insert them.
jj2007
Posts: 1215
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Postby jj2007 » Dec 09, 2018 12:39

Actually, I got it working now, see above. Speed is fine now, this is what I get for mshtmlc.bi replacing "declare" with "* DECLARE *":

Code: Select all

 23916 times declare
 0.0182 seconds for replace all
 2512349 bytes written
And I still need a hint how to avoid the 'acrobatics' shown above. There must be a simpler way...
grindstone
Posts: 644
Joined: May 05, 2015 5:35
Location: Germany

Re: Insert/Replace String procedures

Postby grindstone » Dec 09, 2018 12:59

@jj2007: You used the wrong value to calculate 'diff':

Code: Select all

Function ReplaceAllJ(ByRef s1 As String, ByRef s2 As Const String, ByRef s3 As Const String) As ZString Ptr
   ' in <s1> replace all occurrences of <s2> by <s3>
   Dim As Integer posx=0, posPrevious, ct=0, diff, l2=Len(s2), l3=Len(s3)
   Dim As Any Ptr posSrc
   Dim As Any Ptr posDest
   Dim p As UInteger
   Dim q As UInteger
   If s3 <> s2 Then
      diff=l3-l2
      Do
         posx=InStr(posx+1, s1, s2)
         ct+=diff
      Loop Until posx=0
      posSrc=StrPtr(s1)      ' pointer to source
      retStrLen=Len(s1)+diff
      retStr=Callocate(retStrLen+100000)   ' pointer to destination
      posDest=retStr
      posx=0
      Do         ' ########## the bug is in this loop!! ###########
         posPrevious=posx+1      ' set new start
         posx=InStr(posPrevious, s1, s2)
         If posx Then
            'diff=posx-posPrevious      ' bytes to copy from source     '<<<<<<<<<<<<<<<<<<<<<<<<<<
            Dim As Integer h1 = posSrc - Cast(UInteger, StrPtr(s1)) + 1 '<<<<<<<<<<<<<<<<<<<<<<<<<<
            diff=posx - h1      ' bytes to copy from source             '<<<<<<<<<<<<<<<<<<<<<<<<<<
            memcpy(posDest, posSrc, diff)   ' copy string between matches
            posDest+=diff      ' correct destination
            posSrc+=diff+l2      ' correct source position, including string to be replaced
            memcpy(posDest, StrPtr(s3), l3)   ' copy the replace string
            posDest+=l3      ' and correct the destination
         EndIf
      Loop Until posx=0
      ' Print "Rest=";retStrLen;"-";posPrevious;"=";retStrLen-posPrevious
      memcpy(posDest, posSrc, retStrLen-posPrevious)
   EndIf
   Print ct; " times ";s2
   Return retStr
End Function


EDIT: Ooops, I see you found the bug yourself in the meantime.
jj2007
Posts: 1215
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Postby jj2007 » Dec 09, 2018 13:27

Yes, my version posted above works just fine. Still, I need advice with that de-referencing problem ('acrobatics').
grindstone
Posts: 644
Joined: May 05, 2015 5:35
Location: Germany

Re: Insert/Replace String procedures

Postby grindstone » Dec 09, 2018 16:03

Maybe you should DIM 'ContentNew' as a ZString Ptr instead of a String Ptr, as well as the return value of ReplaceAllJ. Furthermore you're submitting different pointer types to SaveFilePtr (String Ptr as Any Ptr). You should only use ZString pointers when you're working with the WinAPI.
jj2007
Posts: 1215
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Postby jj2007 » Dec 09, 2018 23:15

Thanks, you brought me on the right track. FB is pretty confusing... here it is, no assembly code left, so it compiles also in 64-bit mode:

Code: Select all

#include "crt.bi"   ' *** compile as 32- or 64-bit ***
#include "Windows.bi"
Dim shared as zstring ptr retStr
#define testfile "mshtmlc.bi"   ' the real thing - over 2MB!
#define tempfile "TmpSaved.txt"
#define findstring "declare"   ' some bi files have lowercase declare
#define replstring "* DECLARE *"   ' for testing: a longer string

Function ReplaceAllJ(byref s1 as string, byref s2 as const string, byref s3 as const string) as zstring ptr
   ' in <s1> replace all occurrences of <s2> by <s3>
   dim as integer posx=0, posPrevious, ct=0, diff, l2=len(s2), l3=len(s3)
   dim as any ptr posSrc
   dim as any ptr posDest
   if s3 <> s2 then
      diff=l3-l2
    Do
       posx=Instr(posx+1, s1, s2)
       ct+=diff
    Loop until posx=0
    posSrc=StrPtr(s1)      ' pointer to source
    retStr=CAllocate(len(s1)+ct)      ' pointer to destination
    posDest=retStr
    Do         ' ########## innermost loop ###########
       posPrevious=posx+1      ' set new start
       posx=Instr(posPrevious, s1, s2)
       if posx then
          diff=posx-posPrevious      ' bytes to copy from source
          memcpy(posDest, posSrc, diff)   ' copy string between matches
          posDest+=diff      ' correct destination
          posSrc+=diff+l2      ' correct source position, including string to be replaced
          memcpy(posDest, StrPtr(s3), l3)   ' copy the replace string
          posDest+=l3      ' and correct the destination
          posx+=l2-1      ' same for source
       endif
   Loop until posx=0
   diff=len(s1)-posPrevious+1      ' bytes to copy from source
   ' Print "Rest=";len(s1);"-";posPrevious;"=";diff
   memcpy(posDest, posSrc, diff)
   endif
   print ct; " times ";s2
   return retStr
end function

Function LoadFile(ByRef filename As String) As String   
  Dim h As Integer=FreeFile
  Dim txt As String=""
  If Open(filename For Binary Access Read As #h)=0 Then
   If Lof(h) Then
      txt = String(Lof(h), 0)
      If Get(#h, 0, txt) Then txt = ""
   End If
   Close #h
  endif
  Return txt
End Function

Sub SaveFilePtr(ByRef filename As String, ByRef source As zstring ptr)
  Dim written As integer
  Dim h As handle=CreateFile(filename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)
  If h<>INVALID_HANDLE_VALUE Then
   WriteFile(h, source, strlen(source), @written, 0)
   CloseHandle(h)
   print written;" bytes written"
  endif
End Sub

Dim Content As String=LoadFile(testfile)
Dim as double t=timer
Dim ContentNew as zstring ptr=ReplaceAllJ(Content, findstring, replstring)
print using "##.#### seconds for replace all"; timer-t
SaveFilePtr(tempfile, ContentNew)
sleep

Typical results:

Code: Select all

 23916 times declare
 0.0170 seconds for replace all
 2512349 bytes written

The 64-bit version is about 20% faster. My assembly version is a factor 3-4 faster, though.
dodicat
Posts: 5913
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Insert/Replace String procedures

Postby dodicat » Dec 10, 2018 0:30

This version of replace is extremely fast jj2007, and easy to use, (skipping all the defines and skipping include windows.bi).
thank you.
Munair
Posts: 836
Joined: Oct 19, 2017 15:00
Location: 't Zand, NL
Contact:

Re: Insert/Replace String procedures

Postby Munair » Dec 10, 2018 5:52

Yes, the SaveFilePtr stuff seems to be Windows only.
sancho3
Posts: 358
Joined: Sep 30, 2017 3:22

Re: Insert/Replace String procedures

Postby sancho3 » Dec 10, 2018 7:25

@jj2007
I haven't run the code (because I'm on Linux tonight) but I have some questions.
Firstly something seems strange here:
You declare a global variable retStr and then in the replace function you actually return that variable. That seems odd to me.
In your test code you store the returned value in a pointer. contentNew. Isn't that pointer now just pointing to the same memory as the global retStr?
So could you not just eliminate the function, use a sub and end up in the same place? contentNew and retStr will be the exact same reference won't they?
Alternatively maybe move retStr into the procedure. It doesn't need to be global.

Doesn't the compiler create a copy of the global variable because its used in the line 'return retStr'? Its a pointer so not exactly slowing things down but it does seem like one unnecessary copy.

Should you be deallocating the memory you create to build retStr? I'm not sure how that works when dealing with strings.
I think that the allocated memory is lost for the life of the program and successive calls to replaceAll would eat up more memory in the same fashion.
jj2007
Posts: 1215
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Postby jj2007 » Dec 10, 2018 8:51

@dodicat: Thanks ;-)

@sancho3: Valid points, of course. I've struggled a lot to get it running, FB is very opaque and confusing IMHO, but if you can improve it, please do! I chose WriteFile not because I am fond of Windows but because I couldn't get Put #n to work. At a certain point I simply gave up fighting with the compler. Just to give you an idea how it could be handled, from a user perspective - these two lines do the same as the whole code posted above, in one third of the time:

Code: Select all

  Let esi=FileRead$("mshtmlc.bi")
  FileWrite "TmpFile.txt", Cat$(Replace$(esi, "declare", "* DECLARE *"))

Btw, no, the compiler does not create a copy of retStr when performing the return retStr. I've checked that in the disassemblies for both the 32- and 64-bit Gcc versions. So there shouldn't be a memleak. But you are right that there might be more elegant ways, and that that memory must obviously be deallocated.
sancho3
Posts: 358
Joined: Sep 30, 2017 3:22

Re: Insert/Replace String procedures

Postby sancho3 » Dec 12, 2018 4:29

In my opinion the tricky part is in the fact that you can't deallocate the memory until after the user has gotten and dealt with the result string. There is no way for the function to know when that happens.
So do we devise a running list of pointers to new memory each time replace is called? I don't like that idea.
My plan is to turn retStr into a static variable within the function. If the pointer <> 0 then it has been allocated prior to this round and can be deallocated. The theory being that the user is finished with that string before calling replace again.
I changed very little in your code. I did have a problem that it didn't seem to be working and so I added a bunch of comments to debug. It turns out that all of my 'declares' in my test file were mixed case ('Declare') and instr wasn't finding them.
That is an issue that needs to be looked at in a proper replaceall function, but it means customizing an instr routine. Too much for me for now.
There are 3 changes in this version noted by ***

Code: Select all

#include "crt.bi"   ' *** compile as 32- or 64-bit ***
#include "Windows.bi"
'Dim shared as zstring ptr retStr   ***changed
#define testfile "mshtmlc.bi"   ' the real thing - over 2MB!
#define tempfile "TmpSaved.txt"
#define findstring "declare"   ' some bi files have lowercase declare
#define replstring "! DECLARE !"   ' for testing: a longer string

Function ReplaceAllJ(byref s1 as string, byref s2 as const string, byref s3 as const string) as zstring ptr
   ' in <s1> replace all occurrences of <s2> by <s3>

   Static As ZString Ptr retStr = 0    ' *** added

   dim as integer posx=0, posPrevious, ct=0, diff, l2=len(s2), l3=len(s3)
   dim as any ptr posSrc
   dim as any ptr posDest
   if s3 <> s2 then
      
      ' this block adds to ct the difference in length of strings s3 and s2 each time s2 is found in s1
      ' this is done to accurately allocate enough room to allow for replacing instances of s2 with s3 if they are not the same size
      diff=l3-l2                        
       Do
          posx=Instr(posx+1, s1, s2)
          ct+=diff
       Loop until posx=0

      ' at this point posx = 0
       posSrc=StrPtr(s1)      ' pointer to source

       If retStr <> 0 Then Deallocate(retStr)      ' *** added

      ' allocate enough mem to allow for replacements
      retStr=CAllocate(len(s1)+ct)      ' pointer to destination
   
      posDest=retStr      ' set posDest to the beginning of the new memory      

      Do         ' ########## innermost loop ###########
         posPrevious=posx+1      ' set new start (posPrevious = 1 at first)

         ' find the first instance of s2 in s1 starting from position posx
         posx=Instr(posPrevious, s1, s2)
         if posx <> 0 Then
            diff=posx-posPrevious      ' bytes to copy from source
            memcpy(posDest, posSrc, diff)   ' copy string between matches
            posDest+=diff      ' correct destination (move the destination pointer to just past the copied string)
            posSrc+=diff+l2      ' correct source position, including string to be replaced (move the source to just past the found s2)
            memcpy(posDest, StrPtr(s3), l3)   ' copy the replace string
            posDest+=l3      ' and correct the destination
            posx+=l2-1      ' same for source
         endif
      Loop until posx=0

   diff=len(s1)-posPrevious+1      ' bytes to copy from source
   ' Print "Rest=";len(s1);"-";posPrevious;"=";diff
   memcpy(posDest, posSrc, diff)
   endif
   print ct; " times ";s2
   return retStr
end function

Function LoadFile(ByRef filename As String) As String   
  Dim h As Integer=FreeFile
  Dim txt As String=""
  If Open(filename For Binary Access Read As #h)=0 Then
   If Lof(h) Then
      txt = String(Lof(h), 0)
      If Get(#h, 0, txt) Then txt = ""
   End If
   Close #h
  EndIf
  Return txt
End Function

Sub SaveFilePtr(ByRef filename As String, ByRef source As zstring ptr)
  Dim written As integer
  Dim h As handle=CreateFile(filename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)
  If h<>INVALID_HANDLE_VALUE Then
   WriteFile(h, source, strlen(source), @written, 0)
   CloseHandle(h)
   print written;" bytes written"
  endif
End Sub

Dim Content As String=LoadFile(testfile)
Dim as double t=timer
Dim ContentNew as zstring ptr=ReplaceAllJ(Content, findstring, replstring)
print using "##.#### seconds for replace all"; timer-t
SaveFilePtr(tempfile, ContentNew)
sleep

I noted your loop to find all occurrences of the string to be replaced in order to size retStr. That slows the process down for sure.
That could probably be replaced by a chunk strategy that first allocates to the size of the original string then adds chunks as needed. I will leave that as well for another time.
In any case your code is very, very fast. Nice work.
jj2007
Posts: 1215
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Postby jj2007 » Dec 12, 2018 11:49

sancho3 wrote:I noted your loop to find all occurrences of the string to be replaced in order to size retStr. That slows the process down for sure. That could probably be replaced by a chunk strategy that first allocates to the size of the original string then adds chunks as needed.
That is possible, not even too difficult, but keep in mind that HeapAlloc & friends cost thousands of cycles. Still, you are right that starting with an oversized allocation and trimming it afterwards once could be a good strategy.

In any case your code is very, very fast. Nice work.
Thanks - it's a port to FB of my Replace$() routine which, btw, has options for case-insensitive and full word search.
WQ1980
Posts: 41
Joined: Sep 25, 2015 12:04
Location: Russia

Re: Insert/Replace String procedures

Postby WQ1980 » Dec 13, 2018 11:30

Code: Select all

Dim Content As String
Dim ContentNew As ZString Ptr
Content="D:\Desktop\0.bmp"

ContentNew=ReplaceAllJ(Content, "0", "1")
?*ContentNew '' -> D:\Desktop\1.bmp#Щkи6


returns a damaged string
if len(searchstring) = len(replacestring) ...

Return to “Tips and Tricks”

Who is online

Users browsing this forum: No registered users and 31 guests