Insert/Replace String procedures

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
Munair
Posts: 1286
Joined: Oct 19, 2017 15:00
Location: Netherlands
Contact:

Insert/Replace String procedures

Post by Munair »

While it seems simple and straight forward to replace specific occurrences of a string using the InStr() function, a loop can easily get stuck by repeating the same replacement again and again, possibly creating a string until memory runs out.

Here is example code of a Replace and ReplaceAll procedure. Also included are overloaded InsertStr() procedures that can be used to insert and insert/replace (part of) a string.

Code: Select all

sub InsertStr overload (byref s1 as string, byref s2 as const string, _
byval start as uinteger)
	' in <s1> insert <s2> at <start>
	s1 = left(s1, start - 1) + s2 + mid(s1, start)
end sub

sub InsertStr overload (byref s1 as string, byref s2 as const string, _
byval start as uinteger, byval count as uinteger)
	' in <s1> insert <s2> at <start> and replace <count> characters
	s1 = left(s1, start - 1) + s2 + mid(s1, start + count)
end sub

sub Replace(byref s1 as string, byref s2 as const string, _
byref s3 as const string)
	' in <s1> replace <s2> by <s3>
	dim p as uinteger
	
	if s3 <> s2 then
		p = instr(s1, s2)
		if p then
			InsertStr(s1, s3, p, len(s2))
		end if
	end if
end sub

sub ReplaceAll(byref s1 as string, byref s2 as const string, _
byref s3 as const string)
	' in <s1> replace all occurrences of <s2> by <s3>
	dim p as uinteger
	dim q as uinteger
	
	if s3 <> s2 then
		p = instr(s1, s2)
		if p then
			q = len(s3)
			if q = 0 then q = 1
			do
				InsertStr(s1, s3, p, len(s2))
				p = instr(p + q, s1, s2)
			loop until p = 0
		end if
	end if
end sub

dim s as string = "I want apples and oranges and pears and berries."
print s
Replace(s, " want ", " want to eat ")
print s
ReplaceAll(s, " and ", " or ")
print s
Last edited by Munair on Dec 09, 2018 21:37, edited 1 time in total.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Post by jj2007 »

Hi Munair,

The idea is nice, and it works fine for small files. But I made a test with mshtmlc.bi (2.5 MB), and that became a little bit slow:

Code: Select all

Dim Content As String=LoadFile("mshtmlc.bi")
So I tried to create a faster version, and what I got is below. I spent the whole night fighting with FB syntax and compilation errors, and now it works to a certain degree but there is still a bug, clearly marked below. There is also a little assembly hack for which I need help - therefore 32-bit compilation, sorry. Grateful if you or somebody else could have a look to see what's wrong...

Code: Select all

see post of Dec 10, 2018 0:15 below
Last edited by jj2007 on Dec 10, 2018 0:01, edited 3 times in total.
Munair
Posts: 1286
Joined: Oct 19, 2017 15:00
Location: Netherlands
Contact:

Re: Insert/Replace String procedures

Post by Munair »

I wouldn't use my procedures on large texts (files). I would probably use different buffers and append/insert them.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Post by jj2007 »

Actually, I got it working now, see above. Speed is fine now, this is what I get for mshtmlc.bi replacing "declare" with "* DECLARE *":

Code: Select all

 23916 times declare
 0.0182 seconds for replace all
 2512349 bytes written
And I still need a hint how to avoid the 'acrobatics' shown above. There must be a simpler way...
grindstone
Posts: 862
Joined: May 05, 2015 5:35
Location: Germany

Re: Insert/Replace String procedures

Post by grindstone »

@jj2007: You used the wrong value to calculate 'diff':

Code: Select all

Function ReplaceAllJ(ByRef s1 As String, ByRef s2 As Const String, ByRef s3 As Const String) As ZString Ptr
	' in <s1> replace all occurrences of <s2> by <s3>
	Dim As Integer posx=0, posPrevious, ct=0, diff, l2=Len(s2), l3=Len(s3)
	Dim As Any Ptr posSrc
	Dim As Any Ptr posDest
	Dim p As UInteger
	Dim q As UInteger
	If s3 <> s2 Then
		diff=l3-l2
		Do
			posx=InStr(posx+1, s1, s2)
			ct+=diff
		Loop Until posx=0
		posSrc=StrPtr(s1)      ' pointer to source
		retStrLen=Len(s1)+diff
		retStr=Callocate(retStrLen+100000)   ' pointer to destination
		posDest=retStr
		posx=0
		Do         ' ########## the bug is in this loop!! ###########
			posPrevious=posx+1      ' set new start
			posx=InStr(posPrevious, s1, s2)
			If posx Then
				'diff=posx-posPrevious      ' bytes to copy from source     '<<<<<<<<<<<<<<<<<<<<<<<<<<
				Dim As Integer h1 = posSrc - Cast(UInteger, StrPtr(s1)) + 1 '<<<<<<<<<<<<<<<<<<<<<<<<<<
				diff=posx - h1      ' bytes to copy from source             '<<<<<<<<<<<<<<<<<<<<<<<<<<
				memcpy(posDest, posSrc, diff)   ' copy string between matches
				posDest+=diff      ' correct destination
				posSrc+=diff+l2      ' correct source position, including string to be replaced
				memcpy(posDest, StrPtr(s3), l3)   ' copy the replace string
				posDest+=l3      ' and correct the destination
			EndIf
		Loop Until posx=0
		' Print "Rest=";retStrLen;"-";posPrevious;"=";retStrLen-posPrevious
		memcpy(posDest, posSrc, retStrLen-posPrevious)
	EndIf
	Print ct; " times ";s2
	Return retStr
End Function
EDIT: Ooops, I see you found the bug yourself in the meantime.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Post by jj2007 »

Yes, my version posted above works just fine. Still, I need advice with that de-referencing problem ('acrobatics').
grindstone
Posts: 862
Joined: May 05, 2015 5:35
Location: Germany

Re: Insert/Replace String procedures

Post by grindstone »

Maybe you should DIM 'ContentNew' as a ZString Ptr instead of a String Ptr, as well as the return value of ReplaceAllJ. Furthermore you're submitting different pointer types to SaveFilePtr (String Ptr as Any Ptr). You should only use ZString pointers when you're working with the WinAPI.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Post by jj2007 »

Thanks, you brought me on the right track. FB is pretty confusing... here it is, no assembly code left, so it compiles also in 64-bit mode:

Code: Select all

#include "crt.bi"	' *** compile as 32- or 64-bit ***
#include "Windows.bi"
Dim shared as zstring ptr retStr
#define testfile "mshtmlc.bi"	' the real thing - over 2MB!
#define tempfile "TmpSaved.txt"
#define findstring "declare"	' some bi files have lowercase declare
#define replstring "* DECLARE *"	' for testing: a longer string

Function ReplaceAllJ(byref s1 as string, byref s2 as const string, byref s3 as const string) as zstring ptr
   ' in <s1> replace all occurrences of <s2> by <s3>
   dim as integer posx=0, posPrevious, ct=0, diff, l2=len(s2), l3=len(s3)
   dim as any ptr posSrc
   dim as any ptr posDest
   if s3 <> s2 then
   	diff=l3-l2
	 Do
	 	posx=Instr(posx+1, s1, s2)
	 	ct+=diff
	 Loop until posx=0
	 posSrc=StrPtr(s1)		' pointer to source
	 retStr=CAllocate(len(s1)+ct)		' pointer to destination
	 posDest=retStr
	 Do			' ########## innermost loop ###########
	 	posPrevious=posx+1		' set new start
	 	posx=Instr(posPrevious, s1, s2)
	 	if posx then
	 		diff=posx-posPrevious		' bytes to copy from source
	 		memcpy(posDest, posSrc, diff)	' copy string between matches
	 		posDest+=diff		' correct destination
	 		posSrc+=diff+l2		' correct source position, including string to be replaced
	 		memcpy(posDest, StrPtr(s3), l3)	' copy the replace string
	 		posDest+=l3		' and correct the destination
	 		posx+=l2-1		' same for source
	 	endif
	Loop until posx=0
	diff=len(s1)-posPrevious+1		' bytes to copy from source
	' Print "Rest=";len(s1);"-";posPrevious;"=";diff
	memcpy(posDest, posSrc, diff)
   endif
   print ct; " times ";s2
   return retStr
end function

Function LoadFile(ByRef filename As String) As String    
  Dim h As Integer=FreeFile
  Dim txt As String=""
  If Open(filename For Binary Access Read As #h)=0 Then
	If Lof(h) Then
		txt = String(Lof(h), 0)
		If Get(#h, 0, txt) Then txt = ""
	End If
	Close #h
  endif
  Return txt
End Function

Sub SaveFilePtr(ByRef filename As String, ByRef source As zstring ptr)
  Dim written As integer
  Dim h As handle=CreateFile(filename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)
  If h<>INVALID_HANDLE_VALUE Then
	WriteFile(h, source, strlen(source), @written, 0)
	CloseHandle(h)
	print written;" bytes written"
  endif
End Sub

Dim Content As String=LoadFile(testfile)
Dim as double t=timer
Dim ContentNew as zstring ptr=ReplaceAllJ(Content, findstring, replstring)
print using "##.#### seconds for replace all"; timer-t
SaveFilePtr(tempfile, ContentNew)
sleep
Typical results:

Code: Select all

 23916 times declare
 0.0170 seconds for replace all
 2512349 bytes written
The 64-bit version is about 20% faster. My assembly version is a factor 3-4 faster, though.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Insert/Replace String procedures

Post by dodicat »

This version of replace is extremely fast jj2007, and easy to use, (skipping all the defines and skipping include windows.bi).
thank you.
Munair
Posts: 1286
Joined: Oct 19, 2017 15:00
Location: Netherlands
Contact:

Re: Insert/Replace String procedures

Post by Munair »

Yes, the SaveFilePtr stuff seems to be Windows only.
sancho3
Posts: 358
Joined: Sep 30, 2017 3:22

Re: Insert/Replace String procedures

Post by sancho3 »

@jj2007
I haven't run the code (because I'm on Linux tonight) but I have some questions.
Firstly something seems strange here:
You declare a global variable retStr and then in the replace function you actually return that variable. That seems odd to me.
In your test code you store the returned value in a pointer. contentNew. Isn't that pointer now just pointing to the same memory as the global retStr?
So could you not just eliminate the function, use a sub and end up in the same place? contentNew and retStr will be the exact same reference won't they?
Alternatively maybe move retStr into the procedure. It doesn't need to be global.

Doesn't the compiler create a copy of the global variable because its used in the line 'return retStr'? Its a pointer so not exactly slowing things down but it does seem like one unnecessary copy.

Should you be deallocating the memory you create to build retStr? I'm not sure how that works when dealing with strings.
I think that the allocated memory is lost for the life of the program and successive calls to replaceAll would eat up more memory in the same fashion.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Post by jj2007 »

@dodicat: Thanks ;-)

@sancho3: Valid points, of course. I've struggled a lot to get it running, FB is very opaque and confusing IMHO, but if you can improve it, please do! I chose WriteFile not because I am fond of Windows but because I couldn't get Put #n to work. At a certain point I simply gave up fighting with the compler. Just to give you an idea how it could be handled, from a user perspective - these two lines do the same as the whole code posted above, in one third of the time:

Code: Select all

  Let esi=FileRead$("mshtmlc.bi")
  FileWrite "TmpFile.txt", Cat$(Replace$(esi, "declare", "* DECLARE *"))
Btw, no, the compiler does not create a copy of retStr when performing the return retStr. I've checked that in the disassemblies for both the 32- and 64-bit Gcc versions. So there shouldn't be a memleak. But you are right that there might be more elegant ways, and that that memory must obviously be deallocated.
sancho3
Posts: 358
Joined: Sep 30, 2017 3:22

Re: Insert/Replace String procedures

Post by sancho3 »

In my opinion the tricky part is in the fact that you can't deallocate the memory until after the user has gotten and dealt with the result string. There is no way for the function to know when that happens.
So do we devise a running list of pointers to new memory each time replace is called? I don't like that idea.
My plan is to turn retStr into a static variable within the function. If the pointer <> 0 then it has been allocated prior to this round and can be deallocated. The theory being that the user is finished with that string before calling replace again.
I changed very little in your code. I did have a problem that it didn't seem to be working and so I added a bunch of comments to debug. It turns out that all of my 'declares' in my test file were mixed case ('Declare') and instr wasn't finding them.
That is an issue that needs to be looked at in a proper replaceall function, but it means customizing an instr routine. Too much for me for now.
There are 3 changes in this version noted by ***

Code: Select all

#include "crt.bi"   ' *** compile as 32- or 64-bit ***
#include "Windows.bi"
'Dim shared as zstring ptr retStr	***changed
#define testfile "mshtmlc.bi"   ' the real thing - over 2MB!
#define tempfile "TmpSaved.txt"
#define findstring "declare"   ' some bi files have lowercase declare
#define replstring "! DECLARE !"   ' for testing: a longer string

Function ReplaceAllJ(byref s1 as string, byref s2 as const string, byref s3 as const string) as zstring ptr
   ' in <s1> replace all occurrences of <s2> by <s3>

   Static As ZString Ptr retStr = 0 	' *** added 

   dim as integer posx=0, posPrevious, ct=0, diff, l2=len(s2), l3=len(s3)
   dim as any ptr posSrc
   dim as any ptr posDest
   if s3 <> s2 then
		
		' this block adds to ct the difference in length of strings s3 and s2 each time s2 is found in s1
		' this is done to accurately allocate enough room to allow for replacing instances of s2 with s3 if they are not the same size 
		diff=l3-l2								 
		 Do
		    posx=Instr(posx+1, s1, s2)
		    ct+=diff
		 Loop until posx=0

		' at this point posx = 0
    	posSrc=StrPtr(s1)      ' pointer to source

    	If retStr <> 0 Then Deallocate(retStr)		' *** added 

		' allocate enough mem to allow for replacements
		retStr=CAllocate(len(s1)+ct)      ' pointer to destination
    
		posDest=retStr		' set posDest to the beginning of the new memory		

		Do         ' ########## innermost loop ###########
			posPrevious=posx+1      ' set new start (posPrevious = 1 at first) 

			' find the first instance of s2 in s1 starting from position posx 
			posx=Instr(posPrevious, s1, s2)
			if posx <> 0 Then
				diff=posx-posPrevious      ' bytes to copy from source
				memcpy(posDest, posSrc, diff)   ' copy string between matches
				posDest+=diff      ' correct destination (move the destination pointer to just past the copied string)
				posSrc+=diff+l2      ' correct source position, including string to be replaced (move the source to just past the found s2) 
				memcpy(posDest, StrPtr(s3), l3)   ' copy the replace string
				posDest+=l3      ' and correct the destination
				posx+=l2-1      ' same for source
			endif
		Loop until posx=0

   diff=len(s1)-posPrevious+1      ' bytes to copy from source
   ' Print "Rest=";len(s1);"-";posPrevious;"=";diff
   memcpy(posDest, posSrc, diff)
   endif
   print ct; " times ";s2
   return retStr
end function

Function LoadFile(ByRef filename As String) As String    
  Dim h As Integer=FreeFile
  Dim txt As String=""
  If Open(filename For Binary Access Read As #h)=0 Then
   If Lof(h) Then
      txt = String(Lof(h), 0)
      If Get(#h, 0, txt) Then txt = ""
   End If
   Close #h
  EndIf
  Return txt
End Function

Sub SaveFilePtr(ByRef filename As String, ByRef source As zstring ptr)
  Dim written As integer
  Dim h As handle=CreateFile(filename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)
  If h<>INVALID_HANDLE_VALUE Then
   WriteFile(h, source, strlen(source), @written, 0)
   CloseHandle(h)
   print written;" bytes written"
  endif
End Sub

Dim Content As String=LoadFile(testfile)
Dim as double t=timer
Dim ContentNew as zstring ptr=ReplaceAllJ(Content, findstring, replstring)
print using "##.#### seconds for replace all"; timer-t
SaveFilePtr(tempfile, ContentNew)
sleep

I noted your loop to find all occurrences of the string to be replaced in order to size retStr. That slows the process down for sure.
That could probably be replaced by a chunk strategy that first allocates to the size of the original string then adds chunks as needed. I will leave that as well for another time.
In any case your code is very, very fast. Nice work.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Insert/Replace String procedures

Post by jj2007 »

sancho3 wrote:I noted your loop to find all occurrences of the string to be replaced in order to size retStr. That slows the process down for sure. That could probably be replaced by a chunk strategy that first allocates to the size of the original string then adds chunks as needed.
That is possible, not even too difficult, but keep in mind that HeapAlloc & friends cost thousands of cycles. Still, you are right that starting with an oversized allocation and trimming it afterwards once could be a good strategy.
In any case your code is very, very fast. Nice work.
Thanks - it's a port to FB of my Replace$() routine which, btw, has options for case-insensitive and full word search.
WQ1980
Posts: 48
Joined: Sep 25, 2015 12:04
Location: Russia

Re: Insert/Replace String procedures

Post by WQ1980 »

Code: Select all

Dim Content As String
Dim ContentNew As ZString Ptr
Content="D:\Desktop\0.bmp"

ContentNew=ReplaceAllJ(Content, "0", "1")
?*ContentNew '' -> D:\Desktop\1.bmp#Щkи6
returns a damaged string
if len(searchstring) = len(replacestring) ...
Post Reply