In my opinion the tricky part is that you can't deallocate the memory until the caller has received and finished with the result string, and the function has no way of knowing when that happens.
So do we keep a running list of pointers, adding one for the new memory each time replace is called? I don't like that idea.
My plan is to turn retStr into a static variable within the function. If the pointer is <> 0 on entry, it was allocated by a previous call and can be deallocated. The assumption is that the caller has finished with that string before calling replace again, so a pointer kept from an earlier call is no longer valid after the next one.
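To make that lifetime rule concrete, here is a minimal sketch. Shout is a made-up demo function, not part of your code; it only shows how a static result pointer behaves across two calls, and that the caller has to copy the text if it must survive the next call.

```freebasic
' Hypothetical demo function: returns text in a buffer owned by the function itself.
Function Shout(ByRef s As Const String) As ZString Ptr
    Static As ZString Ptr buf = 0
    If buf <> 0 Then Deallocate(buf)    ' free the result of the previous call
    buf = CAllocate(Len(s) + 2)         ' text + "!" + terminating null
    *buf = s + "!"
    Return buf
End Function

Dim As ZString Ptr p1 = Shout("first")
Dim As String keep = *p1                ' copy now if it must outlive the next call
Dim As ZString Ptr p2 = Shout("second") ' p1 is dangling from here on
Print keep; " "; *p2
```

The last buffer is never freed in this scheme; it simply lives until the program ends.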
I changed very little in your code. At first it didn't seem to be working, so I added a bunch of comments while debugging. It turns out that all of the 'declares' in my test file were mixed case ('Declare'), and instr, being case-sensitive, wasn't finding them.
That is an issue that needs to be looked at in a proper replaceall function, but it means customizing an instr routine. Too much for me for now.
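For what it's worth, a rough sketch of one way around the case problem without writing a custom instr: lower-case both strings once and search the copies. CountNoCase is a hypothetical helper of mine, and the sketch assumes plain single-byte text, where LCase does not change the string length, so a position found in the copy is also valid in the original.

```freebasic
' Hypothetical helper: counts non-overlapping, case-insensitive matches of needle in haystack.
Function CountNoCase(ByRef haystack As Const String, ByRef needle As Const String) As Integer
    ' Lower-case both strings once, up front, instead of on every search.
    Dim As String h = LCase(haystack), n = LCase(needle)
    Dim As Integer posx = 0, ct = 0
    If Len(n) = 0 Then Return 0
    Do
        posx = InStr(posx + 1, h, n)
        If posx = 0 Then Exit Do
        ct += 1
        posx += Len(n) - 1      ' continue just past this match
    Loop
    Return ct
End Function

' All three spellings are counted: 3 matches.
Print CountNoCase("Declare Sub A : declare Sub B : DECLARE Sub C", "declare")
```

The price is a second full copy of the source string, which matters for a 2MB file but is paid only once per call.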
There are 3 changes in this version, marked with ***.
Code:
#include "crt.bi" ' *** compile as 32- or 64-bit ***
#include "Windows.bi"
'Dim shared as zstring ptr retStr ***changed
#define testfile "mshtmlc.bi" ' the real thing - over 2MB!
#define tempfile "TmpSaved.txt"
#define findstring "declare" ' some bi files have lowercase declare
#define replstring "! DECLARE !" ' for testing: a longer string
Function ReplaceAllJ(byref s1 as string, byref s2 as const string, byref s3 as const string) as zstring ptr
    ' in <s1> replace all occurrences of <s2> by <s3>
    Static As ZString Ptr retStr = 0 ' *** added
    dim as integer posx=0, posPrevious, ct=0, diff, l2=len(s2), l3=len(s3)
    dim as any ptr posSrc
    dim as any ptr posDest
    if s3 <> s2 then
        ' first pass: count the non-overlapping occurrences of s2 in s1, so that the
        ' destination can be sized correctly even when s2 and s3 differ in length
        diff=l3-l2 ' size change per replacement (may be negative)
        Do
            posx=Instr(posx+1, s1, s2)
            if posx=0 then exit do
            ct+=1 ' count real matches only
            posx+=l2-1 ' continue after the match, as the copy loop below does
        Loop
        ' at this point posx = 0
        posSrc=StrPtr(s1) ' pointer to source
        If retStr <> 0 Then Deallocate(retStr) ' *** added
        ' allocate enough mem to allow for replacements, plus the terminating null
        retStr=CAllocate(len(s1)+ct*diff+1) ' pointer to destination
        posDest=retStr ' set posDest to the beginning of the new memory
        Do ' ########## innermost loop ###########
            posPrevious=posx+1 ' set new start (posPrevious = 1 at first)
            ' find the next instance of s2 in s1 starting from position posPrevious
            posx=Instr(posPrevious, s1, s2)
            if posx <> 0 Then
                diff=posx-posPrevious ' bytes to copy from source
                memcpy(posDest, posSrc, diff) ' copy string between matches
                posDest+=diff ' correct destination (move the destination pointer to just past the copied string)
                posSrc+=diff+l2 ' correct source position, including string to be replaced (move the source to just past the found s2)
                memcpy(posDest, StrPtr(s3), l3) ' copy the replace string
                posDest+=l3 ' and correct the destination
                posx+=l2-1 ' same for source
            endif
        Loop until posx=0
        diff=len(s1)-posPrevious+1 ' bytes to copy from source
        ' Print "Rest=";len(s1);"-";posPrevious;"=";diff
        memcpy(posDest, posSrc, diff)
    endif
    ' note: when s3 = s2 nothing is copied and the previous retStr (0 on the first call) is returned
    print ct; " times ";s2
    return retStr
end function
Function LoadFile(ByRef filename As String) As String
    Dim h As Integer=FreeFile
    Dim txt As String=""
    If Open(filename For Binary Access Read As #h)=0 Then
        If Lof(h) Then
            txt = String(Lof(h), 0)
            If Get(#h, 0, txt) Then txt = ""
        End If
        Close #h
    EndIf
    Return txt
End Function
Sub SaveFilePtr(ByRef filename As String, ByRef source As zstring ptr)
    Dim written As DWORD = 0 ' WriteFile reports the byte count through a DWORD pointer, also in 64-bit builds
    Dim h As handle=CreateFile(filename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)
    If h<>INVALID_HANDLE_VALUE Then
        WriteFile(h, source, strlen(source), @written, 0)
        CloseHandle(h)
        print written;" bytes written"
    endif
End Sub
Dim Content As String=LoadFile(testfile)
Dim as double t=timer
Dim ContentNew as zstring ptr=ReplaceAllJ(Content, findstring, replstring)
print using "##.#### seconds for replace all"; timer-t
SaveFilePtr(tempfile, ContentNew)
sleep
I noticed your loop that finds all occurrences of the string to be replaced in order to size retStr. That extra pass over the source slows the process down for sure.
That could probably be replaced by a chunk strategy that first allocates a buffer the size of the original string and then grows it in chunks as needed. I will leave that for another time as well.
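Just to pin the chunk idea down, here is an untested-for-speed sketch of how it might look. Everything in it is my own assumption rather than your code: the names ReplaceAllChunked and EnsureRoom, the 64KB CHUNK size, and the decision to hand ownership of the buffer to the caller instead of using the static pointer. It tracks offsets rather than pointers, because Reallocate may move the buffer.

```freebasic
#include "crt.bi"

#define CHUNK 65536   ' growth step in bytes; an arbitrary value for this sketch

' Hypothetical helper: makes sure buf can hold at least "needed" bytes,
' growing it in CHUNK-sized steps. Returns 0 if Reallocate fails, -1 otherwise.
Function EnsureRoom(ByRef buf As UByte Ptr, ByRef capacity As Integer, ByVal needed As Integer) As Integer
    If needed <= capacity Then Return -1
    Dim As Integer newCap = capacity
    While newCap < needed
        newCap += CHUNK
    Wend
    Dim As UByte Ptr p = Reallocate(buf, newCap)
    If p = 0 Then Return 0          ' the old buffer is still valid in this case
    buf = p
    capacity = newCap
    Return -1
End Function

' Single-pass variant: no counting loop, the buffer grows while copying.
' The caller owns the result and must Deallocate it.
Function ReplaceAllChunked(ByRef s1 As Const String, ByRef s2 As Const String, ByRef s3 As Const String) As ZString Ptr
    Dim As Integer l1 = Len(s1), l2 = Len(s2), l3 = Len(s3)
    Dim As Integer capacity = l1 + 1    ' size of the original plus the terminating null
    Dim As UByte Ptr buf = Allocate(capacity)
    If buf = 0 Then Return 0
    Dim As Integer used = 0, prev = 1, posx = 0
    Do
        posx = InStr(prev, s1, s2)      ' 0 when s2 is empty or not found
        Dim As Integer gap = IIf(posx > 0, posx - prev, l1 - prev + 1)
        Dim As Integer extra = IIf(posx > 0, l3, 0)
        If EnsureRoom(buf, capacity, used + gap + extra + 1) = 0 Then
            Deallocate(buf)
            Return 0
        End If
        If gap > 0 Then memcpy(buf + used, StrPtr(s1) + prev - 1, gap) ' text before the match (or the rest)
        used += gap
        If posx = 0 Then Exit Do
        If l3 > 0 Then memcpy(buf + used, StrPtr(s3), l3)              ' the replacement
        used += l3
        prev = posx + l2                ' continue just past the match
    Loop
    buf[used] = 0                       ' terminating null
    Return CPtr(ZString Ptr, buf)
End Function

Dim As ZString Ptr res = ReplaceAllChunked("a declare b declare", "declare", "! DECLARE !")
If res <> 0 Then
    Print *res
    Deallocate(res)
End If
```

Whether this actually beats the two-pass version would need measuring on the 2MB file, since each Reallocate may copy the whole buffer.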
In any case your code is very, very fast. Nice work.