jj2007, if all of the strings that you want to concatenate are already available as literals or in variables, then go ahead and get their lengths and allocate a buffer... just as you suggest.
But that does not work if you have to concatenate many unknown strings being passed from other routines, read from data files, and so on. In this case, it is much more efficient to use a string builder.
If you allocate a new buffer that is double the size of the previous buffer every time you run out of space, a single move gets the contents of the old buffer into the new one, and you then have room to add at least as much data as the previous buffer held.
The whole point of this is to minimize the number of memory allocations/deallocations that you need to do, because they kill your program's performance if they are done too often (which is why this whole thread got started).
So we create a buffer, stuff strings into it until it is (or would be) filled, create another, bigger buffer, stuff the old buffer's contents into the new buffer - and continue - with a minimal amount of allocations.
A string builder stops us from doing:
buf = buf + str1
buf = buf + str2
buf = buf + str3
buf = buf + str4
...
buf = buf + strN
Each of these concatenations typically causes a new allocation and a deallocation.
Once is OK, but 1,000 or 1,000,000 times would be very bad.
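To put rough numbers on that, here is a small Python sketch (not from the original post; the function names are made up for illustration). It counts how many characters get copied when equal-length strings are appended by plain concatenation versus into a buffer that doubles when full. It assumes the worst case, where every concatenation builds a brand-new string and copies everything into it; a real runtime may do better.

```python
def copied_by_concatenation(n_strings, str_len):
    """Characters copied if every buf = buf + s builds a brand-new string."""
    copied = 0
    buf_len = 0
    for _ in range(n_strings):
        buf_len += str_len   # the new string holds the old contents plus s
        copied += buf_len    # ...and all of it is copied into the new allocation
    return copied


def copied_by_doubling(n_strings, str_len, initial_capacity=16):
    """Characters copied when appending into a buffer that doubles when full."""
    copied = 0
    capacity = max(1, initial_capacity)
    used = 0
    for _ in range(n_strings):
        while used + str_len > capacity:
            capacity *= 2
            copied += used   # move the existing data into the bigger buffer
        used += str_len
        copied += str_len    # write the new string into the buffer
    return copied
```

Under those assumptions, appending 1,000 strings of 10 characters copies about 5 million characters by concatenation and roughly 26 thousand with the doubling buffer.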
A string builder uses a buffer and stuffs strings into it:
B1 = SPACE(30)
S1 = "111111111" 'len = 9
S2 = "2222" 'len = 4
'
'B1 >>>                              <<<
MID(B1,1,9) = S1
'B1 >>>111111111                     <<<
MID(B1,10,4) = S2
'B1 >>>1111111112222                 <<<
'
'When strings won't fit, allocate a second, larger string B2 as your
'new buffer...
B2 = SPACE(LEN(B1) * 2)
'MID() the old B1 contents into the B2 buffer...
MID(B2, 1, end_of_data_in_buffer) = LEFT(B1, end_of_data_in_buffer)
'Deallocate your old buffer...
B1 = ""
'and continue
'...
'...
'switch back to B1 the next time the builder needs to grow.
'etc...
'
'when you are done string building...
'grab the *real* string contents out of the buffer you are working in...
myresult_string = MID(Bx, 1, end_of_data_in_buffer)
Moving strings into a large buffer is MUCH more efficient than doing a bunch of concatenations!
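For readers who do not use BASIC, the same steps can be sketched in Python. This is only an illustration under stated assumptions, not anyone's library code: the class name, `end_of_data`, and the `allocations` counter are all hypothetical, and a `bytearray` stands in for the SPACE() buffer.

```python
class StringBuilder:
    """Hypothetical doubling string builder; all names are illustrative."""

    def __init__(self, initial_capacity=30):
        # Like B1 = SPACE(30): one fixed-size buffer allocated up front.
        self.buf = bytearray(max(1, initial_capacity))
        self.end_of_data = 0      # number of bytes actually in use
        self.allocations = 1      # buffers allocated so far

    def append(self, s):
        data = s.encode("utf-8")
        needed = self.end_of_data + len(data)
        if needed > len(self.buf):
            # Won't fit: allocate a larger buffer, doubling until it does.
            new_capacity = len(self.buf) * 2
            while new_capacity < needed:
                new_capacity *= 2
            new_buf = bytearray(new_capacity)
            # Move the old contents into the new buffer, then drop the old one.
            new_buf[:self.end_of_data] = self.buf[:self.end_of_data]
            self.buf = new_buf
            self.allocations += 1
        # Like MID(B1, pos, len) = S: overwrite in place, buffer size unchanged.
        self.buf[self.end_of_data:needed] = data
        self.end_of_data = needed

    def to_string(self):
        # Grab only the real contents, not the unused tail of the buffer.
        return self.buf[:self.end_of_data].decode("utf-8")
```

Usage follows the walkthrough above: `sb = StringBuilder(30)`, then `sb.append("111111111")` and `sb.append("2222")`, then `sb.to_string()`. In everyday Python you would normally reach for `"".join(parts)` or `io.StringIO` instead; the class here only exists to make the buffer handling visible.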
If you look at Microsoft's .NET StringBuilder class, you will see that it starts with a default capacity of 16 characters and then doubles it each time it runs out of room. Java's class is very similar, and so are most other implementations.
.NET StringBuilder Class (System.Text)
-- Refer to the "Memory Allocation" section:
https://msdn.microsoft.com/en-us/librar ... 2147217396
You can see from the MS doc that "double the buffer size each time" is just a basic rule of thumb (one that has worked well for most generic string builder classes for the past 10-20 years). Specific needs should drive the parameters of your string builder if you are looking for the best performance.
I personally tend to start with a 1k buffer and grow by 4x... but that is because of the specific data that I work with a lot of the time.
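To see how much the starting size and growth factor matter, here is a rough Python sketch (the function name is hypothetical and it is not taken from any library). It just counts how many buffers a builder would allocate before its capacity reaches a given total, assuming it only ever grows by a fixed factor.

```python
def allocations_needed(total_chars, initial_capacity, growth_factor):
    """Return (buffers allocated, final capacity) needed to hold total_chars."""
    if initial_capacity < 1 or growth_factor < 2:
        raise ValueError("need initial_capacity >= 1 and growth_factor >= 2")
    capacity = initial_capacity
    allocations = 1               # the initial buffer counts as one allocation
    while capacity < total_chars:
        capacity *= growth_factor
        allocations += 1
    return allocations, capacity
```

For 1,000,000 characters, `allocations_needed(1_000_000, 16, 2)` gives 17 allocations and `allocations_needed(1_000_000, 1024, 4)` gives 6, both ending at a capacity of 1,048,576. The trade-off is that a bigger start and a bigger factor can leave more of the buffer unused when the final string is small.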