Memory allocation study for character data of var-len string

General FreeBASIC programming questions.
Post Reply
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Memory allocation study for character data of var-len string

Post by fxm »

Linked and following this topic (and the posts 8 and 9):
Dynamic arrays in UDTs (once again)


Optimizing the execution speed in case of dynamic memory allocation for character data of var-len string:
- To limit the occurrence of memory allocation calls (which are eating CPU time), every change in the number of required elements, do not fit exactly the size of the allocated memory.
- The memory allocation is done in successive steps (memory allocation per block), so that it can withstand small changes in the number of elements without changing the memory allocation.

FreeBASIC handles variable-length string, using a string descriptor (12 bytes, from @string):
- address pointing to the string character data,
- size actually used by the string,
- string size available in memory.


This small program allows you to monitor the memory management by FreeBASIC for var-len string:
- first half of the window corresponding to each direct string assignment for several different sizes (string = other_string),
- second half of the window corresponding to one string successively continuous inflating by the right (string = string + other_string) then continuous deflating by the right (string = Left(string, N))
We can see that management is not so trivial as that.

Code: Select all

Dim s As String
Dim t(639, 2) As Integer

Screen 12
Print
Color 7
Print " String-data pointer: 'Cast(Integer Ptr, @string)[0]' modulo &h100"
Color 13
Print " Used string size inside allocated memory: 'Cast(Integer Ptr, @string)[1]'"
Color 10
Print " Available string size in allocated memory: 'Cast(Integer Ptr, @string)[2]'"
Print
Color 15
Print "        Direct string assignments          Continuous inflating then deflating"

For k As Integer = 0 To 639
  t(k, 0) = Cast(Integer Ptr, @s)[0]
  t(k, 1) = Cast(Integer Ptr, @s)[1]
  t(k, 2) = Cast(Integer Ptr, @s)[2]
  If k < 159 Then
    s = String(2 * k, 0)
  Elseif k < 319 Then
    s = String(158 * 4 - 2 * k, 0)
  Elseif k < 479 Then
    s = s + String(2, 0)
'    s = s + Chr(0) + Chr(0)
'    s = s + (Chr(0) + Chr(0))
'    s = s + String(2*(k - 318) - Len(s), 0)
'    s = s + String((2*(k - 318) - Len(s)), 0)
'    *@s = *@s + String(2, 0)
  Else
    dim I As Integer = Len(s) - 2
    s = Left(s, I)
  End If
Next K

Pset(0, 479), 8
For k As Integer = 0 To 639
  Line -(k, 479 - (t(k, 0) And &hFF)), 8
Next k

Pset(0, 479), 13
For k As Integer = 0 To 639
  Line -(k, 479 - t(k, 1)), 13
Next k

Pset(0, 479), 10
For k As Integer = 0 To 639
  Line -(k, 479 - t(k, 2)), 10
Next k

Sleep
- The memory allocation is always done in successive steps.
- We can verify than the expression 'string = string + other_string' (continuous inflating by the right) is well optimized by the compiler (one fix pointer during each step of memory size)
- But the expression 'string = Left(string, N)' (continuous deflating by the right) is not optimized by the compiler. Why?

Remarks:
- The expression 'string = string + other_string' (continuous inflating by the right) is well optimized by the compiler only for simple 'other_string' expressions:
for example 'Chr(0)' or 'String(N, 0)', but not for more complex expressions as for example 'String(N - Len(s), 0)'.
(in the program above, we can modify the line 25 with several other proposed syntaxes and see the result on the memory pointer values)
- The compiler optimization works only with direct string variable access, and not with pointer:
*(@string) = *(@string) + String(2, 0) is not optimized by the compiler.
- The compiler optimization works only with direct string variable access, and not with reference:
In a procedure, syntax with string passed by reference is not optimized by the compiler:
a work around is to swap it with a local string, to inflate the local string, then to swap again with the passed reference.


The previous program is modified in order to introduce a procedure 'Inflating_by_the_right' processing the string (passed by reference), and we can test the result on the memory pointer values, using or not using the 'Swap' method:

Code: Select all

Sub Inflating_by_the_right (Byref s As String, Byval k As Integer)
'  s = s + String(k, 0)
  Dim s0 As String
  Swap s, s0
  s0 = s0 + String(k, 0)
  Swap s0, s
End Sub

Dim s As String
Dim t(639, 2) As Integer

Screen 12
Print
Color 7
Print " String-data pointer: 'Cast(Integer Ptr, @string)[0]' modulo &h100"
Color 13
Print " Used string size inside allocated memory: 'Cast(Integer Ptr, @string)[1]'"
Color 10
Print " Available string size in allocated memory: 'Cast(Integer Ptr, @string)[2]'"
Print
Color 15
Print "        Direct string assignments          Continuous inflating then deflating"

For k As Integer = 0 To 639
  t(k, 0) = Cast(Integer Ptr, @s)[0]
  t(k, 1) = Cast(Integer Ptr, @s)[1]
  t(k, 2) = Cast(Integer Ptr, @s)[2]
  If k < 159 Then
    s = String(2 * k, 0)
  Elseif k < 319 Then
    s = String(158 * 4 - 2 * k, 0)
  Elseif k < 479 Then
'    s = s + String(2, 0)
'    s = s + Chr(0) + Chr(0)
'    s = s + (Chr(0) + Chr(0))
'    s = s + String(2*(k - 318) - Len(s), 0)
'    s = s + String((2*(k - 318) - Len(s)), 0)
'    *@s = *@s + String(2, 0)
    Inflating_by_the_right(s, 2)
  Else
    dim I As Integer = Len(s) - 2
    s = Left(s, I)
  End If
Next K

Pset(0, 479), 8
For k As Integer = 0 To 639
  Line -(k, 479 - (t(k, 0) And &hFF)), 8
Next k

Pset(0, 479), 13
For k As Integer = 0 To 639
  Line -(k, 479 - t(k, 1)), 13
Next k

Pset(0, 479), 10
For k As Integer = 0 To 639
  Line -(k, 479 - t(k, 2)), 10
Next k

Sleep
Last edited by fxm on Apr 26, 2012 20:24, edited 1 time in total.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Memory allocation study for character data of var-len st

Post by fxm »

fxm wrote:- The memory allocation is always done in successive steps.
- We can verify than the expression 'string = string + other_string' (continuous inflating by the right) is well optimized by the compiler (one fix pointer during each step of memory size)
- But the expression 'string = Left(string, N)' (continuous deflating by the right) is not optimized by the compiler. Why?
Small program to enlighten the behavior:

Code: Select all

Dim S As String

Print "Memory allocation for character data of var-len string: 'S = S + String(5, 0)'"
Print " I", " Pointer", " Used data", " Allocated data"
Print Cast(Integer Ptr, @S)[0], Cast(Integer Ptr, @S)[1], Cast(Integer Ptr, @S)[2]
For I As Integer = 1 To 21
  S = S + String(5, 0)
  Print I, Cast(Integer Ptr, @S)[0], Cast(Integer Ptr, @S)[1], Cast(Integer Ptr, @S)[2]
Next I

Sleep

Print
Print "Memory allocation for character data of var-len string: 'S = Left(S, 5 * I)'"  
Print " I", " Pointer", " Used data", " Allocated data"
For I As Integer = 20 To 0 Step -1
  S = Left(S, 5 * I)
  Print I, Cast(Integer Ptr, @S)[0], Cast(Integer Ptr, @S)[1], Cast(Integer Ptr, @S)[2]
Next I

Sleep

Code: Select all

Memory allocation for character data of var-len string: 'S = S + String(5, 0)'
 I             Pointer       Used data     Allocated data
 0             0             0
 1             3353992       5             36
 2             3353992       10            36
 3             3353992       15            36
 4             3353992       20            36
 5             3353992       25            36
 6             3353992       30            36
 7             3353992       35            36
 8             3353992       40            72
 9             3353992       45            72
 10            3353992       50            72
 11            3353992       55            72
 12            3353992       60            72
 13            3353992       65            72
 14            3353992       70            72
 15            3353992       75            108
 16            3353992       80            108
 17            3353992       85            108
 18            3353992       90            108
 19            3353992       95            108
 20            3353992       100           108
 21            3353992       105           108

Memory allocation for character data of var-len string: 'S = Left(S, 5 * I)'
 I             Pointer       Used data     Allocated data
 20            3354112       100           144
 19            3353992       95            108
 18            3354272       90            108
 17            3353992       85            108
 16            3354272       80            108
 15            3353992       75            108
 14            3354272       70            108
 13            3353992       65            108
 12            3354392       60            72
 11            3359248       55            72
 10            3354392       50            72
 9             3359248       45            72
 8             3354392       40            72
 7             3359248       35            72
 6             3353944       30            36
 5             3354480       25            36
 4             3353944       20            36
 3             3354480       15            36
 2             3353944       10            36
 1             3354480       5             36
 0             0             0             0
- string continuously inflating by the right, 'S = S + String(5, 0)': OK
- string continuously deflating by the right, 'S = Left(S, 5 * I)': compiler optimization NOK, the pointer would be constant at least during each step (one value for 108, another for 72, another for 36)

IMHO, the following string syntaxes would be optimized by the compiler:
- Swap string1, string2 : OK
- Mid(string1, K, N) = other_string : OK
- string1 = string1 + other_string : OK, but only for the simplest expressions of other_string
- string1 = Left(string1, N), or string1 = Mid(string1, 1, N) : NOK
and all working not only with local variable but with reference from pointer and passed parameter by reference and member of a type ('This.*').

[Edit]
- Add 'Mid' (statement) which is OK for compiler optimization.
- Add member of a type ('This.*') which is NOK for compiler optimization.
Last edited by fxm on Apr 27, 2012 5:21, edited 2 times in total.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Memory allocation study for character data of var-len st

Post by fxm »

FreeBASIC code principle to force the execution time optimization, when inflating by the right (if possible) or deflating by the right a var-len string:

Code: Select all

Dim As String s1 = "Hello"
Dim As String s2 = " FreeBASIC!"

Print "String", , " Pointer", " Used data", " Allocated data"
Print
Print "'" & s1 & "'", , Cast(Integer Ptr, @s1)[0], Cast(Integer Ptr, @s1)[1], Cast(Integer Ptr, @s1)[2]
Print "'" & s2 & "'", , Cast(Integer Ptr, @s2)[0], Cast(Integer Ptr, @s2)[1], Cast(Integer Ptr, @s2)[2]
Print

If Cast(Integer Ptr, @s1)[2] - Len(s1) >= Len(s2) Then
  *(Strptr(s1) + Len(s1)) = s2 ' or *@s1[Len(s1)] = s2
  Cast(Integer Ptr, @s1)[1] = Cast(Integer Ptr, @s1)[1] + Len(s2)
End If
Print "'" & s1 & "'", Cast(Integer Ptr, @s1)[0], Cast(Integer Ptr, @s1)[1], Cast(Integer Ptr, @s1)[2]
Print "'" & *Strptr(s1) & "'" ' to verify the end character Chr(0)
Print


Clear s1[Len(s1) - Len(s2)], 0, Len(s2)
Cast(Integer Ptr, @s1)[1] = Cast(Integer Ptr, @s1)[1] - Len(s2)
Print "'" & s1 & "'", , Cast(Integer Ptr, @s1)[0], Cast(Integer Ptr, @s1)[1], Cast(Integer Ptr, @s1)[2]
Print "'" & *Strptr(s1) & "'" ' to verify the end character Chr(0)

Sleep

Code: Select all

String                       Pointer       Used data     Allocated data

'Hello'                      3353952       5             32
' FreeBASIC!'                3354000       11            32

'Hello FreeBASIC!'           3353952       16            32
'Hello FreeBASIC!'

'Hello'                      3353952       5             32
'Hello'
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Memory allocation study for character data of var-len st

Post by fxm »

fxm wrote:The previous program is modified in order to introduce a procedure 'Inflating_by_the_right' processing the string (passed by reference), and we can test the result on the memory pointer values, using or not using the 'Swap' method:
..........
A procedure 'Deflating_by_the_right' is also added in order to optimize the deflating execution time (with compromise between execution time and memory use) instead of the compiler, and we can also test the result on the memory pointer values, using or not using this deflating optimization:

Code: Select all

Sub Inflating_by_the_right (Byref s As String, Byval k As Integer)
'  s = s + String(k, 0)
  Dim s0 As String
  Swap s, s0
  s0 = s0 + String(k, 0)
  Swap s0, s
End Sub

Sub Deflating_by_the_right (Byref s As String, Byval k As Integer)
'  s = Left(s, k)
  If Cast(Integer Ptr, @s)[2] - k <= (36 + k Shr 3) And k > 0 Then
    Clear s[k], 0, Len(s) - k
    Cast(Integer Ptr, @s)[1] = k
  Else
    s = Left(s, k)
  End If
End Sub

Dim s As String
Dim t(639, 2) As Integer

Screen 12
Print
Color 7
Print " String-data pointer: 'Cast(Integer Ptr, @string)[0]' modulo &h100"
Color 13
Print " Used string size inside allocated memory: 'Cast(Integer Ptr, @string)[1]'"
Color 10
Print " Available string size in allocated memory: 'Cast(Integer Ptr, @string)[2]'"
Print
Color 15
Print "        Direct string assignments          Continuous inflating then deflating"

For k As Integer = 0 To 639
  t(k, 0) = Cast(Integer Ptr, @s)[0]
  t(k, 1) = Cast(Integer Ptr, @s)[1]
  t(k, 2) = Cast(Integer Ptr, @s)[2]
  If k < 159 Then
    s = String(2 * k, 0)
  Elseif k < 319 Then
    s = String(158 * 4 - 2 * k, 0)
  Elseif k < 479 Then
'    s = s + String(2, 0)
'    s = s + Chr(0) + Chr(0)
'    s = s + (Chr(0) + Chr(0))
'    s = s + String(2*(k - 318) - Len(s), 0)
'    s = s + String((2*(k - 318) - Len(s)), 0)
'    *@s = *@s + String(2, 0)
    Inflating_by_the_right(s, 2)
  Else
    dim i As Integer = Len(s) - 2
'    s = Left(s, I)
    Deflating_by_the_right(s, i)
  End If
Next K

Pset(0, 479), 8
For k As Integer = 0 To 639
  Line -(k, 479 - (t(k, 0) And &hFF)), 8
Next k

Pset(0, 479), 13
For k As Integer = 0 To 639
  Line -(k, 479 - t(k, 1)), 13
Next k

Pset(0, 479), 10
For k As Integer = 0 To 639
  Line -(k, 479 - t(k, 2)), 10
Next k

Sleep
Note about the deflation optimization above:
The condition 'Cast(Integer Ptr, @s)[2] - k <= (36 + k Shr 3) And k > 0' (with k = requested length) allows to be consistent with the observed string memory allocation management.
In case of inflating par step of one character, I studied the memory margin step and it seems verify the formula:
margin memory step = 36 * (1 + Int(K / 288))
The formula '36 + k Shr 3' is a simpler linear approximation by greater value.
Last edited by fxm on Apr 26, 2012 20:25, edited 1 time in total.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Memory allocation study for character data of var-len st

Post by fxm »

Summary of the bad compiler behavior about the optimization of var-len string instructions:

IMHO, the following var-len string syntaxes would be always optimized by the compiler:
- Swap string1, string2 : OK
- Mid(string1, K, N) = other_string : OK
- string1 = string1 + other_string : OK, but only for the simplest expressions of other_string
- string1 = Left(string1, N), or string1 = Mid(string1, 1, N) : NOK
- and all theses syntaxes should work not only with direct var-len string variable but also with reference from string pointer and passed string parameter by reference and member of a type ('This.*').

@Admin,
Is that this bad behavior should be considered as a bug or should induce a feature request?

I proposed temporarily these two small subroutines as a work around to always ensure the execution time optimization on var-len string, instead of the instruction 's = s + String(k, 0)' and the instruction 's = Left(s, k)' or 's = Mid(s, 1, k)':

Code: Select all

Sub Inflating_by_the_right (Byref s As String, Byval k As Integer)
'----- Initial code -----
'  s = s + String(k, 0)
'------------------------
'----- Work around -----
  Dim s0 As String
  Swap s, s0
  s0 = s0 + String(k, 0)
  Swap s0, s
'-----------------------
End Sub

Sub Deflating_by_the_right (Byref s As String, Byval k As Integer)
'----- Initial code -----
'  s = Left(s, k) ' or s = Mid(s, 1, k)
'------------------------
'----- Work around -----
  If Cast(Integer Ptr, @s)[2] - k <= (36 + k Shr 3) And k > 0 Then
    Clear s[k], 0, Len(s) - k
    Cast(Integer Ptr, @s)[1] = k
  Else
    s = Left(s, k)
  End If
'-----------------------
End Sub
[Edit]
- Add 'Mid' (statement) which is OK for compiler optimization.
- Add member of a type ('This.*') which is NOK for compiler optimization.
Last edited by fxm on Apr 28, 2012 19:56, edited 4 times in total.
dkl
Site Admin
Posts: 3235
Joined: Jul 28, 2005 14:45
Location: Germany

Re: Memory allocation study for character data of var-len st

Post by dkl »

Things like changing s = left( s, N ) to some form of SelfLeft( s, N ) aren't yet implemented in the compiler, but they'd be really nice to have. Perhaps even prepending strings could be optimized to work without a temporary string, like appending strings. Not to mention compile-time evaluation in case of constants, and so on...
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Memory allocation study for character data of var-len st

Post by fxm »

dkl wrote:Things like changing s = left( s, N ) to some form of SelfLeft( s, N ) aren't yet implemented in the compiler, but they'd be really nice to have. Perhaps even prepending strings could be optimized to work without a temporary string, like appending strings. Not to mention compile-time evaluation in case of constants, and so on...
OK for that you name 'SelfLeft( s, N)'
=> feature request!

And for the other problems about 'string = string + other_string' (see previous posts).
No optimization when:
- 'other_string' is not a very simple expression,
- or 'string' is a reference from a string pointer or a procedure parameter passed by reference or a member of a type ('This.*').
=> bug report?
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Memory allocation study for character data of var-len st

Post by fxm »

fxm wrote:I proposed temporarily a work around to always ensure the execution time optimization on var-len string, instead of the instruction 's = s + String(k, 0)' and the instruction 's = Left(s, k)' or 's = Mid(s, 1, k)'.
- An enhanced execution time improvement is already obtained over a sequence of string-memory allocation, firstly continuously increasing and finally continuously decreasing:

Code: Select all

Type string_memory
  Declare Function CReallocate_basic (Byval byte_size As Integer) As Any Ptr
  Declare Function CReallocate_improved (Byval byte_size As Integer) As Any Ptr
  Dim As String S
End Type

Function string_memory.CReallocate_basic (Byval byte_size As Integer) As Any Ptr
  Dim As Integer deltaS = byte_size - Len(This.S)
'----- Initial code -----
  If deltaS > 0 Then
    This.S = This.S + String(deltaS, 0)
  ElseIf deltaS < 0 Then
    This.S = Left(This.S, byte_size)
  End If
'------------------------
  Function = Strptr(This.S)
End Function

Function string_memory.CReallocate_improved (Byval byte_size As Integer) As Any Ptr
  Dim As Integer deltaS = byte_size - Len(This.S)
'----- Work around -----
  If deltaS > 0 Then
    Dim S0 As String
    Swap This.S, S0
    S0 = S0 + String(deltaS, 0)
    Swap S0, This.S
  Elseif deltaS < 0 Then
    If Cast(Integer Ptr, @This.S)[2] - byte_size <= (36 + byte_size Shr 3) And byte_size > 0 Then
      Clear This.S[byte_size], 0, Len(This.S) - byte_size
      Cast(Integer Ptr, @This.S)[1] = byte_size
    Else
      This.S = Left(This.S, byte_size)
    End If
  End If
'-----------------------
  Function = Strptr(This.S)
End Function



Dim SM As string_memory
Dim bp As Byte Ptr
Dim t0 As Double

Print "String-memory allocation by step of 1 byte,"
Print " from 1 byte to 128KB,"
Print " then from 128KB to 0 byte."
Print

Print "- Using the member function 'CReallocate_basic()'"
sleep 1000
t0 = Timer
For K As Integer = 1 To 128 * 1024
  bp = SM.CReallocate_basic(K)
Next K
Print using "##.#####s"; Timer - t0
sleep 1000
t0 = Timer
For K As Integer = 128 * 1024 - 1 To 0 Step -1
  bp = SM.CReallocate_basic(K)
Next K
Print using "##.#####s"; Timer - t0
Print

Print "- Using the member function 'CReallocate_improved()'"
sleep 1000
t0 = Timer
For K As Integer = 1 To 128 * 1024
  bp = SM.CReallocate_improved(K)
Next K
Print using "##.#####s"; Timer - t0
sleep 1000
t0 = Timer
For K As Integer = 128 * 1024 - 1 To 0 Step -1
  bp = SM.CReallocate_improved(K)
Next K
Print using "##.#####s"; Timer - t0
Print

Sleep

Code: Select all

String-memory allocation by step of 1 byte,
 from 1 byte to 128KB,
 then from 128KB to 0 byte.

- Using the member function 'CReallocate_basic()'
 1.74086s
 1.69856s

- Using the member function 'CReallocate_improved()'
 0.04106s
 0.05427s
- For a size of string-memory allocation little varying around a mean functioning point, the execution time improvement could be still greater.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Memory allocation study for character data of var-len st

Post by fxm »

fxm wrote:
dkl wrote:Things like changing s = left( s, N ) to some form of SelfLeft( s, N ) aren't yet implemented in the compiler, but they'd be really nice to have. Perhaps even prepending strings could be optimized to work without a temporary string, like appending strings. Not to mention compile-time evaluation in case of constants, and so on...
OK for that you name 'SelfLeft( s, N)'
=> feature request!

And for the other problems about 'string = string + other_string' (see previous posts).
No optimization when:
- 'other_string' is not a very simple expression,
- or 'string' is a reference from a string pointer or a procedure parameter passed by reference or a member of a type ('This.*').
=> bug report?
For all these problems, feature request filled: Execution speed optimization of var-len string instructions
coderJeff
Site Admin
Posts: 4313
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Memory allocation study for character data of var-len st

Post by coderJeff »

fxm wrote:No optimization when:
- 'other_string' is not a very simple expression,
- or 'string' is a reference from a string pointer or a procedure parameter passed by reference or a member of a type ('This.*').
=> bug report?
----
Here are a few details on what fbc is doing under the hood:

When fbc allocates (initializes) a string, it allocates just enough space rounded up to next multiple of 32 bytes.
When fbc assigns a string, if the old length and new length are not the same, a new string is allocated and the old one deleted.
When fbc concatenates 2 strings, it (re)allocates enough space for both strings rounded up to next multiple of 32 bytes PLUS 12.5%

----
Simple concatenations, that are not self-referencing, for example:

Code: Select all

dim as string s, x, y, z
s += x + y + z
Optimizes to:

Code: Select all

s += x     '' use remaining 12.5%, otherwise reallocate len(s) + len(x) + 12.5%     
s += y     '' use remaining 12.5%, otherwise reallocate len(s) + len(y) + 12.5% 
s += z     '' use remaining 12.5%, otherwise reallocate len(s) + len(z) + 12.5%  
Initially, len(s) will be zero, if it wasn't set to any other string first.

----
From other thread Speed issue with string concatenation and a solution

For this case:

Code: Select all

sub adding(sg as string)
   main_string += sg
end sub
main_string could be passed in sg, which would be same as main_string += main_string. main_string can't be reallocated and also be valid, so a temporary is always used for parameters:

Code: Select all

sub adding(sg as string)
	var tmp = main_string + sg '' allocate + copy + concat
    main_string = tmp          '' delete + allocate + copy
end sub
To optimize this case, need to change fbc to allow the optimization as in the simple case, and also add a check in rtlib to determine if we are concatenating a string to itself, in which case a temporary allocation must be used.

----
So, if sg is often a small length, can reduce the number of allocations with this variation:

Code: Select all

sub adding(sg as string)
   dim as string local_sg = sg '' allocate + copy
   main_string += local_sg     '' concat using remaining 12.5%, otherwise reallocate + concat
end sub
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Memory allocation study for character data of var-len st

Post by fxm »

coderJeff wrote: From other thread Speed issue with string concatenation and a solution

For this case:

Code: Select all

sub adding(sg as string)
   main_string += sg
end sub
main_string could be passed in sg, which would be same as main_string += main_string. main_string can't be reallocated and also be valid, so a temporary is always used for parameters:

Code: Select all

sub adding(sg as string)
	var tmp = main_string + sg '' allocate + copy + concat
    main_string = tmp          '' delete + allocate + copy
end sub
To optimize this case, need to change fbc to allow the optimization as in the simple case, and also add a check in rtlib to determine if we are concatenating a string to itself, in which case a temporary allocation must be used.
I do not understand.
In the principle below of my own procedure "adding1()" which considerably improves the execution time (see viewtopic.php?p=259962#p259962), I never test the case of self-concatenation, and yet it also works in this case:

Code: Select all

dim shared as string main_string

sub adding(sg as string)
   main_string += sg
end sub

sub adding1(sg as string)
   if cptr(integer ptr, @main_string)[2] > cptr(integer ptr, @main_string)[1] + len(sg) then
      for i as integer = 0 to len(sg) - 1
         cptr(ubyte ptr ptr, @main_string)[0][cptr(integer ptr, @main_string)[1] + i] = sg[i]
      next i
      cptr(integer ptr, @main_string)[1] += len(sg)
      print "no reallocation"
   else
      var tmp = main_string + sg '' allocate + copy + concat
      main_string = tmp          '' delete + allocate + copy
      print "reallocation"
   end if
end sub

main_string = ""
main_string = "12345"
for i as integer = 1 to 5
  adding(main_string)
next i
print main_string
print

main_string = ""
main_string = "12345"
for i as integer = 1 to 5
  adding1(main_string)
next i
print main_string
coderJeff
Site Admin
Posts: 4313
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Memory allocation study for character data of var-len string

Post by coderJeff »

For the 'a += b' string optimizations, I think the goals are as follows:
1) zero or one (re)allocation of memory for each operation
2) With combined concat+assign '+=', add an extra buffer (12.5%) for string building

The rtlib currently has 2 functions available:
1) (a + b), concatenate a and b and return in a temporary string
2) a += b, concatenate b on to a, using/creating a 12.5% extra buffer; required is strptr(a)<>strptr(b).

----
I will start again with this:

Code: Select all

dim shared as string m
sub m_concat(a as string)
	m += a
end sub
fbc does not well optimize m += a because of the byref paramater. If 'm' and 'a' are same data, fbc does not have an optimized solution for self referencing concatenation, so instead, it uses a safer version that always allocates a new buffer.

fxm, you are correct; testing for self referencing is not specifically needed. Your check based on sizes indirectly guards against the self referencing problem.

----
Continuing the investigation, due the byref parameter, fbc generates this equivalent code:

Code: Select all

dim shared as string m
sub m_concat(a as string)
	'' m += a 
	var {t} = m + a  '' allocate, copy, copy 
	m = {t}          '' copy temp descriptor, delete 
end sub
My earlier post was incorrect about the low-level operations. I confused myself about what code I wrote and what code fbc creates. Because the temporary variable {t} is created by fbc itself, only the descriptor is copied to 'm' on assignment, and the original m is deleted.

The problem now is due the generic/temporary concatenate: which does not allocate an extra 12.5% in the buffer. Therefore a reallocation occurs on every usage.

----
In this workaround:

Code: Select all

dim shared as string m
sub m_concat(a as string)
	var t = a      '' allocate, copy
	m += t         '' concat using remaining 12.5%, otherwise reallocate
end sub
It's slightly better, with 1 or 2 allocations max. Ideally, we'd like 0 or 1 allocations.

----
HACK: going back to the idea of testing pointers, we can work around the byref parameter problem partially by manipulating the string descriptors (won't work with a const byref string). And it's only safe if strptr(m) <> strptr(a):

Code: Select all

dim shared as string m
sub m_concat(a as string)
	if( strptr(a) <> strptr(m) ) then
		dim as string t  '' null descriptor on stack
		swap t, a        '' swap descriptors only
		m += t           '' concat using remaining 12.5%, otherwise reallocate
		swap t, a        '' swap descriptors back
	else
		'' 1 or 2 allocations 
		var t = a      '' allocate, copy
		m += t         '' concat using remaining 12.5%, otherwise reallocate		
	end if 
end sub   
Still not fully optimized, but the usage of 'm += m' does not seem like a common use case.

----
So where that leaves us is:
- fbc needs changes to allow optimization of byref parameters 'a += b'
- rtlib needs logic changes in existing concat+assign, or a new separate function, to determine if the operation needs a temporary copy or not to complete the operation
Post Reply