Weird behavior in fixed length strings
Re: Weird behavior in fixed length strings
Marcov
Even the string datatype has this capability in freepascal.
code
var g:string[80]=chr(0)+'Alpha'+chr(0)+'Beta';
var g2:string[80];
begin
writeln(ord(g[1]),' ',g);
writeln(length(g));
g2:=chr(0)+'Alpha'+chr(0)+'Beta';
writeln(ord(g2[1]),' ',g2);
end.
result
0 Alpha Beta
11
0 Alpha Beta
So what do you suggest for FreeBASIC?
It would be handy for FreeBASIC to have bare strings (without the null terminator), for the sake of loading from external files if nothing else, but what a load of work I reckon for another datatype.
Re: Weird behavior in fixed length strings
Pascal was never zero terminated. Afaik the ancient convention (before string became a built-in type) was to pad the allocated string with trailing spaces, which made it impossible to end a string with a space. That limitation was worked around by keeping a separate length, and that evolved into string types with a separate length. Pascal's successor Modula-2 does use zero termination, except when the string exactly fills its allocation (more like the C str-n functions).
The best thing to do? Implement all Delphi/FPC string types of course. I've made a list: http://www.stack.nl/~marcov/delphistringtypes.txt
Seriously however, it depends. Multiple languages make multiple choices for other reasons.
C is a bit two-sided in that there are multiple kinds of use: for instance embedded code with a lot of static buffers, and the more modern style that allocates everything dynamically and uses the str-n functions, as advocated by GNU, Microsoft and other vendors to get rid of buffer overflows.
If you go for something close to C I would go for the latter, and do away with the former on all non embedded targets. (and rule of thumb: anything with over 64-128kb RAM is not embedded).
Then there is the C++ model that sacrifices everything to keep string a library concept, with high flexibility for an implementer, but makes its usage multi faceted and complex.
Finally there are languages like Delphi that simply make string a distinct type, together with some char* compatibility for easy interfacing. But then they ran into evolving notions about string types, as described in the above URL.
Anyway, first you need to make choices, but since FB already has a language (rather than library) type, it is hard to beat the dynamically allocated "string" types of Delphi (ansi/wide/unicodestring). Fairly efficient, very easy, very compatible. I don't see why not.
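To make the "separate length" model concrete, here is a minimal sketch in C (not FreeBASIC or Pascal) of a counted string next to a zero-terminated view of the same bytes. The names counted_str, counted_set, counted_len and zterm_len are hypothetical and only for illustration; the 80-byte capacity simply mirrors the string[80] in the Pascal example.

```c
#include <stddef.h>
#include <string.h>

#define COUNTED_CAP 80   /* capacity, mirroring string[80] (assumption) */

/* Hypothetical counted string: the length lives next to the data,
   so a zero byte is just another character. */
typedef struct {
    size_t len;                 /* number of bytes in use */
    char   data[COUNTED_CAP];   /* fixed-capacity storage */
} counted_str;

/* Copy n bytes into the string, truncating to the capacity. */
void counted_set(counted_str *s, const void *src, size_t n)
{
    if (n > COUNTED_CAP)
        n = COUNTED_CAP;
    memcpy(s->data, src, n);
    s->len = n;
}

/* Length as the counted model sees it: a stored value, no scan. */
size_t counted_len(const counted_str *s)
{
    return s->len;
}

/* Length as a zero-terminated reader would see it: everything up to
   the first zero byte (the scan is capped at the bytes in use). */
size_t zterm_len(const counted_str *s)
{
    const char *nul = memchr(s->data, '\0', s->len);
    return nul ? (size_t)(nul - s->data) : s->len;
}
```

Storing the 11 bytes of chr(0)+'Alpha'+chr(0)+'Beta' this way gives counted_len = 11, matching the Pascal output above, while zterm_len gives 0 because the very first byte is already a terminator.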
Re: Weird behavior in fixed length strings
There is another approach to the above.
Add some notes to the String topic of the manual to highlight the difference between angros47's first code snippet, in the opening post, and the second code snippet. We will have then a Dim assignment versus a separate assignment; the first will keep any Chr(0) and the second will have any Chr(0) stripped out.
The two methods can then be treated as a feature of fixed length strings as if that was intentional, but the manual did not mention it.
Re: Weird behavior in fixed length strings
SARG wrote: ↑Jul 08, 2022 13:24
This seems to be the best solution.
From the manual:
Note: For the fixed-length string type only (QB-style fixed-length string), the 'Len()' keyword always returns the declared constant number of characters, regardless of the number of characters assigned to it by user.
(hence the formula: 'user_characters_length = IIf(InStr(s, Chr(0)) > 0, InStr(s, Chr(0)) - 1, Len(s))')
Code: Select all
dim s as string*80=chr(0)+"Alpha"+chr(0)+"Beta"
print "length=";IIf(InStr(s, Chr(0)) > 0, InStr(s, Chr(0)) - 1, Len(s));" !!!!"
print s
sleep
I cannot find IIf(InStr(s, Chr(0)) > 0, InStr(s, Chr(0)) - 1, Len(s)) in the .chm anyway.
You could do it this way:
Code: Select all
namespace __zz__
dim as string _x_
#define length(s) instrrev((s),any __zz__._x_)
sub __set__ constructor
for n as long=1 to 255
__zz__._x_+=chr(n)
next
end sub
end namespace
'-------------------------------------------------------------
dim s as string*80 =chr(0)+"Alpha"+chr(0)+"Beta"
print length(s),len(s)
sleep
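The two length rules in this post (stop at the first Chr(0), or find the last character that is not Chr(0)) give different answers for the same buffer. A small sketch in C (not FreeBASIC) of both rules follows. The function names are hypothetical, and it assumes the unused tail of the buffer is filled with zero bytes, which this thread does not confirm for every FreeBASIC version.

```c
#include <stddef.h>
#include <string.h>

/* Length up to the first zero byte, or cap if there is none.
   Same idea as IIf(InStr(s, Chr(0)) > 0, InStr(s, Chr(0)) - 1, Len(s)). */
size_t len_to_first_nul(const char *buf, size_t cap)
{
    const char *nul = memchr(buf, '\0', cap);
    return nul ? (size_t)(nul - buf) : cap;
}

/* 1-based position of the last non-zero byte, or 0 if every byte is zero.
   Same idea as InStrRev(s, Any <all characters 1..255>). */
size_t len_to_last_nonzero(const char *buf, size_t cap)
{
    while (cap > 0 && buf[cap - 1] == '\0')
        cap--;
    return cap;
}
```

For an 80-byte zero-filled buffer holding chr(0)+"Alpha"+chr(0)+"Beta", the first rule returns 0 and the second returns 11. The second rule keeps embedded zero bytes but still cannot see a zero byte at the very end of the data.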
Re: Weird behavior in fixed length strings
Search the page : Strings (string, zstring, and wstring)
or https://www.freebasic.net/wiki/ProPgStringsTypes
Re: Weird behavior in fixed length strings
Glad my pages added to the Programmer's Guide are being used!
Re: Weird behavior in fixed length strings
Sorry, I didn't read the whole thread. Just something to keep in mind when writing tests to prove whether fbc and/or the runtime is correct or wrong: please do write tests that check both optimizations (constant folding) and runtime evaluation (expressions). Across multiple string types, we also have to contend with the fbc compiler optimizing certain expressions before it generates code for runtime. Ideally, results should be consistent across all modes, but as someone will probably demonstrate, we don't always get that.
Re: Weird behavior in fixed length strings
Note: It is not recommended to use explicit NULL character (chr(0)) in a string expression involving a fixed length string variable because this can lead to different unexpected results depending on usage context (initialization, assignment, concatenation, ...).
- STRING documentation page updated with this above warning:
KeyPgString → fxm [added warning about using a null character (Chr(0)) in an expression involving a fix-len string]
Demonstrative example with fix-len string (different unexpected results depending on usage context):
Code: Select all
Dim s1 As String * 20 = "Alpha" + Chr(0) + "Beta"
Print s1
Print "'" & s1 & "'"
Print
Dim As String * 20 s2
s2 = "Alpha" + Chr(0) + "Beta"
Print s2
Print
Dim As String * 20 s3 = s1
Print s3
Print
Dim As String s = s1
Print s
Sleep
Code: Select all
Alpha Beta
'Alpha'
AlphaBeta
Alpha
Alpha
No problem with (fix-len) zstring:
Code: Select all
Dim z1 As Zstring * 20 = "Alpha" + Chr(0) + "Beta"
Print z1
Print "'" & z1 & "'"
Print
Dim As Zstring * 20 z2
z2 = "Alpha" + Chr(0) + "Beta"
Print z2
Print
Dim As Zstring * 20 z3 = z1
Print z3
Print
Dim As String s = z1
Print s
Sleep
Code: Select all
Alpha
'Alpha'
Alpha
Alpha
Alpha
[edit]
Bug report filed:
966 Weird behavior in fix-len string (Dim As String * N ...) if explicit NULL character (Chr(0)) used in a string expression
Last edited by fxm on Jul 14, 2022 8:58, edited 1 time in total.
Reason: Updated.
Re: Weird behavior in fixed length strings
Some more fun
Code: Select all
Dim s1 as string *20="Alpha"+Chr(0)+"Beta"
Dim s2 as string *20= Left(s1, len(s1))
'Dim s3 as string = s1 ' var string assignments from fixed strings follow zstring rules
Dim s3 as string = Left(s1, len(s1))
Dim zs4 As zstring ptr= Allocate(40)
*zs4="Alpha"+Chr(0)+"Beta"
Dim s5 as string
' Dim zs6 as zstring ptr=@s1
? s1 ' Alpha Beta
? s2 ' Alpha Beta
? s3 ' Alpha Beta
? *zs4 ' Alpha
? "--------------------------"
? *@s1 & "%" ' Alpha
? *@s2 & "%" ' Alpha
? *@s3 & "%" ' Alpha Beta
? "--------------------------"
? Left(s1, len(s1) ) & "%" & len(s1) ' Alpha Beta + spaces
? Left(s2, len(s2) ) & "%" & len(s2) ' Alpha Beta + spaces
? Left(s3, len(s3) ) & "%" & len(s3) ' Alpha Beta + spaces
? "So many thanks, Left"
? "--------------------------"
s1="Burp"
? Left(s1, len(s1) ) & "%" & len(s1) & " wouldn't be so much better logic and fast not clearing right places on fixed length ?" ' Burp - fixed string assignments follow zstring rules + clearing right memory places - Why not just say fixed strings follow zstring rules ?
s1="B" & Chr(0) & "rp"
? Left(s1, len(s1) ) & "%" & len(s1) ' Brp - fixed string assignments follow zstring rules except for exceptions (chr(0) is gone)
s5="B" & Chr(0) & "r"
? "Printed " & s5 & "p" ' B rp - Just to be sure what is printed
s1= s5 & "p"
' *zs6=s5 & "p" 'Check !
' s1= Left(s5, len(s5)) & "p" 'Check !
' s1= Left(s5 & "p", len(s5)+1) 'Check !
? "Assigned " & Left(s1, len(s1)-9 ) & "%" & len(s1) & " Hao ? Where is my 'r' ? By what right did you kill my chr(0) ? So much refining here." ' Bp
s3="B" & Chr(0) & "rp"
? Left(s3, len(s1) ) & "%" & len(s3) & space(18) & "Would you like a cup of tea ?" ' B rp - var string assignments, string len updated
Deallocate zs4
sleep
Re: Weird behavior in fixed length strings
I agree Lost Zergling
A fixed length string should accept chr(0) (and no hidden chr(0) at the end).
Maybe another data type is needed.
Otherwise you must do a fake cast to string at each implied fixed length string.
Which is an unwholesome hack
Code: Select all
namespace whatever
#define Cstr(s) mid((s),1,length(s))
dim as string _x_
#define length(s) instrrev((s),any whatever._x_)
sub setX constructor
for n as long=1 to 255
_x_+=chr(n)
next
end sub
end namespace
Dim s1 As String * 20 = "Alpha" + Chr(0) + "Beta"
Print Cstr(s1),"length = ";length(s1)
print
Print "'" & Cstr(s1) & "'","length = ";length("'" & Cstr(s1) & "'")
Print
Dim As String * 20 s2
s2 = Cstr("Alpha" + Chr(0) + "Beta")
Print Cstr(s2),"length = ";length(s2)
Print
Dim As String * 20 s3 = Cstr(s1)
Print Cstr(s3),"length = ";length(s3)
Print
Dim As String s = Cstr(s1)
Print s,"length = ";length(s)
Sleep
Re: Weird behavior in fixed length strings
It wouldn't be too difficult to rustle up a raw ubyte string.
For fun
Code: Select all
#cmdline "-exx"
Type rawstring
Declare Constructor
Declare Constructor(As String)
Declare Constructor(As Long)
Declare Operator Let(As String)
Declare Operator Cast() As String
Declare Operator [](As Long) As Ubyte
As Ubyte u(Any)
End Type
Constructor rawstring
End Constructor
Constructor rawstring(s As String)
Redim u(Len(s)-1)
For n As Long=0 To Len(s)-1
u(n)=s[n]
Next
End Constructor
Constructor rawstring(L As Long)
Redim u(L-1)
End Constructor
Operator rawstring.let(s As String)
this.constructor(s)
End Operator
Operator rawstring.cast() As String
Dim As String g
For n As Long=0 To Ubound(u)
g+=Chr(u(n))
Next n
Return g
End Operator
Function RawLen(x As rawstring) As Long
Return Ubound(x.u)+1
End Function
Operator rawstring.[](s As Long) As Ubyte
Return u(s)
End Operator
Operator Len(x As rawstring) As Long
Return Ubound(x.u)+1
End Operator
Operator +(a As rawstring,b As rawstring) As rawstring
Dim As String aa=a,bb=b
Dim As rawstring ret=aa+bb
Return ret
End Operator
'================================
Dim As rawstring g
g="123456"+chr(0)+"abcde"
Print "Rawstring and length ", g,Len(g)
Print "Mid(g,5,4) ",Mid(g,5,4)
Mid(*cast(string ptr,@g),4,2)="XX"
Print "Mid(g,4,2)= XX " , g
Dim As rawstring f=(5)
Print "Length of empty rawstring ",Len(f)
Dim As rawstring r="press "+"any "+"key "+"to "+"finish"+". . ."
For n As Long=0 To Len(r)-1
Print Chr(r[n]);
Next
Print
Sleep
Re: Weird behavior in fixed length strings
Small note on above code
In general, calling a constructor on an instance ('instance.constructor(...)' or 'this.constructor(...)') is not recommended because member objects are not properly destroyed before they are reconstructed.
In your example, array elements are not properly deallocated before reallocation (see example below where unnecessary new element address appears while element count is unchanged) inducing memory leak:
Code: Select all
Type rawstring
Declare Constructor(As String)
Declare Operator Let(As String)
As Ubyte u(Any)
End Type
Constructor rawstring(s As String)
Redim u(Len(s)-1)
For n As Long=0 To Len(s)-1
u(n)=s[n]
Next
Print "constructor(as string)", @this, @u(0)
End Constructor
Operator rawstring.let(s As String)
this.constructor(s)
Print "operator let(as string)", @this, @u(0)
End Operator
'================================
Dim As rawstring g = "abcd"
g="1234"
Sleep
Code: Select all
constructor(as string) 1703552 11473656
constructor(as string) 1703552 11473704
operator let(as string) 1703552 11473704
An inelegant workaround is to call the destructor on the object just before the constructor:
Code: Select all
Type rawstring
Declare Constructor(As String)
Declare Operator Let(As String)
As Ubyte u(Any)
End Type
Constructor rawstring(s As String)
Redim u(Len(s)-1)
For n As Long=0 To Len(s)-1
u(n)=s[n]
Next
Print "constructor(as string)", @this, @u(0)
End Constructor
Operator rawstring.let(s As String)
this.destructor()
this.constructor(s)
Print "operator let(as string)", @this, @u(0)
End Operator
'================================
Dim As rawstring g = "abcd"
g="1234"
Sleep
Code: Select all
constructor(as string) 1703552 4395768
constructor(as string) 1703552 4395768
operator let(as string) 1703552 4395768
But the better solution (IMHO) is to move the user copy-construction code into the Let operator body and have the copy-constructor code call the Let operator ('this = s'):
Code: Select all
Type rawstring
Declare Constructor(As String)
Declare Operator Let(As String)
As Ubyte u(Any)
End Type
Constructor rawstring(s As String)
This = s
Print "constructor(as string)", @this, @u(0)
End Constructor
Operator rawstring.let(s As String)
Redim u(Len(s)-1)
For n As Long=0 To Len(s)-1
u(n)=s[n]
Next
Print "operator let(as string)", @this, @u(0)
End Operator
'================================
Dim As rawstring g = "abcd"
g="1234"
Sleep
Code: Select all
operator let(as string) 1703552 6755064
constructor(as string) 1703552 6755064
operator let(as string) 1703552 6755064
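The same "assignment does the work, construction delegates to it" shape can be sketched outside FreeBASIC. Below is a rough C analogue (not the rawstring type above): rawbuf, rawbuf_assign, rawbuf_init and rawbuf_free are hypothetical names, and realloc stands in for the Redim on an already-allocated array, so the old storage is reused or released instead of leaked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical raw byte buffer: pointer plus explicit length. */
typedef struct {
    unsigned char *bytes;
    size_t len;
} rawbuf;

/* Assignment: resize the existing allocation, then copy the new bytes.
   src must not point into r's own storage. Returns 0 on success. */
int rawbuf_assign(rawbuf *r, const void *src, size_t n)
{
    if (n == 0) {                 /* empty value: release the storage */
        free(r->bytes);
        r->bytes = NULL;
        r->len = 0;
        return 0;
    }
    unsigned char *p = realloc(r->bytes, n);
    if (p == NULL)
        return -1;                /* old contents are still valid */
    memcpy(p, src, n);
    r->bytes = p;
    r->len = n;
    return 0;
}

/* Construction: start from the empty state and delegate to assignment. */
int rawbuf_init(rawbuf *r, const void *src, size_t n)
{
    r->bytes = NULL;
    r->len = 0;
    return rawbuf_assign(r, src, n);
}

/* Destruction: release the storage and return to the empty state. */
void rawbuf_free(rawbuf *r)
{
    free(r->bytes);
    r->bytes = NULL;
    r->len = 0;
}
```

Because the copy logic exists only in rawbuf_assign, there is no path that allocates a second block while the first one is still owned by the object.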
Re: Weird behavior in fixed length strings
Thanks fxm.
Yes, the best Let operator just does the same as the constructor: copy and paste those few lines.
I was taking a bit of a shortcut.
Of course, with this simple rawstring, there is the task of saving to file and retrieving from file.
It does it OK if I save the file from fb and load the file from fb (Same as fixed length strings).
I can use the crt to be more specific about block loading and saving without those warnings with put and get.
But loading external files is a different story.
Anyway, I think that a new data type (raw string without hidden chr(0)) might be useful.
Passing this type to subs just means passing the size as well, like a pointer.
I'll leave it at that (going off topic).
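As a rough illustration of "passing the size as well, like a pointer", here is a sketch in C using the crt, as mentioned above. load_block and count_zero_bytes are hypothetical helper names; the only library calls used are fread from <stdio.h> plus plain loops.

```c
#include <stdio.h>
#include <stddef.h>

/* Read up to cap bytes from an already opened binary stream into buf.
   Returns the number of bytes actually read; zero bytes in the file
   are kept like any other byte. */
size_t load_block(FILE *f, unsigned char *buf, size_t cap)
{
    return fread(buf, 1, cap, f);
}

/* Example consumer: the size n has to be passed along with the pointer,
   because nothing inside the data marks where it ends. */
size_t count_zero_bytes(const unsigned char *buf, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == 0)
            zeros++;
    }
    return zeros;
}
```

The value returned by load_block is the length to hand on to every later call, which is exactly the bookkeeping a built-in raw string type would do for you.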
Re: Weird behavior in fixed length strings
I respectfully give my opinion. I think fixed-length strings are interesting nonetheless, but they have a few weaknesses, and fixing them could significantly improve usability. First of all, it seems to me that, since we're using a Chr(0) as a legacy terminator, there's no point in resetting the entire fixed-length memory area to blank (1), and doing so seems to lose one of the performance benefits of choosing a string terminator.
(1): perhaps the reset could stop once the previous chr(0) has been 'purged', but I'm not sure it would be faster.
Secondly, assignment to a fixed-length string from a variable-length string should be able to be done by respecting the rules of the variable-length string, i.e. by allowing the assignment of chr(0) and without truncation before the end of the memory slot of the fixed-length string, i.e.:
MyFixedString=MyVarString would produce the same results as
MyFixedString=Left(MyVarString, len(MyVarString))
Moreover 'MyFixedString=MyVarString' is currently redundant with 'MyFixedString=*@MyVarString'
It may therefore be possible to improve both functionality and performance by simply playing on these two points, perhaps minor ones.
For fixed length strings, I think, conceptually, the programmer should keep in mind the difference between the effective size of the string (if less than the memory location) and the size of the memory location (buffer). Wanting to make this problem transparent seems bound to lead to loss of functionality and/or additional complexity in the overall consistency (MyFixedString could also return MyFixedString because the fixed len is available; *@MyFixedString would be conceptually distinct).
Using a Chr(0) as a legacy terminator will still have some weaknesses, however:
- It is up to the user to manage the Chr(0) in the data (but an InstrRev for reading a fixed string could partially circumvent this point)
- The data cannot end with a chr(0) (confusion with the end of string character), so it is still not completely neutral.
- Beyond a certain length, fixed strings managed by a chr(0) will probably be less efficient than those managed by size information, which is why the question of a true third string type is not completely unfounded.
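The last two weaknesses can be shown with a small sketch in C (not FreeBASIC). It assumes a slot that is zero-filled before each store; SLOT, slot_store and slot_scan_len are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <string.h>

#define SLOT 8   /* size of the fixed memory slot (arbitrary for the demo) */

/* Store n bytes in the slot and fill the unused tail with zero bytes. */
void slot_store(char slot[SLOT], const void *src, size_t n)
{
    if (n > SLOT)
        n = SLOT;
    memset(slot, 0, SLOT);
    memcpy(slot, src, n);
}

/* Recover the effective length by scanning back over the zero padding.
   The cost grows with the slot size, unlike reading a stored length. */
size_t slot_scan_len(const char slot[SLOT])
{
    size_t n = SLOT;
    while (n > 0 && slot[n - 1] == '\0')
        n--;
    return n;
}
```

Storing the 2 bytes "AB" and the 3 bytes "AB" + chr(0) produces byte-identical slots, and slot_scan_len returns 2 for both: the trailing zero byte is indistinguishable from padding. A type that keeps the size separately would report 2 and 3 without scanning at all.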