Weird behavior in fixed length strings

dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Weird behavior in fixed length strings

Post by dodicat »

Marcov
Even the string datatype has this capability in Free Pascal.
code
var g:string[80]=chr(0)+'Alpha'+chr(0)+'Beta';
var g2:string[80];
begin
writeln(ord(g[1]),' ',g);
writeln(length(g));
g2:=chr(0)+'Alpha'+chr(0)+'Beta';
writeln(ord(g2[1]),' ',g2);
end.

result
0 Alpha Beta
11
0 Alpha Beta

So what do you suggest for FreeBASIC?
It would be handy for FreeBASIC to have bare strings (without the null terminator), for the sake of loading from external files if nothing else, though I reckon another datatype would be a load of work.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands

Re: Weird behavior in fixed length strings

Post by marcov »

dodicat wrote: Jul 08, 2022 20:29 So what do you suggest for FreeBASIC?
Pascal was never zero-terminated. AFAIK the ancient convention (before string became a built-in type) was to pad the allocated string with trailing spaces, which made it impossible to end a string with a space. That limitation was worked around by keeping a separate length, and that evolved into string types with a separate length. Pascal's successor Modula-2 does use zero termination, except when the string exactly fills its allocation (more like the C str-n functions).

The best thing to do? Implement all Delphi/FPC string types of course. I've made a list: http://www.stack.nl/~marcov/delphistringtypes.txt

Seriously, however, it depends. Different languages make different choices for different reasons.

C is a bit two-sided, in that there are multiple kinds of use: embedded code with a lot of static buffers, and the more modern style that is fully dynamically allocated and uses the str-n functions, as advocated by GNU, Microsoft and other vendors to get rid of buffer overflows.

If you go for something close to C, I would go for the latter and do away with the former on all non-embedded targets (rule of thumb: anything with more than 64-128 KB of RAM is not embedded).

Then there is the C++ model, which sacrifices everything to keep string a library concept. That gives an implementer high flexibility, but makes its usage multifaceted and complex.

Finally, there are languages like Delphi that simply make string a distinct type, together with some char* compatibility for easy interfacing. But then they ran into evolving notions about string types, as described in the above URL.

Anyway, first you need to make choices, but since FB already has a language (rather than library) type, it is hard to beat the dynamically allocated "string" types of Delphi (ansi/wide/unicodestring). Fairly efficient, very easy, very compatible. I don't see why not.
deltarho[1859]
Posts: 4313
Joined: Jan 02, 2017 0:34
Location: UK

Re: Weird behavior in fixed length strings

Post by deltarho[1859] »

There is another approach to the above.

Add some notes to the String topic of the manual to highlight the difference between angros47's first code snippet, in the opening post, and the second code snippet. We would then have a Dim assignment versus a separate assignment: the first keeps any Chr(0), and the second has any Chr(0) stripped out.

The two methods can then be treated as a feature of fixed length strings, as if it had been intentional all along and the manual simply did not mention it.

:)
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Weird behavior in fixed length strings

Post by dodicat »

SARG wrote: Jul 08, 2022 13:24
marcov wrote: Jul 08, 2022 11:59 one could argue that the code with embedded zeros is simply illegal (null-terminated strings should not contain nulls in the body of the string)
This seems to be the best solution.


From the manual:
Note: For the fixed-length string type only (QB-style fixed-length string), the 'Len()' keyword always returns the declared constant number of characters, regardless of the number of characters assigned to it by user.
(hence the formula: 'user_characters_length = IIf(InStr(s, Chr(0)) > 0, InStr(s, Chr(0)) - 1, Len(s))')

Code:

dim s as string*80=chr(0)+"Alpha"+chr(0)+"Beta"
print "length=";IIf(InStr(s, Chr(0)) > 0, InStr(s, Chr(0)) - 1, Len(s));" !!!!"
print s
sleep
I cannot find IIf(InStr(s, Chr(0)) > 0, InStr(s, Chr(0)) - 1, Len(s)), in the .chm anyway.
You could do it this way:

Code:


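' Helper namespace: _x_ holds every character from Chr(1) to Chr(255), i.e. all but Chr(0)
namespace __zz__
dim  as string _x_
' length(s): position of the last character of s that is not Chr(0)
#define length(s) instrrev((s),any __zz__._x_)
' module constructor: fills _x_ before the main code runs
sub __set__ constructor
for n as long=1 to 255
      __zz__._x_+=chr(n)
next
end sub
end namespace

'-------------------------------------------------------------
dim s as string*80 =chr(0)+"Alpha"+chr(0)+"Beta"
print length(s),len(s)   ' last non-null position, then declared length


sleep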

 
But the bug found by angros47 still stands for this length.
SARG
Posts: 1767
Joined: May 27, 2005 7:15
Location: FRANCE

Re: Weird behavior in fixed length strings

Post by SARG »

dodicat wrote: Jul 08, 2022 22:42 I cannot find IIf(InStr(s, Chr(0)) > 0, InStr(s, Chr(0)) - 1, Len(s)), in the .chm anyway.
Search the page : Strings (string, zstring, and wstring)
or https://www.freebasic.net/wiki/ProPgStringsTypes
fxm
Moderator
Posts: 12131
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Weird behavior in fixed length strings

Post by fxm »

Glad my pages added to the Programmer's Guide are being used!
coderJeff
Site Admin
Posts: 4326
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada

Re: Weird behavior in fixed length strings

Post by coderJeff »

Sorry, I didn't read the whole thread. Just something to keep in mind when writing tests to prove that fbc and/or the runtime is correct or wrong: please do write tests that check both compile-time optimizations (constant folding) and runtime evaluation (expressions). Across multiple string types, we also have to contend with the fbc compiler optimizing certain expressions before it generates code for runtime. Ideally, results should be consistent across all modes but, as someone will probably demonstrate, we don't always get that.
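
A minimal sketch of what such a paired test might look like. It is only an illustration: DumpBytes and the variable names are made up here, and no expected output is given, because the whole point is to see whether the two dumps agree on a given fbc version.

```freebasic
' Sketch only: build the same text from constants and from a runtime value,
' then compare the bytes actually stored. DumpBytes is a hypothetical helper.
Sub DumpBytes(ByRef caption As Const String, ByVal p As UByte Ptr, ByVal n As Integer)
    Print caption; ": ";
    For i As Integer = 0 To n - 1
        Print Hex(p[i], 2); " ";
    Next
    Print
End Sub

' Every operand is a compile-time constant, so fbc may fold the expression.
Dim As String * 12 folded = "Alpha" + Chr(0) + "Beta"

' zero is a variable, so Chr(zero) has to be evaluated by the runtime.
Dim As Long zero = 0
Dim As String * 12 computed = "Alpha" + Chr(zero) + "Beta"

' Dump the declared number of bytes of each variable.
DumpBytes("folded  ", Cast(UByte Ptr, @folded), Len(folded))
DumpBytes("computed", Cast(UByte Ptr, @computed), Len(computed))
```

The same pair could be repeated for separate assignment, for Zstring * N and for variable-length String, so that each usage context is checked in both modes.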
fxm
Moderator
Posts: 12131
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Weird behavior in fixed length strings

Post by fxm »

Note: It is not recommended to use an explicit NULL character (Chr(0)) in a string expression involving a fixed-length string variable, because this can lead to different unexpected results depending on the usage context (initialization, assignment, concatenation, ...).
  • STRING documentation page updated with the above warning:
    KeyPgString → fxm [added warning about using a null character (Chr(0)) in an expression involving a fix-len string]

Demonstrative example with fix-len string (different unexpected results depending on usage context):

Code:

Dim s1 As String * 20 = "Alpha" + Chr(0) + "Beta"
Print s1
Print "'" & s1 & "'"
Print
Dim As String * 20 s2
s2 = "Alpha" + Chr(0) + "Beta"
Print s2
Print
Dim As String * 20 s3 = s1
Print s3
Print
Dim As String s = s1
Print s

Sleep

Output:

Alpha Beta
'Alpha'

AlphaBeta

Alpha

Alpha

No problem with (fix-len) zstring:

Code:

Dim z1 As Zstring * 20 = "Alpha" + Chr(0) + "Beta"
Print z1
Print "'" & z1 & "'"
Print
Dim As Zstring * 20 z2
z2 = "Alpha" + Chr(0) + "Beta"
Print z2
Print
Dim As Zstring * 20 z3 = z1
Print z3
Print
Dim As String s = z1
Print s

Sleep

Output:

Alpha
'Alpha'

Alpha

Alpha

Alpha


[edit]
Bug report filed:
966 Weird behavior in fix-len string (Dim As String * N ...) if explicit NULL character (Chr(0)) used in a string expression
Last edited by fxm on Jul 14, 2022 8:58, edited 1 time in total.
Reason: Updated.
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: Weird behavior in fixed length strings

Post by Lost Zergling »

Some more fun
:P

Code:

Dim s1  as string  *20="Alpha"+Chr(0)+"Beta"
Dim s2 as string *20= Left(s1, len(s1))
'Dim s3 as string = s1 ' var strings assignments from fixed strings follow zstrings rules
Dim s3 as string = Left(s1, len(s1))
Dim zs4 As zstring ptr= Allocate(40)
*zs4="Alpha"+Chr(0)+"Beta"
Dim s5 as string
' Dim zs6 as zstring ptr=@s1

? s1    ' Alpha Beta
? s2    ' Alpha Beta
? s3    ' Alpha Beta
? *zs4   ' Alpha

? "--------------------------"
? *@s1 & "%"    ' Alpha
? *@s2 & "%"    ' Alpha
? *@s3 & "%"    ' Alpha Beta

? "--------------------------"
? Left(s1, len(s1) ) & "%"  & len(s1)   ' Alpha Beta + spaces
? Left(s2, len(s2) ) & "%"  & len(s2)   ' Alpha Beta + spaces
? Left(s3, len(s3) ) & "%"  & len(s3)   ' Alpha Beta + spaces
? "So many thanks, Left"

? "--------------------------"
s1="Burp"
? Left(s1, len(s1) ) & "%"  & len(s1) & " wouldn't be so much better logic and fast not clearing right places on fixed length ?"  ' Burp - fixed strings assignments follow zstrings rules + clearing right memory places - Why not just say Fixed strings follow zstrings rules ?
s1="B" & Chr(0) & "rp"
? Left(s1, len(s1) ) & "%"  & len(s1)   ' Brp - fixed strings assignments follow zstrings rules except for exceptions (chr(0) has gone)
s5="B" & Chr(0) & "r"
? "Printed " & s5 & "p"                     ' B rp - Just to be sure what is printed
s1= s5 & "p"
'  *zs6=s5 & "p"     'Check !
' s1= Left(s5, len(s5)) & "p"   'Check !
' s1= Left(s5 & "p", len(s5)+1)  'Check !

? "Assigned " & Left(s1, len(s1)-9 ) & "%"  & len(s1) & " Hao ? Where is my 'r' ? By what right did you kill my chr(0) ? So much refining here."    ' Bp

s3="B" & Chr(0) & "rp"
? Left(s3, len(s1) )  & "%"  & len(s3)  & space(18) & "Would you like a cup of tea ?" ' B rp - var strings assignments, string len updated

Deallocate zs4
sleep
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Weird behavior in fixed length strings

Post by dodicat »

I agree, Lost Zergling.
A fixed length string should accept chr(0) (and have no hidden chr(0) at the end).
Maybe another data type is needed.
Otherwise you must do a fake cast to string wherever a fixed length string is implied.

Code:

namespace whatever
#define Cstr(s) mid((s),1,length(s))
dim  as string _x_
#define length(s) instrrev((s),any whatever._x_)
sub setX constructor
for n as long=1 to 255
      _x_+=chr(n)
next
end sub
end namespace



Dim s1 As String * 20 = "Alpha" + Chr(0) + "Beta"
Print Cstr(s1),"length = ";length(s1)
print
Print "'" & Cstr(s1) & "'","length = ";length("'" & Cstr(s1) & "'")
Print
Dim As String * 20 s2
s2 = Cstr("Alpha" + Chr(0) + "Beta")
Print Cstr(s2),"length = ";length(s2)
Print
Dim As String * 20 s3 = Cstr(s1)
Print Cstr(s3),"length = ";length(s3)
Print
Dim As String s = Cstr(s1)
Print s,"length = ";length(s)

Sleep 
Which is an unwholesome hack.
fxm
Moderator
Posts: 12131
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Weird behavior in fixed length strings

Post by fxm »

dodicat wrote: Jul 11, 2022 17:09 A fixed length string should accept chr(0) (and no hidden chr(0) at the end).
Yes, but for that, it would also be necessary to be able to declare 'As String * N' as a parameter type for a procedure.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Weird behavior in fixed length strings

Post by dodicat »

It wouldn't be too difficult to rustle up a raw ubyte string.
For fun

Code:


#cmdline "-exx"

Type rawstring 
      Declare Constructor
      Declare Constructor(As String)
      Declare Constructor(As Long)
      Declare Operator Let(As String)
      Declare Operator Cast() As String
      Declare Operator [](As Long) As Ubyte
      As Ubyte u(Any)
End Type

Constructor rawstring
End Constructor

Constructor rawstring(s As String)
 Redim u(Len(s)-1)
For n As Long=0 To Len(s)-1
      u(n)=s[n]
      Next
End Constructor

Constructor rawstring(L As Long)
Redim u(L-1)
End Constructor

Operator rawstring.let(s As String)
this.constructor(s)
End Operator

Operator rawstring.cast() As String
Dim As String g
For n As Long=0 To Ubound(u)
      g+=Chr(u(n))
Next n
Return g
End Operator

Function RawLen(x As rawstring) As Long
    Return Ubound(x.u)+1
End Function

Operator rawstring.[](s As Long) As Ubyte
Return u(s) 
End Operator

Operator Len(x As rawstring) As Long
Return Ubound(x.u)+1
End Operator

Operator +(a As rawstring,b As rawstring) As rawstring
Dim As String aa=a,bb=b
Dim As rawstring ret=aa+bb
Return ret
End Operator

'================================

Dim As rawstring g
g="123456"+chr(0)+"abcde"
Print "Rawstring and length ",  g,Len(g)
Print "Mid(g,5,4) ",Mid(g,5,4)

Mid(*cast(string ptr,@g),4,2)="XX"


Print "Mid(g,4,2)= XX " , g

Dim As rawstring f=(5)
Print "Length of empty rawstring ",Len(f)

Dim As rawstring r="press "+"any "+"key "+"to "+"finish"+". . ."

For n As Long=0 To Len(r)-1
    Print Chr(r[n]);
Next
Print


Sleep
 
fxm
Moderator
Posts: 12131
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Weird behavior in fixed length strings

Post by fxm »

Small note on above code

In general, calling a constructor on an instance ('instance.constructor(...)' or 'this.constructor(...)') is not recommended because member objects are not properly destroyed before they are reconstructed.
In your example, the array elements are not properly deallocated before reallocation, which causes a memory leak (see the example below, where a new element address appears unnecessarily although the element count is unchanged):

Code:

Type rawstring 
      Declare Constructor(As String)
      Declare Operator Let(As String)
      As Ubyte u(Any)
End Type

Constructor rawstring(s As String)
    Redim u(Len(s)-1)
    For n As Long=0 To Len(s)-1
        u(n)=s[n]
    Next
    Print "constructor(as string)", @this, @u(0)
End Constructor

Operator rawstring.let(s As String)
    this.constructor(s)
    Print "operator let(as string)", @this, @u(0)
End Operator

'================================

Dim As rawstring g = "abcd"
g="1234"

Sleep

Output:

constructor(as string)      1703552       11473656
constructor(as string)      1703552       11473704
operator let(as string)     1703552       11473704

An inelegant workaround is to call the destructor on the object just before the constructor:

Code:

Type rawstring 
      Declare Constructor(As String)
      Declare Operator Let(As String)
      As Ubyte u(Any)
End Type

Constructor rawstring(s As String)
    Redim u(Len(s)-1)
    For n As Long=0 To Len(s)-1
        u(n)=s[n]
    Next
    Print "constructor(as string)", @this, @u(0)
End Constructor

Operator rawstring.let(s As String)
    this.destructor()
    this.constructor(s)
    Print "operator let(as string)", @this, @u(0)
End Operator

'================================

Dim As rawstring g = "abcd"
g="1234"

Sleep

Output:

constructor(as string)      1703552       4395768
constructor(as string)      1703552       4395768
operator let(as string)     1703552       4395768

But the better solution (IMHO) is to move the user copy-construction code into the Let operator body, and have the constructor call the Let operator ('this = s'):

Code:

Type rawstring 
      Declare Constructor(As String)
      Declare Operator Let(As String)
      As Ubyte u(Any)
End Type

Constructor rawstring(s As String)
    This = s
    Print "constructor(as string)", @this, @u(0)
End Constructor

Operator rawstring.let(s As String)
    Redim u(Len(s)-1)
    For n As Long=0 To Len(s)-1
        u(n)=s[n]
    Next
    Print "operator let(as string)", @this, @u(0)
End Operator

'================================

Dim As rawstring g = "abcd"
g="1234"

Sleep

Output:

operator let(as string)     1703552       6755064
constructor(as string)      1703552       6755064
operator let(as string)     1703552       6755064
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Weird behavior in fixed length strings

Post by dodicat »

Thanks fxm.
Yes, the best Let operator just does the same as the constructor: copy and paste those few lines.
I was taking a bit of a shortcut.
Of course, with this simple rawstring, there is still the task of saving to and retrieving from a file.
It works OK if I save the file from FB and load the file from FB (same as fixed length strings).
I can use the crt to be more specific about block loading and saving, without those warnings from Put and Get.
But loading external files is a different story.
Anyway, I think that a new data type (a raw string without a hidden chr(0)) might be useful.
Passing this type to subs just means passing the size as well, as with a pointer.
I'll leave it at that (going off topic).
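
To illustrate the pointer-plus-size idea, here is a small sketch. CountNulls and the sample array are invented for the example; they are not part of FreeBASIC or of the rawstring type above.

```freebasic
' Sketch only: a procedure that receives a raw byte buffer as pointer plus size,
' so it never has to look for a terminator. CountNulls is a hypothetical helper.
Function CountNulls(ByVal p As UByte Ptr, ByVal byteCount As Integer) As Integer
    Dim As Integer found = 0
    For i As Integer = 0 To byteCount - 1
        If p[i] = 0 Then found += 1
    Next
    Return found
End Function

' Six raw bytes: leading, embedded and trailing zeros are all just data.
Dim As UByte raw(0 To 5) = {0, 65, 66, 0, 67, 0}

' Three of the six bytes are zero.
Print "zero bytes found:"; CountNulls(@raw(0), UBound(raw) + 1)
```

Because the caller supplies the size, the callee treats every byte the same way, which is what a raw string type would need.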
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: Weird behavior in fixed length strings

Post by Lost Zergling »

I respectfully give my opinion. I think fixed-length strings are interesting nonetheless, but they have a few weaknesses, and fixing them could significantly improve usability. First of all, it seems to me that, since we're using Chr(0) as a legacy terminator, there's no point in resetting the entire fixed-length memory area to blank (1); doing so seems to lose one of the performance benefits of choosing a string terminator.
(1): perhaps the reset could stop once the previous chr(0) has been 'purged', but I'm not sure that would be faster.
Secondly, assignment to a fixed-length string from a variable-length string should be able to follow the rules of the variable-length string, i.e. allow chr(0) to be assigned, with no truncation before the end of the memory slot of the fixed-length string, i.e.:
MyFixedString=MyVarString would produce the same results as
MyFixedString=Left(MyVarString, len(MyVarString))
Moreover, 'MyFixedString=MyVarString' is currently redundant with 'MyFixedString=*@MyVarString'.
It may therefore be possible to improve both functionality and performance by adjusting just these two, perhaps minor, points.
For fixed length strings, I think that, conceptually, the programmer should keep in mind the difference between the effective size of the string (if less than the memory location) and the size of the memory location (buffer). Trying to make this distinction transparent seems bound to lead to loss of functionality and/or additional complexity in the overall consistency (MyFixedString could also return MyFixedString because the fixed len is available; *@MyFixedString would be conceptually distinct).
Using Chr(0) as a legacy terminator will still have some weaknesses, however:
- It is up to the user to manage any Chr(0) in the data (but an InstrRev when reading a fixed string could partially circumvent this point).
- The data cannot end with a chr(0) (it would be confused with the end-of-string character), so it is still not completely neutral.
- Beyond a certain length, fixed strings managed by a chr(0) will probably be less efficient than those managed by size information, which is why the question of a true third type of string is not completely unfounded.
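
As a rough illustration of the "managed by information on the size" alternative, a sketch under my own assumptions follows. SizedBuf, SetBuf, BufToString and CAPACITY are hypothetical names, not an existing or proposed FreeBASIC type.

```freebasic
' Sketch only: a fixed-capacity buffer that records how many bytes are in use,
' so Chr(0) is ordinary data and may even end the text.
Const CAPACITY = 20

Type SizedBuf
    As Integer used                     ' number of bytes actually stored
    As UByte bytes(0 To CAPACITY - 1)   ' raw storage, no terminator
End Type

' Copy at most CAPACITY bytes from s and remember how many were copied.
Sub SetBuf(ByRef b As SizedBuf, ByRef s As Const String)
    b.used = IIf(Len(s) > CAPACITY, CAPACITY, Len(s))
    For i As Integer = 0 To b.used - 1
        b.bytes(i) = s[i]
    Next
End Sub

' Rebuild a variable-length string from the stored bytes.
Function BufToString(ByRef b As SizedBuf) As String
    Dim As String r = Space(b.used)
    For i As Integer = 0 To b.used - 1
        r[i] = b.bytes(i)
    Next
    Return r
End Function

Dim As Long zero = 0                    ' a variable, so Chr(zero) is evaluated at run time
Dim As String src = "B" & Chr(zero) & "rp" & Chr(zero)
Dim As SizedBuf buf
SetBuf(buf, src)
Print "source length:"; Len(src)
Print "stored bytes: "; buf.used
Print "round trip:   "; Len(BufToString(buf))
```

The stored length answers "how much is in use" without scanning, and the capacity answers "how big is the slot", which keeps the two sizes conceptually separate.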