Extending Wstring and Zstring with UDTs

Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: Extending Wstring and Zstring with UDTs

Post by Juergen Kuehlwein »

Great!

Returning a WSTRING PTR with SADD/STRPTR seems a lot more logical than returning a ZSTRING PTR for a WSTRING! There shouldn't be much of a problem with that change, because previously, in the case of a WSTRING, you would have had to cast the result to WSTRING PTR anyway in order to use it. Now you might have places in existing code where you cast a WSTRING PTR to a WSTRING PTR, which shouldn't be a problem at all.

If I interpret your code correctly, you changed it so that the wstring data pointer no longer needs to be the first member variable of the UDT, which was a requirement in my version. Fine!

LEFT and RIGHT (and maybe LEN too) without overloading would be very good, because then we have it all inside the RTL. This makes possible future work easier (extending WSTRING from UCS-2 to UTF-16, to make it really Unicode compliant).

SWAP, as it currently works, is fine even for UDTs. I don't know if there will be problems with more than one level of indirection inside (a pointer to a pointer as a member variable), with certain uses of UNIONs inside UDTs, or with objects.

In my view it would be nice to have it for extended Z/WSTRING UDTs without the need for an overloaded member function. If there are cases where swapping all data members doesn't have the expected result, maybe this should be documented and an example should be given.


Thanks,


JK
coderJeff
Site Admin
Posts: 4326
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Extending Wstring and Zstring with UDTs

Post by coderJeff »

The reason I'm not completely convinced on SWAP is that it is kind of like an assignment. To compare: with STRING descriptors, fbc implements an efficient SWAP mechanism by swapping only the descriptors, exactly like swapping a UDT, without actually swapping the data that is pointed to. If SWAP unconditionally treats a UDT as WSTRING and swaps the actual data, it's very inefficient.

Also, when I go back and look at allowing a UDT to be handled like a WSTRING in the LSET, RSET, and MID statements, they are also like assignments. They are inefficient too, and possibly handle the UDT incorrectly as a WSTRING, since they operate on the raw (pointed-to) data instead of any overloaded operator the user has written.

If the SWAP, LSET, RSET, and MID statements just treat the data as WSTRING, then any logic that the user may have written into a CONSTRUCTOR or OPERATOR LET procedure is ignored. And if the UDT maintains internal state (like length, for example), it won't be updated correctly when using any of these assignment-like statements. So, even if these changes are merged in, I can see it changing again in future, possibly disallowing, or providing some other mechanism, like a specially named overloaded function/operator, like FOR/NEXT overloads.
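The concern about internal state can be illustrated with a minimal sketch. The type `Counted` and its members are hypothetical names chosen for this illustration and are not taken from fbc or ustring.bi. The sketch assumes a UDT that caches its length in OPERATOR LET; any write that goes straight to the raw buffer leaves that cached length stale, which is the same effect an assignment-like statement would have if it operated on the raw WSTRING data.

```freebasic
'' Hypothetical UDT that keeps a cached length next to its character data.
type Counted
	buffer as wstring * 32
	length as integer
	declare operator let( byref s as const wstring )
end type

'' The user's assignment logic: copy the text and refresh the cached length.
operator Counted.let( byref s as const wstring )
	buffer = s
	length = len( s )
end operator

dim c as Counted

'' Goes through OPERATOR LET, so the cached length is updated.
c = wstr( "hello" )
print "after LET      : "; c.length; " cached, "; len( c.buffer ); " actual"

'' Writes to the raw data directly, bypassing OPERATOR LET.
'' The cached length is not refreshed and no longer matches the data.
c.buffer = wstr( "hi" )
print "after raw write: "; c.length; " cached, "; len( c.buffer ); " actual"
```

Whether a given statement ends up calling the user's operator or touching the raw data is decided by the compiler, so a type like this only stays consistent if every modifying path runs the user's code.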

Post by Juergen Kuehlwein »

coderJeff wrote:... it's very inefficient.
coderJeff wrote:So, even if these changes are merged in, I can see it changing again in future, possibly disallowing, or providing some other mechanism, like a specially named overloaded function/operator, like FOR/NEXT overloads.
Well, first make it work at all, then make it better, faster, more efficient, whatever. I would prefer an inefficient SWAP over no SWAP at all for a start.


JK
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: Extending Wstring and Zstring with UDTs

Post by Lost Zergling »

@Juergen. Also consider this: once a swap by value has been coded, going back on it will almost certainly mean rewriting everything (the implementation logic and the in-memory mechanics will not have been designed the same way, so there will probably be almost nothing reusable). In addition, the functional behavior already adopted by users will have become a constraint.

Post by Juergen Kuehlwein »

@Lost Zergling

I don't think so. SWAP does a swap; how exactly this is handled internally doesn't affect the user. And if there is an improvement, this doesn't break existing code. Adding a SWAP operator doesn't break SWAP either. It would add an ADDITIONAL method of doing it (maybe optimized for the underlying data type).

@Jeff

Thinking about it: if you take a look at the current code in ustring.bi, you will see it uses 5 variables (1 of pointer size and 4 of long size). I could get rid of 2 of them (wstring size and growsize, which could be hardcoded), leaving 3. I could make the length variable and the size variable an integer, just like with strings. Then the ustring "descriptor" would be of the same size as the string descriptor. You can swap this UDT just like strings; you don't need to swap the actual data. I wouldn't call this inefficient...
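A minimal sketch of such a three-member layout, under stated assumptions: the type `UDesc` and its field names are hypothetical and are not the actual members of ustring.bi, and the fixed buffers merely stand in for dynamically allocated data. It prints the size of the sketch next to the size of a STRING descriptor for comparison, then shows that SWAP on two instances exchanges only the member data.

```freebasic
'' Hypothetical three-member "descriptor"; names are illustrative only.
type UDesc
	p   as wstring ptr  '' pointer to the zero-terminated character data
	n   as integer      '' current length, in characters
	cap as integer      '' allocated capacity, in characters
end type

'' Fixed buffers standing in for dynamically allocated string data.
dim buf1 as wstring * 16 = "first"
dim buf2 as wstring * 16 = "second"

dim a as UDesc, b as UDesc
a.p = @buf1 : a.n = len( buf1 ) : a.cap = 16
b.p = @buf2 : b.n = len( buf2 ) : b.cap = 16

'' Size of the sketch next to the size of a STRING descriptor.
print sizeof( UDesc ), sizeof( string )

'' SWAP on two UDT instances exchanges their member data (pointer,
'' length, capacity); the characters in buf1 and buf2 are not copied.
swap a, b

print *a.p, a.n
print *b.p, b.n
print buf1, buf2
```

After the SWAP, `a` refers to the data in `buf2` and `b` to the data in `buf1`, while both buffers keep their original contents.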


JK

Post by coderJeff »

@JK, it doesn't matter whether it is your "ustring.bi" or some other user's own creation; fbc needs to do something reasonable. I agree, the feature need not be fully complete or fully optimized. I would only like to leave myself a possibility for the future that isn't going to make users outright angry when things break.

@Lost Zergling, I think I get your meaning. It's true, these are my concerns also.

Breaking binary compatibility, which is what would happen with some rtlib or implementation/optimization changes, is allowable in my opinion given a reasonable justification, even though it will annoy users, and it is fixable by recompiling source code. When I look at fbc's current list of features to add and bugs to fix, breaking binary compatibility is unavoidable if we want development to progress.

Breaking source code compatibility concerns me most. We need a very strong justification to break source code compatibility, or some expectation that it will only affect a few users. I really want to avoid having to break source code compatibility in future if possible, because it can't be fixed just by recompiling; users will have to change the code they've written. To be justifiable, this kind of breakage has to be absolutely necessary to progress development.

----

Here's what I am thinking, assuming udt extends Z|WSTRING:

SWAP udt, udt => exchange udt data only (aka descriptor swap), and works as would currently.

SWAP udt, wstring => raw exchange of wstring data
SWAP wstring, udt => raw exchange of wstring data

So, to force an exchange of wstring data on two UDTs:
SWAP udt, wstr(udt)
SWAP wstr(udt), udt
SWAP wstr(udt), wstr(udt)

Additionally, regarding the assignment-like nature of LSET, RSET, MID, and SWAP, in the minimal implementation:

Code: Select all

type T extends wstring
	__ as integer
	declare operator cast() byref as const wstring
end type
Usage of the LSET, RSET, MID, and SWAP statements should produce an error, because the pointer to the WSTRING data is CONST. This is expected.

And to allow modification of the data, you need to implement a non-const version of the UDT. This can be done by removing the CONST qualifier on the CAST operator, or:

Code: Select all

type T_mutable extends T
	declare operator cast() byref as wstring
end type

Post by Juergen Kuehlwein »

Jeff,

I absolutely agree!

JK

Post by coderJeff »

Add changes for SWAP, SELECT, and IIF

IIF(expr, true-expr, false-expr) uses a similar approach to SWAP. If one or the other expression is a UDT that can be converted to Z|WSTRING, it will do so; otherwise there is no conversion and the normal IIF logic remains.

The pull request for this feature, "user defined types can extend zstring or wstring" (#150), is now merged into fbc/master.
changelog.txt wrote: Version 1.07.0

[changed]
- SADD/STRPTR(wstring) returns WSTRING PTR

[added]
- 'TYPE udt EXTENDS Z|WSTRING' allowed to specify that UDT is a kind of Z|WSTRING
- LTRIM/RTRIM/TRIM will accept UDT as Z|WSTRING
- LCASE/UCASE will accept UDT as Z|WSTRING
- Cxxx() conversion functions will accept UDT as Z|WSTRING
- INSTR/INSTRREV will accept UDT as Z|WSTRING
- MID function will accept UDT as Z|WSTRING
- SADD/STRPTR will accept UDT as Z|WSTRING to return Z|WSTRING ptr
- LSET/RSET statements will accept UDT as Z|WSTRING
- MID statement will accept UDT as Z|WSTRING
- ASC function will accept UDT as Z|WSTRING
- STR/WSTR function will accept UDT as Z|WSTRING to return a Z|WSTRING
- SELECT statement will accept UDT as Z|WSTRING to return a Z|WSTRING
- SWAP statement will accept UDT as Z|WSTRING
- IIF function will accept UDT as Z|WSTRING
JK, thank you for your help on this feature.

Post by Juergen Kuehlwein »

My pleasure!


JK

Post by Juergen Kuehlwein »

Jeff,


Many thanks for your efforts! But as you know too, at best we are only halfway there...

As a next logical step, we should have a default dynamic (zero-terminated) WSTRING type like the one I added to my pull request. Are any changes needed other than those we already discussed?

I would propose to reserve the word "USTRING" for such a type, in the way I did it in ustring.bi. This way it is possible to make different types (ustring.bi, José's WINFBX, etc.) work for the same code. That is, including ustring.bi makes either José's version (if present) or the default type a USTRING. Not including ustring.bi means a user must add his own implementation of extended WSTRINGs and (#)define it as USTRING, if he wants to use USTRINGs at all.

How should I proceed: a new pull request? Or would you like to add it yourself? In that case feel free to adapt it as necessary. Please add credits to José and all others involved, and add a license appropriate for FreeBASIC, thanks.


JK


Later: this new version passes all the tests I wrote for ustring.bi. I didn't expect anything else, great work!
fxm
Moderator
Posts: 12107
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Extending Wstring and Zstring with UDTs

Post by fxm »

@coderJeff,
Can you provide an example of these new features, using for example 'Extends Zstring'?
(I think that the build of this day at http://users.freebasic-portal.de/stw/builds/ is still incomplete to be able to test the new features)

Post by Juergen Kuehlwein »

@Jeff,


I made a pull request adding "ustring.bi" and USTRING-specific tests. As usual, make the best of it!


JK

Post by fxm »

For 'Type Ustring Extends Zstring', the added features now additionally support:
- 'Strptr'
- 'Lset/Rset'
- 'Select Case'
which were the only features that were incompatible without 'Extends Zstring'.

To overload the unary operator Len, since the member data of Ustring are usually private, I found nothing better than this:

Code: Select all

Operator Len ( Byref u As Ustring ) As Integer
  Operator = Len( Type<String>( u ) )
End Operator

Post by coderJeff »

fxm wrote:@coderJeff,
Can you provide an example of these new features, using for example 'Extends Zstring'?
Yeah, an 'Extends Zstring' example won't be of much interest yet. Because fbc prefers conversions to (ascii) zstring/string anyway, 'extends zstring' isn't much different than the UDT capability we already have. I think the only attraction now would be the symmetry with 'extends wstring'.

That said, 'extends zstring' or 'extends wstring' will eventually offer the preferred conversion.

For example, due to bug #666 (Cannot overload 'as string' with 'as zstring ptr'), which also relates to other string conversions, even wstring/zstring, we have this situation:

Code: Select all

type T
	buffer as wstring * 50
	declare operator cast() byref as const wstring
end type

dim x as T

'' error 98: Ambiguous call to overloaded function, LEFT() in 'print left(x, 2)'
print left(x, 2)
Once that's fixed, 'extends zstring' can be used to resolve this ambiguity:

Code: Select all

type T extends zstring
	buffer as zstring * 50
	declare constructor( byref s as const zstring )
	declare operator cast() byref as const wstring
	declare operator cast() byref as const zstring
end type

sub proc overload( byref arg as wstring )
end sub

sub proc overload( byref arg as zstring )
end sub

dim s as T = "abcde"

'' Ambiguous call to overloaded function, PROC() in 'proc( s )'
proc( s )
When developing the 'extends z|wstring' feature, I also thought of having a "priority" attribute or a "default" keyword on the cast operators. In the end, I felt 'extends z|wstring' best captures the meaning of the code to be expressed. It's a gamble whether the time invested so far on 'extends zstring' will pay off. Overall though, it has been efficient to add the code to fbc and the tests for 'extends zstring' at the same time as doing 'extends wstring'. I don't know what users will need it for yet. Maybe a UTF8 zstring type that also has an in-place WSTRING conversion? I dunno.

Post by coderJeff »

Before getting too elaborate with the examples, I'll post a couple of small examples to try and show the difference between a standard UDT and the new 'UDT extends wstring'.

One of the original issues is that fbc implicitly casts UDT's to a STRING type for quirk string functions. This is a problem for wstring UDT types, since wide characters are lost in the conversion.

This example can be tried with or without the 'extends wstring' to see the difference. I've put the output in the listing showing both.

Code: Select all

type T '' extends wstring
	buffer as wstring * 50
	declare constructor( byref s as const wstring )
	declare operator cast() byref as const wstring
end type

constructor T( byref s as const wstring )
	buffer = s
end constructor

operator T.cast() byref as const wstring
	operator = buffer
end operator

sub print_string( byref title as const string, byref x as const wstring, byval indent as integer = 0 )
	print left( title & space(20), 20 ) & ": "; space( indent*(1+sizeof(wstring)*2) );
	for i as integer = 0 to len(x) -1
		print hex( x[i], sizeof(wstring)*2 ); " ";
	next
	print
end sub

dim x as T = !"  \u3041\u3043\u3045\u3047\u3049  "
print_string( "x", x )

dim w as wstring * 50 = x
print_string( "w", w )

'' fbc implicitly uses a str() conversion

print_string( "ltrim(x)", ltrim(x), 2 )
print_string( "rtrim(x)", rtrim(x) )
print_string( "trim(x)" , trim(x) , 2 )

/'
OUTPUT (without extends wstring):

x                   : 0020 0020 3041 3043 3045 3047 3049 0020 0020
w                   : 0020 0020 3041 3043 3045 3047 3049 0020 0020
ltrim(x)            :           003F 003F 003F 003F 003F 0020 0020
rtrim(x)            : 0020 0020 003F 003F 003F 003F 003F
trim(x)             :           003F 003F 003F 003F 003F

OUTPUT (with extends wstring):

x                   : 0020 0020 3041 3043 3045 3047 3049 0020 0020
w                   : 0020 0020 3041 3043 3045 3047 3049 0020 0020
ltrim(x)            :           3041 3043 3045 3047 3049 0020 0020
rtrim(x)            : 0020 0020 3041 3043 3045 3047 3049
trim(x)             :           3041 3043 3045 3047 3049

'/
Even if we fix https://sourceforge.net/p/fbc/bugs/752/, we would still have to wrap arguments in wstr(), or cast(wstring,), and we probably want something a little more user friendly.