Extending Wstring and Zstring with UDTs
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: Extending Wstring and Zstring with UDTs
Great!
Returning a WSTRING PTR from SADD/STRPTR seems a lot more logical than returning a ZSTRING PTR for a WSTRING! There shouldn't be much of a problem with that change, because previously, for a WSTRING, you had to cast the result to WSTRING PTR anyway in order to use it. Existing code may now contain places where a WSTRING PTR is cast to a WSTRING PTR, which shouldn't be a problem at all.
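A minimal sketch of what this change looks like in user code, assuming fbc 1.07.0 or later (the variable names are only illustrative):

```freebasic
'' Assumes fbc 1.07.0 or later, where STRPTR/SADD on a WSTRING yield a WSTRING PTR.
dim w as wstring * 16 = "abc"

'' No cast is needed any more; older code had to cast the result first.
dim p as wstring ptr = strptr( w )

'' The old-style cast is now a WSTRING PTR to WSTRING PTR cast and stays harmless.
dim q as wstring ptr = cast( wstring ptr, strptr( w ) )

print *p            '' dereferencing gives the wide string
print ( p = q )     '' compares the two pointers
```

Code written for earlier versions that already contains the cast should therefore keep compiling unchanged.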
If I interpret your code right, you changed it so that the wstring data pointer no longer needs to be the first member variable of the UDT, which was a requirement in my version. Fine!
LEFT and RIGHT (and maybe LEN too) without overloading would be very good, because then we have it all inside the RTL. This makes possible future work (extending WSTRING from UCS-2 to UTF-16, to make it really Unicode compliant) easier.
SWAP, as it currently works, is fine even for UDTs. I don't know if there will be problems with more than one level of indirection inside (a pointer to a pointer as a member variable), with certain usages of UNIONs inside UDTs, or with objects.
In my view it would be nice to have it for extended Z/WSTRING UDTs without the need for an overloaded member function. If there are cases where swapping all data members doesn't have the expected result, maybe this should be documented and an example given.
Thanks,
JK
Re: Extending Wstring and Zstring with UDTs
The reason I'm not completely convinced about SWAP is that it is kind of like an assignment. For comparison, with STRING descriptors fbc implements an efficient SWAP by exchanging only the descriptors, exactly like swapping a UDT, without touching the data they point to. If SWAP unconditionally treats a UDT as a WSTRING and swaps the actual data, it's very inefficient.
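The descriptor swap for var-len STRINGs can be observed with a small sketch like the following (the variable names are only illustrative; the program simply reports what it finds):

```freebasic
'' Sketch: check whether SWAP on two var-len STRINGs exchanges the descriptors,
'' leaving the character buffers themselves where they are.
dim a as string = "hello"
dim b as string = "a considerably longer string"

'' Remember where each string's character data lives before the swap.
dim pa as zstring ptr = strptr( a )
dim pb as zstring ptr = strptr( b )

swap a, b

print a; " / "; b
'' If only descriptors were exchanged, each string now owns the other's buffer.
dim exchanged as boolean = ( strptr( a ) = pb ) andalso ( strptr( b ) = pa )
print "buffers exchanged: "; exchanged
```

No character data has to be copied in that case, whatever the length of the strings.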
Also, when I go back and look at allowing a UDT to be handled like a WSTRING in the LSET, RSET, and MID statements, they are also like assignments, except inefficient, and they possibly handle the UDT as a WSTRING incorrectly, since they operate on the raw (pointed-to) data instead of any overloaded operator the user has written.
If SWAP, LSET, RSET, and the MID statement just treat the data as WSTRING, then any logic the user may have written into a CONSTRUCTOR or OPERATOR LET procedure is ignored. And if the UDT maintains internal state (like length, for example), it won't be updated correctly when any of these assignment-like statements is used. So, even if these changes are merged in, I can see it changing again in future: possibly disallowing this, or providing some other mechanism, such as a specially named overloaded function/operator, like the FOR/NEXT overloads.
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: Extending Wstring and Zstring with UDTs
"... it's very inefficient."
Well, first make it work at all, then make it better, faster, more efficient, whatever. I would prefer an inefficient SWAP over no SWAP at all for a start.
"So, even if these changes are merged in, I can see it changing again in future, possibly disallowing, or providing some other mechanism, like a specially named overloaded function/operator, like FOR/NEXT overloads."
JK
-
- Posts: 538
- Joined: Dec 02, 2011 22:51
- Location: France
Re: Extending Wstring and Zstring with UDTs
@Juergen. Also consider this: once a swap by value has been coded, going back on it will almost certainly mean rewriting everything (the implementation logic and the in-memory behaviour will not have been designed the same way, so there will probably be almost nothing reusable). In addition, the ways of working that users have already adopted will have become constraints.
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: Extending Wstring and Zstring with UDTs
@Lost Zergling
I don't think so. SWAP does a swap; how exactly this is handled internally doesn't affect the user. And if there is an improvement, it doesn't break existing code. Adding a SWAP operator doesn't break SWAP either. It would add an ADDITIONAL method of doing it (maybe optimized for the underlying data type).
@Jeff
Thinking about it: if you take a look at the current code in ustring.bi, you will see it uses 5 variables (1 of pointer size and 4 of LONG size). I could get rid of 2 of them (wstring size and growsize, which could be hardcoded), leaving 3. I could make the length variable and the size variable INTEGERs, just like with strings. Then the ustring "descriptor" would be the same size as the string descriptor. You could swap this UDT just like strings, without needing to swap the actual data. I wouldn't call this inefficient...
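A rough sketch of that idea, assuming a hypothetical three-member layout (UDesc and its member names are made up for illustration and are not the actual ustring.bi code):

```freebasic
'' Hypothetical layout: one pointer plus two INTEGERs, mirroring the three
'' fields of fbc's var-len STRING descriptor.
type UDesc
    as wstring ptr pData    '' pointer to the character data
    as integer     nLen     '' current length in characters
    as integer     nSize    '' allocated size in characters
end type

'' Compare the size of this "descriptor" with the STRING descriptor.
print "sizeof(UDesc)  ="; sizeof( UDesc )
print "sizeof(string) ="; sizeof( string )

dim a as UDesc, b as UDesc

a.pData = callocate( 8 * sizeof( wstring ) )
*a.pData = "one"
a.nLen = 3 : a.nSize = 8

b.pData = callocate( 8 * sizeof( wstring ) )
*b.pData = "three"
b.nLen = 5 : b.nSize = 8

dim pa as wstring ptr = a.pData
dim pb as wstring ptr = b.pData

'' SWAP on plain UDTs exchanges the members only; the buffers stay in place.
swap a, b

print *a.pData; a.nLen
print *b.pData; b.nLen
print "pointers exchanged: "; ( a.pData = pb andalso b.pData = pa )

deallocate( a.pData )
deallocate( b.pData )
```

Because length and size travel together with the pointer, the internal state stays consistent after the swap.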
JK
Re: Extending Wstring and Zstring with UDTs
@JK, it doesn't matter whether it is your "ustring.bi" or some other user's own creation, fbc needs to do something reasonable. I agree that the feature need not be fully complete or fully optimized; I only want to leave myself a possibility for the future that isn't going to make users outright angry when things break.
@Lost Zergling, I think I get your meaning. It's true, these are my concerns also.
Breaking binary compatibility, which is what would happen with some rtlib or implementation/optimization changes, will annoy users, but in my opinion it is allowable with a reasonable justification, and it is fixable by recompiling the source code. When I look at fbc's current list of features to add and bugs to fix, breaking binary compatibility is unavoidable if we want development to progress.
Breaking source code compatibility concerns me most. We need a very strong justification to break source code compatibility, or some expectation that it will only affect a few users. I really want to avoid having to break source code compatibility in future if possible, because it can't be fixed just by recompiling; users will have to change the code they've written. To be justifiable, this kind of breakage has to be absolutely necessary for development to progress.
----
Here's what I am thinking, assuming udt extends Z|WSTRING:
SWAP udt, udt => exchange udt data only (aka descriptor swap), and works as would currently.
SWAP udt, wstring => raw exchange of wstring data
SWAP wstring, udt => raw exchange of wstring data
So, to force an exchange of wstring data on 2 UDT's
SWAP udt, wstr(udt)
SWAP wstr(udt), udt
SWAP wstr(udt), wstr(udt)
Additionally, regarding the assignment-like nature of LSET, RSET, MID, and SWAP, in the minimal implementation:
Code: Select all
type T extends wstring
    __ as integer
    declare operator cast() byref as const wstring
end type
Usage of the LSET, RSET, or MID statement, or of SWAP, should produce an error because the pointer to WSTRING data is CONST. This is expected.
And to allow modification of the data, a non-const version of the UDT needs to be implemented, which can be done by removing the CONST qualifier on the CAST operator or:
Code: Select all
type T_mutable extends T
    declare operator cast() byref as wstring
end type
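As a rough sketch of the first option (removing the CONST qualifier), the following assumes fbc 1.07.0 or later; TMutable and its buffer member are hypothetical names used only here, and this has not been checked against a specific build:

```freebasic
'' Assumes fbc 1.07.0 or later. TMutable is a hypothetical type for this sketch only.
type TMutable extends wstring
    buffer as wstring * 50
    declare constructor( byref s as const wstring )
    '' No CONST on the result, so statements may write through the cast.
    declare operator cast() byref as wstring
end type

constructor TMutable( byref s as const wstring )
    buffer = s
end constructor

operator TMutable.cast() byref as wstring
    operator = buffer
end operator

dim m as TMutable = "abcde"

'' MID statement: overwrite three characters starting at position 2.
mid( m, 2, 3 ) = "XYZ"

'' Copy out through the cast operator to inspect the result.
dim w as wstring * 50 = m
print w
```

Note that the MID statement here writes straight into the buffer returned by the cast, so any extra state a real UDT keeps (a cached length, for example) would not be updated, which is exactly the concern raised earlier in the thread.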
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: Extending Wstring and Zstring with UDTs
Jeff,
I absolutely agree!
JK
Re: Extending Wstring and Zstring with UDTs
Add changes for SWAP, SELECT, and IIF
IIF(expr, true-expr, false-expr) uses a similar approach to SWAP. If one or the other expression is a UDT that can be converted to Z|WSTRING, it will be converted; otherwise there is no conversion and the normal IIF logic remains.
The feature pull request "user defined types can extend zstring or wstring" (#150) is now merged into fbc/master.
JK, thank you for your help on this feature.
changelog.txt wrote:
Version 1.07.0
[changed]
- SADD/STRPTR(wstring) returns WSTRING PTR
[added]
- 'TYPE udt EXTENDS Z|WSTRING' allowed to specify that UDT is a kind of Z|WSTRING
- LTRIM/RTRIM/TRIM will accept UDT as Z|WSTRING
- LCASE/UCASE will accept UDT as Z|WSTRING
- Cxxx() conversion functions will accept UDT as Z|WSTRING
- INSTR/INSTRREV will accept UDT as Z|WSTRING
- MID function will accept UDT as Z|WSTRING
- SADD/STRPTR will accept UDT as Z|WSTRING to return Z|WSTRING ptr
- LSET/RSET statements will accept UDT as Z|WSTRING
- MID statement will accept UDT as Z|WSTRING
- ASC function will accept UDT as Z|WSTRING
- STR/WSTR function will accept UDT as Z|WSTRING to return a Z|WSTRING
- SELECT statement will accept UDT as Z|WSTRING to return a Z|WSTRING
- SWAP statement will accept UDT as Z|WSTRING
- IIF function will accept UDT as Z|WSTRING
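A short sketch of how a few of the listed functions might be used with such a UDT, assuming fbc 1.07.0 or later (WText and its members are hypothetical names, chosen only for this illustration):

```freebasic
'' Assumes fbc 1.07.0 or later. WText is a hypothetical type used only for this sketch.
type WText extends wstring
    buffer as wstring * 50
    declare constructor( byref s as const wstring )
    declare operator cast() byref as const wstring
end type

constructor WText( byref s as const wstring )
    buffer = s
end constructor

operator WText.cast() byref as const wstring
    operator = buffer
end operator

dim x as WText = "  abc  "

'' TRIM accepts the UDT directly and works on wide characters.
dim t as wstring * 50 = trim( x )
print "["; t; "]"

'' INSTR and ASC also take the UDT without an explicit wstr() or cast().
dim position as integer = instr( x, "b" )
dim firstChar as integer = asc( x )
print position
print firstChar

'' UCASE returns a wide string as well.
print ucase( x )
```

Without 'extends wstring', the same calls would go through an implicit STRING conversion, which is where wide characters can get lost.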
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: Extending Wstring and Zstring with UDTs
My pleasure!
JK
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: Extending Wstring and Zstring with UDTs
Jeff,
Many thanks for your efforts! But as you know too, at best we are only halfway there...
As a next logical step, we should have a default dynamic (zero-terminated) WSTRING type like the one I added to my pull request. Are any changes needed other than those we already discussed?
I would propose reserving the word "USTRING" for such a type, the way I did in ustring.bi. This makes it possible for different types (ustring.bi, José's WINFBX, etc.) to work with the same code. That is, including ustring.bi makes either José's version (if present) or the default type a USTRING. Not including ustring.bi means a user must add his own implementation of extended WSTRINGs and (#)define it as USTRING, if he wants to use USTRINGs at all.
How should I proceed: a new pull request? Or would you like to add it yourself? In that case feel free to adapt it as necessary. Please add credits to José and all others involved and add a license appropriate for FreeBASIC, thanks.
JK
Later: this new version passes all the tests I wrote for ustring.bi. I didn't expect anything else, great work!
Re: Extending Wstring and Zstring with UDTs
@coderJeff,
Can you provide an example of these new features, using for example 'Extends Zstring'?
(I think that the build of this day at http://users.freebasic-portal.de/stw/builds/ is still incomplete to be able to test the new features)
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: Extending Wstring and Zstring with UDTs
@Jeff,
I made a pull request adding "ustring.bi" and USTRING-specific tests. As usual, make the best of it!
JK
Re: Extending Wstring and Zstring with UDTs
For 'Type Ustring Extends Zstring', the added features now additionally support:
- 'Strptr'
- 'Lset/Rset'
- 'Select Case'
which were the only features that were incompatible without 'Extends Zstring'.
To overload the unary operator Len, since the member data of Ustring are usually private, I found nothing better than this:
Code: Select all
Operator Len ( Byref u As Ustring ) As Integer
    Operator = Len( Type<String>( u ) )
End Operator
Re: Extending Wstring and Zstring with UDTs
fxm wrote: @coderJeff,
Can you provide an example of these new features, using for example 'Extends Zstring'?
Yeah, an 'Extends Zstring' example won't be of much interest yet. Because fbc prefers conversions to (ASCII) zstring/string anyway, 'extends zstring' isn't much different from the UDT capability we already have. I think the only attraction now would be the symmetry with 'extends wstring'.
That said, 'extends zstring' or 'extends wstring' will eventually offer the preferred conversion.
For example, due to bug #666 "Cannot overload 'as string' with 'as zstring ptr'", which also relates to other string conversions, even wstring/zstring, we have this situation:
Code: Select all
type T
    buffer as wstring * 50
    declare operator cast() byref as const wstring
end type

dim x as T

'' error 98: Ambiguous call to overloaded function, LEFT() in 'print left(x, 2)'
print left(x, 2)
Code: Select all
type T extends zstring
    buffer as zstring * 50
    declare constructor( byref s as const zstring )
    declare operator cast() byref as const wstring
    declare operator cast() byref as const zstring
end type

sub proc overload( byref arg as wstring )
end sub

sub proc overload( byref arg as zstring )
end sub

dim s as T = "abcde"

'' Ambiguous call to overloaded function, PROC() in 'proc( s )'
proc( s )
Re: Extending Wstring and Zstring with UDTs
Before getting too elaborate with the examples, I'll post a couple of small examples to try and show the difference between a standard UDT and the new 'UDT extends wstring'.
One of the original issues is that fbc implicitly casts UDTs to a STRING type for quirk string functions. This is a problem for wstring UDT types, since wide characters are lost in the conversion.
Even if we fix https://sourceforge.net/p/fbc/bugs/752/, we would still have to wrap arguments in wstr(), or cast(wstring,), and we probably want something a little more user friendly.
This example can be tried with or without the 'extends wstring' to see the difference. I've put the output in the listing showing both.
Code: Select all
type T '' extends wstring
    buffer as wstring * 50
    declare constructor( byref s as const wstring )
    declare operator cast() byref as const wstring
end type

constructor T( byref s as const wstring )
    buffer = s
end constructor

operator T.cast() byref as const wstring
    operator = buffer
end operator

sub print_string( byref title as const string, byref x as const wstring, byval indent as integer = 0 )
    print left( title & space(20), 20 ) & ": "; space( indent*(1+sizeof(wstring)*2) );
    for i as integer = 0 to len(x) -1
        print hex( x[i], sizeof(wstring)*2 ); " ";
    next
    print
end sub

dim x as T = !"  \u3041\u3043\u3045\u3047\u3049  "
print_string( "x", x )

dim w as wstring * 50 = x
print_string( "w", w )

'' fbc implicitly uses a str() conversion
print_string( "ltrim(x)", ltrim(x), 2 )
print_string( "rtrim(x)", rtrim(x) )
print_string( "trim(x)" , trim(x) , 2 )

/'
OUTPUT (without extends wstring):
x                   : 0020 0020 3041 3043 3045 3047 3049 0020 0020
w                   : 0020 0020 3041 3043 3045 3047 3049 0020 0020
ltrim(x)            :           003F 003F 003F 003F 003F 0020 0020
rtrim(x)            : 0020 0020 003F 003F 003F 003F 003F
trim(x)             :           003F 003F 003F 003F 003F

OUTPUT (with extends wstring):
x                   : 0020 0020 3041 3043 3045 3047 3049 0020 0020
w                   : 0020 0020 3041 3043 3045 3047 3049 0020 0020
ltrim(x)            :           3041 3043 3045 3047 3049 0020 0020
rtrim(x)            : 0020 0020 3041 3043 3045 3047 3049
trim(x)             :           3041 3043 3045 3047 3049
'/