Extending Wstring and Zstring with UDTs

coderJeff
Site Admin
Posts: 4313
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Post by coderJeff »

Recently I've been working on adding this new syntax to the fbc compiler:

Code:

type T extends wstring '' or zstring
	'' data & overload declarations
end type
The intent is to create a UDT that has good interoperability with fbc's string types, most specifically WSTRING. We are probably still a ways off from having a built-in dynamic wstring type, but this could be the step that helps get us there, and work out some of the bugs that still exist in fbc/rtlib's wstring handling.

This is based on the work by Juergen Kuehlwein that is currently sitting as a pull request at https://github.com/freebasic/fbc/pull/118 and rebased onto current master at https://github.com/jayrm/fbc/tree/jklwn-ustring

JK has done a very good job of identifying the areas of the compiler that need to be looked at. As it stands though, the pull request can't be merged in without some major modifications and testing, so I've started working on the changes in a step-by-step manner at https://github.com/jayrm/fbc/tree/udt-wstring

JK's effort is based on some of the already very good string-type UDTs made by José Roca in his WinFBX framework. However, some tricks and hacks are needed to make the UDTs work with fbc's current UDT+wstring handling, and this new UDT feature is to help with that. There is a very long discussion about it on José's site at http://www.jose.it-berater.org/smfforum ... topic=5253

Most notable are the quirk functions (like LCASE, INSTR, TRIM), which tend to prefer STRING casting when used with a UDT, even if the appropriate operator UDT.CAST exists.

While I have been reviewing JK's code, and writing tests for this new feature, I have noticed that there are wstring bugs in the compiler/rtlib that need to be dealt with as well:

#441 Operator Cast WString Ptr not recognised as pointer
#666 Cannot overload 'as string' with 'as zstring ptr'
#734 Passing String argument to Zstring Ptr parameter ignores Constness
#752 Cast(Zstring, u) is prohibited even if UDT from 'u' defines the member operator Cast() Byref As Zstring
#840 Constancy of certain wstring expressions changes when crosscompiling with differing sizeof(wstring)
#899 trim( wstring ) causes crash if string is single space

NEED HELP WITH:
If you know of any bugs either reported on this forum or sourceforge.net that I have missed, please post a link here, or even better, update the sourceforge.net ticket with the link as well. I don't know if I can fix all of them, but getting all the available information and test cases is the first step.

Or if you know of a wstring problem that's never been reported or you can't find if it has been reported, please post about it here, or create a new issue ticket.

Thanks, Jeff.
PaulSquires
Posts: 999
Joined: Jul 14, 2005 23:41

Post by PaulSquires »

This is very exciting, thanks Jeff! I have been using Jose's CWSTR class extensively for dynamic wstring support in my WinFBE Editor. The whole editor code base is built around using that data type. Hopefully as your work progresses I'll be able to test it against the large amount of WinFBE code. Shout out to yourself, Jose, and JK for your effort to tackle this huge issue.
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Post by Juergen Kuehlwein »

Paul,

Among other things, I used your editor code as a test piece for my compiler changes. All I can say is that it compiles and works flawlessly so far with all "**" removed. Even if Jeff wants some things done a bit differently than I coded them, I'm convinced there will be a seamless integration of dynamic wide string types like José's CWSTR. It is possible!


JK
Juergen Kuehlwein
Post by Juergen Kuehlwein »

Jeff,


Regarding #899: if you haven't already figured it out yourself, please see my last two posts here: http://www.jose.it-berater.org/smfforum ... c=5253.195
coderJeff
Post by coderJeff »

True, I do want the code changed before it gets added. I have an idea of what I want to see for code paths, symbol naming, style, etc; basically, what I think will make the code easiest to follow and understand for the next developer (which very well might be future coderJeff!).

I have two big concerns that are preventing me from recommending that the original pull request be merged in as-is.

1) The original pull request needs improvement. Many of the changes ignore what's happening around them, like what has just happened in the code path leading up to that point and what needs to happen in the code path after that point. It's evident that a lot of work went into tracing through fbc's logic, finding the places that need to be changed, and then making something different happen at those places, but the transition is not smooth. I did start a review of the code, but it was overwhelming trying to write it up, because there are so many changes all at once, and I got the impression I would need to be very specific and meticulously justify every request. So instead, I just started working through the changes step-by-step, making a commit for each step. It's the only way I think I will manage to get through it: just write the changes using the original pull request as the proof of concept. Hopefully the incremental steps will help make the needed changes obvious.

2) The implementation of this UDT-as-wstring feature is built on an underlying WSTRING implementation. And the tests for "udt-as-wstring" just check that it behaves the same as "wstring" except that it's a UDT. That is not a bad way of testing, provided that the wstring implementation is good. Right now, I don't have complete confidence in the wstring implementation, because I keep finding bugs. :) So I'm working on the #899 bug as posted above, plus this new one I found: "LTRIM and TRIM truncate result if filter is zero length string". A pull request for these two will be coming soon, I think. It's not a wasted effort to investigate and fix the wstring bugs, and I see it as necessary for the best chance of success with the udt-as-wstring feature.
marpon
Posts: 342
Joined: Dec 28, 2012 13:31
Location: Paris - France

Post by marpon »

@coderJeff
I'm happy you are digging into wstring features.
I was surprised 3 years ago by the behaviour of some wstr functions when I was interested in creating a "unicode dynamic string",
and partly because of this strange behaviour, I implemented my first "unicode string" using a string udt as container.
https://github.com/marpon/uStringW

I hope that with the joint efforts of you and Juergen, we will soon have an almost native 'unicode dynamic string' for fb.
coderJeff
Post by coderJeff »

Thanks marpon. I noticed that your uStringW class has handling for surrogates. And as far as I can tell fb rtlib mostly does not. There are some cases where it won't matter that fb-rtlib is only looking at the code units (e.g. 16-bit values on windows) rather than the whole code point (e.g. possibly 2 x 16-bit values for surrogates). Then there are cases where it will matter.
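
To make the code-unit versus code-point distinction concrete, here is a minimal C sketch. It is not taken from the fb rtlib: the names `utf16_code_units` and `utf16_code_points` are made up for illustration, and it assumes a 0-terminated buffer of 16-bit units, as on Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count 16-bit code units up to the terminating 0.
   This is what a purely unit-based length function reports. */
size_t utf16_code_units(const uint16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Count code points: a high surrogate (D800..DBFF) immediately followed
   by a low surrogate (DC00..DFFF) is treated as one code point. */
size_t utf16_code_points(const uint16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (*s != 0) {
        if (*s >= 0xD800 && *s <= 0xDBFF && s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            s += 2;   /* surrogate pair: two units, one code point */
        else
            s += 1;   /* BMP character (or an unpaired surrogate) */
        n++;
    }
    return n;
}
```

For a string with no surrogates both functions agree, which is the case where unit-based handling does not matter. For "a" followed by U+1F600 (units 0xD83D, 0xDE00) the unit count is 3 and the code point count is 2, which is the case where it does.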

I found some discussion about surrogates here: Error in LEN() command

I've created a pull request for the sf.net #899 and sf.net #900 bugs at Fix wstring bugs in [L|R]TRIM functions #142

But, then I remembered a related issue posted (in a creative format) at Fix six problems about [L/R]TrimAny library functions. #116
Juergen Kuehlwein
Post by Juergen Kuehlwein »

Jeff,

i did it the same way, step by step, working through the code for each statement, which needed to be changed. In the end it sums up to what you have right now as a pull request.

regarding sf.net #900:

it should be:

Code:

...
        len = fb_wstr_Len( src );
        p = src;
...        
in strw.trimex.c and strw.ltrimex. The pointer p is initialized to NULL and is never set to anything else if the pattern is empty. Therefore there is nothing to copy and nothing is returned.
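
A minimal sketch of the idea behind this fix, in C. This is not the rtlib source: `ltrim_any` is a hypothetical helper, and it returns a pointer into the source string instead of allocating a result. The only point is that the result pointer starts at `src`, so an empty pattern leaves the string untouched instead of producing nothing.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper (not the rtlib function): return a pointer to the
   first character of src that does not occur in pattern. */
const wchar_t *ltrim_any(const wchar_t *src, const wchar_t *pattern)
{
    assert(src != NULL && pattern != NULL);

    /* Start from the whole string, so that an empty pattern
       returns src unchanged rather than an empty result. */
    const wchar_t *p = src;

    /* Skip leading characters that are members of the pattern set.
       The *p check stops wcschr from matching the terminator. */
    while (*p != L'\0' && wcschr(pattern, *p) != NULL)
        p++;

    return p;
}
```

With an empty pattern the loop body never runs and the function returns `src` itself. With src and pattern both a single space it returns a pointer to the terminator, i.e. an empty string.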


JK


later: oh, i see you already found it
marpon
Post by marpon »

Yes, I remember that topic; I put in my 2 cents there at the time.
What I could imagine is having robust wstr functions that also work on dynamic wstrings.
That means: on Windows, 16-bit UCS-2 only; on Linux, 32-bit UCS-4.

And for full Unicode UTF-16 on Windows (including surrogate pairs), there would be different functions (with different names) to work with,
because these surrogates will have a huge impact on speed if the normal functions have to take care of them.

For me, because I have tried to play a little with the surrogates story, I consider it much more than dynamic wstring:
it's a job for a full Unicode library working on code points, and sometimes more than that...

I think freebasic as a compiler should provide robust basic bricks , today dynamic wstring should be one of these bricks.
it was what i was promoting 3 years ago, happy to see your implication on that subject.

thanks again
coderJeff
Post by coderJeff »

Juergen Kuehlwein wrote: i did it the same way, step by step, working through the code for each statement, which needed to be changed. In the end it sums up to what you have right now as a pull request.

That may be so, but it would have been easier for both of us had the changes been made in incremental steps. Some of the changes in your pull request aren't related to each other and could have been made as separate commits or even separate pull requests. And some of the problems we discussed back in December and January are still there.

Anyway, I'm not complaining. In hindsight, I can see how I might have been more helpful to you earlier on. Like I said, you did a great job finding the places in the code that need to be looked at. It's a good starting point, but it needs more.

All the changes I've worked through so far are at https://github.com/jayrm/fbc/tree/udt-wstring
And I've added the same changes to your pull request at https://github.com/jayrm/fbc/tree/jklwn-ustring
plus removed some parts that I think are unrelated to the main feature.
coderJeff
Post by coderJeff »

marpon wrote: I think freebasic as a compiler should provide robust basic bricks , today dynamic wstring should be one of these bricks.
it was what i was promoting 3 years ago, happy to see your implication on that subject.

Bricks are good. It's a good place to start. It's disappointing though to look through the code base and know exactly where UTF-16 will fail. Much the same way that UTF-8 fails when stored in a zstring. Primarily with string indexing, variable[index], len(), and character based operations, L|R|Trim( , any pattern ), Instr/InstrRev( , any pattern ), etc.
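
The UTF-8 case can be shown with a few lines of C. This is an illustrative sketch only: `utf8_bytes` and `utf8_code_points` are hypothetical names, and valid UTF-8 input is assumed. A byte-based length such as C's `strlen` counts every byte, while the number of code points is smaller as soon as a non-ASCII character appears, so indexing by position no longer lines up with characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of a 0-terminated string: every UTF-8 byte counts. */
size_t utf8_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Number of code points, assuming valid UTF-8: count every byte that is
   not a continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "héllo" encoded as UTF-8 (the é is the two bytes 0xC3 0xA9), the byte length is 6 but there are only 5 code points. For plain ASCII the two values are the same, which is why many operations still work.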

Maybe we could at least document the platform differences in the wiki, because not every use of UTF-16 is a complete fail. There are many operations that still work.
Juergen Kuehlwein
Post by Juergen Kuehlwein »

Jeff,


As a first step, dynamic UCS-2 strings will be supported, and problems with UTF-16 will remain. This is true, but UCS-2 will be more than nothing. FB's WSTRING currently supports only UCS-2 encoding in many places, so full UTF-16 support would require many changes in the runtime. The good news is that UCS-2 covers most western character sets quite sufficiently; you really need UTF-16 for Asian languages, which traditionally have a different character for each word. So it definitely is a step forward!

coderJeff wrote: Anyway, I'm not complaining. In hindsight, I can see how I might have been more helpful to you earlier on. Like I said, you did a great job finding the places in the code that need to be looked at. It's a good starting point, but it needs more.

Looking at what you already did, I see that your changes go deeper into the compiler's inner workings. That's fine. I wanted to stay on the surface, because my knowledge of the compiler is nowhere near as good and complete as yours. My biggest concern was not to spoil things by applying changes in places where I don't understand all the consequences. So I'm not disappointed at all that you aren't merging my code one-to-one. See my pull request as a proposal and a proof of concept, and make the best of it. I'm glad you accepted the basic idea and are now reworking it, so that in the end we will have the functionality I wanted to add.


I see you removed the string extension functions (stringex.bi) and the array functions (array.bi). Should I make a new pull request for these features as they are right now (include files), or should I try to convert them to C in order to add them to the runtime library?


JK
coderJeff
Post by coderJeff »

JK, I am grateful for your understanding. Yes, I encourage you to keep working at it. You have shown that you are capable of building the fbc sources and understanding the code base to a fair degree, and that you have the desire to contribute to the project. I am thankful.

If I may be so bold, the next step for you is to get some of your contributions merged in. May I suggest that you start with a small change? It was a learning experience for me as well when I made my first few pull requests, having been away from the project for several years. dkl was very helpful in reviewing my code and helping me to learn where it could be improved to make quality contributions to the project. I hope you can have a similar experience with the development team; and it will be most productive if you start with something small.

For example, in your "inc/array.bi" there is this new feature you are introducing:

Code:

private Function arrayDescriptorGetPtrFunction (Byval p As Any Ptr) As Any Ptr    'thanks to fxm
  Return p
End function

#macro arrayDescriptorPtr(array, p)                   'thanks to fxm
  Scope
    Dim As Function (() As Typeof((array))) As Any Ptr f
    f = Cast(Function (() As Typeof((array))) As Any Ptr, @arrayDescriptorGetPtrFunction)
    p = f(array())
  End Scope
#endmacro
- this looks like a workaround to a missing compiler feature; but could this be implemented better?
- we don't publish an API for manipulating the array descriptor; could we? Even if the array descriptor API changes from one version of fbc to the next, "inc/array.bi" could expose the array descriptor much the same way as "fbgfx.bi" exposes the fb.IMAGE structure
- Having the function available only if "inc/array.bi" is included is one possibility. What about VARPTR( array )? Any conflicts or ambiguity?
- and so on.

Exposing details of the array descriptor alone could be a pull request all on its own. Ideally, when you create the pull request, you become the expert on the feature you are adding (trust me, developers and users scrutinize everything), so it is worth the initial investment. And it gives you a starting point to build on, adding the other array features that are in "inc/array.bi". It doesn't have to happen all in one big change set.

If you wish to start a new thread for general discussion, I can split this thread, or you can quote this post. Your choice. Thank-you again for your efforts to push fbc's development forward.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Post by fxm »

Yes, I proposed this code (viewtopic.php?p=241393#p241393) for accessing an array descriptor more as a user hack than as a developer method.

I added a post (viewtopic.php?p=260699#p260699) to the topic referred to above, with a better definition of the array descriptor structure.
coderJeff
Post by coderJeff »

Pull request created for the first round of changes: user defined types can extend zstring or wstring #150

Allow a user defined type (UDT) to inherit properties and behaviours of a zstring or wstring by extending existing syntax. The purpose is to allow users to create custom string types (i.e. dynamic memory management) that can integrate well into existing fbc compiler built-ins.

New Syntax:

Code:

type UDT extends zstring|wstring [, base-type]
end type
In places where the fbc compiler would normally have rejected a UDT, this declaration of a UDT will instruct the compiler to convert the UDT to a z|wstring through a suitable CAST operator.

zstring|wstring behaviour can be inherited directly from zstring|wstring, or indirectly and singly from the base type:

Code:

type B extends wstring
	'' member data
	'' inherits from wstring
end type

type D extends B
	'' D inherits from B and wstring
end type
Or zstring|wstring behaviours can also be inherited in a UDT with a kind of pseudo multiple-inheritance:

Code:

type B
	'' member data
end type

type D extends wstring, B
	'' D inherits from B and wstring
end type
Changed:
- SADD/STRPTR(wstring) returns WSTRING PTR

Added:
- 'TYPE udt EXTENDS Z|WSTRING' allowed to specify that UDT is a kind of Z|WSTRING
- LTRIM/RTRIM/TRIM will accept UDT as Z|WSTRING
- LCASE/UCASE will accept UDT as Z|WSTRING
- Cxxx() conversion functions will accept UDT as Z|WSTRING
- INSTR/INSTRREV will accept UDT as Z|WSTRING
- MID function will accept UDT as Z|WSTRING
- SADD/STRPTR will accept UDT as Z|WSTRING to return Z|WSTRING ptr
- LSET/RSET statements will accept UDT as Z|WSTRING
- MID statement will accept UDT as Z|WSTRING
- ASC function will accept UDT as Z|WSTRING
- STR/WSTR function will accept UDT as Z|WSTRING to return a Z|WSTRING

The only change to existing behaviour is that SADD/STRPTR(wstring) now returns WSTRING PTR. This will break any user code where STRPTR(wstring) is expected to return a ZSTRING PTR.

New implicit conversions and behaviours are activated only if a UDT extends zstring or wstring.

In future expect separate pull requests for:
- LEFT, RIGHT to accept UDT extends z|wstring without having to explicitly overload LEFT, RIGHT. This involves fixing what are, in my opinion, multiple string-related bugs on sf.net
- IIF to allow handling UDTs that inherit zstring|wstring
- SELECT CASE to allow handling UDTs that inherit zstring|wstring
- SWAP which maybe would be handled best with an overloaded member function rather than implicit conversion to `zstring|wstring`

This pull request introduces the initial feature and handles nearly all the quirk keywords. I'd like to merge this change in before working on the remaining features in separate pull requests.