StringArray Sort (case independent)

Post your FreeBASIC tips and tricks here. Please don’t post your code without including an explanation.
MrSwiss
Posts: 3220
Joined: Jun 02, 2013 9:27
Location: Switzerland

StringArray Sort (case independent)

Postby MrSwiss » Jul 26, 2019 15:54

It seems, that in many cases of sorting a StringArray, the case's inside the individual
Strings, are simply ignored, e.g. QuickSort ... (this leads to: incorrect sort result!).

Example: "NOT" comes before "No" (it should be vice versa, from a human perspective)
(From a strictly numeric point of view, the sort is correct "as-is"!)

The Question:
How can we solve the dilemma presented above?
I'd suggest a three step approach:
1) figure out the technical reasons first
2) propose a workaround that eliminates the technical barrier(s)
3) implement 2nd in code, using all coding tricks we know of

The problem, detailed (step 1):

- Alphabet consists of: 2 distinct ranges (A - Z and a - z), in ASCII-numbers.
(see: ASCII-Table in Manual)
- QuickSort sorts: low to high (only numerical values considered)
- Therefore: 'Upper Case' superseeds 'Lower Case' always
(this is the actually problematic point at issue, to be solved)

My solution approach (step 2):
- Preprocessing (make Strings all the same case (UCase() used here)
- Sorting (QuickSort, for String)
- Postprocessing (restore sorted Array to 'original' case, again)

The solution (step 3):
- my assesment of the problem (step 1), might be wrong (to start off with)
- this whould invalidate step 2/3, of course
- before posting code here, I'm interested in other, maybe better approches you'd
propose to solve the issue, as explained above in words, not in code
dodicat
Posts: 5913
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Postby dodicat » Jul 26, 2019 17:34

Let DOS take the strain?

Code: Select all

 


Function StringSplit(s_in As String,chars As String,result() As String) As Long
    Dim As Long ctr,ctr2,k,n,LC=len(chars)
    dim As boolean tally(Len(s_in))
    #macro check_instring()
        n=0
        while n<Lc
        If chars[n]=s_in[k] Then
        tally(k)=true
        If (ctr2-1) Then ctr+=1
        ctr2=0
        exit while
        end if
        n+=1
       wend
    #endmacro
   
    #macro split()
    If tally(k) Then
        If (ctr2-1) Then ctr+=1:result(ctr)=Mid(s_in,k+2-ctr2,ctr2-1)
        ctr2=0
    End If
    #endmacro
    '==================  LOOP TWICE =======================
    For k  =0 To Len(s_in)-1
        ctr2+=1:check_instring()
    Next k
    if ctr=0 then
         if len(s_in) andalso instr(chars,chr(s_in[0])) then ctr=1':beep
         end if
    If ctr Then Redim result(1 To ctr): ctr=0:ctr2=0 Else  Return 0
    For k  =0 To Len(s_in)-1
        ctr2+=1:split()
    Next k
    '===================== Last one ========================
    If ctr2>0 Then
        Redim Preserve result(1 To ctr+1)
        result(ctr+1)=Mid(s_in,k+1-ctr2,ctr2)
    End If
    Return Ubound(result)
End Function

sub sort(s() as string,d as string="down")
    var f=freefile
    var cd=curdir
    dim as string g,cmd,text
    open "templist__.txt" for output as #f
    for n as long=lbound(s) to ubound(s)
       print #f,s(n)
    next n
    close #f
    if d="down" then
    cmd= "SORT/r " + cd + "\templist__.txt /o " +cd +"\result__.txt"
    else
     cmd= "SORT " + cd + "\templist__.txt /o " +cd +"\result__.txt"
    end if
    shell cmd
    Open "result__.txt" For Binary Access Read As #f
    If Lof(f) > 0 Then
      text = String(Lof(f), 0)
      Get #f, , text
    End If
    Close #f
    redim as string s2()
   StringSplit(text,chr(10),s2())
    for n as long=lbound(s2) to ubound(s2)
        s(n)=(s2(n))
    next
    kill cd+"\result__.txt"
    kill cd+"\templist__.txt"
end sub

dim as string s(1 to ...)={"Zanussi","Not","No","A","C","B","b"}

sort(s(),"up")
for n as long=lbound(s) to ubound(s)
    print n,s(n)
next
print

sleep

MrSwiss
Posts: 3220
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: StringArray Sort (case independent)

Postby MrSwiss » Jul 26, 2019 17:46

dodicat wrote:Let DOS take the strain?

Nice idea, but not what I've intended (make QuickSort return correctly).
There is no need for file I/O ... can be done in memory alone.
My intention is: to get those 'coders brains' up to speed, once in a while.
Lost Zergling
Posts: 240
Joined: Dec 02, 2011 22:51
Location: France

Re: StringArray Sort (case independent)

Postby Lost Zergling » Jul 26, 2019 18:06

I would use my leaky lzle to fill a list using Lcase or Ucase for Tag0 (keys) and original keys as values or Tag1 ?...
dodicat
Posts: 5913
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Postby dodicat » Jul 26, 2019 21:14

Just pop lcase into the callback function

Code: Select all

 
'#undef lcase
'#define lcase


#include once "crt.bi"

'=========  set up c sort =========
#define up <,>
#define down >,<
#define ArrayToSort(x,start,finish) @X((start)),((finish)-(start)+1),Sizeof(X)
#macro SetCSort(Datatype,FnName,b1,b2,dot)
Function FnName Cdecl(n1 As Any Ptr,n2 As Any Ptr) As long
    If lcase(*Cptr(Datatype Ptr,n1))dot b1 lcase(*Cptr(Datatype Ptr,n2))dot Then Return -1
    If lcase(*Cptr(DataType Ptr,n1))dot b2 lcase(*Cptr(DataType Ptr,n2))dot Then Return 1
    return 0
End Function
#endmacro


setCsort(string,callback,up,)
dim as string s(1 to ...)={"Zanussi","Not","No","A","C","B","d","a"}


qsort(arraytosort(s,Lbound(s),Ubound(s)),@callback)

for n as long=lbound(s) to ubound(s)
    print n,s(n)
next
print

sleep

 

Re comment lines 2 and 3 to see the raw result.

Here it is laid out.

Code: Select all

 


#include "crt.bi"

Function callbackU Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
    If Lcase(*Cptr(String Ptr,n1)) < Lcase(*Cptr(String Ptr,n2)) Then Return -1
    If Lcase(*Cptr(String Ptr,n1)) > Lcase(*Cptr(String Ptr,n2)) Then Return 1
    Return 0
End Function

Function callbackD Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
    If Lcase(*Cptr(String Ptr,n1)) > Lcase(*Cptr(String Ptr,n2)) Then Return -1
    If Lcase(*Cptr(String Ptr,n1)) < Lcase(*Cptr(String Ptr,n2)) Then Return 1
    Return 0
End Function

Sub sortstring(s() As String,L As Long,U As Long,direction As String="up")
    If Lcase(direction)="up" Then
        qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackU)
    Else
        qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackD)
    End If
End Sub


Dim As String s(1 To ...)={"Zanussi","Not","No","A","C","B","d","a"}

sortstring(s(),Lbound(s),Ubound(s),"up")

For n As Long=Lbound(s) To Ubound(s)
    Print n,s(n)
Next
Print

Sleep

 
MrSwiss
Posts: 3220
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: StringArray Sort (case independent)

Postby MrSwiss » Jul 26, 2019 22:34

Jep, dodicat got it, again!

Here comes my implementation, a bit more code, but no "crt.bi", because
Q_Sort() is implemented too. All I'm doing is:
wrap Q_Sort() within another Sub Q_SortUniStr(), which does the pre-/post-
processing, before and after Q_Sort(), with temp. string array:

Code: Select all

' Q_Sort_String_test1.bas -- (c) 2019-07-26, MrSwiss
'
' compile: -s console
'
' translated from C source: quick(int array[], int start, int end) {}
Private Sub Q_Sort OverLoad( _          ' modified C to FB (strings)
          sa()  As String, _            ' sa = string-array
    ByVal start As Long, _              ' lbound | below: ubound
    ByVal end_  As Long _               ' added underline (keyword conflict)
    )
    If start < end_ Then
        Dim As String   p = sa(start)
        Dim As Long     l = start + 1, r = end_
        While l < r
            If sa(l) <= p Then
                l += 1
            ElseIf sa(r) >= p Then
                r -= 1
            Else
                Swap sa(l), sa(r)
            End If
        Wend
        If sa(l) < p Then
            Swap sa(l), sa(start)
            l -= 1
        Else
            l -= 1
            Swap sa(l), sa(start)
        End If
        Q_Sort(sa(), start, l)
        Q_Sort(sa(), r, end_)
    End If
End Sub

Private Sub Q_SortUniStr( _             ' Q_Sort Unified String
          a()   As String, _            ' array to be sorted
          b()   As String  _            ' original string array (dyn./fixed)
    )
    Dim As Long lb_a = LBound(a), ub_a = UBound(a), _
                lb_b = LBound(b), ub_b = UBound(b)  ' store array's bounds
    ReDim As String t(lb_a To ub_a)     ' temp. dynamic array
    ' all strings unified to: upper case only
    For i As Integer = lb_a To ub_a     ' copy a() to t() upper cased
        t(i) = UCase(a(i))              ' prepare for sorting
    Next
    ' for correct string sort, we need it with: all the same case
    Q_Sort(t(), lb_a, ub_a)             ' sort it
    ' restore strings original case (from b())
    For i As Integer = lb_b To ub_b     ' get from original string array
        Var ts = b(i)                   ' store one string for comparison
        For j As Integer = lb_a To ub_a ' compare to t(j)
            If Len(t(j)) = Len(ts) Then ' only if equal len() (otherwise skip)
                If t(j) = UCase(ts) Then' only if strings are the same (dito)
                    a(j) = ts           ' assign original string to a(j)
                End If
            End If
        Next
    Next
    Erase t                             ' reset t()
End Sub
' Q_Sort procedures END


#Define iRange(l, h)    ( Int(Rnd() * ((h + 1) - (l)) + (l)) )

Randomize(Timer, 3)
For i As UInteger = 0 To 999
    Rnd() : Rnd() : Rnd() : Rnd() : _
    Rnd() : Rnd() : Rnd() : Rnd()
Next

ReDim As String a(Any)                  ' dynamic array (below: fixed size array)
Dim As String b(...) = { "Wednesday", "Thursday", "Friday", "Null", _
                         "September", "October", "November", "Copy", _
                         "Saturday", "Sunday", "Monday", "Pasta", _
                         "December", "January", "February", "Cut", _
                         "Apple", "Pear", "Orange", "Banana", "No", _
                         "Tuesday", "March", "April", "May", "Rice", _
                         "Yes", "June", "July", "August", "Zero", _
                         "EOF", "Error", "OR", "NOR", "XOR", "EXOR" }
Dim As Long lb_a, ub_a, lb_b, ub_b

ReDim a(99)                             ' 0 to 99 (aka: 100 elements)
lb_b = LBound(b) : ub_b = UBound(b)     ' store array bounds (of b())
lb_a = LBound(a) : ub_a = UBound(a)     ' store array bounds (of a())

For i As Integer = lb_a To ub_a         ' initialize array randomly
    a(i) = b(iRange(lb_b, ub_b))        ' using single line macro (#define)
    Print "a("; Str(i); "): "; _
          Tab(10); a(i)                 ' show it
Next

Print : Print
Print "press a key to SORT array ";
Sleep : Print : Print

Q_SortUniStr(a(), b())                  ' sort string array

For i As Integer = lb_a To ub_a
    Print "a("; Str(i); "): "; _
          Tab(10); a(i)                 ' show it
Next

Print : Print
Print "press a key to EXIT program ... ";
Sleep
This is of course not, what I'm really using ...
Just copy/paste from my "Q_Sort.bi" with 'overloaded' Q_Sort's for
different data-types (Long, Double, String).
fxm
Posts: 9126
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: StringArray Sort (case independent)

Postby fxm » Jul 27, 2019 15:44

Variant of 'Q_SortUniStr' ('Q_SortUniStr(a() As String)') to respect the initial case, even in case of string duplication but with different cases:

Code: Select all

Private Sub Q_SortUniStr( _                         ' Q_Sort Unified String
          a()   As String _                         ' array to be sorted
    )
    Dim As Long lb_a = LBound(a), ub_a = UBound(a)  ' store array's bounds
    ReDim As String t(lb_a To ub_a)                 ' temp. dynamic array
    ReDim As Boolean used(lb_a To ub_a)
    ' all strings unified to: upper case only
    For i As Integer = lb_a To ub_a                 ' copy a() to t() upper cased
        t(i) = UCase(a(i))                          ' prepare for sorting
    Next
    ' for correct string sort, we need it with: all the same case
    Q_Sort(t(), lb_a, ub_a)                         ' sort it
    ' restore strings original case (from a())
    For i As Integer = lb_a To ub_a
        For j As Integer = lb_a To ub_a
            If t(i) = Ucase(a(j)) Then
                If used(j) = False Then
                    Swap a(i), a(j)
                    used(i) = True
                    Exit For
                End If
            End If
        Next j
    Next i
    Erase t                                         ' reset t()
End Sub
MrSwiss
Posts: 3220
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: StringArray Sort (case independent)

Postby MrSwiss » Jul 27, 2019 17:28

fxm wrote:in case of string duplication but with different cases:
Sorry, not useful IMO, because:
- only for use, if 2 differing occurences (not triplicates, or larger multiples)
- I'm usually after sorting, killing all multiple occurences, anyway ...
(the pick of case can be influenced, by positioning in array b() ...)

No own, 'original' ideas?
fxm
Posts: 9126
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: StringArray Sort (case independent)

Postby fxm » Jul 27, 2019 17:44

MrSwiss wrote:- only for use, if 2 differing occurences (not triplicates, or larger multiples)

Have you really tested it?
MrSwiss
Posts: 3220
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: StringArray Sort (case independent)

Postby MrSwiss » Jul 27, 2019 17:49

fxm wrote:Have you really tested it?
No, just NOT interrested ...
I'm only keeping unique entry's, as already stated!
(that code was omitted, to keep close to the stated case)

What I'm interrested in, is: No own, 'original' ideas?
Last edited by MrSwiss on Jul 27, 2019 17:55, edited 1 time in total.
dodicat
Posts: 5913
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Postby dodicat » Jul 27, 2019 17:53

Using the standard quicksort with the option of producing dos style string sort (case independent) or sort as the array stands.(comment out lcase)

i.e.
#define casetype 'lcase

Code: Select all

 

#define datatype  string
#define casetype  lcase

Sub sort(array() As DataType,begin As Long=0,Finish As Long=-1)
    If begin>finish Then begin=Lbound(array):Finish=Ubound(array)
    Dim As Long i=begin,j=finish
    Dim As DataType x=casetype(array(((I+J)\2)))
    While I <= J
        While casetype(array(I)) < X :I+=1:Wend 'CHANGE TO > FOR SORTING DOWN
            While casetype(array(J)) > X :J-=1:Wend 'CHANGE TO < FOR SORTING DOWN
                If I<=J Then Swap array(I),array(J): I+=1:J-=1
            Wend
            If J >begin Then  sort(array(),begin,J)
            If I <Finish Then sort(array(),I,Finish)
        End Sub
       
        Dim As String b(...) = { "Wednesday", "Thursday", "Friday", "Null", _
        "September", "October", "November", "Copy", _
        "Saturday", "Sunday", "Monday", "Pasta", _
        "December", "January", "February", "Cut", _
        "Apple", "Pear", "Orange", "Banana", "No", _
        "Tuesday", "March", "April", "May", "Rice", _
        "Yes", "June", "July", "August", "Zero", _
        "EOF", "Error", "OR", "NOR", "XOR", "EXOR","wed","thur","fri","mon","tue","Thur", _
        "jan","feb","Dec","Mon","OCT"}
       
        sort b(),Lbound(b),Ubound(b)
       
        For n As Long=Lbound(b) To Ubound(b)
            Print n,b(n)
        Next
        Print
        Sleep
jj2007
Posts: 1214
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Postby jj2007 » Jul 27, 2019 23:30

Converting all strings to lcase or ucase strings is an overkill, and probably very slow. It would be sufficient to use a string compare function that is case-insensitive. If there is interest, I can offer a 32-bit DLL using StringsDiffer().
MrSwiss
Posts: 3220
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: StringArray Sort (case independent)

Postby MrSwiss » Jul 27, 2019 23:44

While 64 bit only is acceptable but, 32 bit only is totally inacceptable ...
Preferable is anyway: both!
That's why we're using FB and not "lower level languages" after all! ;-)
jj2007
Posts: 1214
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Postby jj2007 » Aug 01, 2019 18:04

Well, for me slow code is totally unacceptable. De gustibus non est disputandum ;-)
dodicat
Posts: 5913
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Postby dodicat » Aug 01, 2019 20:05

I did some speed tests.
The dos sort was by far the fastest (3.2 seconds for one million shortish strings)
Standard Quicksort was about eight seconds.
The C runtime was about ten seconds.

Return to “Tips and Tricks”

Who is online

Users browsing this forum: No registered users and 2 guests