## StringArray Sort (case independent)

MrSwiss
Posts: 3348
Joined: Jun 02, 2013 9:27
Location: Switzerland

### StringArray Sort (case independent)

It seems, that in many cases of sorting a StringArray, the case's inside the individual
Strings, are simply ignored, e.g. QuickSort ... (this leads to: incorrect sort result!).

Example: "NOT" comes before "No" (it should be vice versa, from a human perspective)
(From a strictly numeric point of view, the sort is correct "as-is"!)

The Question:
How can we solve the dilemma presented above?
I'd suggest a three step approach:
1) figure out the technical reasons first
2) propose a workaround that eliminates the technical barrier(s)
3) implement 2nd in code, using all coding tricks we know of

The problem, detailed (step 1):

- Alphabet consists of: 2 distinct ranges (A - Z and a - z), in ASCII-numbers.
(see: ASCII-Table in Manual)
- QuickSort sorts: low to high (only numerical values considered)
- Therefore: 'Upper Case' superseeds 'Lower Case' always
(this is the actually problematic point at issue, to be solved)

My solution approach (step 2):
- Preprocessing (make Strings all the same case (UCase() used here)
- Sorting (QuickSort, for String)
- Postprocessing (restore sorted Array to 'original' case, again)

The solution (step 3):
- my assesment of the problem (step 1), might be wrong (to start off with)
- this whould invalidate step 2/3, of course
- before posting code here, I'm interested in other, maybe better approches you'd
propose to solve the issue, as explained above in words, not in code
dodicat
Posts: 6153
Joined: Jan 10, 2006 20:30
Location: Scotland

### Re: StringArray Sort (case independent)

Let DOS take the strain?

Code: Select all

Function StringSplit(s_in As String,chars As String,result() As String) As Long
Dim As Long ctr,ctr2,k,n,LC=len(chars)
dim As boolean tally(Len(s_in))
#macro check_instring()
n=0
while n<Lc
If chars[n]=s_in[k] Then
tally(k)=true
If (ctr2-1) Then ctr+=1
ctr2=0
exit while
end if
n+=1
wend
#endmacro

#macro split()
If tally(k) Then
If (ctr2-1) Then ctr+=1:result(ctr)=Mid(s_in,k+2-ctr2,ctr2-1)
ctr2=0
End If
#endmacro
'==================  LOOP TWICE =======================
For k  =0 To Len(s_in)-1
ctr2+=1:check_instring()
Next k
if ctr=0 then
if len(s_in) andalso instr(chars,chr(s_in[0])) then ctr=1':beep
end if
If ctr Then Redim result(1 To ctr): ctr=0:ctr2=0 Else  Return 0
For k  =0 To Len(s_in)-1
ctr2+=1:split()
Next k
'===================== Last one ========================
If ctr2>0 Then
Redim Preserve result(1 To ctr+1)
result(ctr+1)=Mid(s_in,k+1-ctr2,ctr2)
End If
Return Ubound(result)
End Function

sub sort(s() as string,d as string="down")
var f=freefile
var cd=curdir
dim as string g,cmd,text
open "templist__.txt" for output as #f
for n as long=lbound(s) to ubound(s)
print #f,s(n)
next n
close #f
if d="down" then
cmd= "SORT/r " + cd + "\templist__.txt /o " +cd +"\result__.txt"
else
cmd= "SORT " + cd + "\templist__.txt /o " +cd +"\result__.txt"
end if
shell cmd
Open "result__.txt" For Binary Access Read As #f
If Lof(f) > 0 Then
text = String(Lof(f), 0)
Get #f, , text
End If
Close #f
redim as string s2()
StringSplit(text,chr(10),s2())
for n as long=lbound(s2) to ubound(s2)
s(n)=(s2(n))
next
kill cd+"\result__.txt"
kill cd+"\templist__.txt"
end sub

dim as string s(1 to ...)={"Zanussi","Not","No","A","C","B","b"}

sort(s(),"up")
for n as long=lbound(s) to ubound(s)
print n,s(n)
next
print

sleep

MrSwiss
Posts: 3348
Joined: Jun 02, 2013 9:27
Location: Switzerland

### Re: StringArray Sort (case independent)

dodicat wrote:Let DOS take the strain?

Nice idea, but not what I've intended (make QuickSort return correctly).
There is no need for file I/O ... can be done in memory alone.
My intention is: to get those 'coders brains' up to speed, once in a while.
Lost Zergling
Posts: 260
Joined: Dec 02, 2011 22:51
Location: France

### Re: StringArray Sort (case independent)

I would use my leaky lzle to fill a list using Lcase or Ucase for Tag0 (keys) and original keys as values or Tag1 ?...
dodicat
Posts: 6153
Joined: Jan 10, 2006 20:30
Location: Scotland

### Re: StringArray Sort (case independent)

Just pop lcase into the callback function

Code: Select all

'#undef lcase
'#define lcase

#include once "crt.bi"

'=========  set up c sort =========
#define up <,>
#define down >,<
#define ArrayToSort(x,start,finish) @X((start)),((finish)-(start)+1),Sizeof(X)
#macro SetCSort(Datatype,FnName,b1,b2,dot)
Function FnName Cdecl(n1 As Any Ptr,n2 As Any Ptr) As long
If lcase(*Cptr(Datatype Ptr,n1))dot b1 lcase(*Cptr(Datatype Ptr,n2))dot Then Return -1
If lcase(*Cptr(DataType Ptr,n1))dot b2 lcase(*Cptr(DataType Ptr,n2))dot Then Return 1
return 0
End Function
#endmacro

setCsort(string,callback,up,)
dim as string s(1 to ...)={"Zanussi","Not","No","A","C","B","d","a"}

qsort(arraytosort(s,Lbound(s),Ubound(s)),@callback)

for n as long=lbound(s) to ubound(s)
print n,s(n)
next
print

sleep

Re comment lines 2 and 3 to see the raw result.

Here it is laid out.

Code: Select all

#include "crt.bi"

Function callbackU Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
If Lcase(*Cptr(String Ptr,n1)) < Lcase(*Cptr(String Ptr,n2)) Then Return -1
If Lcase(*Cptr(String Ptr,n1)) > Lcase(*Cptr(String Ptr,n2)) Then Return 1
Return 0
End Function

Function callbackD Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
If Lcase(*Cptr(String Ptr,n1)) > Lcase(*Cptr(String Ptr,n2)) Then Return -1
If Lcase(*Cptr(String Ptr,n1)) < Lcase(*Cptr(String Ptr,n2)) Then Return 1
Return 0
End Function

Sub sortstring(s() As String,L As Long,U As Long,direction As String="up")
If Lcase(direction)="up" Then
qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackU)
Else
qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackD)
End If
End Sub

Dim As String s(1 To ...)={"Zanussi","Not","No","A","C","B","d","a"}

sortstring(s(),Lbound(s),Ubound(s),"up")

For n As Long=Lbound(s) To Ubound(s)
Print n,s(n)
Next
Print

Sleep

MrSwiss
Posts: 3348
Joined: Jun 02, 2013 9:27
Location: Switzerland

### Re: StringArray Sort (case independent)

Jep, dodicat got it, again!

Here comes my implementation, a bit more code, but no "crt.bi", because
Q_Sort() is implemented too. All I'm doing is:
wrap Q_Sort() within another Sub Q_SortUniStr(), which does the pre-/post-
processing, before and after Q_Sort(), with temp. string array:

Code: Select all

' Q_Sort_String_test1.bas -- (c) 2019-07-26, MrSwiss
'
' compile: -s console
'
' translated from C source: quick(int array[], int start, int end) {}
Private Sub Q_Sort OverLoad( _          ' modified C to FB (strings)
sa()  As String, _            ' sa = string-array
ByVal start As Long, _              ' lbound | below: ubound
ByVal end_  As Long _               ' added underline (keyword conflict)
)
If start < end_ Then
Dim As String   p = sa(start)
Dim As Long     l = start + 1, r = end_
While l < r
If sa(l) <= p Then
l += 1
ElseIf sa(r) >= p Then
r -= 1
Else
Swap sa(l), sa(r)
End If
Wend
If sa(l) < p Then
Swap sa(l), sa(start)
l -= 1
Else
l -= 1
Swap sa(l), sa(start)
End If
Q_Sort(sa(), start, l)
Q_Sort(sa(), r, end_)
End If
End Sub

Private Sub Q_SortUniStr( _             ' Q_Sort Unified String
a()   As String, _            ' array to be sorted
b()   As String  _            ' original string array (dyn./fixed)
)
Dim As Long lb_a = LBound(a), ub_a = UBound(a), _
lb_b = LBound(b), ub_b = UBound(b)  ' store array's bounds
ReDim As String t(lb_a To ub_a)     ' temp. dynamic array
' all strings unified to: upper case only
For i As Integer = lb_a To ub_a     ' copy a() to t() upper cased
t(i) = UCase(a(i))              ' prepare for sorting
Next
' for correct string sort, we need it with: all the same case
Q_Sort(t(), lb_a, ub_a)             ' sort it
' restore strings original case (from b())
For i As Integer = lb_b To ub_b     ' get from original string array
Var ts = b(i)                   ' store one string for comparison
For j As Integer = lb_a To ub_a ' compare to t(j)
If Len(t(j)) = Len(ts) Then ' only if equal len() (otherwise skip)
If t(j) = UCase(ts) Then' only if strings are the same (dito)
a(j) = ts           ' assign original string to a(j)
End If
End If
Next
Next
Erase t                             ' reset t()
End Sub
' Q_Sort procedures END

#Define iRange(l, h)    ( Int(Rnd() * ((h + 1) - (l)) + (l)) )

Randomize(Timer, 3)
For i As UInteger = 0 To 999
Rnd() : Rnd() : Rnd() : Rnd() : _
Rnd() : Rnd() : Rnd() : Rnd()
Next

ReDim As String a(Any)                  ' dynamic array (below: fixed size array)
Dim As String b(...) = { "Wednesday", "Thursday", "Friday", "Null", _
"September", "October", "November", "Copy", _
"Saturday", "Sunday", "Monday", "Pasta", _
"December", "January", "February", "Cut", _
"Apple", "Pear", "Orange", "Banana", "No", _
"Tuesday", "March", "April", "May", "Rice", _
"Yes", "June", "July", "August", "Zero", _
"EOF", "Error", "OR", "NOR", "XOR", "EXOR" }
Dim As Long lb_a, ub_a, lb_b, ub_b

ReDim a(99)                             ' 0 to 99 (aka: 100 elements)
lb_b = LBound(b) : ub_b = UBound(b)     ' store array bounds (of b())
lb_a = LBound(a) : ub_a = UBound(a)     ' store array bounds (of a())

For i As Integer = lb_a To ub_a         ' initialize array randomly
a(i) = b(iRange(lb_b, ub_b))        ' using single line macro (#define)
Print "a("; Str(i); "): "; _
Tab(10); a(i)                 ' show it
Next

Print : Print
Print "press a key to SORT array ";
Sleep : Print : Print

Q_SortUniStr(a(), b())                  ' sort string array

For i As Integer = lb_a To ub_a
Print "a("; Str(i); "): "; _
Tab(10); a(i)                 ' show it
Next

Print : Print
Print "press a key to EXIT program ... ";
Sleep
This is of course not, what I'm really using ...
Just copy/paste from my "Q_Sort.bi" with 'overloaded' Q_Sort's for
different data-types (Long, Double, String).
fxm
Posts: 9466
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

### Re: StringArray Sort (case independent)

Variant of 'Q_SortUniStr' ('Q_SortUniStr(a() As String)') to respect the initial case, even in case of string duplication but with different cases:

Code: Select all

Private Sub Q_SortUniStr( _                         ' Q_Sort Unified String
a()   As String _                         ' array to be sorted
)
Dim As Long lb_a = LBound(a), ub_a = UBound(a)  ' store array's bounds
ReDim As String t(lb_a To ub_a)                 ' temp. dynamic array
ReDim As Boolean used(lb_a To ub_a)
' all strings unified to: upper case only
For i As Integer = lb_a To ub_a                 ' copy a() to t() upper cased
t(i) = UCase(a(i))                          ' prepare for sorting
Next
' for correct string sort, we need it with: all the same case
Q_Sort(t(), lb_a, ub_a)                         ' sort it
' restore strings original case (from a())
For i As Integer = lb_a To ub_a
For j As Integer = lb_a To ub_a
If t(i) = Ucase(a(j)) Then
If used(j) = False Then
Swap a(i), a(j)
used(i) = True
Exit For
End If
End If
Next j
Next i
Erase t                                         ' reset t()
End Sub
MrSwiss
Posts: 3348
Joined: Jun 02, 2013 9:27
Location: Switzerland

### Re: StringArray Sort (case independent)

fxm wrote:in case of string duplication but with different cases:
Sorry, not useful IMO, because:
- only for use, if 2 differing occurences (not triplicates, or larger multiples)
- I'm usually after sorting, killing all multiple occurences, anyway ...
(the pick of case can be influenced, by positioning in array b() ...)

No own, 'original' ideas?
fxm
Posts: 9466
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

### Re: StringArray Sort (case independent)

MrSwiss wrote:- only for use, if 2 differing occurences (not triplicates, or larger multiples)

Have you really tested it?
MrSwiss
Posts: 3348
Joined: Jun 02, 2013 9:27
Location: Switzerland

### Re: StringArray Sort (case independent)

fxm wrote:Have you really tested it?
No, just NOT interrested ...
I'm only keeping unique entry's, as already stated!
(that code was omitted, to keep close to the stated case)

What I'm interrested in, is: No own, 'original' ideas?
Last edited by MrSwiss on Jul 27, 2019 17:55, edited 1 time in total.
dodicat
Posts: 6153
Joined: Jan 10, 2006 20:30
Location: Scotland

### Re: StringArray Sort (case independent)

Using the standard quicksort with the option of producing dos style string sort (case independent) or sort as the array stands.(comment out lcase)

i.e.
#define casetype 'lcase

Code: Select all

#define datatype  string
#define casetype  lcase

Sub sort(array() As DataType,begin As Long=0,Finish As Long=-1)
If begin>finish Then begin=Lbound(array):Finish=Ubound(array)
Dim As Long i=begin,j=finish
Dim As DataType x=casetype(array(((I+J)\2)))
While I <= J
While casetype(array(I)) < X :I+=1:Wend 'CHANGE TO > FOR SORTING DOWN
While casetype(array(J)) > X :J-=1:Wend 'CHANGE TO < FOR SORTING DOWN
If I<=J Then Swap array(I),array(J): I+=1:J-=1
Wend
If J >begin Then  sort(array(),begin,J)
If I <Finish Then sort(array(),I,Finish)
End Sub

Dim As String b(...) = { "Wednesday", "Thursday", "Friday", "Null", _
"September", "October", "November", "Copy", _
"Saturday", "Sunday", "Monday", "Pasta", _
"December", "January", "February", "Cut", _
"Apple", "Pear", "Orange", "Banana", "No", _
"Tuesday", "March", "April", "May", "Rice", _
"Yes", "June", "July", "August", "Zero", _
"EOF", "Error", "OR", "NOR", "XOR", "EXOR","wed","thur","fri","mon","tue","Thur", _
"jan","feb","Dec","Mon","OCT"}

sort b(),Lbound(b),Ubound(b)

For n As Long=Lbound(b) To Ubound(b)
Print n,b(n)
Next
Print
Sleep
jj2007
Posts: 1318
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

### Re: StringArray Sort (case independent)

Converting all strings to lcase or ucase strings is an overkill, and probably very slow. It would be sufficient to use a string compare function that is case-insensitive. If there is interest, I can offer a 32-bit DLL using StringsDiffer().
MrSwiss
Posts: 3348
Joined: Jun 02, 2013 9:27
Location: Switzerland

### Re: StringArray Sort (case independent)

While 64 bit only is acceptable but, 32 bit only is totally inacceptable ...
Preferable is anyway: both!
That's why we're using FB and not "lower level languages" after all! ;-)
jj2007
Posts: 1318
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

### Re: StringArray Sort (case independent)

Well, for me slow code is totally unacceptable. De gustibus non est disputandum ;-)
dodicat
Posts: 6153
Joined: Jan 10, 2006 20:30
Location: Scotland

### Re: StringArray Sort (case independent)

I did some speed tests.
The dos sort was by far the fastest (3.2 seconds for one million shortish strings)
Standard Quicksort was about eight seconds.
The C runtime was about ten seconds.