StringArray Sort (case independent)

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

Juergen Kuehlwein wrote:I can´t believe it! Using a dedicated comparison function i can get it down to 0.6 seconds (CASE INSENSITIVE
Loads of errors - is something missing?

Code: Select all

\FreeBasic\tmp\TmpFb.bas(2) error 4: Duplicated definition, found 'unsigned'
\FreeBasic\tmp\TmpFb.bas(3) error 145: Only valid in -lang deprecated or fblite or qb
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: StringArray Sort (case independent)

Post by Juergen Kuehlwein »

The posted code is in C and it´s not yet pushed to the github repositiory. I replaced the comparison function i had with these two in the RTL. The quicksort in the RTL uses a separate comparison function for each variable type and i used this code for strings.

For a case independent compare you must "normalize" the strings to compare by making them uppercase or lowercase. But doing this before each compare is very time consuming and inefficient, because you would always convert the whole string (both of them) even if they differ at the first character.

This approach makes use of a conversion table, that is, you compare the two strings character by character using the character code as index for the table. Using two tables makes two extra checks for the zero byte (null character) obsolete, which makes it even faster.


JK
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

Sounds clever, will test it asap! I can't believe that I didn't notice it was C... ;-)

Do you have a complete example, with array generation etc? My C is a bit rusty.
Lost Zergling
Posts: 534
Joined: Dec 02, 2011 22:51
Location: France

Re: StringArray Sort (case independent)

Post by Lost Zergling »

Considering JK explanations and extrapolating to lzle it is now obvious for me than the case independant feature should be handled into the hashtag (and not before it) in this context but I m not sure if it would be worth or not to add such feature to the tool (one more test and a small but global slowdown). Perhaps just a comparison. Thus about c : a c version of the lzle library could be an idea but I d rather prefer FB pointers speed close to c ones and some posts on forum tend to suggest this not the case, wich could explain some of the speed differences.
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: StringArray Sort (case independent)

Post by Juergen Kuehlwein »

Do you have a complete example, with array generation etc? My C is a bit rusty.
Yes, but it´s mixed code: dodicat´s code for a DOS sort (this thread), and code of my array project (FB + C for the RTL). The actual sort is done as a quicksort in C. I cannot commit and push currently, because i´m on a vacation and have only a copy of the code in my laptop with me. So, please be patient, hopefully i will be able to push a more elaborate version in two weeks or so.


JK
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

Thanks, Juergen - und schönen Urlaub ;-)
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Post by dodicat »

I can get the c callback function down to about 5 seconds for a million strings.
My answer agrees with the dos sort and a quicksort (which is slower).

Code: Select all


#include "crt.bi"

Function callbackU Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
    static as string a,b
    static as long lena,lenb
    a=*Cptr(String Ptr,n1):b=*Cptr(String Ptr,n2)
    lena=len(a):lenb=len(b)
    for n as long=0 to iif(lena<lenb,lena,lenb)-1
        if a[n]<91 then if a[n]>64 then a[n]+=32
        if b[n]<91 then if b[n]>64 then b[n]+=32
        If a[n] < b[n] Then Return -1
        If a[n] > b[n] Then Return 1
    next
    Return 0
End Function

Function callbackD Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
     static as string a,b
     static as long lena,lenb
     a=*Cptr(String Ptr,n1):b=*Cptr(String Ptr,n2)
    lena=len(a):lenb=len(b)
    for n as long=0 to iif(lena<lenb,lena,lenb)-1
        if a[n]<91 then if a[n]>64 then a[n]+=32
        if b[n]<91 then if b[n]>64 then b[n]+=32
        If a[n] > b[n] Then Return -1
        If a[n] < b[n] Then Return 1
    next
    Return 0
End Function

Sub sortstring(s() As String,L As Long,U As Long,direction As String="up")
    If Lcase(direction)="up" Then
        qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackU)
    Else
        qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackD)
    End If
End Sub


Sub create(L() As String)
    #define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
    #define q range(97,122)-Iif(Rnd>.5,32,0)
    Randomize 1
    For n As Long=Lbound(L) To Ubound(L)
        Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        L(n)=Left(g1+g2+g3,60+Rnd*30)
    Next
End Sub

Sub show(L() As String)
    For n As Long=Lbound(L) To 10
        Print L(n)
    Next
    For n As Long=1 To 4
        Print "..."
    Next
    For n As Long=Ubound(L)-10 To Ubound(L)
        Print L(n)
    Next
End Sub

Dim As Double t1,t2
Dim As Long limit=1000000
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (crt)"
t1=Timer
sortstring(L(),lbound(L),ubound(L),"up")
t2=Timer
show(L())
Print t2-t1;"  Seconds crt"
Sleep 
    
Here is my answer

Code: Select all

 Creating string
Commence sort (crt)
AAaAHZybMJfIBMDPTlGpjvUIKUAUarYYBhwoQzXDbBiAcpCJBdWibuPGKXCeSrAcOsQVouKGLUtXF
aAABoYraUmtdbUjgRDbugWslpjAUIKuszBiOYgbGgDwawBsyfUTodDuktDyLUCdtDSYqIFfwxoBlKDfbiuHQ
aaadeaEoIYWvHUzkSOtliXprvQLwLRmQtZnACEWprRsosQEBCExqPRkIFHyxbkWHovUZtDOKw
aAAdgYmOjPfEHlBAxbwixWqYTppVdpWGkQMBRNmCTHZfjqVHHsnYUNcnJtYEXkXrgjLLpz
AAAdXKZoSZITCgAlVxqloFzcjapuQHGGQBOElGPKCqLXaVMBIzaDjebMxglyt
AaAEdkSIREVLGLJZScnkXaOymKTFdJGPZUsOGaOTUHuCWijpmEdhUHMBxyeYzI
aAAeFlewdudpRIuowRSihDPHiAzTqtGfKRvbeuqAwufmfbvCeTVQYwpXwzukg
aaAelVOVdRieVichzLhUWafcKAoFjlAgwYvopUmYgCOfUVNlwdlxRzAZwKjkgxvBEqviohBHcPlhOFzxcRBYy
AaaEomtIuMRgVNHDwjZIWRZgiVgkfghJlhXWsSydLLcyAauhNdjESZelpVBoBySKhUyQHxApqjQxmrTaSSpGLUeXIx
AAAerWyYiOufKfPsPpmOSMGVvvwGzARxjwEZgqJdcNwDLzNNvXxIDHxadiJnxARObYNuILRflHAFjS
...
...
...
...
zZZxkXWYMdvqOvUurUXPYkofDtHOBSGmpqrcBRpBGbYSHVRtJvhjpryJpBCtvurFCiTcvZAGdGFzlKKXw
ZzZXLUDRAPWHQTmxjehAUyDvJqngpOnjtVFhDzPERrhlsKDNlrCecMdmYHxGsaenuoHwaNrlGqBDNDPgztCAFqKFTa
ZZzyanZINaMDFipgHqschlpzDjIiguGYeRPkMSHHhgKnRucrAurxZWTdihlfYt
zZzycXtEdYGXwjLnXqIOJBelLnwaVMCpwbLdRpqfZHQEwwQEmIHnxgwCVCAGAnwUVQAmlMuQEzBcaNLImCCumMX
zZZyPQXOdeaMoUjdXDNYDjsRomlpjNyoeUHpyoYyYxloSCbnXeqzBvBXzgUUmXSErgdTn
zZZyTXiTBhRLyLZtLVHIVqZkquYTncsnbztSkxeNTteHouBCHqkYUrJeLMvphCMB
ZZZYUhUIBdzPEDKfbdHakFUrczSEtzJpyEMdKePOoVCWtQGyduJtzietukAoVIyDFkvghymAIRljOkxqqw
zzzYZAgWBUtSZfUmlAWnFFRpDcGgwdOFndrhSJZcyMsgkvLnTOTRmUtwjKvNKJndDYagMFDBnZQduJzTADOTbIsyx
ZZzyzetRGgWITClgQOTutTCdCiuHMxPNPuxGIsUFqStMIuIRPqlWpHdzyckcajQMCgNTbYzilNuy
zzZzcsrZGTHbiajufJIMXwWdbKFncznEBmlSLCGJBKFTHoixDzzNgeXSRohyfJHAQreI
zZZzToOoFStspcNrVaEIfjsxTNFZWUIwVWqxrzppJBjnxhbFgXmoCjUQxitzjpsLsmkTJefTDsKUKDj
 4.547425800003111  Seconds crt
 
It doesn't agree with JK's answer.
At the moment I can't get it much faster (even with an external comparison array of ubyte).
Here is it using an external array.

Code: Select all


 

 dim shared as ubyte u(255)
#define lwr(s) iif(s<91 andalso s>64,s+32,s)

for n as long=0 to 255
    u(n)=n
    u(n)=lwr(u(n))
next

#include "crt.bi"

Function callbackU Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
    static as string a,b
    static as long lena,lenb
    a=*Cptr(String Ptr,n1):b=*Cptr(String Ptr,n2)
    lena=len(a):lenb=len(b)
    for n as long=0 to iif(lena<lenb,lena,lenb)-1
        If u(a[n]) < u(b[n]) Then Return -1
        If u(a[n]) > u(b[n]) Then Return 1
    next
    Return 0
End Function

Function callbackD Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
     static as string a,b
     static as long lena,lenb
     a=*Cptr(String Ptr,n1):b=*Cptr(String Ptr,n2)
    lena=len(a):lenb=len(b)
    for n as long=0 to iif(lena<lenb,lena,lenb)-1
        If u(a[n]) > u(b[n]) Then Return -1
        If u(a[n]) < u(b[n]) Then Return 1
    next
    Return 0
End Function

Sub sortstring(s() As String,L As Long,U As Long,direction As String="up")
    If Lcase(direction)="up" Then
        qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackU)
    Else
        qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackD)
    End If
End Sub


Sub create(L() As String)
    #define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
    #define q range(97,122)-Iif(Rnd>.5,32,0)
    Randomize 1
    For n As Long=Lbound(L) To Ubound(L)
        Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        L(n)=Left(g1+g2+g3,60+Rnd*30)
    Next
End Sub

Sub show(L() As String)
    For n As Long=Lbound(L) To 10
        Print L(n)
    Next
    For n As Long=1 To 4
        Print "..."
    Next
    For n As Long=Ubound(L)-10 To Ubound(L)
        Print L(n)
    Next
End Sub

Dim As Double t1,t2
Dim As Long limit=1000000
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (crt)"
t1=Timer
sortstring(L(),lbound(L),ubound(L),"up")
t2=Timer
show(L())
Print t2-t1;"  Seconds crt"
Sleep 
 
   
Lost Zergling
Posts: 534
Joined: Dec 02, 2011 22:51
Location: France

Re: StringArray Sort (case independent)

Post by Lost Zergling »

Speed results can be very different on a random dataset depending on you fall in a slowdown case or not on the related algo (quicksort, bubble, and so on...). Speed can be also related to cache, processor, os... I can see at least 2 different approach : going on a fast algo and trying to reduce or limit the slowdown cases OR going on a slow algo covering by default the widdest cases and then trying to optimise speed. Lzle is second option and is not optimal for a pure sort because the data are not contiguous, so it consumes memory and time to store pointers and thus there is a redundancy between case sensitive and non case sensitive datas. Counterpart data can be handled as keys. Suggested dataset is a bit specific because it is small compared to the key len one hand, it is random so it is more convenient user case for scientific than for it computing other hand. Because of its architecture, I will use lzle (with case insensitive into the hashtag) as a minimal requirements tests for pure sort for exemple to detect the event of a possible slodown cases. Then, I assume it could be much slower than a specific algo because it is designed for keys in a processing context. I m on hollidays and on my smartphone this we.
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Post by dodicat »

JK seems only to sort the first four characters properly, the fifth often is in error.
e.g.
...
AAAdXKZoSZITCgAlVxqloFzcjapuQHGGQBOElGPKCqLXaVMBIzaDjebMxglyt
aaadeaEoIYWvHUzkSOtliXprvQLwLRmQtZnACEWprRsosQEBCExqPRkIFHyxbkWHovUZtDOKw
...

aaadx should be after aaade (case independent)

Anyway, the fastest I can make this run without a migrane is the standard quicksort with two callback functions an a lookup array.
With 64 or 32 bits -O3 optimisation, about 1.6 seconds for a million strings.
-gen gas is about 2.5 seconds.
My results are the same as posted previously.

Code: Select all

  

 Dim Shared As Ubyte u(255)
#define lwr(s) iif(s<91 andalso s>64,s+32,s)
For n As Long=0 To 255
    u(n)=lwr(n)  'lookup
Next

Function lessthan(a As String,b As String)  As Long
    var lena=Len(a),lenb=Len(b)
    For n As Long=0 To Iif(lena<lenb,lena,lenb)-1
        If u(a[n]) < u(b[n]) Then  Return -1
        If u(a[n]) > u(b[n]) Then Return 0
    Next
    Return 0
End Function

Function morethan(a As String,b As String)  As Long
    var lena=Len(a),lenb=Len(b)
    For n As Long=0 To Iif(lena<lenb,lena,lenb)-1
        If u(a[n]) > u(b[n]) Then Return -1 
        If u(a[n]) < u(b[n]) Then Return 0
    Next
    Return 0
End Function

Sub sortup(array() As String,begin As Long,Finish As Long)
    static as string x
    Dim As Long i=begin,j=finish
    x=(array(((I+J)\2)))
    While I <= J
        While lessthan(array(I), X):I+=1:Wend
        While morethan(array(J), X):J-=1:Wend
                If I<=J Then Swap array(I),array(J): I+=1:J-=1
            Wend
            If J >begin Then  sortup(array(),begin,J)
            If I <Finish Then sortup(array(),I,Finish)
        End Sub
        
Sub sortdown(array() As String,begin As Long,Finish As Long)
    static as string x
    Dim As Long i=begin,j=finish
    x=(array(((I+J)\2)))
    While I <= J
        While morethan(array(I), X):I+=1:Wend
        While lessthan(array(J), X):J-=1:Wend
                If I<=J Then Swap array(I),array(J): I+=1:J-=1
            Wend
            If J >begin Then  sortdown(array(),begin,J)
            If I <Finish Then sortdown(array(),I,Finish)
        End Sub
        
       
       Sub create(L() As String)
    #define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
    #define q range(97,122)-Iif(Rnd>.5,32,0)
    Randomize 1
    For n As Long=Lbound(L) To Ubound(L)
        Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        L(n)=Left(g1+g2+g3,60+Rnd*30)
    Next
End Sub

Sub show(L() As String)
    For n As Long=Lbound(L) To 10
        Print L(n)
    Next
    For n As Long=1 To 4
        Print "..."
    Next
    For n As Long=Ubound(L)-10 To Ubound(L)
        Print L(n)
    Next
End Sub

Dim As Double t1,t2
Dim As Long limit=1000000
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (crt)"
t1=Timer
sortup(L(),Lbound(L),Ubound(L))
t2=Timer
show(L())
Print t2-t1;"  Seconds quicksort"
Sleep
 
    
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: StringArray Sort (case independent)

Post by Juergen Kuehlwein »

@dodicat,

i fixed the error i had i my sorting algo in the meantime. Our results are the same now (case independent). My laptop is not the fastest, i get 0.9 seconds here (i got 0.6 at home) for one million strings using basically your code with my sort. I still see room for optimization ...


JK
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

Juergen Kuehlwein wrote:@dodicat,

i fixed the error i had i my sorting algo in the meantime. Our results are the same now (case independent). My laptop is not the fastest, i get 0.9 seconds here (i got 0.6 at home) for one million strings using basically your code with my sort. I still see room for optimization ..
Not much room. I use a mergesort, and quicksort is only marginally faster (but mergesort is a stable sort):

Code: Select all

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
sorting 1000000 lines case-insensitive took 1441 ms
It's always a good idea to indicate which cpu you are using.
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Post by dodicat »

I have quicksort down to <1 second per million strings.
(A simple optimisation, don't know how I missed it before.)
I have also tried macros instead of callback functions, but as yet they are a tiny bit slower.

Code: Select all

 

 Dim Shared As Ubyte u(255)
#define lwr(s) iif(s<91 andalso s>64,s+32,s)
For n As Long=0 To 255
    u(n)=lwr(n)  'lookup
Next

Function lessthan(a As String,b As String,Lenb as long)  As Long
    static as long lena
    lena=cast(integer ptr,@a)[1]'=Len(a)
    For n as long =0 To Iif(lena<lenb,lena,lenb)-1
        If u(a[n]) < u(b[n]) Then  Return -1
        If u(a[n]) > u(b[n]) Then  Return 0
    Next
    Return 0
End Function

Function morethan(a As String,b As String,lenb as long)  As Long
    static as long lena
    lena=cast(integer ptr,@a)[1]'=Len(a)
    For n as long  =0 To Iif(lena<lenb,lena,lenb)-1
        If u(a[n]) > u(b[n]) Then Return -1
        If u(a[n]) < u(b[n]) Then Return 0
    Next
    Return 0
End Function

Sub sortup(array() As String,begin As Long,Finish As Long)
    static as string x
    var i=begin,j=finish
    x=(array(((I+J)\2)))
     var lenx=cast(integer ptr,@x)[1]'=len(X)  'get length here instead of in the loops
    While I <= J
        While lessthan(array(I),X,lenx):I+=1:Wend
        While morethan(array(J),X,lenx):J-=1:Wend
        If I<=J Then Swap array(I),array(J): I+=1:J-=1
    Wend
            If J >begin Then  sortup(array(),begin,J)
            If I <Finish Then sortup(array(),I,Finish)
        End Sub
       
Sub sortdown(array() As String,begin As Long,Finish As Long)
    static as string x
    var i=begin,j=finish
    x=(array(((I+J)\2)))
    var lenx=len(X)
    While I <= J
        While morethan(array(I),X,lenx):I+=1:Wend
        While lessthan(array(J),X,lenx):J-=1:Wend
            If I<=J Then Swap array(I),array(J): I+=1:J-=1
    Wend
            If J >begin Then  sortdown(array(),begin,J)
            If I <Finish Then sortdown(array(),I,Finish)
        End Sub
       
       
       Sub create(L() As String)
    #define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
    #define q range(97,122)-Iif(Rnd>.5,32,0)
    Randomize 1
    For n As Long=Lbound(L) To Ubound(L)
        Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        L(n)=Left(g1+g2+g3,60+Rnd*30)
    Next
End Sub

Sub show(L() As String)
    For n As Long=Lbound(L) To 10
        Print L(n)
    Next
    For n As Long=1 To 4
        Print "..."
    Next
    For n As Long=Ubound(L)-10 To Ubound(L)
        Print L(n)
    Next
End Sub

Dim As Double t1,t2
Dim As Long limit=1000000
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (crt)"
t1=Timer
sortup(L(),Lbound(L),Ubound(L))
t2=Timer
show(L())
Print t2-t1;"  Seconds quicksort"
Sleep
 
    
      
Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (2 CPUs), ~3.0GHz
But unfortunately I don't know which cpu, could be either the one on the right or the one on the left.
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: StringArray Sort (case independent)

Post by Juergen Kuehlwein »

My laptop´s CPU is: Intel Pentium 4417U, 2.3 GHz 2 Cores. But knowing the CPU, doesn´t mean you can now compare speed. What makes sense to me is running different code on the same machine under the same conditions and then comparing speeds.

Running dodicat´s code (preceding post) i get on my (not very fast) laptop:
Creating string
Commence sort (sortup(L(),Lbound(L),Ubound(L)))
AAaAHZybMJfIBMDPTlGpjvUIKUAUarYYBhwoQzXDbBiAcpCJBdWibuPGKXCeSrAcOsQVouKGLUtXF
aAABoYraUmtdbUjgRDbugWslpjAUIKuszBiOYgbGgDwawBsyfUTodDuktDyLUCdtDSYqIFfwxoBlKDfbiuHQ
aaadeaEoIYWvHUzkSOtliXprvQLwLRmQtZnACEWprRsosQEBCExqPRkIFHyxbkWHovUZtDOKw
aAAdgYmOjPfEHlBAxbwixWqYTppVdpWGkQMBRNmCTHZfjqVHHsnYUNcnJtYEXkXrgjLLpz
AAAdXKZoSZITCgAlVxqloFzcjapuQHGGQBOElGPKCqLXaVMBIzaDjebMxglyt
AaAEdkSIREVLGLJZScnkXaOymKTFdJGPZUsOGaOTUHuCWijpmEdhUHMBxyeYzI
aAAeFlewdudpRIuowRSihDPHiAzTqtGfKRvbeuqAwufmfbvCeTVQYwpXwzukg
aaAelVOVdRieVichzLhUWafcKAoFjlAgwYvopUmYgCOfUVNlwdlxRzAZwKjkgxvBEqviohBHcPlhOFzxcRBYy
AaaEomtIuMRgVNHDwjZIWRZgiVgkfghJlhXWsSydLLcyAauhNdjESZelpVBoBySKhUyQHxApqjQxmrTaSSpGLUeXIx
AAAerWyYiOufKfPsPpmOSMGVvvwGzARxjwEZgqJdcNwDLzNNvXxIDHxadiJnxARObYNuILRflHAFjS
...
...
...
...
zZZxkXWYMdvqOvUurUXPYkofDtHOBSGmpqrcBRpBGbYSHVRtJvhjpryJpBCtvurFCiTcvZAGdGFzlKKXw
ZzZXLUDRAPWHQTmxjehAUyDvJqngpOnjtVFhDzPERrhlsKDNlrCecMdmYHxGsaenuoHwaNrlGqBDNDPgztCAFqKFTa
ZZzyanZINaMDFipgHqschlpzDjIiguGYeRPkMSHHhgKnRucrAurxZWTdihlfYt
zZzycXtEdYGXwjLnXqIOJBelLnwaVMCpwbLdRpqfZHQEwwQEmIHnxgwCVCAGAnwUVQAmlMuQEzBcaNLImCCumMX
zZZyPQXOdeaMoUjdXDNYDjsRomlpjNyoeUHpyoYyYxloSCbnXeqzBvBXzgUUmXSErgdTn
zZZyTXiTBhRLyLZtLVHIVqZkquYTncsnbztSkxeNTteHouBCHqkYUrJeLMvphCMB
ZZZYUhUIBdzPEDKfbdHakFUrczSEtzJpyEMdKePOoVCWtQGyduJtzietukAoVIyDFkvghymAIRljOkxqqw
zzzYZAgWBUtSZfUmlAWnFFRpDcGgwdOFndrhSJZcyMsgkvLnTOTRmUtwjKvNKJndDYagMFDBnZQduJzTADOTbIsyx
ZZzyzetRGgWITClgQOTutTCdCiuHMxPNPuxGIsUFqStMIuIRPqlWpHdzyckcajQMCgNTbYzilNuy
zzZzcsrZGTHbiajufJIMXwWdbKFncznEBmlSLCGJBKFTHoixDzzNgeXSRohyfJHAQreI
zZZzToOoFStspcNrVaEIfjsxTNFZWUIwVWqxrzppJBjnxhbFgXmoCjUQxitzjpsLsmkTJefTDsKUKDj
2.196520539519042 Seconds quicksort


running his code with my sort, i get this:
Creating string
Commence sort (array(sort, (L, nocase))
AAaAHZybMJfIBMDPTlGpjvUIKUAUarYYBhwoQzXDbBiAcpCJBdWibuPGKXCeSrAcOsQVouKGLUtXF
aAABoYraUmtdbUjgRDbugWslpjAUIKuszBiOYgbGgDwawBsyfUTodDuktDyLUCdtDSYqIFfwxoBlKDfbiuHQ
aaadeaEoIYWvHUzkSOtliXprvQLwLRmQtZnACEWprRsosQEBCExqPRkIFHyxbkWHovUZtDOKw
aAAdgYmOjPfEHlBAxbwixWqYTppVdpWGkQMBRNmCTHZfjqVHHsnYUNcnJtYEXkXrgjLLpz
AAAdXKZoSZITCgAlVxqloFzcjapuQHGGQBOElGPKCqLXaVMBIzaDjebMxglyt
AaAEdkSIREVLGLJZScnkXaOymKTFdJGPZUsOGaOTUHuCWijpmEdhUHMBxyeYzI
aAAeFlewdudpRIuowRSihDPHiAzTqtGfKRvbeuqAwufmfbvCeTVQYwpXwzukg
aaAelVOVdRieVichzLhUWafcKAoFjlAgwYvopUmYgCOfUVNlwdlxRzAZwKjkgxvBEqviohBHcPlhOFzxcRBYy
AaaEomtIuMRgVNHDwjZIWRZgiVgkfghJlhXWsSydLLcyAauhNdjESZelpVBoBySKhUyQHxApqjQxmrTaSSpGLUeXIx
AAAerWyYiOufKfPsPpmOSMGVvvwGzARxjwEZgqJdcNwDLzNNvXxIDHxadiJnxARObYNuILRflHAFjS
...
...
...
...
zZZxkXWYMdvqOvUurUXPYkofDtHOBSGmpqrcBRpBGbYSHVRtJvhjpryJpBCtvurFCiTcvZAGdGFzlKKXw
ZzZXLUDRAPWHQTmxjehAUyDvJqngpOnjtVFhDzPERrhlsKDNlrCecMdmYHxGsaenuoHwaNrlGqBDNDPgztCAFqKFTa
ZZzyanZINaMDFipgHqschlpzDjIiguGYeRPkMSHHhgKnRucrAurxZWTdihlfYt
zZzycXtEdYGXwjLnXqIOJBelLnwaVMCpwbLdRpqfZHQEwwQEmIHnxgwCVCAGAnwUVQAmlMuQEzBcaNLImCCumMX
zZZyPQXOdeaMoUjdXDNYDjsRomlpjNyoeUHpyoYyYxloSCbnXeqzBvBXzgUUmXSErgdTn
zZZyTXiTBhRLyLZtLVHIVqZkquYTncsnbztSkxeNTteHouBCHqkYUrJeLMvphCMB
ZZZYUhUIBdzPEDKfbdHakFUrczSEtzJpyEMdKePOoVCWtQGyduJtzietukAoVIyDFkvghymAIRljOkxqqw
zzzYZAgWBUtSZfUmlAWnFFRpDcGgwdOFndrhSJZcyMsgkvLnTOTRmUtwjKvNKJndDYagMFDBnZQduJzTADOTbIsyx
ZZzyzetRGgWITClgQOTutTCdCiuHMxPNPuxGIsUFqStMIuIRPqlWpHdzyckcajQMCgNTbYzilNuy
zzZzcsrZGTHbiajufJIMXwWdbKFncznEBmlSLCGJBKFTHoixDzzNgeXSRohyfJHAQreI
zZZzToOoFStspcNrVaEIfjsxTNFZWUIwVWqxrzppJBjnxhbFgXmoCjUQxitzjpsLsmkTJefTDsKUKDj
0.8116207793408421 Seconds quicksort
So it´s 2.19 vs 0.81 seconds. When i´m back home again, i will push the code and the binaries to my Github repository so you can download it and test yourself.


JK
dodicat
Posts: 7976
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Post by dodicat »

Yea OK JK.
Thanks for testing.
If I run my code with 64 bit -Wc -O3 I get
...
ZZzyzetRGgWITClgQOTutTCdCiuHMxPNPuxGIsUFqStMIuIRPqlWpHdzyckcajQMCgNTbYzilNuy
zzZzcsrZGTHbiajufJIMXwWdbKFncznEBmlSLCGJBKFTHoixDzzNgeXSRohyfJHAQreI
zZZzToOoFStspcNrVaEIfjsxTNFZWUIwVWqxrzppJBjnxhbFgXmoCjUQxitzjpsLsmkTJefTDsKUKDj
0.903839100006735 Seconds quicksort
(about the same for 32 bit optimized)
If I don't optimize I get >2 seconds.
I am still messing around with a macro + quicksort which at the moment I get about .88 seconds or so (using -O3 or -Ofast switch).

Normally in freebasic, when a speed test is under way quite a few members pitch in with their different ideas.
But the forum is so quiet these days, probably like yourself, they are all on holiday.
So enjoy the rest of yours.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

I took the liberty to "optimise" dodicat's code of Aug 11, 2019 15:31 (three posts above) by adding a switch that loads the string array from disk. The creation part is very slow, therefore it is convenient to create it once, then recall it from disk to speed up the testing:

Code: Select all

#include "Recall.bi"
#define fromdisk 1	' 1=create, 2=load
Sub create(L() As String)
  #if fromdisk=1
	Dim As Long fText=Freefile
	Open "SortTemp.txt" For Binary Access Write As #fText
	#define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
	#define q range(97,122)-Iif(Rnd>.5,32,0)
	Randomize 1
	For n As Long=Lbound(L) To Ubound(L)
		Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
		Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
		Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
		L(n)=Left(g1+g2+g3,60+Rnd*30)
		Print #fText, L(n)
	Next
	Close #fText
	Print Ubound(L); " strings created"
  #else
 	Print "Loading array - ";
	Print Recall("SortTemp.txt", L());" strings loaded from SortTemp.txt"
  #endif
End Sub
Here is the required Recall.bi - copy & save to disk in the same folder as the main source:

Code: Select all

' Recall.bi, 12 August 2019, jj2007
#include once "crt.bi"	' needed for memcpy
#ifndef maxCell
	#Define maxCell 100	' whatever you consider enough for a single cell
#endif
Dim shared retstr As string * maxCell+1	

Function Recall(fname As String, locArray() As String) As Integer
  Dim As Integer ct=0, cursize=1000   ' locArray is a local representation of a dynamic array
  Dim As Long f=Freefile, sBytes
  If Open(fname For Binary Access Read As #f) = 0 Then
	Dim as ubyte ptr pContent = Allocate(Lof(f)), CurPos=pContent, CrPos
	Dim as long flen=Lof(f)
	Get #f, 1, *pContent, Lof(f) 
	Close #f
	Do
		if ct=0 or ct>cursize then
			cursize+=cursize shr 1
			ReDim Preserve locArray(cursize)
		endif
		CrPos=strstr(CurPos, Chr(10))
		if CrPos=0 Then CrPos=pContent+flen:flen=0
		sBytes=CrPos-CurPos-1
		if sBytes>0 then
			locArray(ct)=Space(sBytes)
			memcpy(StrPtr(locArray(ct)), CurPos, sBytes)
		endif
		CurPos=CrPos+1
		ct+=1	' inc dword ptr [ebp-20]
	Loop until flen=0
	While ct>1 and len(trim(locArray(ct-1)))=0	' get rid of trailing empty strings
		ct-=1
	Wend
	ReDim Preserve locArray(ct-1)
	DeAllocate(pContent)
  Else
	Print "Error opening file"
  End If
  Return ct
End Function

Function Cell(row As integer, col As integer, locArray() As String) As string
  Dim As integer ct=0, ctTabs=0, posLeft=-1, posRight=0
  Dim As ubyte ptr pString
  Dim c As ubyte
  pString=StrPtr(locArray(row))
  if pString then
   Do
      c=pString[ct]
      if c=0 then
      	if ctTabs>=col then posRight=ct+1
      	Exit do
      endif
      if c=9 then
         ctTabs=ctTabs+1
         if col=0 then
            posLeft=0
            if ctTabs>col then
            	posRight=ct+1
            	Exit do
            endif
         else
            if posLeft=-1 and ctTabs>=col then
                posLeft=ct+1
            elseif ctTabs>col then
                posRight=ct+1
                Exit do
            endif
         endif
      endif
      ct=ct+1
   Loop
  endif
  if posRight=0 then
   retstr[0]=0
  else
	posRight-=posLeft
	if posRight>maxCell then posRight=maxCell
	memcpy(StrPtr(retstr), pString+posLeft, posRight)
	retstr[posRight-1]=0
  endif
  return retstr
end function
And here is Dodicat's full code with the modification. Compile & run once with #define fromdisk 1, then test the sort algos with #define fromdisk 2. I moved the limit (#strings) up, too, to make it more transparent:

Code: Select all

Dim As Long limit=1000000
#define fromdisk 1	' 1=create, 2=load

Dim Shared As Ubyte u(255)
#define lwr(s) iif(s<91 andalso s>64,s+32,s)
For n As Long=0 To 255
    u(n)=lwr(n)  'lookup
Next

Function lessthan(a As String,b As String,Lenb as long)  As Long
    static as long lena
    lena=cast(integer ptr,@a)[1]'=Len(a)
    For n as long =0 To Iif(lena<lenb,lena,lenb)-1
        If u(a[n]) < u(b[n]) Then  Return -1
        If u(a[n]) > u(b[n]) Then  Return 0
    Next
    Return 0
End Function

Function morethan(a As String,b As String,lenb as long)  As Long
    static as long lena
    lena=cast(integer ptr,@a)[1]'=Len(a)
    For n as long  =0 To Iif(lena<lenb,lena,lenb)-1
        If u(a[n]) > u(b[n]) Then Return -1
        If u(a[n]) < u(b[n]) Then Return 0
    Next
    Return 0
End Function

Sub sortup(array() As String,begin As Long,Finish As Long)
    static as string x
    var i=begin,j=finish
    x=(array(((I+J)\2)))
     var lenx=cast(integer ptr,@x)[1]'=len(X)  'get length here instead of in the loops
    While I <= J
        While lessthan(array(I),X,lenx):I+=1:Wend
        While morethan(array(J),X,lenx):J-=1:Wend
        If I<=J Then Swap array(I),array(J): I+=1:J-=1
    Wend
            If J >begin Then  sortup(array(),begin,J)
            If I <Finish Then sortup(array(),I,Finish)
        End Sub
       
Sub sortdown(array() As String,begin As Long,Finish As Long)
    static as string x
    var i=begin,j=finish
    x=(array(((I+J)\2)))
    var lenx=len(X)
    While I <= J
        While morethan(array(I),X,lenx):I+=1:Wend
        While lessthan(array(J),X,lenx):J-=1:Wend
            If I<=J Then Swap array(I),array(J): I+=1:J-=1
    Wend
            If J >begin Then  sortdown(array(),begin,J)
            If I <Finish Then sortdown(array(),I,Finish)
        End Sub

#include "Recall.bi"
Sub create(L() As String)
  #if fromdisk=1
	Dim As Long fText=Freefile
	Open "SortTemp.txt" For Binary Access Write As #fText
	#define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
	#define q range(97,122)-Iif(Rnd>.5,32,0)
	Randomize 1
	For n As Long=Lbound(L) To Ubound(L)
		Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
		Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
		Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
		L(n)=Left(g1+g2+g3,60+Rnd*30)
		Print #fText, L(n)
	Next
	Close #fText
	Print Ubound(L); " strings created"
  #else
 	Print "Loading array - ";
	Print Recall("SortTemp.txt", L());" strings loaded from SortTemp.txt"
  #endif
End Sub

Sub show(L() As String)
    For n As Long=Lbound(L) To 10
        Print L(n)
    Next
    For n As Long=1 To 4
        Print "..."
    Next
    For n As Long=Ubound(L)-10 To Ubound(L)
        Print L(n)
    Next
End Sub

Dim As Double t1,t2
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (crt)"
t1=Timer
sortup(L(),Lbound(L),Ubound(L))
t2=Timer
show(L())
Print t2-t1;"  Seconds quicksort"
Sleep
See here for an assembly equivalent, showing that the case-insensitive algo is roughly a factor 2.5 slower
Post Reply