StringArray Sort (case independent)

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: StringArray Sort (case independent)

Post by Lost Zergling »

I remain convinced of the relevance of lzle in this case (about 1000000 entries): it is a quasi standard feature of the tool. The reuse of the memory addresses makes the performances much less sensitive to the structure of the data to be sorted and the successive sorts and the tree is adapted to average volumetrics (as long as the memory is sufficient). The main disadvantage of use (compared to Qsort) is that it is (a little) heavy. In view of the "visual sort" topic, I believe that, hypothetically, a scientific level FB coded dedicated algo possibly using pointers could certainly make better (than the optimal lzle case) while retaining the scalar character. Question is how much better...(10% ? 40% ?) I think this problem a bit complicated.
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: StringArray Sort (case independent)

Post by MrSwiss »

And I remain equally convinced, acting on the maxim of:

"more code, more trouble, less speed"

that your tool is of no consequence, at least, as far as it concerns myself.
(Btw. if you don't stop your "advertising campaign", I'm going to report you!)
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

dodicat wrote:one million shortish strings
Average length, random, ...?
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: StringArray Sort (case independent)

Post by Lost Zergling »

"more code, more trouble, less speed"
In general, I'd say yes. But in the case of a sort, the problem is much more complex. Certainly, a code "bigger" will consume cache, but if the object of this code is precisely to limit cache consumption by the data, then this choice becomes profitable beyond a certain volume (not so high) . Similarly, the reuse of addresses will help address the problem of "processing" (limit the Redim, allocate and deallocate), and therefore also optimize the case of many small sorts. To give a more concrete example, something like this :
MyList.HashKeyUnique(0)
MyList.HashTag(lCase($Key) : MyList.RwTag1($Key) => For each key
Then :
While MyList.KeyStep(Rev) : .. : Wend => String Sort
While MyList.nKeyStep(Rev) : .. : Wend => Numeric like Sort result
In the related topic :
LZLE could be useful to evaluate and validate the performance of codes you expect, whenever these codes seems not interest you every time.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

dodicat wrote:I did some speed tests.
The dos sort was by far the fastest (3.2 seconds for one million shortish strings)
Standard Quicksort was about eight seconds.
The C runtime was about ten seconds.
Timings for a one million strings array with a total of 10MB:

Code: Select all

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
0.57 seconds for sorting 1 Million strings
The C runtime was about ten seconds.
I can't believe that C is so slow. Can you tell more about your setup?
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: StringArray Sort (case independent)

Post by Lost Zergling »

On my side : i5 2410m w64, battery mode. Lzle.bi (FB) : 2 sec / 3 sec depending on the dataset.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Post by dodicat »

Example of DOS sort for a million lines of a mixed bag of text characters 60 to 90 digits long.

Code: Select all

 
Function StringSplit(s_in As String,chars As String,result() As String) As Long
    Dim As Long ctr,ctr2,k,n,LC=Len(chars)
    Dim As boolean tally(Len(s_in))
    #macro check_instring()
    n=0
    While n<Lc
        If chars[n]=s_in[k] Then 
            tally(k)=true
            If (ctr2-1) Then ctr+=1
            ctr2=0
            Exit While
        End If
        n+=1
    Wend
    #endmacro
    
    #macro split()
    If tally(k) Then
        If (ctr2-1) Then ctr+=1:result(ctr)=Mid(s_in,k+2-ctr2,ctr2-1)
        ctr2=0
    End If
    #endmacro
    '==================  LOOP TWICE =======================
    For k  =0 To Len(s_in)-1
        ctr2+=1:check_instring()
    Next k
    If ctr=0 Then
        If Len(s_in) Andalso Instr(chars,Chr(s_in[0])) Then ctr=1
    End If
    If ctr Then Redim result(1 To ctr): ctr=0:ctr2=0 Else  Return 0
    For k  =0 To Len(s_in)-1
        ctr2+=1:split()
    Next k
    '===================== Last one ========================
    If ctr2>0 Then
        Redim Preserve result(1 To ctr+1)
        result(ctr+1)=Mid(s_in,k+1-ctr2,ctr2)
    End If
    Return Ubound(result)
End Function

Sub DOSsort(s() As String,d As String="down")
    Var f=Freefile
    Var cd=Curdir
    Dim As String cmd,text
    Open "_tmp_.txt" For Output As #f
    For n As Long=Lbound(s) To Ubound(s)
        Print #f,s(n)
    Next n
    Close #f
    If d="down" Then
        cmd= "SORT/r " + cd + "\_tmp_.txt /o " +cd +"\_out_.txt" 
    Else
        cmd= "SORT " + cd + "\_tmp_.txt /o " +cd +"\_out_.txt" 
    End If
    Shell(cmd)
    Open "_out_.txt" For Binary Access Read As #f
    If Lof(f) > 0 Then
        text = String(Lof(f), 0)
        Get #f, , text
    End If
    Close #f
    StringSplit(text,Chr(10),s())
    Kill cd+"\_out_.txt"
    Kill cd+"\_tmp_.txt"
    print
    cmd="if EXIST _out_.txt echo please delete _out_.txt manually"
    shell(cmd)
    cmd="if EXIST _tmp_.txt echo please delete _tmp_.txt manually"
    shell(cmd)
    print
End Sub

Sub create(L() As String)
    #define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
    #define q range(97,122)-Iif(Rnd>.5,32,0)
    Randomize 1
    For n As Long=Lbound(L) To Ubound(L)
        Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        L(n)=Left(g1+g2+g3,60+Rnd*30)
    Next
End Sub

Sub show(L() As String)
    For n As Long=Lbound(L) To 10
        Print L(n)
    Next
    For n As Long=1 To 4
        Print "..."
    Next
    For n As Long=Ubound(L)-10 To Ubound(L)
        Print L(n)
    Next
End Sub

Dim As Double t1,t2
Dim As Long limit=1000000
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (DOS)"
t1=Timer
DOSsort(L(),"up")
t2=Timer
show(L())
Print t2-t1;"  Seconds DOS"
Sleep  
Lost Zergling
Posts: 538
Joined: Dec 02, 2011 22:51
Location: France

Re: StringArray Sort (case independent)

Post by Lost Zergling »

2-3 sec with a len 10-20 on my side. Using len 60-90 could be 4 times slower or above. to be tested using different optimization options. whenever your solution is using disk cache or not, sounds interesting for optimization. I ll do a few tests also on a crypted disk. I ll do also some new tests on lzle, to re check stability and reliability. I need to better document good practices and bugs issues or unsupported features. I do not have my computer for the moment so I ll proceed later to the tests.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

dodicat wrote:Example of DOS sort for a million lines of a mixed bag of text characters 60 to 90 digits long.
I get a weird error:
"'SORT' is not recognized as an internal or external command, operable program or batch file."
It definitely exists, as a simple SORT from a DOS prompt shows. Having worked with Windows for a while, I tried something - and it worked:

Code: Select all

        cmd= "C:\Windows\System32\SORT.exe " + cd + "\_tmp_.txt /o " +cd +"\_out_.txt" 
8.297454716230277 Seconds DOS

Sorting the same array with QSort() takes less than 1.5 seconds, so there is some room for improvement ;-)
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: StringArray Sort (case independent)

Post by Juergen Kuehlwein »

7.455707071739653 Seconds DOS

1.974337754966882 Seconds array(sort, L)

1.97 vs 1.5 (Assembler based) is not that bad, i think. I´m almost ready for a release of new array features discussed here (viewtopic.php?f=17&t=27606)


JK
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: StringArray Sort (case independent)

Post by dodicat »

The required sort (From Mr Swiss's point of view) changes the string to lcase (or ucase) ,compares, then swaps the original array elements.
The DOS sort does this automatically.
The crt sort is case dependent Unless the callback function changes the case.
But this slows it down to a snails' pace.
Choose either line 6 or line 7 by commenting out to see the speed difference between case sensitive and case insensitive.
I have chosen only 10000 elements, for a million elements, changing case, 229 seconds, not changing case 1.59 seconds

Code: Select all

 


#include "crt.bi"

#define Lc lcase
'#define Lc

Function callbackU Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
    If Lc(*Cptr(String Ptr,n1)) < Lc(*Cptr(String Ptr,n2)) Then Return -1
    If Lc(*Cptr(String Ptr,n1)) > Lc(*Cptr(String Ptr,n2)) Then Return 1
    Return 0
End Function

Function callbackD Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
    If Lc(*Cptr(String Ptr,n1)) > Lc(*Cptr(String Ptr,n2)) Then Return -1
    If Lc(*Cptr(String Ptr,n1)) < Lc(*Cptr(String Ptr,n2)) Then Return 1
    Return 0
End Function

Sub sortstring(s() As String,L As Long,U As Long,direction As String="up")
    If Lcase(direction)="up" Then
        qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackU)
    Else
        qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackD)
    End If
End Sub


Sub create(L() As String)
    #define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
    #define q range(97,122)-Iif(Rnd>.5,32,0)
    Randomize 1
    For n As Long=Lbound(L) To Ubound(L)
        Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
        L(n)=Left(g1+g2+g3,60+Rnd*30)
    Next
End Sub

Sub show(L() As String)
    For n As Long=Lbound(L) To 10
        Print L(n)
    Next
    For n As Long=1 To 4
        Print "..."
    Next
    For n As Long=Ubound(L)-10 To Ubound(L)
        Print L(n)
    Next
End Sub

Dim As Double t1,t2
Dim As Long limit=10000'00
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (crt)"
t1=Timer
sortstring(L(),lbound(L),ubound(L),"up")
t2=Timer
show(L())
Print t2-t1;"  Seconds crt"
Sleep  
 
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: StringArray Sort (case independent)

Post by Juergen Kuehlwein »

The DOS sort does this automatically
I must admit, that i missed this fact - so there is indeed some room for improvement!


JK
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: StringArray Sort (case independent)

Post by jj2007 »

There is indeed quite a difference between case-insensitive and case-sensitive sorting, more than a factor 2:

Code: Select all

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

sorting 1000000 lines case-insensitive took 1441 ms
AAaAHZybMJfIBMDPTlGpjvUIKUAUarYYBhwoQzXDbBiAcpCJBdWibuPGKXCeSrAcOsQVouKGLUtXF
aAABoYraUmtdbUjgRDbugWslpjAUIKuszBiOYgbGgDwawBsyfUTodDuktDyLUCdtDSYqIFfwxoBlKDfbiuHQ
aaadeaEoIYWvHUzkSOtliXprvQLwLRmQtZnACEWprRsosQEBCExqPRkIFHyxbkWHovUZtDOKw
...
ZZzyzetRGgWITClgQOTutTCdCiuHMxPNPuxGIsUFqStMIuIRPqlWpHdzyckcajQMCgNTbYzilNuy
zzZzcsrZGTHbiajufJIMXwWdbKFncznEBmlSLCGJBKFTHoixDzzNgeXSRohyfJHAQreI
zZZzToOoFStspcNrVaEIfjsxTNFZWUIwVWqxrzppJBjnxhbFgXmoCjUQxitzjpsLsmkTJefTDsKUKDj

sorting 1000000 lines case-sensitive took 584 ms
AAAJkbfOkQBObDHcbTIwHpqfZhoeEAOdznaOpVJqBwanparynNfGQbaMfZwqeWVzuereGocrnKWbAx
AAAKDBqCLIvtYhNOwRHDkUdYhtCnqVlAtRxBIyOmZtyRTJmmduoLcogeGKDYzS
AAANyurgtXuGMSDjUiphJpjDEIXsqWblivOefMURODFFRLEmyNHboCtnRgrrGsTpQXBZ
...
zzzMNTYBXQesyiSPhDSnEMllmqJyPOetjQASwTjPPdMGALxaTsXIBwpiDnRqaG
zzzWKRnlAPdJrfgIgiHjbJTvoMgrRnxVAukqhhjMqshOgVAuaGceqItZADoDlNBtykFZjwKyKuMhTJeu
zzzYZAgWBUtSZfUmlAWnFFRpDcGgwdOFndrhSJZcyMsgkvLnTOTRmUtwjKvNKJndDYagMFDBnZQduJzTADOTbIsyx
As mentioned above:
jj2007 wrote:Converting all strings to lcase or ucase strings is an overkill, and probably very slow. It would be sufficient to use a string compare function that is case-insensitive. If there is interest, I can offer a 32-bit DLL using StringsDiffer().
srvaldez
Posts: 3379
Joined: Sep 25, 2005 21:54

Re: StringArray Sort (case independent)

Post by srvaldez »

on Windows, perhaps use stricmp (not sure if it's available in plain C)
Juergen Kuehlwein
Posts: 284
Joined: Mar 07, 2018 13:59
Location: Germany

Re: StringArray Sort (case independent)

Post by Juergen Kuehlwein »

I can´t believe it! Using a dedicated comparison function i can get it down to 0.6 seconds (CASE INSENSITIVE)

Code: Select all

static unsigned char low1[256] = {
 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
 64, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 91, 92, 93, 94, 95,
 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,
 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 154, 139, 156, 141, 158, 143,
 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 255,
 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
 240, 241, 242, 243, 244, 245, 246, 215, 248, 249, 250, 251, 252, 253, 254, 223,
 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255 };

static unsigned char low2[256] = {
 65, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
 64, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 91, 92, 93, 94, 95,
 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,
 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 154, 139, 156, 141, 158, 143,
 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 255,
 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
 240, 241, 242, 243, 244, 245, 246, 215, 248, 249, 250, 251, 252, 253, 254, 223,
 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255 };


int strcmp_ab(char const *a, char const *b)           // if a < b then return 1
{                        
    while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
      a++;
      b++;
    }
    // either strings differ or null character detected.
    // perform comparison using same(!) table.


    if (low1[(unsigned char)*a] < low1[(unsigned char)*b])    
    {
        return 1;
    }
    return 0;    
}


int strcmp_ba(char const *a, char const *b)           // if b < a then return 1
{                        
    while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
      a++;
      b++;
    }
    // either strings differ or null character detected.
    // perform comparison using same(!) table.


    if (low1[(unsigned char)*b] < low1[(unsigned char)*a])    
    {
        return 1;
    }
    return 0;    
}

Creating string
Commence sort Array(sort, ...)
AAaAHZybMJfIBMDPTlGpjvUIKUAUarYYBhwoQzXDbBiAcpCJBdWibuPGKXCeSrAcOsQVouKGLUtXF
aAABoYraUmtdbUjgRDbugWslpjAUIKuszBiOYgbGgDwawBsyfUTodDuktDyLUCdtDSYqIFfwxoBlKDfbiuHQ
aAAdgYmOjPfEHlBAxbwixWqYTppVdpWGkQMBRNmCTHZfjqVHHsnYUNcnJtYEXkXrgjLLpz
AAAdXKZoSZITCgAlVxqloFzcjapuQHGGQBOElGPKCqLXaVMBIzaDjebMxglyt
aaadeaEoIYWvHUzkSOtliXprvQLwLRmQtZnACEWprRsosQEBCExqPRkIFHyxbkWHovUZtDOKw
aAAeFlewdudpRIuowRSihDPHiAzTqtGfKRvbeuqAwufmfbvCeTVQYwpXwzukg
aaAelVOVdRieVichzLhUWafcKAoFjlAgwYvopUmYgCOfUVNlwdlxRzAZwKjkgxvBEqviohBHcPlhOFzxcRBYy
AaAEdkSIREVLGLJZScnkXaOymKTFdJGPZUsOGaOTUHuCWijpmEdhUHMBxyeYzI
aaAFkJClunAIaLvmnsgvDSXUFdemXprnVJSXuCNnLrnmfBhCaFRyctADnBbGMefQMt
aaaHDPAiCmmrRjePtckJsWQLINgkFSgbEqUGGTdrnrtSDmAjQQGGowwaSaghyLuEM
...
...
...
...
ZZZuFgFflggiJMoVJGWkcOTHpbShTRzFCFltRRAljToebpZclncAmwXYZsQKszHSCdnx
zZzVrAvYyKkUVmUFkaRgJSivGzntmypIDLfQvsuekrwIXoQwIemjjkRkdOpgbpllWwIvnmjwgEdj
zZZxkXWYMdvqOvUurUXPYkofDtHOBSGmpqrcBRpBGbYSHVRtJvhjpryJpBCtvurFCiTcvZAGdGFzlKKXw
zZZyPQXOdeaMoUjdXDNYDjsRomlpjNyoeUHpyoYyYxloSCbnXeqzBvBXzgUUmXSErgdTn
zZZyTXiTBhRLyLZtLVHIVqZkquYTncsnbztSkxeNTteHouBCHqkYUrJeLMvphCMB
zZZwYFjQIpJFGsIFvjZtYIgJViAyOOmvbLEegDhtrAHlyQlMqotYUmxUeJNIPRoNopMfJDdCKZEhfUPdIXupqRuO
ZzZXLUDRAPWHQTmxjehAUyDvJqngpOnjtVFhDzPERrhlsKDNlrCecMdmYHxGsaenuoHwaNrlGqBDNDPgztCAFqKFTa
ZZzyanZINaMDFipgHqschlpzDjIiguGYeRPkMSHHhgKnRucrAurxZWTdihlfYt
zZzycXtEdYGXwjLnXqIOJBelLnwaVMCpwbLdRpqfZHQEwwQEmIHnxgwCVCAGAnwUVQAmlMuQEzBcaNLImCCumMX
zzZzcsrZGTHbiajufJIMXwWdbKFncznEBmlSLCGJBKFTHoixDzzNgeXSRohyfJHAQreI
zZZzToOoFStspcNrVaEIfjsxTNFZWUIwVWqxrzppJBjnxhbFgXmoCjUQxitzjpsLsmkTJefTDsKUKDj
0.5588046405318892 Seconds Array(sort, ...)

JK
Post Reply