StringArray Sort (case independent)
-
- Posts: 538
- Joined: Dec 02, 2011 22:51
- Location: France
Re: StringArray Sort (case independent)
I remain convinced of the relevance of lzle in this case (about 1000000 entries): it is a quasi standard feature of the tool. The reuse of the memory addresses makes the performances much less sensitive to the structure of the data to be sorted and the successive sorts and the tree is adapted to average volumetrics (as long as the memory is sufficient). The main disadvantage of use (compared to Qsort) is that it is (a little) heavy. In view of the "visual sort" topic, I believe that, hypothetically, a scientific level FB coded dedicated algo possibly using pointers could certainly make better (than the optimal lzle case) while retaining the scalar character. Question is how much better...(10% ? 40% ?) I think this problem a bit complicated.
Re: StringArray Sort (case independent)
And I remain equally convinced, acting on the maxim of:
"more code, more trouble, less speed"
that your tool is of no consequence, at least, as far as it concerns myself.
(Btw. if you don't stop your "advertising campaign", I'm going to report you!)
"more code, more trouble, less speed"
that your tool is of no consequence, at least, as far as it concerns myself.
(Btw. if you don't stop your "advertising campaign", I'm going to report you!)
Re: StringArray Sort (case independent)
Average length, random, ...?dodicat wrote:one million shortish strings
-
- Posts: 538
- Joined: Dec 02, 2011 22:51
- Location: France
Re: StringArray Sort (case independent)
In general, I'd say yes. But in the case of a sort, the problem is much more complex. Certainly, a code "bigger" will consume cache, but if the object of this code is precisely to limit cache consumption by the data, then this choice becomes profitable beyond a certain volume (not so high) . Similarly, the reuse of addresses will help address the problem of "processing" (limit the Redim, allocate and deallocate), and therefore also optimize the case of many small sorts. To give a more concrete example, something like this :"more code, more trouble, less speed"
MyList.HashKeyUnique(0)
MyList.HashTag(lCase($Key) : MyList.RwTag1($Key) => For each key
Then :
While MyList.KeyStep(Rev) : .. : Wend => String Sort
While MyList.nKeyStep(Rev) : .. : Wend => Numeric like Sort result
In the related topic :
LZLE could be useful to evaluate and validate the performance of codes you expect, whenever these codes seems not interest you every time.
Re: StringArray Sort (case independent)
Timings for a one million strings array with a total of 10MB:dodicat wrote:I did some speed tests.
The dos sort was by far the fastest (3.2 seconds for one million shortish strings)
Standard Quicksort was about eight seconds.
The C runtime was about ten seconds.
Code: Select all
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
0.57 seconds for sorting 1 Million strings
I can't believe that C is so slow. Can you tell more about your setup?The C runtime was about ten seconds.
-
- Posts: 538
- Joined: Dec 02, 2011 22:51
- Location: France
Re: StringArray Sort (case independent)
On my side : i5 2410m w64, battery mode. Lzle.bi (FB) : 2 sec / 3 sec depending on the dataset.
Re: StringArray Sort (case independent)
Example of DOS sort for a million lines of a mixed bag of text characters 60 to 90 digits long.
Code: Select all
Function StringSplit(s_in As String,chars As String,result() As String) As Long
Dim As Long ctr,ctr2,k,n,LC=Len(chars)
Dim As boolean tally(Len(s_in))
#macro check_instring()
n=0
While n<Lc
If chars[n]=s_in[k] Then
tally(k)=true
If (ctr2-1) Then ctr+=1
ctr2=0
Exit While
End If
n+=1
Wend
#endmacro
#macro split()
If tally(k) Then
If (ctr2-1) Then ctr+=1:result(ctr)=Mid(s_in,k+2-ctr2,ctr2-1)
ctr2=0
End If
#endmacro
'================== LOOP TWICE =======================
For k =0 To Len(s_in)-1
ctr2+=1:check_instring()
Next k
If ctr=0 Then
If Len(s_in) Andalso Instr(chars,Chr(s_in[0])) Then ctr=1
End If
If ctr Then Redim result(1 To ctr): ctr=0:ctr2=0 Else Return 0
For k =0 To Len(s_in)-1
ctr2+=1:split()
Next k
'===================== Last one ========================
If ctr2>0 Then
Redim Preserve result(1 To ctr+1)
result(ctr+1)=Mid(s_in,k+1-ctr2,ctr2)
End If
Return Ubound(result)
End Function
Sub DOSsort(s() As String,d As String="down")
Var f=Freefile
Var cd=Curdir
Dim As String cmd,text
Open "_tmp_.txt" For Output As #f
For n As Long=Lbound(s) To Ubound(s)
Print #f,s(n)
Next n
Close #f
If d="down" Then
cmd= "SORT/r " + cd + "\_tmp_.txt /o " +cd +"\_out_.txt"
Else
cmd= "SORT " + cd + "\_tmp_.txt /o " +cd +"\_out_.txt"
End If
Shell(cmd)
Open "_out_.txt" For Binary Access Read As #f
If Lof(f) > 0 Then
text = String(Lof(f), 0)
Get #f, , text
End If
Close #f
StringSplit(text,Chr(10),s())
Kill cd+"\_out_.txt"
Kill cd+"\_tmp_.txt"
print
cmd="if EXIST _out_.txt echo please delete _out_.txt manually"
shell(cmd)
cmd="if EXIST _tmp_.txt echo please delete _tmp_.txt manually"
shell(cmd)
print
End Sub
Sub create(L() As String)
#define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
#define q range(97,122)-Iif(Rnd>.5,32,0)
Randomize 1
For n As Long=Lbound(L) To Ubound(L)
Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
L(n)=Left(g1+g2+g3,60+Rnd*30)
Next
End Sub
Sub show(L() As String)
For n As Long=Lbound(L) To 10
Print L(n)
Next
For n As Long=1 To 4
Print "..."
Next
For n As Long=Ubound(L)-10 To Ubound(L)
Print L(n)
Next
End Sub
Dim As Double t1,t2
Dim As Long limit=1000000
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (DOS)"
t1=Timer
DOSsort(L(),"up")
t2=Timer
show(L())
Print t2-t1;" Seconds DOS"
Sleep
-
- Posts: 538
- Joined: Dec 02, 2011 22:51
- Location: France
Re: StringArray Sort (case independent)
2-3 sec with a len 10-20 on my side. Using len 60-90 could be 4 times slower or above. to be tested using different optimization options. whenever your solution is using disk cache or not, sounds interesting for optimization. I ll do a few tests also on a crypted disk. I ll do also some new tests on lzle, to re check stability and reliability. I need to better document good practices and bugs issues or unsupported features. I do not have my computer for the moment so I ll proceed later to the tests.
Re: StringArray Sort (case independent)
I get a weird error:dodicat wrote:Example of DOS sort for a million lines of a mixed bag of text characters 60 to 90 digits long.
It definitely exists, as a simple SORT from a DOS prompt shows. Having worked with Windows for a while, I tried something - and it worked:"'SORT' is not recognized as an internal or external command, operable program or batch file."
Code: Select all
cmd= "C:\Windows\System32\SORT.exe " + cd + "\_tmp_.txt /o " +cd +"\_out_.txt"
Sorting the same array with QSort() takes less than 1.5 seconds, so there is some room for improvement ;-)
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: StringArray Sort (case independent)
7.455707071739653 Seconds DOS
1.974337754966882 Seconds array(sort, L)
1.97 vs 1.5 (Assembler based) is not that bad, i think. I´m almost ready for a release of new array features discussed here (viewtopic.php?f=17&t=27606)
JK
1.974337754966882 Seconds array(sort, L)
1.97 vs 1.5 (Assembler based) is not that bad, i think. I´m almost ready for a release of new array features discussed here (viewtopic.php?f=17&t=27606)
JK
Re: StringArray Sort (case independent)
The required sort (From Mr Swiss's point of view) changes the string to lcase (or ucase) ,compares, then swaps the original array elements.
The DOS sort does this automatically.
The crt sort is case dependent Unless the callback function changes the case.
But this slows it down to a snails' pace.
Choose either line 6 or line 7 by commenting out to see the speed difference between case sensitive and case insensitive.
I have chosen only 10000 elements, for a million elements, changing case, 229 seconds, not changing case 1.59 seconds
The DOS sort does this automatically.
The crt sort is case dependent Unless the callback function changes the case.
But this slows it down to a snails' pace.
Choose either line 6 or line 7 by commenting out to see the speed difference between case sensitive and case insensitive.
I have chosen only 10000 elements, for a million elements, changing case, 229 seconds, not changing case 1.59 seconds
Code: Select all
#include "crt.bi"
#define Lc lcase
'#define Lc
Function callbackU Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
If Lc(*Cptr(String Ptr,n1)) < Lc(*Cptr(String Ptr,n2)) Then Return -1
If Lc(*Cptr(String Ptr,n1)) > Lc(*Cptr(String Ptr,n2)) Then Return 1
Return 0
End Function
Function callbackD Cdecl(n1 As Any Ptr,n2 As Any Ptr) As Long
If Lc(*Cptr(String Ptr,n1)) > Lc(*Cptr(String Ptr,n2)) Then Return -1
If Lc(*Cptr(String Ptr,n1)) < Lc(*Cptr(String Ptr,n2)) Then Return 1
Return 0
End Function
Sub sortstring(s() As String,L As Long,U As Long,direction As String="up")
If Lcase(direction)="up" Then
qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackU)
Else
qsort( @s((L)),((U)-(L)+1),Sizeof(s),@callbackD)
End If
End Sub
Sub create(L() As String)
#define range(f,l) Int(Rnd*(((l)+1)-(f))+(f))
#define q range(97,122)-Iif(Rnd>.5,32,0)
Randomize 1
For n As Long=Lbound(L) To Ubound(L)
Dim As String g1=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
Dim As String g2=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
Dim As String g3=Chr(q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q,q)
L(n)=Left(g1+g2+g3,60+Rnd*30)
Next
End Sub
Sub show(L() As String)
For n As Long=Lbound(L) To 10
Print L(n)
Next
For n As Long=1 To 4
Print "..."
Next
For n As Long=Ubound(L)-10 To Ubound(L)
Print L(n)
Next
End Sub
Dim As Double t1,t2
Dim As Long limit=10000'00
Dim As String L(1 To limit)
Print "Creating string"
create(L())
Print "Commence sort (crt)"
t1=Timer
sortstring(L(),lbound(L),ubound(L),"up")
t2=Timer
show(L())
Print t2-t1;" Seconds crt"
Sleep
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: StringArray Sort (case independent)
I must admit, that i missed this fact - so there is indeed some room for improvement!The DOS sort does this automatically
JK
Re: StringArray Sort (case independent)
There is indeed quite a difference between case-insensitive and case-sensitive sorting, more than a factor 2:
As mentioned above:
Code: Select all
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
sorting 1000000 lines case-insensitive took 1441 ms
AAaAHZybMJfIBMDPTlGpjvUIKUAUarYYBhwoQzXDbBiAcpCJBdWibuPGKXCeSrAcOsQVouKGLUtXF
aAABoYraUmtdbUjgRDbugWslpjAUIKuszBiOYgbGgDwawBsyfUTodDuktDyLUCdtDSYqIFfwxoBlKDfbiuHQ
aaadeaEoIYWvHUzkSOtliXprvQLwLRmQtZnACEWprRsosQEBCExqPRkIFHyxbkWHovUZtDOKw
...
ZZzyzetRGgWITClgQOTutTCdCiuHMxPNPuxGIsUFqStMIuIRPqlWpHdzyckcajQMCgNTbYzilNuy
zzZzcsrZGTHbiajufJIMXwWdbKFncznEBmlSLCGJBKFTHoixDzzNgeXSRohyfJHAQreI
zZZzToOoFStspcNrVaEIfjsxTNFZWUIwVWqxrzppJBjnxhbFgXmoCjUQxitzjpsLsmkTJefTDsKUKDj
sorting 1000000 lines case-sensitive took 584 ms
AAAJkbfOkQBObDHcbTIwHpqfZhoeEAOdznaOpVJqBwanparynNfGQbaMfZwqeWVzuereGocrnKWbAx
AAAKDBqCLIvtYhNOwRHDkUdYhtCnqVlAtRxBIyOmZtyRTJmmduoLcogeGKDYzS
AAANyurgtXuGMSDjUiphJpjDEIXsqWblivOefMURODFFRLEmyNHboCtnRgrrGsTpQXBZ
...
zzzMNTYBXQesyiSPhDSnEMllmqJyPOetjQASwTjPPdMGALxaTsXIBwpiDnRqaG
zzzWKRnlAPdJrfgIgiHjbJTvoMgrRnxVAukqhhjMqshOgVAuaGceqItZADoDlNBtykFZjwKyKuMhTJeu
zzzYZAgWBUtSZfUmlAWnFFRpDcGgwdOFndrhSJZcyMsgkvLnTOTRmUtwjKvNKJndDYagMFDBnZQduJzTADOTbIsyx
jj2007 wrote:Converting all strings to lcase or ucase strings is an overkill, and probably very slow. It would be sufficient to use a string compare function that is case-insensitive. If there is interest, I can offer a 32-bit DLL using StringsDiffer().
Re: StringArray Sort (case independent)
on Windows, perhaps use stricmp (not sure if it's available in plain C)
-
- Posts: 284
- Joined: Mar 07, 2018 13:59
- Location: Germany
Re: StringArray Sort (case independent)
I can´t believe it! Using a dedicated comparison function i can get it down to 0.6 seconds (CASE INSENSITIVE)
JK
Code: Select all
static unsigned char low1[256] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 91, 92, 93, 94, 95,
96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,
128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 154, 139, 156, 141, 158, 143,
144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 255,
160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
240, 241, 242, 243, 244, 245, 246, 215, 248, 249, 250, 251, 252, 253, 254, 223,
224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255 };
static unsigned char low2[256] = {
65, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 91, 92, 93, 94, 95,
96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,
128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 154, 139, 156, 141, 158, 143,
144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 255,
160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
240, 241, 242, 243, 244, 245, 246, 215, 248, 249, 250, 251, 252, 253, 254, 223,
224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255 };
int strcmp_ab(char const *a, char const *b) // if a < b then return 1
{
while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
a++;
b++;
}
// either strings differ or null character detected.
// perform comparison using same(!) table.
if (low1[(unsigned char)*a] < low1[(unsigned char)*b])
{
return 1;
}
return 0;
}
int strcmp_ba(char const *a, char const *b) // if b < a then return 1
{
while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
a++;
b++;
}
// either strings differ or null character detected.
// perform comparison using same(!) table.
if (low1[(unsigned char)*b] < low1[(unsigned char)*a])
{
return 1;
}
return 0;
}
Creating string
Commence sort Array(sort, ...)
AAaAHZybMJfIBMDPTlGpjvUIKUAUarYYBhwoQzXDbBiAcpCJBdWibuPGKXCeSrAcOsQVouKGLUtXF
aAABoYraUmtdbUjgRDbugWslpjAUIKuszBiOYgbGgDwawBsyfUTodDuktDyLUCdtDSYqIFfwxoBlKDfbiuHQ
aAAdgYmOjPfEHlBAxbwixWqYTppVdpWGkQMBRNmCTHZfjqVHHsnYUNcnJtYEXkXrgjLLpz
AAAdXKZoSZITCgAlVxqloFzcjapuQHGGQBOElGPKCqLXaVMBIzaDjebMxglyt
aaadeaEoIYWvHUzkSOtliXprvQLwLRmQtZnACEWprRsosQEBCExqPRkIFHyxbkWHovUZtDOKw
aAAeFlewdudpRIuowRSihDPHiAzTqtGfKRvbeuqAwufmfbvCeTVQYwpXwzukg
aaAelVOVdRieVichzLhUWafcKAoFjlAgwYvopUmYgCOfUVNlwdlxRzAZwKjkgxvBEqviohBHcPlhOFzxcRBYy
AaAEdkSIREVLGLJZScnkXaOymKTFdJGPZUsOGaOTUHuCWijpmEdhUHMBxyeYzI
aaAFkJClunAIaLvmnsgvDSXUFdemXprnVJSXuCNnLrnmfBhCaFRyctADnBbGMefQMt
aaaHDPAiCmmrRjePtckJsWQLINgkFSgbEqUGGTdrnrtSDmAjQQGGowwaSaghyLuEM
...
...
...
...
ZZZuFgFflggiJMoVJGWkcOTHpbShTRzFCFltRRAljToebpZclncAmwXYZsQKszHSCdnx
zZzVrAvYyKkUVmUFkaRgJSivGzntmypIDLfQvsuekrwIXoQwIemjjkRkdOpgbpllWwIvnmjwgEdj
zZZxkXWYMdvqOvUurUXPYkofDtHOBSGmpqrcBRpBGbYSHVRtJvhjpryJpBCtvurFCiTcvZAGdGFzlKKXw
zZZyPQXOdeaMoUjdXDNYDjsRomlpjNyoeUHpyoYyYxloSCbnXeqzBvBXzgUUmXSErgdTn
zZZyTXiTBhRLyLZtLVHIVqZkquYTncsnbztSkxeNTteHouBCHqkYUrJeLMvphCMB
zZZwYFjQIpJFGsIFvjZtYIgJViAyOOmvbLEegDhtrAHlyQlMqotYUmxUeJNIPRoNopMfJDdCKZEhfUPdIXupqRuO
ZzZXLUDRAPWHQTmxjehAUyDvJqngpOnjtVFhDzPERrhlsKDNlrCecMdmYHxGsaenuoHwaNrlGqBDNDPgztCAFqKFTa
ZZzyanZINaMDFipgHqschlpzDjIiguGYeRPkMSHHhgKnRucrAurxZWTdihlfYt
zZzycXtEdYGXwjLnXqIOJBelLnwaVMCpwbLdRpqfZHQEwwQEmIHnxgwCVCAGAnwUVQAmlMuQEzBcaNLImCCumMX
zzZzcsrZGTHbiajufJIMXwWdbKFncznEBmlSLCGJBKFTHoixDzzNgeXSRohyfJHAQreI
zZZzToOoFStspcNrVaEIfjsxTNFZWUIwVWqxrzppJBjnxhbFgXmoCjUQxitzjpsLsmkTJefTDsKUKDj
0.5588046405318892 Seconds Array(sort, ...)
JK