Thread status polling - safe?

General FreeBASIC programming questions.
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

My results with my PC Windows10 (AMD Ryzen 5 3500U):
(with original code)

Code: Select all

32-bit gas -exx
Threads     Test1   Test2
1           16.1    16.2
2           22.0    22.3
3           32.1    31.8
4           46.7    44.9
5           62.8    64.2
6           82.5    82.8
  • The maximum improvement factor is obtained for 3 threads running in parallel: +34%.

Code: Select all

64-bit gcc
Threads     Test1   Test2
1           11.5    11.3
2           17.5    16.8
3           26.6    25.0
4           41.0    38.0
5           58.9    58.7
6           77.1    77.4
  • The maximum improvement factor is obtained for 2 or 3 threads running in parallel: +25%.
badidea
Posts: 2629
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Thread status polling - safe?

Post by badidea »

fxm wrote: Jan 23, 2025 7:48 I modified the thread 'wordmap_type.recursiveSmart0()' so that it works internally on a copy of the instance passed (copy at the beginning, and copy back at the end) like this (so we lose the monitoring of the progress):
...code...
That would not help much, recursiveSmart0() is only called 170 times (in this case), recursiveSmart1() is the actual recursive function which is called millions of times.

The recursive depth is limited to the number of words to fit in the grid. Which is practically limited to 40 due to computation times. So I could change some array to fixed size. I did have a non-recursive solver at some moment, but it was ugly and and bit slower.
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

badidea wrote: Jan 23, 2025 21:52
fxm wrote: Jan 23, 2025 7:48 I modified the thread 'wordmap_type.recursiveSmart0()' so that it works internally on a copy of the instance passed (copy at the beginning, and copy back at the end) like this (so we lose the monitoring of the progress):
...code...
That would not help much, recursiveSmart0() is only called 170 times (in this case), recursiveSmart1() is the actual recursive function which is called millions of times.
But with my modification above, 'recursiveSmart0()' calls 'recursiveSmart1()' on the same instance which is in the local memory ('wt.recursiveSmart1(depth + 1)') like also the other methods ('wt.copyToMapSave(wp, word, erasedStr)', 'wt.copyToMap(wp, erasedStr)').
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

badidea wrote: Jan 23, 2025 21:52 The recursive depth is limited to the number of words to fit in the grid. Which is practically limited to 40 due to computation times. So I could change some array to fixed size. I did have a non-recursive solver at some moment, but it was ugly and and bit slower.
You can also try to change var-len strings (in types or local) into fix-len zstrings.

Note:
A fix-len 'String * N' can not be passed to a procedure as a fix-len string (a var-len string is used internally).
On the contrary, a fix-len 'Zstring * (N+1)' can be passed to a procedure by pointer (Byval As Zstring Ptr) or by reference (Byref Zstring).
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

fxm wrote: Jan 23, 2025 7:48 I modified the thread 'wordmap_type.recursiveSmart0()' so that it works internally on a copy of the instance passed (copy at the beginning, and copy back at the end) like this (so we lose the monitoring of the progress):
.....
But I did not see any improvement.

This is explained because simple numeric variables are well in local memory, but for all the pseudo-objects like var-len strings and dynamic arrays, only the descriptors are in local memory but not the data itself which is always on the heap (only fixed length data can be put in local memory).

From the original code, I just replaced all var-len strings by fix-len zstrings (in all types and in all procedure bodies):
- declarations of variables : 'Dim As String' replaced by 'Dim As Zstring * 256' everywhere
- declarations of parameters : 'As String' replaced by 'As Zstring' everywhere

I got the improved multi-threading performance as follows:

Code: Select all

32-bit gas -exx
Threads    original code with string    modified code with:
                                        zstring * 256      
1           16.0                        15.8          
2           22.5 / 2 = 11.2             17.2 / 2 = 8.6   
3           31.4 / 3 = 10.5 <- best     18.5 / 3 = 6.2   
4           44.6 / 4 = 11.2             21.4 / 4 = 5.3
5           59.5 / 5 = 11.9             24.8 / 5 = 5.0        
6           78.7 / 6 = 13.1             28.7 / 6 = 4.8 <- best
7          100.0 / 7 = 14.3             34.2 / 7 = 4.9        
8          124.1 / 8 = 15.8             39.1 / 8 = 4.9        

Code: Select all

64-bit gcc
Threads    original code with string    modified code with:
                                        zstring * 256      
1           11.1                        10.1               
2           15.2 / 2 =  7.6 <- best     10.9 / 2 = 5.4     
3           25.3 / 3 =  8.4             12.5 / 3 = 4.2     
4           38.3 / 4 =  9.6             15.1 / 4 = 3.8        
5           45.9 / 5 =  9.2             17.6 / 5 = 3.5 <- best
6           71.4 / 6 = 11.8             22.4 / 6 = 3.7        
7           91.6 / 7 = 13.1             26.6 / 7 = 3.8        
8          120.0 / 8 = 15.0             33.2 / 8 = 4.1       
I confirm that there is no additional gain in working on an instance copied in local memory than in working directly on the instance in shared memory (as tested previously).

I now believe that the remaining performance gap from the maximum achievable performance of multi-threading is mainly due to the use of dynamic (var-len) arrays.
badidea
Posts: 2629
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Thread status polling - safe?

Post by badidea »

That looks good. Some extra locking going on in normal fb strings for thread safety?
I tried a few unsuccessfully things. Among others my non-recursive solver which was performing dramatically bad with multiple threads.
It is a pity that we have a low level language but still limited control over these cores and their memory.
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

- When defining var-len strings, the string data are always put in the heap (only the descriptor is put in the local scope memory).
- Except for the dynamic arrays of strings and static strings, when defining fix-len strings, the string data are always put in the local scope memory (there is no descriptors).

So except the fix-len string variables defined directly or indirectly at main code level (which are therefore in the shared memory), the others (defined directly or indirectly at the procedure level) will be in the local memory (no more concurrency between threads).

Furthermore, fix-len string variables are allocated at program startup and deallocated at program termination, while var-len string variables are dynamically allocated/deallocated as the program executes (with allocation/deallocation concurrency between threads).

To modify the arrays (from var-len to fix-len), the problem is more delicate.
Not only do you have to define a fixed maximum size to allocate for each array, but for each one you would have to associate an index variable (per dimension if necessary) that would point to the last useful element ('Redim' would be replaced by the update of this index variable).
But given the number of arrays and for some the size of their elements, I am afraid that all of this would not fit in the stack.
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

By reducing all zstrings to 32 characters, I was able to transform all the var-len arrays into fix-len arrays of 30 elements per dimension (like the method defined above), and also to copy locally the instances passed to the threads, all this put in the stack.

I did tests up to 8 threads but I do not see any noticeable improvement compared to the only var-len strings changed in fix-len zstrings!
Amazing!
badidea
Posts: 2629
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Thread status polling - safe?

Post by badidea »

I haven't done much testing myself yet, but I did notice that switching to fixed length zstring also improves single threaded performance. This is nice but I did not expect that since I use len() at various location which I expected to be slower for zstring. May be this an example where fast CPU cache shines.

--- edit ---
This one erasedStr = space(len(word)) was an expensive operation. And not needed any more with zstring * 32. Just putting a 0 and the end now.
Last edited by badidea on Jan 29, 2025 22:00, edited 1 time in total.
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

badidea wrote: Jan 27, 2025 22:58 This one erasedStr = space(len(word)) was an expensive operation. And not needed any more with zstring * 32. Just putting a 0 and the end now.

Indeed, with a zstring, you just have to do:

Code: Select all

	erasedStr[len(word)] = 0  '' erasedStr = space(len(word))
	if wp.ori = 0 then 'horz
		for i as isz = 0 to len(word) - 1
			erasedStr[i] = wordMap(row, col + i)
			wordMap(row, col + i) = word[i]
		next
	else 'vert
		for i as isz = 0 to len(word) - 1
			erasedStr[i] = wordMap(row + i, col)
			wordMap(row + i, col) = word[i]
		next
	end if

As a result, this (above) significantly improves my previous results:
(impacted procedure called 1500000 times per thread)

Code: Select all

32-bit gas -exx
Threads    original code with string    modified code with:       modified code with:
                                        zstring * 256             zstring * 256
                                                                  + erasedStr[len(word)] = 0
1           16.0                        15.8                      15.4                      
2           22.5 / 2 = 11.2             17.2 / 2 = 8.6            15.9 / 2 = 8.0            
3           31.4 / 3 = 10.5 <- best     18.5 / 3 = 6.2            16.0 / 3 = 5.3            
4           44.6 / 4 = 11.2             21.4 / 4 = 5.3            18.8 / 4 = 4.7            
5           59.5 / 5 = 11.9             24.8 / 5 = 5.0            21.3 / 5 = 4.3            
6           78.7 / 6 = 13.1             28.7 / 6 = 4.8 <- best    23.7 / 6 = 4.0            
7          100.0 / 7 = 14.3             34.2 / 7 = 4.9            26.4 / 7 = 3.8 <- best    
8          124.1 / 8 = 15.8             39.1 / 8 = 4.9            30.6 / 8 = 3.8 <- best    

Code: Select all

64-bit gcc
Threads    original code with string    modified code with:       modified code with:       
                                        zstring * 256             zstring * 256             
                                                                  + erasedStr[len(word)] = 0
1           11.1                        10.1                       9.6                      
2           15.2 / 2 =  7.6 <- best     10.9 / 2 = 5.4            10.4 / 2 = 5.2            
3           25.3 / 3 =  8.4             12.5 / 3 = 4.2            11.2 / 3 = 3.7            
4           38.3 / 4 =  9.6             15.1 / 4 = 3.8            12.5 / 4 = 3.1            
5           45.9 / 5 =  9.2             17.6 / 5 = 3.5 <- best    15.1 / 5 = 3.0 <- best    
6           71.4 / 6 = 11.8             22.4 / 6 = 3.7            18.0 / 6 = 3.0 <- best    
7           91.6 / 7 = 13.1             26.6 / 7 = 3.8            21.3 / 7 = 3.0 <- best    
8          120.0 / 8 = 15.0             33.2 / 8 = 4.1            25.2 / 8 = 3.2            
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

The 'Space()' (or 'String()') function is a disaster for multi-threading with zstrings (compared to a loop using access by zstring indexes):

Code: Select all

Type Thread
    Dim As UInteger value
    Dim As Any Ptr pHandle
    Declare Static Sub thread1(ByVal pt As Thread Ptr)
    Declare Static Sub thread2(ByVal pt As Thread Ptr)
End Type

Sub Thread.thread1(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z
    For i As Integer = 1 To pt->value
        z = Space(31)
    Next i
End Sub

Sub Thread.thread2(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z
    For i As Integer = 1 To pt->value
        For j As Integer = 0 To 30
            z[j] = Asc(" ")
        Next j
        z[31] = 0
    Next i
End Sub

Sub MyThreads(ByVal pThread As Any Ptr, ByVal threadNB As UInteger = 1)
    Dim As Thread td(1 To threadNB)
    Dim As Double t
   
    t = Timer
    For i As Integer = 1 To threadNB
        td(i).value = 1000000
        td(i).pHandle = ThreadCreate(pThread, @td(i))
    Next I
    For i As Integer = 1 To threadNB
        ThreadWait(td(i).pHandle)
    Next I
    t = Timer - t

    Print "      total time for " & threadNB & " threads in parallel: " & t & " s"
    Print
End Sub

Print
For i As Integer = 1 To 8
    Print "Each thread using the 'z = Space(31)' instruction:"
    Mythreads(@Thread.thread1, I)
    Print "Each thread using a loop on 'z[j] = Asc(" ")':"
    Mythreads(@Thread.thread2, I)
    Print "-----------------------------------------------------------------------------"
    Print
Next i

Sleep
  • Output example:

    Code: Select all

    Each thread using the 'z = Space(31)' instruction:
          total time for 1 threads in parallel: 0.1497983999983461 s
    
    Each thread using a loop on 'z[j] = Asc()':
          total time for 1 threads in parallel: 0.09929199998789784 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Space(31)' instruction:
          total time for 2 threads in parallel: 0.5092992000064811 s
    
    Each thread using a loop on 'z[j] = Asc()':
          total time for 2 threads in parallel: 0.1060854000143934 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Space(31)' instruction:
          total time for 3 threads in parallel: 1.028965099988312 s
    
    Each thread using a loop on 'z[j] = Asc()':
          total time for 3 threads in parallel: 0.1170484000024032 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Space(31)' instruction:
          total time for 4 threads in parallel: 1.545667300007153 s
    
    Each thread using a loop on 'z[j] = Asc()':
          total time for 4 threads in parallel: 0.1235589999916584 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Space(31)' instruction:
          total time for 5 threads in parallel: 2.104250199986453 s
    
    Each thread using a loop on 'z[j] = Asc()':
          total time for 5 threads in parallel: 0.1470729000132565 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Space(31)' instruction:
          total time for 6 threads in parallel: 2.978753699990847 s
    
    Each thread using a loop on 'z[j] = Asc()':
          total time for 6 threads in parallel: 0.1509730000084488 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Space(31)' instruction:
          total time for 7 threads in parallel: 3.897512400003478 s
    
    Each thread using a loop on 'z[j] = Asc()':
          total time for 7 threads in parallel: 0.1601502000145132 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Space(31)' instruction:
          total time for 8 threads in parallel: 4.680235099998669 s
    
    Each thread using a loop on 'z[j] = Asc()':
          total time for 8 threads in parallel: 0.1599974000066453 s
    
    -----------------------------------------------------------------------------
    

Same observation for the 'Mid()' function for example:

Code: Select all

Type Thread
    Dim As UInteger value
    Dim As Any Ptr pHandle
    Declare Static Sub thread1(ByVal pt As Thread Ptr)
    Declare Static Sub thread2(ByVal pt As Thread Ptr)
End Type

Sub Thread.thread1(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z0 = "FreeBASIC version 1.20"
    Dim As Zstring * 32 z
    For i As Integer = 1 To pt->value
        z = Mid(z, 11,7)
    Next i
End Sub

Sub Thread.thread2(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z0 = "FreeBASIC version 1.20"
    Dim As Zstring * 32 z
    For i As Integer = 1 To pt->value
        For j As Integer = 0 To 6
            z[j] = z0[j + 10]
        Next j
        z[7] = 0
    Next i
End Sub

Sub MyThreads(ByVal pThread As Any Ptr, ByVal threadNB As UInteger = 1)
    Dim As Thread td(1 To threadNB)
    Dim As Double t
   
    t = Timer
    For i As Integer = 1 To threadNB
        td(i).value = 1000000
        td(i).pHandle = ThreadCreate(pThread, @td(i))
    Next I
    For i As Integer = 1 To threadNB
        ThreadWait(td(i).pHandle)
    Next I
    t = Timer - t

    Print "      total time for " & threadNB & " threads in parallel: " & t & " s"
    Print
End Sub

Print
For i As Integer = 1 To 8
    Print "Each thread using the 'z = Mid(z, 11,7)' instruction:"
    Mythreads(@Thread.thread1, I)
    Print "Each thread using a loop on 'z[j] = z0[j + 10]':"
    Mythreads(@Thread.thread2, I)
    Print "-----------------------------------------------------------------------------"
    Print
Next i

Sleep
  • Output example:

    Code: Select all

    Each thread using the 'z = Mid(z, 11,7)' instruction:
          total time for 1 threads in parallel: 0.06024279999967064 s
    
    Each thread using a loop on 'z[j] = z0[j + 10]':
          total time for 1 threads in parallel: 0.04805159998872455 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Mid(z, 11,7)' instruction:
          total time for 2 threads in parallel: 0.1579303999920256 s
    
    Each thread using a loop on 'z[j] = z0[j + 10]':
          total time for 2 threads in parallel: 0.04216660001127082 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Mid(z, 11,7)' instruction:
          total time for 3 threads in parallel: 0.5136763000062103 s
    
    Each thread using a loop on 'z[j] = z0[j + 10]':
          total time for 3 threads in parallel: 0.06358869998986449 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Mid(z, 11,7)' instruction:
          total time for 4 threads in parallel: 0.8272924999982365 s
    
    Each thread using a loop on 'z[j] = z0[j + 10]':
          total time for 4 threads in parallel: 0.06371049999064837 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Mid(z, 11,7)' instruction:
          total time for 5 threads in parallel: 1.066882700007113 s
    
    Each thread using a loop on 'z[j] = z0[j + 10]':
          total time for 5 threads in parallel: 0.07534949999316609 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Mid(z, 11,7)' instruction:
          total time for 6 threads in parallel: 1.389924499996908 s
    
    Each thread using a loop on 'z[j] = z0[j + 10]':
          total time for 6 threads in parallel: 0.07787849999262164 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Mid(z, 11,7)' instruction:
          total time for 7 threads in parallel: 1.739438400009874 s
    
    Each thread using a loop on 'z[j] = z0[j + 10]':
          total time for 7 threads in parallel: 0.07804730001440419 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = Mid(z, 11,7)' instruction:
          total time for 8 threads in parallel: 2.166369900010437 s
    
    Each thread using a loop on 'z[j] = z0[j + 10]':
          total time for 8 threads in parallel: 0.0804321999890476 s
    
    -----------------------------------------------------------------------------
    

These types of functions work (also internally) with var-len strings and also in writing mode.
(no problem for 'Len()')


Same problem, a little more surprising, with 'Instr()' (but lots of read access to var-len string characters):

Code: Select all

Type Thread
    Dim As UInteger value
    Dim As Any Ptr pHandle
    Declare Static Sub thread1(ByVal pt As Thread Ptr)
    Declare Static Sub thread2(ByVal pt As Thread Ptr)
End Type

Sub Thread.thread1(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z0 = "FreeBASIC version 1.20"
    Dim As Zstring * 32 z = "version"
    Dim As Integer l
    For i As Integer = 1 To pt->value
        l = Instr(Z0, z)
    Next i
End Sub

Sub Thread.thread2(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z0 = "FreeBASIC version 1.20"
    Dim As Zstring * 32 z = "version"
    Dim As Integer l
    For i As Integer = 1 To pt->value
        For j As Integer = 0 To Len(z0) - 1 - Len(z) + 1
            For k As Integer = 0 To Len(z) -1
                If z0[j + k] <> z[k] Then Continue For, For
            Next k
            l = j + 1
            Exit For
        Next j
    Next i
End Sub

Sub MyThreads(ByVal pThread As Any Ptr, ByVal threadNB As UInteger = 1)
    Dim As Thread td(1 To threadNB)
    Dim As Double t
   
    t = Timer
    For i As Integer = 1 To threadNB
        td(i).value = 1000000
        td(i).pHandle = ThreadCreate(pThread, @td(i))
    Next I
    For i As Integer = 1 To threadNB
        ThreadWait(td(i).pHandle)
    Next I
    t = Timer - t

    Print "      total time for " & threadNB & " threads in parallel: " & t & " s"
    Print
End Sub

Print
For i As Integer = 1 To 8
    Print "Each thread using the 'l = Instr(Z0, z)' instruction:"
    Mythreads(@Thread.thread1, I)
    Print "Each thread using a double loop on 'z0[j + k] <> z[k]':"
    Mythreads(@Thread.thread2, I)
    Print "-----------------------------------------------------------------------------"
    Print
Next i

Sleep
  • Output example:

    Code: Select all

    Each thread using the 'l = Instr(Z0, z)' instruction:
          total time for 1 threads in parallel: 0.3322436000098321 s
    
    Each thread using a double loop on 'z0[j + k] <> z[k]':
          total time for 1 threads in parallel: 0.1748073000124606 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'l = Instr(Z0, z)' instruction:
          total time for 2 threads in parallel: 0.4777280000018465 s
    
    Each thread using a double loop on 'z0[j + k] <> z[k]':
          total time for 2 threads in parallel: 0.1885630000044927 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'l = Instr(Z0, z)' instruction:
          total time for 3 threads in parallel: 0.7530844000144583 s
    
    Each thread using a double loop on 'z0[j + k] <> z[k]':
          total time for 3 threads in parallel: 0.1988676000025009 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'l = Instr(Z0, z)' instruction:
          total time for 4 threads in parallel: 1.294090000010726 s
    
    Each thread using a double loop on 'z0[j + k] <> z[k]':
          total time for 4 threads in parallel: 0.2265055000003002 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'l = Instr(Z0, z)' instruction:
          total time for 5 threads in parallel: 1.835542399999213 s
    
    Each thread using a double loop on 'z0[j + k] <> z[k]':
          total time for 5 threads in parallel: 0.2461808999965172 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'l = Instr(Z0, z)' instruction:
          total time for 6 threads in parallel: 2.417686400002552 s
    
    Each thread using a double loop on 'z0[j + k] <> z[k]':
          total time for 6 threads in parallel: 0.260329200004108 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'l = Instr(Z0, z)' instruction:
          total time for 7 threads in parallel: 3.1629648999929 s
    
    Each thread using a double loop on 'z0[j + k] <> z[k]':
          total time for 7 threads in parallel: 0.2648269999879886 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'l = Instr(Z0, z)' instruction:
          total time for 8 threads in parallel: 4.043754200001814 s
    
    Each thread using a double loop on 'z0[j + k] <> z[k]':
          total time for 8 threads in parallel: 0.2748090000038843 s
    
    -----------------------------------------------------------------------------
    

In conclusion, to optimize multi-threading versus strings, work with Zstrings only instead of var-len strings, and do not use built-in string functions (except 'Len()' and 'Asc()') but instead user code working on zstring indexes.

Even all string operators are involved!

Example with the '= (assign)' operator:

Code: Select all

Type Thread
    Dim As UInteger value
    Dim As Any Ptr pHandle
    Declare Static Sub thread1(ByVal pt As Thread Ptr)
    Declare Static Sub thread2(ByVal pt As Thread Ptr)
End Type

Sub Thread.thread1(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z0 = "FreeBASIC version 1.20"
    Dim As Zstring * 32 z
    For i As Integer = 1 To pt->value
        z = z0
    Next i
End Sub

Sub Thread.thread2(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z0 = "FreeBASIC version 1.20"
    Dim As Zstring * 32 z
    For i As Integer = 1 To pt->value
        Dim As Integer j
        For j = 0 To iif(Len(z0) < Sizeof(Z), Len(z0) - 1, Sizeof(z) - 2)
            z[j] = z0[j]
        Next j
        z[j] = 0
    Next i

End Sub

Sub MyThreads(ByVal pThread As Any Ptr, ByVal threadNB As UInteger = 1)
    Dim As Thread td(1 To threadNB)
    Dim As Double t
   
    t = Timer
    For i As Integer = 1 To threadNB
        td(i).value = 1000000
        td(i).pHandle = ThreadCreate(pThread, @td(i))
    Next I
    For i As Integer = 1 To threadNB
        ThreadWait(td(i).pHandle)
    Next I
    t = Timer - t

    Print "      total time for " & threadNB & " threads in parallel: " & t & " s"
    Print
End Sub

Print
For i As Integer = 1 To 8
    Print "Each thread using the 'z = z0' instruction:"
    Mythreads(@Thread.thread1, I)
    Print "Each thread using a loop on 'z[j] = z0[j]':"
    Mythreads(@Thread.thread2, I)
    Print "-----------------------------------------------------------------------------"
    Print
Next i

Sleep
  • Output example:

    Code: Select all

    Each thread using the 'z = z0' instruction:
          total time for 1 threads in parallel: 0.04907770001351253 s
    
    Each thread using a loop on 'z[j] = z0[j]':
          total time for 1 threads in parallel: 0.1235908999866666 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = z0' instruction:
          total time for 2 threads in parallel: 0.1521594999937719 s
    
    Each thread using a loop on 'z[j] = z0[j]':
          total time for 2 threads in parallel: 0.1242202999945761 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = z0' instruction:
          total time for 3 threads in parallel: 0.3016642000066554 s
    
    Each thread using a loop on 'z[j] = z0[j]':
          total time for 3 threads in parallel: 0.1416004999987592 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = z0' instruction:
          total time for 4 threads in parallel: 0.4130713000087667 s
    
    Each thread using a loop on 'z[j] = z0[j]':
          total time for 4 threads in parallel: 0.1798235000062363 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = z0' instruction:
          total time for 5 threads in parallel: 0.5329422000074686 s
    
    Each thread using a loop on 'z[j] = z0[j]':
          total time for 5 threads in parallel: 0.1988542000004685 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = z0' instruction:
          total time for 6 threads in parallel: 0.6678812000001102 s
    
    Each thread using a loop on 'z[j] = z0[j]':
          total time for 6 threads in parallel: 0.2123802999949334 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = z0' instruction:
          total time for 7 threads in parallel: 1.048261299987132 s
    
    Each thread using a loop on 'z[j] = z0[j]':
          total time for 7 threads in parallel: 0.2264801999932047 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = z0' instruction:
          total time for 8 threads in parallel: 1.251318199985619 s
    
    Each thread using a loop on 'z[j] = z0[j]':
          total time for 8 threads in parallel: 0.2318750000131047 s
    
    -----------------------------------------------------------------------------
    
SARG
Posts: 1854
Joined: May 27, 2005 7:15
Location: FRANCE

Re: Thread status polling - safe?

Post by SARG »

@fxm

space instruction calls 'fb_space'

fb_space calls 'fb_hStrAllocTemp( )'

fb_hStrAllocTemp( ) calls 'fb_StrLock()' --> FBCALL void fb_StrLock( void ) { EnterCriticalSection( &__fb_string_mutex ); }

So it explains that using directly pointer is faster. :wink:

I didn't check other functions but I suppose they uses the same principle when threading and creation of temporary strings. Len doesn't need it.


BTW you remove your previous question in red.
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

Indeed, I think that to be thread safe, all dynamic memory allocations/reallocations/deallocations requests are internally protected by a mutex.
So concurrent requests will actually be serialized by this internal locking/unlocking of the mutex.
This obviously happens when the user calls these requests directly, but also indirectly when using var-len strings (or string functions or string operators, on any string) or dynamic arrays.

- Example of multi-threading using the '= (assign)' operator, applied either on zstrings or on bytes:

Code: Select all

Type Thread
    Dim As UInteger value
    Dim As Any Ptr pHandle
    Declare Static Sub thread1(ByVal pt As Thread Ptr)
    Declare Static Sub thread2(ByVal pt As Thread Ptr)
End Type

Sub Thread.thread1(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z
    For i As Integer = 1 To pt->value
        z = "X"
    Next i
End Sub

Sub Thread.thread2(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z
    For i As Integer = 1 To pt->value
        z[0] = Asc("X"): z[1] = 0
    Next i
End Sub

Sub MyThreads(ByVal pThread As Any Ptr, ByVal threadNB As UInteger = 1)
    Dim As Thread td(1 To threadNB)
    Dim As Double t
   
    t = Timer
    For i As Integer = 1 To threadNB
        td(i).value = 1000000
        td(i).pHandle = ThreadCreate(pThread, @td(i))
    Next I
    For i As Integer = 1 To threadNB
        ThreadWait(td(i).pHandle)
    Next I
    t = Timer - t

    Print "      total time for " & threadNB & " threads in parallel: " & t & " s"
    Print
End Sub

Print
For i As Integer = 1 To 8
    Print "Each thread using the 'z = ""X""' instruction:"
    Mythreads(@Thread.thread1, I)
    Print "Each thread using the 'z[0] = Asc(""X""): z[1] = 0' instruction:"
    Mythreads(@Thread.thread2, I)
    Print "-----------------------------------------------------------------------------"
    Print
Next i

Sleep
  • Output example:

    Code: Select all

    Each thread using the 'z = "X"' instruction:
          total time for 1 threads in parallel: 0.03509200000115698 s
    
    Each thread using the 'z[0] = Asc("X"): z[1] = 0' instruction:
          total time for 1 threads in parallel: 0.007378499991006038 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = "X"' instruction:
          total time for 2 threads in parallel: 0.1292754000276091 s
    
    Each thread using the 'z[0] = Asc("X"): z[1] = 0' instruction:
          total time for 2 threads in parallel: 0.005797399973516804 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = "X"' instruction:
          total time for 3 threads in parallel: 0.2533639000275798 s
    
    Each thread using the 'z[0] = Asc("X"): z[1] = 0' instruction:
          total time for 3 threads in parallel: 0.008446999977678615 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = "X"' instruction:
          total time for 4 threads in parallel: 0.4020076999898095 s
    
    Each thread using the 'z[0] = Asc("X"): z[1] = 0' instruction:
          total time for 4 threads in parallel: 0.01053689998224172 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = "X"' instruction:
          total time for 5 threads in parallel: 0.5048076000225876 s
    
    Each thread using the 'z[0] = Asc("X"): z[1] = 0' instruction:
          total time for 5 threads in parallel: 0.009472399993427416 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = "X"' instruction:
          total time for 6 threads in parallel: 0.6370484999799544 s
    
    Each thread using the 'z[0] = Asc("X"): z[1] = 0' instruction:
          total time for 6 threads in parallel: 0.01175199999403276 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = "X"' instruction:
          total time for 7 threads in parallel: 0.8271034999968663 s
    
    Each thread using the 'z[0] = Asc("X"): z[1] = 0' instruction:
          total time for 7 threads in parallel: 0.0117849000017145 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the 'z = "X"' instruction:
          total time for 8 threads in parallel: 1.059193899973081 s
    
    Each thread using the 'z[0] = Asc("X"): z[1] = 0' instruction:
          total time for 8 threads in parallel: 0.01265520000160336 s
    
    We see here that for 1 thread, the assign operator applied on zstrings is about 5 times slower than when it is applied on bytes, but that this factor increases to about 100 for 8 threads!
- Even the '= (equal)' operator is impacted:

Code: Select all

Type Thread
    Dim As UInteger value
    Dim As Any Ptr pHandle
    Declare Static Sub thread1(ByVal pt As Thread Ptr)
    Declare Static Sub thread2(ByVal pt As Thread Ptr)
End Type

Sub Thread.thread1(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z1 = "FreeBASIC"
    Dim As Zstring * 32 z2 = "version"
    Dim As Integer result
    For i As Integer = 1 To pt->value
        result = (z1 = Z2)
    Next i
End Sub

Sub Thread.thread2(ByVal pt As Thread Ptr)
    Dim As Zstring * 32 z1 = "FreeBASIC"
    Dim As Zstring * 32 z2 = "version"
    Dim As Integer result
    For i As Integer = 1 To pt->value
        result = -1
        If Len(z1) <> Len(z2) Then
            result = 0
        Else
            For j As Integer = 0 To Len(z1) - 1
                If z1[j] <> z2[j] Then result = 0 : Exit For
            Next j
        End If
    Next i
End Sub

Sub MyThreads(ByVal pThread As Any Ptr, ByVal threadNB As UInteger = 1)
    Dim As Thread td(1 To threadNB)
    Dim As Double t
   
    t = Timer
    For i As Integer = 1 To threadNB
        td(i).value = 1000000
        td(i).pHandle = ThreadCreate(pThread, @td(i))
    Next I
    For i As Integer = 1 To threadNB
        ThreadWait(td(i).pHandle)
    Next I
    t = Timer - t

    Print "      total time for " & threadNB & " threads in parallel: " & t & " s"
    Print
End Sub

Print
For i As Integer = 1 To 8
    Print "Each thread using the '(z1 = Z2)' instruction:"
    Mythreads(@Thread.thread1, I)
    Print "Each thread using the a loop on 'z1[j] <> z2[j]' instruction:"
    Mythreads(@Thread.thread2, I)
    Print "-----------------------------------------------------------------------------"
    Print
Next i

Sleep
  • Output example:

    Code: Select all

    Each thread using the '(z1 = Z2)' instruction:
          total time for 1 threads in parallel: 0.04545439999657219 s
    
    Each thread using the a loop on 'z1[j] <> z2[j]' instruction:
          total time for 1 threads in parallel: 0.01287489997355351 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the '(z1 = Z2)' instruction:
          total time for 2 threads in parallel: 0.0903127999720823 s
    
    Each thread using the a loop on 'z1[j] <> z2[j]' instruction:
          total time for 2 threads in parallel: 0.01764429998527817 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the '(z1 = Z2)' instruction:
          total time for 3 threads in parallel: 0.1638007999845854 s
    
    Each thread using the a loop on 'z1[j] <> z2[j]' instruction:
          total time for 3 threads in parallel: 0.01940870002488282 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the '(z1 = Z2)' instruction:
          total time for 4 threads in parallel: 0.3058213999951533 s
    
    Each thread using the a loop on 'z1[j] <> z2[j]' instruction:
          total time for 4 threads in parallel: 0.02044119999339955 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the '(z1 = Z2)' instruction:
          total time for 5 threads in parallel: 0.4767383999892161 s
    
    Each thread using the a loop on 'z1[j] <> z2[j]' instruction:
          total time for 5 threads in parallel: 0.01874520001169344 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the '(z1 = Z2)' instruction:
          total time for 6 threads in parallel: 0.6476152000108186 s
    
    Each thread using the a loop on 'z1[j] <> z2[j]' instruction:
          total time for 6 threads in parallel: 0.02309240000610657 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the '(z1 = Z2)' instruction:
          total time for 7 threads in parallel: 0.8365377000048682 s
    
    Each thread using the a loop on 'z1[j] <> z2[j]' instruction:
          total time for 7 threads in parallel: 0.02092179999965538 s
    
    -----------------------------------------------------------------------------
    
    Each thread using the '(z1 = Z2)' instruction:
          total time for 8 threads in parallel: 1.084742900010554 s
    
    Each thread using the a loop on 'z1[j] <> z2[j]' instruction:
          total time for 8 threads in parallel: 0.02315860001729675 s
    
    -----------------------------------------------------------------------------
    
    We see here that for 1 thread, the equal operator applied on zstrings is about 3.5 times slower than when it is applied on bytes, but that this factor increases to about 50 for 8 threads!
fxm
Moderator
Posts: 12468
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Thread status polling - safe?

Post by fxm »

From my last modified code, I added an optimization for the zstring assignment in the recursive function 'recursiveSmart1()' inside the For loop on 'ipw':
- Below, the line in comment is replaced the line that follows it:

Code: Select all

		'wordPlaced = entries.entry(ipw).word 'line optimized by the following line:
		fb_memcopy(wordPlaced[0], entries.entry(ipw).word[0], len(entries.entry(ipw).word) + 1)

The improvement is more efficient when compiling in 64-bit:

Code: Select all

32-bit gas -exx
Threads    original code with string    modified code with:       modified code with:           modified code with:
                                        zstring * 256             zstring * 256                 zstring * 256
                                                                  + erasedStr[len(word)] = 0    + erasedStr[len(word)] = 0
                                                                                                + above optimization
1           16.0                        15.8                      15.4                          14.5
2           22.5 / 2 = 11.2             17.2 / 2 = 8.6            15.9 / 2 = 8.0                15.6 / 2 = 7.8
3           31.4 / 3 = 10.5 <- best     18.5 / 3 = 6.2            16.0 / 3 = 5.3                16.9 / 3 = 5.6
4           44.6 / 4 = 11.2             21.4 / 4 = 5.3            18.8 / 4 = 4.7                17.5 / 4 = 4.4
5           59.5 / 5 = 11.9             24.8 / 5 = 5.0            21.3 / 5 = 4.3                19.3 / 5 = 3.9
6           78.7 / 6 = 13.1             28.7 / 6 = 4.8 <- best    23.7 / 6 = 4.0                20.4 / 6 = 3.4
7          100.0 / 7 = 14.3             34.2 / 7 = 4.9            26.4 / 7 = 3.8 <- best        21.4 / 7 = 3.1
8          124.1 / 8 = 15.8             39.1 / 8 = 4.9            30.6 / 8 = 3.8 <- best        22.5 / 8 = 2.8 <- best

Code: Select all

64-bit gcc
Threads    original code with string    modified code with:       modified code with:           modified code with:
                                        zstring * 256             zstring * 256                 zstring * 256
                                                                  + erasedStr[len(word)] = 0    + erasedStr[len(word)] = 0
                                                                                                + above optimization
1           11.1                        10.1                       9.6                           9.3
2           15.2 / 2 =  7.6 <- best     10.9 / 2 = 5.4            10.4 / 2 = 5.2                 9.9 / 2 = 4.9
3           25.3 / 3 =  8.4             12.5 / 3 = 4.2            11.2 / 3 = 3.7                10.1 / 3 = 3.4
4           38.3 / 4 =  9.6             15.1 / 4 = 3.8            12.5 / 4 = 3.1                10.4 / 4 = 2.6
5           45.9 / 5 =  9.2             17.6 / 5 = 3.5 <- best    15.1 / 5 = 3.0 <- best        10.9 / 5 = 2.2
6           71.4 / 6 = 11.8             22.4 / 6 = 3.7            18.0 / 6 = 3.0 <- best        11.5 / 6 = 1.9
7           91.6 / 7 = 13.1             26.6 / 7 = 3.8            21.3 / 7 = 3.0 <- best        11.8 / 7 = 1.7
8          120.0 / 8 = 15.0             33.2 / 8 = 4.1            25.2 / 8 = 3.2                12.0 / 8 = 1.5 <- best

[edit]
Post updated with better optimization.
Last edited by fxm on Jan 31, 2025 19:14, edited 2 times in total.
badidea
Posts: 2629
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Thread status polling - safe?

Post by badidea »

What do you have as entry_type?
Now with:

Code: Select all

type entry_type
	dim as zstring * 32 word
	dim as string clue
	'dim as string word, clue
end type
With your custom string copy it is indeed faster.
But even faster is leaving out the copy, and just use:

Code: Select all

dim as u8 wpch = entries.entry(ipw).word[ichpw]
In the next loop.
A bit harder to read however.
Post Reply