Why Not All CPUs Are Used for Multithreading ?

fifoul
Posts: 15
Joined: Oct 17, 2005 13:20
Location: France

Why Not All CPUs Are Used for Multithreading ?

Post by fifoul

Hello,

I have written a small calculation program that uses multithreading, but when I start it and open the Resource Monitor in Windows, I see that only one of my NUMA nodes is used.

Only 36 processors (NUMA node 1) of my 72 logical CPUs are used to execute all the threads of the program.
I would like to know if it is possible to use all the processors of NUMA node 0 and NUMA node 1 (36 + 36 logical processors) to execute the threads.


All these processors (NUMA node 1) are used at 100%:

Processor 0 (node 1)
Processor 1 (node 1)
Processor 2 (node 1)
Processor 3 (node 1)
Processor 4 (node 1)
Processor 5 (node 1)
Processor 6 (node 1)
Processor 7 (node 1)
Processor 8 (node 1)
Processor 9 (node 1)
Processor 10 (node 1)
Processor 11 (node 1)
Processor 12 (node 1)
Processor 13 (node 1)
Processor 14 (node 1)
Processor 15 (node 1)
Processor 16 (node 1)
Processor 17 (node 1)
Processor 18 (node 1)
Processor 19 (node 1)
Processor 20 (node 1)
Processor 21 (node 1)
Processor 22 (node 1)
Processor 23 (node 1)
Processor 24 (node 1)
Processor 25 (node 1)
Processor 26 (node 1)
Processor 27 (node 1)
Processor 28 (node 1)
Processor 29 (node 1)
Processor 30 (node 1)
Processor 31 (node 1)
Processor 32 (node 1)
Processor 33 (node 1)
Processor 34 (node 1)
Processor 35 (node 1)

All these processors (NUMA node 0) stay at 0%:

Processor 0 (node 0)
Processor 1 (node 0)
Processor 2 (node 0)
Processor 3 (node 0)
Processor 4 (node 0)
Processor 5 (node 0)
Processor 6 (node 0)
Processor 7 (node 0)
Processor 8 (node 0)
Processor 9 (node 0)
Processor 10 (node 0)
Processor 11 (node 0)
Processor 12 (node 0)
Processor 13 (node 0)
Processor 14 (node 0)
Processor 15 (node 0)
Processor 16 (node 0)
Processor 17 (node 0)
Processor 18 (node 0)
Processor 19 (node 0)
Processor 20 (node 0)
Processor 21 (node 0)
Processor 22 (node 0)
Processor 23 (node 0)
Processor 24 (node 0)
Processor 25 (node 0)
Processor 26 (node 0)
Processor 27 (node 0)
Processor 28 (node 0)
Processor 29 (node 0)
Processor 30 (node 0)
Processor 31 (node 0)
Processor 32 (node 0)
Processor 33 (node 0)
Processor 34 (node 0)
Processor 35 (node 0)


My computer has 2x Intel(R) Xeon(R) CPU E5-2697 v4 (2x 18 cores, i.e. 2x 36 threads).
My OS is Windows 10 Pro.


This is part of my program:

Code:

'-------------------------------------------------------------------------------

Const NB_THREADS = 64

' calcul_3d() and display_result() are the program's own routines, declared earlier.

' One worker sub replaces thread0() ... thread63(). ThreadCreate requires a
' Sub(ByVal As Any Ptr), so the thread index is passed through that parameter.
Sub worker(ByVal param As Any Ptr)
    calcul_3d(Cast(Integer, param))
End Sub

Dim As Any Ptr thread_handle(0 To NB_THREADS - 1)

'-------------------------------------------------------------------------------

Do

    ' start the 64 calculation threads
    For i As Integer = 0 To NB_THREADS - 1
        thread_handle(i) = ThreadCreate(@worker, Cast(Any Ptr, i))
    Next i

    ' wait until every thread has finished
    For i As Integer = 0 To NB_THREADS - 1
        ThreadWait(thread_handle(i))
    Next i

    display_result()

Loop

'-------------------------------------------------------------------------------
Last edited by fxm on Mar 25, 2023 12:03, edited 1 time in total.
Reason: Added code tags.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Why Not All CPUs Are Used for Multithreading ?

Post by fxm

First, check that both processors are enabled while your program is running:
- Press the "Ctrl", "Shift" and "Esc" keys together to open the Task Manager.
- Click the "Details" tab.
- Right-click the program you want to run on both processors and click "Set affinity" in the context menu.
- Tick the check box next to "<All Processors>" to allow the program to use both processors.
- Click "OK" to save your changes.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Why Not All CPUs Are Used for Multithreading ?

Post by fxm

In addition, check that the power mode of your Windows PC is set to "Best performance".
(To change the power mode quickly, select the battery icon on the taskbar, then drag the slider to the power mode you want.)
fifoul
Posts: 15
Joined: Oct 17, 2005 13:20
Location: France

Re: Why Not All CPUs Are Used for Multithreading ?

Post by fifoul

Thanks fxm for your answer.

In Task Manager > Set affinity, it only offered me node 0 (36 CPUs) OR node 1 (36 CPUs), never node 0 AND node 1 (36 + 36 CPUs) at the same time.

I found the problem: I disabled the NUMA option in the BIOS performance menu.
Now node 1 is composed of 64 processors and node 0 of 8 processors.

And the program runs 64 threads on 64 processors simultaneously.
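The 64 + 8 split matches how Windows organizes more than 64 logical processors into processor groups: a group holds at most 64 logical processors, and on Windows 10 a process's threads run in a single group by default. With NUMA enabled, each 36-processor node apparently became its own group, which would also explain why Set affinity offered node 0 or node 1 but never both. A minimal sketch to check this, assuming Windows 7 or later; the two kernel32 functions are declared by hand here instead of relying on windows.bi:

```freebasic
' Sketch: print the Windows processor groups this program can see.
' Assumes Windows 7 or later (these kernel32 functions do not exist before).

Extern "Windows" Lib "kernel32"
    Declare Function GetActiveProcessorGroupCount() As UShort
    Declare Function GetActiveProcessorCount(ByVal GroupNumber As UShort) As ULong
End Extern

Dim As Integer nb_groups = GetActiveProcessorGroupCount()
Print "Processor groups:"; nb_groups

For g As Integer = 0 To nb_groups - 1
    ' number of active logical processors in group g
    Print "  group"; g; ":"; GetActiveProcessorCount(g); " logical processors"
Next g
```

On the machine described above one would expect two groups of 36 with NUMA enabled and 64 + 8 with it disabled; that is an expectation, not a measured output.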

I did a little performance test, and this is what I get:

With the NUMA option enabled in the BIOS:

The program runs 64 threads on 36 processors (node 1)
Node 1 (36 logical CPUs) is at 100%
Node 0 (36 logical CPUs) is at 0%
The execution time is 295 s for 50 loops

With the NUMA option disabled in the BIOS:

The program runs 64 threads on 64 processors (node 1)
Node 1 (64 logical CPUs) is at 100%
Node 0 (8 logical CPUs) is at 0%
The execution time is 164 s for 50 loops

It's 1.8 times faster with the NUMA option disabled.
fxm
Moderator
Posts: 12081
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Why Not All CPUs Are Used for Multithreading ?

Post by fxm

It is good when practice meets theory:
64 / 36 = 1.78 times faster
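Written out, assuming the run time is inversely proportional to the number of logical processors kept busy, the measured and expected speedups are:

```latex
\underbrace{\frac{295\ \text{s}}{164\ \text{s}}}_{\text{measured}} \approx 1.80
\qquad\text{vs.}\qquad
\underbrace{\frac{64\ \text{busy logical CPUs}}{36\ \text{busy logical CPUs}}}_{\text{expected}} \approx 1.78
```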
Provoni
Posts: 513
Joined: Jan 05, 2014 12:33
Location: Belgium

Re: Why Not All CPUs Are Used for Multithreading ?

Post by Provoni

fifoul wrote: Mar 25, 2023 11:38
[…]
I found the problem, I disabled the NUMA option in the bios performance menu
Now node 1 is composed of 64 processors and node 0 of 8 processors.
[…]

I used a dual Xeon v2 Win 10 Pro workstation a while ago and was able to use all threads with FreeBASIC across all NUMA nodes. It should not be restricted to one node.
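With NUMA left enabled, a program can in principle still use both processor groups if each thread moves itself into a group with the Win32 function SetThreadGroupAffinity (Windows 7 and later). A minimal sketch, assuming a 64-bit build and groups whose active processors are numbered 0 to n-1; GROUP_AFFINITY_T, pick_group(), worker() and the busy-loop calcul_3d() are hypothetical names, the last one only a stand-in for the poster's routine:

```freebasic
' Sketch: spread worker threads over all Windows processor groups.
' Assumptions: Windows 7 or later, 64-bit build (up to 64 bits per group mask).

Type GROUP_AFFINITY_T                  ' same layout as Win32 GROUP_AFFINITY
    Mask As UInteger                   ' KAFFINITY: pointer-sized bit mask
    Group As UShort
    Reserved(0 To 2) As UShort
End Type

Extern "Windows" Lib "kernel32"
    Declare Function GetActiveProcessorGroupCount() As UShort
    Declare Function GetActiveProcessorCount(ByVal GroupNumber As UShort) As ULong
    Declare Function GetCurrentThread() As Any Ptr
    Declare Function SetThreadGroupAffinity(ByVal hThread As Any Ptr, _
        ByVal GroupAffinity As GROUP_AFFINITY_T Ptr, _
        ByVal PreviousGroupAffinity As GROUP_AFFINITY_T Ptr) As Long
End Extern

Const NB_THREADS = 64
Dim Shared group_size() As Integer     ' logical processors in each group
Dim Shared result(0 To NB_THREADS - 1) As Double

' Fill the groups in order: with groups of 36 + 36,
' threads 0..35 -> group 0 and threads 36..71 -> group 1.
Function pick_group(ByVal index As Integer, sizes() As Integer) As Integer
    Dim As Integer total = 0
    For g As Integer = LBound(sizes) To UBound(sizes)
        total += sizes(g)
    Next g
    Dim As Integer p = index Mod total ' global logical processor number
    For g As Integer = LBound(sizes) To UBound(sizes)
        If p < sizes(g) Then Return g
        p -= sizes(g)
    Next g
    Return 0
End Function

' Hypothetical stand-in for the poster's calcul_3d(): just keeps a CPU busy.
Sub calcul_3d(ByVal index As Integer)
    Dim As Double s = 0
    For k As Integer = 1 To 100000000
        s += Sin(k * 0.001 + index)
    Next k
    result(index) = s
End Sub

Sub worker(ByVal param As Any Ptr)
    Dim As Integer index = Cast(Integer, param)
    Dim As GROUP_AFFINITY_T ga         ' Dim zero-initializes Reserved()
    ga.Group = pick_group(index, group_size())
    Dim As Integer n = group_size(ga.Group)
    ' allow every processor of the chosen group
    If n >= 64 Then
        ga.Mask = Not Cast(UInteger, 0)
    Else
        ga.Mask = (Cast(UInteger, 1) Shl n) - 1
    End If
    SetThreadGroupAffinity(GetCurrentThread(), @ga, 0)
    calcul_3d(index)
End Sub

' read the group sizes once, before starting the threads
ReDim group_size(0 To GetActiveProcessorGroupCount() - 1)
For g As Integer = 0 To UBound(group_size)
    group_size(g) = GetActiveProcessorCount(g)
Next g

Dim As Any Ptr thread_handle(0 To NB_THREADS - 1)
For i As Integer = 0 To NB_THREADS - 1
    thread_handle(i) = ThreadCreate(@worker, Cast(Any Ptr, i))
Next i
For i As Integer = 0 To NB_THREADS - 1
    ThreadWait(thread_handle(i))
Next i
Print "done"
```

With 64 threads and groups of 36 + 36, pick_group() sends threads 0 to 35 to group 0 and threads 36 to 63 to group 1, so all 64 threads can run at once; with the 64 + 8 layout they all stay in the first group. Filling the groups in order, rather than alternating between them, avoids sending half of the threads to a small group such as the 8-processor one.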