Sorry, I've done BASIC and ASM , but no C ... (ASM for the stuff not available in BASIC in the "old times").schooner wrote:How would you translate this C function for BitScanForward?
I can hardly read the stuff you are referring to.
Sorry, I've done BASIC and ASM , but no C ... (ASM for the stuff not available in BASIC in the "old times").schooner wrote:How would you translate this C function for BitScanForward?
Maybe what I'll say is dumb, perdon me if so. But if you compile your code to asm, wouldn't you get some hint? Or does it go in an opposite way?schooner wrote:How would you translate this C function for BitScanForward?
Code: Select all
#include "windows.bi"
extern "C"
function BSF(byval Index as DWORD ptr, byval Mask as DWORD64) as BOOLEAN
dim t as LONG64
asm
bsf rax, qword ptr [Mask]
mov qword ptr [t], rax
end asm
*Index = t
return Mask <> 0
end function
end extern
Code: Select all
Function BitScanForward64 cdecl (ByVal num as ulongint) as ulongint
asm
bsfq rax,[num]
mov [Function],rax
end asm
end function
Function BitScanReverse64 cdecl (ByVal num as ulongint) as ulongint
asm
bsrq rax,[num]
mov [Function],rax
end asm
end function
print BitScanForward64(&b100ull)
print BitScanForward64(&b100000000000000000000000000000000000000ull)
print BitScanReverse64(&b100ull)
print BitScanReverse64(&b100000000000000000000000000000000000000ull)
Code: Select all
Function BitScanForward64(ByVal num as ulongint) as ulongint
asm
bsfq rax,[num]
mov [Function],rax
end asm
end function
Function BitScanReverse64(ByVal num as ulongint) as ulongint
asm
bsrq rax,[num]
mov [Function],rax
end asm
end function
print BitScanForward64(&h8000000000000000)
print BitScanReverse64(&h8000000000000000)
print BitScanForward64(&h1000000000000000)
print BitScanReverse64(&h1000000000000000)
print BitScanForward64(&h0000000000001000)
print BitScanReverse64(&h0000000000001000)
print BitScanForward64(&h000000000000FFFF)
print BitScanReverse64(&h000000000000FFFF)
sleep
Forward/reverse report the same position if there is only 1bit set (in the whole ULongInt).63
63
60
60
12
12
0
15
Code: Select all
''------------------------------------------------------------------------------
'' POPCNT instruction supported if for CPUID Function 1, bit 23 of ECX is set.
'' Note that CPUID will trash the callee-save register RBX.
''------------------------------------------------------------------------------
function HavePopCnt naked() as integer
asm
mov eax, 1
push rbx
cpuid
pop rbx
xor rax, rax
bt ecx, 23
jnc 0f
mov rax, -1
0:
ret
end asm
end function
''------------------------------------------------------------
'' This code adapted from the Wikipedia Hamming_weight page:
''------------------------------------------------------------
#define m1 &h5555555555555555
#define m2 &h3333333333333333
#define m4 &h0f0f0f0f0f0f0f0f
#define h01 &h0101010101010101
function PopCount(byval x as uinteger) as integer
''---------------------------------------------------------------
'' This uses fewer arithmetic operations than any other known
'' implementation on machines with fast multiplication.
'' It uses 12 arithmetic operations, one of which is a multiply.
''---------------------------------------------------------------
x -= (x shr 1) and m1 '' put count of each 2 bits into those 2 bits
x = (x and m2) + ((x shr 2) and m2) '' put count of each 4 bits into those 4 bits
x = (x + (x shr 4)) and m4 '' put count of each 8 bits into those 8 bits
return ((x * h01) shr 56) '' returns left 8 bits of x+(x<<8)+(x<<16)+(x<<24)+...
end function
''------------------------------------------------------------------------------
'' In case it's not apparent how a PopCnt function can be implemented with just
'' two instructions:
''
'' For the X64 calling convention the first four non-floating-point arguments
'' are passed in RCX, RDX, R8, and R9, and 64-bit integers are returned in RAX.
''------------------------------------------------------------------------------
function PopCnt naked (byval x as uinteger) as integer
asm
popcnt rax, rcx
ret
end asm
end function
dim as double t
dim as integer res1, res2, res3
for i as integer = 1 to 10
print PopCount(i);chr(9);
next
print
if HavePopCnt() then
for i as integer = 1 to 10
print PopCnt(i);chr(9);
next
end if
print
#define COUNT 100000000
t = timer
for i as integer = 1 to COUNT
PopCount(i)
next
print "PopCount (no return) :";timer-t
t = timer
for i as integer = 1 to COUNT
res1 += PopCount(i)
next
print "PopCount (return to sum) :";timer-t
t = timer
for i as integer = 1 to COUNT
res2 += PopCount(i)
next
print "PopCount (return to sum and display):";timer-t
print res2
if HavePopCnt() then
t = timer
for i as integer = 1 to COUNT
PopCnt(i)
next
print "PopCnt (no return) :";timer-t
t = timer
for i as integer = 1 to COUNT
res2 = PopCnt(i)
next
print "PopCnt (return) :";timer-t
end if
sleep
Code: Select all
1 1 2 1 2 2 3 1 2 2
1 1 2 1 2 2 3 1 2 2
PopCount (no return) : 1.697277971194126
PopCount (return to sum) : 2.032488533179276
PopCount (return to sum and display): 2.021490770974197
1314447116
PopCnt (no return) : 0.3232392119243741
PopCnt (return) : 0.4673032796708867
My quick search shows the POPCNT instruction being introduced in the 2007/2008 range (Intel Nehalem, AMD Barcelona) and X64 in the 2003/2004 range (AMD Opteron, Intel P4 Prescott).schooner wrote:The popcnt() example compiles okay, and without any special _FB_SSE directives. Do you think a CUPID test is necessary for 64-bit compilation? Wouldn't a 64-bit processor already have a resident popcnt() instruction?
It is specified in the X64 calling convention, see the comments I added some time after I posted the code (and after your reply).Also, how did you know that the "x as uinteger" would end up in the rcx register? Is this true for all naked functions? It is probably safe since the rcx register is considered destructible.
Code: Select all
#include "win/intrin.bi"
dim as ulong idx
dim index as ulong ptr = @idx
print _BitScanForward64(index, &h8000000000000000),idx
print _BitScanReverse64(index, &h8000000000000000),idx
print _BitScanForward64(index, &h0000000000001000),idx
print _BitScanReverse64(index, &h0000000000001000),idx
print _BitScanForward64(index, &h000000000000FFFF),idx
print _BitScanReverse64(index, &h000000000000FFFF),idx
print _BitScanForward64(index, 1),idx
print _BitScanReverse64(index, 0),idx
1 63
1 63
1 12
1 12
1 0
1 15
1 0
0 1244728
Code: Select all
''------------------------------------------------------------------------------
'' POPCNT instruction supported if for CPUID Function 1, bit 23 of ECX is set.
'' Note that CPUID will trash the callee-save register RBX.
''------------------------------------------------------------------------------
function HavePopCnt naked() as integer
asm
mov eax, 1
push rbx
cpuid
pop rbx
xor rax, rax
bt ecx, 23
jnc 0f
mov rax, -1
0:
ret
end asm
end function
dim as integer SSE = 0
SSE = HavePopCnt()
if SSE = 1 then
print "POPCNT supported ";SSE
else
print "POPCNT not supported ";SSE
end -1
end if
It should return -1 if POPCNT is supported, otherwise 0. Per the Intel instruction set reference available here, for CPUID function 1 (EAX=1), support for the POPCNT instruction is indicated by bit 23 of the feature information returned in ECX. If the value of bit 23 is 1, POPCNT is supported; if the value of bit 23 is 0, POPCNT is not supported. BT returns the value of the indexed bit in the carry flag, so the jnc performs the jump when the value of the indexed bit is 0. And in case this is not obvious, XOR RAX, RAX is a compact (as in minimum instruction length) method of loading zero into rax.schooner wrote: I ran the above code on different machines, but it returns -1 no matter what hardware it is tested with. What value should it return?
Thank you, that makes some sense. The -1 result is also similar to the FreeBasic Bit() syntax which also returns -1 if the result is positive. What is the reasoning for the -1?MichaelW wrote:It should return -1 if POPCNT is supported, otherwise 0. Per the Intel instruction set reference available here, for CPUID function 1 (EAX=1), support for the POPCNT instruction is indicated by bit 23 of the feature information returned in ECX. If the value of bit 23 is 1, POPCNT is supported; if the value of bit 23 is 0, POPCNT is not supported. BT returns the value of the indexed bit in the carry flag, so the jnc performs the jump when the value of the indexed bit is 0. And in case this is not obvious, XOR RAX, RAX is a compact (as in minimum instruction length) method of loading zero into rax.schooner wrote: I ran the above code on different machines, but it returns -1 no matter what hardware it is tested with. What value should it return?
-1 = TRUE -- 0 = FALSE (definition of FB-Boolean var.) as from version >= 1.04.0schooner wrote:What is the reasoning for the -1?