64-bit op codes in 32-bit OS

Re: 64-bit op codes in 32-bit OS

Post by fxm »

@Jeff,
fzabkar wrote: Jun 13, 2022 6:04 ....
It appears that "qword" is an illegal variable name, probably because it is reserved.

Post by SARG »

In fact it's not really a problem of a reserved/illegal word in FreeBASIC, but rather in the assembler, which treats qword as its own size keyword.

'qword = &H8877665544332211' is compiled and executed correctly, but

'movq mm0, [qword]' is assembled as 'movq mm0,QWORD PTR ds:0x8'

So it obviously fails when executed, because address 0x8 is not accessible.
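
A minimal sketch of the workaround, assuming the default gas backend: give the variable any name that is not an assembler keyword. The names value64 and CopyViaMmx are hypothetical, chosen only for illustration; they do not come from the thread.

```freebasic
' Sketch only: value64 and CopyViaMmx are made-up names for illustration.
' Because the variable is not called qword, the assembler cannot confuse it
' with its own QWORD size keyword.
Function CopyViaMmx( ByVal value64 As ULongInt ) As ULongInt
    Asm
        movq mm0, [value64]     ' load the 64-bit argument
        movq [Function], mm0    ' store it as the function result
        emms                    ' leave MMX mode so the x87 FPU is usable again
    End Asm
End Function

Print Hex( CopyViaMmx( &H8877665544332211 ) )
```

The function only copies its argument through mm0, so the printed value should match the input if the operand was resolved to the variable rather than to a fixed address.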

Post by srvaldez »

@fzabkar
I would add emms after storing the result to Function. emms is necessary when using the MMX registers because they are mapped onto the FPU registers; if you don't execute emms after using the MMX registers, the FPU registers are left in a mess.
see https://www.felixcloutier.com/x86/emms

Post by fzabkar »

srvaldez wrote: Jun 13, 2022 15:56 @fzabkar
I would add emms after storing the result to Function, ...
I guess that means that there would be no real speed benefit. Well, at least I learned something.

Post by TeeEmCee »

I'm no SIMD expert, but AFAIK there's no good reason to use MMX when SSE2 or later is available (pshufb itself needs SSSE3).

Use the movsd instruction (move scalar double-precision with zero-extension, meaning move a single 64-bit value into the lower half of a register). It works with unaligned memory.

Code: Select all

Function BytSwap8( Byval int64 As uLongInt ) As uLongInt

    Dim Swap8Mask As Const uLongInt = &H0607040502030001
    
    ASM
        movsd xmm1, [Swap8Mask]
        movsd xmm0, [int64]
        pshufb xmm0, xmm1
        movsd [Function], xmm0
    End ASM

End Function

Post by fzabkar »

Thanks, but the SSE2+ code takes twice as long to execute as my original 32-bit ASM code (on my Core 2 Duo).

Post by TeeEmCee »

Yes, it's very slow (about 5x slower for me), apparently because of the unaligned memory accesses. Unfortunately FB doesn't have a way to set the alignment of variables.

I thought you were just interested and didn't really care about speed, since you said all the data is streaming from disk. I don't think you'll beat the version with bswap for speed. In fact, if I comment out the bswaps the speedup is negligible; I can't even reliably measure the difference. All the time is spent on the overhead of calling a function, including moving arguments and results to/from the stack.
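
For reference, a bswap-based swap could look roughly like the sketch below. This is not fzabkar's original routine (which is not quoted here); BytSwap8_bswap is a hypothetical name, and the 32-bit branch assumes the default gas backend, where [v+4] and [Function+4] address the high dword.

```freebasic
' Sketch only, not the original routine from the thread.
' BytSwap8_bswap is a made-up name for illustration.
Function BytSwap8_bswap( ByVal v As ULongInt ) As ULongInt
    #Ifdef __FB_64BIT__
        Asm
            mov rax, [v]            ' whole 64-bit value
            bswap rax               ' reverse all 8 bytes at once
            mov [Function], rax
        End Asm
    #Else
        Asm
            mov eax, [v]            ' low dword
            mov edx, [v+4]          ' high dword
            bswap eax
            bswap edx
            mov [Function], edx     ' swapped high dword becomes the new low dword
            mov [Function+4], eax   ' swapped low dword becomes the new high dword
        End Asm
    #Endif
End Function

Print Hex( BytSwap8_bswap( &H8877665544332211 ) )
```

Even with only a handful of instructions in the body, each call still pays for passing the argument and returning the result, which is the overhead described above.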

Post by SARG »

While testing the speed of the 2 methods, I can confirm that srvaldez is totally right: not using emms can cause weird issues.
Test with this code and look at the value of tt printed the second time:

Code: Select all

Function BytSwap8_mm( Byval int64 As uLongInt ) As uLongInt
    Dim Swap8Mask As Const uLongInt = &H0001020304050607
    ASM
        movq mm1, [Swap8Mask]
        movq mm0, [int64]
        pshufb mm0, mm1         ' shuffle the bytes according to Swap8Mask
        movq [Function], mm0
        'emms                   ' deliberately left out to show the problem
    End ASM
End Function

Dim As Double tt = Timer
Print tt
BytSwap8_mm(&H8877665544332211)
Print tt                        ' second print, after MMX was used without emms

Sleep

I tested the speed of mm (with emms enabled) vs xmm:
gas 32-bit --> mm = 10.56s / xmm = 8.29s
gas 64-bit --> mm = 3.53s / xmm = 1.98s

Code: Select all

Function BytSwap8_xmm( Byval int64 As uLongInt ) As uLongInt

    Dim Swap8Mask As Const uLongInt = &H0001020304050607

    ASM
        movsd xmm1, [Swap8Mask]
        movsd xmm0, [int64]
        pshufb xmm0, xmm1
        movsd [Function], xmm0
    End ASM

End Function

Function BytSwap8_mm( Byval int64 As uLongInt ) As uLongInt

    Dim Swap8Mask As Const uLongInt = &H0001020304050607

    ASM
        movq mm1, [Swap8Mask]
        movq mm0, [int64]
        pshufb mm0, mm1
        movq [Function], mm0
        emms
    End ASM

End Function

Dim As Double tt = Timer

For i As Integer = 1 To 500000000
    BytSwap8_mm(&H8877665544332211)
Next
Print Timer - tt

tt = Timer
For i As Integer = 1 To 500000000
    BytSwap8_xmm(&H8877665544332211)
Next
Print Timer - tt

Sleep

Post by marcov »

AFAIK the xmm (SSE2) registers don't overlay the coprocessor.

Only MMX and 3DNow! (the mm* registers, without x) did that. I'm not 100% sure about SSE-1, it was before I got interested in SIMD, but IIRC its registers are also separate (I find webpages about increasing speed by interleaving x87 coprocessor and SSE1 code).

The unaligned penalties should mostly go away for CPUs released after 2010. Haswell (4th generation) and later, however, suffer from a shuffle bottleneck. Ivy Bridge can issue more 128-bit shuffles than Haswell (but Haswell can do 256-bit AVX2 with ymm).

Note also that XMM can do two swaps per pshufb. Coding a whole loop in assembler (giving it a pointer and a count) is advisable for bulk endian swapping.
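
The pointer-and-count interface could look roughly like the sketch below. It is written in plain FreeBASIC rather than as an assembler or pshufb loop, so it only illustrates how one call can cover a whole buffer; SwapBuffer8 is a hypothetical name.

```freebasic
' Sketch only: SwapBuffer8 is a made-up name, and the loop body is plain
' FreeBASIC rather than the assembler loop suggested in the thread.
Sub SwapBuffer8( ByVal p As ULongInt Ptr, ByVal count As Integer )
    For i As Integer = 0 To count - 1
        Dim As ULongInt v = p[i]
        ' swap adjacent bytes, then adjacent 16-bit words, then the 32-bit halves
        v = ((v And &H00FF00FF00FF00FF) Shl 8)  Or ((v Shr 8)  And &H00FF00FF00FF00FF)
        v = ((v And &H0000FFFF0000FFFF) Shl 16) Or ((v Shr 16) And &H0000FFFF0000FFFF)
        p[i] = (v Shl 32) Or (v Shr 32)
    Next
End Sub

Dim As ULongInt buf(0 To 1) = { &H8877665544332211, &H0102030405060708 }
SwapBuffer8( @buf(0), 2 )
Print Hex( buf(0) ), Hex( buf(1) )
```

The function-call cost is paid once per buffer instead of once per value; an assembler version would keep the same signature and replace the loop body.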

Post by TeeEmCee »

Yes, SSE and SSE2 registers are the same (xmm*); they don't and never did overlay the x87 registers. In hindsight, overlaying those registers is regarded as a really dumb idea that resulted in the obsolescence of MMX and its replacement with SSE. I believe the reason they did that was not to save transistors but so that MMX could be used without requiring an OS update (to save/restore the MMX registers when switching processes).