64-bit inline assembler

General FreeBASIC programming questions.
srvaldez
Posts: 2253
Joined: Sep 25, 2005 21:54

64-bit inline assembler

Postby srvaldez » Sep 27, 2016 16:16

just some observations; feel free to correct my mistakes.
64-bit Intel asm works OK if you use -O2 or less; sometimes -O0 may be needed.
here's a quick and dirty example:

Code:

Sub iPower(Byref result As double, Byref x As double, Byval e As Integer)
    Asm
        mov rax,[e]
        mov rbx,rax
        ipower_absrax:
        neg rax
        js  ipower_absrax
        fld1          '  z=1.0
        fld1
        mov rdx,[x]
        fld qword ptr [rdx] 'load st0 with x
        cmp rax,0     'while e>0
        ipower_while1:
        jle ipower_wend1
        ipower_while2:
        bt rax,0      'test for odd/even
        jc ipower_wend2      'jump if odd
  '                while e is even
        sar rax,1     'rax=rax/2
        fmul st(0),st(0)  'x=x*x
        jmp ipower_while2
        ipower_wend2:
        sub rax,1
        fmul st(1),st(0)  'z=z*x 'st1=st1*st0
        jmp ipower_while1
        ipower_wend1:
        fstp st(0)      'cleanup fpu stack
        fstp st(1)      '"       "   "
        cmp rbx,0     'test to see if e<0
        jge ipower_noinv     'skip reciprocal if not less than 0
  '                if e<0 take reciprocal
        fld1
        fdivrp st(1),st(0)
        ipower_noinv:
        mov rax,[result]
        fstp qword ptr [rax] 'store z (st0)
        'the fpu stack is already empty here (3 loads, 3 pops), so no extra fstp is needed
    End Asm
End Sub

dim as double x, y
x=2
iPower(y,x,3)
print y
iPower(y,x,-3)
print y

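geany compile command: fbc -w all "%f" -asm intel -gen gcc -Wc -O2
on Linux, if you use -O3 for example, you may get assembler errors like 'symbol already defined'. This is most likely because at higher optimization levels gcc may inline or clone the Sub, which duplicates the asm block together with its labels (ipower_absrax, ipower_while1, and so on).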
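When the asm is written for GCC directly (plain C, not through fbc), the usual way around duplicated labels is GNU-style numeric local labels: `1:` can be defined any number of times in one file and is referenced as `1b` (backwards) or `2f` (forwards), so inlined or cloned copies don't clash. GCC's `%=` placeholder is the other documented fix. A minimal sketch, assuming GCC or Clang on x86-64 with the default AT&T syntax; `bit_count` is a hypothetical name and the loop is intentionally naive, just so there are labels to duplicate:

```c
#include <assert.h>

/* Counts the set bits of v with a simple asm loop. The numeric labels
   1: and 2: may appear in every inlined copy without redefinition errors. */
static inline unsigned long long bit_count(unsigned long long v)
{
    unsigned long long n = 0;
#if defined(__x86_64__)
    __asm__("test %1, %1\n\t"
            "jz 2f\n"             /* v == 0: nothing to count          */
            "1:\n\t"
            "shr $1, %1\n\t"      /* CF = lowest bit, shifted out      */
            "adc $0, %0\n\t"      /* n += CF                           */
            "test %1, %1\n\t"
            "jnz 1b\n"            /* loop while bits remain            */
            "2:"
            : "+r"(n), "+r"(v)
            :
            : "cc");
#else
    for (; v != 0; v >>= 1)       /* portable fallback on other CPUs   */
        n += v & 1;
#endif
    return n;
}
```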
D.J.Peters
Posts: 8019
Joined: May 28, 2005 3:28

Re: 64-bit inline assembler

Postby D.J.Peters » Sep 27, 2016 17:42

Sub iPower(Byref result As double, Byref x As double, Byval e As Integer)

Why do you read the parameters from the slow stack frame in memory?

Aren't the parameters passed in registers on 64-bit?

the pointer to result is in RCX (byref)
the pointer to x is in RDX (byref)
the value of e is in R8 (byval)

Let me know if I'm wrong.

Can you post an example of how to access local and global variables with the 64-bit inline assembler, please?

Joshy
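The answer depends on the OS. Win64 assigns the first four parameters to slots by position (RCX/XMM0, RDX/XMM1, R8/XMM2, R9/XMM3), while the System V ABI used on Linux and macOS hands out RDI, RSI, RDX, RCX, R8, R9 and XMM0-XMM7 separately for integer and floating-point parameters. The C table below is only a sketch of that rule for plain integer/pointer and double parameters (no structs, no varargs); `arg_location`, `abi_t` and the signature string are hypothetical. These are the registers on entry only; inside a non-naked function the compiler is free to move the values before the asm runs.

```c
#include <assert.h>
#include <string.h>

typedef enum { ABI_WIN64, ABI_SYSV } abi_t;

/* sig: one char per parameter, 'i' = integer/pointer (BYREF counts as a
   pointer), 'f' = double. n: 1-based parameter index.
   Returns the register holding parameter n on entry, or "stack". */
const char *arg_location(abi_t abi, const char *sig, int n)
{
    static const char *const win_int[]  = { "RCX", "RDX", "R8", "R9" };
    static const char *const win_flt[]  = { "XMM0", "XMM1", "XMM2", "XMM3" };
    static const char *const sysv_int[] = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };
    static const char *const sysv_flt[] = { "XMM0", "XMM1", "XMM2", "XMM3",
                                            "XMM4", "XMM5", "XMM6", "XMM7" };

    assert(n >= 1 && (size_t)n <= strlen(sig));
    int is_float = sig[n - 1] == 'f';

    if (abi == ABI_WIN64) {
        /* Win64: the slot is the position; the type only picks RCX.. vs XMM.. */
        if (n > 4)
            return "stack";
        return is_float ? win_flt[n - 1] : win_int[n - 1];
    }

    /* System V: integer and float registers are counted independently. */
    int ints = 0, flts = 0;
    for (int k = 0; k < n - 1; k++) {
        if (sig[k] == 'f')
            flts++;
        else
            ints++;
    }
    if (is_float)
        return flts < 8 ? sysv_flt[flts] : "stack";
    return ints < 6 ? sysv_int[ints] : "stack";
}
```

For the iPower Sub (byref result, byref x, byval e, i.e. "ppi"), Win64 gives RCX, RDX, R8 as listed above, and System V gives RDI, RSI, RDX, which are the registers the Mac version of iPower in this thread reads.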
srvaldez
Posts: 2253
Joined: Sep 25, 2005 21:54

Re: 64-bit inline assembler

Postby srvaldez » Sep 27, 2016 18:49

you are probably right about the parameters and registers. I'm no expert on this, but I don't think you can trust the registers to hold the parameters as you would expect unless you compile with -O0, because gcc will more than likely make optimizations; I found that out trying to use inline asm on my Mac.
maybe MichaelW will see this thread and post some good examples (he's the expert), but here's a very simple example:

Code:

dim shared as short ten=10
function TenPow(Byval x As double) as double
   dim as double y
   'dim as short ten=10
    Asm
        fld qword ptr [x]
        fild word ptr [ten]
        fyl2x
        fld st(0)
        frndint
        fsub st(1), st(0)
        fld1
        fscale
        fxch
        fxch st(2)
        f2xm1
        fld1
        faddp st(1), st(0)
        fmulp st(1), st(0)
        fstp st(1)
        fstp qword ptr [Function]
        'or you could return in y
        'fstp qword ptr [y]
    End Asm
    'return y
End Function

print TenPow(.5)
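For reference, the x87 sequence in TenPow evaluates 10^x as a power of two. The exponent has to be split because F2XM1 only accepts arguments between -1 and +1, while FSCALE multiplies by an integer power of two. Written out, assuming ten = 10 and the default round-to-nearest mode for FRNDINT:

```latex
\begin{aligned}
t      &= x \log_2 10                 &&\text{fild ten; fyl2x computes } \mathrm{st}(1)\cdot\log_2 \mathrm{st}(0)\\
n      &= \operatorname{round}(t)     &&\text{fld st(0); frndint}\\
f      &= t - n,\quad |f| \le \tfrac{1}{2} &&\text{fsub st(1), st(0)}\\
2^{n}  &= 1 \cdot 2^{n}               &&\text{fld1; fscale}\\
2^{f}  &= \left(2^{f} - 1\right) + 1  &&\text{f2xm1; fld1; faddp}\\
10^{x} &= 2^{t} = 2^{n} \cdot 2^{f}   &&\text{fmulp}
\end{aligned}
```

For x = 0.5: t ≈ 1.660964, n = 2, f ≈ -0.339036, 2^f ≈ 0.790569, so the result is 4 × 0.790569 ≈ 3.162278, i.e. the square root of 10.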
marcov
Posts: 2839
Joined: Jun 16, 2005 9:45
Location: Eindhoven, NL

Re: 64-bit inline assembler

Postby marcov » Sep 27, 2016 18:54

D.J.Peters wrote:Can you post an example how to access local and global vars with 64-bit inline assembler please ?


(you might have to use RIP-relative addressing for globals)
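With GCC-style extended asm (plain C here, not fbc's Asm block), locals and globals don't have to be addressed by hand: you list them as operands and the compiler substitutes a register or a memory reference, typically a RIP-relative one for the global. A minimal sketch, assuming GCC or Clang on x86-64 and AT&T syntax; `g_counter` and `add_local_and_global` are hypothetical names:

```c
#include <assert.h>

/* Hypothetical global, standing in for an FB "dim shared" variable. */
long long g_counter = 10;

/* Adds a local and a global inside inline asm without hard-coding any
   register or addressing mode: the compiler fills in %0, %1 and %2. */
long long add_local_and_global(long long local)
{
    long long result;
#if defined(__x86_64__)
    __asm__("mov %1, %0\n\t"       /* result = local (in a register)    */
            "add %2, %0"           /* result += g_counter (from memory) */
            : "=&r"(result)        /* early clobber: written before %2 is read */
            : "r"(local), "m"(g_counter)
            : "cc");
#else
    result = local + g_counter;    /* portable fallback on other CPUs */
#endif
    return result;
}
```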
srvaldez
Posts: 2253
Joined: Sep 25, 2005 21:54

Re: 64-bit inline assembler

Postby srvaldez » Sep 28, 2016 2:29

here's the first example adapted for the Mac

Code:

Sub iPower(Byref result As double, Byref x As double, Byval e As Integer)
    Asm
        ".intel_syntax noprefix"
        "push rax"
        "push rbx"
        "mov rax,rdx"
        "mov rbx,rax"
        "ipower_absrax:"
        "neg rax"
        "js  ipower_absrax"
        "fld1"          '  z=1.0
        "fld1"
        "fld qword ptr [rsi]" 'load st0 with x
        "cmp rax,0"     'while e>0
        "ipower_while1:"
        "jle ipower_wend1"
        "ipower_while2:"
        "bt rax,0"      'test for odd/even
        "jc ipower_wend2"      'jump if odd
  '                while e is even
        "sar rax,1"     'rax=rax/2
        "fmul st(0),st(0)"  'x=x*x
        "jmp ipower_while2"
        "ipower_wend2:"
        "sub rax,1"
        "fmul st(1),st(0)"  'z=z*x 'st1=st1*st0
        "jmp ipower_while1"
        "ipower_wend1:"
        "fstp st(0)"      'cleanup fpu stack
        "fstp st(1)"      '"       "   "
        "cmp rbx,0"     'test to see if e<0
        "jge ipower_noinv"     'skip reciprocal if not less than 0
  '                if e<0 take reciprocal
        "fld1"
        "fdivrp st(1),st(0)"
        "ipower_noinv:"
        "fstp qword ptr [rdi]" 'store z (st0)
        'the fpu stack is already empty here (3 loads, 3 pops), so no extra fstp is needed
        "pop rbx"
        "pop rax"
        ".att_syntax prefix"
    End Asm
End Sub

dim as double x, y
x=2
iPower(y,x,3)
print y
iPower(y,x,-3)
print y

geany compile command: fbc -w all -asm att "%f" -gen gcc -Wc -O2
note that on the Mac there's no easy way to access FB variables, which makes inline asm impractical.
srvaldez
Posts: 2253
Joined: Sep 25, 2005 21:54

Re: 64-bit inline assembler

Postby srvaldez » Sep 28, 2016 17:19

I know that the Mac is not a supported platform, but naked functions fail there, for example:

Code:

function dbl naked (Byval x As double) as double
    Asm
        "addsd  %xmm0, %xmm0"
        "ret"
    End Asm
End Function

print dbl(5)

relevant asm code

Code:

   .text
   .globl DBL
   DBL:
   addsd  %xmm0, %xmm0
   ret
   
   .text
   .globl _main
_main:
   ...
   call   _DBL

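the main program calls a decorated function (_DBL, with the leading underscore that Mach-O uses for C symbols), whereas the naked function was emitted undecorated (DBL), so the two names don't match.
btw, it works OK on Windows and Linux.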
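Outside fbc, this kind of mismatch can be avoided by deriving the symbol name from the compiler's predefined `__USER_LABEL_PREFIX__` macro (empty on ELF and Win64, `_` on Mach-O) instead of hard-coding it. A minimal sketch, assuming GCC or Clang on x86-64; `dbl`, `STR`, `STR2` and `SYM` are hypothetical names. The body is the same addsd/ret pair, which works under both the System V and Win64 conventions because the first double argument and the double return value both live in XMM0:

```c
#include <assert.h>

/* SYM(dbl) becomes "dbl" on ELF/Win64 and "_dbl" on Mach-O, because
   __USER_LABEL_PREFIX__ expands to nothing or to _ respectively. */
#define STR2(x) #x
#define STR(x) STR2(x)
#define SYM(name) STR(__USER_LABEL_PREFIX__) #name

double dbl(double x);   /* implemented in assembly below on x86-64 */

#if defined(__x86_64__)
/* A "naked" function written as file-scope asm (basic asm, so % is
   literal): x arrives in xmm0 and the result is returned in xmm0. */
__asm__(".text\n"
        ".globl " SYM(dbl) "\n"
        SYM(dbl) ":\n\t"
        "addsd %xmm0, %xmm0\n\t"
        "ret\n");
#else
double dbl(double x) { return x + x; }   /* portable fallback */
#endif
```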
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Re: 64-bit inline assembler

Postby MichaelW » Oct 02, 2016 11:09

I didn't have time to do everything that I wanted, but I did verify that for 64-bit code, and using compiler Version 1.05.0 (01-31-2016), built for win64 (64bit), naked functions conform to the 64-bit calling convention.

Edit: Added code to do RIP-relative access to shared variables.

Edit2: Sorry, the above was a last-minute change and is not accurate: the shared variables are accessed as direct memory operands, where the address of the variable is encoded into the accessing instruction. There are examples of RIP-relative addressing elsewhere in the compiler's assembly output. IIRC, RIP-relative addressing is preferred because the encoding is smaller.

Code:

''----------------------------------------------------------------------
'' The first four integer/floating-point arguments, taken in left to
'' right order*, should be passed in RCX/XMM0L**, RDX/XMM1L**,
'' R8/XMM2L**, and R9/XMM3L**, with any further arguments, taken in
'' right to left order*, passed on the stack.
''
'' * As they are listed in the function definition or prototype.
''
'' ** The choice of register is determined by the operand type, with
'' the register that does not match the type ignored.
''
'' Integer and pointer values that fit in 64 bits are returned in RAX.
''
'' Floating-point values are returned in XMM0.
''
''----------------------------------------------------------------------

''----------------------------------------------------------------------
'' On entry to our functions, the stack layout is:
''    rsp+48  arg6
''    rsp+40  arg5
''    rsp+32  arg4  spill
''    rsp+24  arg3  spill
''    rsp+16  arg2  spill
''    rsp+8   arg1  spill
''    rsp     return address
''----------------------------------------------------------------------

function Test1 naked ( arg1 as integer, _
                       arg2 as integer, _
                       arg3 as integer, _
                       arg4 as integer, _
                       arg5 as integer, _
                       arg6 as integer ) as integer
    asm
        xor     rax, rax
        add     rax, rcx
        add     rax, rdx
        add     rax, r8
        add     rax, r9
        add     rax, [rsp+40]
        add     rax, [rsp+48]
        ret
    end asm
end function

''----------------------------------------------------------------------

function Test2 naked ( arg1 as double, _
                       arg2 as double, _
                       arg3 as double, _
                       arg4 as double, _
                       arg5 as double, _
                       arg6 as double ) as double
    asm
        addsd     xmm0, xmm1
        addsd     xmm0, xmm2
        addsd     xmm0, xmm3
        addsd     xmm0, [rsp+40]
        addsd     xmm0, [rsp+48]
        ret
    end asm
end function

''----------------------------------------------------------------------

dim shared as integer a = 1, b = 2, c = 3

function Test3 naked ( ) as integer
    asm
        xor       rax, rax
        add       rax, a
        add       rax, b
        add       rax, c
        ret
    end asm
end function

''----------------------------------------------------------------------

print Test1(1,2,3,4,5,6)

print Test2(1,2,3,4,5,6)

print Test3()

sleep


Code:

   .file   "Test.c"
   .intel_syntax noprefix
   .data
   .align 8
A$:
   .quad   1
   .align 8
B$:
   .quad   2
   .align 8
C$:
   .quad   3
/APP
   .text
   .globl TEST1
   TEST1:
   xor     rax, rax
   add     rax, rcx
   add     rax, rdx
   add     rax, r8
   add     rax, r9
   add     rax, [rsp+40]
   add     rax, [rsp+48]
   ret
   .text
   .globl TEST2
   TEST2:
   addsd     xmm0, xmm1
   addsd     xmm0, xmm2
   addsd     xmm0, xmm3
   addsd     xmm0, [rsp+40]
   addsd     xmm0, [rsp+48]
   ret
   .text
   .globl TEST3
   TEST3:
   xor       rax, rax
   add       rax, A$
   add       rax, B$
   add       rax, C$
   ret
   .def   __main;   .scl   2;   .type   32;   .endef
/NO_APP
   .text
   .globl   main
   .def   main;   .scl   2;   .type   32;   .endef
main:
   push   rbp
   mov   rbp, rsp
   sub   rsp, 80
   mov   DWORD PTR 16[rbp], ecx
   mov   QWORD PTR 24[rbp], rdx
   call   __main
   mov   DWORD PTR -28[rbp], 0
   mov   rax, QWORD PTR 24[rbp]
   mov   r8d, 0
   mov   rdx, rax
   mov   ecx, DWORD PTR 16[rbp]
   call   fb_Init
.L2:
   mov   QWORD PTR 40[rsp], 6
   mov   QWORD PTR 32[rsp], 5
   mov   r9d, 4
   mov   r8d, 3
   mov   edx, 2
   mov   ecx, 1
   call   TEST1
   mov   QWORD PTR -8[rbp], rax
   mov   rax, QWORD PTR -8[rbp]
   mov   r8d, 1
   mov   rdx, rax
   mov   ecx, 0
   call   fb_PrintLongint
   movsd   xmm3, QWORD PTR .LC0[rip]
   movsd   xmm2, QWORD PTR .LC1[rip]
   movsd   xmm1, QWORD PTR .LC2[rip]
   movsd   xmm0, QWORD PTR .LC3[rip]
   movsd   QWORD PTR 40[rsp], xmm0
   movsd   xmm0, QWORD PTR .LC4[rip]
   movsd   QWORD PTR 32[rsp], xmm0
   movsd   xmm0, QWORD PTR .LC5[rip]
   call   TEST2
   movq   rax, xmm0
   mov   QWORD PTR -16[rbp], rax
   movsd   xmm0, QWORD PTR -16[rbp]
   mov   r8d, 1
   movapd   xmm1, xmm0
   mov   ecx, 0
   call   fb_PrintDouble
   call   TEST3
   mov   QWORD PTR -24[rbp], rax
   mov   rax, QWORD PTR -24[rbp]
   mov   r8d, 1
   mov   rdx, rax
   mov   ecx, 0
   call   fb_PrintLongint
   mov   ecx, -1
   call   fb_Sleep
.L3:
   mov   ecx, 0
   call   fb_End
   mov   eax, DWORD PTR -28[rbp]
   leave
   ret
   .section .rdata,"dr"
   .align 8
.LC0:
   .long   0
   .long   1074790400
   .align 8
.LC1:
   .long   0
   .long   1074266112
   .align 8
.LC2:
   .long   0
   .long   1073741824
   .align 8
.LC3:
   .long   0
   .long   1075314688
   .align 8
.LC4:
   .long   0
   .long   1075052544
   .align 8
.LC5:
   .long   0
   .long   1072693248
   .ident   "GCC: (x86_64-win32-sjlj-rev0, Built by MinGW-W64 project) 5.2.0"
   .def   fb_Init;   .scl   2;   .type   32;   .endef
   .def   TEST1;   .scl   2;   .type   32;   .endef
   .def   fb_PrintLongint;   .scl   2;   .type   32;   .endef
   .def   TEST2;   .scl   2;   .type   32;   .endef
   .def   fb_PrintDouble;   .scl   2;   .type   32;   .endef
   .def   TEST3;   .scl   2;   .type   32;   .endef
   .def   fb_Sleep;   .scl   2;   .type   32;   .endef
   .def   fb_End;   .scl   2;   .type   32;   .endef


Regarding code that runs OK with no compiler optimization but fails with optimization: in my experience the problem is usually a failure to follow the calling convention. For example, I recently created a set of 64-bit clock-cycle count macros for GCC that use inline assembly. As is the norm for cycle-count code, the macros use CPUID as a "serializing" instruction. One unfortunate side effect of CPUID is that it overwrites EBX, and in 64-bit mode a write to EBX also zeroes the upper half of RBX, a callee-saved register. Since preserving RBX around the CPUID instruction would place a POP RBX instruction after the CPUID instruction, "polluting" the cycle count somewhat, I avoided preserving RBX. The code worked fine with no compiler optimizations, but at any level of optimization it would trigger exceptions, apparently because the optimized code depended on RBX being preserved, as per the calling convention. Compiling with no optimization would correct the immediate problem, but it is not very practical, because code compiled with no optimization is effectively optimized for debugging and generally executes much, much slower than optimized code.

There is a Microsoft calling-convention reference here, and a more compact one here.
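In GCC-style extended asm the usual way to avoid that breakage is to tell the compiler which registers the instruction writes instead of saving them by hand: listing EBX as an output makes the compiler preserve RBX itself, typically with a push/pop in the function's prologue and epilogue rather than right after CPUID. A minimal sketch, assuming GCC or Clang on x86-64; `cpu_vendor` is a hypothetical name, and it only reads the vendor string from CPUID leaf 0 rather than doing any cycle counting:

```c
#include <assert.h>
#include <string.h>

/* Runs CPUID leaf 0 and returns the highest supported leaf; stores the
   12-character vendor string (EBX, EDX, ECX order) in vendor.
   On non-x86-64 targets it returns 0 and an empty string. */
unsigned cpu_vendor(char vendor[13])
{
    unsigned max_leaf = 0, b = 0, c = 0, d = 0;
#if defined(__x86_64__)
    /* Every register CPUID writes is an output, so the compiler knows
       RBX is clobbered and saves it as the calling convention requires. */
    __asm__ volatile("cpuid"
                     : "=a"(max_leaf), "=b"(b), "=c"(c), "=d"(d)
                     : "0"(0u), "2"(0u));   /* leaf 0, subleaf 0 */
#endif
    memcpy(vendor + 0, &b, 4);
    memcpy(vendor + 4, &d, 4);
    memcpy(vendor + 8, &c, 4);
    vendor[12] = '\0';
    return max_leaf;
}
```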
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Re: 64-bit inline assembler

Postby MichaelW » Oct 19, 2016 13:47

On my Windows 10 notebook I can compile either of the apps with -gen gcc and -O 3 and they run with no problems.
srvaldez
Posts: 2253
Joined: Sep 25, 2005 21:54

Re: 64-bit inline assembler

Postby srvaldez » Oct 19, 2016 15:25

hi MichaelW,
you are probably right that the problem with gcc optimization comes from not properly following the calling convention. However, I did a small test on my Mac where the parameters of a function were used only in the inline asm portion, and it failed when optimized.
the test simply copied the value of the first byref parameter to the second byref parameter.
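Assuming the failure came from the asm reading RDI/RSI directly (once the function is inlined or optimized, those registers need not hold the parameters any more), a version that hands the pointed-to values to the asm as memory operands should behave the same at every -O level. A minimal C sketch of that test, assuming GCC or Clang on x86-64 with AT&T syntax; `copy_double` is a hypothetical name:

```c
#include <assert.h>

/* Copies *src to *dst through xmm0. The "m" operands let the compiler
   supply the addresses, so nothing depends on which registers the
   parameters happened to arrive in. */
void copy_double(const double *src, double *dst)
{
#if defined(__x86_64__)
    __asm__("movsd %1, %%xmm0\n\t"   /* load *src  */
            "movsd %%xmm0, %0"       /* store *dst */
            : "=m"(*dst)
            : "m"(*src)
            : "xmm0");
#else
    *dst = *src;                     /* portable fallback */
#endif
}
```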
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Re: 64-bit inline assembler

Postby MichaelW » Oct 19, 2016 19:11

This is the first example, with minimal corrections to handle the parameters and return value as per the calling convention, but more changes will be needed to fully conform, because per the calling convention "All floating point operations are done using the 16 XMM registers."

Code:

function iPower naked ( Byval x As double, Byval e As Integer) as double
    Asm
        push    rbx         '' preserve non-volatile rbx
        '''mov rax,[e]
        mov rax, rdx
        mov rbx, rax
    ipower_absrax:
        neg rax
        js ipower_absrax
        fld1 '  z=1.0
        fld1
        '''mov rdx,[x]
        movq rdx, xmm0
        push rdx
        fld qword ptr [rsp] 'load st0 with x
        pop rdx
        cmp rax,0           'while e>0
    ipower_while1:
        jle ipower_wend1
    ipower_while2:
        bt rax,0            'test for odd/even
        jc ipower_wend2     'jump if odd
                            'while e is even
        sar rax,1           'rax=rax/2
        fmul st(0),st(0)    'x=x*x
        jmp ipower_while2
    ipower_wend2:
        sub rax,1
        fmul st(1),st(0)    'z=z*x 'st1=st1*st0
        jmp ipower_while1
    ipower_wend1:
        fstp st(0)          'cleanup fpu stack
        fstp st(1)          '"       "   "
        cmp rbx,0           'test to see if e<0
        jge ipower_noinv    'skip reciprocal if not less than 0
                            'if e<0 take reciprocal
        fld1
        fdivrp st(1),st(0)
    ipower_noinv:
        '''mov rax,[result]
        ''sub     rsp, 16      '' allocate buffer from stack
        ''                     '' maintaining 16-byte alignment   
        sub     rsp, 8      '' allocate buffer from stack
        '''fstp qword ptr [rax] 'store z (st0)
        fstp qword ptr [rsp] '' store z to buffer
        movq    xmm0, [rsp]  '' store buffer in return register
        add     rsp, 8      '' free buffer
        '' the fpu stack is already empty here (3 loads, 3 pops), so no extra fstp is needed
        pop     rbx         '' recover non-volatile rbx
        ret
    End Asm
End function

dim as double x, y
x=2
print iPower(x,3)
''print y
print iPower(x,-3)
''print y
sleep

Code:

 8
 0.125
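For comparison, here is the same square-and-multiply loop in plain C, a sketch rather than a drop-in replacement: the compiler then takes care of the calling convention, and on x86-64 GCC normally does double arithmetic in XMM registers, which is what the Win64 convention quoted above asks for. `ipow` is a hypothetical name. The assert documents an edge case of the asm version: its neg/js loop never terminates for the most negative 64-bit exponent, because negating that value overflows back to itself.

```c
#include <assert.h>
#include <limits.h>

/* x raised to the integer power e by repeated squaring, mirroring the
   structure of the FB iPower asm. */
double ipow(double x, long long e)
{
    assert(e != LLONG_MIN);          /* |e| must be representable */

    unsigned long long n = (unsigned long long)(e < 0 ? -e : e);
    double z = 1.0;                  /* z = 1.0               */

    while (n > 0) {                  /* while e > 0           */
        while ((n & 1) == 0) {       /* while e is even       */
            n >>= 1;                 /*   e = e / 2           */
            x *= x;                  /*   x = x * x           */
        }
        n -= 1;                      /* e is odd: e = e - 1   */
        z *= x;                      /*   z = z * x           */
    }
    return e < 0 ? 1.0 / z : z;      /* reciprocal if e < 0   */
}
```

ipow(2.0, 3) returns 8 and ipow(2.0, -3) returns 0.125, matching the output above.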


Edit: I'm not sure the above code is handling the stack correctly, even though the app runs OK even with -O 3. I need to determine if pushing/popping a 64-bit register changes the stack pointer by 8 bytes or 16 bytes.

Per Agner Fog's calling_conventions.pdf, available here, the stack word size is 8 bytes, but the stack must be 16-byte aligned before any call instruction. So for a function that does not contain any call instructions, maintaining 8-byte alignment is apparently sufficient, and I modified the above code to do just that.
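The arithmetic can be checked mechanically: each 64-bit push or pop moves RSP by exactly 8 bytes, and on entry RSP is 8 mod 16 because the caller was 16-byte aligned before its CALL pushed the return address. A small C sketch (`rsp_mod16` is a hypothetical helper) that replays the stack adjustments of the naked iPower above:

```c
#include <assert.h>
#include <stddef.h>

/* Returns RSP modulo 16 after the first `count` stack adjustments made
   inside a function, assuming RSP was 16-byte aligned at the caller's CALL.
   Deltas: push r64 = -8, pop r64 = +8, sub rsp,k = -k, add rsp,k = +k. */
int rsp_mod16(const int *deltas, size_t count)
{
    long long off = -8;                      /* return address pushed by CALL */
    for (size_t i = 0; i < count; i++)
        off += deltas[i];
    return (int)(((off % 16) + 16) % 16);    /* C's % can be negative */
}
```

For push rbx, push rdx, pop rdx, sub rsp,8, add rsp,8, pop rbx it gives 8, 0, 8, 0, 8, 0, 8 after 0 to 6 steps, so RSP is only 8-byte aligned while the buffer is in use, which is harmless because nothing is called; the commented-out sub rsp,16 would instead leave RSP at 0 mod 16.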
TeeEmCee
Posts: 268
Joined: Jul 22, 2006 0:54
Location: Auckland

Re: 64-bit inline assembler

Postby TeeEmCee » Oct 22, 2016 16:38

srvaldez wrote:I know that the Mac is not a supported platform but naked functions fail...
the main program calls a decorated function whereas the naked function was not decorated.
btw, it works ok on windows and linux

I noticed that bug and a pile of other OSX ones and fixed them, but I haven't submitted a pull request yet. You can try it, though.
I spent several days trying to get -gen gas to work on OSX. Well, I got it working fine... unfortunately you can't actually use it, because Apple gas is broken. It has a major bug where if you ever refer to the same label twice in intel-syntax code or do a backwards jmp/call, it gives the error:

Code:

fb_naked_asm.asm:33:suffix or operands invalid for `call'

This bug has been known for two decades, but Apple don't care about such things, and their rate of development for these core utilities is <1% of GNU's binutils anyway. I tried to fix it myself, but the gas source is the stuff of nightmares. I gave up on even getting FSF gas to compile after a few hours and can't tell if it even properly supports Mach-O, but it didn't a few years ago. I also tried LLVM's assembler, but it turns out its support for intel syntax is utterly broken... they did fix the most serious bug a few days ago though; haven't tried it since. There are no other assemblers supporting intel syntax for mach-o, unless you want to produce an ELF object file and convert to mach-o with objconv.

srvaldez wrote:here's the first example adapted for the Mac

Hey wait... are you saying that that code works for you? It doesn't assemble for me:

Code:

fb_asm.c:26:suffix or operands invalid for `js'
fb_asm.c:38:suffix or operands invalid for `jmp'
fb_asm.c:42:suffix or operands invalid for `jmp'

That is, it hits the Apple gas bug I just mentioned. So you seem to have a working assembler, which I tried so hard and failed to find!
Can you please tell me where you got your build system (XCode, macports, homebrew?) and its version, and the OSX and gas versions (as -version)?

MichaelW wrote:Edit: I'm not sure the above code is handling the stack correctly, even though the app runs OK even with -O 3. I need to determine if pushing/popping a 64-bit register changes the stack pointer by 8 bytes or 16 bytes.

Maybe you are referring to the existence of x86 instructions that push/pop a 16-bit value on a 32-bit stack, and vice versa. There are no mismatched-size push/pop instructions for other bit widths; pushing or popping a 64-bit register always moves RSP by 8 bytes.
srvaldez
Posts: 2253
Joined: Sep 25, 2005 21:54

Re: 64-bit inline assembler

Postby srvaldez » Oct 22, 2016 17:09

TeeEmCee wrote:
srvaldez wrote:here's the first example adapted for the Mac

Hey wait... are you saying that that code works for you? It doesn't assemble for me:

Code:

fb_asm.c:26:suffix or operands invalid for `js'
fb_asm.c:38:suffix or operands invalid for `jmp'
fb_asm.c:42:suffix or operands invalid for `jmp'

That is, it hits the Apple gas bug I just mentioned. So you seem to have a working assembler, which I tried so hard and failed to find!
Can you please tell me where you got your build system (XCode, macports, homebrew?) and its version, and the OSX and gas versions (as -version)?

hello TeeEmCee :-)
I have Xcode 8.0.0 with the accompanying command line tools, but it also worked with version 7.3.0.
as --version reports:
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

what version of OS X are you using? I am using El Capitan. btw, it's good to see you are interested in FB on the Mac.
<edit> I started by using a Mac version of FB built by venom viewtopic.php?f=17&t=24027, but I have since compiled and used the latest FB from the git repo.
venom's link is not working any more, but I uploaded it here in case someone wants it: FreeBASIC-1.04.0-darwin-x86_64
<edit 2> my compile command for geany on my Mac is: fbc -w all -asm att -gen gcc -Wc -O2 "%f"
TeeEmCee
Posts: 268
Joined: Jul 22, 2006 0:54
Location: Auckland

Re: 64-bit inline assembler

Postby TeeEmCee » Oct 23, 2016 7:00

Wow! I didn't realise that Apple switched to LLVM's assembler. That's surprising, because that assembler is a very recent and buggy project (unlike the GCC toolchain, an assembler is NOT used by clang, except partially to parse inline asm blocks), and it didn't even seem to be attempting to be compatible with gas. In fact, it was originally called 'mc' instead of 'as'. I had quite a lot of trouble just finding the right commandline args to invoke it.

I forgot that Apple bizarrely patched the LLVM tools to report the XCode version number (8.0.0 in your case) instead of the real version number. I looked it up and found that XCode 7.3 ships LLVM 3.8.0.

I have llvm-as 3.8.1 on my gnu/linux machine, and llvm-as 3.7.1 on my mac, and neither can be used to replace gas. But commandline args and assembler directives are quite different between OSX and other Unix anyway (because OSX uses a 30 year old fork of GNU binutils), so maybe llvm-as 3.8 only works as a replacement for gas on OSX.
I'm using OSX 10.8.

BTW, here is the llvm-as bug I mentioned which was fixed 2 weeks ago. 'push' on x86 in intel syntax is miscompiled. It works fine on x86_64.
srvaldez
Posts: 2253
Joined: Sep 25, 2005 21:54

Re: 64-bit inline assembler

Postby srvaldez » Oct 23, 2016 15:10

hello TeeEmCee
I program occasionally as a hobby and my skills are beginner to maybe intermediate. Does your fb fork include all your fixes?
Also, how do you compile fb?
I have been compiling fb like this:
make FBFLAGS="-asm att" ENABLE_XQUARTZ=1 all
TeeEmCee
Posts: 268
Joined: Jul 22, 2006 0:54
Location: Auckland

Re: 64-bit inline assembler

Postby TeeEmCee » Oct 24, 2016 9:06

Yes, that branch has all my darwin-related work. It defaults to -asm att and -gen gcc, but of course if you compile it with fbc 1.05 then you will need to specify those manually. I have never tried ENABLE_XQUARTZ=1.

The things I need to do before getting it merged into the trunk are ensuring that the use of -macosx_version_min=10.4 is correct in general, finish translating the crt headers, and update changelog.txt.
