Creating functions in assembly - Part I

UPDATE 1: Part II is ready.

In my current job I often need to write instructions in assembly. It's usually a one-line instruction, so I simply embed it in C using inline asm. However, sometimes I need more than one instruction, and inline asm starts to get confusing for me; in that case I prefer to write the code directly in assembly.

The point is that I still want C for 2 reasons: (1) I'm not a proficient assembly programmer and (2) C is way more productive. So why not write just the function in assembly and call it from C?

What is a function?

A function is a sequence of instructions grouped logically to perform a particular task. One could create a function that expects two inputs and produces one output based on them. Then, instead of replicating the same code over and over, the programmer simply calls that function whenever needed.
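For instance, here is a minimal C sketch of such a function. The names `larger` and `largest_of_three` are made up for illustration: `larger` takes two inputs and produces one output, and `largest_of_three` reuses it instead of repeating the comparison.

```c
/* A function: two inputs (a, b), one output (the larger of the two). */
int larger(int a, int b)
{
    return a > b ? a : b;
}

/* Reuses larger() instead of replicating the comparison logic. */
int largest_of_three(int a, int b, int c)
{
    return larger(larger(a, b), c);
}
```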

However, functions are an abstraction provided by the programming language. The code that compilers generate is made of labels and jumps, so there is no such thing as a "function" in assembly. To create one, we need to define a label, put instructions under it, and make that label accessible from the outside world:

$ cat function.s
.align 2
.type my_function,@function;
.globl my_function;
my_function:
    blr

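  • .align 2 - align what follows on a 4-byte boundary (for PowerPC targets, GNU as treats the argument as a power of two, so 2 means 2^2 = 4 bytes, the size of one PowerPC instruction)
  • .type my_function,@function - marks my_function as a function symbol in the ELF symbol table; useful for debuggers and other tools, but not required for this example
  • .globl my_function - makes my_function a global symbol, visible from outside the function.s unit
  • my_function: - the label
  • blr - a PowerPC instruction that branches unconditionally to the address stored in the Link Register (LR). The caller invokes a function with bl, which stores the return address (the address of the instruction right after the bl) in the LR.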
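Since a "function" at this level is just a label plus a saved return address, the idea can be mimicked inside a single C function. This sketch assumes GCC or Clang, because it relies on the GNU C "labels as values" extension (`&&label` and `goto *ptr`), which is not standard C. The names `double_twice`, `lr` and `val` are hypothetical: `lr` stands in for the Link Register, and the `goto` pairs play the roles of bl and blr.

```c
/* Calls the label my_function twice, emulating bl/blr with GNU C's
 * "labels as values" extension. Returns x doubled twice. */
int double_twice(int x)
{
    void *lr;       /* stand-in for the Link Register */
    int val = x;    /* value the "function" works on  */

    lr = &&back_1;      /* "bl": save the return address...     */
    goto my_function;   /* ...and branch to the label           */
back_1:
    lr = &&back_2;      /* second call, different return address */
    goto my_function;
back_2:
    return val;

my_function:            /* the "function" is nothing but a label */
    val = val * 2;
    goto *lr;           /* "blr": branch to the address in lr    */
}
```

Calling `double_twice(5)` runs the code under `my_function` twice, returning to a different label each time, and yields 20.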

Fun fact: there's no type or signature checking. You can declare my_function in C with any prototype you want; the link editor (linker) only matches symbol names, so as long as it finds the symbol, it's all good.

$ cat function.c
#include <stdio.h>

/*
 * I'm telling the C compiler that my_function expects an integer and returns another integer,
 * but I could have declared it as char my_function(double a, int b, char c) or anything else.
 */
extern int my_function(int param);

int main(void)
{
    int i = my_function(5);
    printf("%d\n", i);
    return 0;
}
$ gcc -g3 function.s function.c -o function
$ ./function
5

Isn't it cool? I just wrote a function in assembly that does nothing but return, and yet it returns the same value that was passed as its parameter. Magic? No, and we will see why.

Application Binary Interface - ABI

Unfortunately, it's not as easy as I made it look. To write an assembly function that is compatible with C, we need to respect some rules (the same rules the C compiler follows when it generates binary code). These rules are defined by the ABI, or Application Binary Interface, which depends on the target architecture (of course, it's assembly! :-).

Suppose I've got a new toy CPU, a simple one with 3 registers and a few KB of memory, and I want to port my own language to it. So I get the ABI document from the company's website and read the section about function calls:

  • REG1 register - volatile (the callee may overwrite it): stores the return value
  • REG2 register - volatile: general purpose
  • REG3 register - non-volatile (the callee must preserve it): stores the return address in the caller
  • Parameters must be passed through the stack
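To make these rules concrete, here is a small C model of the toy ABI. Everything in it is hypothetical (the `toy_cpu` struct, `push`, `pop`, `toy_mult`, `call_mult`), the return address is just an integer, and since the rules above don't say who removes the parameters from the stack, the sketch assumes the callee does. The asserts in `call_mult` check the contract from the caller's side: REG3 is non-volatile, so it must survive the call, while REG1 is volatile and is read right after the call.

```c
#include <assert.h>

enum { STACK_WORDS = 64 };

/* Hypothetical model of the toy CPU: three registers plus a small stack. */
struct toy_cpu {
    int reg1;               /* volatile: return value                  */
    int reg2;               /* volatile: general purpose (scratch)     */
    int reg3;               /* non-volatile: return address in caller  */
    int stack[STACK_WORDS];
    int sp;                 /* number of words currently on the stack  */
};

void push(struct toy_cpu *cpu, int value)
{
    assert(cpu->sp < STACK_WORDS);
    cpu->stack[cpu->sp++] = value;
}

int pop(struct toy_cpu *cpu)
{
    assert(cpu->sp > 0);
    return cpu->stack[--cpu->sp];
}

/* Callee: mult(a, b). Parameters arrive on the stack (b on top), REG2 is
 * used as scratch, the result leaves in REG1, and REG3 is not touched. */
void toy_mult(struct toy_cpu *cpu)
{
    cpu->reg2 = pop(cpu);              /* b (pushed last) */
    cpu->reg1 = pop(cpu);              /* a               */
    cpu->reg1 = cpu->reg1 * cpu->reg2; /* return value    */
}

/* Caller: push the parameters, set REG3, "call", then check the contract. */
int call_mult(struct toy_cpu *cpu, int a, int b, int return_address)
{
    int sp_before = cpu->sp;

    push(cpu, a);
    push(cpu, b);
    cpu->reg3 = return_address;          /* where the callee must return */
    toy_mult(cpu);

    assert(cpu->reg3 == return_address); /* non-volatile: preserved      */
    assert(cpu->sp == sp_before);        /* callee consumed its params   */
    return cpu->reg1;                    /* volatile: read it right away */
}
```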

After reading the document, I wrote a compiler backend that translates this code:

fn mult(var a, var b)
{
    return a * b;
}

fn main()
{
    var x = mult(5, 3);
    print(x);
    print(mult(x, 2));
}

...to this one:

[Interactive image: the generated toy-CPU assembly, stepped through instruction by instruction]
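Here is a tiny C interpreter that runs a hand-written translation of the toy program for a made-up instruction set. All of it is an illustrative assumption: the opcodes (`PUSH_IMM`, `CALL`, `RET`, and so on) are invented, instruction indices play the role of addresses, and `print` is simplified into a built-in `OUT` instruction that prints REG1 instead of being a real call. `CALL` saves the return address in REG3 and jumps; `mult` pops its parameters, leaves the result in REG1, and `RET` jumps back through REG3.

```c
#include <stdio.h>

/* Hypothetical toy-CPU instruction set (invented for this sketch). */
enum op { PUSH_IMM, PUSH_REG1, POP_REG1, POP_REG2, MUL, CALL, RET, OUT, HALT };

struct insn {
    enum op op;
    int arg;  /* immediate value or call target, when used */
};

enum { MULT = 9 };  /* address (index) of the first instruction of mult */

static const struct insn program[] = {
    /* main, addresses 0-8 */
    { PUSH_IMM, 5 },   /* push parameter a = 5                   */
    { PUSH_IMM, 3 },   /* push parameter b = 3                   */
    { CALL, MULT },    /* REG3 = return address, jump to mult    */
    { OUT, 0 },        /* print(x): x = mult(5, 3) is in REG1    */
    { PUSH_REG1, 0 },  /* push parameter a = x                   */
    { PUSH_IMM, 2 },   /* push parameter b = 2                   */
    { CALL, MULT },    /* REG3 = return address, jump to mult    */
    { OUT, 0 },        /* print(mult(x, 2))                      */
    { HALT, 0 },
    /* mult, addresses 9-12 */
    { POP_REG2, 0 },   /* REG2 = b (pushed last, so on top)      */
    { POP_REG1, 0 },   /* REG1 = a                               */
    { MUL, 0 },        /* REG1 = REG1 * REG2: the return value   */
    { RET, 0 },        /* jump back to the address saved in REG3 */
};

/* Runs the program, printing each OUT value and recording it in out[]
 * (up to max_out values). Returns the number of values printed. */
int run(int *out, int max_out)
{
    int reg1 = 0, reg2 = 0, reg3 = 0;  /* the three registers           */
    int stack[16];
    int sp = 0;                        /* number of words on the stack  */
    int pc = 0;                        /* address of next instruction   */
    int n = 0;

    for (;;) {
        struct insn ins = program[pc++];  /* fetch and advance */
        switch (ins.op) {
        case PUSH_IMM:  stack[sp++] = ins.arg; break;
        case PUSH_REG1: stack[sp++] = reg1;    break;
        case POP_REG1:  reg1 = stack[--sp];    break;
        case POP_REG2:  reg2 = stack[--sp];    break;
        case MUL:       reg1 = reg1 * reg2;    break;
        case CALL:      reg3 = pc;             /* pc already points past CALL */
                        pc = ins.arg;          break;
        case RET:       pc = reg3;             break;
        case OUT:
            printf("%d\n", reg1);
            if (n < max_out)
                out[n] = reg1;
            n++;
            break;
        case HALT:
            return n;
        }
    }
}
```

Calling `run(out, 4)` prints 15 and then 30, matching print(x) and print(mult(x, 2)) in the toy program, and returns 2.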

This should illustrate how the assembly code follows the simple toy ABI above and how, by following it, my compiler can generate binary code that is compatible with the OS and other libraries on the same system. In Part II, I'll use a real ABI to implement the function in assembly, fully compatible with Linux on PPC64.

Hope you like it. Thank you! :-)