Class Inheritance in C

2014 - Dec - 14

C language is not objected oriented, bla bla bla. Right, ok, let's see how to implement class inheritance in C.

The Python Way

Python is classy. It's easily possible to write a whole class hierarchy in Python but how the interpreter implements that?

PyObject is the base class in Python, it's the mother of everybody else. In C, it's a struct with some fields #defined HEAD, which are inhered by any of its subclass.

// http://svn.python.org/view/python/trunk/Include/object.h
...
#define _PyObject_HEAD_EXTRA
#define _PyObject_EXTRA_INIT
...

#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;
 
#define PyObject_VAR_HEAD \
    PyObject_HEAD \
    Py_ssize_t ob_size; /* Number of items in variable part */
 
typedef struct _object {
    PyObject_HEAD
} PyObject;

See below the children of PyObject: PyStringObject, PyFloatObject, and PyListObject. But how do I know that they're children of PyObject? hmm, it'll be clearer later but take a look at the structs, all have PyObject_HEAD or PyObject_VAR_HEAD (which also starts with PyObject_HEAD, see above) as the FIRST field.

// http://svn.python.org/view/python/trunk/Include/stringobject.h
typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];
} PyStringObject;
 
// http://svn.python.org/view/python/trunk/Include/floatobject.h
typedef struct {
    PyObject_HEAD
    double ob_fval;
} PyFloatObject;
 
// http://svn.python.org/view/python/trunk/Include/listobject.h
typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

A naive class diagram could be made from those structs:

PyObject object diagram

Casting

Let's the magic begins. In the code below, forget about everything and focus on "return ((PyListObject*)op)->ob_item[i];".

Do you know type casting? In a nutshell it's how programmers tell the compiler they are geniuses and can be trust when they ask to handle a data type as another data type. I heard some people calling it "russian roulette".

// http://svn.python.org/view/python/trunk/Objects/listobject.c?view=markup
PyObject *PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }

    if (i < 0 || i >= Py_SIZE(op)) {
        if (indexerr == NULL) {
            indexerr = PyString_FromString("list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op)->ob_item[i];
}

So, ((PyListObject*)op) casts op object pointer (PyObject*) to a PyListObject* (note the PyList_Check(op) as a sanity check). Then it accesses ob_item field (PyObject doesn't have ob_item but PyListObject does, as you told your compiler that you're the genius it'll let you do that).

Finally, whatever is stored at position i will be returned to the caller, and it will be returned as PyObject because the function requires and C performs an implicit conversion to attend such requirement.

Aaaarrrghhh, but those structs (PyObject vs PyList) are completely different! Why didn't that break the program?

Because it's safe and well defined in the C99 standard.. REPLAY: C99!

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may therefore be unnamed holes within a structure object, but not at its beginning, as necessary to achieve the appropriate alignment.

In a nutshell, it means that a pointer to any struct always point to its first element (declaration order), so you can cast it to any other struct if (and only if) its first field have the same type. Cool!

Show me the Code

Based on a sample C code, let's go through the not-optimized code generated by GCC to see how it's implemented. It's just a proof of concept.

#include <stdlib.h>
#include <stdio.h>

#define HEAD int type;

enum { INTEGER, FLOAT };

typedef struct
{
    HEAD
} Number;

typedef struct
{
    HEAD
    int val;
} Integer;

typedef struct
{
    HEAD
    float val;
} Float;

int main()
{
    Integer i;
    i.type = INTEGER;
    i.val  = 42;

    Float f;
    f.type = FLOAT;
    f.val  = 36.85;

    Number **numbers = (Number**)malloc(sizeof(Number*) * 2);

    numbers[0] = (Number*)&i;
    numbers[1] = (Number*)&f;

    // for debugging purpose ↓
    int y = ((Integer*)numbers[0])->val;
    
    int x = numbers[0]->type;
    x = numbers[1]->type;
    // for debugging purpose ↑

    for (unsigned int count = 0; count < 2; ++count)
    {
        switch (numbers[count]->type)
        {
            case INTEGER:
                printf("%d - Integer: %d\n", 
                        count, 
                        ((Integer*)numbers[count])->val);
                break;

            case FLOAT:
                printf("%d - Float: %.2f\n", 
                        count, 
                        ((Float*)numbers[count])->val);
                break;

            default:
                break;
        }
    }

    free(numbers);

    return 0;
}

Compiling:
gcc -g -std=c99 code.c -o codebin

Running:
./codebin
0 - Integer: 42
1 - Float: 36.85

Good, let's see the internals.

Number **numbers = (Number**)malloc(sizeof(Number*) * 2);

0x4005e3 <main+38>      mov    $0x10,%edi
0x4005e8 <main+43>      callq  0x4004c0 <malloc@plt>
0x4005ed <main+48>      mov    %rax,-0x8(%rbp)

(gdb) info registers rax
rax            0x602010 6299664

(gdb) x/wx $rbp-0x8
0x7fffffffe3c8: 0x00602010

(gdb) x/wx 0x00602010
0x602010:       0x00000000

Malloc reads the size from EDI register and stores the memory address in RAX. That address is copied to RBP-0x8, the place of our variable numbers in the stack.

malloc instructions

numbers[0] = (Number*)&i;

0x4005f1 <main+52>      mov    -0x8(%rbp),%rax
0x4005f5 <main+56>      lea    -0x20(%rbp),%rdx
0x4005f9 <main+60>      mov    %rdx,(%rax)
(gdb) x/wx $rbp-0x20
0x7fffffffe3b0: 0x00000000

(gdb) x/2wx 0x00602010
0x602010:       0xffffe3b0      0x00007fff

RBP-0x8 (as known as "numbers") is copied to RAX, the address of RBP-0x20 (the address of struct "i") is copied in RDX, and finally the content of RDX (which is the address of "i") is stored in the heap. The instruction mov %rdx,(%rax) copies the RDX content to the place where RAX content is pointing to, which is the memory allocated.

Note that RBP-0x20 is the place of of Integer first element.

assembly instructions

Important: the address of a local variable, in the stack, was stored in the heap, fortunately we are running everything in main() but it could be harmful it was a function, for instance. Read about stack frames.

int y = ((Integer*)numbers[0])->val;

0x40060b <main+78>      mov    -0x8(%rbp),%rax
(gdb) info registers rax
rax            0x602010 6299664

0x40060f <main+82>      mov    (%rax),%rax
(gdb) info registers rax
rax            0x7fffffffe3b0   140737488348080

(gdb) x/wx 0x7fffffffe3b0+4
0x7fffffffe3b4: 0x0000002a ==> (42 int)

0x400612 <main+85>      mov    0x4(%rax),%eax
(gdb) info registers eax
eax            0x2a     42

0x400615 <main+88>      mov    %eax,-0x28(%rbp)
(gdb) x/wx $rbp-0x28
0x7fffffffe3a8: 0x0000002a

Nothing new here, but see that the cast to Integer allows a Number variable has access to a element that only Integer has.

assembly instructions

int x = numbers[0]->type;

0x400618 <main+91>      mov    -0x8(%rbp),%rax
0x40061c <main+95>      mov    (%rax),%rax
0x40061f <main+98>      mov    (%rax),%eax
(gdb) info registers rax
rax            0x0      0

0x400621 <main+100>     mov    %eax,-0x24(%rbp)
(gdb) x/wx $rbp-0x24
0x7fffffffe3ac: 0x00000000

assembly instructions

x = numbers[1]->type;

0x400624 <main+103>     mov    -0x8(%rbp),%rax
(gdb) info registers rax
rax            0x602010 6299664

0x400628 <main+107>     add    $0x8,%rax
(gdb) info registers rax
rax            0x602018 6299672

0x40062c <main+111>     mov    (%rax),%rax
(gdb) x/2wx 0x602018
0x602018:       0xffffe3c0      0x00007fff

(gdb) x/wx 0x7fffffffe3c0
0x7fffffffe3c0: 0x00000001

0x40062f <main+114>     mov    (%rax),%eax
(gdb) info registers rax
rax            0x7fffffffe3c0   140737488348096

(gdb) info registers eax
eax            0x1      1

0x400631 <main+116>     mov    %eax,-0x24(%rbp)
(gdb) x/wx $rbp-0x24
0x7fffffffe3ac: 0x00000001

assembly instructions

References