C language doesn’t offer support for object oriented programming, but it’s not necessarily an obstacle. Actually, it’s been used behind the scenes for a long time. Let’s see how Python interpretor implements it own class inheritance in C.

The Python Way

Python is an object oriented language, it’s possible to write a base class and classes deriving (subclasses) from it. However, the interpretor (CPython) is written in C and, as mentioned, C doesn’t offer such support. So, let’s see how they implement it.

PyObject is the base class in Python, it’s the mother of everybody else. But, in C, it’s a simple struct with some fields #defined _HEAD_, which are inhered (actually it’s defined) by any of its subclass.

    // http://svn.python.org/view/python/trunk/Include/object.h
    ...
    #define _PyObject_HEAD_EXTRA
    #define _PyObject_EXTRA_INIT
    ...
    
    #define PyObject_HEAD                   \
        _PyObject_HEAD_EXTRA                \
        Py_ssize_t ob_refcnt;               \
        struct _typeobject *ob_type;
     
    #define PyObject_VAR_HEAD \
        PyObject_HEAD \
        Py_ssize_t ob_size; /* Number of items in variable part */
     
    typedef struct _object {
        PyObject_HEAD
    } PyObject;

See below the children of PyObject: PyStringObject, PyFloatObject, and PyListObject. But how do I know that they’re children of PyObject? It’ll be clearer later, but take a look at the structures definitions, they all have PyObject_HEAD or PyObject_VAR_HEAD (which also starts with PyObject_HEAD) field.

    // http://svn.python.org/view/python/trunk/Include/stringobject.h
    typedef struct {
        PyObject_VAR_HEAD
        long ob_shash;
        int ob_sstate;
        char ob_sval[1];
    } PyStringObject;
     
    // http://svn.python.org/view/python/trunk/Include/floatobject.h
    typedef struct {
        PyObject_HEAD
        double ob_fval;
    } PyFloatObject;
     
    // http://svn.python.org/view/python/trunk/Include/listobject.h
    typedef struct {
        PyObject_VAR_HEAD
        PyObject **ob_item;
        Py_ssize_t allocated;
    } PyListObject;

A simple class diagram could be made from those structures:

PyObject object diagram

Casting

Here the magic begins. In the code below focus on return ((PyListObject*)op)->ob_item[i].

Do you know type casting? In a nutshell, it’s the way to force the conversion between two different data types. In C, casts are usually dangerous and must be used wisely.

    // http://svn.python.org/view/python/trunk/Objects/listobject.c?view=markup
    PyObject *PyList_GetItem(PyObject *op, Py_ssize_t i)
    {
        if (!PyList_Check(op)) {
            PyErr_BadInternalCall();
            return NULL;
        }
    
        if (i < 0 || i >= Py_SIZE(op)) {
            if (indexerr == NULL) {
                indexerr = PyString_FromString("list index out of range");
                if (indexerr == NULL)
                    return NULL;
            }
            PyErr_SetObject(PyExc_IndexError, indexerr);
            return NULL;
        }
        return ((PyListObject *)op)->ob_item[i];
    }

See, op is a data pointer of a PyObject type. That ((PyListObject*)op) casts (forced conversion) op to a PyListObject type (note the PyList_Check(op) as a sanity check). Then it accesses ob_item field to return the i item. Note that ob_item is an array of PyObject data pointers as correctly specified in the function signature.

Those structures (PyObject and PyList) are completely different, why don’t that break the program?

In this particular case, the conversion is safe and well behaved because it’s defined by the C99 standard..

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may therefore be unnamed holes within a structure object, but not at its beginning, as necessary to achieve the appropriate alignment.

Basically, it means that a pointer to any struct always point to its first element (declaration order), so you can cast it to any other struct if (and only if) its first field have the same type. The compiler will handle different sizeof(struct) for us.

Show me some code

Based on a sample C code, let’s go through the not-optimized code generated by GCC to see how it’s implemented.

 1    #include <stdlib.h>
 2    #include <stdio.h>
 3    
 4    #define HEAD int type;
 5    
 6    enum { INTEGER, FLOAT };
 7    
 8    typedef struct
 9    {
10        HEAD
11    } Number;
12    
13    typedef struct
14    {
15        HEAD
16        int val;
17    } Integer;
18    
19    typedef struct
20    {
21        HEAD
22        float val;
23    } Float;
24    
25    int main()
26    {
27        Integer i;
28        i.type = INTEGER;
29        i.val  = 42;
30    
31        Float f;
32        f.type = FLOAT;
33        f.val  = 36.85;
34    
35        Number **numbers = (Number**)malloc(sizeof(Number*) * 2);
36    
37        numbers[0] = (Number*)&i;
38        numbers[1] = (Number*)&f;
39    
40        // for debugging purpose ↓
41        int y = ((Integer*)numbers[0])->val;
42    
43        int x = numbers[0]->type;
44        x = numbers[1]->type;
45        // for debugging purpose ↑
46    
47        for (unsigned int count = 0; count < 2; ++count)
48        {
49            switch (numbers[count]->type)
50            {
51                case INTEGER:
52                    printf("%d - Integer: %d\n", 
53                            count, 
54                            ((Integer*)numbers[count])->val);
55                    break;
56    
57                case FLOAT:
58                    printf("%d - Float: %.2f\n", 
59                            count, 
60                            ((Float*)numbers[count])->val);
61                    break;
62    
63                default:
64                    break;
65            }
66        }
67    
68        free(numbers);
69    
70        return 0;
71    }
    # Compiling:
    $ gcc -g -std=c99 code.c -o codebin
    
    # Running:
    $ ./codebin
    0 - Integer: 42
    1 - Float: 36.85

Now let’s see the internals.

    Number **numbers = (Number**)malloc(sizeof(Number*) * 2);
    
    0x4005e3 <main+38>      mov    $0x10,%edi
    0x4005e8 <main+43>      callq  0x4004c0 <malloc@plt>
    0x4005ed <main+48>      mov    %rax,-0x8(%rbp)
    
    (gdb) info registers rax
    rax            0x602010 6299664
    
    (gdb) x/wx $rbp-0x8
    0x7fffffffe3c8: 0x00602010
    
    (gdb) x/wx 0x00602010
    0x602010:       0x00000000

The malloc() function gets the size stored in edi register and writes the memory address in rax. That address is copied to rbp-0x8, the place of our variable numbers in the stack.

malloc instructions

    numbers[0] = (Number*)&i;
    
    0x4005f1 <main+52>      mov    -0x8(%rbp),%rax
    0x4005f5 <main+56>      lea    -0x20(%rbp),%rdx
    0x4005f9 <main+60>      mov    %rdx,(%rax)
    (gdb) x/wx $rbp-0x20
    0x7fffffffe3b0: 0x00000000
    
    (gdb) x/2wx 0x00602010
    0x602010:       0xffffe3b0      0x00007fff

Register rbp-0x8 (AKA “numbers”) is copied to rax, the address of rbp-0x20 (address of struct “i”) is copied into rdx, and the content of rdx (address of “i”) is stored in the heap. The instruction mov %rdx,(%rax) copies the rdx content into the place where rax content is pointing to, which is the memory that malloc() gave to me. Note that rbp-0x20 is the place of Integer first element.

assembly instructions

Important: the address of a local variable (stack) was stored in the heap, fortunately we are running everything in main(). But it can be dangerous in other scenarios. Read about stack frames.

    int y = ((Integer*)numbers[0])->val;
    
    0x40060b <main+78>      mov    -0x8(%rbp),%rax
    (gdb) info registers rax
    rax            0x602010 6299664
    
    0x40060f <main+82>      mov    (%rax),%rax
    (gdb) info registers rax
    rax            0x7fffffffe3b0   140737488348080
    
    (gdb) x/wx 0x7fffffffe3b0+4
    0x7fffffffe3b4: 0x0000002a ==> (42 int)
    
    0x400612 <main+85>      mov    0x4(%rax),%eax
    (gdb) info registers eax
    eax            0x2a     42
    
    0x400615 <main+88>      mov    %eax,-0x28(%rbp)
    (gdb) x/wx $rbp-0x28
    0x7fffffffe3a8: 0x0000002a

Note that the cast to Integer “grants” a Number variable access to a field that only Integer has defined (val).

assembly instructions

    int x = numbers[0]->type;
    
    0x400618 <main+91>      mov    -0x8(%rbp),%rax
    0x40061c <main+95>      mov    (%rax),%rax
    0x40061f <main+98>      mov    (%rax),%eax
    (gdb) info registers rax
    rax            0x0      0
    
    0x400621 <main+100>     mov    %eax,-0x24(%rbp)
    (gdb) x/wx $rbp-0x24
    0x7fffffffe3ac: 0x00000000

assembly instructions

    x = numbers[1]->type;
    
    0x400624 <main+103>     mov    -0x8(%rbp),%rax
    (gdb) info registers rax
    rax            0x602010 6299664
    
    0x400628 <main+107>     add    $0x8,%rax
    (gdb) info registers rax
    rax            0x602018 6299672
    
    0x40062c <main+111>     mov    (%rax),%rax
    (gdb) x/2wx 0x602018
    0x602018:       0xffffe3c0      0x00007fff
    
    (gdb) x/wx 0x7fffffffe3c0
    0x7fffffffe3c0: 0x00000001
    
    0x40062f <main+114>     mov    (%rax),%eax
    (gdb) info registers rax
    rax            0x7fffffffe3c0   140737488348096
    
    (gdb) info registers eax
    eax            0x1      1
    
    0x400631 <main+116>     mov    %eax,-0x24(%rbp)
    (gdb) x/wx $rbp-0x24
    0x7fffffffe3ac: 0x00000001

assembly instructions

Conclusion

Class inheritance in C is not as elegant/easy as it is in other languages, but it can be achieved. Anyway, as we saw above, there’re several pitfalls that could cause damage to you system. Such technique must be used diligently.

References