A C typedef convention for complex types

C type syntax is maddening to read and write. The only sensible way to proceed is to force yourself to only ever express composite types in a typedef.

An example from cdecl.org:

// declare foo as pointer to function (void) returning pointer to array 3 of int
int (*(*foo)(void ))[3];

Maybe you can read this, but I can’t. How might we more sensibly write this? First let’s re-use the common formal syntax for generics to describe such types:

foo : ptr<fn<void,ptr<arr<int,3>>>>

We have the type language:

type ::= ptr<type>
       | fn<type, type>
       | arr<type, int>
       | ...
       ;

It is difficult to embed this language in a single token, because of the nested <>. Instead, let’s use a “reverse Polish notation”:

void int 3 arr ptr fn ptr

The words above represent instructions to build a type. We keep a stack as we read through the words, and at the end, we expect to have a single type on the stack. So as we read through, our stack is:

INSTRUCTION  STACK
===========  ==============================
void         void
int          void, int
3            void, int, 3
arr          void, arr<int,3>
ptr          void, ptr<arr<int,3>>
fn           fn<void, ptr<arr<int,3>>>
ptr          ptr<fn<void, ptr<arr<int,3>>>>

This language is easily embeddable in a token:

void_int_3_arr_ptr_fn_ptr

We then gradually construct these types using typedef:

typedef int int_3_arr[3];
typedef int_3_arr* int_3_arr_ptr;
typedef int_3_arr_ptr (*void_int_3_arr_ptr_fn)(void);
typedef void_int_3_arr_ptr_fn* void_int_3_arr_ptr_fn_ptr;

void_int_3_arr_ptr_fn_ptr foo;

Unfortunately, this fn rule can’t deal with multiple parameters. As a practical concession, we introduce fn0, fn1, fn2, etc., which specify how many parameters there are. For functions with parameter void, we use fn0. So we now have:

int_3_arr_ptr_fn0_ptr

We can read this in reverse: “a pointer to a function which takes 0 arguments and returns a pointer to an array of 3 integers.”

Now let’s try a real-world example:

extern void (*signal(int, void(*)(int)))(int);

How could this be more clearly written? To quote this SO answer, “it’s a function that takes two arguments, an integer and a pointer to a function that takes an integer as an argument and returns nothing, and it (signal()) returns a pointer to a function that takes an integer as an argument and returns nothing.”

We can write this in our formal language as fn2<int,ptr<fn1<int,void>>,ptr<fn1<int,void>>>, and then convert this to:

typedef void (*int_void_fn1)(int);
typedef int_void_fn1* int_void_fn1_ptr;
typedef int_void_fn1_ptr (*int_int_void_fn1_ptr_int_void_fn1_ptr_fn2)(int, int_void_fn1_ptr);

extern int_int_void_fn1_ptr_int_void_fn1_ptr_fn2 signal;

We can no longer read this purely in reverse, but at least the rules to understand it are simple. This is not great, but it’s at least decipherable. For the C type syntax, I need a cup of coffee and a syntax reference.

We could embed the nested syntax more obviously by substituting some characters for <, , and >, but the set of valid characters in identifiers is pretty small. We can’t do much better than C, a, and D:

fn2_C_int_a_ptr_C_fn1_C_int_a_void_D_D_a_ptr_C_fn1_C_int_a_void_D_D_D

… but this looks terribly mangled.


I wrote this because I felt like it. This post is my own, and not associated with my employer.

Jim. Public speaking. Friends. Vidrio.