Sunday, April 24, 2011

C++ really is a different language

When I write code, I tend to think of C++ as a superset of C. Everything I write in C is basically portable to C++ without too much trouble. However, mostly because of name mangling, C and C++ really are different under the hood. Without some help, C and C++ code could not be linked. The C++ compiler mangles every name. It mangles the names of methods so that classes have separate name spaces. It mangles the names of functions so that they can be overloaded. But, since it can't know in advance when a name will be overloaded, it mangles every single name.

So let's say we have a low-level library written in C, let's say for the filesystem. It has a body in a .c file somewhere, and a .h file describing the interface. This .h file is valid C and valid C++, but they mean different things in the different languages.

When you compile the library itself in C, it creates a bunch of functions with names like fat16_open_file, fat16_write, etc. Those names are written into the machine language .o file which is the final output of the compiler.

Now your higher-level program in C++ includes that same header. When it sees a function like:

int fat16_write(struct fat16_root*, char* data, int size)

It quite naturally assumes that the function is really called __QQNRUG_hhqnfat16_write_struct_fat16_rootP_cP_I_sghjc. How could it be otherwise? The C++ compiler then happily writes code which calls this mangled function. When the linker comes along, well, you know the rest.

So how do we fix this? One way is to recompile the lower-level library in C++, but for various reasons that may not be possible or a good idea.

So, we use an interesting capability in C++ to handle code in other languages. I don't know how general this mechanism is, and I think it is used about 99.99% of the time for this purpose.

extern "C" {
  #include "fat16.h"
}

What this extern statement does is say that everything inside the brackets is in a different language, not C++. In this case, the header is to be treated as C code, and the names are not mangled. I imagine this is some exceptionally difficult code inside the compiler, and works on languages which are only sufficiently close to C++, but it works, and is needed to allow the adoption of C++ in the presence of so much existing C code.

No comments:

Post a Comment