Even after the scope is restricted to designing a foreign-language
interface from Haskell to C, the task remains surprisingly tricky. At
first, one might think that one could take the C header file
describing a C procedure, and generate suitable interface code to make
the procedure callable from Haskell.
Alas, there are numerous tiresome details that are simply not expressed
by the C procedure prototype in the header file. For example,
consider calling a C procedure that opens a file, passing a character
string as argument. The C prototype might look like this:
int open( char *filename );
Our goal is to generate code that implements a Haskell procedure with
type
open :: String -> IO FileDescriptor
First there is the question of data representation. One has to
decide either to alter the Haskell language implementation, so that its
string representation is identical to that of C, or to translate the
string from one representation to another at run time. This translation
is conventionally called marshalling.
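Since Haskell is lazy, the second approach is required: a Haskell string
may be a partly unevaluated list of characters, whereas C expects a
contiguous, NUL-terminated array of bytes. (In general, it is
tremendously constraining to try to keep common representations
between two languages. For example, precisely how are structures laid
out in C?)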
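As a minimal sketch of run-time marshalling, assuming the Foreign
libraries shipped with present-day GHC (which the discussion above does
not presuppose), the following copies a Haskell String into a temporary
NUL-terminated C buffer and passes it to the C library's strlen. The
names c_strlen and cLength are ours, chosen for illustration only.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize(..))

-- Raw binding to the C library's strlen (illustrative name c_strlen).
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Marshal the Haskell list of characters into a C string, call C,
-- and convert the result back to a Haskell Int.
-- Note: strlen counts bytes, so the result equals the number of
-- characters only when each character encodes to one byte (e.g. ASCII).
cLength :: String -> IO Int
cLength s = withCString s $ \cstr -> fromIntegral <$> c_strlen cstr
```

The translation happens inside withCString; the caller of cLength never
sees the C representation.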
Next come questions of allocation and lifetime. Where should
we put the translated string? In a static piece of storage? (But how
large a block should we allocate? Is it safe to re-use the same block
on the next call?) Or in Haskell's heap? (But what if the called
procedure does something that triggers garbage collection, and the
transformed string is moved? Can the called procedure hold on to the
string after it returns?) Or in C's malloc'ed heap? (But how
will it get deallocated? And malloc is expensive too.)
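Two of these choices can be sketched with the Foreign libraries of
present-day GHC (an assumption on our part; the helper names below are
hypothetical). The first keeps the translated string alive only for the
duration of the call, so the C procedure must not retain the pointer.
The second places it in C's malloc'ed heap and frees it explicitly, even
if the call raises an exception.

```haskell
import Control.Exception (bracket)
import Foreign.C.String (CString, newCString, peekCString, withCString)
import Foreign.Marshal.Alloc (free)

-- Option 1: scoped storage. The buffer is released as soon as the
-- callback returns, so the callee must not hold on to the pointer.
withScopedString :: String -> (CString -> IO a) -> IO a
withScopedString = withCString

-- Option 2: malloc'ed storage. newCString allocates with malloc;
-- bracket guarantees the matching free, even on exceptions.
withMallocedString :: String -> (CString -> IO a) -> IO a
withMallocedString s = bracket (newCString s) free

-- Usage: marshal out to C and straight back again.
roundTrip :: String -> IO String
roundTrip s = withMallocedString s peekCString
```

If the called procedure is documented to keep the string after it
returns, neither helper is appropriate as written: the free would have
to be deferred until the C side is finished with the pointer.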
C procedures often accept pointer parameters (such as strings)
that can be NULL. How is that to be reflected on the host-language
side of the interface? For example, if the documentation for open told
us that it would do something sensible when called with a NULL string,
we might like the Haskell type for open to be
open :: Maybe String -> IO FileDescriptor
so that we can model NULL by Nothing.
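A sketch of this mapping, assuming the Foreign libraries of present-day
GHC: maybeWith turns Nothing into a NULL pointer and otherwise defers to
the ordinary string marshaller. The names withMaybeCString and
peekMaybeCString are ours.

```haskell
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.Marshal.Utils (maybeWith)
import Foreign.Ptr (nullPtr)

-- Haskell to C: Nothing becomes NULL, Just s becomes a temporary
-- NUL-terminated copy of s.
withMaybeCString :: Maybe String -> (CString -> IO a) -> IO a
withMaybeCString = maybeWith withCString

-- C to Haskell: NULL becomes Nothing, anything else is copied back
-- into a Haskell String.
peekMaybeCString :: CString -> IO (Maybe String)
peekMaybeCString p
  | p == nullPtr = return Nothing
  | otherwise    = Just <$> peekCString p
```

Whether NULL is actually acceptable to a given procedure is still
something only its documentation can tell us; the types merely record
the decision.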
The desired return type, FileDescriptor, will presumably
have a Haskell definition such as this:
newtype FileDescriptor = FD Int
The file descriptor returned by open is just an integer, but
Haskell programmers often use newtype declarations to create new
distinct types isomorphic to existing ones. Now the type system will
prevent, say, an attempt to add one to a FileDescriptor.
Needless to say, the Haskell result type is not going to be described
in the C header file.
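To illustrate, here is a small sketch of the conversions that the
generated interface code would have to perform between the C integer and
the Haskell newtype. The functions fromCInt and toCInt are hypothetical
names, and CInt comes from the Foreign libraries of present-day GHC.

```haskell
import Foreign.C.Types (CInt)

newtype FileDescriptor = FD Int
  deriving (Eq, Show)

-- Wrap the raw integer returned by C.
fromCInt :: CInt -> FileDescriptor
fromCInt = FD . fromIntegral

-- Unwrap again when passing the descriptor back to C.
toCInt :: FileDescriptor -> CInt
toCInt (FD n) = fromIntegral n

-- The following would be rejected by the type checker, because
-- FileDescriptor has no Num instance:
--   bad = FD 3 + 1
```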
The file-open procedure might fail; sometimes details of the
failure are stored in some global variable, errno. Somehow this
failure and the details of what went wrong must be reflected into
Haskell's IO monad.
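One way to reflect such failures, sketched here with Foreign.C.Error
from present-day GHC, is to turn errno into an IOError raised in the IO
monad. We use the C library's fopen rather than open purely for
illustration, since it signals failure by returning NULL; the names
openForReading and probe are ours.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Exception (try)
import Foreign.C.Error (throwErrnoIfNull)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CFile, CInt(..))
import Foreign.Ptr (Ptr)
import System.IO.Error (isDoesNotExistError)

foreign import ccall "stdio.h fopen"
  c_fopen :: CString -> CString -> IO (Ptr CFile)

foreign import ccall "stdio.h fclose"
  c_fclose :: Ptr CFile -> IO CInt

-- A NULL result is converted into an IOError built from errno.
openForReading :: FilePath -> IO (Ptr CFile)
openForReading path =
  withCString path $ \cpath ->
  withCString "r"  $ \cmode ->
    throwErrnoIfNull "openForReading" (c_fopen cpath cmode)

-- Usage: the failure can be inspected like any other IOError.
probe :: FilePath -> IO String
probe path = do
  r <- try (openForReading path >>= c_fclose)
  return $ case r of
    Left e
      | isDoesNotExistError e -> "missing"
      | otherwise             -> "failed: " ++ show e
    Right _ -> "opened"
```

The rule "NULL means failure, consult errno" is exactly the kind of
detail that has to come from the manual page rather than the prototype.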
The open procedure causes a side effect, so it is appropriate for
its type to be in Haskell's IO monad. Some C functions really
are functions (that is, they have no side effects), and in this case
it makes sense to give them a "pure" Haskell type. For example, the
C function sin should appear to the Haskell programmer as a
function with type
sin :: Float -> Float
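With the foreign import mechanism of present-day GHC (an assumption; it
is not the mechanism under discussion here), a side-effect-free C
function can be imported without IO in its type. Because C's sin takes a
double, this sketch binds the single-precision sinf instead so that the
wrapper has the type given above; the names c_sinf and cSin are ours.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CFloat(..))

-- No IO in the result type: we are promising that the C function
-- has no side effects. Nothing checks this promise.
foreign import ccall unsafe "math.h sinf"
  c_sinf :: CFloat -> CFloat

-- Wrapper with an ordinary Haskell type.
cSin :: Float -> Float
cSin x = realToFrac (c_sinf (realToFrac x))
```

The purity of the C function is asserted by the programmer, not derived
from the header file.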
C function prototypes are not explicit about the mode of
their parameters. Which parameters are in parameters, which are
out, and which are in out? That is, in what direction do
data pass via each parameter?
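As an example of the distinction, the C library's frexp takes its first
argument as an in parameter and writes the exponent through its second,
an out parameter. A sketch using the Foreign libraries of present-day
GHC might present the out parameter as part of the result instead; the
names c_frexp and splitDouble are ours.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble(..), CInt(..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

-- double frexp(double x, int *exp);
--   x   : in parameter
--   exp : out parameter, written by the callee
foreign import ccall unsafe "math.h frexp"
  c_frexp :: CDouble -> Ptr CInt -> IO CDouble

-- The out parameter becomes a second component of the result.
splitDouble :: Double -> IO (Double, Int)
splitDouble x =
  alloca $ \expPtr -> do          -- temporary storage for the out value
    m <- c_frexp (realToFrac x) expPtr
    e <- peek expPtr              -- read what the callee wrote
    return (realToFrac m, fromIntegral e)
```

Nothing in the prototype says that exp is written rather than read; an
in out parameter would need the storage initialised before the call as
well as read afterwards.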
None of these details are mentioned in the C header file. Instead,
many of them are in the manual page for the procedure, while others
(such as parameter lifetimes) may not even be written down at all.