7.2. Unboxed types and primitive operations

GHC is built on a raft of primitive data types and operations. While you really can use this stuff to write fast code, we generally find it a lot less painful, and more satisfying in the long run, to use higher-level language features and libraries. With any luck, the code you write will be optimised to the efficient unboxed version in any case. And if it isn't, we'd like to know about it.

We do not currently have good, up-to-date documentation about the primitives, perhaps because they are mainly intended for internal use. There used to be a long section about them here in the User Guide, but it became out of date, and wrong information is worse than none.

The Real Truth about what primitive types there are, and what operations work over those types, is held in the file fptools/ghc/compiler/prelude/primops.txt. This file is used directly to generate GHC's primitive-operation definitions, so it is always correct! It is also intended for processing into text.

Indeed, the result of such processing is part of the description of the External Core language. So that document is a good place to look for a type-set version. We would be very happy if someone wanted to volunteer to produce an SGML back end to the program that processes primops.txt so that we could include the results here in the User Guide.

What follows here is a brief summary of some main points.

7.2.1. Unboxed types

Most types in GHC are boxed, which means that values of that type are represented by a pointer to a heap object. The representation of a Haskell Int, for example, is a two-word heap object. An unboxed type, however, is represented by the value itself, no pointers or heap allocation are involved.

Unboxed types correspond to the “raw machine” types you would use in C: Int# (long int), Double# (double), Addr# (void *), etc. The primitive operations (PrimOps) on these types are what you might expect; e.g., (+#) is addition on Int#s, and is the machine-addition that we all know and love—usually one instruction.

Primitive (unboxed) types cannot be defined in Haskell, and are therefore built into the language and compiler. Primitive types are always unlifted; that is, a value of a primitive type cannot be bottom. We use the convention that primitive types, values, and operations have a # suffix.

Primitive values are often represented by a simple bit-pattern, such as Int#, Float#, Double#. But this is not necessarily the case: a primitive value might be represented by a pointer to a heap-allocated object. Examples include Array#, the type of primitive arrays. A primitive array is heap-allocated because it is too big a value to fit in a register, and would be too expensive to copy around; in a sense, it is accidental that it is represented by a pointer. If a pointer represents a primitive value, then it really does point to that value: no unevaluated thunks, no indirections…nothing can be at the other end of the pointer than the primitive value.

There are some restrictions on the use of primitive types, the main one being that you can't pass a primitive value to a polymorphic function or store one in a polymorphic data type. This rules out things like [Int#] (i.e. lists of primitive integers). The reason for this restriction is that polymorphic arguments and constructor fields are assumed to be pointers: if an unboxed integer is stored in one of these, the garbage collector would attempt to follow it, leading to unpredictable space leaks. Or a seq operation on the polymorphic component may attempt to dereference the pointer, with disastrous results. Even worse, the unboxed value might be larger than a pointer (Double# for instance).

Nevertheless, A numerically-intensive program using unboxed types can go a lot faster than its “standard” counterpart—we saw a threefold speedup on one example.

7.2.2. Unboxed Tuples

Unboxed tuples aren't really exported by GHC.Exts, they're available by default with -fglasgow-exts. An unboxed tuple looks like this:

(# e_1, ..., e_n #)

where e_1..e_n are expressions of any type (primitive or non-primitive). The type of an unboxed tuple looks the same.

Unboxed tuples are used for functions that need to return multiple values, but they avoid the heap allocation normally associated with using fully-fledged tuples. When an unboxed tuple is returned, the components are put directly into registers or on the stack; the unboxed tuple itself does not have a composite representation. Many of the primitive operations listed in this section return unboxed tuples.

There are some pretty stringent restrictions on the use of unboxed tuples:

Unboxed tuple types are subject to the same restrictions as other unboxed types; i.e. they may not be stored in polymorphic data structures or passed to polymorphic functions.
Unboxed tuples may only be constructed as the direct result of a function, and may only be deconstructed with a case expression. eg. the following are valid:
f x y = (# x+1, y-1 #) g x = case f x x of { (# a, b #) -> a + b }
but the following are invalid:
f x y = g (# x, y #) g (# x, y #) = x + y
No variable can have an unboxed tuple type. This is illegal:
f :: (# Int, Int #) -> (# Int, Int #) f x = x
because x has an unboxed tuple type.

Note: we may relax some of these restrictions in the future.

The IO and ST monads use unboxed tuples to avoid unnecessary allocation during sequences of operations.