Overview of the Pyrex Language
This document informally describes the extensions to the Python language
made by Pyrex. Some day there will be a reference manual covering everything
in more detail.
Python functions vs. C functions
There are two kinds of function definition in Pyrex:
- Python functions are defined using the def statement, as
in Python. They take Python objects as parameters and return Python objects.
- C functions are defined using the new cdef statement. They
take either Python objects or C values as parameters, and can return either
Python objects or C values.
Within a Pyrex module, Python functions and C functions can call each other
freely, but only Python functions can be called from outside the module by
interpreted Python code. So, any functions that you want to "export" from
your Pyrex module must be declared as Python functions.
Parameters of either type of function can be declared to have C data types,
using normal C declaration syntax. For example,
def spam(int i, char *s): ...
cdef int eggs(unsigned long l, float f): ...
When a parameter of a Python function is declared to have a C data type,
it is passed in as a Python object and automatically converted to a C value,
if possible. Automatic conversion is currently only possible for numeric types
and string types; attempting to use any other type for the parameter of a
Python function will result in a compile-time error.
C functions, on the other hand, can have parameters of any type, since
they're passed in directly using a normal C function call.
Python objects as parameters and return values
If no type is specified for a parameter or return value, it is assumed
to be a Python object. (Note that this is different from the C convention,
where it would default to int.) For example, the following defines
a C function that takes two Python objects as parameters and returns a Python
object:
cdef spamobjs(x, y): ...
Reference counting for these objects is performed automatically according
to the standard Python/C API rules (i.e. borrowed references are taken as
parameters and a new reference is returned).
The name object can also be used to explicitly declare something
as a Python object. This can be useful if the name being declared would otherwise
be taken as the name of a type, for example,
cdef ftang(object int): ...
declares a parameter called int which is a Python object. You can
also use object as the explicit return type of a function, e.g.
cdef object ftang(object int): ...
It is probably a good idea to always be explicit about object parameters
in C functions, in the interests of clarity.
C variable and type definitions
The cdef statement is also used to declare C variables, either local
or module-level:
cdef int i, j, k
cdef float f, g[42], *h
and C struct, union or enum types:
cdef struct Grail:
    int age
    float volume
cdef union Food:
    char *spam
    float *eggs
cdef enum CheeseType:
    cheddar, edam, camembert
cdef enum CheeseState:
    hard = 1
    soft = 2
    runny = 3
There is currently no special syntax for defining a constant, but you can
use an anonymous enum declaration for this purpose, for example,
cdef enum:
    tons_of_spam = 3
Note that the words struct, union and enum are used only when defining
a type, not when referring to it. For example, to declare a variable pointing
to a Grail you would write
cdef Grail *gp
and not
cdef struct Grail *gp # WRONG
There is also a ctypedef statement for giving names to types, e.g.
ctypedef unsigned long ULong
ctypedef int *IntPtr
Scope rules
Pyrex determines whether a variable belongs to a local scope, the module
scope, or the built-in scope completely statically. As with Python,
assigning to a variable which is not otherwise declared implicitly declares
it to be a Python variable residing in the scope where it is assigned. Unlike
Python, however, a name which is referred to but not declared or assigned
is assumed to reside in the built-in scope, not the module scope. Names
added to the module dictionary at run time will not shadow such names.
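For contrast, the following is ordinary interpreted Python, not Pyrex. It is only a sketch of the dynamic lookup that Pyrex's static rule replaces; the function name count_items is hypothetical. In Python, a name added to the module dictionary at run time does shadow the built-in, which is the behaviour the rule above says a Pyrex module will not show.

```python
# Plain Python, shown for contrast with Pyrex's static scoping.
def count_items(items):
    # 'len' is never assigned in this module, so Python looks it up at
    # call time: module globals first, then the built-ins.
    return len(items)

before = count_items([1, 2, 3])      # resolves to the built-in len

# Add a module-level name 'len' at run time.
globals()["len"] = lambda items: -1

after = count_items([1, 2, 3])       # Python now finds the module-level len

# Remove it again so the built-in becomes visible once more.
del globals()["len"]
restored = count_items([1, 2, 3])
```

In the Pyrex case described above, the equivalent of count_items would keep calling the built-in len throughout.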
Statements and expressions
Control structures and expressions follow Python syntax for the most part.
When applied to Python objects, they have the same semantics as in Python
(unless otherwise noted). Most of the Python operators can also be applied
to C values, with the obvious semantics.
If Python objects and C values are mixed in an expression,
conversions are performed automatically between Python objects and C numeric
or string types.
Reference counts are maintained automatically for
all Python objects, and all Python operations are automatically checked for
errors, with appropriate action taken.
Differences between C and Pyrex expression syntax
Pyrex also includes some C operations which have no direct Python equivalent.
Some of them are expressed differently in Pyrex than in C.
- There is no -> operator in Pyrex. Instead of
p->x, use p.x
- There is no unary * (dereference) operator in Pyrex. Instead of
*p, use p[0]
- There is an & operator, with the same semantics
as in C
- Type casts are written <type>value, for
example:
cdef char *p
cdef float *q
p = <char*>q
- The null C pointer is called NULL, not 0 (and
NULL is a reserved word).
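As a loose analogy only, Python's standard ctypes module uses the same p[0] idiom for dereferencing, since Python has no unary * operator either. The sketch below is plain Python, not Pyrex, and the variable names are made up for illustration; ctypes.cast plays a role comparable to the <type>value cast.

```python
import ctypes

value = ctypes.c_int(7)
p = ctypes.pointer(value)      # comparable to &value

first = p[0]                   # dereference: what C writes as *p
p[0] = 9                       # store through the pointer: *p = 9 in C
after = value.value            # the pointed-to variable has changed

# Reinterpret a float pointer as a byte pointer, comparable to a pointer cast.
q = ctypes.pointer(ctypes.c_float(1.0))
as_bytes = ctypes.cast(q, ctypes.POINTER(ctypes.c_ubyte))
n_bytes = ctypes.sizeof(ctypes.c_float)
raw = bytes(as_bytes[i] for i in range(n_bytes))
```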
Integer for-loops
You should be aware that a for-loop such as
for i in range(n):
    ...
won't be very fast, even if i and n are declared as C integers,
because range is a Python function. For iterating over ranges of
integers, Pyrex has another form of for-loop:
for i from 0 <= i < n:
    ...
If the loop variable and the lower and upper bounds are all C integers, this
form of loop will be much faster, because Pyrex will translate it into pure
C code.
Some things to note about the for-from loop:
- The target expression must be a variable name.
- The name between the lower and upper bounds must be the same as the
target name.
- The direction of iteration is determined by the relations. If they
are both from the set {<, <=} then it is upwards;
if they are both from the set {>, >=} then it is
downwards. (Any other combination is disallowed.)
Like other Python looping statements, break and continue
may be used in the body, and the loop may have an else clause.
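The iteration rules above can be modelled in plain Python. This is a sketch, not what Pyrex generates: the names for_from and _RELATIONS are hypothetical, and it assumes that a strict first relation (< or >) starts the loop one step past the first bound.

```python
import operator

# Each relation maps to (comparison function, step direction).
_RELATIONS = {
    "<": (operator.lt, +1),
    "<=": (operator.le, +1),
    ">": (operator.gt, -1),
    ">=": (operator.ge, -1),
}

def for_from(first, rel1, rel2, last):
    """Yield the values taken by i in 'for i from first rel1 i rel2 last'."""
    still_in_range, step = _RELATIONS[rel2]
    if _RELATIONS[rel1][1] != step:
        # Mixing upward and downward relations is disallowed.
        raise ValueError("both relations must point in the same direction")
    # Assumption: a strict first relation excludes the first bound itself.
    i = first if rel1 in ("<=", ">=") else first + step
    while still_in_range(i, last):
        yield i
        i += step

upwards = list(for_from(0, "<=", "<", 4))      # for i from 0 <= i < 4
downwards = list(for_from(4, ">", ">=", 0))    # for i from 4 > i >= 0
```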
Error return values
If you don't do anything special, a function declared with cdef that
does not return a Python object has no way of reporting Python exceptions
to its caller. If an exception is detected in such a function, a warning message
is printed and the exception is ignored.
If you want a C function that does not return
a Python object to be able to propagate exceptions to its caller, you need
to declare an exception value for it. Here is an example:
cdef int spam() except -1:
    ...
With this declaration, whenever an exception occurs inside spam,
it will immediately return with the value -1. Furthermore, whenever
a call to spam returns -1, an exception will be assumed
to have occurred and will be propagated.
When you declare an exception value for a function,
you should never explicitly return that value. If all possible return values
are legal and you can't reserve one entirely for signalling errors, you can
use an alternative form of exception value declaration:
cdef int spam() except? -1:
    ...
The "?" indicates that the value -1 only indicates a possible
error. In this case, Pyrex generates a call to PyErr_Occurred if
the exception value is returned, to make sure it really is an error.
There is also a third form of exception value
declaration:
cdef int spam() except *:
    ...
This form causes Pyrex to generate a call to PyErr_Occurred after
every call to spam, regardless of what
value it returns. If you have a function returning void that needs
to propagate errors, you will have to use this form, since there isn't any
return value to test.
Some things to note:
- Currently, exception values can only be declared for functions returning
an integer, float or pointer type, and the value must be a literal,
not an expression (although it can be negative). The only possible pointer
exception value is NULL. Void functions can only use the except *
form.
- The exception value specification is part of the signature of
the function. If you're passing a pointer to a function as a parameter or
assigning it to a variable, the declared type of the parameter or variable
must have the same exception value specification (or lack thereof). Here
is an example of a pointer-to-function declaration with an exception value:
int (*grail)(int, char *) except -1
- You don't need to (and shouldn't) declare exception values for functions
which return Python objects. Remember that a function with no declared return
type implicitly returns a Python object.
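The three forms of exception value declaration differ in how the caller decides that an error happened. The following pure-Python model is a sketch of that decision only; the function caller_sees_error and its tuple encoding of declarations are hypothetical, and the error_pending flag stands in for the result of PyErr_Occurred.

```python
def caller_sees_error(declaration, returned, error_pending):
    """Return True if the caller should propagate an exception.

    declaration   -- None, ("except", v), ("except?", v) or ("except*",)
    returned      -- the C value that came back from the call
    error_pending -- stands in for PyErr_Occurred() reporting an error
    """
    if declaration is None:
        # No exception value declared: errors cannot reach the caller.
        return False
    kind = declaration[0]
    if kind == "except":
        # The reserved value alone signals an error.
        return returned == declaration[1]
    if kind == "except?":
        # The value is only a hint; the error indicator must confirm it.
        return returned == declaration[1] and bool(error_pending)
    if kind == "except*":
        # The error indicator is checked after every call.
        return bool(error_pending)
    raise ValueError("unknown declaration: %r" % (declaration,))
```

For instance, a legitimate return of -1 counts as an error under ("except", -1) but not under ("except?", -1) unless an error is actually pending, which is why the first form requires reserving the value.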
External declarations
By default, C functions and variables declared at the module level are local
to the module (i.e. they have the C static storage class). They can
also be declared extern to specify that they are defined elsewhere,
for example:
cdef extern int spam_counter
cdef extern void order_spam(int tons)
Referencing C header files
When you use an extern definition on its own as above, Pyrex includes a declaration
for it in the generated C file. This can cause problems if the declaration
doesn't exactly match the declaration that will be seen by other C code.
If you're wrapping an existing C library, for example, it's important that
the generated C code is compiled with exactly the same declarations as the
rest of the library.
To achieve this, you can tell Pyrex that
the declarations are to be found in a C header file, like this:
cdef extern from "spam.h":
    int spam_counter
    void order_spam(int tons)
The cdef extern from clause does three things:
- It directs Pyrex to place a #include statement for the named
header file in the generated C code.
- It prevents Pyrex from generating any C code for the declarations
found in the associated block.
- It treats all declarations within the block as though they started
with cdef extern.
It's important to understand that Pyrex does not itself read the C
header file, so you still need to provide Pyrex versions of any declarations
from it that you use. However, the Pyrex declarations don't always have to
exactly match the C ones, and in some cases they shouldn't or can't. In particular:
- Don't use const. Pyrex doesn't know anything about const, so
just leave it out. Most of the time this shouldn't cause any problem, although
on rare occasions you might have to use a cast (see footnote 1).
- Leave out any platform-specific extensions to C declarations
such as __declspec().
- If the header file declares a big struct and you only want to
use a few members, you can just declare the members you're interested in.
- If the header file uses typedef names such as size_t to
refer to platform-dependent flavours of numeric types, you will need a corresponding
ctypedef statement, but you
don't need to match the type exactly, just use something of the right general
kind (int, float, etc). For example,
ctypedef int size_t
will work okay whatever the actual size of a size_t is (provided the header
file defines it correctly).
- If the header file uses macros to define constants, translate
them into a dummy enum declaration.
- If the header file defines a function using a macro, declare
it as though it were an ordinary function, with appropriate argument and
result types.
A few more tricks and tips:
- If you want to include a C header because it's needed by another header,
but don't want to use any declarations from it, put pass
in the extern-from block:
cdef extern from "spam.h":
    pass
- If you want to include some external declarations, but don't want to
specify a header file (because it's included by some other header that you've
already included) you can put * in place of the header file name:
cdef extern from *:
    ...
Styles of struct, union and enum declaration
There are two main ways that structs, unions and enums can be declared in
C header files: using a tag name, or using a typedef. There are also some
variations based on various combinations of these.
It's important to make the Pyrex
declarations match the style used in the header file, so that Pyrex can emit
the right sort of references to the type in the code it generates. To make
this possible, Pyrex provides two different syntaxes for declaring a struct,
union or enum type. The style introduced above corresponds to the use of
a tag name. To get the other style, you prefix the declaration with ctypedef,
as illustrated below.
The following table shows the various
possible styles that can be found in a header file, and the corresponding
Pyrex declaration that you should put in the cdef extern from block.
Struct declarations are used as an example; the same applies equally to union
and enum declarations.
Note that in all the cases below,
you refer to the type in Pyrex code simply as Foo,
not struct Foo.
Case 1
C code:
struct Foo {
    ...
};
Possibilities for corresponding Pyrex code:
cdef struct Foo:
    ...
Comments: Pyrex will refer to the type as struct Foo in the generated
C code.

Case 2
C code:
typedef struct {
    ...
} Foo;
Possibilities for corresponding Pyrex code:
ctypedef struct Foo:
    ...
Comments: Pyrex will refer to the type simply as Foo in
the generated C code.

Case 3
C code:
typedef struct foo {
    ...
} Foo;
Possibilities for corresponding Pyrex code:
cdef struct foo:
    ...
ctypedef foo Foo #optional
or
ctypedef struct Foo:
    ...
Comments: If the C header uses both a tag and a typedef
with different names, you can use either form of declaration in Pyrex
(although if you need to forward reference the type, you'll have to use the
first form).

Case 4
C code:
typedef struct Foo {
    ...
} Foo;
Possibilities for corresponding Pyrex code:
cdef struct Foo:
    ...
Comments: If the header uses the same name for the tag and the typedef,
you won't be able to include a ctypedef for it -- but then, it's not
necessary.
Accessing Python/C API routines
One particular use of the cdef extern from statement is for gaining access
to routines in the Python/C API. For example,
cdef extern from "Python.h":
    object PyString_FromStringAndSize(char *s, int len)
will allow you to create Python strings containing null bytes.
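To see what this routine does without compiling anything, the same kind of call can be tried from plain Python through the standard ctypes module. This is only a sketch under an assumption: it targets CPython 3, where the corresponding routine is named PyBytes_FromStringAndSize rather than PyString_FromStringAndSize. The variable names are made up for illustration.

```python
import ctypes

# Assumption: CPython 3, where the routine is PyBytes_FromStringAndSize.
from_string_and_size = ctypes.pythonapi.PyBytes_FromStringAndSize
from_string_and_size.restype = ctypes.py_object
from_string_and_size.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]

# The explicit length means the embedded null byte does not end the string.
data = from_string_and_size(b"spam\x00eggs", 9)
```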
Public Declarations
You can make C variables and functions defined in a Pyrex module accessible
to external C code (or another Pyrex module) using the public
keyword, as follows:
cdef public int spam # public variable declaration
cdef public void grail(int num_nuns): # public function declaration
    ...
If there are any public declarations in a Pyrex module, a .h file
is generated containing equivalent C declarations for inclusion in other C
code.
Extension Types
As well as creating normal user-defined classes with the Python class
statement, Pyrex also lets you create new built-in Python types, known as
extension types. You define
an extension type using the cdef class statement. Here's an example:
cdef class Shrubbery:

    cdef int width, height

    def __init__(self, w, h):
        self.width = w
        self.height = h

    def describe(self):
        print "This shrubbery is", self.width, \
            "by", self.height, "cubits."
As you can see, a Pyrex extension type definition looks a lot like a Python
class definition. Within it, you use the def statement to define methods
that can be called from Python code. You can even define many of the special
methods such as __init__ as you would in Python.
The main difference is that
you can use the cdef statement to define attributes. The attributes
may be Python objects (either generic or of a particular extension type),
or they may be of any C data type. So you can use extension types to wrap
arbitrary C data structures and provide a Python-like interface to them.
Some other differences between
extension types and Python classes:
- The set of attributes of an extension type is fixed at compile time; you
can't add attributes to an extension type instance at run time simply by
assigning to them, as you could with a Python class instance. (You can
subclass the extension type in Python and add attributes to instances of the
subclass, however.)
- Attributes defined with cdef are only accessible from
Pyrex code, not from Python code. (A way of defining Python-accessible attributes
is planned, but not yet implemented. In the meantime, use accessor methods.)
- To access the cdef-attributes of an extension type instance,
the Pyrex compiler must know that you have an instance of that type, and
not just a generic Python object. It knows this already in the case of the
"self" parameter of the methods of that type, but in other cases you will
have to tell it by means of a declaration. For example,
def widen_shrubbery(Shrubbery sh, extra_width):
    sh.width = sh.width + extra_width
- Some of the __xxx__ special methods behave differently from
their Python counterparts, and some of them are named differently as well.
See the separate page on special methods of extension types for more information.
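The fixed attribute set has a rough counterpart in plain Python's __slots__, which can help picture the behaviour. The sketch below is an analogy only, not how Pyrex implements extension types: unlike cdef attributes, slot attributes remain accessible from Python code. The subclass name LabelledShrubbery is hypothetical.

```python
class Shrubbery(object):
    # __slots__ fixes the attribute set, loosely like cdef attributes do.
    __slots__ = ("width", "height")

    def __init__(self, w, h):
        self.width = w
        self.height = h

class LabelledShrubbery(Shrubbery):
    # No __slots__ here, so instances get an ordinary attribute dictionary.
    pass

plain = Shrubbery(3, 4)
try:
    plain.colour = "green"           # not in the fixed attribute set
    plain_accepts_new_attribute = True
except AttributeError:
    plain_accepts_new_attribute = False

labelled = LabelledShrubbery(3, 4)
labelled.colour = "green"            # allowed on the Python subclass
```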
Special methods of extension types
This section has a whole separate page
devoted to it.
Subclassing extension types
Pyrex extension types can be subclassed in Python. They cannot currently
inherit from other built-in or extension types, but this may be possible in
a future version.
Forward-declaring extension types
Extension types can be forward-declared, like struct and union types. This
will be necessary if you have two extension types that need to refer to each
other, e.g.
cdef class Shrubbery # forward declaration

cdef class Shrubber:
    cdef Shrubbery work_in_progress

cdef class Shrubbery:
    cdef Shrubber creator
External extension types
Extension types can be declared extern. In conjunction with the
cdef extern from
statement, and together with a slight addition to the extension class
syntax, this provides a way of gaining access to the internals of pre-existing
Python objects. For example, the following declarations will let you get
at the C-level members of the built-in complex object.
cdef extern from "complexobject.h":

    struct Py_complex:
        double real
        double imag

    ctypedef class complex [type PyComplex_Type, object PyComplexObject]:
        cdef Py_complex cval
Note the use of ctypedef class. This is because, in the Python header
files, the PyComplexObject struct is declared with
typedef struct {
    ...
} PyComplexObject;
Here is an example of a function which uses the complex type declared
above.
def spam(complex c):
    print "Real:", c.cval.real
    print "Imag:", c.cval.imag
When declaring an external extension type, you don't declare any methods.
Declaration of methods is not required in order to call them, because the
calls are Python method calls. Also, as with structs inside a cdef extern
from block, you only need to declare those C members which you wish to
access.
Name specification clause
The part of the class declaration in square brackets is a special feature
only available for extern extension types. The reason for it is that
Pyrex needs to know the C names of the struct representing an instance of
the type, and of the Python type-object for the type. It knows these names
for non-extern extension types, because it generates them itself, but in
the case of an extern extension type, you need to tell it what they are.
Both the type and object parts are optional. If you don't specify the object
part, Pyrex assumes it's the same as the name of the class. For instance,
the class declaration could also be written
ctypedef class PyComplexObject [type PyComplex_Type]:
    ...
but then you would have to write the function as
def spam(PyComplexObject c):
    ...
You can also omit the type part of the specification, but this will
severely limit what you can do with the type, because Pyrex needs the type
object in order to perform type tests. A type test is required every time
an argument is passed to a Python function declared as taking an argument
of that type (such as spam() above), or a generic Python object is assigned
to a variable declared to be of that type. Without access to the type object,
Pyrex won't allow you to do any of those things. Supplying the type object
name is therefore recommended if at all possible.
Final remarks
There is one more subtlety to the above example that should be mentioned.
By calling the extension type "complex", we're creating a module-level variable
called "complex" that shadows the built-in name "complex". This isn't a problem,
because they both have the same value, i.e. the type-object of the built-in
complex type. In the Pyrex module, the name "complex" can be used both as
a constructor of complex objects, and as a type name for declaring variables
and arguments of type complex.
If we call the
class something else, however, such as "PyComplexObject" as in the second
version above, we would have to use "PyComplexObject" as the type name. Both
"complex" and "PyComplexObject" would work as constructors ("complex" because
it's a built-in name), but only "PyComplexObject" would work as a type name
for declaring variables and arguments.
Limitations
Pyrex is not quite a full superset of Python. The following restrictions
apply:
- Function definitions (whether using def or cdef) cannot be
nested within other function definitions.
- Class definitions can only appear at the top level of a module,
not inside a function.
- The import * form of import is not allowed anywhere
(other forms of the import statement are fine, though).
- Generators cannot be defined in Pyrex.
The above restrictions will most likely remain, since removing them would
be difficult and they're not really needed for Pyrex's intended applications.
There are also some temporary limitations which may eventually be lifted:
- Class and function definitions cannot be placed inside control structures.
- In-place operators (+=, etc) are not yet supported.
- List comprehensions are not yet supported.
There are probably also some other gaps which I can't think of at the moment.
Footnotes
1. A problem with const could arise if you have something like
cdef extern from "grail.h":
    char *nun
where grail.h actually contains
extern const char *nun;
and you do
cdef void oral(char *s):
    #something that doesn't change s
    ...
oral(nun)
which will cause the compiler to complain. You can work around it by casting
away the constness:
oral(<char *>nun)