Tcl CFFI package (v2.0b1)

Concepts

This page describes some general concepts behind CFFI. Basic knowledge of the package as described in Quick start is assumed. For a more direct mapping of C declarations to CFFI declarations see Cookbook.

Scopes

To avoid conflicts arising from the same name being used in different packages layered on CFFI, program elements like type aliases, enumerations, prototypes and pointer tags are defined within an enclosing scope. The scope is named after the Tcl namespace in which the defining command is invoked.

For example, assuming the libgit2 and libzip namespaces are used for wrapping shared libraries of the same name, the following definition for the STATUS alias used in two (imagined) functions would not be in conflict as they have different scopes.

namespace eval libgit2 {
    cffi::Wrapper create lib1 libgit2.so
    cffi::alias define STATUS int
    lib1 function git_commit STATUS {}
}
namespace eval libzip {
    cffi::Wrapper create lib2 libzip.so
    cffi::alias define STATUS long
    lib2 function libzip_open STATUS {path string}
}

When a program element name is referenced from another definition, if the name is not fully qualified it is first looked up in the scope of the definition. If not found, it is looked up in the global space. In the above definition of libzip_open, the STATUS alias is resolved in the libzip scope. If it had not been defined there, the global scope would be checked. To refer to a name in any other scope, it must be fully qualified, for example ::libgit2::STATUS.

Note that although CFFI scopes are named after Tcl namespaces, they are not directly tied to them. For example, deleting a Tcl namespace will not cause the scope of the same name to disappear.
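The lookup order described above can be sketched with aliases alone, without loading any shared library. The global STATUS alias and the other namespace are illustrative additions, and the sketch assumes that type size resolves unqualified names in the namespace from which it is called.

```tcl
package require cffi

# Illustrative aliases only; STATUS mirrors the example above.
cffi::alias define STATUS short                          ;# global scope
namespace eval libgit2 { cffi::alias define STATUS int }
namespace eval libzip  { cffi::alias define STATUS long }
namespace eval other   {}                                ;# defines no STATUS

# Unqualified name: the current scope is searched first, then the global scope.
set in_libgit2 [namespace eval libgit2 { cffi::type size STATUS }]
set in_other   [namespace eval other   { cffi::type size STATUS }]

# Fully qualified name: reaches a scope other than the current one.
set in_libzip  [cffi::type size ::libzip::STATUS]

puts "libgit2: $in_libgit2, other: $in_other, libzip: $in_libzip"
```

The three sizes should match those of int, short and long respectively on the platform at hand.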

Type declarations

A type declaration consists of a data type followed by zero or more annotations that further specify handling of values of the type. For example, the nonzero annotation on a function return type indicates a return value of zero should be treated as an error.

For example, a type declaration for a parameter might look like

int {default 0} byref

where the base data type is an integer, the default annotation specifies a value to be used if no argument is supplied and byref indicates that the value is actually passed by reference.
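As a small illustration of the default annotation, the sketch below wraps the C abs function. It assumes a Linux system where a wrapper created without a path resolves symbols already loaded into the process; the wrapper name libc is arbitrary.

```tcl
package require cffi

# Assumption: Linux, where omitting the library path wraps the symbols
# already loaded into the process, the C runtime among them.
cffi::Wrapper create libc

# n takes the value 0 when the caller supplies no argument.
libc function abs int {n {int {default 0}}}

puts [abs -5]   ;# explicit argument
puts [abs]      ;# argument omitted, so the default 0 is passed
```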

Type declarations appear in three different contexts: function parameters, function return values and struct fields.

The permitted data types and annotations are dependent on the context in which the type declaration appears.

Type annotations

The table below summarizes the available type annotations and the types and contexts in which they are allowed.

bitmask: The parameter, function return or field value is treated as an integer formed by a bitwise-OR of a list of integer values.
byref: The parameter or function return value is passed or returned by reference.
counted: The parameter or function return is a reference counted pointer whose validity is checked. See Pointer safety.
default: Specifies a default value to use for a parameter or field.
discard: Specifies that a function return value be discarded and an empty result returned.
dispose: The parameter is a pointer that is unregistered irrespective of function return status. See Pointer safety.
disposeonsuccess: The parameter is a pointer that is disposed only if the function returns successfully. See Pointer safety.
enum: The parameter, return value or field is an enumeration.
errno: If the function return value indicates an error condition, the error code is available via the C RTL errno variable.
in: Marks a parameter passed to a function as input to the function. See Input and output parameters.
inout: Marks a parameter passed to a function as both input and output. See Input and output parameters.
lasterror: If the function return value indicates an error condition, the error code is available via the Windows GetLastError API.
multisz: The value is a concatenation of multiple nul-terminated strings with an empty string indicating the end. (Windows MULTI_SZ format)
nonnegative: Raise an exception if the function return value is negative.
nonzero: Raise an exception if the function return value is zero.
novaluechecks: Disable value validity checks. The checks depend on the value type. See the discussion of specific types for its semantics.
nullifempty: Treat an empty string or struct dictionary value passed as a function argument or struct field as a NULL pointer. See Strings as NULL pointers. In the case of out or inout parameters, specifies that if the variable name passed as the argument is the empty string, a NULL pointer should be passed as the function argument.
nullok: Deprecated. Alias for novaluechecks.
onerror: Specifies an error handler if a function return value indicates an error condition.
out: Marks a parameter as output-only from a function. See Input and output parameters.
pinned: The parameter or function return is a pinned pointer whose validity is checked. See Pointer safety.
positive: Raise an exception if a function return value is negative or zero.
retval: Marks the parameter as an output parameter whose value is to be returned as the result of the function. See Output parameters as function result.
storealways: Treat an output parameter as valid regardless of any error indications from the function call.
storeonerror: Treat an output parameter as valid only in the presence of an error indication from a function call.
structsize: Default a field value to the size of the containing struct.
unsafe: Do not do any pointer validation on a parameter, return value or field. See Pointer safety.
winerror: Treat the function return value as a Windows status code.
zero: Raise an exception if a function return value is not zero.

Later sections will further detail usage of the above.

Data types

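CFFI data types correspond to C types and may be void, integer or floating point numerics, arrays, pointers, strings, binary strings, structs, unions or UUIDs, each of which is described in the sections that follow.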

The type info, type size and type count commands may be used to obtain information about a type.

% cffi::type info {int[10]}
Size 40 Count 10 Alignment 4 Definition {{int[10]}} BaseSize 4
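The individual values can also be retrieved directly with the type size and type count commands. A minimal sketch, whose results depend on the platform's int size:

```tcl
package require cffi

set decl {int[10]}
set nbytes [cffi::type size $decl]    ;# total size of the array in bytes
set nelems [cffi::type count $decl]   ;# number of elements in the array

puts "size=$nbytes count=$nelems bytes per element=[expr {$nbytes / $nelems}]"
```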

The void type

This corresponds to the C void type and is only permitted as the return type of a function. Note that the C void * type is declared as a pointer type.

Integer types

The following integer types are supported.

schar: C signed char
uchar: C unsigned char
short: C signed short
ushort: C unsigned short
int: C signed int
uint: C unsigned int
long: C signed long
ulong: C unsigned long
longlong: C signed long long
ulonglong: C unsigned long long

Floating point types

The following floating point types are supported.

float: C float
double: C double

Arrays

Arrays are declared as

TYPE[N]

where N is a positive integer indicating the number of elements in an array of values of type TYPE. At the script level, arrays are represented as Tcl lists.
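The list representation can be observed without calling any C function by converting an array value to its native binary form and back. This is only a sketch; it assumes the type frombinary command as the counterpart of type tobinary.

```tcl
package require cffi

# A Tcl list of three integers is laid out as a native int[3] ...
set bin [cffi::type tobinary {int[3]} {1 2 3}]
puts [string length $bin]      ;# number of bytes occupied by int[3]

# ... and the native bytes convert back into a Tcl list.
set back [cffi::type frombinary {int[3]} $bin]
puts $back
```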

Dynamically sized arrays

Additionally, within parameter declarations, N may also be the name of a parameter within the same function declaration. In this case, the array is sized dynamically depending on the value of the referenced parameter at the time the call is made. This is useful in the common case of a function writing to a buffer. For example, consider the Win32 API for generating random numbers

BOOL CryptGenRandom(HCRYPTPROV hProv, DWORD dwLen, BYTE *pbBuffer);

Here pbBuffer is really a pointer to an array that is of size dwLen. Assuming ADVAPI32.DLL has already been wrapped as advapi32 and Win32 type aliases loaded, one might define the CFFI wrapper as

advapi32 stdcall CryptGenRandom BOOL {
    hProv HANDLE
    dwLen DWORD
    pbBuffer {uchar[512] out}
}

However, this can lead to corruption if the function is mistakenly called with the dwLen argument greater than 512. A safer way to define the function is

advapi32 stdcall CryptGenRandom BOOL {
    hProv HANDLE
    dwLen DWORD
    pbBuffer {uchar[dwLen] out}
}

This ensures a buffer of the correct size is passed to the function based on the length passed in the call.

If the size argument used by a dynamic array type is passed as 0, an error is raised unless the type declaration includes a novaluechecks annotation. In that case, the array pointer argument is passed as NULL to the function.

In the above function, the dwLen parameter was an in parameter containing the size of the buffer and the function filled the entire buffer. In many cases, a function is passed a pointer to the size of the buffer. It then overwrites the location with the actual count stored in the buffer. In such cases, the parameter should be declared with the inout annotation. When the function is called the size of the buffer passed is that specified by the corresponding variable argument. On return, the number of elements returned is that actually returned by the wrapped function.

Arrays as strings

C arrays are generally represented as a list at the script level. So in the above example, the value stored in pbBuffer would be seen in Tcl as a list of unsigned 8-bit integer values. Sometimes this list representation is not the most appropriate or convenient.

For example, the returned data from CryptGenRandom might be better handled as a binary string. CFFI provides the types bytes, chars, unichars and winchars that are defined as arrays but treat C arrays of 8-bit values as strings instead. See Strings and Binary strings for more information.

Pointers

Pointers are declared in one of the following forms:

pointer
pointer.TAG

The first is the equivalent of a void* C pointer. The second form associates the pointer type with a tag.

Pointer values are currently represented in the form

ADDRESS^TAG

where the tag is optional as in declarations. Applications must not rely on this specific representation as it is subject to change. Instead the pointer ensemble command set should be used to manipulate pointers. In particular, the ::cffi::pointer make command constructs a pointer from a memory address and tag. The ::cffi::pointer address and ::cffi::pointer tag commands do the reverse.
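A short sketch of these three commands follows. The address 0x1000 and the tag PATH are arbitrary illustrative values; the pointer is never dereferenced.

```tcl
package require cffi

# Construct a pointer value from an address and a tag.
set p [cffi::pointer make 0x1000 PATH]

# Take it apart again without parsing the string representation.
set addr [cffi::pointer address $p]
set tag  [cffi::pointer tag $p]
puts "address=$addr tag=$tag"

# The tag is optional; without it the pointer is the equivalent of void*.
set q [cffi::pointer make 0x1000]
puts "untagged tag=[cffi::pointer tag $q]"
```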

Pointer tags

A pointer tag is used to provide for some measure of type safety. Tags can be associated with pointer values as well as pointer type declarations. The tag attached to a pointer value must match the tag for the struct field it is assigned to or the function parameter it is passed as. Otherwise an error is raised. Tags also provide a typing mechanism for function pointers. This is described in Prototypes and function pointers.

Note however that, although similar, pointer tags are orthogonal to the type system. Any tag may be associated with a pointer type or value, irrespective of the underlying C pointer type.

Tags for pointer types are defined in the corresponding struct field or function parameter declarations. Pointer values are associated with the tags of the type through which they are created, qualified with a scope. For example, the pointer returned by a function declared in the global namespace

LIB function get_path pointer.PATH {}

will be tagged with ::PATH. On the other hand if the function was declared within a namespace ns

namespace eval ns {
    LIB function get_path pointer.PATH {}
}

pointers returned from the function would be tagged with ::ns::PATH. Furthermore, the tag in the definition may be fully qualified as

namespace eval ns {
    LIB function get_path pointer.::ns2::PATH {}
}

in which case returned pointers have the same exact tag. Note the scope ns2 need not even correspond to a Tcl namespace.

Pointers can only be assigned to a struct field or passed as a parameter if the corresponding pointer type in the struct field or parameter definition has the same tag. If there is no tag specified for the pointer field or parameter, it will accept pointer values with any tag, analogous to a C void * pointer.

Casting pointers

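Normally a pointer with a tag is not accepted as a function argument or struct field if it differs from the tag in the declaration. There are two exceptions to this: a declaration without a tag accepts pointers carrying any tag, and a pointer is accepted if its tag has been declared castable to the tag in the declaration.

The pointer castable command enables the second of these. For example,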

::cffi::pointer castable Rectangle Shape

will result in any pointer value with tag Rectangle being accepted wherever the tag Shape is accepted. Note this implies transitivity.

A pointer may also be cast explicitly to one with a different tag with the pointer cast command. This requires that the existing tag is castable to the new tag. So given the above example, a pointer tagged Rectangle may be cast to one tagged Shape.
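A minimal sketch of an explicit cast, using a pointer fabricated with pointer make purely for illustration (the address 0x1000 is arbitrary and never dereferenced):

```tcl
package require cffi

# Declare that Rectangle pointers are acceptable wherever Shape is expected.
cffi::pointer castable Rectangle Shape

# Fabricated pointer for illustration; real code would obtain it from a function.
set pRect  [cffi::pointer make 0x1000 Rectangle]

# Explicit cast to the Shape tag, permitted by the castable declaration above.
set pShape [cffi::pointer cast $pRect Shape]
puts [cffi::pointer tag $pShape]
```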

In case the pointer is a safe (registered) pointer, explicit casts change the tag associated with the registered pointer.

For debugging and troubleshooting purposes, the pointer castables command may be used to list the tags that are castable and their mappings.

Pointer safety

Pointer type checking via tags does not protect against errors related to invalid pointers, double frees etc. To provide some level of protection against these types of errors, pointers returned from functions, either as return values or through output parameters are by default registered in an internal table. These are referred to as safe pointers. Any pointer use is then checked for registration and an error raised if it is not found.

Pointers that have been registered are unregistered when they are passed to a C function as an argument for a parameter that has been annotated with the dispose or disposeonsuccess annotation.

The following fragment illustrates safe pointers. The fragment assumes a wrapper object crtl for the C runtime library has already been created.

% crtl function malloc pointer {sz size_t}
% crtl function free void {ptr {pointer dispose}}
% set p [malloc 10]
0x55dbb8b2ca10^void
% free $p
% free $p
Pointer 0x55dbb8b2ca10^ is not registered.

The pointer returned by malloc is automatically registered. When the free function is invoked, its argument is checked for registration. Moreover, because the free function's ptr parameter has the dispose annotation, it is unregistered before the function is called. The second call to free therefore fails as desired.

The disposeonsuccess annotation is similar to dispose except that if the function return type includes error check annotations, the pointer is unregistered only if the return value passes the error checks.

Reference counted pointers

Some C API's return the same resource pointer multiple times while internally maintaining a reference count. Examples are dlopen on Linux or LoadLibrary and COM API's on Windows. Such pointers need to be declared with the counted attribute. This works similarly to the default safe pointers except that the same pointer value can be registered multiple times. Correspondingly, the pointer can be accessed until the same number of calls are made to a function that disposes of the pointer. The Linux example below illustrates this.

% cffi::Wrapper create crtl
::crtl
% crtl function dlopen {pointer counted} {path string flags int}
% crtl function dlclose int {dlptr {pointer dispose}}
% set dlptrA [dlopen /usr/lib/x86_64-linux-gnu/libc.so.6 1]
0x00007fb07ebb7500^
% set dlptrA [dlopen /usr/lib/x86_64-linux-gnu/libc.so.6 1]
0x00007fb07ebb7500^

Note the same pointer value was returned from both calls. We can then call dlclose multiple times but not more than the number of times dlopen was called.

% dlclose $dlptrA
0
% dlclose $dlptrA
0
% dlclose $dlptrA
Pointer 0x00007fb07ebb7500^ is not registered.

Unsafe pointers

C being C, there are many situations where pointers are generated and passed around in a somewhat ad hoc manner with no clear ownership. For such situations where safe and counted pointers can raise exceptions that are false positives, pointer declarations can be annotated as unsafe. Return values from functions and output parameters with this annotation will not be registered as safe pointers. Conversely, input parameters with this designation will not be checked for registration.

In addition to the implicit registration of pointers, applications can explicitly control pointer registration with the ::cffi::pointer check, ::cffi::pointer safe, ::cffi::pointer counted and ::cffi::pointer dispose commands.
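The following sketch walks a fabricated pointer through explicit registration. The address and the DEMO tag are illustrative, and the sketch assumes that pointer safe returns the registered pointer and that pointer check raises an error for unregistered pointers.

```tcl
package require cffi

# A pointer built by hand is not registered ...
set p [cffi::pointer make 0x1000 DEMO]
set before [catch {cffi::pointer check $p}]   ;# non-zero: check raised an error

# ... until it is explicitly registered as a safe pointer.
set p [cffi::pointer safe $p]
set during [catch {cffi::pointer check $p}]   ;# zero: check passed

# Disposing unregisters it again.
cffi::pointer dispose $p
set after [catch {cffi::pointer check $p}]    ;# non-zero once more

puts "before=$before during=$during after=$after"
```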

Pinned pointers

The ::cffi::pointer pin command may be used to permanently register a pointer as safe. The pointer will remain registered irrespective of any ::cffi::pointer dispose calls and can only be unregistered with ::cffi::pointer invalidate.

The pinned attribute on a pointer return declaration or output parameter has the same effect. The pointer is registered and will remain so until invalidated with ::cffi::pointer invalidate.

Pinned pointers have no associated tag and any pointers with the same address components as a pinned pointer will be considered registered.

The primary use of pinned pointers is for API's that make use of "pseudo handles" that are always valid.
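A sketch of the pinned pointer life cycle as described above. The address is a made-up stand-in for a pseudo handle, and the use of ::cffi::pointer isvalid to query registration is an assumption about the ensemble.

```tcl
package require cffi

# Made-up pseudo handle value, for illustration only.
set h [cffi::pointer make 0xffff0000 HANDLE]

cffi::pointer pin $h
cffi::pointer dispose $h                         ;# has no effect on a pinned pointer
set pinned_valid [cffi::pointer isvalid $h]      ;# still registered

cffi::pointer invalidate $h                      ;# the only way to unregister it
set invalidated_valid [cffi::pointer isvalid $h]

puts "after dispose: $pinned_valid, after invalidate: $invalidated_valid"
```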

Invalid pointer values

At the script level, a null pointer is any pointer whose address component is 0. The token NULL may also be used for this purpose.

Null pointers have their own safety checks and are independent of the pointer registration mechanisms described above. By default, a function result that is a null pointer is treated as an error and triggers the function's error handling mechanisms. Similarly, an attempt to pass a null pointer to a function or store it as a field value in a C struct will raise an exception. This can be overridden by including the novaluechecks annotation on the function return, parameter or struct field type definition. For return values of type string, unistring and winstring with this annotation, an empty string is returned when the called function returns NULL. In the case of structs that are returned by reference, a novaluechecks annotation will map a NULL return value to a struct with default values for all fields. If any field does not have a default, an error is raised.

Note that when returned as output parameters from a function, either directly or embedded as struct field, null pointers are permitted even without the novaluechecks annotation.
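As an illustration of novaluechecks on a string return, the sketch below wraps the C getenv function, which returns NULL for an undefined variable. It assumes a Linux system where a wrapper created without a path exposes the C runtime; the environment variable name is made up, and the use of ::cffi::pointer isnull is an assumption about the ensemble.

```tcl
package require cffi

# Both the NULL token and a zero address denote a null pointer.
puts [cffi::pointer isnull NULL]
puts [cffi::pointer isnull [cffi::pointer make 0]]

# Assumption: Linux, wrapper without a path exposes the C runtime.
cffi::Wrapper create libc

# Without novaluechecks a NULL return would be treated as an error.
libc function getenv {string novaluechecks} {name string}

# Made-up variable name that is presumably not set: NULL maps to an empty string.
set value [getenv CFFI_DOC_NO_SUCH_VARIABLE]
puts "length=[string length $value]"
```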

Memory operations

Pointers are ofttimes returned by functions but more often than not the referenced memory has to be allocated and passed in to functions. Some type constructs like strings and structs hide this at the script level but there are times when direct access to the memory content addressed by pointers is desired.

The memory command ensemble provides such functionality. The commands ::cffi::memory allocate and ::cffi::memory free provide memory management facilities. Access to the content is available through ::cffi::memory tobinary and ::cffi::memory frombinary commands which convert to and from Tcl binary strings. The ::cffi::memory get and ::cffi::memory set commands provide type-aware access to read and write memory.

As an alternative to the memory command, the arena command implements a memory arena in which frames can be allocated with ::cffi::arena pushframe. Memory blocks can then be allocated within the last allocated frame using ::cffi::arena allocate. These blocks are all freed when the frame is deallocated with ::cffi::arena popframe. Multiple calls can be made to ::cffi::arena pushframe with ::cffi::arena popframe freeing the last allocated frame. In effect this behaves like a software stack and is useful for short-lived storage as it is faster and results in less memory fragmentation than the heap based memory command.
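A minimal sketch of both facilities follows. The argument order shown for memory set and memory get (pointer, type, then value) and the assumption that arena blocks can be used with the memory commands should be checked against the command reference.

```tcl
package require cffi

# Heap based: allocate, write and read a typed value, then free.
set p [cffi::memory allocate 16]
cffi::memory set $p int 42
set heap_value [cffi::memory get $p int]
set raw [cffi::memory tobinary $p [cffi::type size int]]   ;# same bytes as a binary string
cffi::memory free $p

# Arena based: blocks live until the enclosing frame is popped.
cffi::arena pushframe
set scratch [cffi::arena allocate 64]
cffi::memory set $scratch double 3.5
set arena_value [cffi::memory get $scratch double]
cffi::arena popframe   ;# frees scratch; it must not be used after this point

puts "heap=$heap_value arena=$arena_value rawbytes=[string length $raw]"
```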

Strings

Strings in C are generally represented in memory as a sequence of null terminated bytes in some specific encoding. They may be declared either as a char * or as an array of char where the size of the array places a limit on the maximum length.

At the script level, these can be declared in multiple ways:

pointer: As discussed in the previous section, this is a pointer to raw memory. To access the underlying string, the memory referenced by the pointer has to be converted into a Tcl string value with the ::cffi::memory tostring command.
string.ENCODING: Values declared using this type are still pointers at the C level but are converted to and from Tcl strings implicitly at the C API interface itself using the specified encoding. If .ENCODING is left off, the system encoding is used.
unistring: This is similar to string.ENCODING except the values are Tcl_UniChar* at the C level and the encoding is implicitly the one used by Tcl for the Tcl_UniChar data type.
winstring: This is similar to string.ENCODING except the values are WCHAR* at the C level and the encoding is implicitly UTF-16 as used by the Windows API. This type is only present on Windows.
chars.ENCODING: The value is an array of characters at the C level. The type must always appear as an array, for example, chars.utf-8[10] and not as a scalar chars.utf-8. Here as well, conversion to and from Tcl strings is implicit using the specified encoding, which again defaults to the system encoding. Following standard C rules, arrays are passed by reference as function arguments and thus a declaration of chars[10] would also be passed into a function as a char*. Within a struct definition on the other hand, it would be stored as an array.
unichars: The value is an array of Tcl_UniChar characters and follows the same rules as chars except that the encoding is always that used by Tcl for the Tcl_UniChar type.
winchars: The value is an array of WCHAR characters and follows the same rules as chars except that the encoding is UTF-16 as used in the Windows API. This type is only present on Windows.

The choice of using pointer, string (or unistring, winstring), or chars (or unichars, winchars) depends on the C declaration and context as well as convenience.

The examples below illustrate use cases for each of the above to wrap these directory related functions.

char *get_current_dir_name(void);
char *getcwd(char *buf, size_t size);
int chdir(const char *path);

The first function, get_current_dir_name, returns a pointer to malloc'ed memory that must be freed. We cannot use the string type for implicit conversion to strings because we need access to the raw pointer so it can be freed. We are thus forced to stick to the use of pointers. Our CFFI wrapper would be defined as (assuming libc is a wrapper object)

libc function get_current_dir_name pointer {}
libc function free void {p {pointer dispose}}

We need the free function because as stated by the get_current_dir_name man page, the returned pointer is malloc'ed and has to be freed by the application. (Note the use of dispose in the parameter declaration as described in Pointer safety.)

The actual use of the function would involve explicit pointer handling.

% set p [get_current_dir_name]
0x0000558b92986d60^
% puts [cffi::memory tostring $p]
/mnt/d/src/tcl-cffi/build-ubuntu-x64
% free $p

The second function getcwd requires the caller to supply the buffer into which the directory path will be written. The buffer size the function expects is not a constant but rather given by the value of the size argument. While this function could also be wrapped using pointers and explicitly allocated memory, it is much simpler to use the chars type to supply a buffer.

libc function getcwd string {buf {chars[size] out} size int}

Two notable points about this definition: first, the use of dynamic arrays for parameters as described in Dynamically sized arrays. Second, the return type is string because the pointer returned by getcwd is the same as the pointer passed in and since CFFI is automatically managing that memory, there is no need to get a hold of the raw pointer.

This simplifies the usage, for the return value as well as output argument:

% puts [getcwd dir 256]
/mnt/d/src/tcl-cffi/build-ubuntu-x64
% puts $dir
/mnt/d/src/tcl-cffi/build-ubuntu-x64 

The final example only involves passing in a path to the chdir function. Since we are dealing with only passing a constant string, this is the simplest case. Just defining the parameter as string suffices.

libc function chdir int {dir string}

Usage is also straightforward.

% chdir /tmp
0
% getcwd dir 512
/tmp

MULTI_SZ strings

Some Windows API's make use of the MULTI_SZ string type which is a string consisting of a sequence of nul-terminated strings in memory followed by an additional nul (i.e. an empty string indicates the end). This can be mapped to the winstring and winchars type by annotating the declaration with the multisz attribute. At the script level, these are represented as a list of strings.

% kernel32 stdcall GetPrivateProfileSectionNamesW uint {
    buf {winchars[bufSize] multisz out}
    bufSize uint
    filename {winstring nullifempty}
}
::GetPrivateProfileSectionNamesW
% GetPrivateProfileSectionNamesW buf 1000 ""
306
% set buf
AeDebug CLOCK COLORS CONSOLE CURSORS DESKTOP ...

Strings as NULL pointers

Some API's allow for char* pointer parameters or fields to be NULL. If these are wrapped as one of the string types string, unistring, winstring or one of the character array types chars, unichars or winchars, the nullifempty annotation can be used to specify that empty string values should be passed or stored as NULL pointers as opposed to pointers to an empty string.

Binary strings

While the string, unistring, winstring, chars, unichars and winchars types deal with character strings, the types binary and bytes serve a similar purpose for dealing with binary data, that is, a sequence of bytes in memory. The binary type translates to a C unsigned char * type where the memory is treated as a Tcl binary string (byte array). Similarly, the bytes type is analogous to the chars type except it declares a sized array of bytes, not characters in an encoding. These types are converted between Tcl values and C values with the Tcl_GetByteArrayFromObj and Tcl_NewByteArrayObj functions.

Consider the wrapper for the CryptGenRandom function that we saw earlier.

advapi32 stdcall CryptGenRandom BOOL {
    hProv HANDLE
    dwLen DWORD
    pbBuffer {uchar[dwLen] out}
}

When this function is called as

CryptGenRandom $hProv 100 data

the random bytes are returned in the variable data as a list of 100 integer values in the range 0-255.

Most applications of random data would probably prefer this be a binary string instead. The function wrapper would therefore be better defined as

advapi32 stdcall CryptGenRandom BOOL {
    hProv HANDLE
    dwLen DWORD
    pbBuffer {bytes[dwLen] out}
}

Now the above call to the function would result in variable data containing a binary string of length 100.

While the bytes type corresponds to chars, the binary type corresponds to string. The underlying C type is actually a pointer, not an array. Because there is no inherent length indicator as there is for string type which is nul-terminated, binary can only be used in type declaration for input parameters to a function and in no other context. The function receives the data as retrieved by Tcl's Tcl_GetByteArrayFromObj function.

As for chars, the bytes type can also be annotated with nullifempty in which case binary strings of zero length are passed as NULL pointers. Without the annotation, a binary string of zero length will raise an error.

In the case of the binary type, the nullifempty annotation is superfluous. Zero length binary strings typed as binary are always passed as NULL pointers.
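As an illustration of the binary type for input parameters, the sketch below wraps the C memcmp function. It assumes a Linux system where a wrapper created without a path exposes the C runtime, and that the C alias set provides size_t.

```tcl
package require cffi

cffi::alias load C          ;# assumed to define size_t among others
cffi::Wrapper create libc   ;# assumption: Linux, C runtime visible in the process

# Both buffers are input-only, so the binary type is permitted.
libc function memcmp int {a binary b binary n size_t}

set x [binary format c3 {1 2 3}]
set y [binary format c3 {1 2 3}]
set z [binary format c3 {1 2 4}]

puts [memcmp $x $y 3]   ;# identical content compares equal
puts [memcmp $x $z 3]   ;# differs in the last byte
```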

Structs

C structs are wrapped through the ::cffi::Struct class. This encapsulates the layout of the struct and provides various methods for manipulation. A structure layout is a list of alternating field name and type declarations. An example of a definition would be

::cffi::Struct create Point {
    x int
    y int
}

A struct field may be of any type except void and binary. In addition, fields that are string, unistring or winstring impose certain limitations. They can be used in structs that are passed in and out of functions as arguments but cannot be allocated from the heap using methods like allocate, tonative etc.

As for function parameters, field types can have associated annotations. For example, the above definition may be changed to assign default values to fields.

::cffi::Struct create Point {
    x {int {default 0}}
    y {int {default 0}}
}

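Annotations that may be applied to field type declarations include those described as applicable to fields in Type annotations, such as bitmask, default, enum, novaluechecks, nullifempty, structsize and unsafe.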

Once defined, structs can be referenced in function prototypes and in other structs as struct.STRUCTNAME, for example struct.Point. Referencing is scope-based. If the struct name is not fully qualified, it is looked up in the current Tcl namespace and then in the global scope.

At the script level, C struct values are represented as dictionaries with field names as dictionary keys. An exception is raised if any field is missing unless the field declaration has a default annotation or the struct is defined with the -clear option which defaults all fields to a zero value.
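The effect of default and -clear on missing fields can be sketched by converting dictionaries to native binary form and back. The struct names below are illustrative, and the tobinary and frombinary methods are assumed to behave as their names suggest.

```tcl
package require cffi

# Fields with defaults may be left out of the dictionary.
cffi::Struct create Point {
    x {int {default 0}}
    y {int {default 0}}
}
set pt [Point frombinary [Point tobinary {x 7}]]
puts $pt

# With -clear every missing field is set to zero.
cffi::Struct create Pixel {x int y int} -clear
set px [Pixel frombinary [Pixel tobinary {}]]
puts $px
```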

Alternatively, structs can also be manipulated as native C structs in memory using raw pointers and explicit transforms. For example,

% set pPoint [Point allocate]
0x00000211cb924de0^Point
% Point tonative
wrong # args: should be "Point tonative POINTER INITIALIZER ?INDEX?"
% Point tonative $pPoint {x 0 y 1}
% Point fromnative $pPoint
x 0 y 1
% Point setnative $pPoint x 42
% Point fromnative $pPoint
x 42 y 1

NOTE: structs that are manipulated as raw structs in memory cannot contain fields of type string, unistring and winstring. They must use raw pointers and explicitly manage their target memory.

The package provides other methods to access fields and otherwise manipulate native structs in memory. See ::cffi::Struct.

Packed structs

Compilers allow various means for changing the padding and alignment of fields in a struct; for example, the pack pragma in MSC. Correspondingly the -pack option may be used to define the equivalent structure in CFFI. The use and its effect is illustrated below.

% cffi::Struct create S {uc uchar d double s short}
::S
% S describe
Struct ::S nRefs=1 size=24 alignment=8 flags=0 nFields=3
uchar uc offset=0 size=1
double d offset=8 size=8
short s offset=16 size=2
% cffi::Struct create Spacked {uc uchar d double s short} -pack 1
::Spacked
% Spacked describe
Struct ::Spacked nRefs=1 size=11 alignment=1 flags=0 nFields=3
uchar uc offset=0 size=1
double d offset=1 size=8
short s offset=9 size=2

Note that packed structs cannot be passed by value to functions. This is a limitation of the underlying backends, libffi and dyncall. This combination of passing packed structs by value is very unlikely to happen in practice.

Variable sized structs

C allows the definition of structs of variable size where the last field in the struct is a Variable Length Array (VLA), an array whose length is not fixed. Using C99's syntax, an example would be

struct S {
    int count;
    double values[];
};

The equivalent definition in CFFI is

cffi::Struct create S {
    count int
    values double[count]
}

When converting to native form, the actual size of the values array will be as contained in the count field.

Variable size structs have the following restrictions some of which parallel C99:

Unions

C unions are wrapped through the ::cffi::Union class analogous to the ::cffi::Struct class for defining C structs. An example definition of a union would be

cffi::Union create U {
    c uchar
    d double
}

The type is then referenced as union.U.

Unlike structs, the content of a union is not well defined and depends on some discriminator outside of the union itself. Moreover, underlying FFI libraries libffi and dyncall do not directly support passing of unions to and from functions. Thus the union type has certain limitations:

Below is a usage example using the above union.

% set ubin [U encode c 42]
% U decode c $ubin
42
% U decode d $ubin
3.91702106007e-312

Note as above that garbage is returned if a field other than what was stored is retrieved.

If a struct S is defined as

cffi::Struct create S {
    u union.U
    i int
}

allocating the struct and retrieving field values would be done as

% set pStruct [S new [list i 1 u [U encode c 42]]]
0x000002868be5f3e0^::S
% U decode c [S getnative $pStruct u]
42
% S getnative $pStruct i
1

Note the difference between how i is passed versus u. Modifying the union and storing a different field would look like

% S setnative $pStruct u [U encode d 3.14]
% U decode d [S getnative $pStruct u]
3.14

As in C, care must be taken that the same field is retrieved from a union as was stored in it.
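The C declarations corresponding to the union U and struct S used in this section would look roughly as follows. This is only a sketch of the C side. The helper functions roundtrip_c and store_d are hypothetical names that mirror the encode/decode and setnative/getnative steps shown above.

```c
union U {
    unsigned char c;
    double d;
};

struct S {
    union U u;
    int i;
};

/* Store through the `c` member and read back through the same member. */
unsigned char roundtrip_c(unsigned char value)
{
    union U u;
    u.c = value;
    return u.c;
}

/* Overwrite the union field of a struct through the `d` member and
   read it back through that same member. */
double store_d(struct S *s, double value)
{
    s->u.d = value;
    return s->u.d;
}
```

All members of the union share the same storage, which is why its size is that of the largest member and why reading a member other than the one last stored yields meaningless data.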

UUID's

The uuid type maps to the UUID structure on Windows and uuid_t on Unix-y platforms. Although the type could also have been modeled using a struct or bytes array, a separate type allows for a more natural string representation than either of those.

% cffi::Wrapper create ole32 ole32
::ole32
% ole32 stdcall CLSIDFromProgID {long nonnegative lasterror} {p winstring u {uuid retval}}
::CLSIDFromProgID
% CLSIDFromProgID Shell.Application
13709620-c279-11ce-a49e-444553540000

Caution: The string representation used is platform-dependent for compatibility with applications on that platform (COM in particular). When sharing UUIDs between platforms, use of the binary form may be preferable. The ::cffi::type tobinary command can be used for this purpose.

Parameters of type uuid can only be passed by reference.

Type aliases

Type aliases provide a convenient way to bind data types and one or more annotations. They can then be used in type declarations in the same manner as the built-in types.

In addition to avoiding repetition, type aliases facilitate abstraction. For example, many Windows APIs have an output parameter that is typed as a fixed size buffer of length MAX_PATH characters. A type alias OUTPUT_PATH defined as

cffi::alias define OUTPUT_PATH {unichar[512] out}

can be used in function and struct field declarations.

Similarly, type aliases can be used to hide platform differences. For example, in the following function prototype,

kernel stdcall HeapCreate pointer.HEAP {opts uint initSize SIZE_T maxSize SIZE_T}

SIZE_T is an alias that resolves to either uint or ulonglong depending on whether the platform is 32- or 64-bit.

Various points to note about type aliases:

For convenience, the package provides the ::cffi::alias load command which defines some standard C type aliases like size_t as well as some platform-specific type aliases such as HANDLE on Windows. These are all loaded in the ::cffi scope.

Currently defined type aliases can be listed with the ::cffi::alias list command and removed with ::cffi::alias delete.

Enumerations

Enumerations allow the use of symbolic constants in place of integral values passed as arguments to functions or on assignment to struct fields. Their primary purpose is similar to preprocessor #define constants and enum types in C. They are defined and otherwise managed through the cffi::enum command ensemble. The fragment below provides an example.

cffi::enum define CMARK_OPTS {
    DEFAULT 0
    NORMALIZE 256
    VALIDATE 512
    SMART 1024
}
cmarkLib function cmark_render_html pointer {
    root pointer.cmark_node
    opts {int {enum CMARK_OPTS}}
}
set htmlptr [cmark_render_html $root NORMALIZE]

Alternatives to the ::cffi::enum define command used above include ::cffi::enum sequence and ::cffi::enum flags which are convenient for defining sequential values and bit masks respectively.

Enumerations can also be used in literal form, where they are expressed directly in the type definition. For example, the cmark_render_html function could also be defined as below without the CMARK_OPTS named enumeration.

cmarkLib function cmark_render_html pointer {
    root pointer.cmark_node
    opts {int {enum {
        DEFAULT 0
        NORMALIZE 256
        VALIDATE 512
        SMART 1024
    }}}
}

When combined with the bitmask annotation, bitmasks can be symbolically represented as a list.

cmarkLib function cmark_render_html pointer {
    root pointer.cmark_node
    opts {int bitmask {enum CMARK_OPTS}}
}
set htmlptr [cmark_render_html $root {SMART NORMALIZE}]
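At the C level, a bitmask argument such as {SMART NORMALIZE} is simply the bitwise OR of the member values. The sketch below illustrates that arithmetic using the values from the fragment above. The enum constant names and the helpers combine and has_flag are hypothetical.

```c
/* Values taken from the CMARK_OPTS fragment above; names are illustrative. */
enum cmark_opts {
    OPT_DEFAULT   = 0,
    OPT_NORMALIZE = 256,
    OPT_VALIDATE  = 512,
    OPT_SMART     = 1024
};

/* Combine a list of flags into one integer with bitwise OR. */
int combine(const int *flags, int n)
{
    int mask = 0;
    for (int i = 0; i < n; i++)
        mask |= flags[i];
    return mask;
}

/* Test whether a given (non-zero) flag is set in a mask. */
int has_flag(int mask, int flag)
{
    return (mask & flag) != 0;
}
```

Here SMART and NORMALIZE combine to 1024 | 256 = 1280, which is the integer the C function actually receives.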

When enumeration types are returned from a function or through output parameters or struct fields, they are returned as integers, not mapped to enumeration member names. The ::cffi::enum name or ::cffi::enum unmask commands can be used for this purpose if desired.

For additional commands related to enumerations see ::cffi::enum.

Functions

To invoke a function in a DLL or shared library, the library must first be loaded through the creation of a ::cffi::Wrapper object. The ::cffi::Wrapper.function and ::cffi::Wrapper.stdcall methods of the object can then be used to create Tcl commands that wrap individual functions implemented in the library.

Calling conventions

The 32-bit Windows platform uses two common calling conventions for functions: the default C calling convention and the stdcall calling convention which is used by most system libraries. These differ in terms of parameter and stack management and it is crucial that the correct convention be used when defining the corresponding FFI.

Other than use of the two separate methods for definition, there is no difference in terms of the function prototype used for definition or the method of invocation.

Note that this difference in calling convention is only applicable to 32-bit Windows. For other platforms, including 64-bit Windows, stdcall behaves in identical fashion to function.

Function wrappers

The function wrapping methods function and stdcall have the following syntax:

DLLOBJ function FNNAME RETTYPE PARAMS
DLLOBJ stdcall FNNAME RETTYPE PARAMS

where DLLOBJ is the object wrapping a shared library, FNNAME is the name of the function (and an optional Tcl alias) within the library, RETTYPE is the function return type declaration and PARAMS is a list of alternating parameter names and type declarations. The type declarations may include annotations that control behaviour and conversion between Tcl and C values.

The C function may then be invoked as FNNAME like any other Tcl command.

Return types

A function return declaration is a type or type alias followed by zero or more annotations. The resolved type must not be void or an array including chars, unichars, winchars, binary and bytes. Note that pointers to these are permitted.

In the case of string, unistring and winstring types, the script level return values are constructed by dereferencing the returned pointer as character strings. Since the underlying pointer is not available, any storage cannot be freed and these types should only be used as the return type in cases where that is not needed (for example, when the function returns pointers to static strings).

Returning structs from functions is only supported by the libffi backend.

Return annotations

The following annotations may follow the type in a return type declaration.

Parameters

The PARAMS argument in a function prototype is a list of alternating parameter name and parameter type declaration elements. A parameter type declaration may begin with any supported type except void and may be followed by a sequence of optional type annotations.

Input and output parameters

Parameters of a function may be used to pass data to the function (pure input parameters), get data back from the function (pure output parameters) or both. CFFI parameter type declarations denote these with the in, out and inout annotations respectively. If none of these annotations are present, the parameter defaults to an implicit in annotation.

In addition, arguments may be passed to the function either by value or by reference, where a pointer to the value is passed. Parameters that are pure input are normally passed by value. In some cases, functions take even pure input arguments by reference (for example, large structures). In such cases, the CFFI parameter declaration should have the byref annotation to indicate that a pointer to the value should be passed and not the value itself. Note that arrays are always passed by reference in C, so array types do not need to be explicitly annotated with byref as they default to that in any case.

In the case of in parameters, the argument must be specified as a Tcl value at the time of calling the function even when the byref annotation is present. Passing a pointer to that value is handled implicitly.

NOTE: In the case of string, unistring and winstring, in parameters correspond to char * and Tcl_UniChar * respectively, while in byref map to char ** and Tcl_UniChar **.

Parameters that are out or inout are always passed by reference irrespective of whether the byref annotation is present or not. The argument to the function must be specified as the name of a variable in the caller's context. For inout parameters, the variable must exist and contain a valid value for the parameter type. For out parameters, the variable need not exist. In both cases, on return from the function the output value stored in the parameter by the function will be stored in the variable. Note that inout cannot be used with string, unistring and winstring types while neither out nor inout can be used with binary.
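The C signatures below sketch what each of these parameter kinds typically looks like on the native side. The struct Point and all four functions are hypothetical and serve only to illustrate the distinction between passing by value and the in byref, inout and out cases.

```c
struct Point {
    double x;
    double y;
};

/* Pure input, passed by value: the callee receives a copy. */
double dot_self(struct Point p)
{
    return p.x * p.x + p.y * p.y;
}

/* Pure input, passed by reference (the "in byref" case). */
double dot_self_ref(const struct Point *p)
{
    return p->x * p->x + p->y * p->y;
}

/* Input-output ("inout"): the callee reads and then updates the value. */
void scale(struct Point *p, double k)
{
    p->x *= k;
    p->y *= k;
}

/* Pure output ("out"): the callee only writes; prior content is ignored. */
void origin(struct Point *p)
{
    p->x = 0.0;
    p->y = 0.0;
}
```

In C the last three all take a pointer, so the distinction between them lies in how the callee uses the memory, which is exactly the information the in, inout and out annotations convey.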

There are some subtleties with respect to error handling that are relevant to output parameters and must be accounted for in declarations. See Errors and output parameters for more on this.

Output parameters as function result

Many functions return values as pairs with the function return value being a status or error code and the actual function result being returned as an output parameter. In such cases, the retval annotation on the output parameter can be used to return it as the result of the wrapped command.

The retval annotation

The parameter annotated with retval does not appear in the wrapped command signature (i.e. it is not supplied as an argument in the invocation).

The return value will be the parameter output value only if the function's native return value passes the error checks. Otherwise, an exception is raised as usual.

See Delegating return values for an example of retval usage.
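The kind of C function this annotation targets returns a status code and delivers its actual result through a pointer parameter. The following is a minimal sketch of such a function. The name parse_long and its contract are hypothetical, chosen only to illustrate the pattern.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal integer.
   Returns 0 on success and stores the parsed value in *result.
   Returns -1 on failure and leaves *result untouched. */
int parse_long(const char *text, long *result)
{
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;
    *result = value;
    return 0;
}
```

A wrapper for a function of this shape would place an error checking annotation on the integer return type and retval on the result parameter, so that the script sees either the parsed value or an exception.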

Parameter annotations

The following annotations may follow the type in a parameter type declaration:

Structs as parameters

In the case of parameters that are structs, the input argument for the parameter when the function is called should be a dictionary value. Conversely, output parameter results are returned as a dictionary of the same form.

Variable size structs have some restrictions as function return types and parameters. They cannot be the return type for a function and cannot be passed by value. Additionally, when passed by reference they must be in or inout parameters. They cannot be out parameters as the field containing the size of the VLA is part of the struct and must be passed in to the function.

Error handling

C functions generally indicate errors through their return value. Details of the error are either in the return value itself or intended to be retrieved by some other mechanism such as errno.

One way to deal with this at the script level is to simply check the return value (generally an integer or pointer) and take appropriate action. This has two downsides. The first is that error conditions in Tcl are almost always signalled by raising an exception rather than through a return status mechanism so checking status on every call is not very idiomatic. The second, perhaps more important, downside is that the detail behind the error, stored in errno or available via GetLastError() on Windows, is more often than not lost by the time the Tcl interpreter returns to the script level.
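The C convention being described can be sketched as follows. The example assumes a POSIX-like system where fopen sets errno on failure. The function try_open is a hypothetical helper. The point of interest is that errno is captured immediately after the failing call, before any other library call has a chance to overwrite it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open a file for reading.
   Returns 0 on success, otherwise the errno value captured right
   after the failing call. */
int try_open(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;   /* capture before anything else can change it */
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
        return saved;
    }
    fclose(fp);
    return 0;
}
```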

Error annotations

Additional sets of type annotations are provided to solve these issues. The first set of annotations is used to define the error check conditions to be applied to function return values. The second set is used to specify how the error detail is to be retrieved.

The following annotations for error checking can be used for integer return types.

zero: The value must be zero.
nonzero: The value must be non-zero.
nonnegative: The value must be zero or greater.
positive: The value must be greater than zero.

The return value from every call to the function is then checked as to whether it satisfies the condition. Failure to do so is treated as an error condition.

An error condition is also generated when a function returning a pointer returns a null pointer or, on Windows only, the INVALID_HANDLE_VALUE preprocessor constant. This is also true for string, unistring and winstring return types as well as struct types that are returned by reference since those are all pointers beneath the covers. This treatment of null pointers as errors can be overridden with the novaluechecks annotation. If this annotation is specified and the function returns a NULL pointer, the return is not treated as an error.

An error condition arising from one of the error checking annotations or a null pointer results in an exception being generated unless the onerror annotation is specified (see below). However, the default error message generated is generic and does not provide detail about why the error occurred. The following error retrieval annotations specify how detail about the error is to be obtained.

errno: The POSIX error is stored in errno. The error message is generated using the C runtime strerror function. Note: This annotation should only be used if the wrapped function uses the same C runtime as the cffi extension. The Tcl errorCode variable is set to a list comprising CFFI, ERRNO, the POSIX name for the error (e.g. ENOENT), the numeric error code, and the error message.
lasterror: (Windows only) The error code and message are retrieved using the Windows GetLastError and FormatMessage functions. The Tcl errorCode variable is set to a list comprising CFFI, WIN32, the numeric error code, and the error message.
winerror: (Windows only) The numeric return value is itself the Windows error code and the error message is generated with FormatMessage. The Tcl errorCode variable is set to a list comprising CFFI, WIN32, the numeric error code, and the error message. This annotation can only be used with the zero error checking annotation.

These annotations can be applied only to integer types except that the errno and lasterror annotations can be used with pointer types as well.

In addition to the above built-in error handlers, the onerror annotation provides a means for customizing error handling when the error is from a library and not a system error. The annotation takes an additional argument which is a command prefix to be invoked when an error checking annotation is triggered. When this command prefix is invoked, a dictionary with the call information is passed. The dictionary contains the following keys:

Result: The return value from the function that triggered the error handler.
In: A nested dictionary mapping all in and inout parameter names to the values passed in to the called function.
Out: A dictionary mapping all inout and out parameter names to the values returned on output by the function. These only include output parameters marked as storealways or storeonerror.
Command: The Tcl command for which the error handler was triggered. This key will not be present if the function was invoked with an address through the ::cffi::call command.

The result of the handler execution is returned as the function call result and may be a normal result or a raised exception. The handler may use upvar for access to the calling script's context including any input or output arguments to the original function call.

This onerror facility may be used to ignore errors, provide default values as well as raise exceptions with more detailed library-specific information. Note that the use of an onerror handler that returns normally is not the same as not specifying any error checking annotations because the function return is still treated as an error condition in terms of the output variables as described in Errors and output parameters.

NOTE: Although the errno, lasterror, winerror and onerror annotations have effect only with respect to function return values, they can also be specified for parameters and struct fields where they are silently ignored. This is to permit the same type alias (e.g. status codes) to be used in all three declaration contexts.

One final annotation related to error handling is saveerrors. This is provided to deal with functions that do not return an error status but rely on the caller checking errno or GetLastError() after the call. This annotation can be added to any return type. If present, the errno value and GetLastError() value on Windows are saved internally after the CFFI call and can then be retrieved with the ::cffi::savederrors command.

Errors and output parameters

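An important consideration in the presence of errors is how the called function deals with output (including input-output) parameters. There are three possibilities: the function writes to them only on success, it always writes to them, or it writes to them only on error.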

The distinction is particularly crucial for non-scalar output. Output parameters that have not been written to may result in corruption or crashes if the memory is accessed for conversion to Tcl script level values.

By default, script level output variables are only written to when the error checks pass (including the case where none are specified). This is the first case above. If the storealways annotation is specified for a parameter, it is stored irrespective of whether an error check failed or not. This is the second case. Finally, the storeonerror annotation targets the third case. The output parameter is stored only if an error check fails.

Note that an error checking annotation must be present for any of these to have an effect.

Varargs functions

Some C functions take a variable number of arguments. Wrappers for these are defined as for normal functions except that the last parameter definition should be the literal string "..." (three periods). In addition, when the wrapped command is invoked, each varargs argument should be passed as a pair consisting of a type declaration and a value. This is illustrated in the example below wrapping the snprintf function.

% cffi::Wrapper create libc libc.so.6
::libc
% libc function snprintf int {buf {chars[bufSize] out} bufSize int format string ...}
::snprintf
% snprintf buf 100 "The %s is %d." {string answer} {int 42}
17
% set buf
The answer is 42.

Note how the varargs arguments are passed as type, value pairs. Just as when programming in C, extreme care has to be taken to pass the right argument types.
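The reason the types must be supplied explicitly is visible on the C side: a variadic function has no way to discover the types of its variable arguments and simply trusts the caller. The sketch below shows a small variadic function together with the C form of the snprintf call from the transcript above. Both sum_ints and format_answer are hypothetical helpers.

```c
#include <stdarg.h>
#include <stdio.h>

/* Sum `count` int arguments. The callee reads each variable argument
   as an int; passing any other type is undefined behaviour. */
long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* The snprintf call from the transcript, written directly in C.
   Returns the number of characters in the formatted result. */
int format_answer(char *buf, size_t size)
{
    return snprintf(buf, size, "The %s is %d.", "answer", 42);
}
```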

Varargs arguments have certain restrictions:

Prototypes and function pointers

The function wrapping methods function and stdcall described earlier bind a function type definition consisting of the return type and parameters with the address of a function as specified by its name. For some uses, it is useful to be able to specify the function type information independently of the function address. The ::cffi::prototype function and ::cffi::prototype stdcall commands are provided for this purpose. They take a very similar form to the corresponding methods:

cffi::prototype function NAME RETTYPE PARAMS
cffi::prototype stdcall NAME RETTYPE PARAMS

where RETTYPE and PARAMS are as described in Function wrappers. The commands result in the creation of a function prototype NAME which can be used as a tag for pointers to functions. The ::cffi::call command can then be used to invoke the pointer target.

For example, consider the following C fragment

typedef int ADDER(int, int);
ADDER *function_returning_adder();
ADDER *fnptr = function_returning_adder();
fnptr(1,2);

This would be translated into CFFI as

cffi::prototype function ADDER int {x int y int}
DLLOBJ function function_returning_adder pointer.ADDER {}
set fnptr [function_returning_adder]
cffi::call $fnptr 1 2
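For reference, a complete C version of the fragment being wrapped might look like the sketch below. The original fragment does not show how function_returning_adder is implemented, so the add function and the call_adder helper here are assumptions made purely for illustration.

```c
typedef int ADDER(int, int);

/* Hypothetical implementation handed out by the library. */
static int add(int a, int b)
{
    return a + b;
}

/* Returns a pointer to a function matching the ADDER type. */
ADDER *function_returning_adder(void)
{
    return add;
}

/* Obtain the pointer and invoke its target, as in the C fragment. */
int call_adder(void)
{
    ADDER *fnptr = function_returning_adder();
    return fnptr(1, 2);
}
```

The ADDER typedef plays the same role in C that the prototype plays in CFFI: it describes the function type separately from any particular function address.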

Interfaces

Interfaces in CFFI are intended to wrap COM interfaces in Windows and are described in that context below. However, they can be used for similar structures outside of COM and on other platforms. The underlying C type is a structure whose first field is a pointer to a table of functions, the vtable (virtual function table). Each of these functions accepts a pointer to the structure as its first parameter. The methods are invoked through this virtual table. A limited form of inheritance is supported in this model and also exposed through CFFI.
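The structure just described can be sketched in plain C as follows. The Counter object, its vtable and the counter_new constructor are all hypothetical. The sketch is not COM itself, only an illustration of the layout (vtable pointer as first field, object pointer as first parameter of every method) with reference counting in the style of AddRef and Release.

```c
#include <stdlib.h>

struct Counter;                       /* hypothetical COM-style object */

/* The vtable: a table of function pointers, each taking the object. */
struct CounterVtbl {
    unsigned (*AddRef)(struct Counter *self);
    unsigned (*Release)(struct Counter *self);
};

struct Counter {
    const struct CounterVtbl *vtbl;   /* first field: pointer to the vtable */
    unsigned refs;
};

static unsigned counter_addref(struct Counter *self)
{
    return ++self->refs;
}

/* Drop one reference and free the object when none remain. */
static unsigned counter_release(struct Counter *self)
{
    unsigned remaining = --self->refs;
    if (remaining == 0)
        free(self);
    return remaining;
}

static const struct CounterVtbl counter_vtbl = {
    counter_addref,
    counter_release
};

/* Create an object holding one reference; returns NULL on failure. */
struct Counter *counter_new(void)
{
    struct Counter *obj = malloc(sizeof *obj);
    if (obj != NULL) {
        obj->vtbl = &counter_vtbl;
        obj->refs = 1;
    }
    return obj;
}
```

A method call then takes the form obj->vtbl->AddRef(obj), which is the indirection an interface wrapper performs on the caller's behalf.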

An interface in CFFI is defined with ::cffi::Interface. The COM IUnknown interface for example would be defined as

cffi::Interface IUnknown

This creates the IUnknown interface as an ensemble command to which methods can be added with either methods for functions that use the _cdecl calling convention or stdmethods for functions that use the _stdcall calling convention. COM uses the latter, so in CFFI the methods can be defined as

cffi::alias define HRESULT {long nonnegative winerror}
cffi::alias define IID {bytes[16]}
IUnknown stdmethods {
    QueryInterface HRESULT {riid IID ppvObject {pointer out}}
    AddRef uint {}
    Release uint {}
} -disposemethod Release

Points to note:

The above will result in definition of Tcl commands of the form InterfaceName.MethodName, e.g. IUnknown.QueryInterface, IUnknown.AddRef and so on. The first parameter to any method should be a pointer to the COM object, the remaining being the parameters as defined in the method definition.

Interface inheritance

In some cases, one interface may inherit from another. In CFFI, this is indicated using the -inherit option to Interface. Additional methods can be defined on the derived interface in the usual fashion.

cffi::Interface IDispatch -inherit IUnknown
IDispatch stdmethods {
    GetTypeInfoCount HRESULT {ctinfo {int out}}
    ....
}

Methods defined in the inherited interface, IUnknown.AddRef etc., as well as methods defined in IDispatch can be invoked on objects supporting the IDispatch interface.

Example:

TBD

Callbacks

Some C functions take a parameter that is a pointer to a function that is then invoked by the called outer function, often in iterative fashion passing elements of some data set in turn. Wrapping such functions involves the following steps:

Callbacks cannot take a variable number of arguments.

Warning: CFFI callbacks can only be used when the called function invokes them before returning. They are not suitable in cases where the callback is called at a later time after the function returns. Doing so will likely result in a crash.
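The synchronous pattern that callbacks are suited for can be sketched in C as below. The iterator visit_all and both callbacks are hypothetical. They follow the common convention, also used by ftw, that the callback returns zero to continue and a non-zero value to stop, with that value becoming the result of the outer function. The context pointer is an addition for illustration.

```c
#include <stddef.h>

/* Callback type: return 0 to continue, non-zero to stop the iteration. */
typedef int visit_fn(const char *name, void *ctx);

/* Hypothetical outer function: invokes the callback for each element
   in turn, and only while this function itself is still executing. */
int visit_all(const char *const *names, size_t n, visit_fn *fn, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        int rc = fn(names[i], ctx);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Callback that counts the entries seen and always continues. */
int count_entry(const char *name, void *ctx)
{
    (void)name;
    ++*(int *)ctx;
    return 0;
}

/* Callback that counts entries and stops once two have been seen. */
int stop_at_second(const char *name, void *ctx)
{
    (void)name;
    return ++*(int *)ctx >= 2 ? 1 : 0;
}
```

Because visit_all never stores the function pointer for later use, every invocation of the callback happens before visit_all returns, which is the condition stated in the warning above.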

Use of callbacks is illustrated below for the ftw function available on some platforms to iterate through files and directories. The C declaration of the function is

int ftw(const char *dirpath,
        int (*fn) (const char *fpath, const struct stat *sb,
                   int typeflag),
        int nopenfd);

The second argument fn to this function is a pointer to a callback function that will be called for every file under the directory specified by the first argument.

To wrap this function with CFFI, first a prototype is defined that matches the declaration for the fn parameter.

% cffi::prototype function ftw_callback int {fpath string sbP {pointer.stat unsafe} typeflag int}
::ftw_callback

Then the ftw function is itself wrapped with the callback argument referencing the prototype.

% cffi::Wrapper create libc libc.so.6
::libc
% libc function ftw int {dirpath string fn pointer.ftw_callback nopenfd int}
::ftw

Next the callback function pointer is created.

% proc print_name {fpath pSb typeflag} {puts $fpath; return 0}
% set cb [cffi::callback new ftw_callback print_name -1]
0x00007f1a56661010^::ftw_callback

The ftw function can then be invoked with this callback function pointer.

% ftw [pwd] $cb 5
/mnt/d/src/tcl-cffi/build-ubuntu-x64
/mnt/d/src/tcl-cffi/build-ubuntu-x64/cffitest.so
/mnt/d/src/tcl-cffi/build-ubuntu-x64/config.log
/mnt/d/src/tcl-cffi/build-ubuntu-x64/config.status
...output elided...

Finally, the callback pointer can be freed assuming we will not need it again.

% cffi::callback free $cb

It is useful to know that the callback command is invoked in the Tcl context from which the outer function was invoked. For example, if we wanted to collect file names instead of printing them out, we could collect them in a variable.

% proc collect_names {namevar fpath pSb typeflag} {
    upvar 1 $namevar names
    lappend names $fpath
    return 0
}
% set files {}
% set cb [cffi::callback new ftw_callback [list collect_names files] -1]
0x00007f1a56661080^::ftw_callback
% ftw [pwd] $cb 5
0
% set files
/mnt/d/src/tcl-cffi/build-ubuntu-x64 ...

The above example also shows that the second argument to cffi::callback is a command prefix, not necessarily a single-word command, to which the arguments from the callback invocation itself are appended.