Concepts

This page describes the mapping of C types and program elements to the Tcl script level. Basic knowledge of the package as described in Quick start is assumed.

Type declarations

Type declarations appear in three different contexts:

  • As the return value from a function
  • As part of a parameter description in a function declaration
  • As part of a field description in a struct

At the script level, a type declaration consists of the type itself followed by zero or more annotations that provide further information about the type in that particular context. For example, a parameter may have the out annotation to indicate that it is an output parameter for the function. A pointer type declaration may have the unsafe annotation to indicate it is not to be checked for validity.

The annotations that are valid in each context are described in the relevant sections below.

Data types

This section describes the various data types supported by the package and their relation to C types. At runtime, the ::cffi::type info, ::cffi::type size and ::cffi::type count commands may be used to obtain information about a type.

The void type

This corresponds to the C void type and is only permitted as the return type of a function. Note that the C void * type is declared using the pointer type described below.

Numeric types

The following numeric types are supported.

schar       C signed char
uchar       C unsigned char
short       C signed short
ushort      C unsigned short
int         C signed int
uint        C unsigned int
long        C signed long
ulong       C unsigned long
longlong    C signed long long
ulonglong   C unsigned long long
float       C float
double      C double
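
Since the widths of several of these types vary by platform (long, for example, is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux), it can help to see the correspondence expressed in C itself. The sketch below is illustrative only: numeric_type_size is a hypothetical helper, not part of cffi, that returns sizeof for the C type matching each cffi type name. At the script level, ::cffi::type size is the command to use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of cffi): maps each cffi numeric type name
   to sizeof() of the corresponding C type on the current platform. */
size_t numeric_type_size(const char *name)
{
    static const struct { const char *name; size_t size; } table[] = {
        {"schar", sizeof(signed char)},  {"uchar", sizeof(unsigned char)},
        {"short", sizeof(short)},        {"ushort", sizeof(unsigned short)},
        {"int", sizeof(int)},            {"uint", sizeof(unsigned int)},
        {"long", sizeof(long)},          {"ulong", sizeof(unsigned long)},
        {"longlong", sizeof(long long)},
        {"ulonglong", sizeof(unsigned long long)},
        {"float", sizeof(float)},        {"double", sizeof(double)},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].size;
    }
    return 0; /* not a numeric type name */
}
```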

Arrays

Arrays are declared as

TYPE[N]

where N is a positive integer indicating the number of elements in an array of values of type TYPE. At the script level, arrays are represented as Tcl lists.

Additionally, within parameter declarations, N may also be the name of another parameter in the same function declaration. In this case, the array is sized dynamically based on the value of the referenced parameter at the time the call is made.
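
For illustration, the C side of such a dynamically sized array parameter might look like the following. The names sum_ints and lib (the library object) are hypothetical; the declaration in the comment follows the TYPE[N] form described above, with n naming the parameter that holds the element count.

```c
/* Hypothetical C function whose array length is passed in another parameter.
   A matching cffi declaration might be:
       lib function sum_ints int {n int values int[n]}            */
int sum_ints(int n, const int values[])
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += values[i];   /* only the first n elements are read */
    return total;
}
```

At the script level the array argument is then passed as a Tcl list, for example in a call such as sum_ints 3 {1 2 3}.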

Pointers

Pointers are declared in one of the following forms:

pointer
pointer.TAG

The first is the equivalent of a void* C pointer. The second form associates the pointer type with a tag.

Pointer tags

A pointer tag provides some measure of type safety. Tags can be associated with pointer values as well as pointer type declarations. The tag attached to a pointer value must match the tag of the struct field it is assigned to or the function parameter it is passed as; otherwise an error is raised. Tags also provide a typing mechanism for function pointers, as described in Prototypes and function pointers.

Note however that, although similar, pointer tags are orthogonal to the type system. Any tag may be associated with a pointer type or value, irrespective of the underlying C pointer type.

Tags for pointer types are defined in the corresponding struct or function declarations. Pointer values are associated with the tags of the type through which they are created. For example, the pointer returned by a function declared as

function get_path pointer.PATH {}

will be tagged with PATH. It can then only be assigned to a struct field or passed as a parameter if the corresponding pointer type is also tagged as PATH.

If no tag is specified for a pointer field or parameter, it accepts pointer values with any tag, analogous to a C void * pointer.
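
The matching rule can be modelled in a few lines of C. This is a conceptual sketch under the assumption that a tag is simply a string compared for equality; tagged_ptr and tag_compatible are hypothetical names and this is not how cffi itself is implemented. An untagged declaration accepts any value, while a tagged declaration requires the same tag.

```c
#include <stddef.h>
#include <string.h>

/* Conceptual model only (hypothetical, not cffi's implementation). */
typedef struct {
    void *addr;
    const char *tag;   /* NULL means the value carries no tag */
} tagged_ptr;

/* Returns 1 if value may be assigned/passed where declared_tag is expected.
   An untagged declaration (NULL or "") accepts any tag, like void *. */
int tag_compatible(tagged_ptr value, const char *declared_tag)
{
    if (declared_tag == NULL || declared_tag[0] == '\0')
        return 1;
    return value.tag != NULL && strcmp(value.tag, declared_tag) == 0;
}
```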

Pointer safety

Pointer type checking via tags does not protect against errors related to invalid pointers, double frees and the like. To provide some protection against these types of errors, pointers returned from functions, either as return values or through output parameters, are by default registered in an internal table. These are referred to as safe pointers. Every use of a pointer is then checked for registration and an error is raised if it is not found.

Pointers that have been registered are unregistered when they are passed to a C function as an argument for a parameter that has been annotated with the dispose or disposeonsuccess annotation.

The following fragment illustrates safe pointers. The fragment assumes a wrapper object crtl for the C runtime library has already been created.

% crtl function malloc pointer {sz size_t}
% crtl function free void {ptr {pointer dispose}}
% set p [malloc 10]
0x55dbb8b2ca10^void
% free $p
% free $p
Pointer 0x55dbb8b2ca10^ is not registered.

The pointer returned by malloc is automatically registered. When the free function is invoked, its argument is checked for registration. Moreover, because the free function's ptr parameter has the dispose annotation, it is unregistered before the function is called. The second call to free therefore fails as desired.

The disposeonsuccess annotation is similar to dispose except that if the function return type includes error check annotations, the pointer is unregistered only if the return value passes the checks.
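
The registration behaviour described above can be pictured as a small table. The C sketch below is a conceptual model only, with hypothetical function names and a fixed capacity for brevity; it does not reflect cffi's internal data structures. register_ptr corresponds to a pointer being returned by a wrapped function, is_registered to the check made on every use, and unregister_ptr to passing the pointer to a dispose parameter.

```c
#include <stddef.h>

/* Conceptual model of the safe-pointer table (hypothetical, not cffi's
   implementation). A fixed-size array keeps the sketch short. */
#define MAX_SAFE 64
static void *registry[MAX_SAFE];

/* Pointer returned by a wrapped function. Returns 0 for NULL, for an
   address that is already registered, or when the table is full. */
int register_ptr(void *p)
{
    int free_slot = -1;
    if (p == NULL)
        return 0;
    for (int i = 0; i < MAX_SAFE; i++) {
        if (registry[i] == p)
            return 0;
        if (registry[i] == NULL && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return 0;
    registry[free_slot] = p;
    return 1;
}

/* The check made whenever a pointer argument is used. */
int is_registered(void *p)
{
    if (p == NULL)
        return 0;
    for (int i = 0; i < MAX_SAFE; i++)
        if (registry[i] == p)
            return 1;
    return 0;
}

/* What a dispose parameter does before the call; 0 means "not registered". */
int unregister_ptr(void *p)
{
    if (p == NULL)
        return 0;
    for (int i = 0; i < MAX_SAFE; i++) {
        if (registry[i] == p) {
            registry[i] = NULL;
            return 1;
        }
    }
    return 0;
}
```

Replaying the malloc/free fragment against this model, the first unregister_ptr call succeeds and the second fails, mirroring the "is not registered" error.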

Reference counted pointers

A safe pointer cannot be registered if it is already registered. However, some C APIs return the same resource pointer multiple times while internally maintaining a reference count. Examples are dlopen on Linux, and LoadLibrary and COM APIs on Windows. Such pointers need to be declared with the counted annotation. This works similarly to the default safe pointers except that the same pointer value can be registered multiple times. Correspondingly, the pointer remains accessible until the same number of calls have been made to a function that disposes of it. The Linux example below illustrates this.

% cffi::dyncall::Library create crtl
::crtl
% crtl function dlopen {pointer counted} {path string flags int}
% crtl function dlclose int {dlptr {pointer dispose}}
% set dlptrA [dlopen /usr/lib/x86_64-linux-gnu/libc.so.6 1]
0x00007fb07ebb7500^
% set dlptrB [dlopen /usr/lib/x86_64-linux-gnu/libc.so.6 1]
0x00007fb07ebb7500^
% dlclose $dlptrA
0
% dlclose $dlptrB
0
% dlclose $dlptrA
Pointer 0x00007fb07ebb7500^ is not registered.
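
Counted pointers extend the same idea with a reference count. Again, this is a hypothetical C model rather than cffi's implementation: registering an address that is already present increments its count, and each dispose decrements it, so the address is dropped only when the count reaches zero.

```c
#include <stddef.h>

/* Conceptual model of counted pointers (hypothetical, not cffi's code). */
#define MAX_COUNTED 64
static struct { void *addr; int count; } counted[MAX_COUNTED];

/* Each time the API hands back p, its count goes up by one. */
int counted_register(void *p)
{
    int free_slot = -1;
    if (p == NULL)
        return 0;
    for (int i = 0; i < MAX_COUNTED; i++) {
        if (counted[i].addr == p) {
            counted[i].count++;
            return 1;
        }
        if (counted[i].addr == NULL && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return 0;
    counted[free_slot].addr = p;
    counted[free_slot].count = 1;
    return 1;
}

/* A dispose decrements the count; returns 0 if p is not registered. */
int counted_dispose(void *p)
{
    if (p == NULL)
        return 0;
    for (int i = 0; i < MAX_COUNTED; i++) {
        if (counted[i].addr == p) {
            if (--counted[i].count == 0)
                counted[i].addr = NULL;
            return 1;
        }
    }
    return 0;
}
```

Two counted_register calls followed by three counted_dispose calls reproduce the dlopen/dlclose session above: the third dispose reports that the pointer is not registered.
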

Unsafe pointers

For situations where neither safe nor counted pointers are suitable, pointer declarations can be annotated as unsafe. Return values and output parameters with this annotation are not registered, and input parameters with this annotation are not checked for registration. Needless to say, the unsafe annotation should be used with care.

Memory operations

Pointers are often returned by functions, but just as often the memory they reference has to be allocated by the caller and passed in to functions. Some type constructs like strings and structs hide this at the script level, but there are times when direct access to the memory addressed by a pointer is desired.

A set of commands grouped as the memory command ensemble provides such functionality. The ::cffi::memory allocate and ::cffi::memory free commands provide memory management, while ::cffi::memory tobinary and ::cffi::memory frombinary give access to the content by converting it to and from Tcl binary strings.
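
As a sketch of when these commands are needed, consider a C function that fills a caller-supplied buffer. The function fill_pattern and the library object lib are hypothetical, and the size_t type in the declaration assumes the aliases from ::cffi::alias load have been defined. From Tcl, the buffer would be obtained with ::cffi::memory allocate, passed as the buf argument, read back with ::cffi::memory tobinary and finally released with ::cffi::memory free.

```c
#include <stddef.h>

/* Hypothetical C function that writes into caller-allocated memory.
   A plausible cffi declaration:
       lib function fill_pattern void {buf pointer n size_t}      */
void fill_pattern(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(i & 0xFF);  /* byte i holds i mod 256 */
}
```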

Strings

Strings in C are generally represented in memory as a null-terminated sequence of bytes in some specific encoding. They may be declared either as a char * or as an array of char, where the size of the array places a limit on the maximum length.

At the script level, these can be declared in multiple ways:

pointer
    As discussed in the previous section, this is a pointer to raw memory. To access the underlying string, the memory referenced by the pointer has to be converted into a Tcl string value with the ::cffi::memory tostring command.
string.ENCODING
    Values declared using this type are still pointers at the C level but are implicitly converted to and from Tcl strings at the C API interface using the specified encoding. If .ENCODING is left off, the system encoding is used.
unistring
    This is similar to string.ENCODING except that the values are Tcl_UniChar* at the C level and the encoding is implicitly the one used by Tcl for the Tcl_UniChar data type.
chars.ENCODING
    The value is an array of characters at the C level. The type must always appear as an array, for example, chars.utf-8[10] and not as a scalar chars.utf-8. Here too, conversion to and from Tcl strings is implicit using the specified encoding, which again defaults to the system encoding. Following standard C rules, arrays are passed by reference as function arguments, so a declaration of chars[10] is passed into a function as a char*. Within a struct definition, on the other hand, it is stored as an array.
unichars
    The value is an array of Tcl_UniChar characters and follows the same rules as chars except that the encoding is always the one used by Tcl for the Tcl_UniChar type.

The choice of using pointer, string (and unistring), or chars (and unichars) depends on the C declaration and context as well as convenience.

  • Function parameters of type char* that are purely input are best declared as string or unistring.
  • Function parameters that are actually output buffers in which the called function stores the output string value are best declared as chars[]. Generally these have an associated parameter which indicates the buffer size. In such cases the output parameter can be declared as (for example) chars[nchars] where nchars is the name of the parameter containing the buffer size. The string and unistring types cannot be used for out or inout parameters as there is no associated buffer size.
  • Function return values cannot be declared as chars or unichars since C itself does not support array return values. Generally, functions declared in C as returning char * need to be declared as returning pointer because the pointers have to be explicitly managed. Only where the returned pointer is static, or does not need to be disposed of for some other reason, can the return value be typed as string or unistring.
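
The output-buffer case from the second point can be sketched in C as follows. get_greeting and lib are hypothetical names; the declaration in the comment follows the chars[nchars] form recommended above. When called from Tcl, buf would be given as the name of a variable to receive the string and nchars as the buffer size.

```c
#include <string.h>

/* Hypothetical C function with an output string buffer. Plausible
   declaration:
       lib function get_greeting int {buf {chars[nchars] out} nchars int}
   Returns the string length, or -1 if the buffer is too small, in which
   case buf is left untouched. */
int get_greeting(char *buf, int nchars)
{
    static const char greeting[] = "hello";
    int len = (int)(sizeof greeting - 1);
    if (nchars < len + 1)
        return -1;
    memcpy(buf, greeting, (size_t)len + 1);  /* copy including the NUL */
    return len;
}
```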

Binary strings

The types binary and bytes are used to declare a sequence of bytes in memory. The binary type translates to a C unsigned char * where the memory is treated as a Tcl binary string (byte array). Similarly, the bytes type is analogous to the chars type except that it declares a fixed size array of bytes, not characters. These types are converted between Tcl values and C values with the Tcl_GetByteArrayFromObj and Tcl_NewByteArrayObj functions.

The binary type can only be used for input parameters of a function. It is not permitted in any other declaration context.
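
A minimal sketch of a C function taking binary input, assuming a hypothetical byte_sum function in a library object lib. The Tcl argument for data would be a binary string, whose bytes arrive in C as an unsigned char *.

```c
/* Hypothetical C function consuming raw bytes. Plausible declaration:
       lib function byte_sum uint {data binary len int}           */
unsigned int byte_sum(const unsigned char *data, int len)
{
    unsigned int sum = 0;
    for (int i = 0; i < len; i++)
        sum += data[i];   /* each byte is treated as 0..255 */
    return sum;
}
```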

Structs

C structs are defined through the ::cffi::Struct class. This encapsulates the layout of the struct and provides various methods for manipulation. A structure layout is a list of alternating field name and type declarations. An example of a definition would be

::cffi::Struct create Point {
    x int
    y int
}

Once defined, structs can be referenced in function prototypes and in other structs as struct.STRUCTNAME, for example struct.Point. Note that the struct name is the name of the object without any leading :: global namespace prefix.

C struct values are generally represented as Tcl dictionaries with the struct field names as dictionary keys. C function parameter declarations that take pointers to structs can be declared as struct.STRUCTNAME byref, for example struct.Point byref. The byref is required as structs can currently only be passed by reference. The corresponding input argument for the parameter when the function is called should be the dictionary value. Conversely, output parameter results are returned as a dictionary of the same form.
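
For reference, the C side corresponding to the Point struct and the two parameter styles might look like this. point_sum, point_origin and lib are hypothetical. Per the description above, calling point_sum with the dictionary {x 1 y 2} would pass a struct Point * to C, and point_origin, declared with out, would store a dictionary of the same form in the variable named by its argument.

```c
/* C counterpart of the Point struct defined above. */
struct Point {
    int x;
    int y;
};

/* Plausible declaration: lib function point_sum int {p {struct.Point byref}} */
int point_sum(const struct Point *p)
{
    return p->x + p->y;
}

/* Plausible declaration: lib function point_origin void {p {struct.Point out}} */
void point_origin(struct Point *p)
{
    p->x = 0;   /* output fields written by the callee */
    p->y = 0;
}
```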

Alternatively, structs can also be manipulated using raw pointers and explicit transforms to native C structures in memory. For example,

% set pPoint [Point allocate]
0x00000211cb924de0^Point
% Point tonative
wrong # args: should be "Point tonative POINTER INITIALIZER ?INDEX?"
% Point tonative $pPoint {x 0 y 1}
% Point fromnative $pPoint
x 0 y 1

See ::cffi::Struct for other methods related to allocation, conversion between Tcl binary strings and other utilities.

Type aliases

Type aliases provide a convenient way to bind a data type to one or more annotations. They can then be used in type declarations in the same manner as the built-in types.

In addition to avoiding repetition, type aliases facilitate abstraction. For example, many Windows APIs have an output parameter that is typed as a fixed size buffer of length MAX_PATH characters. A type alias OUTPUT_PATH defined as

cffi::alias define OUTPUT_PATH {unichars[512] out}

can be used in function and struct field declarations.

Similarly, type aliases can be used to hide platform differences. For example, in the following function prototype,

kernel stdcall HeapCreate pointer.HEAP {opts uint initSize SIZE_T maxSize SIZE_T}

SIZE_T is an alias that resolves to either uint or ulonglong depending on whether the platform is 32- or 64-bit.
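
The decision a SIZE_T alias encodes can be expressed in C by comparing type widths. size_t_alias_target is a hypothetical helper for illustration; it reports which cffi integer type has the same size as size_t on the machine it is compiled for, which is the choice an alias definition would make per platform.

```c
#include <stddef.h>

/* Hypothetical helper: names the cffi type matching size_t's width here. */
const char *size_t_alias_target(void)
{
    if (sizeof(size_t) == sizeof(unsigned int))
        return "uint";          /* typical of 32-bit platforms */
    if (sizeof(size_t) == sizeof(unsigned long long))
        return "ulonglong";     /* typical of 64-bit platforms */
    return "unknown";
}
```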

Various points to note about type aliases:

  • A type alias must begin with an alphabetic character, an underscore or a colon. Subsequent characters may be one of these or a digit.
  • Type aliases can be nested, i.e. one alias may be defined in terms of another.
  • When a type alias is used in a declaration, additional annotations may be specified. These are merged with those included in the type alias definition.

For convenience, the package provides the ::cffi::alias load command which defines some standard C type aliases like size_t as well as some platform-specific type aliases such as HANDLE on Windows.

Currently defined type aliases can be listed with the ::cffi::alias list command and removed with ::cffi::alias delete.

Enumerations

Enumerations allow the use of symbolic constants in place of numeric values passed as arguments to functions. Their primary purpose is similar to preprocessor #define constants and enum types in C. They are defined and otherwise managed through the cffi::enum command ensemble. The fragment below provides an example.

cffi::enum define CMARK_OPTS {DEFAULT 0 NORMALIZE 256 VALIDATE 512 SMART 1024 }
cffiLib function cmark_render_html pointer {
    root pointer.cmark_node
    opts {int {enum CMARK_OPTS}}
}
set htmlptr [cmark_render_html $root NORMALIZE]

When combined with the bitmask annotation, bitmasks can be symbolically represented as a list.

cffiLib function cmark_render_html pointer {
    root pointer.cmark_node
    opts {int bitmask {enum CMARK_OPTS}}
}
set htmlptr [cmark_render_html $root {SMART NORMALIZE}]
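
The two calls above rely on the numeric values in CMARK_OPTS. The C sketch below, with the hypothetical helper opts_from_names, reproduces what the enum and bitmask annotations are described as doing for a list such as {SMART NORMALIZE}: each name is looked up and the values are OR-ed together, giving 1024 | 256 = 1280. Unlike the real annotation, which also accepts plain integers in the list, this helper only handles names.

```c
#include <string.h>

/* The CMARK_OPTS enumeration from above as a C lookup table. */
static const struct { const char *name; int value; } cmark_opts[] = {
    {"DEFAULT", 0}, {"NORMALIZE", 256}, {"VALIDATE", 512}, {"SMART", 1024},
};

/* Hypothetical helper: ORs together the values of the named options.
   Returns -1 if any name is not part of the enumeration. */
int opts_from_names(const char *const names[], int count)
{
    int mask = 0;
    for (int i = 0; i < count; i++) {
        int found = 0;
        for (size_t j = 0; j < sizeof cmark_opts / sizeof cmark_opts[0]; j++) {
            if (strcmp(names[i], cmark_opts[j].name) == 0) {
                mask |= cmark_opts[j].value;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
    }
    return mask;
}
```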

Functions

To invoke a function in a DLL or shared library, the library must first be loaded through the creation of a ::cffi::dyncall::Library object. The ::cffi::dyncall::Library.function and ::cffi::dyncall::Library.stdcall methods of the object can then be used to create Tcl commands that wrap individual functions implemented in the library.

Calling conventions

The 32-bit Windows platform uses two common calling conventions for functions: the default C calling convention and the stdcall calling convention which is used by most system libraries. These differ in terms of parameter and stack management and it is crucial that the correct convention be used when defining the corresponding FFI.

Other than use of the two separate methods for definition, there is no difference in terms of the function prototype used for definition or the method of invocation.

Note that this difference in calling convention is only applicable to 32-bit Windows. For other platforms, including 64-bit Windows, stdcall behaves in identical fashion to function.
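
On the C side, the calling convention is usually chosen in a header with a macro, as in the sketch below. WINAPI_LIKE, add_ints and lib are hypothetical names. A function compiled this way would be wrapped with lib stdcall add_ints int {a int b int}; on platforms other than 32-bit Windows the macro expands to nothing, and using function instead would make no difference.

```c
/* Select __stdcall only on 32-bit Windows; elsewhere use the default. */
#if defined(_WIN32) && !defined(_WIN64)
#  define WINAPI_LIKE __stdcall
#else
#  define WINAPI_LIKE
#endif

/* Hypothetical exported function using the selected convention. */
int WINAPI_LIKE add_ints(int a, int b)
{
    return a + b;
}
```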

Function wrappers

The function wrapping methods function and stdcall have the following syntax:

DLLOBJ function FNNAME RETTYPE PARAMS
DLLOBJ stdcall FNNAME RETTYPE PARAMS

where FNNAME is the name of the function (and an optional Tcl alias), RETTYPE is the function return type declaration and PARAMS is a list of alternating parameter names and type declarations. The type declarations may include annotations that control behaviour and conversion between Tcl and C values.

The C function may then be invoked as FNNAME like any other Tcl command.

Return types

A function return declaration is a type or type alias followed by zero or more annotations. The resolved type must not be a struct, an array, or any of chars, unichars, binary and bytes. Pointers to these are permitted, however.

In the case of the string and unistring types, the script level return values are constructed from C char * and Tcl_UniChar * values. Since the underlying pointer is not available to the script, the storage it references cannot be freed, so these types should only be used where that is not needed (for example, when the function returns static strings).
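
To make the distinction concrete, the following C sketch shows one function whose result can safely be declared as string and one whose result should be declared as pointer. library_name, make_label and lib are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Returns static storage: nothing to free, so a declaration such as
       lib function library_name string {}
   loses nothing. */
const char *library_name(void)
{
    return "demo";
}

/* Returns heap memory the caller must free: declare the return as pointer,
   e.g.  lib function make_label pointer {}, so the address stays available
   for a later call to free. Returns NULL if allocation fails. */
char *make_label(void)
{
    char *s = malloc(6);
    if (s != NULL)
        memcpy(s, "label", 6);   /* 5 characters plus the terminator */
    return s;
}
```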

Return annotations

The following annotations may follow the type in a return type declaration.

  • For integer types, the error checking annotations zero, nonzero, nonnegative, positive, the error reporting annotations errno, lasterror, winerror and onerror may be specified. See Error handling for details on these.
  • For pointer types, the nonzero, errno and lasterror annotations may be specified as well as the unsafe and counted annotations (but not dispose or disposeonsuccess). See Pointer safety for the meaning of these annotations.
  • For string and unistring types, the nonzero, errno and lasterror annotations may be specified. See Error handling for details on these.

Parameters

The PARAMS argument in a function prototype is a list of alternating parameter name and parameter type declaration elements. A parameter type declaration may begin with any supported type except void and may be followed by a sequence of optional type annotations.

Parameter annotations

Annotations that are valid for parameters are those related to pointers, those related to argument passing and those related to storing output values.

Annotations in the first category are unsafe, counted, dispose and disposeonsuccess. See Pointer safety for details.

The second set deals with how arguments are passed to the C function. C functions are passed arguments either by value or by reference (i.e. as a pointer to a value). Moreover, parameters may be used to pass values to the function (input parameters), retrieve values (output parameters) or both. Correspondingly, a parameter type may be annotated with in (default), out and inout.

In the case of in parameters, the argument must be specified as a Tcl value when the function is called. It is passed to the C function by value if it is a scalar, or by reference if it is an array or a struct. Scalars can be forced to be passed by reference by annotating the parameter with byref. In the case of string and unistring, in parameters correspond to char * and Tcl_UniChar * respectively, while in byref parameters map to char ** and Tcl_UniChar **.

An in parameter may also be annotated with a default value so that no argument need be provided at the time of the call. The default is specified as a list of two elements, the first being the annotation keyword default and the second being the value to use. As for Tcl procs, if a default is specified for a parameter, all subsequent parameters must also have a default specified.

In the case of out or inout parameters, the argument to the function must be specified as the name of a variable in the caller's context. For inout parameters, the variable must exist and contain a valid value for the parameter type. For out parameters, the variable need not exist. In both cases, on return from the function the output value stored in the parameter by the function will be stored in the variable. Parameters annotated with out and inout are always passed by reference for all types and use of byref is redundant. Note that inout cannot be used with string and unistring types while neither out nor inout can be used with binary.
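
The C shapes behind out and inout parameters are ordinary pointer parameters, as in this sketch with hypothetical functions divide_ints and bump and a hypothetical library object lib. From Tcl, the quotient and counter arguments would be variable names; for bump the variable must already hold an integer.

```c
/* Plausible declaration:
       lib function divide_ints int {a int b int quotient {int out}}
   Writes *quotient only on success; returns -1 for division by zero. */
int divide_ints(int a, int b, int *quotient)
{
    if (b == 0)
        return -1;
    *quotient = a / b;
    return 0;
}

/* Plausible declaration:
       lib function bump void {counter {int inout}}
   Reads the current value and writes back the incremented one. */
void bump(int *counter)
{
    *counter += 1;
}
```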

There are some subtleties with respect to error handling that are relevant to output parameters and must be accounted for in declarations. See Errors and output parameters.

A few additional annotations are available for parameters, mainly as a convenience.

The annotation nullifempty is available only for in parameters of type string, unistring and struct. If present, a NULL pointer is passed into the C function if the argument is an empty string in the case of string or unistring, or an empty dictionary in the case of struct. This facility is useful for APIs where NULL pointers signify default options.

The annotation enum is available for integer types. It takes an additional argument that names an enumeration (see Enumerations). When the function is called, any value names from that enumeration are accepted and the corresponding integer value is passed to the C function.

The bitmask annotation is also limited to integer types. If specified, the argument may be a list of integers. These are all OR-ed together and the result is passed to the C function. If the enum annotation is also specified, each element of the list may be an integer or an enum value name.

Error handling

C functions generally indicate errors through their return value. Details of the error are either in the return value itself or intended to be retrieved by some other mechanism.

One way to deal with this at the script level is to simply check the return value (generally an integer or pointer) and take appropriate action. This has two downsides. The first is that error conditions in Tcl are almost always signalled by raising an exception rather than through a return status mechanism. The second, perhaps more important, downside is that the detail behind the error, stored in errno or available via GetLastError() on Windows, is often lost by the time the Tcl interpreter returns to the script level.

Error annotations

Two additional sets of type annotations are provided to solve these issues. The first set of annotations is used to define the error check conditions to be applied to function return values. The second set is used to specify how the error detail is to be retrieved.

The annotations for error checking are:

zero          The value must be zero.
nonzero       The value must be non-zero.
nonnegative   The value must be zero or greater.
positive      The value must be greater than zero.

These annotations may only be used with integer types except nonzero which can also be used for pointer, string and unistring types. In the case of the latter two, returned char * values that are NULL pointers are transformed to empty strings by default. The use of the nonzero annotation will force an exception to be generated instead.

At most one of the above annotations can be attached to a return type. The function return value is then checked against the corresponding condition, and a value that fails the check is treated as an error condition.

An error condition results in an exception being raised unless the onerror annotation is specified (see below). However, the default error message is generic and does not say why the error occurred. The following error retrieval annotations specify how detail about the error is to be obtained.

errno
    The POSIX error code is stored in errno. The error message is generated using the C runtime strerror function.
lasterror
    (Windows only) The error code and message are retrieved using the Windows GetLastError and FormatMessage functions.
winerror
    (Windows only) The numeric return value is itself the Windows error code and the error message is generated with FormatMessage. This annotation can only be used with the zero error checking annotation.

Any of these annotations can be applied to integer types, while the errno and lasterror annotations can also be used with pointer types.
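
A C function following the errno convention might look like the sketch below. checked_isqrt and lib are hypothetical. With the return declaration shown in the comment, a negative return value fails the nonnegative check, and the errno annotation would generate the error message by applying strerror to errno (here EDOM).

```c
#include <errno.h>

/* Hypothetical function using the errno convention. Plausible declaration:
       lib function checked_isqrt {int nonnegative errno} {x int}
   Returns floor(sqrt(x)) for x >= 0; otherwise sets errno and returns -1. */
int checked_isqrt(int x)
{
    if (x < 0) {
        errno = EDOM;
        return -1;
    }
    int r = 0;
    while ((long long)(r + 1) * (r + 1) <= x)   /* widen to avoid overflow */
        r++;
    return r;
}
```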

In addition, the onerror annotation provides a means for customizing error handling when the error is from a library and not a system error. The annotation takes an additional argument which is a command prefix to be invoked when an error checking annotation is triggered. When this command prefix is invoked, three additional arguments are appended to it:

  • the function return value that failed the error condition check
  • a dictionary mapping all in and inout parameter names to the values passed in to the called function
  • a dictionary mapping all inout and out parameter names to the values returned on output by the function. These only include output parameters marked as storealways or storeonerror.

The result of the handler execution is returned as the function call result and may be a normal result or a raised exception. The handler may use upvar for access to the calling script's context including any input or output arguments to the original function call.

This onerror facility may be used to ignore errors, provide default values as well as raise exceptions with more detailed library-specific information.

NOTE: Using an onerror handler that returns normally is not the same as specifying no error checking annotations, because the function return is still treated as an error condition with respect to the output variables, as described below.

Errors and output parameters

An important consideration in the presence of errors is how the called function deals with output (including input-output) parameters. There are three possibilities:

  • The function only writes to the output parameter on success
  • The function always writes to the output parameter
  • The function only writes to the output parameter on error, for example an error code.

The distinction is particularly crucial for non-scalar output. Output parameters that have not been written to may result in corruption or crashes if the memory is accessed for conversion to Tcl script level values.

By default, script level output variables are only written to when the error checks pass (including the case where none are specified). This is the first case above. If the storealways annotation is specified for a parameter, it is stored irrespective of whether an error check failed or not. This is the second case. Finally, the storeonerror annotation targets the third case. The output parameter is stored only if an error check fails.

Note that an error checking annotation must be present for any of these to have an effect.
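
Two of the three behaviours can be sketched with hypothetical C functions. parse_digit writes its output only on success, so the default handling fits; open_resource writes an error code only on failure, which is the case storeonerror is meant for. The function names, the library object lib and the error code 22 are illustrative assumptions.

```c
#include <stddef.h>

/* Plausible declaration:
       lib function parse_digit {int zero} {s string value {int out}}
   Writes *value only when s is a single decimal digit. */
int parse_digit(const char *s, int *value)
{
    if (s == NULL || s[0] < '0' || s[0] > '9' || s[1] != '\0')
        return -1;              /* error: *value left untouched */
    *value = s[0] - '0';
    return 0;
}

/* Plausible declaration:
       lib function open_resource {int zero} {id int err {int out storeonerror}}
   Writes *err only on failure. */
int open_resource(int id, int *err)
{
    if (id >= 0)
        return 0;               /* success: *err is never written */
    *err = 22;                  /* hypothetical library error code */
    return -1;
}
```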

Prototypes and function pointers

The function wrapping methods function and stdcall described earlier bind a function type definition, consisting of the return type and parameters, to the address of a function specified by its name. For some uses, it is useful to specify the function type information independently of the function address. The ::cffi::prototype function and ::cffi::prototype stdcall commands are provided for this purpose. They take a form very similar to the corresponding methods:

cffi::prototype function NAME RETTYPE PARAMS
cffi::prototype stdcall NAME RETTYPE PARAMS

where RETTYPE and PARAMS are as described in Function wrappers. The commands create a function prototype NAME which can be used as a tag for pointers to functions. The ::cffi::call command can then be used to invoke the pointer target.

For example, consider the following C fragment

typedef int ADDER(int, int);
ADDER *function_returning_adder();
ADDER *fnptr = function_returning_adder();
fnptr(1,2);

This would be translated into Tcl as

cffi::prototype function ADDER int {x int y int}
DLLOBJ function function_returning_adder pointer.ADDER {}
set fnptr [function_returning_adder]
cffi::call $fnptr 1 2
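
For completeness, here is a self-contained version of the C fragment above that compiles on its own. add_impl is a hypothetical implementation supplied only so that function_returning_adder has something to return; in the Tcl translation, the pointer it returns would come back tagged as ADDER and be invoked with cffi::call.

```c
/* Function type, as in the original fragment. */
typedef int ADDER(int, int);

/* Hypothetical implementation whose address is handed out. */
static int add_impl(int a, int b)
{
    return a + b;
}

/* Returns a pointer to a function of type ADDER. */
ADDER *function_returning_adder(void)
{
    return add_impl;
}
```
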
Document generated by Ruff!