System Calls

User threads run with fewer privileges than supervisor threads: certain CPU instructions may not be used, and only a limited part of the memory map is accessible. System calls allow user threads to perform operations not directly available to them.

When defining system calls, it is very important to ensure that access to the API’s private data is done exclusively through system call interfaces. Private kernel data should never be made available to user mode threads directly. For example, the k_queue APIs were intentionally not made available as they store bookkeeping information about the queue directly in the queue buffers which are visible from user mode.

APIs that allow the user to register callback functions that run in supervisor mode should never be exposed as system calls. Reserve these for supervisor-mode access only.

This section describes how to declare new system calls and discusses a few implementation details relevant to them.

Components

All system calls have the following components:

  • A C prototype for the API, declared in some header under include/ and prefixed with __syscall. This prototype is never implemented manually; instead, its body is created by the scripts/gen_syscalls.py script. What gets generated is an inline function which either calls the implementation function directly (if called from supervisor mode) or goes through privilege elevation and validation steps (if called from user mode).
  • An implementation function, which is the real implementation of the system call. The implementation function may assume that all parameters passed in have been validated if it was invoked from user mode.
  • A handler function, which wraps the implementation function and does validation of all the arguments passed in.

C Prototype

The C prototype represents how the API is invoked from either user or supervisor mode. For example, to initialize a semaphore:

__syscall void k_sem_init(struct k_sem *sem, unsigned int initial_count,
                          unsigned int limit);

The __syscall attribute is special. To the C compiler, it simply expands to ‘static inline’. To the post-build parse_syscalls.py script, however, it indicates that this API is a system call. The parse_syscalls.py script parses the function prototype to determine the data types of its return value and arguments, and has some limitations:

  • Array arguments must be passed in as pointers, not arrays. For example, int foo[] or int foo[12] is not allowed, but should instead be expressed as int *foo.
  • Function pointers horribly confuse the limited parser. The workaround is to typedef them first, and then express in the argument list in terms of that typedef.
  • __syscall must be the first thing in the prototype.
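The following sketch shows declarations that respect the three parser limitations above. The names k_demo_sum(), k_demo_apply() and k_demo_visit_t are hypothetical, and __syscall is defined as an empty macro only so that the fragment compiles on a host; in a real tree the marker comes from the kernel headers and the API body is generated rather than written by hand. Only the declaration syntax is being illustrated here.

```c
#include <assert.h>
#include <stddef.h>

/* Host stand-in (assumption): lets the fragment build outside a Zephyr tree. */
#ifndef __syscall
#define __syscall
#endif

/* Function pointers are typedef'd first so the parser sees a plain type name. */
typedef int (*k_demo_visit_t)(int value);

/* Arrays are expressed as a pointer plus a count, never `int vals[]`. */
__syscall int k_demo_sum(const int *vals, size_t count);

/* __syscall is the first token; the callback is named through its typedef. */
__syscall int k_demo_apply(k_demo_visit_t visit, const int *vals, size_t count);

/* Hypothetical bodies, present only so the sketch links on a host. */
int k_demo_sum(const int *vals, size_t count)
{
    int total = 0;

    assert(vals != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        total += vals[i];
    }
    return total;
}

int k_demo_apply(k_demo_visit_t visit, const int *vals, size_t count)
{
    int total = 0;

    assert(visit != NULL);
    for (size_t i = 0; i < count; i++) {
        total += visit(vals[i]);
    }
    return total;
}

/* Example callback used to exercise k_demo_apply(). */
int k_demo_double(int value)
{
    return 2 * value;
}
```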

The preprocessor is intentionally not used when determining the set of system calls to generate. However, any generated system calls that don’t actually have a handler function defined (because the related feature is not enabled in the kernel configuration) will instead point to a special handler for unimplemented system calls. Data type definitions for APIs should not have conditional visibility to the compiler.

Any header file that declares system calls must include a special generated header at the very bottom of the header file. This header follows the naming convention syscalls/<name of header file>. For example, at the bottom of include/sensor.h:

#include <syscalls/sensor.h>

Invocation Context

Source code that uses system call APIs can be made more efficient if it is known that all the code inside a particular C file runs exclusively in user mode, or exclusively in supervisor mode. The system will look for the definition of the macros __ZEPHYR_SUPERVISOR__ or __ZEPHYR_USER__; typically these are added to the compiler flags in the build system for the related files.

  • If CONFIG_USERSPACE is not enabled, all APIs just directly call the implementation function.
  • Otherwise, the default case is to make a runtime check to see if the processor is currently running in user mode, and either make the system call or directly call the implementation function as appropriate.
  • If __ZEPHYR_SUPERVISOR__ is defined, then it is assumed that all the code runs in supervisor mode and all APIs just directly call the implementation function. If the code was actually running in user mode, there will be a CPU exception as soon as it tries to do something it isn’t allowed to do.
  • If __ZEPHYR_USER__ is defined, then it is assumed that all the code runs in user mode and system calls are unconditionally made.

Implementation Details

Declaring an API with __syscall causes some code to be generated in C and header files by scripts/gen_syscalls.py, all of which can be found in the project's build output directory under include/generated/:

  • The system call is added to the enumerated type of system call IDs, which is expressed in include/generated/syscall_list.h. It is the name of the API in uppercase, prefixed with K_SYSCALL_.
  • A prototype for the handler function is also created in include/generated/syscall_list.h.
  • An entry for the system call is created in the dispatch table _k_syscall_table, expressed in include/generated/syscall_dispatch.c.
  • A weak handler function is declared, which is just an alias of the ‘unimplemented system call’ handler. This is necessary since the real handler function may or may not be built depending on the kernel configuration. For example, if a user thread makes a sensor subsystem API call, but the sensor subsystem is not enabled, the weak handler will be invoked instead.
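As a rough host-side model of the pieces listed above, the sketch below builds an ID enumeration, a dispatch table, and a fallback for calls whose real handler was not built. Everything prefixed with demo_ or K_SYSCALL_K_DEMO_ is hypothetical. The fallback slot is filled in explicitly rather than through a weak alias, and it returns a sentinel value; what the real 'unimplemented system call' handler does is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IDs following the K_SYSCALL_<API NAME> convention. */
enum demo_syscall_id {
    K_SYSCALL_K_DEMO_ADD,
    K_SYSCALL_K_DEMO_SENSOR_READ, /* feature assumed to be configured out */
    K_SYSCALL_LIMIT
};

typedef uint32_t (*demo_handler_t)(uint32_t a1, uint32_t a2, uint32_t a3,
                                   uint32_t a4, uint32_t a5, uint32_t a6,
                                   void *ssf);

#define DEMO_UNIMPLEMENTED 0xFFFFFFFFu /* sentinel chosen for this sketch */

uint32_t demo_handler_unimplemented(uint32_t a1, uint32_t a2, uint32_t a3,
                                    uint32_t a4, uint32_t a5, uint32_t a6,
                                    void *ssf)
{
    return DEMO_UNIMPLEMENTED;
}

uint32_t demo_handler_k_demo_add(uint32_t a1, uint32_t a2, uint32_t a3,
                                 uint32_t a4, uint32_t a5, uint32_t a6,
                                 void *ssf)
{
    return a1 + a2;
}

/* Slots without a real handler point at the 'unimplemented' handler. */
const demo_handler_t demo_syscall_table[K_SYSCALL_LIMIT] = {
    [K_SYSCALL_K_DEMO_ADD] = demo_handler_k_demo_add,
    [K_SYSCALL_K_DEMO_SENSOR_READ] = demo_handler_unimplemented,
};

/* Models the common entry point: look up the handler by ID and call it. */
uint32_t demo_dispatch(uint32_t id, uint32_t a1, uint32_t a2)
{
    if (id >= K_SYSCALL_LIMIT) {
        return demo_handler_unimplemented(a1, a2, 0, 0, 0, 0, NULL);
    }
    assert(demo_syscall_table[id] != NULL);
    return demo_syscall_table[id](a1, a2, 0, 0, 0, 0, NULL);
}
```

An ID taken from user mode is untrusted, which is why the model range-checks it before indexing the table.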

The body of the API is created in the generated system header. Using the example of k_sem_init(), this API is declared in include/kernel.h. At the bottom of include/kernel.h is:

#include <syscalls/kernel.h>

Inside this header is the body of k_sem_init():

K_SYSCALL_DECLARE3_VOID(K_SYSCALL_K_SEM_INIT, k_sem_init, struct k_sem *,
                        sem, unsigned int, initial_count,
                        unsigned int, limit);

This generates an inline function that takes three arguments with void return value. Depending on context it will either directly call the implementation function or go through a system call elevation. A prototype for the implementation function is also automatically generated. In this example, the implementation of the K_SYSCALL_DECLARE3_VOID() macro will be:

#if !defined(CONFIG_USERSPACE) || defined(__ZEPHYR_SUPERVISOR__)

#define K_SYSCALL_DECLARE3_VOID(id, name, t0, p0, t1, p1, t2, p2) \
        extern void _impl_##name(t0 p0, t1 p1, t2 p2); \
        static inline void name(t0 p0, t1 p1, t2 p2) \
        { \
                _impl_##name(p0, p1, p2); \
        }

#elif defined(__ZEPHYR_USER__)
#define K_SYSCALL_DECLARE3_VOID(id, name, t0, p0, t1, p1, t2, p2) \
        static inline void name(t0 p0, t1 p1, t2 p2) \
        { \
                _arch_syscall_invoke3((u32_t)p0, (u32_t)p1, (u32_t)p2, id); \
        }

#else /* mixed kernel/user macros */
#define K_SYSCALL_DECLARE3_VOID(id, name, t0, p0, t1, p1, t2, p2) \
        extern void _impl_##name(t0 p0, t1 p1, t2 p2); \
        static inline void name(t0 p0, t1 p1, t2 p2) \
        { \
                if (_is_user_context()) { \
                        _arch_syscall_invoke3((u32_t)p0, (u32_t)p1, (u32_t)p2, id); \
                } else { \
                        compiler_barrier(); \
                        _impl_##name(p0, p1, p2); \
                } \
        }
#endif

The header containing K_SYSCALL_DECLARE3_VOID() is itself generated due to its repetitive nature and can be found in include/generated/syscall_macros.h. It is created by scripts/gen_syscall_header.py.
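To see the mixed kernel/user branch of such a macro in action without a target board, the sketch below re-creates it on a host. It assumes stand-ins that are not part of Zephyr: demo_user_mode replaces the CPU mode query, demo_syscall_invoke3() only counts traps instead of elevating privilege, and uintptr_t replaces u32_t so the casts are valid on a 64-bit host. The compiler barrier of the original macro is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host stand-ins (assumptions, not Zephyr code). */
int demo_user_mode;    /* 1 while pretending to run in user mode */
int demo_traps;        /* number of privilege elevations requested */
int demo_direct_calls; /* number of direct calls to the implementation */

static inline int demo_is_user_context(void)
{
    return demo_user_mode;
}

/* Models the trap; a real port would load registers and elevate privilege. */
static inline uint32_t demo_syscall_invoke3(uintptr_t a1, uintptr_t a2,
                                            uintptr_t a3, uint32_t id)
{
    (void)a1; (void)a2; (void)a3; (void)id;
    demo_traps++;
    return 0;
}

/* Same shape as the mixed-mode variant shown in the text. */
#define DEMO_SYSCALL_DECLARE3_VOID(id, name, t0, p0, t1, p1, t2, p2) \
    extern void demo_impl_##name(t0 p0, t1 p1, t2 p2); \
    static inline void name(t0 p0, t1 p1, t2 p2) \
    { \
        if (demo_is_user_context()) { \
            demo_syscall_invoke3((uintptr_t)p0, (uintptr_t)p1, \
                                 (uintptr_t)p2, id); \
        } else { \
            demo_impl_##name(p0, p1, p2); \
        } \
    }

struct demo_sem {
    unsigned int count;
    unsigned int limit;
};

DEMO_SYSCALL_DECLARE3_VOID(1, demo_sem_init, struct demo_sem *, sem,
                           unsigned int, initial_count, unsigned int, limit)

/* Implementation function; its prototype came from the macro above. */
void demo_impl_demo_sem_init(struct demo_sem *sem, unsigned int initial_count,
                             unsigned int limit)
{
    assert(sem != NULL);
    sem->count = initial_count;
    sem->limit = limit;
    demo_direct_calls++;
}
```

With demo_user_mode set to 0, a call to demo_sem_init() reaches the implementation directly; with it set to 1, the call is routed to the trap stand-in and the implementation is not touched by the wrapper.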

The final layer is the invocation of the system call itself. All architectures implementing system calls must implement the seven inline functions _arch_syscall_invoke0() through _arch_syscall_invoke6(). These functions marshal arguments into designated CPU registers and perform the necessary privilege elevation. In this layer, all arguments are treated as an unsigned 32-bit type. There is always a 32-bit unsigned return value, which may or may not be used.

Some system calls may have more than six arguments. The number of arguments passed via registers is fixed at six for all architectures. Additional arguments will need to be passed in a struct, which needs to be treated as untrusted memory in the handler function. This is done by the derived functions _syscall_invoke7() through _syscall_invoke10().
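The packing step described above might look roughly like the following on the calling side. This is a host-side model under stated assumptions: struct demo_syscall_7_args, demo_arch_invoke6() and demo_invoke7() are hypothetical names, uintptr_t stands in for the 32-bit register type, and the stand-in for the architecture layer reads the struct directly, whereas a real handler would have to validate it first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the arguments that do not fit in registers. */
struct demo_syscall_7_args {
    uintptr_t arg6;
    uintptr_t arg7;
};

/*
 * Stand-in for the architecture layer: six register-sized values plus an ID.
 * To keep the model observable, it acts as a handler that sums all arguments.
 */
uintptr_t demo_arch_invoke6(uintptr_t a1, uintptr_t a2, uintptr_t a3,
                            uintptr_t a4, uintptr_t a5, uintptr_t a6,
                            uint32_t id)
{
    const struct demo_syscall_7_args *more =
        (const struct demo_syscall_7_args *)a6;

    (void)id;
    assert(more != NULL);
    return a1 + a2 + a3 + a4 + a5 + more->arg6 + more->arg7;
}

/* Derived call: arguments 6 and 7 travel in a struct passed as argument 6. */
uintptr_t demo_invoke7(uintptr_t a1, uintptr_t a2, uintptr_t a3,
                       uintptr_t a4, uintptr_t a5, uintptr_t a6,
                       uintptr_t a7, uint32_t id)
{
    struct demo_syscall_7_args more = { .arg6 = a6, .arg7 = a7 };

    return demo_arch_invoke6(a1, a2, a3, a4, a5, (uintptr_t)&more, id);
}
```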

Some system calls may return a value that will not fit in a 32-bit register, such as APIs that return a 64-bit value. In this scenario, the return value is populated in a memory buffer that is passed in as an argument. For example, see the implementation of _syscall_ret64_invoke0() and _syscall_ret64_invoke1().

Implementation Function

The implementation function is what actually does the work for the API. Zephyr normally does little to no error checking of arguments, or does this kind of checking with assertions. When writing the implementation function, validation of any parameters is optional and should be done with assertions.

All implementation functions must follow the naming convention, which is the name of the API prefixed with _impl_. Implementation functions may be declared in the same header as the API as a static inline function or declared in some C file. No prototype is needed for implementation functions; these are automatically generated.
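A minimal sketch of an implementation function that follows the _impl_ naming convention and checks its parameters only with assertions is shown below. The API k_demo_counter_add() and struct demo_counter are hypothetical, and the standard assert() is used here as a host stand-in for whatever assertion facility the kernel build provides.

```c
#include <assert.h>
#include <stddef.h>

struct demo_counter {
    unsigned int value;
    unsigned int limit;
};

/*
 * Implementation of a hypothetical k_demo_counter_add() API.
 * Parameter checks are assertions only; when the call comes from user mode,
 * the handler is expected to have validated everything already.
 */
unsigned int _impl_k_demo_counter_add(struct demo_counter *counter,
                                      unsigned int n)
{
    assert(counter != NULL);
    assert(counter->value <= counter->limit);

    /* Saturate at the limit without overflowing the addition. */
    if (n > counter->limit - counter->value) {
        counter->value = counter->limit;
    } else {
        counter->value += n;
    }
    return counter->value;
}
```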

Handler Function

The handler function runs on the kernel side when a user thread makes a system call. When the user thread makes a software interrupt to elevate to supervisor mode, the common system call entry point uses the system call ID provided by the user to look up the appropriate handler function for that system call and jump into it.

Handler functions only run when system call APIs are invoked from user mode. If an API is invoked from supervisor mode, the implementation is simply called.

The purpose of the handler function is to validate all the arguments passed in. This includes:

  • Any kernel object pointers provided. For example, the semaphore APIs must ensure that the semaphore object passed in is a valid semaphore and that the calling thread has permission on it.
  • Any memory buffers passed in from user mode. Checks must be made that the calling thread has read or write permissions on the provided buffer.
  • Any other arguments that have a limited range of valid values.

Handler functions involve a great deal of boilerplate code which has been made simpler by some macros in kernel/include/syscall_handler.h. Handler functions should be declared using these macros.

Argument Validation

Several macros exist to validate arguments:

  • Z_SYSCALL_OBJ() checks a memory address to assert that it is a valid kernel object of the expected type, that the calling thread has permissions on it, and that the object is initialized.
  • Z_SYSCALL_OBJ_INIT() is the same as Z_SYSCALL_OBJ(), except that the provided object may be uninitialized. This is useful for handlers of object init functions.
  • Z_SYSCALL_OBJ_NEVER_INIT() is the same as Z_SYSCALL_OBJ(), except that the provided object must be uninitialized. This is not used very often, currently only for k_thread_create().
  • Z_SYSCALL_MEMORY_READ() validates a memory buffer of a particular size. The calling thread must have read permissions on the entire buffer.
  • Z_SYSCALL_MEMORY_WRITE() is the same as Z_SYSCALL_MEMORY_READ() but the calling thread must additionally have write permissions.
  • Z_SYSCALL_MEMORY_ARRAY_READ() validates an array whose total size is expressed as separate arguments for the number of elements and the element size. This macro correctly accounts for multiplication overflow when computing the total size. The calling thread must have read permissions on the total size.
  • Z_SYSCALL_MEMORY_ARRAY_WRITE() is the same as Z_SYSCALL_MEMORY_ARRAY_READ() but the calling thread must additionally have write permissions.
  • Z_SYSCALL_VERIFY_MSG() does a runtime check of a boolean expression, which must evaluate to true or the check fails. A variant, Z_SYSCALL_VERIFY(), does not take a message parameter and instead prints the tested expression if it fails. The latter should only be used for the most obvious of tests.
  • Z_SYSCALL_DRIVER_OP() checks at runtime if a driver instance is capable of performing a particular operation. While this macro can be used by itself, it’s mostly a building block for macros that are automatically generated for every driver subsystem. For instance, to validate the GPIO driver, one could use the Z_SYSCALL_DRIVER_GPIO() macro.

If any check fails, the macros will return a nonzero value. The macro Z_OOPS() can be used to induce a kernel oops which will kill the calling thread. This is done instead of returning some error condition to keep the APIs the same when calling from supervisor mode.
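The reason the array macros take the element count and element size separately can be illustrated with a small host-side model. It is not the actual macro implementation: demo_array_size(), demo_array_readable() and the DEMO_USER_* region are assumptions made for this sketch, and a single fixed address range stands in for the calling thread's real memory permissions. Like the macros, the functions return nonzero on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region the calling thread may read. */
#define DEMO_USER_BASE 0x20000000u
#define DEMO_USER_LEN  0x1000u

/* Stores nmemb * size in *total; returns nonzero if the product overflows. */
int demo_array_size(uint32_t nmemb, uint32_t size, uint32_t *total)
{
    assert(total != NULL);
    if (size != 0 && nmemb > UINT32_MAX / size) {
        return 1;
    }
    *total = nmemb * size;
    return 0;
}

/* Returns 0 only if the whole array lies inside the readable region. */
int demo_array_readable(uint32_t addr, uint32_t nmemb, uint32_t size)
{
    uint32_t total;

    if (demo_array_size(nmemb, size, &total)) {
        return 1; /* a wrapped product must never reach the range check */
    }
    if (addr < DEMO_USER_BASE || total > DEMO_USER_LEN) {
        return 1;
    }
    /* Written as a subtraction so that addr + total cannot wrap. */
    if (addr - DEMO_USER_BASE > DEMO_USER_LEN - total) {
        return 1;
    }
    return 0;
}
```

Without the overflow check, a request for 0x40000001 elements of 4 bytes would wrap to a total of 4 bytes and pass the range check.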

Handler Declaration

All handler functions have the same prototype:

u32_t _handler_<API name>(u32_t arg1, u32_t arg2, u32_t arg3,
                          u32_t arg4, u32_t arg5, u32_t arg6, void *ssf)

All handlers return a value. Handlers are passed exactly six arguments, which were sent from user mode to the kernel via registers in the architecture-specific system call implementation, plus an opaque context pointer which indicates the system state when the system call was invoked from user code.

To simplify the prototype, the variadic Z_SYSCALL_HANDLER() macro should be used to declare the handler name and names of each argument. Type information is not necessary since all arguments and the return value are u32_t. Using k_sem_init() as an example:

Z_SYSCALL_HANDLER(k_sem_init, sem, initial_count, limit)
{
    ...
}

After validating all the arguments, the handler function needs to then call the implementation function. If the implementation function returns a value, this needs to be returned by the handler, otherwise the handler should return 0.

Note

Do not forget that all the arguments to the handler are passed in as unsigned 32-bit values. If checks are needed on parameters that are actually signed values, casts may be needed in order for these checks to be performed properly.
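As an illustration of the note above, consider a parameter that is logically a signed priority. The range and the function names below are hypothetical. The first function casts the raw 32-bit value back to a signed type before checking it; the second shows the mistake of checking the raw unsigned value, which rejects every negative priority.

```c
#include <stdint.h>

/* Hypothetical valid range for a signed priority argument. */
#define DEMO_PRIO_MIN (-16)
#define DEMO_PRIO_MAX 15

/*
 * Correct check: reinterpret the register value as signed first.
 * (The conversion of out-of-range values to int32_t is implementation
 * defined in C, but wraps as expected on two's complement targets.)
 */
int demo_prio_valid(uint32_t raw_prio)
{
    int32_t prio = (int32_t)raw_prio;

    return prio >= DEMO_PRIO_MIN && prio <= DEMO_PRIO_MAX;
}

/* Flawed check: -1 arrives as 0xFFFFFFFF and looks far too large. */
int demo_prio_valid_unsigned(uint32_t raw_prio)
{
    return raw_prio <= (uint32_t)DEMO_PRIO_MAX;
}
```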

Using k_sem_init() as an example again, we need to enforce that the semaphore object passed in is a valid semaphore object (but not necessarily initialized), and that the limit parameter is nonzero:

Z_SYSCALL_HANDLER(k_sem_init, sem, initial_count, limit)
{
    Z_OOPS(Z_SYSCALL_OBJ_INIT(sem, K_OBJ_SEM));
    Z_OOPS(Z_SYSCALL_VERIFY(limit != 0));
    _impl_k_sem_init((struct k_sem *)sem, initial_count, limit);
    return 0;
}

Simple Handler Declarations

Many kernel or driver APIs have very simple handler functions, where they either accept no arguments, or take one object which is a kernel object pointer of some specific type. Some special macros have been defined for these simple cases, with variants depending on whether the API has a return value:

  • Z_SYSCALL_HANDLER1_SIMPLE() one kernel object argument, returns a value
  • Z_SYSCALL_HANDLER1_SIMPLE_VOID() one kernel object argument, no return value
  • Z_SYSCALL_HANDLER0_SIMPLE() no arguments, returns a value
  • Z_SYSCALL_HANDLER0_SIMPLE_VOID() no arguments, no return value

For example, k_sem_count_get() takes a semaphore object as its only argument and returns a value, so its handler can be completely expressed as:

Z_SYSCALL_HANDLER1_SIMPLE(k_sem_count_get, K_OBJ_SEM, struct k_sem *);

System Calls With More Than 6 Arguments

System calls may have more than six arguments. However, the number of arguments passed in via registers when the privilege elevation is invoked is fixed at six for all architectures. In this case, the sixth and subsequent arguments to the system call are placed into a struct, and a pointer to that struct is passed to the handler as its sixth argument.

See include/syscall.h to see how this is done; the struct passed in must be validated like any other memory buffer. For example, for a system call with nine arguments, arguments 6 through 9 will be passed in via struct, which must be verified since memory pointers from user mode can be incorrect or malicious:

Z_SYSCALL_HANDLER(k_foo, arg1, arg2, arg3, arg4, arg5, more_args_ptr)
{
    struct _syscall_9_args *margs = (struct _syscall_9_args *)more_args_ptr;

    Z_OOPS(Z_SYSCALL_MEMORY_READ(margs, sizeof(*margs)));

    ...

}

It is also very important to note that arguments passed in this way can change at any time due to concurrent access to the argument struct. If any parameters are subject to enforcement checks, they need to be copied out of the struct and only then checked. One way to ensure this isn’t optimized out is to declare the argument struct as volatile, and copy values out of it into local variables before checking. Using the previous example:

Z_SYSCALL_HANDLER(k_foo, arg1, arg2, arg3, arg4, arg5, more_args_ptr)
{
    volatile struct _syscall_9_args *margs =
                    (struct _syscall_9_args *)more_args_ptr;
    int arg8;

    Z_OOPS(Z_SYSCALL_MEMORY_READ(margs, sizeof(*margs)));
    arg8 = margs->arg8;
    Z_OOPS(Z_SYSCALL_VERIFY_MSG(arg8 < 12, "arg8 must be less than 12"));

    _impl_k_foo(arg1, arg2, arg3, arg4, arg5, margs->arg6,
                margs->arg7, arg8, margs->arg9);
    return 0;
}
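One way to apply the copy-then-check rule to every field at once is to take a private snapshot of the struct and use only the snapshot afterwards. The sketch below is a host-side model with hypothetical names (struct demo_more_args, demo_fetch_more_args()); the memory permission check that must come first in a real handler is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the extra arguments of a nine-argument call. */
struct demo_more_args {
    uint32_t arg6;
    uint32_t arg7;
    uint32_t arg8;
    uint32_t arg9;
};

/*
 * Copies the user-supplied struct into kernel-owned storage, then validates
 * the copy. Returns 0 on success, nonzero if arg8 is out of range.
 */
int demo_fetch_more_args(const volatile struct demo_more_args *user,
                         struct demo_more_args *snapshot)
{
    assert(user != NULL && snapshot != NULL);

    /* Each field is read exactly once through the volatile pointer. */
    snapshot->arg6 = user->arg6;
    snapshot->arg7 = user->arg7;
    snapshot->arg8 = user->arg8;
    snapshot->arg9 = user->arg9;

    /* Checks run on the snapshot, which user threads cannot modify. */
    if (snapshot->arg8 >= 12) {
        return 1;
    }
    return 0;
}
```

After a successful return, the caller would pass only the snapshot fields on to the implementation function, so a later write to the user-side struct cannot invalidate the checks.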

System Calls With 64-bit Return Value

If a system call has a return value larger than 32 bits, the handler will not return anything. Instead, a pointer to a sufficient memory region for the return value will be passed in as an additional argument. As an example, we have the system call for getting the current system uptime:

__syscall s64_t k_uptime_get(void);

The handler function has the return area passed in as a pointer, which must be validated as writable by the calling thread:

Z_SYSCALL_HANDLER(k_uptime_get, ret_p)
{
    s64_t *ret = (s64_t *)ret_p;

    Z_OOPS(Z_SYSCALL_MEMORY_WRITE(ret, sizeof(*ret)));
    *ret = _impl_k_uptime_get();
    return 0;
}
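The calling side that pairs with such a handler can be modelled on a host as follows. All names here are hypothetical: demo_handler_uptime_get() stands in for the kernel-side handler without any permission check, demo_uptime_ms is a fixed value chosen to exceed 32 bits, and uintptr_t replaces the 32-bit register type so the pointer cast is valid on a 64-bit host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed 64-bit value that does not fit in a 32-bit register. */
int64_t demo_uptime_ms = INT64_C(5000000000);

/* Stand-in for the handler: writes the result through the pointer argument. */
uint32_t demo_handler_uptime_get(uintptr_t ret_p)
{
    int64_t *ret = (int64_t *)ret_p;

    assert(ret != NULL);
    *ret = demo_uptime_ms;
    return 0;
}

/*
 * Caller-side wrapper: reserves storage for the result, passes its address
 * as an extra argument, and ignores the 32-bit register return value.
 */
int64_t demo_uptime_get(void)
{
    int64_t ret = 0;

    (void)demo_handler_uptime_get((uintptr_t)&ret);
    return ret;
}
```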

Configuration Options

Related configuration options:

  • CONFIG_USERSPACE

APIs

Helper macros for creating system call handlers are provided in kernel/include/syscall_handler.h:

  • Z_SYSCALL_HANDLER()
  • Z_SYSCALL_HANDLER1_SIMPLE()
  • Z_SYSCALL_HANDLER1_SIMPLE_VOID()
  • Z_SYSCALL_HANDLER0_SIMPLE()
  • Z_SYSCALL_HANDLER0_SIMPLE_VOID()
  • Z_SYSCALL_OBJ()
  • Z_SYSCALL_OBJ_INIT()
  • Z_SYSCALL_OBJ_NEVER_INIT()
  • Z_OOPS()
  • Z_SYSCALL_MEMORY_READ()
  • Z_SYSCALL_MEMORY_WRITE()
  • Z_SYSCALL_MEMORY_ARRAY_READ()
  • Z_SYSCALL_MEMORY_ARRAY_WRITE()
  • Z_SYSCALL_VERIFY_MSG()
  • Z_SYSCALL_VERIFY()

Functions for invoking system calls are defined in include/syscall.h:

  • _arch_syscall_invoke0()
  • _arch_syscall_invoke1()
  • _arch_syscall_invoke2()
  • _arch_syscall_invoke3()
  • _arch_syscall_invoke4()
  • _arch_syscall_invoke5()
  • _arch_syscall_invoke6()
  • _syscall_invoke7()
  • _syscall_invoke8()
  • _syscall_invoke9()
  • _syscall_invoke10()
  • _syscall_ret64_invoke0()
  • _syscall_ret64_invoke1()