Neko 0.9.99
A portable framework for high-order spectral element flow simulations
Neko has a device abstraction layer (device.F90) to manage device memory, data transfer and kernel invocations directly from a familiar Fortran interface, targeting all supported accelerator backends.
Device memory is allocated with the low-level device::device_alloc function, which takes a C pointer and the requested size (in bytes) as arguments. If the allocation succeeds, the pointer refers to the allocated device memory.
To deallocate device memory, call device::device_free.
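A minimal sketch of the allocate/free pair described above. It assumes that both routines are exported by a module named device (suggested by the device:: prefix), that the device layer has already been initialised, and that the argument order is pointer first, size second, as in the text. The subroutine name and the choice of a double precision buffer are purely illustrative.

```fortran
! Hypothetical helper, not part of Neko: allocate and free a device buffer.
subroutine alloc_free_sketch(n)
  use, intrinsic :: iso_c_binding, only: c_ptr, c_null_ptr, c_size_t, c_double
  use device, only: device_alloc, device_free   ! assumed module name
  implicit none
  integer, intent(in) :: n        ! number of entries wanted on the device
  type(c_ptr) :: x_d              ! will refer to the device memory
  integer(c_size_t) :: nbytes     ! requested size in bytes

  x_d = c_null_ptr

  ! device_alloc expects a size in bytes, so convert entries to bytes.
  ! storage_size returns bits, hence the division by 8.
  nbytes = int(n, c_size_t) * int(storage_size(1.0_c_double) / 8, c_size_t)

  call device_alloc(x_d, nbytes)

  ! ... use x_d in device routines ...

  call device_free(x_d)
end subroutine alloc_free_sketch
```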
It is often helpful to associate a Fortran array with an array allocated on the device. This can be done via the device::device_associate routine, which takes a Fortran array x of type integer, integer(i8), real(kind=sp) or real(kind=dp) (of maximum rank four) and a device pointer x_d as arguments.
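The association step could look roughly like the sketch below. It assumes the routines live in a module named device and that device_associate is called as (x, x_d), following the order in which the text lists the arguments. The subroutine name is hypothetical, and cleanup is left to the caller.

```fortran
! Hypothetical helper, not part of Neko: allocate device memory for an
! integer array and associate the two.
subroutine associate_sketch(x, x_d)
  use, intrinsic :: iso_c_binding, only: c_ptr, c_null_ptr, c_size_t
  use device, only: device_alloc, device_associate   ! assumed module name
  implicit none
  integer, intent(inout), target :: x(:)   ! host array
  type(c_ptr), intent(out) :: x_d          ! device pointer, set here
  integer(c_size_t) :: nbytes

  x_d = c_null_ptr

  ! Size in bytes of the host array (storage_size returns bits per entry).
  nbytes = int(size(x), c_size_t) * int(storage_size(x) / 8, c_size_t)
  call device_alloc(x_d, nbytes)

  ! From here on, x is associated with the device memory behind x_d.
  call device_associate(x, x_d)
end subroutine associate_sketch
```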
Once associated, the device pointer can be retrieved by calling the device::device_get_ptr function, which returns the pointer associated with an array x.
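A hedged sketch of the lookup follows. It guards the call with device::device_associated, on the assumption that this routine is a function returning a logical; the text only says that it checks the status of an array. The function name get_ptr_sketch is hypothetical.

```fortran
! Hypothetical helper, not part of Neko: return the device pointer of x,
! or a null pointer if x has no association.
function get_ptr_sketch(x) result(x_d)
  use, intrinsic :: iso_c_binding, only: c_ptr, c_null_ptr
  use device, only: device_associated, device_get_ptr   ! assumed module name
  implicit none
  integer, intent(inout), target :: x(:)
  type(c_ptr) :: x_d

  x_d = c_null_ptr

  ! Assumption: device_associated returns .true. for associated arrays.
  if (device_associated(x)) then
     x_d = device_get_ptr(x)
  end if
end function get_ptr_sketch
```

Checking first avoids the fatal error that an unguarded device_get_ptr call triggers for an array without an association.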
device_get_ptr raises a fatal error unless x has been associated with a device pointer. If in doubt, use the routine device::device_associated to check the status of an array.
Since allocation followed by association is such a common operation, Neko provides a combined routine, device::device_map, which takes a Fortran array x of type integer, integer(i8), real(kind=sp) or real(kind=dp), its size n (note: the number of entries, not the size in bytes as for device_alloc) and a C pointer x_d to the (to be allocated) device memory.
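The combined routine might be used as in the sketch below. The call order (x, x_d, n) is an assumption: the text names the three arguments but does not fix their positions. The module name device and the subroutine name are likewise assumptions.

```fortran
! Hypothetical helper, not part of Neko: allocate and associate in one call.
subroutine map_sketch(x, x_d)
  use, intrinsic :: iso_c_binding, only: c_ptr, c_null_ptr
  use device, only: device_map   ! assumed module name
  implicit none
  integer, intent(inout), target :: x(:)   ! host array
  type(c_ptr), intent(out) :: x_d          ! device pointer, allocated here

  ! Start from a null pointer since the memory is yet to be allocated.
  x_d = c_null_ptr

  ! n is the number of entries, not a size in bytes.
  ! Assumed argument order: array, device pointer, entry count.
  call device_map(x, x_d, size(x))
end subroutine map_sketch
```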
To copy data between host and device (and device to device), use the routine device::device_memcpy, which takes a Fortran array x of type integer, integer(i8), real(kind=sp) or real(kind=dp), its size n and the direction as the third argument, which can be either HOST_TO_DEVICE to copy data to the device, or DEVICE_TO_HOST to retrieve data from the device. The fourth boolean argument, sync, controls whether the transfer is synchronous (.true.) or asynchronous (.false.).
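A sketch of a round trip between host and device, under several assumptions: the routine and the direction constants are exported by a module named device, the device pointer x_d is passed right after the host array (as in device_associate), and sync can be given as a keyword argument. The text does not spell out the full argument list, so check the signature against device.F90 before relying on this.

```fortran
! Hypothetical helper, not part of Neko: copy x to the device and back.
subroutine memcpy_sketch(x, x_d)
  use, intrinsic :: iso_c_binding, only: c_ptr
  use device, only: device_memcpy, HOST_TO_DEVICE, DEVICE_TO_HOST  ! assumed
  implicit none
  integer, intent(inout), target :: x(:)   ! host array, already mapped
  type(c_ptr), intent(inout) :: x_d        ! its device pointer
  integer :: n

  n = size(x)   ! number of entries, not bytes

  ! Asynchronous copy to the device (assumed argument order).
  call device_memcpy(x, x_d, n, HOST_TO_DEVICE, sync = .false.)

  ! ... device work on x_d ...

  ! Synchronous copy back, so x is safe to read once the call returns.
  call device_memcpy(x, x_d, n, DEVICE_TO_HOST, sync = .true.)
end subroutine memcpy_sketch
```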
For copies between device buffers, use the direction DEVICE_TO_DEVICE. Set sync to .true. if synchronous transfers are needed.
To offload work to a device, most routines in Neko have a device version named device_<name of routine>. These routines take the same arguments as the host equivalent, but one must pass device pointers instead of Fortran arrays.
For example, we call the math::add2 routine to add two arrays together on the host.
To offload the computation to the device, one must obtain the device pointers of x and y, and instead call device_math::device_add2.
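The host and device calls can be put side by side as in the sketch below. It assumes the modules are named math, device_math and device, that add2 is called as (x, y, n) on arrays of Neko's working precision (taken here to be kind rp from a module num_types, which the text does not mention), and that x and y have already been mapped and copied to the device. Both subroutine names are hypothetical.

```fortran
! Hypothetical helper, not part of Neko: add y to x on the host.
subroutine add2_host_sketch(x, y, n)
  use num_types, only: rp      ! assumed working-precision kind
  use math, only: add2         ! assumed module name
  implicit none
  integer, intent(in) :: n
  real(kind=rp), intent(inout) :: x(n), y(n)

  call add2(x, y, n)
end subroutine add2_host_sketch

! Hypothetical helper, not part of Neko: the same operation on the device.
subroutine add2_device_sketch(x, y, n)
  use, intrinsic :: iso_c_binding, only: c_ptr
  use num_types, only: rp              ! assumed working-precision kind
  use device, only: device_get_ptr     ! assumed module name
  use device_math, only: device_add2   ! assumed module name
  implicit none
  integer, intent(in) :: n
  real(kind=rp), intent(inout), target :: x(n), y(n)
  type(c_ptr) :: x_d, y_d

  ! Look up the device pointers associated with the host arrays.
  x_d = device_get_ptr(x)
  y_d = device_get_ptr(y)

  ! Same arguments as on the host, with device pointers instead of arrays.
  call device_add2(x_d, y_d, n)
end subroutine add2_device_sketch
```

The result of the device call stays in device memory until it is copied back to the host.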
The device_get_ptr call can often be omitted.
However, for type-bound procedures, such as computing the matrix-vector product derived from ax_t, one should always call the same type-bound procedure (in this case compute) as on the host. This is because the derived type either contains the logic needed to always select the fastest backend, or is instantiated as a backend-specific type during initialisation (see, for example, ax_helm_fctry.f90).