Neko 1.99.1
A portable framework for high-order spectral element flow simulations
Neko has a device abstraction layer (device.F90) to manage device memory, data transfer and kernel invocations directly from a familiar Fortran interface, targeting all supported accelerator backends.
Device memory can be allocated with the low-level device::device_alloc routine, which takes a C pointer and the requested size (in bytes) as arguments. If the allocation succeeds, the pointer refers to the allocated device memory.
To deallocate device memory, call device::device_free with the same pointer.
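The allocation and deallocation calls described above can be sketched as follows. This is a minimal sketch, not code taken from Neko: it assumes that the routines are exported by a module named device, that the kind dp comes from Neko's num_types module, that the size argument is of kind c_size_t, and that the subroutine is called after Neko and its device backend have been initialised. The module and subroutine names (device_alloc_sketch, alloc_and_free) are hypothetical.

```fortran
module device_alloc_sketch
  use, intrinsic :: iso_c_binding, only : c_ptr, c_null_ptr, c_size_t
  use num_types, only : dp               ! assumed: Neko module providing dp
  use device, only : device_alloc, device_free
  implicit none
contains

  !> Allocate and release room for n double precision values on the device.
  subroutine alloc_and_free(n)
    integer, intent(in) :: n
    type(c_ptr) :: x_d
    integer(c_size_t) :: nbytes

    x_d = c_null_ptr

    ! device_alloc expects the size in bytes, not the number of entries.
    ! storage_size returns bits, hence the division by 8.
    nbytes = int(n, c_size_t) * int(storage_size(1.0_dp) / 8, c_size_t)

    call device_alloc(x_d, nbytes)

    ! ... x_d can now be passed to device routines ...

    call device_free(x_d)
  end subroutine alloc_and_free

end module device_alloc_sketch
```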
It is often helpful to associate a Fortran array with an array allocated on the device. This can be done via the device::device_associate routine, which takes a Fortran array x of type integer, integer(i8), real(kind=sp) or real(kind=dp) (of maximum rank four) and a device pointer x_d as arguments.
Once associated, the device pointer can be retrieved by calling the device::device_get_ptr function, which returns the pointer associated with an array x.
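A short sketch of associating a host array with device memory and looking the pointer up again is given here. It assumes the calling forms device_associate(x, x_d) and x_d = device_get_ptr(x) implied by the description above, a size argument of kind c_size_t for device_alloc, and that the host array carries the target attribute. The module and subroutine names are hypothetical.

```fortran
module device_associate_sketch
  use, intrinsic :: iso_c_binding, only : c_ptr, c_null_ptr, c_size_t
  use num_types, only : dp               ! assumed: Neko module providing dp
  use device, only : device_alloc, device_associate, device_get_ptr
  implicit none
contains

  !> Associate the host array x with freshly allocated device memory.
  subroutine associate_and_lookup(x)
    real(kind=dp), intent(inout), target :: x(:)
    type(c_ptr) :: x_d, x_lookup_d
    integer(c_size_t) :: nbytes

    x_d = c_null_ptr
    nbytes = int(size(x), c_size_t) * int(storage_size(x) / 8, c_size_t)

    call device_alloc(x_d, nbytes)      ! size in bytes
    call device_associate(x, x_d)       ! record the pair (x, x_d)

    ! Any later code that only has access to x can recover the pointer
    x_lookup_d = device_get_ptr(x)

    ! Freeing the device memory is left to the caller in this sketch
  end subroutine associate_and_lookup

end module device_associate_sketch
```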
device_get_ptr raises a fatal error unless x has been associated with a device pointer. If in doubt, use the routine device::device_associated to check the status of an array.
Since allocation followed by association is such a common operation, Neko provides a combined routine device::device_map, which takes a Fortran array x of type integer, integer(i8), real(kind=sp) or real(kind=dp), its size n (note: the number of entries, not the size in bytes as for device_alloc) and a C pointer x_d to the device memory that is to be allocated.
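The check and the combined routine can be used together, as in the sketch below. It assumes that device_associated is a logical function of the host array and that device_map is called as device_map(x, x_d, n); the argument order is an assumption, as the text above only lists the arguments. The module and subroutine names are hypothetical.

```fortran
module device_map_sketch
  use, intrinsic :: iso_c_binding, only : c_ptr
  use num_types, only : dp               ! assumed: Neko module providing dp
  use device, only : device_associated, device_get_ptr, device_map
  implicit none
contains

  !> Return a device pointer for x, mapping the array only if needed.
  subroutine ensure_mapped(x, x_d)
    real(kind=dp), intent(inout), target :: x(:)
    type(c_ptr), intent(inout) :: x_d

    if (device_associated(x)) then
       ! Safe: x is known to have an associated device pointer
       x_d = device_get_ptr(x)
    else
       ! Allocate device memory and associate it with x in one call.
       ! The size is the number of entries, not the number of bytes.
       call device_map(x, x_d, size(x))
    end if
  end subroutine ensure_mapped

end module device_map_sketch
```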
To copy data between host and device, in either direction, use the routine device::device_memcpy. It takes a Fortran array x of type integer, integer(i8), real(kind=sp) or real(kind=dp), its size n, and the transfer direction, which is either HOST_TO_DEVICE to copy data to the device or DEVICE_TO_HOST to retrieve data from the device. A further logical argument, sync, controls whether the transfer is synchronous (.true.) or asynchronous (.false.).
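A round trip between host and device might look like the sketch below. The calling form device_memcpy(x, x_d, n, direction, sync), in which the device pointer x_d is passed right after the host array, is an assumption and is not spelled out in the text above; check it against device.F90 before use. The module and subroutine names are hypothetical, and x is assumed to be mapped to x_d already.

```fortran
module device_memcpy_sketch
  use, intrinsic :: iso_c_binding, only : c_ptr
  use num_types, only : dp               ! assumed: Neko module providing dp
  use device, only : device_memcpy, HOST_TO_DEVICE, DEVICE_TO_HOST
  implicit none
contains

  !> Copy x to the device, then bring the (possibly updated) data back.
  subroutine roundtrip(x, x_d)
    real(kind=dp), intent(inout), target :: x(:)
    type(c_ptr), intent(inout) :: x_d
    integer :: n

    n = size(x)                          ! number of entries, not bytes

    ! Asynchronous upload: the call may return before the copy completes
    call device_memcpy(x, x_d, n, HOST_TO_DEVICE, .false.)

    ! ... device routines operating on x_d would be called here ...

    ! Synchronous download: x is valid on the host once the call returns
    call device_memcpy(x, x_d, n, DEVICE_TO_HOST, .true.)
  end subroutine roundtrip

end module device_memcpy_sketch
```

Requesting a synchronous transfer for the download is the conservative choice when the host reads x immediately afterwards.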
DEVICE_TO_DEVICE. sync must be .true. if synchronous transfers are needed.
To offload work to a device, most routines in Neko have a device version whose name is the host routine's name prefixed with device_ (that is, device_<name of routine>). These routines take the same arguments as their host equivalents, except that device pointers are passed instead of Fortran arrays.
For example, on the host the math::add2 routine is called to add two arrays together.
To offload the computation to the device, one must obtain the device pointers of x and y, and instead call device_math::device_add2.
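The host call and its device counterpart can be put side by side as in the following sketch. It assumes that add2(x, y, n) computes x = x + y in the working precision rp from num_types, that device_add2 takes the arguments (x_d, y_d, n), and that x and y have already been mapped and their device copies are up to date. The module and subroutine names are hypothetical.

```fortran
module add2_offload_sketch
  use, intrinsic :: iso_c_binding, only : c_ptr
  use num_types, only : rp               ! assumed: Neko working precision
  use math, only : add2
  use device_math, only : device_add2
  use device, only : device_get_ptr
  implicit none
contains

  subroutine add_on_host_and_device(x, y)
    real(kind=rp), intent(inout), target :: x(:)
    real(kind=rp), intent(inout), target :: y(:)
    type(c_ptr) :: x_d, y_d
    integer :: n

    n = size(x)

    ! Host version: operates directly on the Fortran arrays
    call add2(x, y, n)

    ! Device version: same arguments, but device pointers replace arrays
    x_d = device_get_ptr(x)
    y_d = device_get_ptr(y)
    call device_add2(x_d, y_d, n)

    ! The device result stays on the device until it is copied back
  end subroutine add_on_host_and_device

end module add2_offload_sketch
```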
device_get_ptr call can often be omitted.
However, for type-bound procedures, such as computing the matrix-vector product with a type derived from ax_t, one should always call the same type-bound procedure (in this case compute) as on the host. This is because the derived types contain all the logic needed to select the fastest backend, or are instantiated as a backend-specific type during initialisation (see for example ax_helm_fctry.f90).
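The idea of instantiating a backend-specific type during initialisation can be illustrated with a self-contained sketch that does not depend on Neko. All names in it (op_t, op_cpu_t, op_device_t, op_factory) are hypothetical stand-ins and are not Neko's actual types or factory; the "device" variant simply runs on the host so that the example compiles anywhere.

```fortran
module backend_sketch
  implicit none
  private
  public :: op_t, op_factory

  !> Abstract operator, a hypothetical stand-in for a type such as ax_t
  type, abstract :: op_t
   contains
     procedure(op_compute), deferred :: compute
  end type op_t

  abstract interface
     subroutine op_compute(this, w, u)
       import :: op_t
       class(op_t), intent(in) :: this
       real, intent(out) :: w(:)
       real, intent(in) :: u(:)
     end subroutine op_compute
  end interface

  type, extends(op_t) :: op_cpu_t
   contains
     procedure :: compute => op_cpu_compute
  end type op_cpu_t

  type, extends(op_t) :: op_device_t
   contains
     procedure :: compute => op_device_compute
  end type op_device_t

contains

  !> Factory: choose the backend-specific type once, at initialisation
  subroutine op_factory(op, use_device)
    class(op_t), allocatable, intent(out) :: op
    logical, intent(in) :: use_device
    if (use_device) then
       allocate(op_device_t :: op)
    else
       allocate(op_cpu_t :: op)
    end if
  end subroutine op_factory

  subroutine op_cpu_compute(this, w, u)
    class(op_cpu_t), intent(in) :: this
    real, intent(out) :: w(:)
    real, intent(in) :: u(:)
    w = 2.0 * u
  end subroutine op_cpu_compute

  subroutine op_device_compute(this, w, u)
    class(op_device_t), intent(in) :: this
    real, intent(out) :: w(:)
    real, intent(in) :: u(:)
    ! A real device backend would launch a kernel here instead
    w = 2.0 * u
  end subroutine op_device_compute

end module backend_sketch

program backend_demo
  use backend_sketch
  implicit none
  class(op_t), allocatable :: op
  real :: u(3), w(3)

  u = [1.0, 2.0, 3.0]
  call op_factory(op, use_device=.false.)

  ! The call site is identical whichever backend the factory selected
  call op%compute(w, u)
  print *, w
end program backend_demo
```

Because the backend is fixed when the object is created, the calling code never needs a device_ prefixed variant of compute.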