Accelerator data partitioning

When the data partitioning scheme is complex and requires significant computing resources, it can be more efficient to generate the data transfer lists on the accelerators. This is especially useful when the host's computing resources are needed for other work, or when the host does not have enough computing resources to compute the data transfer lists for all of its work blocks.

Accelerator data partition APIs

Accelerated library developers must provide the alf_accel_input_dtl_prepare subroutine and the alf_accel_output_dtl_prepare subroutine to partition the input and output data and to generate the corresponding data transfer lists. alf_accel_input_dtl_prepare is the input data partitioning subroutine, and alf_accel_output_dtl_prepare is the output data partitioning subroutine.
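
The following is a minimal sketch of what these two accelerator-side subroutines might look like, using the ALF data transfer list macros (ALF_ACCEL_DTL_BEGIN, ALF_ACCEL_DTL_ENTRY_ADD, ALF_ACCEL_DTL_END). The parameter structure my_parm_t, its member names, and the single-entry transfer lists are assumptions made for this illustration only; they are not defined by this document.

#include <alf_accel.h>

/* Hypothetical per-work-block parameter layout (see the next section):
   it carries the host addresses and the size of this block.           */
typedef struct {
    unsigned long long host_addr_in;   /* host address of the input data  */
    unsigned long long host_addr_out;  /* host address of the output data */
    unsigned int size;                 /* number of float elements        */
} my_parm_t;

/* Input data partitioning subroutine: builds the data transfer list that
   moves this work block's input data from host memory to the accelerator. */
int alf_accel_input_dtl_prepare(void *p_task_context, void *p_parm_context,
                                void *p_dtl, unsigned int current_count,
                                unsigned int total_count)
{
    my_parm_t *parm = (my_parm_t *) p_parm_context;

    ALF_ACCEL_DTL_BEGIN(p_dtl, ALF_BUF_IN, 0);
    ALF_ACCEL_DTL_ENTRY_ADD(p_dtl, parm->size, ALF_DATA_FLOAT,
                            parm->host_addr_in);
    ALF_ACCEL_DTL_END(p_dtl);
    return 0;
}

/* Output data partitioning subroutine: builds the data transfer list that
   moves this work block's results from the accelerator back to host memory. */
int alf_accel_output_dtl_prepare(void *p_task_context, void *p_parm_context,
                                 void *p_dtl, unsigned int current_count,
                                 unsigned int total_count)
{
    my_parm_t *parm = (my_parm_t *) p_parm_context;

    ALF_ACCEL_DTL_BEGIN(p_dtl, ALF_BUF_OUT, 0);
    ALF_ACCEL_DTL_ENTRY_ADD(p_dtl, parm->size, ALF_DATA_FLOAT,
                            parm->host_addr_out);
    ALF_ACCEL_DTL_END(p_dtl);
    return 0;
}

Error handling and the use of current_count and total_count (for example, to split one work block into several partitions) are omitted to keep the sketch short.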

Host memory addresses

Because the host does not generate the data transfer lists when accelerator data partitioning is used, the host memory addresses of the input and output data buffers must be explicitly passed to the accelerator through the work block parameter and context buffer.
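
As an illustration of this, the host side might pack the addresses of its buffers into the work block parameter and context buffer as sketched below. The structure layout must match what the accelerator-side DTL prepare subroutines expect (the same hypothetical my_parm_t shown above); the function name enqueue_block, the block size handling, and the use of ALF_DATA_BYTE to pack the structure are assumptions for this sketch, not part of this document.

#include <stdint.h>
#include <alf.h>

/* Must match the layout assumed by the accelerator-side
   DTL prepare subroutines (hypothetical layout).         */
typedef struct {
    unsigned long long host_addr_in;   /* host address of the input data  */
    unsigned long long host_addr_out;  /* host address of the output data */
    unsigned int size;                 /* number of float elements        */
} my_parm_t;

/* Creates one work block covering 'size' floats, explicitly passing the
   host buffer addresses to the accelerator in the parameter buffer.     */
int enqueue_block(alf_task_handle_t task, float *in, float *out,
                  unsigned int size)
{
    alf_wb_handle_t wb;
    my_parm_t parm;

    /* Record the host memory addresses as 64-bit values. */
    parm.host_addr_in  = (unsigned long long) (uintptr_t) in;
    parm.host_addr_out = (unsigned long long) (uintptr_t) out;
    parm.size          = size;

    alf_wb_create(task, ALF_WB_SINGLE, 0, &wb);
    /* Copy the parameters, including the host addresses, into the
       work block parameter and context buffer.                     */
    alf_wb_parm_add(wb, &parm, sizeof(parm), ALF_DATA_BYTE, 0);
    alf_wb_enqueue(wb);
    return 0;
}

Return codes from the alf_wb_* calls are ignored here for brevity; a real accelerated library should check them.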

For an example, see Matrix add - accelerator data partitioning example.