Data transfer list

Data transfer lists describe a work block's input and output data. You can generate the data transfer lists for the work blocks of each task on either the host or the accelerator.

For many applications, the input data for a single compute kernel cannot be stored contiguously in host memory. For example, a multi-dimensional matrix is usually partitioned into smaller submatrices for the accelerators to process. Under many data partitioning schemes, the data of a submatrix is scattered across different host memory locations. Accelerator memory is usually limited, and the most efficient way to store a submatrix there is contiguously: the data for each row or column of the submatrix is packed together in a contiguous buffer. Input data is gathered into the local memory of the accelerator from scattered host memory locations. For output data, the situation is reversed: the data in the local memory of the accelerator is scattered to different locations in host memory.

The ALF API uses a data transfer list to represent the scattered input and output data in host memory. A data transfer list contains entries, each consisting of a data size and a pointer to the host memory location of the data. The data in the local memory of the accelerator is always packed and is organized in the order of the entries in the list. For input data, the data transfer list describes a gathering operation. For output data, the data transfer list describes a scattering operation. See Figure 1 for a diagram of a data transfer list.
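The gather and scatter semantics described above can be sketched in plain C. This is a minimal illustration, not ALF code: the dtl_entry type and the dtl_gather and dtl_scatter functions are hypothetical names introduced here, and memcpy stands in for whatever transfer mechanism an actual implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry layout for illustration only; not ALF's actual type. */
typedef struct {
    void  *host_addr;  /* host memory location of this piece of data */
    size_t size;       /* number of bytes stored at that location */
} dtl_entry;

/* Gather (input): copy each entry's host data into the packed local
 * buffer, in list order. Returns the number of bytes packed. */
size_t dtl_gather(const dtl_entry *list, size_t n,
                  void *local, size_t local_cap)
{
    size_t off = 0;
    assert(list != NULL && local != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(off + list[i].size <= local_cap);  /* must fit locally */
        memcpy((char *)local + off, list[i].host_addr, list[i].size);
        off += list[i].size;
    }
    return off;
}

/* Scatter (output): copy consecutive pieces of the packed local buffer
 * back to each entry's host location. Returns the number of bytes written. */
size_t dtl_scatter(const dtl_entry *list, size_t n,
                   const void *local, size_t local_len)
{
    size_t off = 0;
    assert(list != NULL && local != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(off + list[i].size <= local_len);  /* stay inside the buffer */
        memcpy(list[i].host_addr, (const char *)local + off, list[i].size);
        off += list[i].size;
    }
    return off;
}
```

For a 2x2 submatrix of a 4x4 row-major host matrix, the list would hold two entries, one per submatrix row, each pointing at the first element of that row and covering two elements. Gathering packs the four elements into one contiguous local buffer, and scattering with the same list writes them back to their original positions.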

Figure 1. Data transfer list. This figure shows how a data transfer list transfers data from accelerator memory to host memory and vice versa.

To maximize accelerator performance, ALF employs a static memory allocation model per task execution on the accelerator. This means programmers must explicitly specify the maximum number of entries that a data transfer list in a task can have. Set this value by calling the alf_task_desc_set_int32 function with the ALF_TASK_DESC_NUM_DTL_ENTRIES field.
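To illustrate why a static allocation model needs a declared maximum, the following sketch reserves storage for a fixed number of entries up front and rejects any entry beyond that limit. It does not call ALF: MAX_DTL_ENTRIES, fixed_dtl, fixed_dtl_entry, fixed_dtl_init, and fixed_dtl_add are hypothetical names, and the constant stands in for the value a task would declare.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap standing in for the maximum a task would declare. */
#define MAX_DTL_ENTRIES 8

/* Hypothetical entry and list types for illustration; not ALF's types. */
typedef struct {
    void  *host_addr;  /* host memory location of the data */
    size_t size;       /* size of the data in bytes */
} fixed_dtl_entry;

typedef struct {
    fixed_dtl_entry entries[MAX_DTL_ENTRIES];  /* storage reserved up front */
    size_t          count;                     /* entries currently in use */
} fixed_dtl;

/* Start with an empty list; no memory is allocated at run time. */
void fixed_dtl_init(fixed_dtl *l)
{
    assert(l != NULL);
    l->count = 0;
}

/* Append an entry. Returns 0 on success, or -1 if the list is already
 * at its declared maximum. */
int fixed_dtl_add(fixed_dtl *l, void *host_addr, size_t size)
{
    assert(l != NULL);
    if (l->count >= MAX_DTL_ENTRIES)
        return -1;
    l->entries[l->count].host_addr = host_addr;
    l->entries[l->count].size = size;
    l->count++;
    return 0;
}
```

Because the capacity is fixed before any work block runs, the list never grows at run time. The trade-off is that the declared maximum must cover the largest list any work block in the task will need.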

For information about data transfer list limitations for Cell BE implementations, see Data transfer list limitations.