Application partitioning

Programs running on the Cell Broadband Engine typically partition their work among its nine processor elements: one PowerPC Processor Element (PPE) and eight Synergistic Processor Elements (SPEs).

In determining when and how to distribute the workload and data, take into account the following considerations:

processing-load distribution,
program structure,
program data flow and data access patterns,
cost, in time and complexity, of code movement and data movement among processors, and
cost of loading the bus and bus attachments.

The main model for partitioning an application is PPE-centric, as shown in Figure 1.

Figure 1. Application partitioning model

In the PPE-centric model, the main application runs on the PPE, and individual tasks are off-loaded to the SPEs. The PPE then waits for, and coordinates, the results returning from the SPEs. This model fits an application with serial data and parallel computation.
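The off-load-and-wait pattern described above can be sketched on an ordinary host machine. The following is a minimal illustration, not Cell SDK code: a Python thread pool stands in for the SPEs, and the names `spe_task` and `ppe_main` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SPES = 8  # the Cell Broadband Engine has one PPE and eight SPEs


def spe_task(block):
    """Hypothetical off-loaded task: sum of squares over one data block."""
    return sum(x * x for x in block)


def ppe_main(blocks):
    """The 'PPE' off-loads one task per block, then waits for and combines the results."""
    with ThreadPoolExecutor(max_workers=NUM_SPES) as spes:
        futures = [spes.submit(spe_task, block) for block in blocks]
        # The main program blocks here until each off-loaded task returns.
        partials = [future.result() for future in futures]
    return sum(partials)


blocks = [[1, 2], [3, 4], [5, 6]]
total = ppe_main(blocks)
```

The main program owns the control flow and the data; the workers only compute. That division of responsibility is what makes the model PPE-centric.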

In the alternative SPE-centric model, most of the application code is distributed among the SPEs. The PPE acts as a centralized resource manager for the SPEs. When an SPE completes its current work item, it fetches the next one from main storage (or from its own local store).
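The self-scheduling behavior of the SPE-centric model can be sketched as a set of workers that pull from a shared work queue. This is an illustration only, under the assumption that a thread-safe queue stands in for the work list in main storage; `spe_worker` and `run_spe_centric` are hypothetical names, and doubling a number stands in for real work.

```python
import queue
import threading


def spe_worker(work_items, results, lock):
    """Each 'SPE' fetches its own next work item until none remain."""
    while True:
        try:
            item = work_items.get_nowait()
        except queue.Empty:
            return  # no work left; this worker stops
        value = item * 2  # placeholder for the real computation
        with lock:
            results.append(value)


def run_spe_centric(items, num_spes=8):
    """The 'PPE' only sets up resources; the workers drive the processing."""
    work_items = queue.Queue()
    for item in items:
        work_items.put(item)  # all work is queued before any worker starts
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=spe_worker, args=(work_items, results, lock))
        for _ in range(num_spes)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sorted(results)  # completion order is not deterministic, so sort


doubled = run_spe_centric(range(10))
```

Because each worker takes a new item as soon as it is free, faster workers naturally process more items, and no central scheduler has to assign work.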

There are three ways in which the SPEs can be used in the PPE-centric model:

the multistage pipeline model,
the parallel stages model, and
the services model.

The first two of these are shown in Figure 2.

If a task requires sequential stages, the SPEs can act as a multistage pipeline. The left side of Figure 2 shows a multistage pipeline. Here, the stream of data is sent into the first SPE, which performs the first stage of the processing. The first SPE then passes the data to the next SPE for the next stage of processing. After the last SPE has done the final stage of processing on its data, that data is returned to the PPE. As in any pipeline architecture, the stages run in parallel, each working on a different portion of the data.

Multistage pipelining is typically avoided because of the difficulty of load balancing. In addition, the multistage model increases the data-movement requirement because data must be moved for each stage of the pipeline.
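A multistage pipeline can be sketched as a chain of workers connected by queues. This is a minimal host-side illustration, assuming one thread per stage in place of one SPE per stage; `stage`, `run_pipeline`, and the three lambda stages are hypothetical. Data items must not be `None`, which is reserved here as the end-of-stream marker.

```python
import queue
import threading

SENTINEL = None  # marks the end of the data stream


def stage(func, inbox, outbox):
    """One pipeline stage: take an item, process it, pass it downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream to the next stage
            return
        outbox.put(func(item))


def run_pipeline(data, stage_funcs):
    """Feed data into the first stage and collect results from the last."""
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stage_funcs)
    ]
    for thread in threads:
        thread.start()
    for item in data:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


pipeline_out = run_pipeline(
    [1, 2, 3],
    [lambda x: x + 1, lambda x: x * 10, lambda x: x - 3],
)
```

Every item crosses a queue between each pair of stages, which mirrors the extra data movement noted above. Throughput is also bounded by the slowest stage, which is the load-balancing difficulty in miniature.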

Figure 2. PPE-centric multistage pipeline model and parallel stages model


If the task is not a multistage task, but one in which a large amount of data can be partitioned and processed at the same time, then it typically makes sense to use the SPEs to process different portions of that data in parallel. This parallel stages model is shown on the right side of Figure 2.
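The parallel stages model amounts to partitioning the data, processing every partition with the same code, and reassembling the results. The sketch below assumes a thread pool in place of the SPEs; `partition`, `process_slice`, and `run_parallel_stages` are hypothetical names, and squaring each element stands in for the real computation.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split data into at most `parts` contiguous, nearly equal slices."""
    size, extra = divmod(len(data), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when data is shorter than `parts`
            slices.append(data[start:end])
        start = end
    return slices


def process_slice(chunk):
    """Placeholder per-partition work: square every element."""
    return [x * x for x in chunk]


def run_parallel_stages(data, num_spes=8):
    """Process all partitions concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=num_spes) as spes:
        processed = list(spes.map(process_slice, partition(data, num_spes)))
    return [value for chunk in processed for value in chunk]


squares = run_parallel_stages(list(range(10)))
```

Nearly equal slices keep the load balanced when every element costs about the same to process; data with uneven per-element cost would call for a different partitioning rule.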

The third way in which SPEs can be used in a PPE-centric model is the services model. In the services model, the PPE assigns different services to different SPEs, and the PPE’s main process calls upon the appropriate SPE when a particular service is needed.

Figure 3 shows the PPE-centric services model. Here, one SPE handles data encryption, another SPE handles MPEG encoding, and a third SPE handles curve analysis. Static allocation of SPU services should be avoided. Instead, these services should be virtualized and managed on a demand-initiated basis.
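One way to picture demand-initiated service management is a registry of services that are bound to whichever worker is free at the time of the call, rather than to a fixed worker. The sketch below is an assumption-laden illustration, not Cell SDK code: `ServiceManager` is a hypothetical class, a thread pool stands in for the SPEs, and the two toy services (an XOR transform and a trapezoidal area calculation) merely stand in for encryption and curve analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def encrypt(data):
    """Toy XOR transform standing in for a real encryption service."""
    return bytes(b ^ 0x5A for b in data)


def curve_area(points):
    """Toy curve analysis: trapezoidal area under a list of (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


SERVICES = {"encrypt": encrypt, "curve_analysis": curve_area}


class ServiceManager:
    """Dispatches each request to any free worker; no worker owns a service."""

    def __init__(self, num_spes=8):
        self._pool = ThreadPoolExecutor(max_workers=num_spes)

    def call(self, name, payload):
        """Start the named service on demand and return a future for its result."""
        return self._pool.submit(SERVICES[name], payload)

    def shutdown(self):
        self._pool.shutdown()


manager = ServiceManager()
cipher = manager.call("encrypt", b"abc").result()
area = manager.call("curve_analysis", [(0, 0), (1, 1), (2, 0)]).result()
manager.shutdown()
```

With this arrangement, a burst of requests for one service can use every free worker, whereas a static assignment would leave the workers reserved for idle services unused.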

Figure 3. PPE-centric services model

For a more detailed view of programming models, see Programming models.