Programming the SPEs

The eight identical Synergistic Processor Elements (SPEs) are optimized for compute-intensive applications in which a program's data and instruction needs can be anticipated and transferred into the local store (LS) by DMA while the SPE computes using previously transferred data and instructions.
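The overlap of DMA and computation described above is usually implemented with double buffering: while the SPE computes on one local-store buffer, the transfer for the next block fills the other. The sketch below shows only that control flow, under clear assumptions. `fake_dma_get` is a hypothetical, synchronous `memcpy` stand-in for an asynchronous DMA get, and `BLOCK_FLOATS` and the sum-of-squares kernel are illustrative choices, not taken from the handbook.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block size for illustration; a real SPE program chooses a
   size that fits in the 256 KB local store alongside code and stack. */
#define BLOCK_FLOATS 64

/* Hypothetical stand-in for an asynchronous DMA "get". Here it is a plain,
   synchronous memcpy, so the sketch demonstrates control flow only. */
static void fake_dma_get(float *ls_buf, const float *main_mem, size_t n)
{
    memcpy(ls_buf, main_mem, n * sizeof(float));
}

/* Compute step on a block that is already resident in "local store". */
static float process_block(const float *buf, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += buf[i] * buf[i];
    return sum;
}

/* Sum of squares over `total` floats (assumed to be a multiple of
   BLOCK_FLOATS), alternating between two local buffers. */
float sum_squares_double_buffered(const float *main_mem, size_t total)
{
    static float ls[2][BLOCK_FLOATS];   /* the two local-store buffers */
    size_t nblocks = total / BLOCK_FLOATS;
    float acc = 0.0f;
    int cur = 0;

    if (nblocks == 0)
        return 0.0f;

    /* Prime the first buffer before entering the loop. */
    fake_dma_get(ls[cur], main_mem, BLOCK_FLOATS);
    for (size_t b = 0; b < nblocks; b++) {
        int next = cur ^ 1;
        /* Start fetching block b+1 into the other buffer ... */
        if (b + 1 < nblocks)
            fake_dma_get(ls[next], main_mem + (b + 1) * BLOCK_FLOATS,
                         BLOCK_FLOATS);
        /* ... while computing on block b. */
        acc += process_block(ls[cur], BLOCK_FLOATS);
        cur = next;
    }
    return acc;
}
```

On real hardware the benefit comes from the transfer running in the MFC while the SPU executes `process_block`; with a synchronous copy, as here, there is no actual overlap.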

The streaming data sets in 3D graphics, media, and broadband communications are examples of applications that run well on SPEs. However, the SPEs are not optimized for running programs that have significant branching, such as an operating system. Each SPE supports only a single program context at any one time. Typically, the operating system runs on the PPE, and user-mode threads execute on the SPEs.

The SPEs achieve high performance, in part, by eliminating the overhead of load and store address translation, hardware-managed caches, out-of-order instruction issue, and branch prediction. Instead, the SPEs capitalize on the high computational efficiencies that can be obtained for streaming-data applications by providing a large (128-entry by 128-bit) unified register file, dual-instruction issue, and high DMA bandwidth between the LS and main storage.
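The register-file figures above imply a few useful quantities. 128 registers of 16 bytes each give 2 KB of register storage, and one 128-bit register holds 16 bytes, 8 halfwords, 4 words or single-precision floats, or 2 doublewords or doubles. The helper names below are hypothetical. The sketch only makes that arithmetic explicit.

```c
/* Figures from the text: 128 registers, each 128 bits wide. */
enum { SPU_NUM_REGS = 128, SPU_REG_BITS = 128 };

/* Number of SIMD lanes one 128-bit register holds for a given element
   width in bits (8, 16, 32, or 64). */
int lanes_per_register(int element_bits)
{
    return SPU_REG_BITS / element_bits;
}

/* Total capacity of the unified register file, in bytes. */
int register_file_bytes(void)
{
    return SPU_NUM_REGS * (SPU_REG_BITS / 8);
}
```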

Each SPE supports the single-instruction, multiple-data (SIMD) instruction architecture described in the SPU Instruction Set Architecture. Although details of this instruction set are given in the sections that follow, an SPE is normally programmed in a high-level language such as C or C++. The SPU instruction set is supported by a rich set of C/C++ language extensions, described in the C/C++ Language Extensions for Cell Broadband Engine Architecture specification. These extensions define SIMD data types and intrinsics (function-call-style operations) that map to one or more assembly-language instructions, giving programmers direct control over code performance without writing assembly language.
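To show what a SIMD intrinsic does, the portable sketch below emulates a 128-bit `vector float` as four packed floats and implements an element-wise multiply-add and a scalar splat. On an SPE, the corresponding intrinsics are `spu_madd` and `spu_splats` (for example, `vector float d = spu_madd(a, b, c);`). The struct `vec_float4_sim` and the functions `sim_madd` and `sim_splats` are hypothetical host-side stand-ins, not part of the SPU language extensions, and rounding behavior may differ from SPU single-precision hardware.

```c
/* Hypothetical portable stand-in for the SPU `vector float` type:
   four 32-bit floats packed into 128 bits. */
typedef struct {
    float f[4];
} vec_float4_sim;

/* Element-wise a*b + c across all four lanes: the operation spu_madd
   performs on vector float operands. */
vec_float4_sim sim_madd(vec_float4_sim a, vec_float4_sim b, vec_float4_sim c)
{
    vec_float4_sim d;
    for (int i = 0; i < 4; i++)
        d.f[i] = a.f[i] * b.f[i] + c.f[i];
    return d;
}

/* Replicate one scalar into every lane, as spu_splats does. */
vec_float4_sim sim_splats(float x)
{
    vec_float4_sim v = {{x, x, x, x}};
    return v;
}
```

The point of the intrinsic form is that the four-lane loop in `sim_madd` becomes a single instruction on the SPE, with the operands held in one register each.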

SPE configuration
This section describes the main components of a Synergistic Processor Element (SPE).
SPU instruction set
The SPU Instruction Set Architecture (ISA) fully documents the instructions supported by the SPEs. This section summarizes the ISA.
SPU C/C++ language extensions (intrinsics)
A large set of SPU C/C++ language extensions (intrinsics) makes the underlying SPU Instruction Set Architecture and hardware features conveniently available to C programmers. These intrinsics can be used in place of assembly-language code when writing in the C or C++ languages.
MFC commands
The MFC supports a set of MFC commands. These commands provide the main mechanism that enables code executing in an SPU to access main storage and maintain synchronization with other processors and devices in the system.
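DMA commands issued to the MFC (for example through `mfc_get` in SPU C code) must respect size, alignment, and tag constraints. The hypothetical checker below encodes the commonly cited rules as an assumption to verify against the MFC commands section: transfer sizes of 1, 2, 4, or 8 bytes, or a multiple of 16 bytes up to 16 KB; natural alignment with matching low four address bits for the small sizes; 16-byte alignment of both the local-store address and the effective address otherwise; and tag-group IDs 0 through 31. `dma_params_ok` is not an SDK function.

```c
#include <stdbool.h>
#include <stdint.h>

#define MFC_MAX_DMA_BYTES 16384u  /* assumed limit: 16 KB per DMA command */
#define MFC_NUM_TAGS      32u     /* assumed tag groups 0..31 */

/* Hypothetical helper: true if the (ls_addr, ea, size, tag) combination
   satisfies the basic DMA constraints assumed above. */
bool dma_params_ok(uint32_t ls_addr, uint64_t ea, uint32_t size, uint32_t tag)
{
    if (tag >= MFC_NUM_TAGS)
        return false;

    if (size == 1 || size == 2 || size == 4 || size == 8) {
        /* Small transfers: naturally aligned, and the LS address and
           effective address must share the same offset within a quadword. */
        return (ls_addr % size == 0) && (ea % size == 0) &&
               ((ls_addr & 0xFu) == (ea & 0xFu));
    }

    /* Larger transfers: a nonzero multiple of 16 bytes, up to 16 KB. */
    if (size == 0 || size > MFC_MAX_DMA_BYTES || size % 16 != 0)
        return false;

    /* Both addresses quadword (16-byte) aligned. */
    return (ls_addr % 16 == 0) && (ea % 16 == 0);
}
```

A check like this can be placed in debug builds ahead of each DMA call, since a malformed command typically fails at run time rather than at compile time.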
Coding methods and examples
The sections included here describe some coding methods, with examples in SPU assembly language, C language, SPU C-language intrinsics, and MFC commands, or in a combination thereof.
Porting SIMD code from the PPE to the SPEs
Some programmers find it easier to write SIMD programs first for the PPE and then port them to the SPEs. This approach defers SPE-specific concerns, such as the local store (LS) size, data movement, and debugging, until after the port. It can also allow the work on the SPEs to be partitioned into simpler, more manageable steps.
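One way to stage such a port, sketched here as an assumption rather than the handbook's prescribed method, is to keep a scalar reference and a SIMD-shaped version of the same kernel side by side on the PPE or a host. Once their outputs match, the four-lane body can be replaced with vector types and intrinsics (for example, a VMX `vec_madd` on the PPE becoming `spu_madd` on the SPE). The SAXPY kernel and the function names `saxpy_scalar` and `saxpy_by4` are hypothetical.

```c
#include <stddef.h>

/* Scalar reference: y[i] = a * x[i] + y[i] (SAXPY). */
void saxpy_scalar(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* SIMD-shaped version: processes four floats (one 128-bit vector) per
   iteration. Plain C, so it builds and debugs anywhere; on the SPE the
   inner four-lane body would become a single vector multiply-add.
   Assumes n is a multiple of 4. */
void saxpy_by4(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        for (int lane = 0; lane < 4; lane++)
            y[i + lane] = a * x[i + lane] + y[i + lane];
    }
}
```

Comparing the two outputs on identical inputs gives a simple regression check that can be rerun after each porting step.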
Performance analysis
After a Cell Broadband Engine program executes without errors on the PPE and the SPEs, optimization through parameter-tuning can begin.
General SPE programming tips
This section contains a short summary of general tips for optimizing the performance of SPE programs.