Software tools like debuggers and profilers have proved their worth in helping write sequential programs. In parallel computing, tools are at least as important. Unfortunately, parallel software tools are newer and therefore less polished than their sequential counterparts. They must also solve some additional problems, beyond simply dealing with the new issues noted above.
- Parallel tools must handle multiple processors. For example, where a sequential debugger sets a breakpoint, a parallel debugger may need to set that breakpoint on many processors.
- Parallel tools must handle large data. For example, where a sequential trace facility may produce megabytes of data, the parallel trace may produce megabytes for each of hundreds of processors, adding up to gigabytes.
- Parallel tools must be scalable. Just as some bugs do not appear in sequential programs until they are run with massive data sets, some problems in parallel programs do not appear until they execute on thousands of processors.
- Parallel tools must avoid information overload. For example, where a sequential debugger may only need to display a single line number, its parallel counterpart may need to show line numbers on dozens of processors.
- Parallel tools must deal with timing and uncertainty. While it is rare for a sequential program’s behavior to depend on time, this is the common case for parallel programs.
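The last point, timing dependence, can be made concrete with a small sketch. The code below is an illustrative model rather than a real threaded program: it assumes two hypothetical workers that each read a shared counter and then write back the value plus one, and it enumerates every possible ordering of those four steps. The names `run_schedule` and `all_schedules` are invented for this example.

```python
from itertools import combinations

def run_schedule(schedule):
    """Run two workers' read/write steps in the given order; return the final counter."""
    shared = 0
    local = {}            # each worker's private copy of the counter
    step = {0: 0, 1: 0}   # next step per worker: 0 = read, 1 = write
    for worker in schedule:
        if step[worker] == 0:
            local[worker] = shared        # read the shared value
        else:
            shared = local[worker] + 1    # write back the incremented copy
        step[worker] += 1
    return shared

def all_schedules():
    """Every interleaving of two workers with two steps each."""
    for slots in combinations(range(4), 2):
        yield tuple(0 if i in slots else 1 for i in range(4))

outcomes = {s: run_schedule(s) for s in all_schedules()}
for schedule, result in sorted(outcomes.items()):
    print(schedule, "->", result)
```

Of the six possible interleavings, only the two in which one worker finishes before the other starts leave the counter at 2; the rest lose an update. A debugger that perturbs timing may therefore hide or expose such a bug from one run to the next.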
Current parallel tools, such as Cray Apprentice2, HPCToolkit, TAU, TotalView, and VTune, have solved some of these problems. For example, data visualization has been successful in avoiding information overload and understanding large data sets. However, other issues remain difficult research problems.
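One simple way to reduce information overload is to aggregate before displaying. The sketch below is not taken from any of the tools named above. It assumes a hypothetical snapshot that maps each process rank to the source location where it is stopped, and prints one line per distinct location instead of one line per process. The file names and the helpers `compress_ranks` and `summarize` are made up for illustration.

```python
from collections import defaultdict

def compress_ranks(ranks):
    """Turn [0, 1, 2, 5, 7, 8] into '0-2,5,7-8'."""
    ranks = sorted(ranks)
    parts = []
    start = prev = ranks[0]
    for r in ranks[1:]:
        if r == prev + 1:
            prev = r
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)

def summarize(locations):
    """Group processes by source location; return one line per distinct location."""
    groups = defaultdict(list)
    for rank, loc in locations.items():
        groups[loc].append(rank)
    return [f"{loc}: ranks {compress_ranks(r)}" for loc, r in sorted(groups.items())]

# Hypothetical snapshot: rank -> "file:line" where each process is stopped.
snapshot = {rank: "solver.c:120" for rank in range(64)}
snapshot[17] = "solver.c:88"
snapshot[40] = "comm.c:42"
snapshot[41] = "comm.c:42"

for line in summarize(snapshot):
    print(line)
```

With 64 processes, the summary has three lines rather than 64, and the outliers (rank 17, and ranks 40 and 41) stand out immediately.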
The Open Education Cup welcomes entries related to the theory and practice of parallel software tools. This includes descriptions of existing debuggers, profilers, and other tools; analysis and visualization techniques for parallel programs and data; experiences with parallel tools; and any other topic of interest to tool designers or users.
Accelerated computing
Accelerated computing is a form of parallel computing that is rapidly gaining popularity. For many years, some general-purpose computer systems have included accelerator chips to speed up specialized tasks. For example, early microprocessors did not implement floating-point operations directly, so machines in the technical market often included floating-point accelerator chips. Today, microprocessors are much more capable, but some accelerators are still useful. The list below suggests a few of them.
- Graphics Processing Units (GPUs) provide primitives commonly used in graphics rendering. Among the operations sped up by these chips are texture mapping, geometric transformations, and shading. The key to accelerating all of these operations is parallel computing, often realized by computing all pixels of a display or all objects in a list independently.
- Field Programmable Gate Arrays (FPGAs) provide many logic blocks linked by an interconnection fabric. The interconnections can be reconfigured dynamically, thus allowing the hardware datapaths to be optimized for a given application or algorithm. When the logic blocks are full processors, the FPGA can be used as a parallel computer.
- Application Specific Integrated Circuits (ASICs) are chip designs optimized for a special use. ASIC designs can now incorporate several full processors, memory, and other large components. This allows a single ASIC to be a parallel system for running a particular algorithm (or family of related algorithms).
- Digital Signal Processor (DSP) chips are microprocessors specifically designed for signal processing applications such as sensor systems. Many of these applications are very sensitive to latency, so the performance of computation and data transfer is heavily optimized. DSPs achieve this by incorporating SIMD parallelism at the instruction level and pipelining of arithmetic units.
- Cell Broadband Engine Architecture (CBEA, or simply Cell) is a relatively new architecture containing a general-purpose computer and several streamlined coprocessors on a single chip. By exploiting the MIMD parallelism of the coprocessors and overlapping memory operations with computations, the Cell can achieve impressive performance on many codes. This makes it an attractive adjunct to even very capable systems.
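The idea of overlapping memory operations with computation, mentioned for the Cell above, is often realized as double buffering: while one buffer is being processed, the next is being filled. The sketch below shows only the structure of that pattern, using a Python thread and a two-slot queue as stand-ins for a data transfer engine and local buffers. It is not Cell code, the function names are hypothetical, and Python threads will not show the speedup that real hardware overlap provides.

```python
import threading
import queue

def fetch_chunks(data, chunk_size, buffers):
    """Producer: stands in for memory transfers into local buffers."""
    for i in range(0, len(data), chunk_size):
        buffers.put(data[i:i + chunk_size])   # blocks while both slots are full
    buffers.put(None)                          # sentinel: no more data

def sum_of_squares_overlapped(data, chunk_size=4):
    """Consumer: computes on one chunk while the producer fetches the next."""
    buffers = queue.Queue(maxsize=2)           # two slots = double buffering
    producer = threading.Thread(target=fetch_chunks,
                                args=(data, chunk_size, buffers))
    producer.start()
    total = 0
    while True:
        chunk = buffers.get()
        if chunk is None:
            break
        total += sum(x * x for x in chunk)
    producer.join()
    return total

data = list(range(20))
print(sum_of_squares_overlapped(data))
```

The bounded queue is the key design choice: it lets the fetch run at most two chunks ahead of the computation, which mirrors the limited local memory of a coprocessor.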
As the list above shows, accelerators are now themselves parallel systems. They can also be seen as a new level in hierarchical machines, where they operate in parallel with the host processors. A few examples illustrate the possibilities.
- General Purpose computing on GPUs (usually abbreviated as GPGPU) harnesses the computational power of GPUs to perform non-graphics calculations. Because many GPU operations are based on matrix and vector operations, this is a particularly good match with linear algebra-based algorithms.
- Reconfigurable Computing uses FPGAs to adapt high-speed hardware operations to the needs of an application. As the application goes through phases, the FPGA can be reconfigured to accelerate each phase in turn.
- The Roadrunner supercomputer gets most of its record-setting performance from Cell processors used as accelerators on its processing boards.
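To see why linear algebra maps well onto GPU-style hardware, note that each element of a matrix-vector product depends only on one row of the matrix and on the vector, so all elements can be computed independently. The sketch below expresses that independence with a thread pool from the Python standard library. It is a stand-in for the data-parallel structure, not GPU code, and the helper names `row_dot` and `matvec_parallel` are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def row_dot(row, vector):
    """Dot product of one matrix row with the vector; independent of other rows."""
    return sum(a * b for a, b in zip(row, vector))

def matvec_parallel(matrix, vector, workers=4):
    """Compute each output element as a separate task; map preserves row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: row_dot(row, vector), matrix))

A = [[1, 2], [3, 4], [5, 6]]
x = [10, 1]
print(matvec_parallel(A, x))  # [12, 34, 56]
```

On a real accelerator, each row (or each element of a larger tile) would typically be assigned to its own hardware thread; the host code would also need to move the matrix and vector into accelerator memory first.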
The Open Education Cup welcomes entries describing any aspect of accelerated computing in parallel systems. We are particularly interested in descriptions of systems with parallel accelerator components, experience with programming and running these systems, and software designs that automatically exploit parallel accelerators.