Graphics processing units (GPUs) are rapidly gaining popularity as a platform for parallelized computations on massive sets of data. Since many of the computations in image processing and computer vision are easily parallelized, graphics operations on GPUs achieve significant speedups compared to their serial CPU counterparts. Further, SDKs like the NVIDIA CUDA framework give developers easy-to-use APIs for taking advantage of the parallel computing power of GPUs. We take full advantage of the computational benefits of GPUs by implementing edge detection and motion detection algorithms in CUDA C, and by making use of existing CUDA libraries for our facial recognition algorithm.
In this paper, we first detail the theory for our edge detection, motion detection, and facial recognition algorithms in Sections II, III, and IV, respectively. At the end of Sections II and III, we describe our GPU implementations of these algorithms in NVIDIA CUDA. At the end of Section IV, we compare the performance we achieve with a prebuilt, CUDA-based OpenCV GPU computation library against the performance we achieve with the custom CUDA implementations of Sections II and III. We present the speedups of our CUDA implementations relative to a reference serial CPU implementation in Section V. Finally, we conclude in Section VI.
Formally, CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model that exposes familiar C-based APIs for parallelized computations. CUDA is NVIDIA's platform for general-purpose computing on graphics processing units (commonly GPGPU or GP²U), that is, the use of a GPU for computations traditionally handled by the CPU. Generally, GPGPU is used to exploit the superior multithreaded performance and raw floating-point throughput of GPUs over CPUs. For example, on modern hardware, an NVIDIA GeForce GTX 970 (1664 CUDA cores) exhibits peak single-precision floating-point performance of nearly 3500 GFLOPS (billions of floating-point operations per second), while an Intel Core i7 4790K (4 cores, 8 threads) achieves 100 GFLOPS. Our goal is to demonstrate the performance of GPU computing by solving a handful of existing problems in computer vision with CUDA: namely, edge detection, motion detection, and facial recognition.
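As a rough sanity check on the GPU figure quoted above, peak single-precision throughput can be estimated as the number of cores times the clock rate times the floating-point operations each core completes per cycle. The sketch below assumes a base clock of 1.05 GHz for the GTX 970 and one fused multiply-add (counted as 2 floating-point operations) per CUDA core per cycle. Neither number appears in the text above, so treat both as assumptions. The second line simply takes the ratio of the two peak figures quoted above.

```latex
\begin{align*}
% Assumed: f_clock = 1.05 GHz (base clock), 2 FLOPs per core per cycle (one FMA)
P_{\text{GPU}} &\approx N_{\text{cores}} \times f_{\text{clock}} \times 2
  = 1664 \times 1.05\,\text{GHz} \times 2 \approx 3494\ \text{GFLOPS} \\
% Ratio of the peak figures quoted in the text
\frac{P_{\text{GPU}}}{P_{\text{CPU}}} &\approx \frac{3500\ \text{GFLOPS}}{100\ \text{GFLOPS}} = 35
\end{align*}
```

This ratio compares theoretical peaks only. The speedup a real kernel achieves also depends on memory bandwidth, host-to-device transfer time, and how well the algorithm parallelizes.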