The results below were obtained by convolving the same image, resized to various dimensions, with a separable Gaussian filter of kernel size k = 3.
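For reference, the operation being benchmarked can be sketched serially as follows. This is a minimal CPU illustration in plain Python, not the CUDA kernel that was timed; the function names, the default `sigma`, and the clamp-to-edge border handling are assumptions made for the example.

```python
import math

def gaussian_kernel_1d(k=3, sigma=1.0):
    """Return a normalized 1-D Gaussian kernel of odd length k (sigma is assumed)."""
    half = k // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Convolve every row with a 1-D kernel, clamping indices at the borders."""
    half = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for j, w in enumerate(kernel):
                xx = min(max(x + j - half, 0), width - 1)  # clamp to edge (assumed policy)
                acc += w * image[y][xx]
            out[y][x] = acc
    return out

def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]

def separable_gaussian(image, k=3, sigma=1.0):
    """Apply a 2-D Gaussian blur as a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel_1d(k, sigma)
    horizontal = convolve_rows(image, kernel)
    # The vertical pass reuses the row routine on the transposed image.
    return transpose(convolve_rows(transpose(horizontal), kernel))
```

Separability is what makes the filter cheap: two 1-D passes cost on the order of 2k multiplications per pixel instead of k² for the equivalent 2-D kernel, and each output pixel is independent of the others, which is why the operation maps naturally onto one GPU thread per pixel.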
As discussed previously, our CUDA implementation of nonmaximum suppression and selective thresholding is split into two parallel tasks that run one after the other: computation of the gradient magnitude and angle, followed by the actual edge selection algorithm. The speedup numbers below cover the full procedure: calculating the gradient's magnitude and angle, selecting edges, and finally copying the results to host memory.
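The first of these two tasks can be sketched serially as follows. This is an illustrative CPU version only; the choice of the Sobel operator and the decision to leave border pixels at zero are assumptions for the example, since the gradient operator is defined elsewhere in the report.

```python
import math

# 3x3 Sobel masks (assumed operator; the report's own choice may differ).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_magnitude_angle(image):
    """Return (magnitude, angle) images; angle is in radians from atan2(gy, gx)."""
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    angle = [[0.0] * width for _ in range(height)]
    # Border pixels are skipped and stay at zero (assumed policy).
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = 0.0
            gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            magnitude[y][x] = math.hypot(gx, gy)
            angle[y][x] = math.atan2(gy, gx)
    return magnitude, angle
```

Each pixel's gradient depends only on its 3 x 3 neighborhood, so this stage parallelizes per pixel. Edge selection needs the finished magnitude and angle of neighboring pixels, which is consistent with running the two tasks one after the other.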
The results below reflect the speedup for the complete procedure: calculating the difference matrix D', determining the thresholded difference matrix D, building a difference density matrix D', and finally generating an image representing the estimated motion area.
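The four stages listed above can be sketched serially as follows. This is a rough illustration under stated assumptions, not the benchmarked implementation: the difference is taken as the absolute per-pixel difference between two grayscale frames, the density is a count of thresholded pixels in a square window, and the motion area is reported as a bounding box. The parameters `tau`, `radius`, and `min_count` are hypothetical.

```python
def absolute_difference(prev_frame, curr_frame):
    """Per-pixel absolute difference between two equally sized grayscale frames."""
    return [[abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]

def threshold(diff, tau):
    """Binary map: 1 where the difference exceeds tau (hypothetical parameter)."""
    return [[1 if value > tau else 0 for value in row] for row in diff]

def density(binary, radius):
    """Count set pixels in the (2*radius+1)-wide window around each pixel."""
    height, width = len(binary), len(binary[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += binary[yy][xx]
            out[y][x] = total
    return out

def motion_bounding_box(dens, min_count):
    """Bounding box (top, left, bottom, right) of dense pixels, or None if there are none."""
    hits = [(y, x) for y, row in enumerate(dens)
            for x, value in enumerate(row) if value >= min_count]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))

def estimate_motion_area(prev_frame, curr_frame, tau=50, radius=1, min_count=3):
    """Chain the four stages; all default values are illustrative only."""
    diff = absolute_difference(prev_frame, curr_frame)
    binary = threshold(diff, tau)
    return motion_bounding_box(density(binary, radius), min_count)
```

The density step is what suppresses isolated noise: a single changed pixel yields a low window count and is ignored, while a cluster of changed pixels exceeds `min_count` and contributes to the reported area.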
The results below reflect the speedup of the GPU (CUDA) implementation over the CPU implementation in OpenCV's object detection framework.
Our real-time motion detection implementation continuously repeats the following procedure in an infinite loop:
Table VII shows basic statistics on the frame rate achieved on a 640 x 480 continuous video stream, running on the same hardware as the benchmarks in the previous section. The frame rate was evaluated on a 10-second video stream in a well-lit environment, with motion involving a moderate number of edge pixels in front of the camera. All procedures (as described in this paper) were executed in parallel on the GPU.
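Frame rate statistics of this kind can be derived from per-frame timestamps, as in the sketch below. The function name and the particular statistics returned are assumptions for illustration; they are not necessarily the quantities reported in Table VII.

```python
import statistics

def frame_rate_stats(timestamps):
    """Summarize instantaneous frame rates from increasing frame timestamps in seconds."""
    intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not intervals or any(dt <= 0 for dt in intervals):
        raise ValueError("need at least two strictly increasing timestamps")
    rates = [1.0 / dt for dt in intervals]  # frames per second between consecutive frames
    return {
        "mean": statistics.mean(rates),
        "min": min(rates),
        "max": max(rates),
        "stdev": statistics.pstdev(rates),  # population standard deviation
    }
```

In a capture loop, one timestamp would be recorded (for example with `time.perf_counter()`) each time a processed frame is displayed, and the list would be passed to `frame_rate_stats` once the measurement window ends.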
Our CUDA implementation of both the edge detection and motion detection algorithms demonstrates that parallelized GPU computation yields significant speedups over a serial CPU implementation. In all benchmarked cases (separable convolution with a Gaussian filter, edge detection via nonmaximum suppression and selective thresholding, and motion area estimation from a difference density map), the GPU implementation, running on a mid-range graphics card, achieved a speedup of between 1.5x and 4.6x over a relatively high-performance, overclocked CPU. Furthermore, we observe a similar speedup trend when comparing the existing OpenCV CPU and GPU (CUDA) implementations of Haar-based facial recognition. In all cases, the speedup asymptotically approaches a general range as the input size increases.