We wish to approximate the movement of the feature points by an affine transform, because it can account for rotation, zooming, and panning, all of which are common in videos. The coordinates of a feature in the old frame are written as $(x, y)$ and in the new frame as $(x', y')$. Then an affine transform can be written as:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$$
However, this form needs some modification to deal with multiple point pairs at once, and needs rearranging to find $a$, $b$, $c$, $d$, $e$, and $f$. It can be easily verified that the form below is equivalent to the one just given:

$$\begin{bmatrix} x & y & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x & y & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ e \\ c \\ d \\ f \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}$$
With this form, it is easy to add multiple feature points by stacking two additional rows on the left and on the right. Denoting the pairs of points as $(x_1, y_1) \to (x_1', y_1')$, $(x_2, y_2) \to (x_2', y_2')$, $(x_3, y_3) \to (x_3', y_3')$, etc., the matrices will now look like:

$$\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} a \\ b \\ e \\ c \\ d \\ f \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ \vdots \end{bmatrix}$$
So long as there are more than three points, the system of equations will be overdetermined. Therefore the objective is to find the solution in the least squares sense. This is done using the pseudoinverse of the matrix on the left.
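For concreteness, a minimal MATLAB sketch of this least squares fit might look as follows (the variable names old_pts and new_pts, holding the matched coordinates as N-by-2 arrays, are illustrative, not taken from our actual code):

    % Build the stacked system A*p = rhs from N matched feature points,
    % where p = [a; b; e; c; d; f].
    N = size(old_pts, 1);
    A   = zeros(2*N, 6);
    rhs = zeros(2*N, 1);
    for i = 1:N
        x = old_pts(i, 1);
        y = old_pts(i, 2);
        A(2*i-1, :) = [x, y, 1, 0, 0, 0];   % row for the x' equation
        A(2*i,   :) = [0, 0, 0, x, y, 1];   % row for the y' equation
        rhs(2*i-1)  = new_pts(i, 1);
        rhs(2*i)    = new_pts(i, 2);
    end
    p = pinv(A) * rhs;   % least squares solution via the pseudoinverse

In practice, A \ rhs computes the same least squares solution more efficiently, but the pseudoinverse makes the connection to the text explicit.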
The affine transforms produced above relate each video frame only to the one immediately after it. The problem is that if the video is jerky, several consecutive frames are needed to form a good estimate of the camera's average position over that stretch of time. The difference between the current position and this moving-average position can then be used to correct the current frame toward the average position.
Tracking the features frame-to-frame constitutes an implicit differentiation of the overall camera movement. In order to track changes across many frames, we sequentially accumulate the frame-to-frame differences, which is akin to applying an integral operator. Unfortunately, when integrating imperfect data, errors will build up linearly in time, and that is true here. However, since the stream of integrated affine transforms is not used directly, these errors are not as important.
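One way this accumulation could be implemented, assuming each row p_step(k, :) holds the fitted coefficients [a b e c d f] mapping frame k to frame k+1 (an illustrative layout, not necessarily our actual one), is to compose the transforms as 3-by-3 homogeneous matrices:

    % Integrate frame-to-frame transforms into transforms relative to frame 1.
    nSteps = size(p_step, 1);
    acc = zeros(nSteps + 1, 6);
    acc(1, :) = [1, 0, 0, 0, 1, 0];      % identity, in [a b e c d f] order
    M = eye(3);
    for k = 1:nSteps
        step = [p_step(k,1), p_step(k,2), p_step(k,3);
                p_step(k,4), p_step(k,5), p_step(k,6);
                0,           0,           1          ];
        M = step * M;                    % compose with the running product
        acc(k+1, :) = [M(1,:), M(2,:)];  % flatten back to six coefficient streams
    end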
Once the stream of integrated affine transforms is generated, the goal is to undo high-frequency motions, while leaving the low-frequency motions intact. This is done by treating the coefficients of the stream of integrated affine transforms as independent, and applying six high pass filters, one for each stream of coefficients. Although this technique works, it is hoped that a more elegant way of handling the filtering may be developed in the future.
Since a high pass filter is being used, it is important that the filter not introduce large phase offsets. If the transform which ideally stabilized frame #5 were instead applied to frame #10, and so forth, the delay would wholly invalidate the offsets, and the resulting video would be more jerky than before, instead of less. Therefore, we decided to use the zero phase filtering technique of applying a filter sequentially in both the forward and reverse time directions. This is handled by the MATLAB function filtfilt.
Initially, we tried various order-4 to order-8 IIR filters with cutoff frequencies around 0.1π. However, the unit step response of nearly all IIR filters involves a significant amount of overshoot and ringing. Since our signal is best viewed as a time-domain signal rather than a frequency-domain signal, we sought to avoid this overshoot. Therefore, we switched to a truncated Gaussian FIR filter, which averages across a bit more than one second's worth of video at a time. This removed the overshoot and ringing that had been visible with the IIR filters.
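As a sketch, such a filter and its zero phase application to a single coefficient stream c could look like this (the frame rate and the Gaussian width here are illustrative choices, not our exact parameters):

    fps   = 30;                          % illustrative frame rate
    taps  = fps + 1;                     % a bit over one second of video
    sigma = taps / 6;                    % truncates the Gaussian near +/- 3 sigma
    n = (-(taps-1)/2 : (taps-1)/2)';
    g = exp(-n.^2 / (2*sigma^2));
    g = g / sum(g);                      % normalize for unit DC gain
    c_smooth = filtfilt(g, 1, c);        % zero phase low pass filtering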
In the algorithm we used, the high pass filter is generated implicitly: a low pass filter is applied, and the low pass version is subtracted from the original. It would be mathematically equivalent to simply change the impulse response of the filter and skip the subtraction step.
The last wrinkle is that for affine transforms, the identity transform has the a and d coefficients equal to one, rather than zero, whereas the high pass filter creates a stream of transforms centered on all coefficients being zero. Therefore, after the high pass filter, we added 1 back to the a and d coefficients of the stream of affine transforms, so that they would be centered on the identity transform.
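Combining the last two steps, a sketch of the subtraction and re-centering over all six coefficient streams (reusing the illustrative acc matrix and Gaussian g from above, with the assumed column order [a b e c d f]):

    % High pass each coefficient stream by subtracting its low pass version,
    % then re-center the result on the identity transform.
    smoothed = zeros(size(acc));
    for j = 1:6
        smoothed(:, j) = filtfilt(g, 1, acc(:, j));   % zero phase low pass
    end
    hp = acc - smoothed;          % only the high-frequency motion remains
    hp(:, 1) = hp(:, 1) + 1;      % a coefficient: center on the identity
    hp(:, 5) = hp(:, 5) + 1;      % d coefficient: center on the identity
    % Each row of hp is now a small affine transform describing the jitter
    % to be removed from the corresponding frame.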