RANSAC is an algorithm that finds the inliers in a set of data with many gross outliers. For it to work, there must be an underlying model that can be fit to some of the data points. RANSAC repeatedly instantiates this model, using small, random subsets of the data, until a model is found that is consistent with a large subset of the data.

It works as follows:

- Fit the model to a randomly chosen subset of the data points. The subset should be the minimum size necessary to define the model (e.g. 2 points for a line, 3 for a circle, 4 for a homography).
- Using this tentatively fit model, count the number of data points in the original set that are actually consistent with it (within some error bounds).
- If the number of data points in agreement is above a certain threshold, use those data points to perform a final model fit. Otherwise, if too few points agree to consider the model good, start over at step 1 with a new random set.
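As a minimal illustration of these three steps, here is a sketch using line fitting, the classic RANSAC example (parameter values and names here are arbitrary, not from the write-up):

```python
import numpy as np

def ransac_line(points, n_iters=100, tol=0.1, min_inliers=50, rng=None):
    """Fit y = m*x + b to data with gross outliers (a minimal RANSAC sketch).

    points: (N, 2) array. Returns (m, b) fit to the consensus set found,
    or None if no sample produced at least min_inliers agreeing points.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    for _ in range(n_iters):
        # Step 1: fit the model to a minimal random subset (2 points for a line).
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # degenerate sample; draw again
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # Step 2: count points consistent with the tentative model.
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + b))
        inliers = points[residuals < tol]
        # Step 3: enough agreement -> final fit on all inliers; else resample.
        if len(inliers) >= min_inliers:
            return np.polyfit(inliers[:, 0], inliers[:, 1], 1)  # (m, b)
    return None
```

Note that the minimal sample defines the model exactly, while the final fit uses least squares over the whole consensus set.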

I applied RANSAC to the problem of recovering original camera position (otherwise known as camera pose) from a picture containing some particular colored markers. RANSAC and a method for determining camera pose from 4 coplanar points were both discussed in the original paper.

To determine pose using 4 coplanar points, their locations must be known both in the image plane and in the plane in 3D space (known as the object plane). I used several pictures of the same grid, and drew colored markers in each picture at the same grid locations in each (see original images below). The reason RANSAC is needed is that there are more markers than there are distinct colors.

As can be seen, there are 2 blue markers and 4 red markers. When a blue marker is identified in the image, its image coordinates become known, but its object coordinates could be either 5-down, 0-over or 6-down, 1-over. In other words, the program sees a blue marker but doesn't know which one it is. The problem is worse with the red markers: every time any red marker is found, 4 data points are generated with identical image coordinates, each carrying the object coordinates of one possible red marker.

Since there are multiple markers of each color, each marker generates a data point for every marker it could actually be, but only one data point is correct. Each blue generates 2 possible object locations, each red generates 4, etc.
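To make the candidate generation concrete, here is a small sketch with made-up marker positions (the real program derives the image locations from detection, and the write-up's grid has 12 markers rather than the 6 shown here):

```python
from math import comb

# Hypothetical example: image locations detected per color, and the known
# object-plane locations of markers of that color (coordinates invented
# purely for illustration).
found = {"blue": [(120, 80), (200, 310)],
         "red":  [(50, 50), (90, 200), (310, 60), (280, 250)]}
object_locs = {"blue": [(0, 5), (1, 6)],
               "red":  [(0, 0), (0, 7), (7, 0), (7, 7)]}

# Each detected marker pairs with every object location of its color;
# only one pairing per marker is correct.
pairs = [(img, obj)
         for color, img_pts in found.items()
         for img in img_pts
         for obj in object_locs[color]]
print(len(pairs))  # 2*2 + 4*4 = 20 candidates here; 36 in the write-up's setup

# In the write-up's setup, 12 of the 36 pairs are good, so the chance that a
# random 4-subset is all-good is small -- hence the iteration.
p = comb(12, 4) / comb(36, 4)  # roughly 0.8%
```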

All in all, there are 12 markers in each image, but because colors are duplicated there are 36 possible image-coordinate / object-coordinate pairs, so each pair has a 1/3 probability of being a good one. My program searches for 4 good pairs: ones that give rise to a camera pose under which at least 11 points project to the right place in the image. It then uses all the agreeing points to determine a final camera pose. It works like this:

- Already know: the object locations of colored markers
- Find the markers in the image. For every blue marker image location, add a datum for every possible object location. Repeat for red, yellow, etc.
- RANSAC:

- Choose 4 object-coord / image-coord pairs at random
- Use them to compute a homography between the object plane and the image plane
- Project every point in the dataset onto the image plane using the homography
- Compute the distance between every data point's projected location in the image plane and the location at which it was actually found
- Compute the maximum allowable "error" distance for each point by perturbing the original 4 chosen points, recomputing the homography from the perturbed points, and reprojecting all points with the perturbed homography. The distance each point moves when the homography is perturbed is its maximum allowable "error".
- If at least 11 points were projected closer to their actual locations than their allowable errors, stop iterating
- If fewer than 11 points were correctly projected, repeat from step 1 (choose a new set)

- Use all 11 (or more) points to generate a final homography
- Compute camera location and angle using that homography
- Display the results
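The inlier test at the heart of the loop above (reproject, compare against a perturbation-derived error bound, count agreements) can be sketched in numpy; the helper names and the convention of passing both homographies in are mine, not from the write-up:

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography to (N, 2) points, with the homogeneous divide."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    proj = pts_h @ H.T
    return proj[:, :2] / proj[:, 2:3]

def consensus(H, H_perturbed, obj_pts, img_pts, min_agree=11):
    """Per-point error bound = how far that point moves under the perturbed
    homography; a point agrees if its reprojection lands within that bound
    of where it was actually detected. Returns (enough?, agreement mask)."""
    proj = project(H, obj_pts)
    bound = np.linalg.norm(project(H_perturbed, obj_pts) - proj, axis=1)
    err = np.linalg.norm(proj - img_pts, axis=1)
    agree = err < bound
    return agree.sum() >= min_agree, agree
```

The attraction of the perturbation scheme is that the tolerance adapts per point: points far from the 4 chosen ones naturally get a looser bound.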

RANSAC: described above.

Computing homographies:

- I implemented an analytic method to compute a homography from four points, using matrix math, as described in the original paper.
- For the final homography from 11 or 12 points, I used OpenCV's cvFindHomography function.
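For reference, here is a sketch of the standard analytic 4-point construction, the direct linear transform; the details may differ from the paper's exact matrix formulation:

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Solve for the 3x3 homography H (up to scale) mapping 4 source points
    to 4 destination points. src, dst: (4, 2) arrays; no 3 points collinear.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H's
        # 9 entries, derived from u = (h1.p)/(h3.p), v = (h2.p)/(h3.p).
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # H is the null vector of A: the right-singular vector with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```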

Computing pose from a homography:

- I also implemented an analytic method for recovering pose from a homography. By projecting the lines at infinity and the origin from the image plane to the object plane and vice versa, the camera position and angle can be found using right triangles. This technique is also described in the original paper.
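The right-triangle construction above is the paper's geometric route. As a point of comparison only (this is not the method used here), a common algebraic alternative, assuming known camera intrinsics K, decomposes the homography directly via H ~ K [r1 r2 t]:

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover rotation R and translation t from a plane-to-image homography,
    given intrinsics K. Standard decomposition sketch, not the write-up's
    geometric method. Assumes the plane is at z = 0 in object coordinates.
    """
    M = np.linalg.inv(K) @ H
    # H is only defined up to scale; normalize so the two rotation columns
    # have unit norm (averaging the two estimates).
    s = 2.0 / (np.linalg.norm(M[:, 0]) + np.linalg.norm(M[:, 1]))
    if M[2, 2] < 0:          # keep the plane in front of the camera
        s = -s
    r1, r2, t = s * M[:, 0], s * M[:, 1], s * M[:, 2]
    r3 = np.cross(r1, r2)
    # Re-orthogonalize with SVD to get the nearest true rotation matrix.
    U, _, Vt = np.linalg.svd(np.column_stack([r1, r2, r3]))
    return U @ Vt, t
```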

Finding colored markers: I rely on the hue of each color being fairly predictable.

- Convert the image to HSV (hue/saturation/value)
- Create a mask that contains only pixels whose Saturation and Value are above 50%. This immediately eliminates everything that is not a bright color.
- From the Hue channel, create a mask containing only pixels in the hue range of a particular color to search for.
- AND the hue mask with the bright-colors mask, leaving a mostly black image with white circles where markers of the particular color were.
- Use OpenCV's cvGoodFeaturesToTrack function to find the locations of each circle.
- Repeat the last 3 steps for each color to search for.
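Assuming OpenCV's HSV conventions (hue in 0-179, saturation and value in 0-255), the masking steps might look like this in plain numpy; the thresholds and the example hue band in the test are illustrative, not the project's actual values:

```python
import numpy as np

def color_mask(hsv, hue_lo, hue_hi, sat_min=128, val_min=128):
    """Mask pixels that are both brightly colored (S and V above ~50%) and
    within the target hue band [hue_lo, hue_hi].

    hsv: (H, W, 3) uint8 array in OpenCV's HSV convention. Corner finding
    (e.g. cv2.goodFeaturesToTrack) would then run on the resulting mask.
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    bright = (s >= sat_min) & (v >= val_min)   # drop everything not vivid
    in_hue = (h >= hue_lo) & (h <= hue_hi)     # keep only the target hue band
    return bright & in_hue                     # pixels must satisfy both
```

Note that red hues wrap around 0 in OpenCV's representation, so a red search would need two hue bands ORed together.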

Detected marker locations. The larger colored circles are the markers, which are part of the original image; their detected locations are shown by the small circles near their edges.

Final set in agreement with all 12 points. The blue circles show the reprojected data point locations (their radii are the error bounds).

Recovered camera pose. The small blue squares are the markers, reoriented to be level. The large blue square is the object plane. The red square is the image plane of the camera that took the picture, and the red line passing through it is the direction in which the camera is looking.

Consensus set and result for another image.

This image was perspective-warped with the Gimp before recovering pose.

This image was also perspective-warped before recovering pose.

Original Paper:

http://portal.acm.org/citation.cfm?id=358692&dl=

M.A. Fischler and R.C. Bolles. "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography." Communications of the ACM. June 1981. Vol. 24, No. 6.
