DUE: 11:45am, Friday November 12
Worth: 10% of your final grade
The course policies page outlines academic honesty as applied to this course, including policies for working with other students on assignments. It also gives the rules for on-time homework submission.
For this assignment you have the choice to do do either or both problems. Grading and turn-in details are given below.
cvFindExtrinsicCameraParams2()
to track the rigid transform that represents the pose of a chessboard calibration object relative to a camera. You will also use the theory of the pinhole camera model to draw simple 3D graphics which depict the chessboard and camera from an arbitrary external viewpoint.cvdemo
and cvundistort
from HW2 do.cvundistort
example program; otherwise, use that as a reference to develop your own equivalent. Use these undistorted images for the remainder of the problem.cvFindChessboardCorners()
and cvFindExtrinsicCameraParams2()
to calculate a cvFindChessboardCorners()
returns nonzero, which indicates there is a high likelihood that the whole chessboard was detected correctly in the image.For frames where you calculate , draw a live reconstruction of the chessboard in magenta (
) by calling
cvLine()
to draw lines in a grid covering the interior chessboard corners. (Draw nothing for frames where you did not calculate
.)
There are a few ways to interpret this. In particular, you could just use the pixel results of cvFindChessboardCorners()
to get the image pixel locations of the endpoints of the lines, or you could calculate the 3D coordinates of those points in frame , transform to
, and then backproject into the image using the camera matrix. Because of what we are going to do next, please do the latter (*). Specifically, for each line (i) calculate the 3D endpoints
in
, (ii) transform from
to
by calculating
and similar for
, (iii) backproject into the image by calculating
and similar for
, where
is the
camera intrinsic matrix
and then perform the perspective division to get the pixel
where
appears, and similar for
.
(*) The extent to which these two different approaches should give the same result depends precisely on how well the camera calibration models your actual camera, and on how accurate cvFindChessboardCorners()
was.
On top of the drawing you made above, draw three colored line segments, all in 3D in the frame : one from
to
in red, another from
to
in green, and a third from
to
in blue. These represent the axis triad of
itself. Again, do this only for frames where you calculated
.
Now introduce one more rigid transform
into the projection chain. We will use this as a navigation transform to render a representation of both the chessboard and the camera from a simulated external viewpoint. This viewpoint will be modeled as if we had a second camera, with the same calibrated pinhole model as your actual camera, looking at the whole scene (chessboard and camera) from “outside”. Let
be the camera frame of this simulated camera. Then
is defined as the transform that takes points from
to
, so the composite transform
takes points from
to
.
To make things interesting, you will handle keypresses that can change , effectively allowing the simulated camera to “fly around”. We know that since
is a 3D rigid body transform (i.e. a transform in
) that it has six DoF, and that we can divide these up as three translations and three rotations. We will take the strategy of using six keys to control positive/negative translations along all three axes of
, and four more keys to control positive/negative rotations about the
and
axes of
(we don’t allow the virtual camera to rotate around the
axis of
, this is a common design in part because this kind of “rolling” camera motion can be visually confusing).
Set to the identity transform initially. Implement keyboard handlers (i.e. using
cvWaitKey()
, or if you are using one of the CvBase
implementations from HW2, just override handleKeyExt()
) for the i
and k
keys that pitch the virtual camera by a rotation of or
about the
axis of
, respectively, and similarly make
j,l
(that’s the letter j and the letter l keys) yaw by rotations about the
axis of
. Make
e,d
shift the virtual camera along the
axis of
,
s,f
shift by along the
axis of
, and
a,z
shift by along the
axis of
. Finally, make
r
reset to the identity transform. See below for an explanation of why we use these keys.
Add drawing code for (i) an RGB axes triad for frame (this is in addition to the one you already implemented for frame
), and (ii) a very simple representation of the actual camera as a unit-altitude rectangular pyramid in yellow (
), also in
, representing the horizontal and vertical FoV. I.e. the apex of this pyramid is
in
and the four base points are given by the four combinations
where are the dimensions of your images in pixels (e.g.
).
Whenever is not the identity matrix (i.e. whenever the virtual viewpoint does not correspond to the actual camera view), hide the actual captured images and make all your drawings on a white background. Conversely, whenever
is identity, show the captured images instead of a white background, but hide your axis triad and pyramid in frame
. As before, for all frames where you computed the transform
from
to
(i.e. whenever the chessboard was successfully detected, irrespective of
), draw the axis triad in frame
and the reconstructed chessboard grid.
Be sure that you can still change while capture is paused.
cvCalcOpticalFlowPyrLK()
to track feature points, and you will use cvKalmanPredict()
and cvKalmanCorrect()
to incorporate a simple model of point motion.Starting with a basic program that displays images from your camera (aim for , 10FPS as usual), such as
cvdemo
from HW2, add code that (i) adds and removes feature points where the user clicks the mouse on the image, displaying them as green dots, (ii) uses cvCalcOpticalFlowPyrLK()
to track the points from frame to frame, and (iii) removes existing feature points either when they are clicked (at their current position, wherever it may be) or when their tracking fails or degrades below a threshold that you determine is reasonable.
Clearly, the provided lkdemo.c
will be useful. Note that we will not need the ability to automatically initialize the feature points (cvGoodFeaturesToTrack()
) or the “night mode” of lkdemo
.
cvdemo
from HW2 do. Make sure that adding and removing track points by clicking with the mouse works even when capture is paused.Define a measurement covariance matrix
cvCalcOpticalFlowPyrLK()
. Though it is possible to estimate cvCalcOpticalFlowPyrLK()
; they are measured in pixels. Initialize e,d,s,f
keys increment/decrement printf()
or cout
or System.out.println()
) whenever either is changed.cvCalcOpticalFlowPyrLK()
, which we will now call the measurements. To do this you will need to define a confidence limit a,z
keys increment/decrement Starting with the third frame received after a feature is added, it is possible to predict (or at least to make a reasonable guess about) the new location of the feature given its two previous locations
and
(here we are using
to index the sequence of video frames). There are various ways to do this depending on the assumptions we make about object motion. We will simply model the point to move with locally (locally in the temporal domain, that is) constant velocity in the image. We will also assume that the amount of time that passes between adjacent captured frames is constant. Thus, we initialize an estimate of the velocity of the feature from its first two observed locations as
and we initialize an estimate of its position as
. The state of the feature is now the 4D vector
). (Note that we only explicitly calculate the state by these equations to initialize it; later state updates will be completely determined by the Kalman filter.)
Use cvKalmanPredict()
to calculate for every feature point that has been defined for at least 3 frames (because our model does not include any cross-correlation between features, you may use a separate
CvKalman
structure for each feature). Model the uncertainty in this prediction with a diagonal covariance matrix of the form
Initialize (10 times larger than the initial setting of
and
because we are significantly less certain of our simple model than of the measurements from
cvCalcOpticalFlowPyrLK()
), and make the i,k,j,l
keys increment/decrement (j-,l+) and
(i+,k-) by 0.1. (See below for an explanation of why we use these keys.) Clamp the minimum value of each to 0.1 and display the new settings for both on the console whenever either is changed.
For every feature that has been present for at least 3 frames, use cvKalmanCorrect()
to combine the predicted state and the observation
(the output of
cvCalcOpticalFlowPyrLK()
for that feature) into an estimated state and
estimate covariance matrix
. Display the estimated location
and its error ellipse (given by the upper left
submatrix of
and the current value of
) in blue, and use the predicted
values to set the input
prevFeatures
for the next call to cvCalcOpticalFlowPyrLK()
.
Both problems require you to implement keypress handlers that increment/decrement quantities related to coordinate frame axes. For such purposes it might seem natural to use the arrow keys common on many keyboards. Unfortunately, the cvWaitKey()
API that OpenCV provides to check for keyboard hits is fairly limited. In particular, it returns different values for the arrow keys on different platforms (OSX, Windows, Linux). Also, it is hard to detect modifier keys (shift, alt, ctrl, option, …) because they do not change the keycode emitted by other keys. (And OpenCV reports keypresses, including of modifier keys, but not releases, so you can’t even implement your own state tracking of the modifier keys.)
For these reasons, we chose arrangements of plain alphabetic keys which, in the standard QWERTY keyboard layout, approximate the up, down, left, right arrangement of the arrow keys. Specifically the groups e, d, s, f and i, k, j, l are used as two separate stand-ins for up, down, left, right. These groups also have the benefit that if you are a touch-typist you will typically have your index fingers on the f and j keys in “home” position (these are the keys with little nubs on them). There is some precedent for this design. Further, we can even extend our key groups into 3D: a, z and :, / (not letter keys, but still quite standard across platforms) can function as “in, out” keys for the right and left hand, respectively.
If you find these keybindings uncomfortable you may define other keyboard and mouse bindings in addition (your additional bindings may even be platform specific). E.g. you can implement a different mode that can be switched to by hitting some other key, or you can just use other non-overlapping keys, or even the mouse. But please make sure that by default the above keys have the intended effects so that we can easily run everyone’s code.
The connection between Gaussian distributions and ellipsoids is quite beautiful and draws together several topics we’ve studied in the course. We will develop a procedure now specifically to draw a 2D ellipse using cvEllipse()
given three ingredients:
Informally, gives the center of the ellipse,
gives its shape (how much it is squished in each of two orthogonal directions relative to a circle, which is a special case of an ellipse which is not squished at all) and also its orientation (how much it’s rotated compared to an ellipse in standard position, which is squished in exactly the
and
coordinate axes), and
gives the scale of the ellipse: larger
give larger ellipses.
will give an ellipse that has degenerated to a single point; we do not allow
to reach
, but if we did, that would effectively correspond to an ellipse of infinite size.
All of this extends in a mostly straightforward way to ellipsoids in arbitrary dimension (i.e. not just ). The details are given below
We will first review the derivation and meaning of a particular form for the implicit equation for an ellipse. Then, we’ll show how to use elements of that equation to actually call cvEllipse()
to draw an ellipse. Finally, we’ll show how to build such an equation from ,
, and
.
Start with the equation for a circle centered at the origin with radius :
The interpretation is that is on the circle iff the equation holds. Now we can squish (or stretch) the circle independently in the directions of the
and
coordinate axes by first applying a nonuniform scale transform
to every instance of (the reason for using reciprocal scales will be given shortly), which gives us
( since
is diagonal). If we define
then this expands out to
and from there it’s easy to calculate the horizontal and vertical radii and
of the ellipse: for the horizontal case, plug in
and similarly for the vertical case we get . The scale factors, as we have defined them, are thus the “stretchings” of the radius of the original circle to make the ellipse radii. What happens if one of the scale factors, say
, is zero? Then the corresponding radius is zero, but
will have an infinte entry. What happens if one of the diagonal elments of
is zero? Then the corresponding radius will be infinite. Both of these situations will have meaning below.
So far we have an axis-aligned ellipse centered at the origin. The next step is to allow the ellipse to be rotated so that its major and minor axes do not necessarily have to align to the coordinate system and
directions. We just need to multiply every instance of
by an arbitrary 2D rotation transform
where and
for an arbitrary CCW rotation
. We actually multiply by
to have the effect of rotating back to standard position, where the stretching is done:
(using since
is orthogonal). By the way we have constructed this equation, the columns of
are unit vectors that point along the major and minor axes of the ellipse. Specifically, if
(naming the two columns of
) and
then
is a unit vector pointing along the major axis, which has radius
, and
is a unit vector pointing along the minor axis, which has radius
. “major” and “minor” are just swapped in that sentence if
(and if
we have a circle, which is a special case of an ellipse).
The final step is to allow the ellipse to be centered at some point other than the origin. Specifically, to center the ellipse at point , just replace every instance of
with
:
We can write this a little more compactly by defining :
but note that we can’t just use any matrix for
—it has to be decomposable into the form
, where
is orthogonal and
has non-negative entries on its main diagonal and is 0 elsewhere. It can be shown that these constraints are equivalent to saying that
must be symmetric and positive semi-definite, i.e.
(symmetric) and
for any nonzero
(positive semi-definite).
This is further equivalent to the following statement: if you are given an equation in the form
then this is the equation for an ellipse if and only if can be decomposed into
and
such that
,
is orthogonal, and
is positive semi-definite, which is possible if and only if
is symmetric and positive semi-definite. How can you check if
has these properties? And how can you actually perform the decomposition to get
and
(you will need them to draw the ellipse)? It turns out that both questions can be answered by computing an eigendecomposition of
.
Any (i.e. square) matrix
can be thought of as a function that transforms an non-zero input vector
to some other vector
, by simple left multiplication:
. In general, the “output” vector
will not point in the same direction as
, nor will the length of the output vector be the same as the length of the input vector:
can both rotate and scale vectors. But it may happen that for some input vectors, the corresponding output vector is pointing in the same direction; i.e.,
might have only a scaling effect on some vectors, so that
for some scalar
It is not too hard to convert this statement into a prescription for finding such scaling values
, which are called the eigenvalues of
:
Where is the
identity matrix. Since we specified that
is non-zero, the only way this equation can hold is if the matrix
is not invertible, which is the same as saying its determinant is zero:
You may recall that there is a way to calculate determinants that boils down to finding the value of a degree- characteristic polynomial expression in terms of the entries in the matrix. Since
and
are constants, the only variable in this expression will be
, and the roots of the polynomial will give the various eigenvalues
. In general there may be up to
distinct
(there can be fewer because the characteristic polynomial can have multiple roots, in that case some
will have multiplicity greater than 1).
Given , the corresponding eigenvectors
are a basis for the space of solutions to the specific equation
. This is a homogeneous system (right hand side
) so it will, in general, have not just a single solution but an entire linear space of solutions.
We can finally define the eigendecomposition of . Calculate all the eigenvalues
and corresponding eigenvectors
of
(there are good library routines available that will compute both the eigenvalues and eigenvectors for you, for arbitrary
). Note that if
is an eigenvector then so is
for any
; so we may normalize all the eigenvectors (this step is not strictly necessary, but assume we do so). If all the eigenvectors are linearly independent then it can be shown there are precisely
of them and we may form an
invertible matrix
whose columns are simply the eigenvectors, which we can now index as
for
. If we re-index the corresponding eigenvalues as
(allowing for multiple copies of eigenvalues with multiplicity greater than 1), then we can factor
as
Where is an
diagonal matrix with diagonal entries
.
In fact, it can be shown that is symmetric and positive semi-definite exactly when all its eigenvalues are non-negative and
is orthogonal. This gives us not only a way to test if
is suitable for use in the ellipse equation, but also a way to decompose it:
Compute the eigenvalues and eigenvectors of (using
cvEigenVV()
or, preferably, cvSVD()
if you already know is symmetric and positive semi-definite; for the
case it is actually possible to avoid use of these functions at all since the characteristic polynomial is just a quadratic, and easily solved by the quadratic equation, but the numerical methods will work for any dimension).
If you’re not sure if was symmetric and positive semi-definite, check that the eigenvectors are linearly independent and that the eigenvalues are all positive.
Once is found to be symmetric and positive semi-definite, just set
and
.
For the two dimensional case let the two eigenvalues be . Assume we’ve already verified that both are non-negative. Now we can easily find
and
. NOTE: as we move on to covariance matrices below, will will find ourselves starting with an eigendecomposition of the inverse of
, so the scale factors will then be directly proportional to the square roots of the eigenvalues, not inversely proportional.
cvEllipse()
takes many parameters, but the geometry of the ellipse is effectively specified by just three: center
, axes
, and angle
. Given an ellipse equation in the form
with symmetric and positive definite, eigendecompose
to get the rotation matrix
and the eigenvalues
.
center
is just .
axes
is a CvSize
struct (for whatever reason) and can be constructed by calling cvSize
(,
), with the ellipse radii
calculated from
as described above.
angle
is the CCW angle in degrees from the positive axis of the image coordinate frame to the axis of the ellipse corresponding to
. Thus
angle
is .
We now make the connection to Gaussian distributions. Recall that given a 2D Gaussian distribution with mean value
and covariance matrix
, and a confidence limit
, we want to draw an ellipse such that a random variable drawn from
will be found inside the ellipse with probability
.
The definition of the Gaussian function gives the probability that any particular would be drawn:
(Any covariance matrix used in the definition of a Gaussian, as in this equation, must actually be positive definite, not just positive semi-definite. That ensures both that
exists and that
. The probability distribution associated with a covariance matrix that is only positive semi-definite is not strictly a Gaussian, however, it is still possible to draw an error ellipsoid for it. Our method for drawing ellipsoids allows this case; the ellipsoid will have one or more zero radii in the direction(s) corresponding to its zero eigenvalues.)
Note that the argument appears only in the exponent
(look familiar?). Clearly, level sets of (sets of
for which
is constant) are equivalent to level sets of this expression. Furthermore, the maximum value of
occurs at the maximum value of the exponent when
, and
monotonically decreases in all directions as the distance
increases.
What all of this means is that there is some value of the exponent for which any desired fraction
of the probability distribution is given by the
such that
(i.e. is some as-yet unspecified function of
).
is determined directly from the confidence limit
:
.
It can be shown that the covariance matrix for any Gaussian is symmetric and positive definite. Thus, it is not necessary to check for these conditions when dealing with covariance matrices—you may assume them to be true. Furthermore, if
is the eigendecomposition of
then the eigendecomposition of
is easy to get:
( is easy to compute since the inverse of a diagonal matrix is trivially formed by taking the reciprocal of each entry on the diagonal.) This also shows that the inverse of a symmetric positive definite matrix is itself symmetric and positive definite. (If the matrix is only positive semi-definite, then replace the inverse with the pseudoinverse
, which differs only in that
is formed by taking the reciprocal of each entry on the diagonal of
but with
taken to be zero.)
Now the connection to ellipses should be very clear: with and
is the error ellipse of the distribution for the confidence limit
. The eigenvectors of
are the same as the eigenvectors of
; the eigenvalues of
are just the reciprocals of the eigenvalues
of
. The ellipse is centered at
, has axis radii
(here
are the eigenvalues of
, so the ellipse radii are directly proportional to their square roots; note that a zero covariance matrix eigenvalue will correspond to a zero radius, and an infinite covariance matrix eigenvalue will correspond to an infinite radius), and the eigenvectors of
give the corresponding axis directions.
The final remaining piece of business is to determine the function . The story is a little odd. It turns out that for the two dimensional case,
(this can be derived from the development in Appendix A of [1]). But for other dimensions it is not even possible in general to write down
in closed form. Fortunately, the value of
for any given
can be evaluated, in any dimension, to a high precision by a fast algorithm. The details are given in (e.g.) Numerical Recipes in C by Press et al.
[1] Randall C. Smith, Peter Cheeseman, “On the Representation and Estimation of Spatial Uncertainty,” the International Journal of Robotics Research, Vol.5, No.4, Winter 1986. (This is an earlier version of the Smith and Cheeseman paper that we read in class.)
All of the development so far generalizes trivially to ellipsoids, the arbitrary-dimensional generalization of ellipses. A Gaussian distribution in just has
and an
covariance matrix
. The confidence limit
is always a scalar. The ellipse equation keeps the same form, but now the ellipsoid will have
axes. For example, a 3D ellipsoid is shaped something like an American football that has been flattened a little.
The only catch is the function that determines the scale of the ellipsoid given
. As stated above, there is not a closed form expression for
in general, but the required value can be computed by a fast algorithm.
Finally, it is frequently desired to show the error ellipse (or 3D error ellipsoid) for a subset of the variables in a higher dimensional Gaussian distribution. This turns out to be quite easy: just strike out the rows and columns corresponding to the unneeded variables from the higher dimensional covariance matrix to yield the required
(or
) covariance matrix. Similarly strike out the unneeded entries in the mean value
, and evaluate
for the target dimension (2 or 3).
Since both problems are coding work, you may work with a partner on this assignment. In fact, we encourage it. Follow the instructions on the assignments page to submit all your code.
100 points will be allocated according to how many problems you choose to complete. If you do both problems then we will grade each out of 100% and your score will be the maximum of the two.