Sunday 20 December 2015

Optical flow and velocity estimation

Structure from motion solves two problems: correspondence, i.e. recovering the 2D motion in the image, and reconstruction, i.e. recovering the 3D motion of the camera.

Correspondence is often solved by optical flow. Let the pixel intensity be E(x,y,t), and suppose the brightness pattern is displaced by dx, dy over time dt. The intensity of the displaced pixel should then equal that of the original one: $E(x,y,t) = E(x+dx,y+dy,t+dt)$. In other words, the total derivative of E with respect to time is zero: $\frac{\partial E}{\partial x}\frac{dx}{dt}+\frac{\partial E}{\partial y}\frac{dy}{dt}+\frac{\partial E}{\partial t} = 0$.

If we denote $E_x = \frac{\partial E}{\partial x}, E_y = \frac{\partial E}{\partial y}, E_t = \frac{\partial E}{\partial t}, v_x = \frac{dx}{dt}, v_y = \frac{dy}{dt}$, then the equation becomes $E_x v_x + E_y v_y + E_t = 0$, which is known as the brightness constancy equation (BCE). $E_x, E_y, E_t$ are measurable, while $v_x, v_y$ are the unknown flow components. A single BCE is one equation in two unknowns, so it is insufficient to solve for the flow (this is the aperture problem). To solve it we have to consider a small window of adjacent pixels. For instance, the Lucas-Kanade (LK) algorithm assumes the flow at a given time is constant over a block of pixels. For a block of n pixels there are n BCEs, giving an over-determined linear system that can be solved by least-squares estimation.
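The LK least-squares step above can be sketched numerically. The snippet below is a minimal illustration (not a full pyramidal LK implementation): it builds a synthetic brightness pattern translating at a known velocity, estimates $E_x, E_y, E_t$ by finite differences over a window, and solves the stacked BCEs $[E_x\ E_y]\,[v_x\ v_y]^T = -E_t$ by least squares. The pattern, window location, and true velocity are arbitrary choices for the demo.

```python
import numpy as np

# Synthetic brightness pattern translating at a known velocity (demo values).
def brightness(x, y, t, true_v=(0.5, 0.3)):
    vx, vy = true_v
    return np.sin(0.3 * (x - vx * t)) + np.cos(0.2 * (y - vy * t))

# Sample a small window of pixels at t=0 and t=1.
ys, xs = np.mgrid[20:30, 20:30].astype(float)
E0 = brightness(xs, ys, 0.0)
E1 = brightness(xs, ys, 1.0)

# Finite-difference estimates of E_x, E_y (spatial, at the mid-time t=0.5)
# and E_t (temporal). Spacing is 1 pixel / 1 frame, so no extra division.
Ex = brightness(xs + 0.5, ys, 0.5) - brightness(xs - 0.5, ys, 0.5)
Ey = brightness(xs, ys + 0.5, 0.5) - brightness(xs, ys - 0.5, 0.5)
Et = E1 - E0

# One BCE per pixel: [Ex Ey] [vx vy]^T = -Et, solved by least squares.
A = np.stack([Ex.ravel(), Ey.ravel()], axis=1)
b = -Et.ravel()
v, *_ = np.linalg.lstsq(A, b, rcond=None)
print(v)  # close to the true velocity [0.5, 0.3]
```

On real images the gradients would come from image derivatives (e.g. Sobel filters) and frame differencing, but the normal equations being solved are the same.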

Once we have the flows, we can estimate the 3D motion. A homography describes the linear relationship between the images of points lying on a plane, as seen from two different camera poses. Suppose the camera takes images of the ground during flight; assuming the ground plane is flat, the visual features in two consecutive frames are related by a 3x3 matrix H. If R and T are the inter-frame rotation and translation, N is the unit normal vector of the ground plane in the camera frame, and d is the altitude, then $H = R + \frac{1}{d}TN^T$. Since $N^TN = 1$, multiplying $(H-R)$ by N recovers the translation: $T=d(H-R)N$.

Based on this, when the UAV is moving slowly, the inter-frame rotation is negligible, so we can approximate R by the identity matrix. Once T is estimated, the velocity vector v in the UAV body frame can be computed as $v = \frac{T}{\Delta t}$, where $\Delta t$ is the inter-frame interval.
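The translation-recovery step can be sketched as follows. This is a toy consistency check with made-up numbers (altitude, frame rate, and translation are hypothetical): it constructs H from a known T via $H = R + \frac{1}{d}TN^T$ with R = I, then recovers T and the velocity exactly as in the formulas above. In practice H would instead be estimated from matched features (e.g. with a RANSAC homography fit).

```python
import numpy as np

d = 10.0                               # altitude in metres (hypothetical)
N = np.array([0.0, 0.0, 1.0])          # unit normal of the flat ground in the camera frame
R = np.eye(3)                          # slow motion: rotation assumed negligible
T_true = np.array([0.2, -0.1, 0.05])   # inter-frame translation in metres (hypothetical)

# Forward model: H = R + (1/d) T N^T
H = R + np.outer(T_true, N) / d

# Recover the translation, T = d (H - R) N, then the velocity v = T / dt.
T_est = d * (H - R) @ N
dt = 1.0 / 30.0                        # e.g. a 30 fps camera
v = T_est / dt
print(T_est, v)
```

The recovery is exact here because N is a unit vector, so $(H-R)N = \frac{1}{d}T(N^TN) = \frac{1}{d}T$.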

A simple linear KF could be used to estimate the UAV position. The measurement is the velocity estimated from the homography, and the state $x_k$ comprises the position and velocity of the UAV. The state is updated using the prediction model $x_k = F_k x_{k-1} + B_k u_k + w_k$. Per axis, this amounts to $x_k = x_{k-1} + \dot{x}_{k-1} \Delta t + \frac{1}{2}a \Delta t^2$ for position and $\dot{x}_k = \dot{x}_{k-1} + a \Delta t$ for velocity. $u_k$ is the system input, i.e. the UAV acceleration, which we assume follows a zero-mean normal distribution. $w_k$ is the process noise, which follows $N(0, Q_k)$.

The measurement model is $z_k = G_k x_k + v_k$, where $v_k$ is the measurement noise following $N(0, R_k)$.
