Working notes related to the use of a plenoptic camera for estimating depth within a scene.
Outline
This working paper has two aims related to the implementation of plenoptic cameras as a situational-awareness diagnostic for autonomous vehicles. The first aim is to understand trade-offs in the plenoptic camera’s light field sampling. From a classical estimation perspective, we first wish to characterize the trade-off between the angular sampling of the camera and the lower bound on depth uncertainty. Second, from a Bayesian perspective, we wish to characterize the expected value of this uncertainty for typical scenes. Finally, we wish to answer the question of whether the appropriate use of light field priors will impact the optimal trade-off between angular and spatial sampling. The second aim is to develop an optimal algorithm for estimating depth from the sampled light field. There are many possible approaches to this problem. We plan to start by assessing the practicality of maximum likelihood (ML) and maximum a posteriori (MAP) estimators. Provided that such estimators are workable, we will compare them with more heuristic approaches found within the literature. Finally, approaches based on compressive sensing and machine learning will be addressed.
Plenoptic Camera Trade-offs
The objective of this section is to determine the light field measurement architecture best suited to the application of infrared light field imaging to autonomous vehicles.
Background
The plenoptic camera samples the 4D light field function, which contains features correlating to the depth of objects within a scene. However, there also exist other computational imaging architectures that measure different linear projections of the 4D light field. Coupled with modern statistical image / light field priors, these architectures can also be used to estimate the original light field as a statistical inference problem. A logically preliminary goal would be to assess which architectures’ projections make optimal use of light field priors, as in Levin1. However, we confine ourselves to looking only at the trade-offs contained in the plenoptic camera architecture.
Light Field Sampling
The plenoptic camera samples the 4D radiance function at a single plane within the camera; this sampled function is commonly referred to as the light field. The radiance function is conventionally parameterized by the points $(s,t)$ in the image plane ($I$) and $(u,v)$ in the pupil plane ($P$) such that
\begin{equation} L_I(s,t,u,v)\frac{dsdtdudv}{l^2} \end{equation}
gives the radiant flux along the ray from the region $dsdt$ to the region $dudv$.
A more appropriate parameterization for the task of estimating depth makes use of the point $(u,v)$ on the pupil plane and the point $(S,T)$ on the conjugate of the image plane $(C)$, located a distance $L$ away from the pupil, where
\begin{equation} \frac{1}{L}+\frac{1}{l}=\frac{1}{f} \end{equation}
Because radiance is conserved along a ray, $L_I(s,t,u,v) = L_C(-Ms,-Mt,u,v)$, where $M = L/l$ and the minus sign reflects image inversion.
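As a concrete illustration (with assumed, not prescribed, values): for a lens with $f = 50$ mm focused such that $l = 55$ mm, the conjugate plane sits at $L = 550$ mm and $M = 10$.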
Next, we consider how the radiance from a point $(x,y,z)$ is represented within $L_C$. Formally, given a point at $(x,y,z)$ on a Lambertian surface with radiance $L_O$, what is the set of coordinates $(S,T,u,v)$ such that $L_C(S,T,u,v) = L_O$? By similar triangles, this set contains all elements satisfying the equation
\begin{equation} \frac{u-x}{z} = \frac{x-S}{L-z}, \end{equation}
or
\begin{equation} \label{eq:objectslope} S = x(1-\alpha) + u\alpha, \end{equation}
where $\alpha = 1-L/z$. Moreover, because $S = -Ms$, it can be shown that
\begin{equation} s = x_i(1-\beta)+u\beta = s_0 + u\beta, \end{equation}
where $\beta = -\alpha/M = 1 - l/z_i$, $z_i$ is the image conjugate of the depth $z$, $x_i$ is the ideal (in-focus) image of $x$, and $s_0 = x_i(1-\beta)$ is the center of the circle of confusion of the point on the $I$ plane. To summarize,
\begin{equation} \label{eq:lfrelationships} L_I(s_0+u\beta,t_0+v\beta,u,v) = L_C(x(1-\alpha) + u\alpha,y(1-\alpha) + v\alpha,u,v) = L_O(x,y,z). \end{equation}
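As a quick numerical check of this depth-to-slope relationship, the following sketch (in Python, with assumed illustrative values for $f$, $l$, $x$, and $z$) verifies that a point at depth $z$ traces a line of slope $\beta = -\alpha/M = 1-l/z_i$ in the $(s,u)$ plane.

```python
import numpy as np

# Assumed illustrative geometry (matches the worked example above): f = 50 mm, l = 55 mm.
f, l = 0.050, 0.055                 # focal length, image distance (meters)
L = 1.0 / (1.0 / f - 1.0 / l)       # conjugate-plane distance from the thin-lens equation
M = L / l                           # |magnification| between the I and C planes

def lf_slope(z):
    """Light field slope beta = -alpha/M for a point at depth z."""
    alpha = 1.0 - L / z
    return -alpha / M

# A point at (x, z) maps to the line s = s0 + u*beta in the light field.
z, x = 2.0, 0.01                    # assumed object depth and lateral position (meters)
u = np.linspace(-0.01, 0.01, 5)     # a few pupil coordinates (meters)
S = x * (L / z) + u * (1.0 - L / z) # conjugate-plane intercepts, eq. (objectslope)
s = -S / M                          # image-plane coordinates, using S = -Ms

z_i = 1.0 / (1.0 / f - 1.0 / z)     # image conjugate of the depth z
print(np.allclose(np.diff(s) / np.diff(u), lf_slope(z)))   # slope is constant and equals beta
print(np.isclose(lf_slope(z), 1.0 - l / z_i))               # beta = 1 - l/z_i
```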
The plenoptic camera samples the light field along the $s$, $t$, $u$, and $v$ dimensions, subject to the constraint that the product of the number of samples along each dimension is equal to the total number of detectors, considered to be fixed for the purposes of all analysis that follows, i.e.,
\begin{equation} N = N_sN_tN_uN_v. \end{equation}
We assume additive white Gaussian noise, so that the signal can be modeled as
\begin{equation} \label{eq:model} \Phi_{ijkl} = \mu_{ijkl} + w_{ijkl}, \end{equation}
where the expected value $\mu_{ijkl}$ is the radiance integrated over the 4D sample,
\begin{equation} \label{eq:lfexpectedvalue} \mu_{ijkl} = \frac{1}{l^2}\int_{s_i}^{s_{i+1}}\int_{t_j}^{t_{j+1}}\int_{u_k}^{u_{k+1}}\int_{v_l}^{v_{l+1}}L_I(s,t,u,v)\,ds\,dt\,du\,dv. \end{equation}
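The following minimal sketch (assumed toy radiance and parameter values, not the edge model analyzed below) illustrates the measurement model: an analytic $L_I(s,t,u,v)$ is sampled on a coarse $N_s\times N_t\times N_u\times N_v$ grid via a midpoint approximation of (\ref{eq:lfexpectedvalue}) and corrupted with white Gaussian noise as in (\ref{eq:model}).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sampling and noise parameters (illustrative only).
Ns, Nt, Nu, Nv = 8, 8, 4, 4
ds = dt = 1e-4                      # image-plane sample pitch (meters)
du = dv = 1e-3                      # pupil-plane sample pitch (meters)
l = 0.055                           # pupil-to-image distance (meters)
sigma = 1e-3                        # noise standard deviation

def L_I(s, t, u, v):
    """Toy radiance: a soft edge along s whose position shifts linearly with u."""
    beta = 0.2                      # assumed light field slope
    return 1.0 / (1.0 + np.exp(-(s - beta * u) / ds))

# Sample midpoints of each 4D bin.
s = (np.arange(Ns) + 0.5) * ds
t = (np.arange(Nt) + 0.5) * dt
u = (np.arange(Nu) + 0.5 - Nu / 2) * du
v = (np.arange(Nv) + 0.5 - Nv / 2) * dv
S, T, U, V = np.meshgrid(s, t, u, v, indexing="ij")

mu = L_I(S, T, U, V) * ds * dt * du * dv / l**2   # midpoint rule for eq. (lfexpectedvalue)
phi = mu + sigma * rng.normal(size=mu.shape)      # eq. (model): Phi = mu + w
print(phi.shape)                                   # (Ns, Nt, Nu, Nv)
```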
Classical Estimation Paradigm
To begin our depth uncertainty analysis, we will employ a classical estimation framework to lower bound the depth estimation variance for an unbiased estimator. For this to be feasible, we will need to assume a simple model for the scene. We will begin with a model consisting of a simple edge of uniform depth and known magnitude. Given this model, we can easily determine how changing the depth of the edge affects the information contained in the sampled light field, allowing for the determination of the Fisher information and Cramér-Rao lower bound (CRLB). Outside of the simple edge model, it will be difficult to select object-space models leading to tractable determinations of the CRLB. Instead, we will move toward modeling the sampled light field itself and looking at the CRLB for light field slope estimation. Even given this simplification, it will not be possible to estimate the uncertainty for a complex scene due to the confounding effects of neighboring pixels. At this point, we will move to a Bayesian estimation framework, using generated light fields to implicitly model the light field prior distribution.
Simple Edge Model
Given (\ref{eq:model}), the probability of observing the light field measurement $\mathbf{\Phi}$ is
\begin{equation} p(\mathbf{\Phi};z) = \frac{1}{(2\pi\sigma^2)^{N/2}}\mathrm{exp}\left(-\frac{1}{2\sigma^2}\sum_{ijkl}(\Phi_{ijkl}-\mu_{ijkl})^2\right). \end{equation}
We wish to determine the Fisher information, $I(z)$, which is given by
\begin{equation} \label{eq:fisherinfo} I(z) = -E\left(\frac{\partial^2\mathrm{ln}p(\mathbf{\Phi};z)}{\partial{z}^2}\right) = \frac{1}{\sigma^2}\sum_{ijkl}\left(\frac{\partial\mu_{ijkl}}{\partial{z}}\right)^2. \end{equation}
We will then have that the variance of any unbiased estimator $\hat{z}$ is bounded by the reciprocal of $I(z)$, i.e.,
\begin{equation} \mathrm{var}(\hat{z}) \geq \frac{1}{I(z)}. \end{equation}
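For a general forward model $\mu_{ijkl}(z)$, the CRLB in the two equations above can be evaluated numerically. The sketch below (hypothetical helper names, a finite-difference derivative, and an assumed toy model) is one way to do so; it is not tied to the edge model that follows.

```python
import numpy as np

def crlb(mu, z, sigma, dz=1e-4):
    """Cramer-Rao lower bound on var(z_hat) for the AWGN model.

    mu: callable returning the vector of expected sample values at depth z.
    The derivative d(mu)/dz is approximated by central differences.
    """
    dmu_dz = (mu(z + dz) - mu(z - dz)) / (2.0 * dz)
    fisher = np.sum(dmu_dz**2) / sigma**2          # eq. (fisherinfo)
    return 1.0 / fisher                            # lower bound on the variance

# Usage with a hypothetical toy model in which each sample shifts linearly with depth.
def toy_mu(z):
    grid = np.linspace(-1.0, 1.0, 64)
    return 1.0 / (1.0 + np.exp(-(grid - 0.1 * z) / 0.05))

print(crlb(toy_mu, z=2.0, sigma=1e-2))
```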
In order to evaluate the derivative in (\ref{eq:fisherinfo}), one approach is to recast (\ref{eq:lfexpectedvalue}), as in
\begin{equation} \label{eq:objectmodel} \mu_{ijkl} = \frac{1}{z^2}\int_{u_k}^{u_{k+1}}\int_{v_l}^{v_{l+1}}\int_{x_i(u,z)}^{x_{i+1}(u,z)}\int_{y_j(v,z)}^{y_{j+1}(v,z)}L_O(x,y,z)\,dx\,dy\,du\,dv, \end{equation}
such that it explicitly shows the sampling of the radiance from an object located at a fixed depth, $z$. However, taking the derivative of (\ref{eq:objectmodel}) reveals a number of competing radiometric effects, namely the decreasing solid angle of the pupil versus the increasing field of view of the detector. Demonstrating that these effects cancel out is cumbersome. It is much easier to use (\ref{eq:lfexpectedvalue}) directly, making use of the equivalence of radiance indicated in (\ref{eq:lfrelationships}). To evaluate the integral, we define the edge model: a Lambertian surface at fixed depth $z$ that is uniformly bright on one side of an edge at $x = 0$,
\begin{equation} L_O(x,y,z) = \begin{cases} L_0, & x \leq 0, \\ 0, & x > 0. \end{cases} \end{equation}
We can then use (\ref{eq:objectslope}) to show that $L_I$ equals $L_0$ on one side of the line $s = u\left(\frac{l}{L}-\frac{l}{z}\right)$ and zero on the other; the edge thus appears in the sampled light field as a line whose slope in the $(s,u)$ plane encodes the depth $z$.
Note that for samples which do not contain the edge, the expected value $\mu_{ijkl}$ does not change with $z$. For the $N_tN_uN_v$ samples that do contain the edge (one $s$ sample for each combination of $t$, $u$, and $v$ samples), the expected value simplifies to
\begin{equation} \mu_{ijkl} = \frac{\Delta{t}\Delta{v}}{l^2}\int_{u_k}^{u_{k+1}}\int_{s_i}^{u\left(\frac{l}{L}-\frac{l}{z}\right)}L_0dsdu = \frac{L_0\Delta{t}\Delta{v}}{l^2}\int_{u_k}^{u_{k+1}}\left[u\left(\frac{l}{L}-\frac{l}{z}\right)-s_i\right]du \end{equation}
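As a sanity check on this closed form (a minimal numerical sketch with assumed, illustrative values chosen so that the edge crosses the sample), the step-function $L_I$ can be integrated over one $(s,u)$ sample by brute force and compared against the expression above.

```python
import numpy as np

# Assumed illustrative values (chosen so the edge lies inside the s sample for all u in the bin).
L0 = 1.0                          # edge radiance
dt_, dv_ = 1e-4, 1e-3             # t and v sample sizes (meters)
l, L, z = 0.055, 0.55, 2.0        # image distance, conjugate distance, object depth (meters)
s_i, s_ip1 = 0.0, 1e-4            # s sample containing the edge
u_k, u_kp1 = 5e-4, 1e-3           # k-th pupil sample
slope = l / L - l / z             # edge position in s is u * slope

# Brute-force midpoint-rule integration of the step-function L_I over the sample.
s = s_i + (np.arange(2000) + 0.5) * (s_ip1 - s_i) / 2000
u = u_k + (np.arange(2000) + 0.5) * (u_kp1 - u_k) / 2000
S, U = np.meshgrid(s, u, indexing="ij")
frac_bright = np.mean(S <= U * slope)                       # fraction of the bin below the edge
numeric = L0 * dt_ * dv_ / l**2 * frac_bright * (s_ip1 - s_i) * (u_kp1 - u_k)

# Closed form from the equation above.
closed = (L0 * dt_ * dv_ / l**2) * (0.5 * slope * (u_kp1**2 - u_k**2) - s_i * (u_kp1 - u_k))
print(np.isclose(numeric, closed, rtol=1e-2))
```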
Differentiating with respect to $z$ gives
\begin{equation} \frac{\partial\mu_{ijkl}}{\partial{z}} = \frac{L_0\Delta{t}\Delta{v}}{l^2}\frac{l}{z^2}\int_{u_k}^{u_{k+1}}u\,du = \frac{L_0\Delta{t}\Delta{v}\Delta{u}}{l^2}\frac{l}{z^2}\bar{u}_k, \end{equation}
where $\bar{u}_k = \left(k+\tfrac{1}{2}\right)\Delta{u}$ is the midpoint of the $k$-th pupil sample $[k\Delta{u},(k+1)\Delta{u}]$. This can be rewritten as
\begin{equation} \frac{\partial\mu_{ijkl}}{\partial{z}} = \frac{L_0\Delta{t}\Delta{s}\Delta{u}\Delta{v}}{l^2}\frac{l}{z^2}\frac{\Delta{u}}{\Delta{s}}\left(k+\tfrac{1}{2}\right) = L_0\Delta{q}^2(FN)^2\frac{l}{z^2}\frac{\Delta{u}}{\Delta{s}}\left(k+\tfrac{1}{2}\right), \end{equation}
where the factor $L_0\Delta{q}^2(FN)^2$ can be considered the average signal detected due to the radiance step $L_0$ by a conventional camera with detector size $\Delta{q}$ (here $FN = D/l$, with $D$ the pupil diameter, and $\Delta{q}^2 = \Delta{s}\Delta{t}/(N_uN_v)$ is the detector area of a conventional camera with the same $N$ detectors). We can now solve for the Fisher information via (\ref{eq:fisherinfo}),
\begin{equation} I_{total}(z) = \frac{N_tN_v}{\sigma^2}\sum_{k=1}^{N_u}\left(\frac{\partial\mu_{ijkl}}{\partial{z}}\right)^2 = N_tN_v\left(\frac{L_0\Delta{q}^2(FN)^2}{\sigma}\right)^2\left(\frac{l}{z^2}\frac{\Delta{u}}{\Delta{s}}\right)^2\left(\frac{N_u^3}{3}+N_u^2+\frac{11N_u}{12}\right), \end{equation}
where we have interpreted the first factor in parentheses as the SNR resulting from the edge discrepancy in a conventional camera. This is the information about the depth of the entire surface as a whole. Since this information was derived from the information in the $N_t$ pixels containing the edge, we would do better to divide the information up among these pixels, giving $I(z) = I_{total}(z)/N_t$, if we wish to come closer to understanding the performance of independent point estimation.
The final step is to invert the Fisher information and substitute $\Delta{u}/\Delta{s} = D/(N_u^2\Delta{q})$ (since the pupil of diameter $D$ is divided into $N_u$ angular samples, $\Delta{u} = D/N_u$, and each macro-pixel of width $\Delta{s}$ spans $N_u$ detectors of width $\Delta{q}$) to obtain the lower bound on the variance:
\begin{equation} \mathrm{var}(\hat{z}) \geq \frac{N_u/N_v}{\frac{1}{3}+\frac{1}{N_u}+\frac{11}{12}\frac{1}{N_u^2}}\left(\frac{z^2\Delta{q}/Dl}{\mathrm{SNR}}\right)^2 \end{equation}
Interestingly, for the edge model, the achievable estimation performance is not improved by increasing angular samples.
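To make this concrete, the bound can be evaluated for a range of angular sample counts. The sketch below uses assumed, illustrative values for $z$, $\Delta{q}$, $D$, $l$, and the SNR; with symmetric angular sampling ($N_v = N_u$), the bound does not decrease as $N_u$ grows.

```python
def depth_var_bound(Nu, Nv, z=2.0, dq=5e-6, D=0.01, l=0.055, snr=50.0):
    """Lower bound on var(z_hat) from the expression above (assumed illustrative defaults)."""
    denom = 1.0 / 3.0 + 1.0 / Nu + (11.0 / 12.0) / Nu**2
    return (Nu / Nv) / denom * (z**2 * dq / (D * l * snr))**2

for Nu in (1, 2, 4, 8, 16):
    # Symmetric angular sampling (Nv = Nu): the bound does not shrink as Nu grows.
    print(Nu, depth_var_bound(Nu, Nu))
```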
-
Levin, A., Freeman, W. T., & Durand, F. (2008). Understanding camera trade-offs through a Bayesian analysis of light field projections. In European Conference on Computer Vision (pp. 88–101). Springer, Berlin, Heidelberg.