• Chinese Optics Letters
  • Vol. 14, Issue 12, 121101 (2016)
Min-Chul Lee1, Kotaro Inoue1, Cheol-Su Kim2, and Myungjin Cho3,*
Author Affiliations
  • 1Department of Computer Science and Electronics, Kyushu Institute of Technology, Fukuoka 820-8502, Japan
  • 2Department of Electrical Energy and Computer Engineering, Gyeongju University, 188 Taejongro, Gyeongju City, Kyeongsangbuk-do 38065, Republic of Korea
  • 3Department of Electrical, Electronic, and Control Engineering, IITC, Hankyong National University, 327 Chungang-ro, Anseong-si, Gyonggi-do 456-749, Republic of Korea
    DOI: 10.3788/COL201614.121101
    Min-Chul Lee, Kotaro Inoue, Cheol-Su Kim, Myungjin Cho. Regeneration of elemental images in integral imaging for occluded objects using a plenoptic camera[J]. Chinese Optics Letters, 2016, 14(12): 121101

    Abstract

    In this Letter, we propose an elemental image regeneration method for three-dimensional (3D) integral imaging of occluded objects using a plenoptic camera. In conventional occlusion removal techniques, the information in the occlusion layers may be lost; the elemental images then have cracked parts, and the visual quality of the reconstructed 3D image is degraded. These cracked parts, however, can be interpolated from adjacent elemental images. Therefore, in this Letter, we improve the visual quality of reconstructed 3D images by interpolating and regenerating virtual elemental images from adjacent elemental images after removing the occlusion layers. To validate our proposed method, we carry out optical experiments and calculate performance metrics such as the mean square error (MSE) and the peak signal-to-noise ratio (PSNR).

    Integral imaging, first proposed by Lippmann in 1908[1], has been used to develop next-generation three-dimensional (3D) imaging and display techniques. To obtain and visualize 3D images, two main steps are required: pickup and reconstruction. In pickup, rays from 3D objects are captured through a lenslet array on an image sensor such as a charge-coupled device (CCD). The captured rays form multiple two-dimensional (2D) images with different perspectives of the 3D objects, which are referred to as elemental images. In the reconstruction or display stage, these elemental images are printed or displayed on a display device, such as a liquid crystal display (LCD), through the same lenslet array used in pickup, so 3D images can be observed without special viewing glasses. Integral imaging does not require the coherent light source used in holography. In addition, it provides full color, full parallax, and continuous viewing points of 3D objects. In particular, it provides the depth information of 3D objects with a passive imaging system. Therefore, it can be applied to occlusion removal techniques for 3D objects[2–11].

    Since integral imaging obtains multi-view information of 3D objects, a depth map may be generated from the parallax between elemental images captured at different viewing points. Occlusions may then be removed by classifying objects and occlusion layers using the depth map and the elemental images. However, this method has two main problems. The first is that the resolutions of the elemental images and the depth map are very low in lenslet-array-based integral imaging. The second is that information in the elemental images may be lost during occlusion removal.

    In this Letter, to solve these problems, we propose an elemental image regeneration method for 3D integral imaging of occluded objects using a plenoptic camera. A plenoptic camera, which is a modified version of an integral imaging system, records the light field (the positions and directions of rays) by placing the main imaging lens in front of the lenslet array. It can capture a high-resolution depth map and an all-in-focus image in a single shot, and it simplifies the process of the conventional occlusion removal technique. To record elemental images with high resolution, in this Letter we use the synthetic aperture integral imaging (SAII) of Jang et al.[12].

    SAII can capture elemental images with the same resolution as the image sensor by replacing the lenslet array with a camera array, thereby improving the resolution of the elemental images. Finally, the cracked parts of the elemental images can be interpolated from adjacent elemental images to enhance them. Since elemental images carry multi-view information, it is possible to interpolate the cracked parts from adjacent elemental images, which can be carried out by inverse computational integral imaging reconstruction (CIIR)[2,13–16].

    First, we present our proposed method. A light field is a vector function of the positions and angles of rays. In general, a light field is defined in five dimensions, consisting of the 3D spatial coordinates and 2D angles; this is referred to as a 5D light field. However, light intensity along a ray is invariant in an optical system, according to the brightness invariance principle. Therefore, the 5D light field can be reduced to a 4D light field. A plenoptic camera records this 4D light field, so it can adjust the position of the focal plane of an image or estimate a depth map.

    The 4D light field function is shown in Fig. 1(a). Rays from the objects are recorded as their intersection points with two 2D planes. That is, L(x, y, u, v) is parameterized by the coordinates (x, y) and (u, v) on the XY and UV planes, respectively; this is equivalent to recording the intersection coordinates on one of the 2D planes together with the angles about the two axes. The concept of the 4D light field for a plenoptic camera is illustrated in Fig. 1(b). The difference between a plenoptic camera and a conventional camera is the lenslet array placed between the main imaging lens and the image sensor. In a conventional camera, rays are recorded only at the coordinates of the image sensor (i.e., 2D information). In a plenoptic camera, however, the intersection coordinates of rays on the two planes can be found by imaging the object rays through both the main lens and the lenslet array.


    Figure 1. 4D light field function. (a) Overview and (b) plenoptic camera.

    A plenoptic camera can reconstruct an image focused at an arbitrary position from the recorded 4D light field. This technique is called refocusing, and it can be implemented by shifting a virtual image sensor plane. It is very simple to carry out by shifting and averaging the sub-aperture images, as shown in Fig. 2[17]. It is similar to CIIR, but its equations are different because it uses the light field function.


    Figure 2. Sub-aperture images.
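    To make the notion of sub-aperture images concrete, the following is a minimal numpy sketch (ours, not the Letter's code) that rearranges an already rectified plenoptic lenslet image into a 4D light field. The function name, the array layout, and the assumption that each microlens covers exactly pv × pu sensor pixels are ours; real Lytro decoding additionally involves demosaicking and hexagonal-grid rectification.

```python
import numpy as np

def extract_subaperture_images(lenslet_img, pu, pv):
    """Rearrange an aligned plenoptic (lenslet) image into sub-aperture images.

    lenslet_img: (Y*pv, X*pu) grayscale array; each microlens covers a
                 contiguous pv x pu block of pixels.
    Returns a 4D light field indexed as L[v, u, y, x], so that L[v, u] is the
    sub-aperture image seen from aperture position (u, v).
    """
    H, W = lenslet_img.shape
    Y, X = H // pv, W // pu
    # Split each microlens block apart: axis order (y, v, x, u) -> (v, u, y, x).
    return lenslet_img.reshape(Y, pv, X, pu).transpose(1, 3, 0, 2)
```

    For example, `L = extract_subaperture_images(raw, pu=14, pv=14)` would assume a microlens pitch of about 14 pixels, which is only an approximation for the camera used later in this Letter.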

    For simplicity, let us consider the movement of the 2D virtual image sensor plane X, as shown in Fig. 3. The X plane is placed at a distance F from the U plane, and the light field L(x, u) consists of the rays passing through coordinate u on the U plane and coordinate x on the X plane. For refocusing, when the X plane is moved to the X′ plane at a distance F′ from the U plane, the light field L′ recorded on the X′ plane can be described as a movement of the x coordinate of L. With the expansion coefficient α = F′/F, L′ can be written as follows[18]:

    $$L'(x', u) = L\left(u + \frac{x' - u}{\alpha},\ u\right). \tag{1}$$

    This equation can be extended to the 4D light field:

    $$L'(x', y', u, v) = L\left(u + \frac{x' - u}{\alpha},\ v + \frac{y' - v}{\alpha},\ u,\ v\right). \tag{2}$$

    Moving the virtual image sensor plane is the same as moving the position of the XY plane in the recording coordinates of the light field. It is well known that a 2D image can be obtained from a 4D light field by integrating the light field. Therefore, the image E_F at distance F can be described as follows:

    $$E_F(x, y) = \frac{1}{F^2} \iint L(x, y, u, v)\, \mathrm{d}u\, \mathrm{d}v. \tag{3}$$

    By substituting Eq. (2) into Eq. (3), we obtain the following equation:

    $$E_{F'}(x', y') = \frac{1}{\alpha^2 F^2} \iint L\left[u\left(1 - \frac{1}{\alpha}\right) + \frac{x'}{\alpha},\ v\left(1 - \frac{1}{\alpha}\right) + \frac{y'}{\alpha},\ u,\ v\right] \mathrm{d}u\, \mathrm{d}v. \tag{4}$$

    From Eq. (4), it can be seen that the image can be reconstructed by shifting and averaging the x, y coordinates of the light field L and expanding the image by the expansion coefficient α. That is, when L_(u,v) denotes the sub-aperture image in the uth column and the vth row, the 2D image can be obtained by shifting and averaging sub-aperture images instead of the elemental images of integral imaging. Therefore, Eq. (4) can be rewritten as[18]

    $$E_{F'}(x', y') = \frac{1}{\alpha^2 F^2} \iint L_{(u,v)}\left[u\left(1 - \frac{1}{\alpha}\right) + \frac{x'}{\alpha},\ v\left(1 - \frac{1}{\alpha}\right) + \frac{y'}{\alpha}\right] \mathrm{d}u\, \mathrm{d}v. \tag{5}$$


    Figure 3. Refocusing.
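    As an illustration of Eq. (5), here is a shift-and-average refocusing sketch in Python. It is our own minimal implementation, not the Letter's code: it assumes grayscale sub-aperture images stored as L[v, u, y, x], uses bilinear interpolation for the sub-pixel shifts, and omits the global 1/α magnification of the output coordinates.

```python
import numpy as np
from scipy.ndimage import shift as subpixel_shift

def refocus(L, alpha):
    """Shift-and-average refocusing in the spirit of Eq. (5).

    L:     4D light field indexed L[v, u, y, x] (grayscale sub-aperture images).
    alpha: expansion coefficient F'/F; alpha = 1 reproduces the captured focus.
    """
    V, U, Y, X = L.shape
    vc, uc = (V - 1) / 2.0, (U - 1) / 2.0   # shifts taken relative to the central view
    acc = np.zeros((Y, X))
    for v in range(V):
        for u in range(U):
            # Sampling L_(u,v) at x' + u(1 - 1/alpha) is a translation of the
            # sub-aperture image by minus that offset.
            dy = -(v - vc) * (1.0 - 1.0 / alpha)
            dx = -(u - uc) * (1.0 - 1.0 / alpha)
            acc += subpixel_shift(L[v, u], (dy, dx), order=1, mode='nearest')
    return acc / (U * V)
```

    Choosing α slightly larger or smaller than 1 moves the virtual focal plane backward or forward; the averaged stack is the refocused image of Eq. (5) up to the omitted magnification.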

    The depth map can be estimated from the light field. This function is included in the Lytro software, but the algorithm has not been made public. Thus, in this Letter, we present our own depth map estimation. The depth map is a 16-bit grayscale image whose brightness is determined by the Lambda parameter used for refocusing: LambdaMin corresponds to brightness 0, and LambdaMax to brightness $2^{16}-1$. To estimate the physical distance from these Lambda values, a calibration process is required because the brightness of the depth map does not directly give the physical distance.
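    As a small illustration of this mapping (our sketch, not the Letter's code; the numeric LambdaMin and LambdaMax values are Lytro software parameters that the Letter does not give), a 16-bit depth-map intensity can be converted to a refocus Lambda as follows:

```python
def intensity_to_lambda(t, lambda_min, lambda_max):
    """Linearly map a 16-bit depth-map intensity t (0 .. 2**16 - 1) onto the
    refocus parameter range [lambda_min, lambda_max].  The Letter bypasses
    Lambda and calibrates intensity directly against physical distance."""
    return lambda_min + (t / (2 ** 16 - 1)) * (lambda_max - lambda_min)
```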

    Regeneration of elemental images has two main stages: occlusion removal, and interpolation of the cracked parts caused by occlusion removal. Table 1 lists the system parameters and their definitions, and the images at each stage are shown in Fig. 4.


    Figure 4. Images at each stage (k = 3, l = 3): (a) EI, (b) D, (c) OL, (d) OREI, (e) VEI, and (f) REI.

    Parameter   Definition
    C           Size of the image sensor
    D           Depth map
    EI          Elemental image
    M           Movement of the virtual object
    N           Number of pixels in each elemental image
    OL          Occlusion layer
    OREI        Occlusion-removed elemental image
    REI         Regenerated elemental image
    Sg          Number of shifted pixels for regeneration
    Sr          Number of shifted pixels for reconstruction
    Th          Threshold for the occlusion layer
    VEI         Virtual elemental image
    d           Reconstruction distance
    f           Focal length of the camera lens
    p           Moving gap between image sensors

    Table 1. Definition of Parameters

    Occlusion removal can be implemented by thresholding the depth map. Let the depth map be D and the occlusion layer be OL. Then OL can be written as

    $$OL^{(k,l)}(x,y) = \begin{cases} 1, & D^{(k,l)}(x,y) > Th \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

    where OL^(k,l) is the occlusion layer of the kth column and the lth row, (x, y) is the pixel position, and Th is the threshold value for occlusion removal. Then, occlusions can be removed from the elemental images. Let the elemental image be EI and the elemental image with occlusions removed be OREI. OREI is written as

    $$OREI^{(k,l)}(x,y) = \begin{cases} EI^{(k,l)}(x,y), & OL^{(k,l)}(x,y) = 1 \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$

    Since the elemental image has many zero-brightness pixels after occlusion removal, its visual quality may be degraded. Thus, in this Letter, the elemental image is interpolated and regenerated using adjacent elemental images. Regeneration is carried out by shifting adjacent elemental images and generating a virtual elemental image.
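    In array form, Eqs. (6) and (7) amount to a threshold mask and an elementwise product. A minimal numpy sketch (ours, with hypothetical function names) follows:

```python
import numpy as np

def occlusion_mask(D, th):
    """Eq. (6): OL = 1 where the depth-map intensity lies beyond the threshold Th."""
    return (D > th).astype(np.uint8)

def remove_occlusion(EI, OL):
    """Eq. (7): keep elemental-image pixels where OL = 1, zero out the rest.

    EI may be (Y, X) grayscale or (Y, X, 3) color; OL is a (Y, X) binary mask."""
    mask = OL if EI.ndim == 2 else OL[..., None]
    return EI * mask
```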

    OREI can be sliced by the intensities of the depth map as follows:

    $$Sliced^{(k,l)}(x,y,t) = \begin{cases} OREI^{(k,l)}(x,y), & D^{(k,l)}(x,y) = t \\ 0, & \text{otherwise} \end{cases} \quad (Th < t \le t_{\max}), \tag{8}$$

    where t is the intensity of the depth map and t_max is its maximum intensity. Then, the movement of the elemental images is calculated as depicted in Fig. 5(a). Let (k, l) be the coordinates of the elemental image currently being regenerated and (m, n) be the coordinates of the elemental image used for interpolation. The movements of the elemental image in the x and y directions, M_x and M_y, are written as

    $$M_x(k,m) = (k-m)p, \qquad M_y(l,n) = (l-n)p, \tag{9}$$

    where p is the distance between cameras in SAII. Using these movements, the numbers of shifted pixels for each elemental image, as shown in Fig. 5(b), can be written as follows:

    $$S_{gx}(k,m,t) = \frac{EI_x f M_x(k,m)}{c_x z(t)}, \qquad S_{gy}(l,n,t) = \frac{EI_y f M_y(l,n)}{c_y z(t)}, \tag{10}$$

    where EI_x and EI_y are the numbers of pixels of the image sensor in the x and y directions, and z(t) is the function that transforms the intensity of the depth map into the physical distance. This function depends on the specifications of the plenoptic camera and on the calibration method. Thus, the virtual elemental image VEI can be written as

    $$VEI^{(k,l)}(x,y) = \frac{1}{O(x,y)} \sum_{t=Th+1}^{t_{\max}} \sum_{m=1}^{K} \sum_{n=1}^{L} Sliced^{(m,n)}\big(x + S_{gx}(k,m,t),\ y + S_{gy}(l,n,t),\ t\big), \tag{11}$$

    where O(x, y) is the superposition matrix of CIIR. Equation (11) is the inverse of CIIR. Finally, the regenerated elemental image is obtained from OREI and VEI as follows:

    $$REI^{(k,l)}(x,y) = \begin{cases} OREI^{(k,l)}(x,y), & OL^{(k,l)}(x,y) = 1 \\ VEI^{(k,l)}(x,y), & \text{otherwise}. \end{cases} \tag{12}$$

    To verify our proposed method, we carried out computer simulations. The parameters are described in Table 1. In CIIR, the numbers of shifted pixels for each elemental image, S_rx and S_ry, are

    $$S_{rx} = \frac{N_x p f}{c_x d}, \qquad S_{ry} = \frac{N_y p f}{c_y d}. \tag{13}$$

    Finally, the reconstructed 3D image at distance d is obtained by the following equation:

    $$I(x,y,d) = \frac{1}{O(x,y)} \sum_{k=0}^{K-1} \sum_{l=0}^{L-1} REI^{(k,l)}(x - kS_{rx},\ y - lS_{ry}). \tag{14}$$

    Next, we show the experimental results. The depth map from the Lytro software does not represent the physical depth directly. Thus, we place a reference object at a fixed distance and measure its distance by stereo matching, so that we can estimate the relation between the intensity of the depth map and the physical depth. In the pickup stage, a LYTRO ILLUM is used, where the resolution of the camera is 2022(H) × 1404(V) pixels, the focal length of the camera lens is f = 70 mm, the refocus range is 400–750 mm, and the distance between cameras is p = 10 mm. When the shift S (in pixels) between two elemental images is known, the depth d can be calculated using the following equation:

    $$d = \frac{N_x p f}{c_x S} = \frac{2022 \times 10 \times 70}{36 \times S}. \tag{15}$$


    Figure 5. Overview of the algorithm. (a) Movement of elemental images and (b) shifted pixels for regeneration.
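    To make the regeneration pipeline concrete, here is a compact numpy sketch of Eqs. (8)–(14). It is our own illustration, not the authors' code: it assumes grayscale images, rounds the shifts of Eq. (10) to integer pixels, and takes the pixel counts EI_x and N_x to be the array width X (and likewise in y); `shift2d`, `regenerate`, and `ciir_reconstruct` are hypothetical helper names.

```python
import numpy as np

def shift2d(img, dy, dx):
    """Zero-fill translation such that out[y, x] = img[y + dy, x + dx]."""
    out = np.zeros_like(img)
    h, w = img.shape
    ys, yd = (dy, 0) if dy >= 0 else (0, -dy)
    xs, xd = (dx, 0) if dx >= 0 else (0, -dx)
    out[yd:h - ys, xd:w - xs] = img[ys:h - yd, xs:w - xd]
    return out

def regenerate(k, l, EI, D, OL, th, p, f, cx, cy, z):
    """Sketch of Eqs. (8)-(12): regenerate elemental image (k, l) by filling the
    holes left by occlusion removal with depth-sliced, shifted pixels from the
    adjacent elemental images.

    EI, D, OL: (K, L, Y, X) stacks of elemental images, depth maps, and masks.
    z(t):      calibrated intensity-to-distance function, cf. Eq. (16)."""
    K, L, Y, X = D.shape
    OREI = EI * OL                                   # Eq. (7)
    acc = np.zeros((Y, X))
    count = np.zeros((Y, X))                         # superposition matrix O(x, y)
    for t in range(th + 1, int(D.max()) + 1):
        zt = z(t)
        for m in range(K):
            for n in range(L):
                # Eqs. (9)-(10): camera movements (k - m)p and (l - n)p become
                # pixel shifts Sg = pixels * f * M / (sensor size * z(t)).
                sgx = int(round(X * f * (k - m) * p / (cx * zt)))
                sgy = int(round(Y * f * (l - n) * p / (cy * zt)))
                sliced = np.where(D[m, n] == t, OREI[m, n], 0.0)       # Eq. (8)
                acc += shift2d(sliced, sgy, sgx)                        # Eq. (11)
                count += shift2d((D[m, n] == t).astype(float), sgy, sgx)
    VEI = np.divide(acc, count, out=np.zeros_like(acc), where=count > 0)
    return np.where(OL[k, l] == 1, OREI[k, l], VEI)                     # Eq. (12)

def ciir_reconstruct(REI, d, p, f, cx, cy):
    """Eqs. (13)-(14): reconstruct the plane image at distance d by shifting and
    superimposing the regenerated elemental images (REI: (K, L, Y, X))."""
    K, L, Y, X = REI.shape
    srx = int(round(X * p * f / (cx * d)))
    sry = int(round(Y * p * f / (cy * d)))
    acc = np.zeros((Y, X))
    count = np.zeros((Y, X))
    for kk in range(K):
        for ll in range(L):
            acc += shift2d(REI[kk, ll], -ll * sry, -kk * srx)
            count += shift2d(np.ones((Y, X)), -ll * sry, -kk * srx)
    return np.divide(acc, count, out=np.zeros_like(acc), where=count > 0)
```

    The per-pixel `count` array plays the role of the superposition matrix O(x, y), normalizing regions where different numbers of shifted images overlap.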

    Table 2 and Fig. 6 show the measurement results. These depths are converted to the intensity t of the depth map. As shown in Fig. 6, the relation between the intensity of the depth map and the physical depth is linear, so a linear approximation can be found by the least squares method. The resulting experimental equation for the transformation between the intensity of the depth map and the physical depth is

    $$z(t) = 7.33t - 522.1. \tag{16}$$

    This equation is used to calculate S_gx and S_gy in the regeneration of the elemental images.


    Figure 6. Linear approximation results.

              Near     Middle   Far
    S (px)    105      71       57
    d (mm)    374.4    553.8    689.8
    t         118      149      164

    Table 2. Calibration Results
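    The least-squares fit behind Eq. (16) can be reproduced in a few lines of numpy. Note that the Letter's coefficients may rest on more calibration points than the three tabulated above, so this sketch illustrates the procedure rather than the exact numbers:

```python
import numpy as np

# Calibration points from Table 2: depth-map intensity t vs. measured depth d (mm).
t = np.array([118.0, 149.0, 164.0])
d = np.array([374.4, 553.8, 689.8])

slope, intercept = np.polyfit(t, d, 1)   # least-squares line d = slope * t + intercept
print(f"z(t) = {slope:.2f} * t {intercept:+.1f}  (mm)")
```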

    In our experiment, there are two pickup scenarios: with occlusion and without occlusion. The 3D objects without occlusion are used to calculate the mean square error (MSE) and the peak signal-to-noise ratio (PSNR) as follows:

    $$\mathrm{MSE} = E\big[(\mathrm{Ref} - I_{\mathrm{input}})^2\big], \tag{17}$$

    $$\mathrm{PSNR} = 20 \log_{10}\!\left(\frac{\mathrm{MAX}_I}{\sqrt{\mathrm{MSE}}}\right), \tag{18}$$

    where E[·] is the expectation operator, Ref is the reference image, I_input is the reconstruction result, and MAX_I is the maximum pixel intensity of the image. The occlusion is placed in front of the left shoulder of the object.
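    A direct numpy transcription of Eqs. (17) and (18) is given below (our sketch; the default `max_i` assumes 8-bit images):

```python
import numpy as np

def mse_psnr(ref, img, max_i=255.0):
    """Eqs. (17)-(18): mean square error and peak signal-to-noise ratio."""
    diff = ref.astype(np.float64) - img.astype(np.float64)
    mse = np.mean(diff ** 2)                      # E[(Ref - I_input)^2]
    psnr = 20.0 * np.log10(max_i / np.sqrt(mse))  # in dB
    return mse, psnr
```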

    Figure 7 shows the reconstructed 3D images at the reconstruction distance d = 620 mm for each method. The reconstructed 3D image obtained from the elemental images without occlusion, shown in Fig. 7(a), is the reference for the MSE and PSNR. Figures 7(b) and 7(c) show the reconstructed 3D images obtained from the elemental images after conventional occlusion removal and by our proposed method, respectively. As shown in the enlarged views, the characters "BF-37" on the shoulder of the object can be easily recognized in Fig. 7(f). To evaluate the visual quality of the reconstructed 3D images, we calculate the MSE and PSNR, as shown in Fig. 8. Our proposed method obtains clearly better results: the MSE is improved by 60%, and the PSNR by 7 dB.


    Figure 7. Experimental results at d = 620 mm: (a) original, (b) conventional occlusion removal, (c) proposed method, and (d)–(f) enlarged views of (a)–(c), respectively.


    Figure 8. Image quality evaluated by the MSE and PSNR.

    In this Letter, we have proposed a regeneration technique for the elemental images of integral imaging using a plenoptic camera after occlusion removal. In conventional methods, image information may be lost after occlusion removal; in our proposed method, by contrast, this information is interpolated from adjacent elemental images. However, our method has some limitations. The visual quality of the regenerated elemental images depends on the accuracy of the depth map and of the calibration. In addition, our method uses an averaging process for the 3D reconstruction, so high spatial frequencies may be lost. We will address these problems in future work.
