Chinese Optics Letters, Vol. 18, Issue 7, 070901 (2020)

Real-time spatiotemporal division multiplexing electroholography for 1,200,000 object points using multiple-graphics processing unit cluster
Hiromi Sannomiya1, Naoki Takada2,*, Kohei Suzuki1, Tomoya Sakaguchi1, Hirotaka Nakayama3, Minoru Oikawa2, Yuichiro Mori2, Takashi Kakue4, Tomoyoshi Shimobaba4, and Tomoyoshi Ito4
Author Affiliations
  • 1Graduate School of Integrated Arts and Sciences, Kochi University, Kochi 780-8520, Japan
  • 2Research and Education Faculty, Kochi University, Kochi 780-8520, Japan
  • 3National Astronomical Observatory of Japan, Mitaka 181-8588, Japan
  • 4Graduate School of Engineering, Chiba University, Inage-ku 263-8522, Japan
DOI: 10.3788/COL202018.070901

    Abstract

    The calculation of computer-generated holograms is computationally very expensive, and the image quality deteriorates when a three-dimensional (3D) holographic video is reconstructed from a point-cloud model comprising a huge number of object points. To solve these problems, we implement a spatiotemporal division multiplexing method on a cluster system with 13 GPUs connected by a gigabit Ethernet network. A performance evaluation indicates that the proposed method can realize real-time holographic video of a 3D object comprising approximately 1,200,000 object points. We demonstrate a clear 3D holographic video reconstructed at 32.7 frames per second from a 3D object comprising 1,064,462 object points.

    Real-time electroholography based on computer-generated holograms (CGHs) is expected to become the ultimate three-dimensional (3D) television[1,2]. However, the CGH calculation rapidly becomes prohibitively expensive because real-time electroholography requires an enormous amount of floating-point arithmetic. Moreover, the image quality of a holographic video deteriorates when it is reconstructed from a point-cloud model comprising a huge number of object points. Two approaches proposed to suppress this deterioration are time multiplexing for two-dimensional reconstruction[3] and spatiotemporal division multiplexing for clear 3D holographic video playback[4]. Large-scale electroholography using the spatiotemporal division multiplexing approach[4] implemented on the HORN-8 system has been reported[5].

    A modern graphics processing unit (GPU) is a cost-effective processor capable of high-throughput floating-point arithmetic and fast computer-graphics processing. GPUs can therefore accelerate the CGH calculation and directly display the calculated CGH on a spatial light modulator (SLM)[6–14]. In addition, spatiotemporal division multiplexing can exploit moving image features[15], accelerating the CGH calculation several-fold.

    A PC cluster consisting of multiple PCs with multiple GPUs, called a multi-GPU cluster, can significantly accelerate large-pixel-count CGH calculations[16–20]. In Ref. [16], the GPUs of a multi-GPU cluster were directly connected to multiple SLMs, showing that a multi-GPU cluster is suitable for real-time electroholography involving a large-pixel-count CGH. However, such a multi-GPU cluster system with multiple SLMs is very expensive. Real-time electroholography using a multi-GPU cluster with a single SLM is low cost but requires CGH data transfer between the nodes, which hinders real-time operation. To address this problem, we used a high-speed InfiniBand network in a multi-GPU cluster system and applied it to real-time electroholography[21] and fast time-division color electroholography[22]. We also realized real-time color electroholography by using a multi-GPU cluster system with three SLMs combined with an InfiniBand network[23]. Furthermore, we proposed a packing and unpacking method to reduce the CGH data transferred between the nodes of a multi-GPU cluster[24], and demonstrated real-time electroholography using a multi-GPU cluster with 13 GPUs (NVIDIA GeForce GTX 1080 Ti) connected by gigabit Ethernet and a single SLM.

    In this Letter, we propose clear real-time electroholography based on spatiotemporal division multiplexing that uses moving image features and a multi-GPU cluster system connected by a gigabit Ethernet network. The proposed method does not use cache memory.

    In previous work[4,15], we proposed two types of spatiotemporal division multiplexing. The first method suppresses the deterioration of a 3D holographic video reconstructed from a point-cloud model comprising a huge number of object points (Fig. 1)[4]. The second method uses moving image features to accelerate the CGH calculation (Fig. 2)[15]. In both methods, the 3D object is divided into several objects in each frame of the original 3D video. Figures 1 and 2 show examples in which the original 3D object is divided into three objects in each frame, with the divided objects labeled Div i1, Div i2, and Div i3 for frame i.


    Figure 1.Spatiotemporal division multiplexing approach for suppressing the deterioration of a 3D holographic video reconstructed from a point-cloud model comprising a huge number of object points.


    Figure 2.Spatiotemporal division multiplexing approach using moving image features.

    In the spatiotemporal division multiplexing method for suppressing the deterioration of the 3D holographic video, all the divided objects are used in each frame. At frame i in Fig. 1, CGHs are generated from the divided objects Div i1, Div i2, and Div i3 and are displayed sequentially on an SLM. The reconstructed 3D holographic video therefore has three times as many frames as the original 3D video, so this approach requires three times more display time than reconstruction of the original 3D video.

    As shown in Fig. 2, the spatiotemporal division multiplexing approach using moving image features uses only one of the divided objects in each frame, with a different divided object selected in each frame so that the three divisions cycle every three frames. In each frame, the number of object points contributing to one divided object is one-third of that in the original 3D object, so the CGH calculation is three times faster than that using the original 3D video. Previously, however, the long CGH calculation time per frame prevented smooth real-time reconstruction of moving 3D images; as a result, we had never applied the spatiotemporal division multiplexing approach using moving image features to a point-cloud model comprising a huge number of object points. The bookkeeping of this division scheme is sketched below.
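    Neither Ref. [4] nor Ref. [15] prescribes a particular partitioning rule, so the following minimal host-side sketch assumes a simple round-robin (interleaved) assignment of point indices to the divided objects; the names divideObject and selectDivision are illustrative, not part of the original implementation.

        // Minimal sketch of the space-division bookkeeping. Assumption:
        // round-robin assignment of point indices; Refs. [4,15] define the
        // method conceptually and may partition differently.
        #include <cstddef>
        #include <vector>

        struct Point { float x, y, z, amp; };

        // Split one frame's point cloud into D divided objects by interleaving
        // point indices, so each divided object keeps about Np/D points spread
        // over the whole model.
        std::vector<std::vector<Point>> divideObject(const std::vector<Point>& obj,
                                                     int D) {
            std::vector<std::vector<Point>> divs(D);
            for (std::size_t j = 0; j < obj.size(); ++j)
                divs[j % D].push_back(obj[j]);
            return divs;
        }

        // For the moving-image-feature variant (Fig. 2), each displayed frame
        // uses only one divided object, cycling through the D divisions.
        int selectDivision(int frameIndex, int D) { return frameIndex % D; }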

    In the spatiotemporal division multiplexing approach using moving image features, the equation used for the CGH calculation[6] is

    $$I(x_h, y_h) = \sum_{j=1}^{N_p} A_j \cos\left[\frac{\pi}{\lambda z_j}\left\{(x_h - x_j)^2 + (y_h - y_j)^2\right\}\right], \tag{1}$$

    where $(x_h, y_h, 0)$ are the coordinates of a point on the CGH, $(x_j, y_j, z_j)$ and $A_j$ are the coordinates and amplitude of the $j$th object point in the point-cloud 3D object model, $N_p$ is the total number of object points in the 3D object, and $\lambda$ is the wavelength of the reconstructing light. Note that the Fresnel approximation is used in Eq. (1).

    The value calculated from Eq. (1) for each point on the CGH is binarized by using a threshold value of zero[25], and the binary CGH is generated from the binarized values. The CGH calculation time increases in proportion to the number of object points. Because it is difficult to realize real-time electroholography for a point-cloud model comprising a huge number of object points, we restrict ourselves to a 1920 × 1024 binary CGH. The per-pixel calculation is illustrated in the sketch below.
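    As a concrete illustration, the following minimal CUDA kernel evaluates Eq. (1) per CGH pixel and binarizes the result at a threshold of zero[25]. The pixel pitch, thread-grid mapping, and all identifiers are assumptions for illustration; the optimized kernel of Ref. [16], which additionally restructures off-chip memory accesses, is not reproduced here.

        // Minimal CUDA sketch of Eq. (1) with zero-threshold binarization [25].
        // Pixel pitch and identifiers are illustrative assumptions.
        #include <cuda_runtime.h>

        #define CGH_W 1920
        #define CGH_H 1024

        __global__ void cghKernel(const float4* __restrict__ pts, // (x, y, z, A_j)
                                  int np, float lambda, float pitch,
                                  unsigned char* cgh) {
            int xh = blockIdx.x * blockDim.x + threadIdx.x;
            int yh = blockIdx.y * blockDim.y + threadIdx.y;
            if (xh >= CGH_W || yh >= CGH_H) return;

            const float PI = 3.14159265f;
            float px = xh * pitch, py = yh * pitch;
            float I = 0.0f;
            for (int j = 0; j < np; ++j) {          // sum over all object points
                float4 p = pts[j];
                float dx = px - p.x, dy = py - p.y;
                // Fresnel-approximation phase of Eq. (1)
                float phase = PI / (lambda * p.z) * (dx * dx + dy * dy);
                I += p.w * __cosf(phase);           // p.w holds the amplitude A_j
            }
            // Binarize the summed light intensity at a threshold of zero [25]
            cgh[yh * CGH_W + xh] = (I >= 0.0f) ? 1 : 0;
        }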

    Using the 3D model “fountain” comprising 1,064,462 object points, we investigated the appropriate number of space divisions for spatiotemporal division multiplexing using moving image features. The 3D model was located 1.5 m away from the CGH, and its size was approximately 70 mm × 50 mm × 50 mm. The 3D model was divided into several objects, the CGHs were generated from the divided objects, and all CGHs were displayed repeatedly on an SLM. For the SLM, we used a liquid-crystal display panel extracted from a projector (EMP-TW1000, Epson, Inc.). A green (532 nm) semiconductor laser was used for reconstruction. Figure 3 shows the reconstructed 3D images; the clearest images were obtained with six space divisions.


    Figure 3.Reconstructed 3D image from a 3D object “fountain” comprising 1,064,462 object points.

    We used the multi-GPU cluster system shown in Fig. 4. The system was connected by a gigabit Ethernet network and consisted of a CGH display node and four CGH calculation nodes. The CGH display node had one GPU, and each CGH calculation node had three GPUs, for a total of 13 GPUs. Each GPU was an NVIDIA GeForce GTX 1080 Ti (see Table 1 for the specifications of each node). The CGH display node also served as the server for the network file system (NFS).

    Figure 5 shows the pipeline processing executed on the multi-GPU cluster system. The frames from Frame 1′ to Frame 12′ shown in Fig. 2 are assigned to GPUs 1 to 12, respectively, on the four CGH calculation nodes. In Fig. 5, the actual CGH calculation time for each frame equals twelve times the display-time interval T because the CGH calculation nodes contain twelve GPUs in total. The CGH calculation time is proportional to the number of 3D-object points. The GPUs use Eq. (1) to generate the CGH data from the divided 3D objects of the assigned frames. Although the computational complexity of Eq. (1) is enormous, the actual computational performance of a GPU depends not only on the computational complexity but also on the number of data accesses to the off-chip memory of the GPU[26], and performance drops markedly when the number of data accesses is very large relative to the amount of CGH calculation. The optimized method of Ref. [16] reduces the number of off-chip memory accesses and provides high-speed CGH computation, so we used it in the CGH calculation.

    The packed CGH data are generated by packing processing and sent to the CGH display node, where a GPU unpacks them and generates the binary CGHs. The binary CGHs are displayed sequentially on the liquid-crystal display panel connected to the CGH display node. The packing and unpacking serve to reduce the transferred CGH data[24]; a minimal sketch of this step is given below. These processes are repeated until the last frame of the 3D video is reached.
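    Reference [24] describes the packing and unpacking only at the level of Fig. 5. The following sketch assumes the packing amounts to bit-packing the binary CGH (eight pixels per byte) before an MPI send (MPICH is listed in Table 1); the ranks, tags, and function names are illustrative placeholders, not the authors' code.

        // Minimal sketch of the packing/transfer step in Fig. 5. Assumption:
        // packing [24] bit-packs the binary CGH, eight pixels per byte, cutting
        // the transferred data to 1/8; ranks, tags, and names are illustrative.
        #include <mpi.h>
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Pack one binary CGH (one byte per pixel, values 0/1) into a bit stream.
        std::vector<std::uint8_t> packCGH(const std::vector<std::uint8_t>& cgh) {
            std::vector<std::uint8_t> packed((cgh.size() + 7) / 8, 0);
            for (std::size_t i = 0; i < cgh.size(); ++i)
                packed[i / 8] |= (cgh[i] & 1u) << (i % 8);
            return packed;
        }

        // On a CGH calculation node: send the packed CGH for one frame to the
        // CGH display node (assumed to be rank 0), tagged with the frame index.
        void sendPackedCGH(const std::vector<std::uint8_t>& packed, int frame) {
            MPI_Send(packed.data(), static_cast<int>(packed.size()), MPI_UINT8_T,
                     /*displayRank=*/0, /*tag=*/frame, MPI_COMM_WORLD);
        }

    The display node reverses the bit-packing before binarized display, so the gigabit Ethernet link carries one-eighth of the raw binary CGH data per frame.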


    Figure 4.Multi-GPU cluster system with multiple GPUs connected by a gigabit Ethernet network and a single SLM.

    CPU: Intel Core i7-7800X (clock speed: 3.5 GHz)
    Main memory: DDR4-2666, 16 GB
    OS: Linux (CentOS 7.6 x86_64)
    Software: NVIDIA CUDA 10.1 SDK, OpenGL, MPICH 3.2
    GPU: NVIDIA GeForce GTX 1080 Ti

    Table 1. Specifications of Each Node in the Multi-GPU Cluster System


    Figure 5.Pipeline processing for the spatiotemporal electroholography system shown in Fig. 2.

    The time required to read the coordinate data of the object points from auxiliary storage becomes non-negligible when the number of 3D-object points is huge. We therefore investigated the total time required to display twelve-frame sequences because, under pipeline processing, the GPUs of the CGH calculation nodes generate twelve CGHs in each cycle. On each CGH calculation node, we compared two codes: one for serial computing [Fig. 6(a)] and one for parallel computing [Fig. 6(b)]. In the process “read object data,” which reads the object data from the NFS server, the coordinates of the object points are expressed as binary data.

    Figure 7 shows the total display time for sets of twelve frames when using the serial computing scheme of Fig. 6(a) and the parallel computing scheme of Fig. 6(b) for 1,200,000 object points. No cache memory was used when reading the coordinate data. Twelve CGHs for twelve frames were calculated using the twelve GPUs of the CGH calculation nodes. In Fig. 7, “SSD” and “HDD” refer to the solid-state drive and hard disk drive, respectively, used on the NFS server to store the coordinates of the object points; we used a Western Digital WD20EZAZ-RT (2 TB) HDD and an Intel Optane 900P (280 GB) SSD. The results in Fig. 7 indicate that the serial computing of Fig. 6(a) is substantially affected by the HDD access time when the HDD serves as the NFS storage. With parallel computing [Fig. 6(b)], the time required to read the object-point coordinates is completely hidden within the CGH calculation time on the GPUs of the CGH calculation nodes, regardless of whether the HDD or SSD is used; a minimal sketch of this overlap follows.
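    To make the scheme of Fig. 6(b) concrete, the following host-side sketch overlaps reading the next frame's object data with the CGH computation for the current frame using a worker thread. The binary file layout and the names readObjectData, processFrames, and computeAndSendCGH are assumptions for illustration, not the authors' code.

        // Minimal sketch of the parallel scheme in Fig. 6(b): a host thread
        // reads the next frame's object points from the NFS server while the
        // GPU computes the current CGH, hiding the storage access time.
        #include <cstddef>
        #include <cstdint>
        #include <fstream>
        #include <string>
        #include <thread>
        #include <vector>

        struct Point { float x, y, z, amp; };

        void computeAndSendCGH(const std::vector<Point>& pts) {
            // Kernel launch plus packed MPI send; see the earlier sketches.
        }

        std::vector<Point> readObjectData(const std::string& path) {
            std::ifstream in(path, std::ios::binary);
            std::int32_t np = 0;                   // assumed header: point count
            in.read(reinterpret_cast<char*>(&np), sizeof(np));
            std::vector<Point> pts(np);
            in.read(reinterpret_cast<char*>(pts.data()), np * sizeof(Point));
            return pts;
        }

        void processFrames(const std::vector<std::string>& files) {
            if (files.empty()) return;
            std::vector<Point> current = readObjectData(files[0]);
            for (std::size_t f = 0; f < files.size(); ++f) {
                std::vector<Point> next;
                std::thread reader([&] {           // overlap: read frame f + 1
                    if (f + 1 < files.size()) next = readObjectData(files[f + 1]);
                });
                computeAndSendCGH(current);        // GPU work for frame f
                reader.join();                     // read finishes within GPU time
                current = std::move(next);
            }
        }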


    Figure 6.Read data processing and CGH calculation on each CGH calculation node in the multi-GPU cluster system shown in Fig. 4. (a) Serial computing. (b) Parallel computing.


    Figure 7.Comparison of the total display time for every 12 frames using serial computing shown in Fig. 6(a) with that using parallel computing shown in Fig. 6(b) when the number of object points is 1,200,000.

    Figure 8 plots the display-time interval T shown in Fig. 5 versus the number of object points when spatiotemporal division multiplexing using moving image features is implemented on the multi-GPU cluster system shown in Fig. 4 with six space divisions. The display-time interval T increases in proportion to the number of object points. For 1,200,000 object points, T is 34.6 ms, corresponding to a frame rate of 1/T ≈ 28.9 frames per second (fps). Figure 8 characterizes the performance of the proposed method, which provides clear real-time 3D holographic video for 3D models comprising a huge number of object points; it does not imply that the method necessarily requires an SLM with a high refresh rate.


    Figure 8.Display-time interval T shown in Fig. 5 plotted versus the number of object points when using the spatiotemporal division multiplexing approach using moving image features implemented on the multi-GPU cluster system shown in Fig. 4.

    Figure 9 shows snapshots of the reconstructed 3D video (Video 1) obtained from the original 3D video “fountain” comprising 1,064,462 object points with six space divisions. Table 2 lists the frame rate of the reconstructed 3D video against the number of space divisions; the frame rate scales almost linearly with the number of divisions (5.43 fps × 6 ≈ 32.6 fps). We obtained a clear holographic 3D video reconstructed from a 3D object comprising 1,064,462 object points at 32.7 fps with six space divisions.


    Figure 9.Snapshot of a reconstructed 3D video (Video 1).

    Number of Space Divisions    Object Points per Divided Object    Frame Rate (fps)
    No division                  1,064,462                           5.43
    Two divisions                532,231                             10.86
    Four divisions               266,116                             21.70
    Six divisions                177,411                             32.70

    Table 2. Frame Rate of the Reconstructed 3D Video from the Original 3D Video “Fountain” Comprising 1,064,462 Object Points Against the Number of Space Divisions

    In conclusion, we implemented the spatiotemporal division multiplexing approach using moving image features on a multi-GPU cluster system with 13 GPUs. A performance evaluation indicates that the proposed method can realize real-time holographic video of a 3D object comprising approximately 1,200,000 object points, and we obtained a clear real-time holographic 3D video of a 3D object comprising 1,064,462 object points. The proposed method facilitates clear real-time 3D holographic video, is applicable to various CGH calculation algorithms, and thereby contributes significantly to the development of the ultimate holographic 3D television.

    References

    [1] S. A. Benton, J. V. M. Bove. Holographic Imaging(2008).

    [2] T. Sugie, T. Akamatsu, T. Nishitsuji, R. Hirayama, N. Masuda, H. Nakayama, Y. Ichihashi, A. Shiraki, M. Oikawa, N. Takada, Y. Endo, T. Kakue, T. Shimobaba, T. Ito. Nat. Electron., 1, 254(2018).

    [3] Y. Mori, T. Fukuoka, T. Nomura. Appl. Opt., 53, 8182(2014).

    [4] N. Takada, M. Fujiwara, C. W. Ooi, Y. Maeda, H. Nakayama, T. Kakue, T. Shimobaba, T. Ito. IEICE Trans. Electron., E100.C, 978(2017).

    [5] Y. Yamamoto, H. Nakayama, N. Takada, T. Nishitsuji, T. Sugie, T. Kakue, T. Shimobaba, T. Ito. Opt. Express, 26, 34259(2018).

    [6] N. Masuda, T. Ito, T. Tanaka, A. Shiraki, T. Sugie. Opt. Express, 14, 603(2006).

    [7] A. Shiraki, N. Takada, M. Niwa, Y. Ichihashi, T. Shimobaba, N. Masuda, T. Ito. Opt. Express, 17, 16038(2009).

    [8] Y. Pan, X. Xu, S. Solanki, X. Liang, R. B. A. Tanjung, C. Tan, T.-C. Chong. Opt. Express, 17, 18543(2009).

    [9] P. Tsang, W. K. Cheung, T.-C. Poon, C. Zhou. Opt. Express, 19, 15205(2011).

    [10] J. Weng, T. Shimobaba, N. Okada, H. Nakayama, M Oikawa, N. Masuda, T. Ito. Opt. Express, 20, 4018(2012).

    [11] G. Li, K. Hong, J. Yeom, N. Chen, J.-H. Park, N. Kim, B. Lee. Chin. Opt. Lett., 12, 060016(2014).

    [12] Z. Chen, X. Sang, Q. Lin, J. Li, X. Yu, X. Gao, B. Yan, C. Yu, W. Dou, L. Xiao. Chin. Opt. Lett., 14, 080901(2016).

    [13] Y. Zhang, J. Liu, X. Li, Y. Wang. Chin. Opt. Lett., 14, 030901(2016).

    [14] D.-W. Kim, Y.-H. Lee, Y.-H. Seo. Appl. Opt., 57, 3511(2018).

    [15] H. Niwase, N. Takada, H. Araki, H. Nakayama, A. Sugiyama, T. Kakue, T. Shimobaba, T. Ito. Opt. Express, 22, 28052(2014).

    [16] N. Takada, T. Shimobaba, H. Nakayama, A. Shiraki, N. Okada, M. Oikawa, N. Masuda, T. Ito. Appl. Opt., 51, 7303(2012).

    [17] Y. Pan, X. Xu, X. Liang. Appl. Opt., 52, 6562(2013).

    [18] B. J. Jackin, H. Miyata, T. Ohkawa, K. Ootsu, T. Yokota, Y. Hayasaki, T. Yatagai, T. Baba. Opt. Lett., 39, 6867(2014).

    [19] B. J. Jackin, S. Watanabe, K. Ootsu, T. Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai, T. Baba. Appl. Opt., 57, 3134(2018).

    [20] T. Baba, S. Watanabe, B. J. Jackin, K. Ootsu, T. Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai. IEICE Trans. Inf. Sys., E102.D, 1310(2019).

    [21] H. Niwase, N. Takada, H. Araki, Y. Maeda, M. Fujiwara, H. Nakayama, T. Kakue, T. Shimobaba, T. Ito. Opt. Eng., 55, 093108(2016).

    [22] H. Araki, N. Takada, S. Ikawa, H. Niwase, Y. Maeda, M. Fujiwara, H. Nakayama, M. Oikawa, T. Kakue, T. Shimobaba, T. Ito. Chin. Opt. Lett., 15, 120902(2017).

    [23] S. Ikawa, N. Takada, H. Araki, H. Niwase, H. Sannomiya, H. Nakayama, M. Oikawa, Y. Mori, T. Kakue, T. Shimobaba, T. Ito. Chin. Opt. Lett., 18, 010901(2020).

    [24] H. Sannomiya, N. Takada, T. Sakaguchi, H. Nakayama, M. Oikawa, Y. Mori, T. Kakue, T. Shimobaba, T. Ito. Chin. Opt. Lett., 18, 020902(2020).

    [25] W.-H. Lee. Appl. Opt., 18, 3661(1979).

    [26] A. Waterman, D. Patterson. Commun. ACM, 52, 65(2009).
