Statistical-Based Adaptive Background Modeling Algorithm for Grayscale Video

Jiawen Wu; Shiyong Wang

doi:10.3788/CJL202148.0309001

Abstract

Objective Target detection is an active research field of computer vision and forms the basis of applications such as video surveillance and live streaming. Numerous algorithms are used in target detection, including traditional methods such as optical flow, frame difference, and background modeling, as well as deep learning, which has developed rapidly in recent years. However, with the exception of background modeling, most of these algorithms are limited in practice by factors such as budget constraints, low processor performance, real-time requirements, and accuracy. Thus, background modeling is widely used in real applications. This method also has limitations, however. For example, it performs poorly on grayscale video such as infrared video, where image quality is low, texture is weak, and noise is heavy. In the present study, we address these limitations by proposing a novel statistical-based adaptive background modeling algorithm that shows high accuracy and an improved recall rate.

Methods We build a grayscale histogram for each pixel in an 8-bit grayscale video and update it on each frame. Then, we estimate the true background from the histogram mode and obtain the noise threshold by using an improved three-frame difference method. Based on these results, we divide each frame into two regions: a static region and a dynamic region. For the static region, we simply apply single Gaussian modeling with the obtained noise threshold to determine the foreground. For the dynamic region, we use an algorithm based on kernel density estimation (KDE) for detection. Moreover, we handle illumination shifts and shadows by calculating the grayscale difference between the background and the input frame.
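The per-pixel histogram and mode-based background estimate described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the static-region test is reduced to a plain absolute-difference threshold, and `noise_threshold` is passed in as an assumed constant, whereas the paper derives it with an improved three-frame difference method that is not detailed here.

```python
import numpy as np


def update_histograms(hist, frame):
    """Add one 8-bit frame to the per-pixel histograms.

    hist  : (H, W, 256) integer array, one 256-bin histogram per pixel
    frame : (H, W) uint8 grayscale image
    """
    rows, cols = np.indices(frame.shape)
    # Each (row, col) pair is unique, so in-place += is safe here.
    hist[rows, cols, frame] += 1
    return hist


def background_from_mode(hist):
    """Estimate the background as the most frequent gray level per pixel."""
    return hist.argmax(axis=2).astype(np.uint8)


def static_foreground(frame, background, noise_threshold):
    """Mark pixels whose deviation from the background exceeds the threshold.

    Assumption: a fixed scalar threshold stands in for the paper's
    noise threshold and single Gaussian model.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > noise_threshold


if __name__ == "__main__":
    height, width = 2, 3
    hist = np.zeros((height, width, 256), dtype=np.uint32)

    empty_scene = np.full((height, width), 100, dtype=np.uint8)
    with_object = empty_scene.copy()
    with_object[0, 1] = 180  # a bright object covers one pixel

    # Seven background frames followed by two frames containing the object.
    for frame in [empty_scene] * 7 + [with_object] * 2:
        update_histograms(hist, frame)

    background = background_from_mode(hist)
    mask = static_foreground(with_object, background, noise_threshold=10)
    print(background)
    print(mask)
```

Because the mode ignores gray levels that appear only briefly, a passing object does not pull the background estimate toward its own intensity, which a running mean would do.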

Results and Discussions The proposed algorithm takes 13 ms to process one 320 × 240 pixel frame on an i7-7700HQ platform, which meets the real-time requirement. To evaluate its performance, we compare it with traditional background modeling algorithms, including CodeBook (CB), ViBe, KDE, the Gaussian mixture model (GMM), and the pixel-based adaptive segmenter (PBAS), on the public CDnet2014 test dataset (Fig. 8) and on the infrared dataset obtained in the present study (Fig. 9). These datasets contain various scenes including mall, highway, avenue, office, river, infrared park, and infrared sea. Target detection in these scenes faces several common challenges such as shadows, static targets, complex backgrounds, dynamic backgrounds, and low-quality images. The results (Table 1) show several advantages of the proposed algorithm. (1) The proposed algorithm achieves the highest recall rate, 90% in most scenes, which effectively preserves the integrity of the target. (2) In the mall and office scenes, the previous algorithms except for CB absorb the static foreground into the background. (3) In the river scene, a canoe forms a slow, large, uniformly colored foreground, which leads KDE, PBAS, and ViBe to incorrectly treat the front part of the canoe as background, resulting in incomplete detection of the back part. (4) In the mall and avenue scenes, abundant shadows interfere with the detection result. The proposed algorithm not only preserves the intact foreground but also eliminates the shadows to some extent. (5) The highway and river scenes contain numerous dynamic background areas that produce a large number of false positives. The proposed algorithm effectively decreases the false alarm rate. (6) In the office scene, foreground motion causes the camera exposure to adjust several times. At the moment of exposure adjustment, CB and PBAS are unable to rapidly adapt to the new illumination, which causes false positives that persist for long periods. (7) The images in the infrared park and infrared sea scenes are low in contrast, resulting in extremely poor image quality. Only the proposed algorithm can effectively distinguish the foreground from the background. The maximum recall rate of the previous methods is 0.671, with abundant false positives, whereas the recall rate of the proposed algorithm is 0.900, with a significantly lower false alarm rate. All these results show that the proposed algorithm can significantly increase the detection recall rate and the integrity of the target while maintaining a low false alarm rate and a fast processing speed.

Conclusions According to the experimental results, the proposed algorithm adopts different detection strategies for dynamic and static regions and combines the advantages of various algorithms to significantly improve the recall rate of target detection in grayscale video. Moreover, it can detect targets with complex motion patterns without prior knowledge, which makes the method highly robust in many applications. This robustness yields better overall metrics than those of common background modeling algorithms. Further, the proposed algorithm can effectively enhance target detection performance in monochromatic and infrared video. However, its shadow elimination is inadequate, and its accuracy in some scenes requires further optimization.