
- Chinese Optics Letters
- Vol. 22, Issue 2, 020604 (2024)
Abstract
1. Introduction
In recent years, few-mode fibers (FMFs) have received increasing attention due to their potential applications in high-power fiber lasers[1], space-division multiplexing transmission[2,3], and imaging. Moreover, FMFs are regarded as ideal platforms for the study of spatiotemporal mode-locking mechanisms[4–6] and Kerr nonlinear beam cleaning[7]. However, mode coupling in FMFs is inevitable and significantly impacts their performance. Therefore, it is crucial to understand the mode properties of FMFs in order to suppress higher-order mode generation or to optimize fiber design. The mode-decomposition (MD) technique is a fundamental measurement method that recovers the amplitude and phase information of each eigenmode in FMFs. It plays a critical role in studying mode properties and transmission characteristics in FMFs. Currently, MD techniques are commonly used for measuring fiber mode transfer matrices[8], implementing adaptive mode control[9], analyzing fiber mode coupling[10], studying fiber bending losses[11], and measuring beam quality[12].
Early MD methods were primarily experimental[13–17]: the complete optical field distribution was measured directly with sophisticated experimental setups. However, these methods suffer from high equipment costs, demanding accuracy requirements, complex experimental procedures, heavy workloads, and vulnerability to environmental disturbances. Subsequently, numerical MD techniques[18–22] were proposed, which effectively reduce the cost and equipment requirements and need only simple experiments. However, these iterative methods are susceptible to initial-value sensitivity, convergence to local minima, and high computational cost with long convergence times over many iterations. To overcome these problems, noniterative numerical decomposition methods, such as fractional Fourier systems[23] and matrix-inversion methods[24], have emerged, which avoid the above issues and show excellent performance.
Recently, neural network-based MD methods have demonstrated their feasibility and are emerging as a significant research direction. An et al. achieved the first high-precision real-time MD of five modes using the VGG-16 convolutional neural network in 2019[25]. Fan et al. improved the convolutional neural network in 2020 by adding loss functions associated with the near-field and far-field spot maps, achieving high-precision MD for the superposition of six modes[26]. Zhu et al. achieved high-precision MD of six modes using a ResNet-18 convolutional neural network in 2021[27]. Rothe et al. used a DenseNet convolutional neural network with up to 121 layers to achieve high-precision MD with eight superimposed modes[28]. MD methods based on hand-designed neural networks[29] and on multitask deep learning[30] have also been proposed and show good performance. However, all of the above methods use traditional convolutional neural networks for MD, which suffer from long training times, high hardware requirements, and excessive consumption of computing resources. In addition, the large number of parameters in traditional convolutional neural network models makes them unsuitable for deployment on portable devices, such as Android smartphones. To address this challenge, lightweight convolutional neural networks have been proposed and have shown promising results in image classification[31]. Among them, Google proposed the lightweight neural network model MobileNetV3 in 2019[32]. MobileNetV3 employs neural architecture search (NAS) to find its structure and redesigns the time-consuming layers and activation functions. This greatly reduces the number of training parameters while maintaining high accuracy, and therefore significantly shortens the training time of the entire network. Overall, MobileNetV3 makes the image classification network more lightweight and efficient.
In this paper, we propose a fast MD method based on an improved MobileNetV3. The proposed network uses depth-separable convolution instead of conventional convolution, redesigns the activation functions, and reduces the repeated layer structure, without any pretraining. The method quickly and accurately predicts the mode weights of the eigenmodes and the phase differences between the fundamental and higher-order modes. Simulation results show that, for an FMF supporting six LP modes (LP01, LP11e, LP11o, LP21e, LP21o, LP02), the average mode weight error is less than 0.56%, the average relative phase error is less than 0.85%, and the average correlation between the simulated and reconstructed near-field optical field maps reaches 0.9995. The MD speed of this method is about 6 ms per frame, offering strong real-time capability, and the network model size is merely 6.5 MB, giving it the advantages of fast decomposition, low experimental equipment requirements, and easy deployment compared with other deep-learning methods. Most importantly, this lightweight model reduces the demand for storage and computational resources and is easy to deploy on portable devices such as cell phones and sensors.
2. Implementation Method
The propagation field within the FMFs can be expressed as a linear superposition of several eigenmodes, as shown in Eq. (1)[33].
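A standard form of this superposition (following Ref. [33]; the symbols ρn, θn, and ψn are the notation assumed here) is

```latex
E(x,y) = \sum_{n=1}^{N} \rho_n \, e^{i\theta_n}\, \psi_n(x,y),
\qquad \sum_{n=1}^{N} \rho_n^{2} = 1,
```

where ψn(x, y) is the normalized n-th eigenmode, ρn² is its mode weight, and θn is its phase relative to the fundamental mode (θ1 = 0).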
Figure 1 illustrates the entire MD process using MobileNetV3_Light. First, the eigenmodes are calculated from the known fiber structure parameters, and the mode weights along with the relative phase coefficients are generated randomly. The near-field optical field image is then simulated by eigenmode superposition. It is worth noting that, although the phase sign cannot be determined from the near-field optical field image alone, using only the near-field image for MD is adequate in fiber laser studies, where in most cases only the mode ratios of the individual eigenmodes are of interest[25]. During the training phase of the neural network, we take the near-field light-field image as the input and use the generated mode weights and relative phase coefficients as the label vector. The label vector thus consists of the mode weights of all eigenmodes and the relative phases between the fundamental mode and each higher-order mode.
Figure 1. Mode decomposition based on the MobileNetV3_Light neural network.
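A minimal sketch of this data-generation step is given below. It assumes precomputed eigenmode fields `modes` of shape (N, H, W); the label layout (mode weights followed by cosines of the relative phases, which removes the undetectable phase sign) is an assumption modeled on Ref. [25], not a statement of the authors' exact encoding.

```python
import numpy as np

def simulate_sample(modes: np.ndarray, rng: np.random.Generator):
    """Generate one near-field intensity image and its label vector.

    `modes` holds the N precomputed, normalized eigenmode fields, shape (N, H, W).
    """
    n = modes.shape[0]
    # Random mode weights rho_n^2 that sum to one.
    weights = rng.random(n)
    weights /= weights.sum()
    rho = np.sqrt(weights)
    # Random phases of the higher-order modes relative to the fundamental mode.
    theta = np.concatenate(([0.0], rng.uniform(-np.pi, np.pi, n - 1)))
    # Coherent superposition of the eigenmodes and its near-field intensity.
    field = np.sum(rho[:, None, None] * np.exp(1j * theta)[:, None, None] * modes, axis=0)
    intensity = np.abs(field) ** 2
    intensity /= intensity.max()
    # Assumed label encoding: weights plus cos(theta) of the higher-order modes.
    label = np.concatenate([weights, np.cos(theta[1:])])
    return intensity.astype(np.float32), label.astype(np.float32)
```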
Here, the mode weights are normalized so that they sum to one, and the relative phases are defined with respect to the fundamental mode.
3. Neural Network Model Design
Convolutional neural network models face two major challenges in applications. The first is storage: hundreds of network layers contain a large number of parameters, leading to high storage requirements on the device. The second is speed: prediction usually has to be completed within milliseconds to meet the practical requirements of mobile applications. Model compression is a common way to address these issues; it retrains an already trained model to reduce the number of parameters and thus the storage footprint. In contrast, lightweight models are designed from the start with a more efficient network computation scheme (mainly in the convolutional layers) to reduce the number of parameters without sacrificing performance. Representative examples include SqueezeNet, MobileNet, ShuffleNet, and Xception.
In this paper, MobileNetV3 is used as the initial model, and MobileNetV3_Light is obtained by fine-tuning it. The performance improvement of this model is mainly attributed to the use of depth-separable convolution instead of traditional convolution. As described in the literature[34], the depth-separable convolution decomposes the traditional convolution into two parts, a depth-wise convolution and a 1 × 1 point-wise convolution, as illustrated in Fig. 2.
Figure 2. Traditional convolution and depth-separable convolution.
Following the notation of Ref. [34], a conventional convolution with a D_K × D_K kernel, M input channels, N output channels, and a D_F × D_F output feature map requires D_K · D_K · M · N · D_F · D_F multiply–accumulate operations, whereas the depth-separable convolution requires only D_K · D_K · M · D_F · D_F + M · N · D_F · D_F.
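Dividing the two costs gives the reduction factor quoted below:

```latex
\frac{D_K D_K M D_F D_F + M N D_F D_F}{D_K D_K M N D_F D_F}
  = \frac{1}{N} + \frac{1}{D_K^{2}}
  \approx \frac{1}{9}
  \quad \text{for } D_K = 3 \text{ and } N \gg 1 .
```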
It can be seen that the lightweight neural network model contains 2.5 million parameters, whereas the corresponding traditional network model contains 21.875 million. Depth-separable convolution requires only about 1/8 to 1/9 of the computation of traditional convolution. It achieves this by factorizing the traditional convolution into a depth-wise convolution and a point-wise convolution, which significantly reduces the computational effort of the neural network model.
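As an illustrative check (not the authors' code; the channel counts below are arbitrary example values), a short PyTorch snippet comparing the parameter counts of a standard convolution and its depth-separable equivalent reproduces the roughly eight- to nine-fold reduction:

```python
import torch.nn as nn

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

in_ch, out_ch, k = 128, 256, 3  # example channel counts and kernel size

standard = nn.Conv2d(in_ch, out_ch, k, padding=1, bias=False)
depth_separable = nn.Sequential(
    # Depth-wise convolution: one k x k filter per input channel (groups=in_ch).
    nn.Conv2d(in_ch, in_ch, k, padding=1, groups=in_ch, bias=False),
    # Point-wise (1 x 1) convolution mixes the channels.
    nn.Conv2d(in_ch, out_ch, 1, bias=False),
)

print(count_params(standard))         # 128*256*3*3 = 294912
print(count_params(depth_separable))  # 128*3*3 + 128*256 = 33920, about 1/8.7 of the above
```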
In addition, MobileNetV3 further reduces the computational cost of the model by employing the hard-sigmoid function (instead of the sigmoid function) and by simplifying the repeated layer structure. Experiments in the literature[35] demonstrate that the hard-sigmoid function plays almost the same role as the sigmoid function on mobile devices while being noticeably cheaper to compute. The activation functions used in this paper are the Hard-Swish and ReLU functions, whose expressions are given in Eqs. (4) and (5), respectively.
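For reference, the standard forms of these two activations, as defined in MobileNetV3[32], are

```latex
\operatorname{HardSwish}(x) = x \cdot \frac{\operatorname{ReLU6}(x+3)}{6},
\qquad
\operatorname{ReLU}(x) = \max(0, x),
```

where ReLU6(x) = min(max(0, x), 6); the factor ReLU6(x + 3)/6 is exactly the hard-sigmoid mentioned above.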
The overall structure of MobileNetV3_Light is shown in Fig. 3; it consists of three modules.
Figure 3. MobileNetV3_Light network structure.
The first module consists of a traditional convolutional layer followed by a Hard-Swish activation function. The second module comprises nine MobileNetV3 blocks, each structured as depicted in Fig. 4[32]. Within each MobileNetV3 block, the input feature matrix is first expanded to a higher dimension by a 1 × 1 convolution, then filtered by a depth-wise convolution, optionally reweighted by a squeeze-and-excitation (SE) module, and finally projected back to the output dimension by a second 1 × 1 convolution; a shortcut connection is added when the stride is 1 and the input and output dimensions match.
Figure 4. MobileNetV3 block network structure diagram.
Finally, a 7 × 7 average-pooling layer and two 1 × 1 convolutional layers (without batch normalization) map the features to the output vector of length K, which contains the predicted mode weights and relative phase coefficients.
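A compact PyTorch sketch of one such block is shown below. It follows the structure just described; the SE reduction ratio of 4 and other minor details are assumptions, and the per-layer hyperparameters come from Table 1.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel-attention module used in some blocks (the SE column of Table 1)."""
    def __init__(self, ch: int, reduction: int = 4):   # reduction ratio of 4 is an assumption
        super().__init__()
        self.fc1 = nn.Conv2d(ch, ch // reduction, 1)
        self.fc2 = nn.Conv2d(ch // reduction, ch, 1)

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)            # squeeze: global average pooling
        s = torch.relu(self.fc1(s))
        s = nn.functional.hardsigmoid(self.fc2(s))
        return x * s                                    # excite: re-weight the channels

class Bneck(nn.Module):
    """MobileNetV3 block: 1x1 expansion -> depth-wise conv -> (SE) -> 1x1 projection."""
    def __init__(self, in_ch, exp_ch, out_ch, kernel, stride, use_se, act):
        super().__init__()
        Act = nn.Hardswish if act == "HS" else nn.ReLU
        self.use_res = stride == 1 and in_ch == out_ch  # shortcut only when shapes match
        layers = [
            nn.Conv2d(in_ch, exp_ch, 1, bias=False), nn.BatchNorm2d(exp_ch), Act(),
            nn.Conv2d(exp_ch, exp_ch, kernel, stride, kernel // 2, groups=exp_ch, bias=False),
            nn.BatchNorm2d(exp_ch), Act(),
        ]
        if use_se:
            layers.append(SqueezeExcite(exp_ch))
        layers += [nn.Conv2d(exp_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_res else y
```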
The detailed parameters of the whole network model structure are shown in Table 1. The "Input" column denotes the size of the input feature matrix. The "Operator" column lists the operations, where "Conv2d" indicates a convolutional layer, "Bneck, k × k" indicates a MobileNetV3 block with a k × k depth-wise convolution kernel, and "NBN" indicates that no batch normalization is used. "Exp size" is the expansion dimension inside the block, "#out" is the number of output channels, "SE" marks whether a squeeze-and-excitation module is used, "NL" is the type of nonlinearity (HS: Hard-Swish; RE: ReLU), and "s" is the stride.
| Input | Operator | Exp size | #out | SE | NL | s |
|---|---|---|---|---|---|---|
| 224² × 3 | Conv2d | – | 8 | – | HS | 2 |
| 112² × 8 | Bneck, 3 × 3 | 16 | 16 | √ | RE | 2 |
| 56² × 16 | Bneck, 3 × 3 | 72 | 24 | – | RE | 2 |
| 28² × 24 | Bneck, 3 × 3 | 88 | 24 | – | RE | 1 |
| 28² × 24 | Bneck, 5 × 5 | 96 | 40 | √ | HS | 2 |
| 14² × 40 | Bneck, 5 × 5 | 240 | 40 | √ | HS | 1 |
| 14² × 40 | Bneck, 5 × 5 | 120 | 48 | √ | HS | 1 |
| 14² × 48 | Bneck, 5 × 5 | 144 | 48 | √ | HS | 1 |
| 14² × 48 | Bneck, 5 × 5 | 288 | 96 | √ | HS | 2 |
| 7² × 96 | Bneck, 5 × 5 | 576 | 96 | √ | HS | 1 |
| 7² × 96 | Conv2d, 1 × 1 | – | 576 | √ | HS | 1 |
| 7² × 576 | AvgPool, 7 × 7 | – | – | – | – | 1 |
| 1² × 576 | Conv2d, 1 × 1, NBN | – | 1024 | – | HS | 1 |
| 1² × 1024 | Conv2d, 1 × 1, NBN | – | K | – | – | 1 |
Table 1. Detailed Parameter Settings of MobileNetV3_Light Network Model Structure
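Reading Table 1 row by row, the whole backbone could be assembled as in the following sketch. It reuses the `Bneck` class above; `num_outputs` is the label length K, and the head mirrors the last four rows of the table, with the "NBN" layers built without batch normalization.

```python
import torch.nn as nn

# (kernel, exp size, #out, SE, NL, stride) for the nine Bneck rows of Table 1
BNECK_CFG = [
    (3,  16, 16, True,  "RE", 2),
    (3,  72, 24, False, "RE", 2),
    (3,  88, 24, False, "RE", 1),
    (5,  96, 40, True,  "HS", 2),
    (5, 240, 40, True,  "HS", 1),
    (5, 120, 48, True,  "HS", 1),
    (5, 144, 48, True,  "HS", 1),
    (5, 288, 96, True,  "HS", 2),
    (5, 576, 96, True,  "HS", 1),
]

def mobilenetv3_light(num_outputs: int) -> nn.Sequential:
    layers = [nn.Conv2d(3, 8, 3, stride=2, padding=1, bias=False),  # first row of Table 1
              nn.BatchNorm2d(8), nn.Hardswish()]
    in_ch = 8
    for k, exp, out, se, act, s in BNECK_CFG:
        layers.append(Bneck(in_ch, exp, out, k, s, se, act))
        in_ch = out
    layers += [nn.Conv2d(in_ch, 576, 1, bias=False), nn.BatchNorm2d(576), nn.Hardswish(),
               nn.AdaptiveAvgPool2d(1),                        # AvgPool, 7 x 7
               nn.Conv2d(576, 1024, 1), nn.Hardswish(),        # Conv2d, 1 x 1, NBN
               nn.Conv2d(1024, num_outputs, 1),                # Conv2d, 1 x 1, NBN -> K outputs
               nn.Flatten()]
    return nn.Sequential(*layers)
```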
4. Experimental Results and Discussion
All experiments reported in this paper were run on a desktop computer with an AMD R7 5800X CPU and an NVIDIA GeForce RTX 3070 GPU. First, a data set comprising 100,000 near-field light-field maps with a resolution of 224 × 224 (matching the network input size in Table 1) was generated by simulation for training.
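The optimizer and loss settings are not specified here, so the training sketch below is generic: mean-squared-error regression of the label vector with Adam is assumed, the label length K = 11 for six modes (6 weights + 5 phase terms) is an assumption, and the data loader is filled with placeholder tensors standing in for the simulated image/label pairs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

K = 11                        # assumed label length for six modes: 6 weights + 5 phase terms
device = "cuda" if torch.cuda.is_available() else "cpu"
model = mobilenetv3_light(num_outputs=K).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)     # assumed optimizer settings
loss_fn = nn.MSELoss()

# Placeholder data; in practice this holds the simulated near-field images and labels.
train_loader = DataLoader(TensorDataset(torch.randn(64, 3, 224, 224), torch.rand(64, K)),
                          batch_size=16, shuffle=True)

for epoch in range(100):                          # 100 training periods, as in Section 4
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```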
The flow of the whole test is shown in Fig. 5. MobileNetV3_Light directly predicts the mode weight coefficients and the relative phase coefficients of the eigenmodes; these predictions are then used to reconstruct the near-field light-field image, which is compared with the input image to evaluate the decomposition.
Figure 5. Test flow chart.
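A sketch of this test flow is given below, using the label layout assumed in the data-generation snippet (six weights followed by five phase cosines) and resolving the phase-sign ambiguity by an exhaustive sign search, the step timed as T4 in Table 4.

```python
import itertools
import numpy as np

def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two intensity images."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def reconstruct(pred: np.ndarray, modes: np.ndarray, measured: np.ndarray):
    """Rebuild the near-field intensity and resolve the phase-sign ambiguity."""
    n = modes.shape[0]
    rho = np.sqrt(np.clip(pred[:n], 0, 1))
    theta_mag = np.arccos(np.clip(pred[n:], -1, 1))
    best = None
    # Try every sign combination of the higher-order phases and keep the one whose
    # reconstruction correlates best with the measured image (step T4 in Table 4).
    for signs in itertools.product((1, -1), repeat=n - 1):
        theta = np.concatenate(([0.0], np.array(signs) * theta_mag))
        field = np.sum(rho[:, None, None] * np.exp(1j * theta)[:, None, None] * modes, axis=0)
        intensity = np.abs(field) ** 2
        c = correlation(intensity, measured)
        if best is None or c > best[0]:
            best = (c, intensity, theta)
    return best   # (correlation, reconstructed intensity, resolved phases)
```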
We evaluated the performance in simulation at 1073 nm, using a step-index fiber with a core radius of 11.8 µm and an NA of 0.064 as an example. The normalized frequency of this fiber is about 4.43, so it can support six modes, arranged in order as the LP01, LP11e, LP11o, LP21e, LP21o, and LP02 modes. Following the simplification in Ref. [30], three cases of propagating modes are considered for this fiber: the first three, five, and six modes. As the number of modes increases, the combinations of eigenmodes become more complex and the number of near-field optical field images with different mode coefficients increases[25–30]. Therefore, for the FMF supporting six modes, we generated 1000 random near-field light-field images to test the MobileNetV3_Light network model after different numbers of training periods.
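As a quick consistency check of the quoted normalized frequency,

```latex
V = \frac{2\pi a \,\mathrm{NA}}{\lambda}
  = \frac{2\pi \times 11.8\ \mu\mathrm{m} \times 0.064}{1.073\ \mu\mathrm{m}}
  \approx 4.4 ,
```

which lies between the LP21/LP02 cutoff (V ≈ 3.832) and the LP31 cutoff (V ≈ 5.136), so exactly the six LP modes listed above are guided.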
We then calculated the average correlation between the simulated and reconstructed near-field light-field intensities of the test samples after each training period; the results are shown in Fig. 6. The correlation rises rapidly to above 0.9910 within the first 15 periods and begins to converge by the 50th period. We stopped training after 100 periods, when the correlation approaches 0.9995. In addition, we tested the network after 100 training periods by evaluating the mode weight and relative phase errors. The average errors of the individual mode weights and relative phases are shown in Tables 2 and 3. The average mode weight error is less than 0.56%, and the average relative phase error is less than 0.85%, for all six modes. Compared with the literature[25–29], the scheme proposed in this paper achieves similar decomposition accuracy. It can be concluded that the trained MobileNetV3_Light has learned the relationship between the mode coefficients and the near-field light-field intensity images. It should be emphasized that the mode weight errors and relative phase errors reported in this experiment are averages over 1000 samples.
| Mode | LP01 | LP11e | LP11o | LP21e | LP21o | LP02 |
|---|---|---|---|---|---|---|
| Average weight error | 0.47% | 0.48% | 0.42% | 0.48% | 0.53% | 0.55% |
Table 2. Average Error of the Six Mode Weights
| Mode | LP11e | LP11o | LP21e | LP21o | LP02 |
|---|---|---|---|---|---|
| Average relative phase error | 0.47% | 0.48% | 0.42% | 0.48% | 0.53% |
Table 3. Average Error of the Relative Phase of the Six Modes
Figure 6. Average correlation across training periods for the six-mode case.
To visually evaluate the method, simulated near-field light-field maps, reconstructed near-field light-field maps, residual maps, and their correlations are shown for multiple sets of samples in Fig. 7. The reconstructed maps are highly similar to the simulated maps, with very small residuals, which further confirms the accuracy and effectiveness of the method described in this work.
Figure 7. Simulated near-field light-field map, reconstructed near-field light-field map, residual images, and their correlation.
To quantify the memory footprint and the decomposition speed of the MD technique, we measured the time taken by the network in each stage of MD on the GPU, as well as the number of parameters of the network model, for 1000 test samples. The time spent in each stage is reported in Table 4, where T1 is the training time of the MobileNetV3_Light network, T2 is the image preprocessing time, T3 is the time for calculating the mode weights and relative phases, and T4 is the time for selecting the most suitable phase combination. From Table 4, completing MD for 1000 samples with the trained MobileNetV3_Light neural network takes 6.27 s, of which 2.41 s is spent on image preprocessing and 3.86 s on calculating the mode weights and relative phases. Thus a single near-field optical field image takes only about 6 ms to decompose, which is much faster than the VGG-16-based method in the literature[25] and demonstrates the high performance of our approach. Such high decomposition efficiency also makes the method a candidate for real-time MD. Table 5 compares the sizes of several neural network models proposed for MD, where "Parameters" is the number of parameters. The number of parameters of our proposed MobileNetV3_Light network model is only 2.5 million, and the size of the network model is 6.5 MB. Compared with the model sizes of some previously proposed MD networks[36,37], the decomposition scheme proposed in this paper has obvious advantages. Neural network methods currently proposed for MD generally suffer from large model sizes, a problem we largely avoid by designing a lightweight network. This gives our model a clear advantage on portable mobile devices.
| | T1 | T2 | T3 | T4 |
|---|---|---|---|---|
| Predicting mode weights and phases | 267.5 min | 2.41 s | 3.86 s | 36.24 s |
Table 4. Time Spent in Different Phases of Testing
| | MobileNetV3_Light | MobileNetV2 | Xception | ResNet50 | VGG-16 |
|---|---|---|---|---|---|
| Parameters | 2.5 × 10⁶ | 3.4 × 10⁶ | 22.85 × 10⁶ | 25.56 × 10⁶ | 138.36 × 10⁶ |
| Model size | 6.5 MB | 14.2 MB | 88 MB | 98 MB | 528 MB |
Table 5. Parameter Size of Different Neural Network Models
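Both figures of merit in Tables 4 and 5 can be re-measured on any trained model with a few lines; the sketch below reuses the model sketch above (with the assumed 6-mode label length), and the absolute latency will of course depend on the GPU.

```python
import time
import torch

model = mobilenetv3_light(num_outputs=11).eval().to("cuda")   # assumed 6-mode label length
print(sum(p.numel() for p in model.parameters()))             # parameter count (Table 5)

x = torch.randn(1, 3, 224, 224, device="cuda")
with torch.no_grad():
    for _ in range(10):                                       # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(1000):                                     # 1000 test samples, as in Table 4
        model(x)
    torch.cuda.synchronize()
print((time.perf_counter() - t0) / 1000 * 1e3, "ms per frame")
```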
To assess the feasibility of our method with more modes, we extended our investigation to network models trained for 8 and 10 modes. As the number of modes increases, the combinations of eigenmodes become more complex and the number of similar near-field light-field images with different mode coefficients increases. Therefore, to maintain the decomposition accuracy, we optimized the network by stacking additional MobileNetV3 blocks in the MobileNetV3_Light network to improve its learning capacity. Furthermore, we enlarged the training data set and increased the resolution of the near-field light-field images, which benefited the fitting process.
In our work, the number of MobileNetV3 blocks for the 8- and 10-mode cases was increased to 11 and 15, respectively. The training data set was extended to 150,000 and 200,000 images, respectively, and the resolution of the images was increased as well.
Figure 8 depicts the relationship between the number of supported modes and the correlation. The correlation decreases as the number of supported modes increases, falling to 0.98 for 10 modes. The decomposition scheme based on MobileNetV3_Light therefore shows no particular advantage when more modes are supported. The likely reason is that increasing the number of modes increases the number of similar near-field light-field maps with different mode coefficients, which introduces ambiguities. A promising way to reduce this error is to introduce far-field light-field images: far-field images corresponding to similar near-field images with different mode coefficients exhibit significant differences. Therefore, by combining near-field and far-field light-field images, MobileNetV3_Light is expected to predict the mode coefficients accurately with almost no ambiguity. Figure 8 shows that our proposed scheme is feasible when the number of modes is less than or equal to six.
Figure 8. Relation between the mode number and correlation.
Finally, we investigate the robustness of MobileNetV3_Light by adding Gaussian noise to the near-field light-field map. For the noise generation, the simulated near-field light-field map is used as the base: each pixel of the simulated map is multiplied by a noise function whose amplitude is set by the noise intensity.
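One plausible implementation of this multiplicative-noise procedure (an assumption: Gaussian noise whose standard deviation σ plays the role of the noise intensity, quoted up to 0.36) is

```python
import numpy as np

def add_multiplicative_noise(intensity: np.ndarray, sigma: float, rng=None) -> np.ndarray:
    """Multiply each pixel by (1 + Gaussian noise); sigma is the noise intensity."""
    rng = rng or np.random.default_rng()
    noisy = intensity * (1.0 + rng.normal(0.0, sigma, intensity.shape))
    # Keep the image non-negative and renormalize to unit peak.
    noisy = np.clip(noisy, 0.0, None)
    return noisy / max(noisy.max(), 1e-12)
```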
Figure 9. Simulated and reconstructed images under the influence of different intensities of noise and their correlation.
5. Summary
We have proposed a complete MD technique based on a lightweight neural network that offers high accuracy, high speed, and low experimental equipment requirements. The proposed network uses depth-separable convolution instead of conventional convolution and needs no pretraining, which both reduces the model size and increases the decomposition speed while maintaining high MD accuracy. The results show that, for an FMF supporting six LP modes (LP01, LP11e, LP11o, LP21e, LP21o, LP02), our trained neural network achieves an average mode weight error of less than 0.56% and an average relative phase error of less than 0.85%. The MD speed reaches about 6 ms per frame, and the network model size is only about 6.5 MB, making real-time MD on portable mobile devices feasible. Additionally, the proposed method remains robust even at noise intensities as high as 0.36.
References
[31] F. Chollet. Xception: deep learning with depthwise separable convolutions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1800 (2017).
[32] A. Howard, M. Sandler, B. Chen, et al. Searching for MobileNetV3. IEEE/CVF International Conference on Computer Vision (ICCV), 1314 (2019).
[33] A. W. Snyder, J. D. Love. Optical Waveguide Theory (1983).
[34] A. G. Howard, M. Zhu, B. Chen, et al. MobileNets: efficient convolutional neural networks for mobile vision applications (2017).
[35] P. Ramachandran, B. Zoph, Q. V. Le. Searching for activation functions (2017).
