• Photonics Research
  • Vol. 10, Issue 6, 1491 (2022)
Zhan Li1,2, Shuaishuai Yang1,3, Qi Xiao1,2, Tianyu Zhang1,2, Yong Li1,2, Lu Han1,2, Dean Liu1,4,*, Xiaoping Ouyang1,5,*, and Jianqiang Zhu1
Author Affiliations
  • 1Key Laboratory of High Power Laser and Physics, Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
  • 2Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
  • 3Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
  • 4e-mail: liudean@siom.ac.cn
  • 5e-mail: oyxp@siom.ac.cn
    DOI: 10.1364/PRJ.455493
    Zhan Li, Shuaishuai Yang, Qi Xiao, Tianyu Zhang, Yong Li, Lu Han, Dean Liu, Xiaoping Ouyang, Jianqiang Zhu. Deep reinforcement with spectrum series learning control for a mode-locked fiber laser[J]. Photonics Research, 2022, 10(6): 1491
    Fig. 1. GNLSE simulation result from the NPE-based mode-locking laser system. (a) Spectral evolution when EPC is in TL. (b) Spectral evolution when EPC is in TH. (c) Spectral evolution when EPC is in TL initially and then converted to TH after 400 round trips. (d) Light transmittance caused by NPE when EPC is in TL (orange line) and TH (purple line). (e) Spectrum output after 800 round trips when EPC is in TL (orange line), TH (purple line), and Tm (green line). (f) Temporal output after 800 round trips when EPC is in TL (orange line), TH (purple line), and Tm (green line).
    Fig. 2. Feedback time-series spectrum control model.
    Fig. 3. MDRL agent layout.
    Fig. 4. MDRL environment layout. LD, laser diode; WDM, 980/1060 nm wavelength division multiplexer; YDF, ytterbium-doped fiber; C, coupler; SMF, single-mode fiber; P, polarizer; I, isolator; EPC, electrical polarization controller; SF, optical spectrum filter; D, diagnostic optical spectrum analyzer.
    Fig. 5. Spectrum and time-wave evolution during MDRL search. (a) Spectrum evolution data from the spectrum analyzer. (b) Time-wave evolution data from the high-speed photodetector and oscilloscope. (c) Obtained reward at each search step. (d) Direct autocorrelation output (blue line) and autocorrelation output after dispersion compensation (orange square, purple line).
    Fig. 6. Mode-locked state switch by MSP. (a) Mode-locked state switch by minimizing the difference between P_MSP(W_t) (purple line) and P_MSP(W_c). (b) Pump power control error L_MSP(W_c) (blue line) and MSP predicted error (green dashed line). (c), (g) Typical spectrum and temporal output in FML state. (d), (h) Typical spectrum and temporal output in HML state. (e), (i) Typical spectrum and temporal output in QML state. (f), (j) Typical spectrum and temporal output in QS output.
    Fig. 7. Algorithm performance. (a) Total search steps from 100 random initial states to the mode-locked state using MDRL (purple solid circle), DDPG (orange solid square), and genetic algorithm (green solid triangle). (b) Search stability test at different temperatures with MDRL (purple), DDPG (orange), and genetic algorithm (green).
    Fig. 8. Search stability test at different temperatures with MDRL (purple), DDPG (orange), and genetic algorithm (green).
    Algorithm                   Average Time   Average Search Step
    Genetic algorithm [7]       30 min         6000
    HLA [6]                     3.1 s          3100
    DDPG [18]                   1.948 s
    DDPG in this environment    5.8 s          116.1
    MDRL in this environment    0.69 s         13.8
    Table 1. Time Consumption Comparison with Recent Works
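    The entries of Table 1 can be put on a common footing by dividing the average time by the average number of search steps. The short Python sketch below does this arithmetic for the rows that report both quantities. The names TABLE_1, seconds_per_step, and ratio are illustrative and do not come from the paper, and the derived figures are simple quotients of the tabulated averages, not measurements reported by the authors. The DDPG [18] row is left out because no step count is listed for it.

```python
# Values transcribed from Table 1: (average time in seconds, average search steps).
# The dictionary and function names are illustrative assumptions, not the paper's.
TABLE_1 = {
    "Genetic algorithm [7]": (30 * 60.0, 6000.0),  # 30 min converted to seconds
    "HLA [6]": (3.1, 3100.0),
    "DDPG in this environment": (5.8, 116.1),
    "MDRL in this environment": (0.69, 13.8),
}


def seconds_per_step(avg_time_s: float, avg_steps: float) -> float:
    """Average wall-clock time spent on one search step."""
    return avg_time_s / avg_steps


def ratio(baseline: float, candidate: float) -> float:
    """How many times larger the baseline value is than the candidate value."""
    return baseline / candidate


if __name__ == "__main__":
    for name, (time_s, steps) in TABLE_1.items():
        print(f"{name}: {seconds_per_step(time_s, steps):.4f} s per step")

    ddpg_time, ddpg_steps = TABLE_1["DDPG in this environment"]
    mdrl_time, mdrl_steps = TABLE_1["MDRL in this environment"]
    print(f"Step ratio DDPG/MDRL: {ratio(ddpg_steps, mdrl_steps):.2f}")
    print(f"Time ratio DDPG/MDRL: {ratio(ddpg_time, mdrl_time):.2f}")
```

    Under this reading, the two rows measured in this environment have nearly the same time per step, so the shorter MDRL search time would follow mainly from needing fewer steps rather than from faster steps.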