• Photonics Research
  • Vol. 8, Issue 11, 1792 (2020)
Yanan Han1, Shuiying Xiang1,2,*, Yang Wang1, Yuanting Ma1, Bo Wang1, Aijun Wen1, and Yue Hao2
Author Affiliations
  • 1State Key Laboratory of Integrated Service Networks, Xidian University, Xi'an 710071, China
  • 2State Key Discipline Laboratory of Wide Band Gap Semiconductor Technology, School of Microelectronics, Xidian University, Xi'an 710071, China
    DOI: 10.1364/PRJ.403319
    Yanan Han, Shuiying Xiang, Yang Wang, Yuanting Ma, Bo Wang, Aijun Wen, Yue Hao. Generation of multi-channel chaotic signals with time delay signature concealment and ultrafast photonic decision making based on a globally-coupled semiconductor laser network[J]. Photonics Research, 2020, 8(11): 1792

    Abstract

    We propose and demonstrate, experimentally and numerically, a network of three globally coupled semiconductor lasers (SLs) that generates triple-channel chaotic signals with time delay signature (TDS) concealment. The effects of the coupling strength and bias current on the concealment of the TDS are investigated. The generated chaotic signals are further applied to reinforcement learning, and a parallel scheme is proposed to solve the multiarmed bandit (MAB) problem. The influences of mutual correlation between signals from different channels, the sampling interval of signals, and the TDS concealment on the performance of decision making are analyzed. Comparisons between the proposed scheme and two existing schemes show that, with a simplified algorithm, the proposed scheme performs as well as or better than the previous schemes. Moreover, we consider the robustness of decision-making performance against a dynamically changing environment and verify the scalability for MAB problems of different sizes. The proposed globally coupled SL network for a multi-channel chaotic source is simple in structure and easy to implement. Solving the MAB problem in parallel may be valuable for applications of ultrafast photonic intelligence.

    1. INTRODUCTION

    Since its advent, the laser has been applied in many fields due to the advantages of rapid response and rich dynamics [1]. For example, it is used in high-speed random bit generators [2,3], optical secure communication, and secret key distribution that requires synchronized chaotic signals [4–7]. Recently, photonic technologies have also been developed as efficient ways of solving some conventional problems in the area of artificial intelligence (AI) computation, such as reservoir computing [8,9], reinforcement learning [10–12], and brain-inspired photonic neuromorphic computing [13–16].

    The security of information transmission has always been a focus of attention. In optical communication systems, chaotic signals can be generated by means of delayed optical feedback, optical injection, and other external disturbances [17–22]. However, a time delay signature (TDS) can be introduced (typically by external cavity feedback) and cause internal periodicity of chaotic oscillations [23,24]. This feature can be analyzed by methods such as permutation entropy (PE), delayed mutual information, and autocorrelation functions (ACF), and utilized for reconstruction of chaotic systems [25–29], which seriously threatens the security of communication. Many methods have been reported to complicate and suppress the TDS. For example, Lee et al. first proposed to complicate the TDS in a semiconductor laser (SL) subject to double optical feedback [30], and the result was experimentally demonstrated later by Wu et al. [31]. We also numerically achieved the suppression of TDS in a mutually coupled ring network with heterogeneous time delays [32]. Very recently, Jiang et al. proposed a new scheme for the generation of wideband laser chaos with excellent TDS suppression by using parallel-coupling ring resonators as the reflector [33].

    As one of the fundamental problems in reinforcement learning, adequate decision making in a dynamically changing environment is also required in frequency and channel assignments in communication networks [12,34,35]. The multiarmed bandit (MAB) problem is one of the most important issues in decision making. One remarkable method to solve the MAB problem was proposed by Kim et al., called the tug-of-war (TOW) method, which was inspired by the unicellular amoeba of true slime mold [36,37]. In recent years, several works on ultrafast decision making have been reported based on the TOW method [38–41]. In our previous work, we proposed to solve a four-armed bandit problem in parallel by sampling dual-channel TDS-concealed chaotic signals simultaneously and found that it works more efficiently [42]. However, the threshold value (TV) for each channel is set and adjusted dependently; therefore, the scheme is not completely parallel.

    In this paper, we propose a scheme for the generation of laser chaos with TDS concealment and demonstrate its application in reinforcement learning. Our contribution includes three aspects. First, the new proposed scheme for the generation of complex laser chaos is simple in structure and easy to implement. Second, we propose a scheme to solve the MAB problem in parallel via using the generated laser chaos and verify its scalability and adaptability. Third, in order to solve the MAB problem in parallel, we propose a modified strategy and demonstrate its effectiveness.

    2. SYSTEM MODEL AND RESULTS

    A. Experimental Setup

    The experimental setup of three globally coupled SLs is presented in Fig. 1. Three distributed feedback (DFB) lasers are driven by laser diode controllers (LDCs), which control the current and temperature of the SLs. The wavelengths of the free-running DFB lasers are precisely matched by adjusting the current and temperature. The optical output from each DFB laser is divided into two parts by a 10:90 fiber coupler (FC). The smaller part is sent to the measurement module, where the optical signal is either detected by a high-speed photodiode (PD, HP11982A, 15 GHz) and analyzed by a real-time oscilloscope (OSC) with an 8-bit analog-to-digital converter (Keysight DSOV334A, 33 GHz, 80 GS/s), or sent directly to an optical spectrum analyzer (OSA, Ando AQ6317). The larger parts from the three lasers travel through fiber jumpers of different lengths and are combined by an FC; the combined signal then passes through a variable optical attenuator (VOA) and is fed back to all three DFB lasers via an optical circulator (OC). Thus, the coupling strength and feedback strength are adjusted simultaneously by the VOA. For simplicity, both are referred to as the coupling strength in the following.

    Figure 1. Experimental setup of three globally coupled SLs. DFB1, DFB2, DFB3, three distributed feedback lasers; LDC, laser diode controller; FC, fiber coupler; OC, optical circulator; VOA, variable optical attenuator; τ11, τ22, τ33, feedback delay times; PD, photodiode; OSC, oscilloscope; OSA, optical spectrum analyzer.

    B. Experimental Results

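    The ACF is an effective tool for identifying the TDS of the measured chaotic signals [29,32]. It is defined in Eq. (1) as

    $$C_m(\Delta t)=\frac{\left\langle\left[I_m(t+\Delta t)-\langle I_m(t+\Delta t)\rangle\right]\left[I_m(t)-\langle I_m(t)\rangle\right]\right\rangle}{\sqrt{\left\langle\left[I_m(t+\Delta t)-\langle I_m(t+\Delta t)\rangle\right]^2\right\rangle\left\langle\left[I_m(t)-\langle I_m(t)\rangle\right]^2\right\rangle}},\tag{1}$$

    where Cm(Δt) is the ACF value at time lag Δt of the chaotic time series Im(t) measured from DFBm (m = 1, 2, 3), and ⟨·⟩ denotes the time average. The TDS concealment is reflected by the most pronounced residual peak in the ACF, denoted as ρm (m = 1, 2, 3). A lower value of ρm indicates better TDS concealment [32].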
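
The ACF-based TDS measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names, the search window around the expected delay, and the use of the absolute ACF value for the residual peak are all assumptions made here, and the test signal is synthetic noise with an artificial echo rather than laser data.

```python
import numpy as np

def autocorrelation(intensity, max_lag):
    """Normalized ACF C(lag) for lag = 0..max_lag, with lag counted in samples."""
    x = np.asarray(intensity, dtype=float)
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        a = x[lag:]                 # I(t + lag)
        b = x[:x.size - lag]        # I(t)
        a = a - a.mean()
        b = b - b.mean()
        acf[lag] = np.mean(a * b) / np.sqrt(np.mean(a ** 2) * np.mean(b ** 2))
    return acf

def residual_peak(acf, delay_samples, half_window):
    """Largest |ACF| in a window around the expected delay (assumed definition of rho)."""
    lo = max(delay_samples - half_window, 1)
    hi = min(delay_samples + half_window, acf.size - 1)
    return float(np.max(np.abs(acf[lo:hi + 1])))

# Synthetic check: noise with a delayed copy carries an obvious signature,
# plain noise does not.
rng = np.random.default_rng(0)
noise = rng.standard_normal(20000)
delay = 200
echo = noise.copy()
echo[delay:] += 0.8 * noise[:-delay]

rho_echo = residual_peak(autocorrelation(echo, 400), delay, 10)
rho_noise = residual_peak(autocorrelation(noise, 400), delay, 10)
```

For measured data, `delay_samples` would be the feedback delay divided by the oscilloscope sampling period, and a small `rho` indicates a well-concealed TDS.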

    To identify the TDS of DFB1, we turn off DFB2 and DFB3, and calculate the ACF of the output intensity; the round-trip feedback time delay of DFB1 is indicated by the location of ρm in the ACF. By this method, the feedback time delays for DFB1, DFB2, and DFB3 are determined to be 97.4, 97.53, and 97.38 ns, respectively. Note that the time delay values are close, introduced by slightly different propagation paths, and need not be precisely adjusted by the variable optical delay line (VODL). The wavelengths of free-running DFB lasers are precisely set as 1552.250, 1552.265, and 1552.255 nm, respectively, by carefully adjusting the current and temperature.

    Figure 2 shows the measured chaotic time series from the three DFB lasers, the calculated ACF as a function of Δt, as well as the power spectrum. The chaotic dynamics of the three SLs can be revealed by the time series shown in Figs. 2(a1)–2(a3) and the power spectrum in Figs. 2(c1)–2(c3). As can be seen in Figs. 2(b1)–2(b3), no pronounced peaks can be found in the ACFs except for that at time lag 0, which means the TDS is greatly concealed in all three channels.

    Figure 2. (a1)–(a3) The chaotic time series from the three DFB lasers; (b1)–(b3) the ACFs; (c1)–(c3) the power spectra. The attenuation is 9 dB; I1, I2, I3 = 28.34, 24.5, 26.6 mA; T1, T2, T3 = 27.75, 15.5, 18°C.

    Then, to illustrate the effect of coupling strength on TDS concealment, ρm is presented as a function of attenuation in Fig. 3(a). Region I (III) indicates that all three DFB lasers are in a quasi-periodic (chaotic) state. Region II represents the transition region, where the states of the three DFB lasers can be quasi-periodic, weakly chaotic, or chaotic, but are not identical. Examples of the time series and the power spectrum of signals in each state are shown in Fig. 4. It can be seen that ρm is less than 0.1 when the attenuation is larger than 5.0 dB and increases as the attenuation decreases, indicating that better TDS concealment is achieved when the attenuation is large, namely, when the coupling strength is relatively small. The influence of bias currents on the TDS concealment is further investigated, as shown in Fig. 3(b). Here, the bias currents of the three DFBs are adjusted at the same time, and we simply present ρm as a function of I2 (which varies from 18.6 to 34.6 mA). It can be seen that ρm is less than 0.1 when I2<30.6 mA, indicating that low TDS can be obtained in that region. However, when I2>30.6 mA, the ρm values are larger than 0.1 and grow as I2 increases, indicating reduced concealment of TDS for all three DFB lasers.

    Figure 3. (a) ρm as a function of attenuation; (b) ρm as a function of I2.

    Figure 4. (a1)–(a3) Time series of signals at states I, II, and III, respectively; (b1)–(b3) the corresponding power spectra.

    C. Numerical Results

    In addition, we numerically verified the concealment of TDS in the proposed scheme. To model the dynamics of the three DFB lasers, the well-known Lang–Kobayashi equations are adopted, which describe the slowly varying complex electric field Em(t) and the carrier density Nm(t) in the active region [31,32]. The rate equations of our scheme can be written as

    $$\frac{dE_m(t)}{dt}=\frac{1+i\alpha}{2}\left[G_m(t)-\frac{1}{\tau_p}\right]E_m(t)+\sum_{n=1}^{3}k_{rn}E_n(t-\tau_{nm})e^{-i(\omega_n\tau_{nm}-\Delta\omega_{nm}t)}+\sqrt{2DN_m}\,\xi_m(t),$$

    $$\frac{dN_m(t)}{dt}=\frac{I_m}{q}-\frac{N_m(t)}{\tau_e}-G_m(t)|E_m(t)|^2,$$

    $$G_m(t)=\frac{g[N_m(t)-N_0]}{1+s|E_m(t)|^2},$$

    where m (n) = 1, 2, 3 denotes the three SLs. α = 5 is the linewidth enhancement factor [32,43], g = 1.5×10^−8 ps^−1 is the differential gain coefficient, s = 5×10^−7 is the nonlinear gain saturation coefficient, and τp = 2 ps and τe = 2 ns stand for the photon lifetime and the carrier lifetime, respectively. N0 = 1.5×10^8 is the transparency carrier number. The variable Im represents the bias current, and krn describes the coupling strength. The coupling time delay τnm (n ≠ m) from SLn to SLm can be calculated from the feedback time delays τnm (n = m) by τnm = τmn = (τnn + τmm)/2.
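
As a rough illustration of how such a delay-coupled rate-equation model can be integrated, the sketch below uses a fixed-step Euler scheme with the parameter values quoted above. Several simplifications are assumptions made here and not taken from the paper: the noise term is omitted, all lasers share one optical frequency (zero detuning, taken from a 1552.25 nm wavelength), the coupling strength is indexed by the source laser, the history is seeded with random fields, and the function name `simulate` is hypothetical. It is meant to show the structure of the computation, not to reproduce Fig. 5.

```python
import numpy as np

ALPHA = 5.0            # linewidth enhancement factor
G_DIFF = 1.5e-5        # differential gain: 1.5e-8 ps^-1 expressed in ns^-1
S_SAT = 5e-7           # nonlinear gain saturation coefficient
TAU_P = 2e-3           # photon lifetime: 2 ps expressed in ns
TAU_E = 2.0            # carrier lifetime in ns
N0 = 1.5e8             # transparency carrier number
Q = 1.602176634e-19    # elementary charge in C
# Assumed common optical angular frequency (rad/ns) for a 1552.25 nm laser.
OMEGA = 2 * np.pi * 2.99792458e8 / 1552.25

def simulate(currents_mA, kr, tau_self, t_end=20.0, dt=1e-3, seed=0):
    """Euler integration without noise; returns |E_m|^2 with shape (steps, 3)."""
    currents = np.asarray(currents_mA, dtype=float)
    kr = np.asarray(kr, dtype=float)
    tau_self = np.asarray(tau_self, dtype=float)
    pump = currents * 1e-3 / Q * 1e-9                     # I_m / q in carriers per ns
    tau = 0.5 * (tau_self[:, None] + tau_self[None, :])   # tau[n, m], diagonal = feedback
    lag = np.rint(tau / dt).astype(int)                   # delays in time steps
    phase = np.exp(-1j * OMEGA * tau)                     # zero detuning assumed
    src = np.arange(3)[:, None]                           # source index n
    hist = int(lag.max())
    steps = int(round(t_end / dt))
    rng = np.random.default_rng(seed)
    E = np.zeros((hist + steps + 1, 3), dtype=complex)
    # Arbitrary random initial history to break the symmetry between lasers.
    E[:hist + 1] = 100.0 * (rng.standard_normal((hist + 1, 3))
                            + 1j * rng.standard_normal((hist + 1, 3)))
    N = np.full(3, N0 + 1.0 / (G_DIFF * TAU_P))           # start at solitary threshold
    for k in range(hist, hist + steps):
        e = E[k]
        power = np.abs(e) ** 2
        gain = G_DIFF * (N - N0) / (1.0 + S_SAT * power)
        delayed = E[k - lag, src]                         # delayed[n, m] = E_n(t - tau_nm)
        coupling = (kr[:, None] * delayed * phase).sum(axis=0)
        E[k + 1] = e + dt * (0.5 * (1 + 1j * ALPHA) * (gain - 1.0 / TAU_P) * e + coupling)
        N = N + dt * (pump - N / TAU_E - gain * power)
    return np.abs(E[hist + 1:]) ** 2

intensity = simulate([20.0, 22.5, 20.0], [11.7, 16.7, 11.7], [2.0, 2.02, 2.04], t_end=5.0)
```

A production study would normally use a higher-order integrator, include the spontaneous-emission noise, and discard an initial transient before computing the ACF.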

    In Fig. 5, we present the time series, the ACF, and the power spectrum of the numerical results as in Fig. 2. The results show that the TDS can be concealed in such a scheme if the parameters are properly selected. Note that the mismatch of parameters is important to improve the concealment of the TDS. When the currents are the same for the three SLs, the region in which the TDS is concealed is quite narrow. To find a proper bias current, we can fix the currents of two SLs and change the other. In this way, we find that a current mismatch of 0.5–3.5 mA allows better TDS concealment in all three SLs. We choose a mismatch of 2.5 mA.

    Figure 5. (a1)–(a3) The chaotic time series from the three SLs; (b1)–(b3) the ACFs; (c1)–(c3) the power spectra. The parameters are Im = 20, 22.5, 20 mA; krm = 11.7, 16.7, 11.7 ns−1; τmm = 2, 2.02, 2.04 ns; m = 1, 2, 3.

    For a further exploration of the parameters’ scope in which the TDS can be better suppressed, we show in Figs. 6(a1)–6(a3) the two-dimensional map of ρm for the three SLs as functions of the coupling strength and bias current of SL2 (for simplicity). The parameter region for ρm<0.2 is considered to have better TDS concealment and is marked by a white dotted line [32]. It can be seen that the evolution patterns for three ρm are similar, and the parameters for low TDS are mainly in the diagonal region, meaning that concealment is affected by both the current and the strength. The PE is also calculated as an indicator of the dynamical state of SLs [44] and is presented in Figs. 6(b1)–6(b3). The dynamics of the SL is in chaotic oscillation when the PE value is larger than 0.99, marked by a black dotted line. As PE decreases, the dynamics goes through chaos to weak chaos and finally enters quasi-periodic oscillation.
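
The PE used above as a state indicator can be computed with the standard Bandt-Pompe ordinal-pattern procedure. The sketch below is one common formulation, normalized by log(order!) so that values lie in [0, 1]; the embedding order and delay are assumptions chosen here for illustration, since the text does not state the values used for Fig. 6.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(series, order=4, delay=1):
    """Normalized Bandt-Pompe permutation entropy in [0, 1]."""
    x = np.asarray(series, dtype=float)
    span = (order - 1) * delay
    n_vectors = x.size - span
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen order and delay")
    # Count how often each ordinal pattern (rank ordering) occurs.
    counts = Counter(
        tuple(np.argsort(x[i:i + span + 1:delay]).tolist())
        for i in range(n_vectors)
    )
    probs = np.array(list(counts.values()), dtype=float) / n_vectors
    entropy = -np.sum(probs * np.log(probs))
    return float(entropy / math.log(math.factorial(order)))

rng = np.random.default_rng(0)
pe_noise = permutation_entropy(rng.standard_normal(5000))   # close to 1
pe_ramp = permutation_entropy(np.arange(100))                # single pattern only
```

A value near 1 means all ordinal patterns are almost equally likely, which is the regime the text associates with chaotic oscillation.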

    Figure 6. (a1)–(a3) The two-dimensional maps of ρm as functions of the coupling strength kr2 and bias current I2 for DFB1, DFB2, and DFB3, respectively; (b1)–(b3) the PE of DFB1, DFB2, and DFB3, respectively. I1 = I3, I2 = I1 + 2.5 mA; kr1 = kr3, kr2 = kr1 + 5 ns−1; τmm = 2, 2.02, 2.04 ns.

    Moreover, time delay is also an important factor that affects the dynamics of a system, and different time delays may cause different sensitivities to parameter mismatches. Hence, it is necessary to consider different coupling delays in the investigation of TDS concealment. Figures 7(a)–7(c) depict the ρm as a function of I with three different cases of time delay. We can see that in all three cases, the TDS can be concealed with properly selected parameters. Typically, we find that for a larger time delay, stronger coupling strength is required to achieve better TDS concealment. In Fig. 7(d), we further show the ρm as a function of τ11 for all three SLs. It can be seen that for fixed current and coupling strength, the values of ρm remain relatively small as τ11 varies from 1 to 8 ns. The results indicate that in this scheme, the TDS concealment can be achieved with different time delays.

    Figure 7. TDS concealment with different time delays; I1 = I3. (a) I2 = I1 − 1 mA, krm = 12, 10, 12 ns−1, τmm = 3, 3.02, 3.06 ns; (b) I2 = I1 − 1 mA, krm = 12, 11, 12 ns−1, τmm = 3, 3.1, 3.2 ns; (c) I2 = I1 − 2 mA, krm = 13.3, 12.3, 13.3 ns−1, τmm = 4, 4.07, 4.13 ns; (d) ρm as a function of τ11, with τ22 = τ11 + 0.3 ns, τ33 = τ11 + 0.7 ns, Im = 21, 19, 21 mA, krm = 12, 10, 12 ns−1; m = 1, 2, 3.

    3. APPLICATION IN DECISION MAKING

    In this section, we utilize the triple-channel chaotic signals generated from the above scheme to solve an eight-armed bandit problem in parallel. By choosing one of eight slot machines, there is a chance of getting a reward. The reward probabilities are different and unknown to users [40]. Users need to explore the slot machines to find the one that has the highest reward probability, which we call the target machine. Due to the trade-off known as the exploration-exploitation dilemma [40,41], the exploration needs to be effective so that the target machine can be found as quickly as possible and without the risk of missing it.

    A. Scheme of Solving MAB Problem in Parallel

    For an N-armed bandit problem, where N = 2^k with k being a natural number, a k-bit binary number [D1, D2, …, Dk] can be used to distinguish the N slot machines [41]. When N = 8 (k = 3), the eight slot machines can be encoded by [D1, D2, D3]. Figure 8 gives the schematic diagram for solving the eight-armed bandit problem in parallel. We propose a modified strategy for the implementation of the parallel scheme, in which the triple-channel chaotic signals s1, s2, s3 are sampled simultaneously and compared with the threshold values TH1, TH2, TH3 of the respective channels. Before sampling, the signals are standardized and normalized. A decision is made according to the comparison result: if si(t) ≤ THi, then Di = 0; otherwise Di = 1. To be specific, suppose that the triple-channel chaotic signals sampled at t1 are s1(t1), s2(t1), s3(t1); they are compared with the threshold values TH1, TH2, TH3, respectively. If s1(t1) ≤ TH1, the most significant bit is determined as D1 = 0; if s2(t1) ≤ TH2, the second-most significant bit is D2 = 0; if s3(t1) ≤ TH3, the least significant bit is D3 = 0. In that case, slot machine 1, marked by D = [0,0,0], is chosen. If a reward is given by choosing slot machine 1, then the threshold values are adjusted so that the same decision is more likely to be made in the next cycle. Otherwise, if no reward is yielded, the threshold values are adjusted to reduce the probability of making the same choice the next time.
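
The bit-wise decision rule described above maps directly onto a short function. The sketch below is illustrative; the name `select_machine` and the zero-based machine index it returns (so index 0 corresponds to slot machine 1) are conventions chosen here.

```python
def select_machine(samples, thresholds):
    """Return (bits, index). Bit D_i is 0 if s_i <= TH_i, else 1; D_1 is the MSB."""
    if len(samples) != len(thresholds):
        raise ValueError("one threshold per channel is required")
    bits = [0 if s <= th else 1 for s, th in zip(samples, thresholds)]
    index = 0
    for bit in bits:
        index = (index << 1) | bit   # build the binary code MSB first
    return bits, index

# All three samples below their thresholds: D = [0, 0, 0], i.e. slot machine 1.
bits_a, index_a = select_machine([-0.2, -0.1, -0.3], [0.0, 0.0, 0.0])
# Only the second channel exceeds its threshold: D = [0, 1, 0], i.e. slot machine 3.
bits_b, index_b = select_machine([-0.2, 0.4, -0.3], [0.0, 0.0, 0.0])
```

Because each channel is compared with its own threshold, the same function handles a 16-armed problem when four samples and four thresholds are supplied.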

    Figure 8. Architecture for the eight-armed bandit problem processed in parallel based on triple-channel chaos.

    B. Threshold Value Adjustment

    The threshold values of the three channels are independently updated according to THi = k·TVi (i = 1, 2, 3), where TVi is the threshold adjuster and takes integer values in [−L, L]. L is a constant integer; here we set L = 10. k is a constant factor that limits the range of THi. The threshold values are adjusted as follows.

    If the selected slot machine yields a reward at t, the TV value is updated at t+1 by

    $$TV_i(t+1)=\begin{cases}+\Delta+\alpha TV_i(t), & \text{if } D_i=0,\\ -\Delta+\alpha TV_i(t), & \text{if } D_i=1.\end{cases}$$

    If the selected slot machine yields no reward at t, the TV value is updated at t+1 by

    $$TV_i(t+1)=\begin{cases}-\Omega_i+\alpha TV_i(t), & \text{if } D_i=0,\\ +\Omega_i+\alpha TV_i(t), & \text{if } D_i=1,\end{cases}$$

    where the increment parameter Δ is fixed at unity [41], α = 0.99 is a constant memory parameter, and Ωi is determined based on the history of getting rewards and is given by [42]

    $$\Omega_i=\hat{P}_{D_i=0}+\hat{P}_{D_i=1},\qquad \hat{P}_{D_i=k}=\frac{N_{D_i=k,\mathrm{hit}}}{N_{D_i=k,\mathrm{total}}}.$$

    Here N_{Di=k,total} is the total number of times Di = k is selected (i = 1, 2, 3; k = 0, 1), and N_{Di=k,hit} is the number of times that a reward is obtained by selecting Di = k. The initial value of TVi is set to 0. Note that an N-armed bandit problem with N = 2^k requires only k-channel signals and k threshold values, which greatly simplifies the implementation compared with the previous method that requires 2^k − 1 threshold values [41,42].
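
The per-channel update rules can be sketched as a small class plus a driver loop. Several details are assumptions made here because the text does not fix them: the scale factor `K_SCALE = 0.05` is a hypothetical value, the adjuster is rounded and clipped to [−L, L] only when the threshold is read, an unvisited bit value contributes an estimated reward rate of zero, the counters are updated before Ω is evaluated, and uniform random numbers stand in for the normalized chaotic samples.

```python
import numpy as np

DELTA = 1.0      # increment parameter
ALPHA = 0.99     # memory parameter
L_MAX = 10       # bound L of the threshold adjuster
K_SCALE = 0.05   # hypothetical value of the scale factor k

class ChannelState:
    """Threshold adjuster TV_i and reward statistics of one channel."""

    def __init__(self):
        self.tv = 0.0
        self.hits = [0, 0]     # rewards obtained with D_i = 0 and D_i = 1
        self.totals = [0, 0]   # selections made with D_i = 0 and D_i = 1

    def threshold(self):
        # Assumption: round and clip TV_i to integers in [-L, L] before scaling.
        return K_SCALE * float(np.clip(np.rint(self.tv), -L_MAX, L_MAX))

    def omega(self):
        rates = [h / n if n else 0.0 for h, n in zip(self.hits, self.totals)]
        return rates[0] + rates[1]

    def update(self, bit, rewarded):
        self.totals[bit] += 1
        if rewarded:
            self.hits[bit] += 1
            step = DELTA if bit == 0 else -DELTA
        else:
            step = -self.omega() if bit == 0 else self.omega()
        self.tv = step + ALPHA * self.tv

def run_bandit(reward_probs, n_cycles, rng):
    """Play n_cycles with surrogate random samples; returns the chosen indices."""
    n_bits = len(reward_probs).bit_length() - 1      # N = 2^k arms need k channels
    channels = [ChannelState() for _ in range(n_bits)]
    choices = np.empty(n_cycles, dtype=int)
    for t in range(n_cycles):
        samples = rng.uniform(-0.5, 0.5, size=n_bits)   # stand-in for chaotic signals
        bits = [0 if s <= ch.threshold() else 1 for s, ch in zip(samples, channels)]
        index = int("".join(str(b) for b in bits), 2)
        rewarded = rng.random() < reward_probs[index]
        for bit, ch in zip(bits, channels):
            ch.update(bit, rewarded)
        choices[t] = index
    return choices

choices = run_bandit([0.8] + [0.2] * 7, 200, np.random.default_rng(1))
```

Each channel updates from its own bit and the shared reward only, which is what allows the three thresholds to be adjusted independently.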

    C. Results and Discussion

    To describe the decision-making performance, we define convergence cycle (CC) as the number of the first cycle that reaches a correct decision rate (CDR) of 0.9, where CDR=Nhit/Ntotal is the ratio of the times of getting a reward and the total number of selections. In practice, the average accuracy rate is often adopted to describe a short-time behavior, as the environment is always changing [45]. Here, the CDR is averaged over 400 repeated runs.
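
The two metrics can be computed from a table of outcomes as sketched below. The interpretation used here is an assumption: the CDR at a given cycle is taken as the fraction of repeated runs whose selection at that cycle was the target machine, and cycles are numbered from 1. The function names are illustrative.

```python
import numpy as np

def correct_decision_rate(correct):
    """correct[r, t] is truthy when run r chose the target machine at cycle t."""
    return np.asarray(correct, dtype=float).mean(axis=0)

def convergence_cycle(cdr, level=0.9):
    """1-based number of the first cycle whose CDR reaches `level`, or None."""
    reached = np.flatnonzero(np.asarray(cdr) >= level)
    return int(reached[0]) + 1 if reached.size else None

# Two runs, three cycles: the CDR per cycle is [0.0, 0.5, 1.0].
demo = [[0, 1, 1],
        [0, 0, 1]]
cdr_demo = correct_decision_rate(demo)
cc_demo = convergence_cycle(cdr_demo)
```

With 400 repeated runs, `correct` would have 400 rows and one column per learning cycle.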

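    Due to the parallel structure of our scheme, the cross correlation among the triple-channel chaotic signals should be taken into account. The cross-correlation function is introduced as [5]

    $$C_{mn}(\Delta t)=\frac{\left\langle\left[I_m(t+\Delta t)-\langle I_m(t+\Delta t)\rangle\right]\left[I_n(t)-\langle I_n(t)\rangle\right]\right\rangle}{\sqrt{\left\langle\left[I_m(t+\Delta t)-\langle I_m(t+\Delta t)\rangle\right]^2\right\rangle\left\langle\left[I_n(t)-\langle I_n(t)\rangle\right]^2\right\rangle}},$$

    where Cmn(Δt) is the cross-correlation coefficient of signals from SLm and SLn at time lag Δt. The zero-lag correlation (Δt = 0) can be accurately controlled by shifting the signals in the time domain.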
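
A minimal sketch of the cross-correlation coefficient and of the effect of a time shift is given below. The function name and the restriction to non-negative lags are choices made here, and white noise stands in for the measured intensities; the example only shows that shifting one signal moves the correlation peak away from zero lag.

```python
import numpy as np

def cross_correlation(x, y, lag=0):
    """Normalized correlation between x(t + lag) and y(t); lag >= 0 in samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = min(x.size - lag, y.size)
    a = x[lag:lag + n]
    b = y[:n]
    a = a - a.mean()
    b = b - b.mean()
    return float(np.mean(a * b) / np.sqrt(np.mean(a ** 2) * np.mean(b ** 2)))

rng = np.random.default_rng(0)
signal = rng.standard_normal(20000)
shifted = signal[5:]          # copy of the signal advanced by five samples

c_zero_lag = cross_correlation(signal, shifted, lag=0)   # near 0 after the shift
c_matched = cross_correlation(signal, shifted, lag=5)    # equals 1 at the matching lag
```

For chaotic waveforms the required shift depends on how quickly their correlation decays, so the shift would be chosen by scanning `lag` until the zero-lag coefficient is close to 0.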

    Three channels of zero-lag synchronized chaotic signals may cause ultrafast convergence when the target is encoded as [0,0,0], but they make it nearly impossible to recognize a target machine such as [0,1,0]. For simplicity, to investigate the impact of correlation on the performance of decision making, we only consider the effect of C12(Δt), and the values of C13(Δt) and C23(Δt) are kept close to 0. In Fig. 9, we show the CC as a function of C12(Δt) for three sets of numerically generated signals with different correlations, where Δt=0. The result of the one-channel scheme is also calculated for a brief comparison. Here, the distribution of reward probability is P=[0.2,0.2,0.8,0.2,0.2,0.2,0.2,0.2]. It can be seen that the CC of the triple-channel scheme decreases as the cross correlation decreases and becomes smaller than that of the one-channel scheme when C12(Δt)<0.8. This critical value may change with different distributions of reward probability and with different signals. The result shows that the triple-channel scheme can outperform the one-channel scheme when the correlation of the signals is quite low (which is easy to realize for chaotic signals). Therefore, to reduce the impact of correlation among the triple-channel signals, we shift each set of signals in the time domain so that their cross-correlation coefficient at zero time lag is around 0. Here, the time shifts applied to the three signals to avoid the cross correlation are 0, 1, and 2 ns, respectively.

    Figure 9. Evolution of CDR for the triple-channel signals with different correlations and for the one-channel scheme. The vertical bars indicate the standard deviation around the mean value for three sets of simulated signals. P=[0.2,0.2,0.8,0.2,0.2,0.2,0.2,0.2].

    Next, we compare the decision-making performance of the one-channel scheme and the triple-channel scheme by calculating the CC with different sampling intervals. The results are illustrated in Fig. 10. It can be seen that both schemes converge quickly when the sampling interval is as small as 10 ps, which requires the highest sampling rate that is currently available, but slow down as the sampling interval increases. Hence, we choose a sampling interval of 10 ps in the following. Also note that the CC value of the triple-channel scheme is statistically lower and grows more slowly than that of the one-channel scheme, which means that the proposed scheme converges more quickly to the desired accuracy and that its performance is relatively stable against the variation of sampling interval. Note that in Fig. 10 and the following, the CC value of the one-channel scheme is the average of the results of the three channel signals.

    Figure 10. CC with different sampling intervals for the one-channel and triple-channel schemes, respectively. The vertical bars indicate the standard deviation around the mean value for eight sets of simulated signals. P=[0.8,0.2,0.2,0.2,0.2,0.2,0.2,0.2].

    Then the experimentally generated signals with varying attenuation are utilized to investigate the influence of TDS on the decision-making performance. The CC as a function of attenuation is presented in Fig. 11(a), and in Fig. 11(b) we show the result of ρm for ease of comparison. The laser dynamics is clarified, as in Fig. 3. It is obvious that when ρm>0.3, especially when it reaches about 0.6, the cycle to reach a CDR of 0.9 is quite large. When ρm<0.3, the change of CC is not directly linked with ρm, but overall, a smaller CC appears with lower ρm. Note that the signals are normalized during preprocessing, so it is not the amplitude of the signals but the characteristics that affect the result. In addition, for a deeper understanding of the influence of TDS suppression on the decision-making performance, we statistically investigate the evolution of CDR using numerical signals with different TDS concealments, where the value of ρm is controlled by slightly changing the bias current, the coupling strength, or the coupling delay of the three SLs. In Fig. 11(c), we show the CDR as a function of the learning cycles using 11 sets of signals with ρm<0.2 and ρm>0.3, respectively. It can be seen that there exist signals with larger ρm that still converge more quickly than those with lower ρm, showing that the decision-making performance does not entirely depend on the suppression of TDS. However, on the whole, it converges faster for signals with lower ρm in a decision-making problem, which indicates that the concealment of TDS can be helpful for better decision-making performance.

    Figure 11. (a), (b) CC and ρm as functions of attenuation; (c) CDR as a function of learning cycles. The vertical bars indicate the standard deviation around the mean value for 11 sets of signals with ρm<0.2 and ρm>0.3, respectively. The sampling interval is 10 ps. P=[0.3,0.2,0.8,0.1,0.2,0.3,0.5,0.4].

    Next, we compare the decision-making performance of the one-channel scheme, the previously proposed parallel scheme [42], and the triple-channel scheme by calculating the CC, where experimentally generated signals with different bias currents are adopted. The results are illustrated in Fig. 12. Triple-channel1 and Triple-channel2 represent the new scheme and the previously proposed scheme, respectively. Three channels of signals are used to solve the eight-armed bandit problem. However, in the Triple-channel2 scheme, the adopted algorithm for threshold adjustment is the same as in the one-channel scheme. It can be seen that for both the triple-channel schemes, the CC is quite stable against the variation of bias current, and the performance is quite similar, whereas for the one-channel scheme, it takes more cycles to reach the desired CDR, and the CC value fluctuates more obviously with the change of bias current, indicating that the one-channel scheme may be more sensitive to the dynamics of signals.

    Figure 12. CC as a function of bias current, for a comparison of the triple-channel scheme (red solid line), the previously investigated parallel scheme (blue dotted line), and the one-channel scheme (black solid line). The vertical bars indicate the standard deviation around the mean value for three runs. P=[0.3,0.2,0.8,0.1,0.2,0.3,0.5,0.4].

    In addition, it is necessary to make decisions accurately in a dynamically changing environment, where the slot machine with the highest reward probability may change with time. Figure 13(a) illustrates the evolution of the CDR in a changing environment. We suppose that the target machine changes from slot machine 1 to slot machine 3 at the 600th cycle, and slot machines with different probability distributions are considered for comparison. It can be seen that after the sudden change of the target machine, the CDR drops to zero and then increases rapidly. Meanwhile, it takes longer to reach a CDR of 0.9 for P2 than for P1, because the former has a smaller difference in the distribution of reward probability [12,46]. To further reveal the underlying process of the reinforcement learning, the adaptation of the threshold values during the 1200 cycles is presented in Fig. 13(b). In the first 600 cycles, where the target slot machine is encoded as [0,0,0], the threshold values TH1, TH2, and TH3 all increase until they eventually fluctuate around a maximum value of 0.5. Hence, the chaotic signals s1(t), s2(t), s3(t) are more likely to be lower than the threshold values THi (i = 1, 2, 3), and the three bits [D1,D2,D3] are more likely to be determined as [0,0,0]. When the target machine changes to [0,1,0], after a temporary fluctuation around 0, the values of TH1 and TH3 return to about 0.5. The value of TH2 is reduced to about −0.5, which makes it more likely for s2(t) to be larger than TH2 and thus increases the likelihood of choosing the slot machine [0,1,0].

    Figure 13. (a) Evolution of the CDR for different distributions of reward probability in a changing environment. P1=[0.8,0.2,0.2,0.2,0.2,0.2,0.2,0.2], P2=[0.7,0.2,0.3,0.2,0.2,0.2,0.2,0.2]. (b) Threshold value adaptation for P2.

    Scalability is also very important for a decision-making scheme. Due to the chaotic dynamics of the signals, it can be assumed that arbitrarily selected k-channel chaotic signals generated from the scheme in Fig. 1 can be utilized to solve the N-armed bandit problem successfully. To demonstrate this, three channels of experimentally generated signals with varying bias current are randomly selected to solve the eight-armed bandit problem. The evolution of the CDR is presented in Fig. 14, denoted by a red solid line, and the vertical bars indicate the standard deviation around the mean value for 10 different selections. It can be seen that the average CC is about 330, similar to the result in Fig. 12. Meanwhile, eight different selections of four-channel signals are successfully used to solve a 16-armed bandit problem. The evolution of the CDR is also shown in Fig. 14, represented by the dashed blue line. These results show that random combinations of chaotic signals are capable of solving the MAB problem efficiently, and the scalability of our scheme to larger decision problems is verified.

    Figure 14. Evolution of the averaged CDR with randomly selected signals for eight-armed and 16-armed bandit problems. P=[0.3,0.2,0.8,0.1,0.2,0.3,0.5,0.4], and P=[0.3,0.2,0.8,0.1,0.2,0.5,0.2,0.2,0.2,0.3,0.3,0.4,0.5,0.1,0.1,0.2], respectively.

    4. CONCLUSION

    In conclusion, we propose a simple scheme for generating triple-channel chaotic signals with TDS concealment and demonstrate it both experimentally and numerically. The parameter range that yields better TDS concealment is explored by systematically varying the bias current and the coupling strength. Moreover, we use the generated triple-channel chaotic signals and a modified strategy to solve an eight-armed bandit problem in parallel, and we investigate how the signal correlation between channels, the TDS concealment, and the sampling interval affect decision-making performance. The algorithm of the proposed decision-making scheme is simpler than those of the one-channel scheme and the previously studied parallel scheme, which makes it easier to implement. Nevertheless, it can perform even better, provided that the mutual correlation is relatively low. Moreover, its performance is more stable across different sampling rates than that of the one-channel scheme. The proposed system scales to MAB problems of varying size and adapts to changing environments. This work may be helpful for potential applications in ultrafast AI processing.

    References

    [1] J. Ohtsubo. Semiconductor Lasers: Stability, Instability and Chaos(2012).

    [2] P. Li, Y. Guo, Y. Q. Guo, Y. L. Fan, X. M. Guo, X. L. Liu, K. Y. Li, K. A. Shore, Y. C. Wang, A. B. Wang. Ultrafast fully photonic random bit generator. J. Lightwave Technol., 36, 2531-2540(2018).

    [3] S. Y. Xiang, B. Wang, Y. Wang, Y. N. Han, A. J. Wen, Y. Hao. 2.24-Tb/s physical random bit generation with minimal post-processing based on chaotic semiconductor lasers network. J. Lightwave Technol., 37, 3987-3993(2019).

    [4] G. D. Van Wiggeren, R. Roy. Communication with chaotic lasers. Science, 279, 1198-1200(1998).

    [5] C. Posadas-Castillo, R. M. López-Gutiérrez, C. Cruz-Hernández. Synchronization of chaotic solid-state Nd:YAG lasers: application to secure communication. Commun. Nonlinear Sci. Numer. Simul., 13, 1655-1667(2008).

    [6] N. Jiang, W. Pan, L. S. Yan, B. Luo, S. Y. Xiang, L. Yang, D. Zheng, N. Q. Li. Chaos synchronization and communication in multiple time-delayed coupling semiconductor lasers driven by a third laser. IEEE J. Sel. Top. Quantum Electron., 17, 1220-1227(2011).

    [7] C. Xue, N. Jiang, K. Qiu, Y. Lv. Key distribution based on synchronization in bandwidth-enhanced random bit generators with dynamic post-processing. Opt. Express, 23, 14510-14519(2015).

    [8] J. Vatin, D. Rontani, M. Sciamanna. Experimental reservoir computing using VCSEL polarization dynamics. Opt. Express, 27, 18579-18584(2019).

    [9] X. X. Guo, S. Y. Xiang, Y. H. Zhang, L. Lin, A. J. Wen, Y. Hao. Polarization multiplexing reservoir computing based on a VCSEL with polarized optical feedback. IEEE J. Sel. Top. Quantum Electron., 26, 1700109(2020).

    [10] M. Naruse, W. Nomura, M. Aono, M. Ohtsu, Y. Sonnefraud, A. Drezet, S. Huant, S. J. Kim. Decision making based on optical excitation transfer via near-field interactions between quantum dots. J. Appl. Phys., 116, 154303(2014).

    [11] T. Mihana, Y. Mitsui, M. Takabayashi, K. Kanno, S. Sunada, M. Naruse, A. Uchida. Decision making for the multi-armed bandit problem using lag synchronization of chaos in mutually-coupled semiconductor lasers. Opt. Express, 27, 26989-27008(2019).

    [12] M. Naruse, N. Chauvet, A. Uchida, A. Drezet, G. Bachelier, S. Huant, H. Hori. Decision making photonics: solving bandit problems using photons. IEEE J. Sel. Top. Quantum Electron., 26, 7700210(2020).

    [13] S. Y. Xiang, Y. Zhang, J. Gong, X. Guo, L. Lin, Y. Hao. STDP-based unsupervised spike pattern learning in a photonic spiking neural network with VCSELs and VCSOAs. IEEE J. Sel. Top. Quantum Electron., 25, 1700109(2019).

    [14] J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, W. H. P. Pernice. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature, 569, 208-214(2019).

    [15] S. Y. Xiang, Z. X. Ren, Y. H. Zhang, Z. W. Song, Y. Hao. All-optical neuromorphic XOR operation with inhibitory dynamics of a single photonic spiking neuron based on VCSEL-SA. Opt. Lett., 45, 1104-1107(2020).

    [16] S. Y. Xiang, Z. X. Ren, Y. H. Zhang, X. X. Guo, G. Q. Han, Y. Hao. Computing primitive of fully-VCSELs-based all-optical spiking neural network for supervised learning and pattern classification. IEEE Trans. Neural Netw. Learning Syst., 1-12(2020).

    [17] J. G. Wu, Z. M. Wu, X. Tang, X. D. Lin, T. Deng, G. Q. Xia, G. Y. Feng. Simultaneous generation of two sets of time delay signature eliminated chaotic signals by using mutually coupled semiconductor lasers. IEEE Photon. Technol. Lett., 23, 759-761(2011).

    [18] A. B. Wang, Y. B. Yang, B. J. Wang, B. B. Zhang, L. Li, Y. C. Wang. Generation of wide band chaos with suppressed time-delay signature by delayed self-interference. Opt. Express, 21, 8701-8710(2013).

    [19] N. Q. Li, W. Pan, S. Y. Xiang, L. S. Yan, B. Luo, X. H. Zou, L. Y. Zhang, P. H. Mu. Photonic generation of wide band time-delay-signature-eliminated chaotic signals utilizing an optically injected semiconductor laser. IEEE J. Quantum Electron., 48, 1339-1345(2012).

    [20] T. Deng, Z. M. Wu, G. Q. Xia. Two-mode coexistence in 1550-nm VCSELs with optical feedback. IEEE Photon. Technol. Lett., 27, 2075-2078(2015).

    [21] J. G. Wu, S. W. Huang, Y. J. Huang, H. Zhou, J. H. Yang, J. M. Liu, M. B. Yu, G. Q. Lo, D. L. Kwong, S. K. Duan, C. W. Wong. Mesoscopic chaos mediated by Drude electron-hole plasma in silicon optomechanical oscillators. Nat. Commun., 8, 15570(2017).

    [22] N. Jiang, A. K. Zhao, S. Q. Liu, C. P. Xue, K. Qiu. Chaos synchronization and communication in closed-loop semiconductor lasers subject to common chaotic phase-modulated feedback. Opt. Express, 26, 32404-32416(2018).

    [23] M. J. Bünner, A. Kittel, J. Parisi, I. Fischer, W. Elsäßer. Estimation of delay times from a delayed optical feedback laser experiment. Europhys. Lett., 42, 353-358(1998).

    [24] S. S. Li, S. C. Chan. Chaotic time-delay signature suppression in a semiconductor laser with frequency-detuned grating feedback. IEEE J. Sel. Top. Quantum Electron., 21, 541-552(2015).

    [25] M. J. Bünner, M. Popp, T. Meyer, A. Kittel, J. Parisi. A tool to recover scalar time-delay systems from experimental time series. Phys. Rev. E, 54, 3082-3085(1996).

    [26] R. Hegger, M. J. Bünner, H. Kantz. Identifying and modeling delay feedback systems. Phys. Rev. Lett., 81, 558-561(1998).

    [27] B. P. Bezruchko, A. S. Karavaev, V. I. Ponomarenko, M. D. Prokhorov. Reconstruction of time-delay systems from chaotic time series. Phys. Rev. E, 64, 056216(2001).

    [28] M. C. Soriano, L. Zunino, O. A. Rosso, I. Fischer, C. R. Mirasso. Timescales of a chaotic semiconductor laser with optical feedback under the lens of a permutation information analysis. IEEE J. Quantum Electron., 47, 252-261(2011).

    [29] X. Porte, O. D’Huys, T. Jüngling, D. Brunner, M. C. Soriano, I. Fischer. Autocorrelation properties of chaotic delay dynamical systems: a study on semiconductor lasers. Phys. Rev. E, 90, 052911(2014).

    [30] M. W. Lee, P. Rees, K. A. Shore, S. Ortin, L. Pesquera, A. Valle. Dynamical characterisation of laser diode subject to double optical feedback for chaotic optical communications. IEE P-Optoelectron., 152, 97-102(2005).

    [31] J. G. Wu, G. Q. Xia, Z. M. Wu. Suppression of time delay signatures of chaotic output in a semiconductor laser with double optical feedback. Opt. Express, 17, 20124-20133(2009).

    [32] S. Y. Xiang, A. J. Wen, W. Pan, L. Lin, H. X. Zhang, H. Zhang, X. X. Guo, J. F. Li. Suppression of chaos time delay signature in a ring network consisting of three semiconductor lasers coupled with heterogeneous delays. J. Lightwave Technol., 34, 4221-4227(2016).

    [33] N. Jiang, Y. J. Wang, A. Zhao, S. Q. Liu, Y. Q. Zhang, L. Chen, B. C. Li, K. Qiu. Simultaneous bandwidth-enhanced and time delay signature-suppressed chaos generation in semiconductor laser subject to feedback from parallel coupling ring resonators. Opt. Express, 28, 1999-2009(2020).

    [34] L. Lai, H. ElGamal, H. Jiang, H. V. Poor. Cognitive medium access: exploration, exploitation, and competition. IEEE Trans. Mobile Comput., 10, 239-253(2011).

    [35] K. Kuroda, H. Kato, S.-J. Kim, M. Naruse, M. Hasegawa. Improving throughput using multi-armed bandit algorithm for wireless LANs. Nonlinear Theory and Its Applications, IEICE, 9, 74-81(2018).

    [36] K. Morihiro, N. Matsui, H. Nishimura. Chaotic exploration effects on reinforcement learning in shortcut maze task. Int. J. Bifurcation Chaos Appl. Sci. Eng., 16, 3015-3022(2006).

    [37] S. J. Kim, M. Aono, E. Nameda. Efficient decision-making by volume-conserving physical object. New J. Phys., 17, 083023(2015).

    [38] S. J. Kim, M. Naruse, M. Aono, M. Ohtsu, M. Hara. Decision maker based on nanoscale photo-excitation transfer. Sci. Rep., 3, 2370(2013).

    [39] M. Naruse, M. Berthel, A. Drezet, S. Huant, H. Hori, S. J. Kim. Single photon in hierarchical architecture for physical decision making: photon intelligence. ACS Photon., 3, 2505-2514(2016).

    [40] T. Mihana, Y. Terashima, M. Naruse, S. J. Kim, A. Uchida. Memory effect on adaptive decision making with a chaotic semiconductor laser. Complexity, 2018, 4318127(2018).

    [41] M. Naruse, T. Mihana, H. Hori, H. Saigo, K. Okamura, M. Hasegawa, A. Uchida. Scalable photonic reinforcement learning by time-division multiplexing of laser chaos. Sci. Rep., 8, 10890(2018).

    [42] Y. T. Ma, S. Y. Xiang, X. X. Guo, Z. W. Song, A. J. Wen, Y. Hao. Time-delay signature concealment of chaos and ultrafast decision making in mutually coupled semiconductor lasers with a phase-modulated Sagnac loop. Opt. Express, 28, 1665-1678(2020).

    [43] L. Zunino, O. A. Rosso, M. C. Soriano. Characterizing the hyperchaotic dynamics of a semiconductor laser subject to optical feedback via permutation entropy. IEEE J. Sel. Top. Quantum Electron., 17, 1250-1257(2011).

    [44] C. Bandt, B. Pompe. Permutation entropy: a natural complexity measure for time series. Phys. Rev. Lett., 88, 174102(2002).

    [45] S. J. Kim, M. Aono, M. Hara. Tug-of-war model for the two-bandit problem: nonlocally-correlated parallel exploration via resource conservation. Biosystems, 101, 29-36(2010).

    [46] M. Naruse, Y. Terashima, A. Uchida, S. J. Kim. Ultrafast photonic reinforcement learning based on laser chaos. Sci. Rep., 7, 8772(2017).
