• Photonics Research
  • Vol. 9, Issue 3, B45 (2021)
Jianhui Ma1, Zun Piao1, Shuang Huang1, Xiaoman Duan1, Genggeng Qin1, Linghong Zhou1,2,*, and Yuan Xu1,3,*
Author Affiliations
  • 1School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
  • 2e-mail: smart@smu.edu.cn
  • 3e-mail: yuanxu@smu.edu.cn
    DOI: 10.1364/PRJ.413486
    Jianhui Ma, Zun Piao, Shuang Huang, Xiaoman Duan, Genggeng Qin, Linghong Zhou, Yuan Xu. Monte Carlo simulation fused with target distribution modeling via deep reinforcement learning for automatic high-efficiency photon distribution estimation[J]. Photonics Research, 2021, 9(3): B45
    Fig. 1. Automatic scatter estimation framework. The MC algorithm generates raw scatter signals according to the X-ray source energy spectrum and the system geometry configuration. The DRL scheme (denoted by the dashed black arrow) employs a deep Q-network that interacts with the statistical distribution model to yield a satisfactory scatter image.
    Fig. 2. Network architecture of the DDQN. The network takes a scatter image as input and predicts three possible actions for parameter adjustment. The numbers at the top denote the feature map size and channel number, and the operations for each layer are given at the bottom. For instance, the first hidden layer convolves 16 filters of 3×3 with stride 4 over the input layer, followed by a rectified linear unit (ReLU) activation function, and the output layer is a fully connected linear layer with three outputs.
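    A minimal PyTorch sketch may make the Fig. 2 caption concrete. Only the first convolution (16 filters, 3×3, stride 4, ReLU) and the three-output fully connected head are stated above; the second convolutional layer and the 128×128 input size below are illustrative assumptions, not the paper's exact configuration.

        # Sketch of the Fig. 2 Q-network. Stated in the caption: first conv layer
        # (16 filters, 3x3, stride 4, ReLU) and a 3-output linear head. Assumed:
        # the deeper conv layer and the 128x128 single-channel input.
        import torch
        import torch.nn as nn

        class QNetwork(nn.Module):
            def __init__(self, in_size=128):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, kernel_size=3, stride=4),   # layer from the caption
                    nn.ReLU(),
                    nn.Conv2d(16, 32, kernel_size=3, stride=2),  # assumed deeper layer
                    nn.ReLU(),
                )
                with torch.no_grad():  # infer the flattened feature length for the head
                    n_feat = self.features(torch.zeros(1, 1, in_size, in_size)).numel()
                self.head = nn.Linear(n_feat, 3)  # Q-values for the three actions

            def forward(self, x):
                return self.head(self.features(x).flatten(1))

        q = QNetwork()
        print(q(torch.rand(4, 1, 128, 128)).shape)  # torch.Size([4, 3])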
    Fig. 3. (a) is the primary projection of the head and neck (H&N) patient; (b)–(i) represent raw scatter projections that are separately calculated by the MC particle sampling algorithm with source photons of 5×10⁵, 1×10⁶, 5×10⁶, 1×10⁷, 1×10⁸, 1×10⁹, 1×10¹⁰, and 1×10¹² for the same projection angle.
    Fig. 4. (a)–(g) are the scatter images of Figs. 3(b)–3(h) smoothed by the over-relaxation smoothing algorithm; (h) corresponds to Fig. 3(i), which is considered a noise-free scatter image and utilized as the ground truth.
    Fig. 5. Intensity profiles of Fig. 4 along the (a) horizontal and (b) vertical directions as denoted by the orange lines in Fig. 4(h).
    Fig. 6. From top to bottom: six testing results with 5×10⁵, 1×10⁶, 5×10⁶, 1×10⁷, 1×10⁸, and 1×10⁹ source photons. From left to right: primary signals, smoothed scatter signals restored by the over-relaxation algorithm with empirical parameters, smoothed scatter signals restored by the proposed framework, and the ground truth.
    Fig. 7. (a)–(d) Intensity profiles of the first, second, third, and last rows in Fig. 6. The locations of profiles (a)–(d) are denoted by orange lines in the last column of Fig. 6.
    Fig. 8. (a)–(c) indicate boxplots of the metric difference of SSIM, PSNR, and RAE between Empirical and ASEF for all testing cases: metric_diff = metric_Empirical − metric_ASEF, where metric denotes SSIM, PSNR, and RAE, respectively. (d) is the boxplot of the SSIM comparison of Empirical and ASEF.
    Fig. 9. Automatic scatter estimation process for a testing case. (a)–(c) are smoothed scatter images at Steps 1, 7, and 13, respectively. (d) and (e) separately plot the SSIM and RAE over steps.
    Fig. 10. Different scatter images. From left to right: scatter projection input, the ground truth of the scatter image in the first column, and Grad-CAM heatmaps of the three subnetworks {W_k, W_ω, W_β}.
    Fig. 11. From top to bottom: four prostate cases with 5×10⁵, 1×10⁶, 5×10⁶, and 1×10⁷ source photons. From left to right: primary signals, smoothed scatter signals restored by the over-relaxation algorithm with empirical parameters, smoothed scatter signals restored by the proposed framework, and the ground truth.
    Fig. 12. (a)–(d) Intensity profiles of the four prostate cases presented in Fig. 11. Profile locations are outlined by orange lines in the last column of Fig. 11.
    1.  Initialize main network weights W and target network weights Ŵ
    2.  For episode = 1, 2, …, N_episode do
    3.    For projection = 1, 2, …, N_prj do
    4.      Initialize {k_0, ω_0, β_0}
    5.      Generate s_1 using Eq. (10) with {k_0, ω_0, β_0}
    6.      For t = 1, 2, …, N_step do
    7.        Randomly select one subnetwork from {W_k, W_ω, W_β}
    8.        With probability ε, select action a_t randomly
    9.        Otherwise, choose a_t = argmax_a Q_π(s_t, a; W)
    10.       Adjust parameters {k_t, ω_t, β_t} according to a_t
    11.       Generate s_{t+1} using Eq. (10) with {k_t, ω_t, β_t}
    12.       Compute reward r_t using Eq. (19)
    13.       Store the transition {s_t, a_t, r_t, s_{t+1}} in experience replay D
    14.       Randomly sample a mini-batch of transitions from D
    15.       Compute the gradient of the loss function in Eq. (17)
    16.       Update main network weights W = {W_k, W_ω, W_β}
    17.       Every N_update steps, set Ŵ = W
    18.      End For
    19.    End For
    20. End For
    Table 1. DDQN Training Process
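    For readers who want the Table 1 loop in executable form, here is a condensed Python (PyTorch) sketch. It simplifies to a single subnetwork (the paper keeps three, one per smoothing parameter, and picks one at random each step, line 7). Because Eqs. (10), (17), and (19) are not reproduced on this page, generate_state, compute_reward, the concrete action set, and the ε annealing rule below are hypothetical placeholders; the hyperparameters follow Table 2 below.

        import collections
        import random
        import torch
        import torch.nn.functional as F

        def make_net():
            # Tiny stand-in Q-network: one 3x3/stride-4 conv (as in Fig. 2) and a
            # three-output linear head; deeper layers are omitted for brevity.
            return torch.nn.Sequential(
                torch.nn.Conv2d(1, 16, 3, stride=4), torch.nn.ReLU(),
                torch.nn.Flatten(), torch.nn.Linear(16 * 32 * 32, 3))

        def generate_state(params):
            # Placeholder for Eq. (10): the smoothed scatter image for these parameters.
            return torch.rand(1, 1, 128, 128)

        def compute_reward(state, truth):
            # Placeholder for the reward of Eq. (19).
            return -F.mse_loss(state, truth).item()

        # Hyperparameters from Table 2.
        N_episode, N_prj, N_step, N_update = 100, 45, 30, 20
        gamma, lr, N_batch = 0.6, 0.001, 64
        eps = 1.0                                      # annealed toward 0.01

        main, target = make_net(), make_net()          # W and W-hat (line 1)
        target.load_state_dict(main.state_dict())
        opt = torch.optim.SGD(main.parameters(), lr=lr)
        replay = collections.deque(maxlen=2000)        # experience replay D
        truth = torch.rand(1, 1, 128, 128)             # stand-in ground-truth image
        step_count = 0

        for episode in range(N_episode):               # line 2
            for projection in range(N_prj):            # line 3
                params = {"k": 1.0, "omega": 1.0, "beta": 1.0}  # line 4
                s = generate_state(params)             # line 5
                for t in range(N_step):                # line 6
                    if random.random() < eps:          # lines 8-9: eps-greedy
                        a = random.randrange(3)
                    else:
                        with torch.no_grad():
                            a = main(s).argmax(1).item()
                    params["k"] *= (0.9, 1.0, 1.1)[a]  # line 10: illustrative action
                    s_next = generate_state(params)    # line 11
                    r = compute_reward(s_next, truth)  # line 12
                    replay.append((s, a, r, s_next))   # line 13
                    if len(replay) >= N_batch:         # lines 14-16: DDQN update
                        batch = random.sample(replay, N_batch)
                        ss = torch.cat([b[0] for b in batch])
                        aa = torch.tensor([b[1] for b in batch])
                        rr = torch.tensor([b[2] for b in batch])
                        sn = torch.cat([b[3] for b in batch])
                        with torch.no_grad():          # double-DQN target value
                            a_star = main(sn).argmax(1, keepdim=True)
                            y = rr + gamma * target(sn).gather(1, a_star).squeeze(1)
                        q = main(ss).gather(1, aa[:, None]).squeeze(1)
                        loss = F.mse_loss(q, y)        # stands in for Eq. (17)
                        opt.zero_grad(); loss.backward(); opt.step()
                    step_count += 1
                    if step_count % N_update == 0:     # line 17: sync target weights
                        target.load_state_dict(main.state_dict())
                    s = s_next
                eps = max(0.01, eps - 1e-3)            # assumed annealing schedule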
    Parameter | Value | Description
    N_episode | 100 | Number of training episodes
    N_prj | 45 | Number of training projections
    N_step | 30 | Number of steps for each episode
    N_update | 20 | Number of steps between target network weight updates
    D | 2000 | Capacity of experience replay memory
    ε | [0.01, 1] | Probability of random action in ε-greedy algorithm
    γ | 0.6 | Discount factor
    lr | 0.001 | Learning rate of gradient descent for main network
    N_batch | 64 | Mini-batch samples for network training
    Table 2. Parameters in the DDQN Training Phase
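    Table 2 lists ε as a range, [0.01, 1], which implies an annealed ε-greedy schedule rather than a fixed value. The decay law is not given on this page; a linear ramp over the full training budget (N_episode × N_prj × N_step steps) is one plausible reading, sketched below.

        def epsilon(step, n_total=100 * 45 * 30, eps_start=1.0, eps_end=0.01):
            # Linear anneal from eps_start to eps_end over n_total training steps
            # (an assumption; Table 2 only gives the range [0.01, 1]).
            frac = min(step / n_total, 1.0)
            return eps_start + frac * (eps_end - eps_start)

        print(epsilon(0), epsilon(67500), epsilon(135000))  # 1.0 0.505 0.01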
    Photon Number | SSIM (1 = Best), Empirical | SSIM, ASEF | PSNR (dB), Empirical | PSNR, ASEF | RAE (%), Empirical | RAE, ASEF
    5×10⁵ | 0.79 ± 4.70×10⁻² | 0.94 ± 2.36×10⁻² | 21.54 ± 0.85 | 26.55 ± 1.34 | 12.03 ± 1.27×10⁻² | 5.62 ± 1.27×10⁻²
    1×10⁶ | 0.88 ± 3.73×10⁻² | 0.96 ± 1.67×10⁻² | 23.99 ± 0.72 | 29.05 ± 1.22 | 8.52 ± 9.65×10⁻³ | 4.22 ± 6.53×10⁻³
    5×10⁶ | 0.97 ± 8.83×10⁻³ | 0.99 ± 3.85×10⁻³ | 30.26 ± 0.91 | 33.76 ± 1.03 | 3.81 ± 4.69×10⁻³ | 2.42 ± 3.25×10⁻³
    1×10⁷ | 0.98 ± 4.31×10⁻³ | 0.99 ± 2.02×10⁻³ | 33.19 ± 0.83 | 36.05 ± 0.89 | 2.68 ± 3.14×10⁻³ | 1.87 ± 2.35×10⁻³
    1×10⁸ | 0.99 ± 4.96×10⁻⁴ | 0.99 ± 3.97×10⁻⁴ | 43.03 ± 0.82 | 43.96 ± 0.73 | 0.84 ± 9.31×10⁻⁴ | 0.74 ± 7.36×10⁻⁴
    1×10⁹ | 0.99 ± 4.84×10⁻⁵ | 0.99 ± 4.64×10⁻⁵ | 52.97 ± 0.91 | 53.12 ± 0.89 | 0.27 ± 3.26×10⁻⁴ | 0.26 ± 3.06×10⁻⁴
    Table 3. SSIM, PSNR, and RAE Statistics (avg. ± std.) among All Testing Cases
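    As a rough guide to how the Table 3 numbers can be reproduced, the sketch below scores an estimated scatter image against the ground truth. SSIM and PSNR come from scikit-image; the RAE formula is an assumption (the paper's definition is not reproduced on this page), taken here as the L1 error relative to the reference, in percent.

        import numpy as np
        from skimage.metrics import structural_similarity, peak_signal_noise_ratio

        def evaluate(est, ref):
            drange = ref.max() - ref.min()
            ssim = structural_similarity(ref, est, data_range=drange)
            psnr = peak_signal_noise_ratio(ref, est, data_range=drange)  # in dB
            rae = 100.0 * np.abs(est - ref).sum() / np.abs(ref).sum()    # assumed RAE, %
            return ssim, psnr, rae

        rng = np.random.default_rng(0)
        ref = rng.random((128, 128))                        # stand-in ground truth
        est = ref + 0.01 * rng.standard_normal((128, 128))  # stand-in estimate
        print(evaluate(est, ref))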
    Photon Number | 5×10⁵ | 1×10⁶ | 5×10⁶ | 1×10⁷ | 1×10⁸ | 1×10⁹ | 1×10¹⁰ | 1×10¹¹
    MC (s) | 0.43 | 0.45 | 0.57 | 0.83 | 5.94 | 60.00 | 633.95 | 6402.60
    DRL (s) | 8.98 | 4.80 | 1.94 | 0.98 | 0.32 | 0.29 | 0.29 | 0.29
    Total (s) | 9.41 | 5.25 | 2.51 | 1.81 | 6.26 | 60.29 | 634.24 | 6402.89
    Table 4. Computation Time (s) for One Scatter Image of a Prostate Patient across Different Photon Numbers