• Photonics Research
  • Vol. 9, Issue 3, B45 (2021)
Jianhui Ma1, Zun Piao1, Shuang Huang1, Xiaoman Duan1, Genggeng Qin1, Linghong Zhou1,2,*, and Yuan Xu1,3,*
Author Affiliations
  • 1School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
  • 2e-mail: smart@smu.edu.cn
  • 3e-mail: yuanxu@smu.edu.cn
    DOI: 10.1364/PRJ.413486
    Jianhui Ma, Zun Piao, Shuang Huang, Xiaoman Duan, Genggeng Qin, Linghong Zhou, Yuan Xu. Monte Carlo simulation fused with target distribution modeling via deep reinforcement learning for automatic high-efficiency photon distribution estimation[J]. Photonics Research, 2021, 9(3): B45
    Fig. 1. Automatic scatter estimation framework. The MC algorithm generates raw scatter signals according to the X-ray source energy spectrum and the system geometry configuration. The DRL scheme (denoted by the dashed black arrow) employs a deep Q-network that interacts with the statistical distribution model to yield a satisfactory scatter image.
    Fig. 2. Network architecture of the DDQN. The network takes a scatter image as input and predicts three possible actions for parameter adjustment. The numbers at the top denote the feature map size and channel number, and the operations for each layer are given at the bottom. For instance, the first hidden layer convolves 16 filters of 3×3 with stride 4 over the input layer, followed by a rectified linear unit (ReLU) activation function, and the output layer is a fully connected linear layer with three outputs.
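    A minimal PyTorch sketch may make the Fig. 2 caption concrete. Only the first convolution (16 filters, 3×3, stride 4, ReLU) and the three-output fully connected head are stated above; the second convolutional layer and the 128×128 input size below are illustrative assumptions, not the paper's exact configuration.

        # Sketch of the Fig. 2 Q-network. Stated in the caption: first conv layer
        # (16 filters, 3x3, stride 4, ReLU) and a 3-output linear head. Assumed:
        # the deeper conv layer and the 128x128 single-channel input.
        import torch
        import torch.nn as nn

        class QNetwork(nn.Module):
            def __init__(self, in_size=128):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, kernel_size=3, stride=4),   # layer from the caption
                    nn.ReLU(),
                    nn.Conv2d(16, 32, kernel_size=3, stride=2),  # assumed deeper layer
                    nn.ReLU(),
                )
                with torch.no_grad():  # infer the flattened feature length for the head
                    n_feat = self.features(torch.zeros(1, 1, in_size, in_size)).numel()
                self.head = nn.Linear(n_feat, 3)  # Q-values for the three actions

            def forward(self, x):
                return self.head(self.features(x).flatten(1))

        q = QNetwork()
        print(q(torch.rand(4, 1, 128, 128)).shape)  # torch.Size([4, 3])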
    Fig. 3. (a) is the primary projection of the head and neck (H&N) patient; (b)–(i) represent raw scatter projections that are separately calculated by the MC particle sampling algorithm with source photons of 5×10⁵, 1×10⁶, 5×10⁶, 1×10⁷, 1×10⁸, 1×10⁹, 1×10¹⁰, and 1×10¹² for the same projection angle.
    Fig. 4. (a)–(g) are the scatter images of Figs. 3(b)–3(h) smoothed by the over-relaxation smoothing algorithm; (h) corresponds to Fig. 3(i), which is considered a noise-free scatter image and utilized as the ground truth.
    Fig. 5. Intensity profiles of Fig. 4 along the (a) horizontal and (b) vertical directions as denoted by the orange lines in Fig. 4(h).
    Fig. 6. From top to bottom: six testing results with 5×10⁵, 1×10⁶, 5×10⁶, 1×10⁷, 1×10⁸, and 1×10⁹ source photons. From left to right: primary signals, smoothed scatter signals restored by the over-relaxation algorithm with empirical parameters, smoothed scatter signals restored by the proposed framework, and the ground truth.
    Fig. 7. (a)–(d) Intensity profiles of the first, second, third, and last rows in Fig. 6. The locations of profiles (a)–(d) are denoted by orange lines in the last column of Fig. 6.
    Fig. 8. (a)–(c) indicate boxplots of the metric difference of SSIM, PSNR, and RAE between Empirical and ASEF for all testing cases: metric_diff = metric_Empirical − metric_ASEF, where metric denotes SSIM, PSNR, and RAE, respectively. (d) is the boxplot of the SSIM comparison of Empirical and ASEF.
    Fig. 9. Automatic scatter estimation process for a testing case. (a)–(c) are smoothed scatter images at Steps 1, 7, and 13, respectively. (d) and (e) separately plot the SSIM and RAE over steps.
    Fig. 10. Different scatter images. From left to right: scatter projection input, the ground truth of the scatter image in the first column, and Grad-CAM heatmaps of the three subnetworks {W_k, W_ω, W_β}.
    Fig. 11. From top to bottom: four prostate cases with 5×10⁵, 1×10⁶, 5×10⁶, and 1×10⁷ source photons. From left to right: primary signals, smoothed scatter signals restored by the over-relaxation algorithm with empirical parameters, smoothed scatter signals restored by the proposed framework, and the ground truth.
    Fig. 12. (a)–(d) Intensity profiles of the four prostate cases presented in Fig. 11. Profile locations are outlined by orange lines in the last column of Fig. 11.
    1.  Initialize main network weights W and target network weights Ŵ
    2.  For episode = 1, 2, …, N_episode do
    3.    For projection = 1, 2, …, N_prj do
    4.      Initialize {k_0, ω_0, β_0}
    5.      Generate s_1 using Eq. (10) with {k_0, ω_0, β_0}
    6.      For t = 1, 2, …, N_step do
    7.        Randomly select one subnetwork from {W_k, W_ω, W_β}
    8.        With probability ε, select action a_t randomly
    9.        Otherwise, choose a_t = argmax_a Q_π(s_t, a; W)
    10.       Adjust parameters {k_t, ω_t, β_t} according to a_t
    11.       Generate s_{t+1} using Eq. (10) with {k_t, ω_t, β_t}
    12.       Compute reward r_t using Eq. (19)
    13.       Store the transition {s_t, a_t, r_t, s_{t+1}} in experience replay D
    14.       Randomly sample a mini-batch of transitions from D
    15.       Compute the gradient of the loss function in Eq. (17)
    16.       Update main network weights W = {W_k, W_ω, W_β}
    17.       Every N_update steps, set Ŵ = W
    18.      End For
    19.    End For
    20. End For
    Table 1. DDQN Training Process
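    For readers who want the Table 1 loop in executable form, here is a condensed Python (PyTorch) sketch. It simplifies to a single subnetwork (the paper keeps three, one per smoothing parameter, and picks one at random each step, line 7). Because Eqs. (10), (17), and (19) are not reproduced on this page, generate_state, compute_reward, the concrete action set, and the ε annealing rule below are hypothetical placeholders; the hyperparameters follow Table 2 below.

        import collections
        import random
        import torch
        import torch.nn.functional as F

        def make_net():
            # Tiny stand-in Q-network: one 3x3/stride-4 conv (as in Fig. 2) and a
            # three-output linear head; deeper layers are omitted for brevity.
            return torch.nn.Sequential(
                torch.nn.Conv2d(1, 16, 3, stride=4), torch.nn.ReLU(),
                torch.nn.Flatten(), torch.nn.Linear(16 * 32 * 32, 3))

        def generate_state(params):
            # Placeholder for Eq. (10): the smoothed scatter image for these parameters.
            return torch.rand(1, 1, 128, 128)

        def compute_reward(state, truth):
            # Placeholder for the reward of Eq. (19).
            return -F.mse_loss(state, truth).item()

        # Hyperparameters from Table 2.
        N_episode, N_prj, N_step, N_update = 100, 45, 30, 20
        gamma, lr, N_batch = 0.6, 0.001, 64
        eps = 1.0                                      # annealed toward 0.01

        main, target = make_net(), make_net()          # W and W-hat (line 1)
        target.load_state_dict(main.state_dict())
        opt = torch.optim.SGD(main.parameters(), lr=lr)
        replay = collections.deque(maxlen=2000)        # experience replay D
        truth = torch.rand(1, 1, 128, 128)             # stand-in ground-truth image
        step_count = 0

        for episode in range(N_episode):               # line 2
            for projection in range(N_prj):            # line 3
                params = {"k": 1.0, "omega": 1.0, "beta": 1.0}  # line 4
                s = generate_state(params)             # line 5
                for t in range(N_step):                # line 6
                    if random.random() < eps:          # lines 8-9: eps-greedy
                        a = random.randrange(3)
                    else:
                        with torch.no_grad():
                            a = main(s).argmax(1).item()
                    params["k"] *= (0.9, 1.0, 1.1)[a]  # line 10: illustrative action
                    s_next = generate_state(params)    # line 11
                    r = compute_reward(s_next, truth)  # line 12
                    replay.append((s, a, r, s_next))   # line 13
                    if len(replay) >= N_batch:         # lines 14-16: DDQN update
                        batch = random.sample(replay, N_batch)
                        ss = torch.cat([b[0] for b in batch])
                        aa = torch.tensor([b[1] for b in batch])
                        rr = torch.tensor([b[2] for b in batch])
                        sn = torch.cat([b[3] for b in batch])
                        with torch.no_grad():          # double-DQN target value
                            a_star = main(sn).argmax(1, keepdim=True)
                            y = rr + gamma * target(sn).gather(1, a_star).squeeze(1)
                        q = main(ss).gather(1, aa[:, None]).squeeze(1)
                        loss = F.mse_loss(q, y)        # stands in for Eq. (17)
                        opt.zero_grad(); loss.backward(); opt.step()
                    step_count += 1
                    if step_count % N_update == 0:     # line 17: sync target weights
                        target.load_state_dict(main.state_dict())
                    s = s_next
                eps = max(0.01, eps - 1e-3)            # assumed annealing schedule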
    Parameter | Value | Description
    N_episode | 100 | Number of training episodes
    N_prj | 45 | Number of training projections
    N_step | 30 | Number of steps for each episode
    N_update | 20 | Number of steps between target network weight updates
    D | 2000 | Capacity of experience replay memory
    ε | [0.01, 1] | Probability of random action in ε-greedy algorithm
    γ | 0.6 | Discount factor
    lr | 0.001 | Learning rate of gradient descent for main network
    N_batch | 64 | Mini-batch samples for network training
    Table 2. Parameters in the DDQN Training Phase
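    Table 2 lists ε as a range, [0.01, 1], which implies an annealed ε-greedy schedule rather than a fixed value. The decay law is not given on this page; a linear ramp over the full training budget (N_episode × N_prj × N_step steps) is one plausible reading, sketched below.

        def epsilon(step, n_total=100 * 45 * 30, eps_start=1.0, eps_end=0.01):
            # Linear anneal from eps_start to eps_end over n_total training steps
            # (an assumption; Table 2 only gives the range [0.01, 1]).
            frac = min(step / n_total, 1.0)
            return eps_start + frac * (eps_end - eps_start)

        print(epsilon(0), epsilon(67500), epsilon(135000))  # 1.0 0.505 0.01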
    Photon Number | SSIM (1 = Best), Empirical | SSIM, ASEF | PSNR (dB), Empirical | PSNR, ASEF | RAE (%), Empirical | RAE, ASEF
    5×10⁵ | 0.79 ± 4.70×10⁻² | 0.94 ± 2.36×10⁻² | 21.54 ± 0.85 | 26.55 ± 1.34 | 12.03 ± 1.27×10⁻² | 5.62 ± 1.27×10⁻²
    1×10⁶ | 0.88 ± 3.73×10⁻² | 0.96 ± 1.67×10⁻² | 23.99 ± 0.72 | 29.05 ± 1.22 | 8.52 ± 9.65×10⁻³ | 4.22 ± 6.53×10⁻³
    5×10⁶ | 0.97 ± 8.83×10⁻³ | 0.99 ± 3.85×10⁻³ | 30.26 ± 0.91 | 33.76 ± 1.03 | 3.81 ± 4.69×10⁻³ | 2.42 ± 3.25×10⁻³
    1×10⁷ | 0.98 ± 4.31×10⁻³ | 0.99 ± 2.02×10⁻³ | 33.19 ± 0.83 | 36.05 ± 0.89 | 2.68 ± 3.14×10⁻³ | 1.87 ± 2.35×10⁻³
    1×10⁸ | 0.99 ± 4.96×10⁻⁴ | 0.99 ± 3.97×10⁻⁴ | 43.03 ± 0.82 | 43.96 ± 0.73 | 0.84 ± 9.31×10⁻⁴ | 0.74 ± 7.36×10⁻⁴
    1×10⁹ | 0.99 ± 4.84×10⁻⁵ | 0.99 ± 4.64×10⁻⁵ | 52.97 ± 0.91 | 53.12 ± 0.89 | 0.27 ± 3.26×10⁻⁴ | 0.26 ± 3.06×10⁻⁴
    Table 3. SSIM, PSNR, and RAE Statistics (avg. ± std.) among All Testing Cases
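    As a rough guide to how the Table 3 numbers can be reproduced, the sketch below scores an estimated scatter image against the ground truth. SSIM and PSNR come from scikit-image; the RAE formula is an assumption (the paper's definition is not reproduced on this page), taken here as the L1 error relative to the reference, in percent.

        import numpy as np
        from skimage.metrics import structural_similarity, peak_signal_noise_ratio

        def evaluate(est, ref):
            drange = ref.max() - ref.min()
            ssim = structural_similarity(ref, est, data_range=drange)
            psnr = peak_signal_noise_ratio(ref, est, data_range=drange)  # in dB
            rae = 100.0 * np.abs(est - ref).sum() / np.abs(ref).sum()    # assumed RAE, %
            return ssim, psnr, rae

        rng = np.random.default_rng(0)
        ref = rng.random((128, 128))                        # stand-in ground truth
        est = ref + 0.01 * rng.standard_normal((128, 128))  # stand-in estimate
        print(evaluate(est, ref))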
    Photon Number | 5×10⁵ | 1×10⁶ | 5×10⁶ | 1×10⁷ | 1×10⁸ | 1×10⁹ | 1×10¹⁰ | 1×10¹¹
    MC (s) | 0.43 | 0.45 | 0.57 | 0.83 | 5.94 | 60.00 | 633.95 | 6402.60
    DRL (s) | 8.98 | 4.80 | 1.94 | 0.98 | 0.32 | 0.29 | 0.29 | 0.29
    Total (s) | 9.41 | 5.25 | 2.51 | 1.81 | 6.26 | 60.29 | 634.24 | 6402.89
    Table 4. Computation Time (s) for One Scatter Image of a Prostate Patient across Different Photon Numbers