Monte Carlo simulation fused with target distribution modeling via deep reinforcement learning for automatic high-efficiency photon distribution estimation

Jianhui Ma; Zun Piao; Shuang Huang; Xiaoman Duan; Genggeng Qin; Linghong Zhou; Yuan Xu

doi:10.1364/PRJ.413486

Particle distribution estimation is an important issue in medical diagnosis. In particular, photon scattering in some medical devices severely degrades image quality and causes measurement inaccuracy. The Monte Carlo (MC) algorithm is regarded as the most accurate particle estimation approach but remains time-consuming, even with graphics processing unit (GPU) acceleration. The goal of this work is to develop an automatic scatter estimation framework for high-efficiency photon distribution estimation. Specifically, a GPU-based MC simulation first yields a raw scatter signal with a low photon number to speed up scatter generation. The proposed method assumes that the scatter signal follows a Poisson distribution and models an optimization objective function fused with a sparse feature penalty. An over-relaxation algorithm is then derived mathematically to solve this objective function. To optimize the parameters of the over-relaxation algorithm, a deep $Q$-network is built within the deep reinforcement learning scheme; it interacts with the over-relaxation algorithm to estimate the scatter signal accurately and rapidly over a wide range of photon numbers. Experimental results demonstrate that the proposed framework achieves superior performance, with structural similarity $> 0.94$, peak signal-to-noise ratio $> 26.55$ dB, and relative absolute error $< 5.62\%$, and the lowest computation time for generating one scatter image is within 2 s.
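To make the Poisson assumption concrete, the following is a minimal sketch of how a raw scatter signal becomes noisier as the simulated photon number drops. It is not the GPU-based MC code of this work: the Gaussian scatter profile, the image size, and the function names `simulate_scatter` and `mean_relative_deviation` are all illustrative assumptions.

```python
import numpy as np

def simulate_scatter(photon_number, rng, size=64):
    """Draw a Poisson-distributed scatter image (toy stand-in for an MC run)."""
    x = np.linspace(-1.0, 1.0, size)
    xx, yy = np.meshgrid(x, x)
    # Assumption: scatter is smooth and low-frequency, modeled here as a Gaussian.
    profile = np.exp(-(xx**2 + yy**2) / 0.5)
    # Expected photon counts per pixel, normalized to the total photon number.
    expected = photon_number * profile / profile.sum()
    noisy = rng.poisson(expected)
    return noisy, expected

def mean_relative_deviation(noisy, expected):
    """Total absolute deviation of the noisy image relative to the expected signal."""
    return np.sum(np.abs(noisy - expected)) / np.sum(expected)

rng = np.random.default_rng(0)
for n in (5e5, 1e7, 1e9):
    noisy, expected = simulate_scatter(n, rng)
    print(f"{n:.0e} photons: deviation = {mean_relative_deviation(noisy, expected):.4f}")
```

Because the standard deviation of a Poisson count equals the square root of its mean, the relative noise shrinks roughly as the inverse square root of the photon number, which is why low-photon MC runs are fast but need a subsequent restoration step.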

| Parameters | Values | Descriptions |
| --- | --- | --- |
| $N_{\text{episode}}$ | 100 | Number of training episodes |
| $N_{\text{prj}}$ | | Number of training projections |
| $N_{\text{step}}$ | | Number of steps for each episode |
| $N_{\text{update}}$ | | Number of steps for target network weights update |
| | 2000 | Capacity of experience replay memory |
| $\varepsilon$ | [0.01, 1] | Probability of random action in $\varepsilon$-greedy algorithm |
| | 0.6 | Discount factor |
| | 0.001 | Learning rate of gradient descent for main network |
| $N_{\text{batch}}$ | | Mini-batch samples for network training |
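For readers who want to carry these settings into code, the sketch below collects them in a single configuration object. Only the values that appear in the listing above are filled in; entries whose values are not given here are left as `None` rather than guessed. The class name `DQNConfig` and all field names are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

@dataclass(frozen=True)
class DQNConfig:
    """Hypothetical container for the training parameters listed above."""
    n_episode: int = 100                              # number of training episodes
    n_prj: Optional[int] = None                       # training projections (value not given here)
    n_step: Optional[int] = None                      # steps per episode (value not given here)
    n_update: Optional[int] = None                    # target-network update period (value not given here)
    replay_capacity: int = 2000                       # capacity of experience replay memory
    epsilon_range: Tuple[float, float] = (0.01, 1.0)  # range of the random-action probability
    discount: float = 0.6                             # discount factor
    learning_rate: float = 0.001                      # gradient-descent learning rate, main network
    n_batch: Optional[int] = None                     # mini-batch size (value not given here)

    def missing(self):
        """Names of the fields that still need a value before training."""
        return [name for name, value in asdict(self).items() if value is None]

config = DQNConfig()
print(config.missing())
```

Keeping the unknown entries explicit makes it obvious which settings must be supplied from the full paper before any training run.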

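| Photon number | SSIM (1 = best), Empirical, avg. | SSIM, Empirical, std. | SSIM, ASEF, avg. | SSIM, ASEF, std. | PSNR (dB), Empirical, avg. | PSNR (dB), Empirical, std. | PSNR (dB), ASEF, avg. | PSNR (dB), ASEF, std. | RAE (%), Empirical, avg. | RAE (%), Empirical, std. | RAE (%), ASEF, avg. | RAE (%), ASEF, std. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $5 \times 10^{5}$ | 0.79 | $4.70 \times 10^{-2}$ | 0.94 | $2.36 \times 10^{-2}$ | 21.54 | 0.85 | 26.55 | 1.34 | 12.03 | $1.27 \times 10^{-2}$ | 5.62 | $1.27 \times 10^{-2}$ |
| $1 \times 10^{6}$ | 0.88 | $3.73 \times 10^{-2}$ | 0.96 | $1.67 \times 10^{-2}$ | 23.99 | 0.72 | 29.05 | 1.22 | 8.52 | $9.65 \times 10^{-3}$ | 4.22 | $6.53 \times 10^{-3}$ |
| $5 \times 10^{6}$ | 0.97 | $8.83 \times 10^{-3}$ | 0.99 | $3.85 \times 10^{-3}$ | 30.26 | 0.91 | 33.76 | 1.03 | 3.81 | $4.69 \times 10^{-3}$ | 2.42 | $3.25 \times 10^{-3}$ |
| $1 \times 10^{7}$ | 0.98 | $4.31 \times 10^{-3}$ | 0.99 | $2.02 \times 10^{-3}$ | 33.19 | 0.83 | 36.05 | 0.89 | 2.68 | $3.14 \times 10^{-3}$ | 1.87 | $2.35 \times 10^{-3}$ |
| $1 \times 10^{8}$ | 0.99 | $4.96 \times 10^{-4}$ | 0.99 | $3.97 \times 10^{-4}$ | 43.03 | 0.82 | 43.96 | 0.73 | 0.84 | $9.31 \times 10^{-4}$ | 0.74 | $7.36 \times 10^{-4}$ |
| $1 \times 10^{9}$ | 0.99 | $4.84 \times 10^{-5}$ | 0.99 | $4.64 \times 10^{-5}$ | 52.97 | 0.91 | 53.12 | 0.89 | 0.27 | $3.26 \times 10^{-4}$ | 0.26 | $3.06 \times 10^{-4}$ |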
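As a rough guide to how two of the reported metrics can be computed, here is a minimal sketch of PSNR and RAE for an estimated scatter image against a reference. The exact definitions used in this work are not reproduced here, so the formulas below are common conventions and should be read as assumptions: PSNR uses the reference maximum as the peak value, and RAE is the summed absolute error divided by the summed reference magnitude. SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np

def psnr_db(estimate, reference):
    """Peak signal-to-noise ratio in dB (assumed peak: maximum of the reference)."""
    mse = np.mean((estimate - reference) ** 2)
    peak = np.max(reference)
    return 10.0 * np.log10(peak**2 / mse)

def rae_percent(estimate, reference):
    """Relative absolute error in percent (assumed: sum|error| / sum|reference|)."""
    return 100.0 * np.sum(np.abs(estimate - reference)) / np.sum(np.abs(reference))

# Toy check: a uniform reference and an estimate that is 10% too high everywhere.
reference = np.full((8, 8), 100.0)
estimate = reference * 1.1
print(psnr_db(estimate, reference), rae_percent(estimate, reference))
```

With these conventions, a lower RAE and a higher PSNR both indicate an estimate closer to the high-photon reference.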

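| Computation time (s) | $5 \times 10^{5}$ | $1 \times 10^{6}$ | $5 \times 10^{6}$ | $1 \times 10^{7}$ | $1 \times 10^{8}$ | $1 \times 10^{9}$ | $1 \times 10^{10}$ | $1 \times 10^{11}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 0.43 | 0.45 | 0.57 | 0.83 | 5.94 | 60.00 | 633.95 | 6402.60 |
| DRL | 8.98 | 4.80 | 1.94 | 0.98 | 0.32 | 0.29 | | |
| Total | 9.41 | 5.25 | 2.51 | 1.81 | 6.26 | 60.29 | 634.24 | 6402.89 |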
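The timing figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the first, unlabeled row of timings is the GPU-based MC simulation time, so that Total = MC + DRL; this reading is an inference from the numbers, not something stated in the listing. Only the six photon numbers with a DRL entry are used, and the variable names are illustrative.

```python
photon_numbers = [5e5, 1e6, 5e6, 1e7, 1e8, 1e9]
first_row_s = [0.43, 0.45, 0.57, 0.83, 5.94, 60.00]  # assumed to be MC simulation time
drl_s = [8.98, 4.80, 1.94, 0.98, 0.32, 0.29]
total_s = [9.41, 5.25, 2.51, 1.81, 6.26, 60.29]

def totals_are_consistent(a, b, total, tol=0.005):
    """Check that each total equals the sum of its two components within rounding."""
    return all(abs(x + y - t) <= tol for x, y, t in zip(a, b, total))

# Index of the photon number with the smallest total time.
best = min(range(len(total_s)), key=lambda i: total_s[i])

print(totals_are_consistent(first_row_s, drl_s, total_s))
print(f"fastest: {photon_numbers[best]:.0e} photons, {total_s[best]} s")
```

Under this reading, the trade-off is visible directly: the simulation time grows with the photon number while the DRL time shrinks, and the total is smallest at $1 \times 10^{7}$ photons (1.81 s), in line with the "within 2 s" figure quoted in the abstract.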

1.	Initialize the main network weights $W$ and the target network weights $\hat{W}$
2.	For episode $= 1, 2, \ldots, N_{\text{episode}}$ do
3.	For projection $= 1, 2, \ldots, N_{\text{prj}}$ do
4.	Initialize $\{k_0, \omega_0, \beta_0\}$
5.	Generate $s_1$ using Eq. (10) with $\{k_0, \omega_0, \beta_0\}$
6.	For $t = 1, 2, \ldots, N_{\text{step}}$ do
7.	Randomly select one subnetwork from $\{W_k, W_\omega, W_\beta\}$
8.	With probability $\varepsilon$, select action $a_t$ randomly
9.	Otherwise choose $a_t = \arg\max_a Q^{\pi}(s_t, a; W)$
10.	Adjust parameters $\{k_t, \omega_t, \beta_t\}$ according to $a_t$
11.	Generate $s_{t+1}$ using Eq. (10) with $\{k_t, \omega_t, \beta_t\}$
12.	Compute reward $r_t$ using Eq. (19)
13.	Store the transition $\{s_t, a_t, r_t, s_{t+1}\}$ in the experience replay memory $D$
14.	Randomly sample a mini-batch of transitions from $D$
15.	Compute the gradient of the loss function in Eq. (17)
16.	Update the main network weights $W = \{W_k, W_\omega, W_\beta\}$
17.	Every $N_{\text{update}}$ steps, set $\hat{W} = W$
18.	End For
19.	End For
20.	End For
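The control flow of the steps above can be sketched in a few dozen lines. This is a toy illustration under stated assumptions, not the authors' implementation: Eq. (10), Eq. (17), and Eq. (19) are not reproduced here, so `generate_state` and `compute_reward` are hypothetical stand-ins, the loss is assumed to be the usual squared temporal-difference error, each subnetwork is replaced by a linear $Q$-function, and `ACTIONS`, `TARGET`, the fixed `eps`, and the loop sizes are invented for the example. The discount factor 0.6, learning rate 0.001, and replay capacity 2000 are the values listed earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = np.array([0.9, 1.0, 1.1])  # hypothetical: scale a parameter down / keep / up
TARGET = np.array([5.0, 1.5, 0.1])   # hypothetical optimum for {k, omega, beta}

def generate_state(params):
    # Stand-in for Eq. (10): the toy state is simply the log-parameters.
    return np.log(params)

def compute_reward(old, new):
    # Stand-in for Eq. (19): reduction in distance to the toy optimum.
    dist = lambda p: np.abs(np.log(p / TARGET)).sum()
    return dist(old) - dist(new)

def q_values(w, s):
    # Linear Q-function with a bias feature; w has shape (n_actions, 4).
    return w @ np.append(s, 1.0)

def train(n_episode=5, n_prj=2, n_step=20, n_update=10, capacity=2000,
          eps=0.3, gamma=0.6, lr=0.001, n_batch=8):
    # Step 1: main weights {W_k, W_omega, W_beta} and a target copy.
    W = rng.normal(scale=0.01, size=(3, len(ACTIONS), 4))
    W_hat = W.copy()
    replay, step = [], 0
    for _ in range(n_episode):                                   # step 2
        for _ in range(n_prj):                                   # step 3
            params = np.array([1.0, 1.0, 1.0])                   # step 4
            s = generate_state(params)                           # step 5
            for _ in range(n_step):                              # step 6
                sub = int(rng.integers(3))                       # step 7
                if rng.random() < eps:                           # step 8
                    a = int(rng.integers(len(ACTIONS)))
                else:                                            # step 9
                    a = int(np.argmax(q_values(W[sub], s)))
                new_params = params.copy()
                new_params[sub] *= ACTIONS[a]                    # step 10
                s_next = generate_state(new_params)              # step 11
                r = compute_reward(params, new_params)           # step 12
                replay.append((sub, s, a, r, s_next))            # step 13
                replay = replay[-capacity:]
                batch = rng.integers(len(replay), size=min(n_batch, len(replay)))
                for i in batch:                                  # steps 14 to 16
                    sb, s_b, a_b, r_b, s2_b = replay[int(i)]
                    target = r_b + gamma * q_values(W_hat[sb], s2_b).max()
                    td = q_values(W[sb], s_b)[a_b] - target
                    W[sb, a_b] -= lr * td * np.append(s_b, 1.0)  # gradient of 0.5 * td**2
                step += 1
                if step % n_update == 0:                         # step 17
                    W_hat = W.copy()
                params, s = new_params, s_next
    return W

weights = train()
print(weights.shape)
```

The target network `W_hat` is only refreshed every `n_update` steps, which keeps the regression target fixed between refreshes; the replay memory decouples the sampled mini-batch from the most recent transition.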
