X-ray crystallography experimental data screening based on convolutional neural network algorithms

Zi HUI; Li YU; Huan ZHOU; Lin TANG; Jianhua HE

doi:10.11889/j.0253-3219.2023.hjs.46.030101

NUCLEAR TECHNIQUES
Vol. 46, Issue 3, 030101 (2023)

Zi HUI¹, Li YU²,³, Huan ZHOU⁴, Lin TANG¹, and Jianhua HE¹,*

Author Affiliations

¹The Institute for Advanced Studies, Wuhan University, Wuhan 430072, China

²Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China

³University of Chinese Academy of Sciences, Beijing 100049, China

⁴Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China

Zi HUI, Li YU, Huan ZHOU, Lin TANG, Jianhua HE. X-ray crystallography experimental data screening based on convolutional neural network algorithms[J]. NUCLEAR TECHNIQUES, 2023, 46(3): 030101

Fig. 1. Comparison of LN83 diffraction pattern before (a) and after (b) gray value equalization
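
The gray value equalization in Fig. 1 can be illustrated with a minimal sketch. It assumes the step is ordinary histogram equalization on integer gray values; the function name `equalize_gray_values` and the `levels` parameter are hypothetical and are not taken from the article, whose exact preprocessing may differ.

```python
def equalize_gray_values(image, levels=256):
    """Histogram-equalize a 2-D list of integer gray values in [0, levels).

    Assumption: plain histogram equalization; the article's own procedure
    is not specified here and may differ.
    """
    flat = [value for row in image for value in row]

    # Histogram of gray values.
    hist = [0] * levels
    for value in flat:
        hist[value] += 1

    # Cumulative distribution of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to stretch, return a copy.
        return [row[:] for row in image]

    # Look-up table mapping each old gray value to its equalized value.
    lut = [
        round(max(c - cdf_min, 0) / (n - cdf_min) * (levels - 1))
        for c in cdf
    ]
    return [[lut[value] for value in row] for row in image]


# A dim 2x3 patch whose values occupy only a narrow gray range.
dim_patch = [[10, 10, 11], [11, 12, 13]]
equalized = equalize_gray_values(dim_patch)
```

After equalization the narrow input range is stretched over the full gray scale, which is the visual effect a before/after comparison such as Fig. 1 would show. Detector images usually have more than 8 bits per pixel, so `levels` would need to match the actual bit depth.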

Fig. 2. Diffraction pattern of protein crystal after gray value equalization

Fig. 3. Image augmentation results for the LN83 diffraction pattern (a) Original image, (b) Flipped left to right, (c) Rotated 90° counterclockwise, (d) Rotated 25° counterclockwise and moved 10 pixels to the right, (e) Rotated 110° clockwise and moved 5 pixels to the right and 5 pixels down, (f) Rotated 60° clockwise
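
The simpler transformations listed in Fig. 3 (flip, 90° rotation, pixel shift) can be sketched in pure Python as below. The helper names `flip_left_right`, `rotate_90_ccw` and `shift` are hypothetical, as is the choice to fill vacated pixels with zero. Rotations by arbitrary angles such as 25° or 110° need interpolation and are left out of this sketch; an image library would normally handle them.

```python
def flip_left_right(image):
    """Mirror each row, as in panel (b)."""
    return [row[::-1] for row in image]


def rotate_90_ccw(image):
    """Rotate 90 degrees counterclockwise, as in panel (c)."""
    # Transposing turns columns into rows; reversing the row order
    # then puts the former rightmost column on top.
    return [list(column) for column in zip(*image)][::-1]


def shift(image, dx=0, dy=0, fill=0):
    """Move the content dx pixels right and dy pixels down.

    Pixels pushed outside the frame are dropped; vacated pixels get
    `fill` (an assumption, the article does not state the fill value).
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = image[y][x]
    return shifted


patch = [[1, 2], [3, 4]]
flipped = flip_left_right(patch)
rotated = rotate_90_ccw(patch)
moved_right = shift(patch, dx=1)
moved_down = shift(patch, dy=1)
```

Each transformed copy keeps the label of its source image, so a small labeled set can be expanded without new annotation.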

Fig. 4. Flow chart of convolutional neural network for training and prediction

Fig. 5. Accuracy and running rate of the verification set and test set based on different networks (a) Verification set accuracy, (b) Test set accuracy, (c) Verification set running rate, (d) Test set running rate

Fig. 6. t-SNE dimensionality reduction results of six convolutional neural networks (the circle is the "maybe" sample, the cross is the "miss" sample, and the pentagram is the "hit" sample) (a) MobileNets, (b) ResNet, (c) Inception-v1, (d) Inception-v3, (e) Vgg16, (f) AlexNet

Fig. 7. Running rate of LN83 on GPU and CPU

Fig. 8. Reliability distribution of MobileNets hit/maybe (a) and miss (b) samples

Fig. 9. Samples selected by MobileNets (a) Hit, (b) Maybe, (c) Miss

Dataset	Protein	Incident energy / keV	Instrument	Detector
LN83	Hydrogenase	9.498	MFX	Rayonix
LN84	Photosystem II	9.516	MFX	Rayonix
LO19	Cyclophilin A	9.442	MFX	Rayonix
L498	Thermolysin	9.773	CXI	CSPAD

Table 1. Experimental data

Data type	Number of Bragg points (X)	Effective information content
Hit	X ≥ 10	More valid information
Maybe	4 ≤ X < 10	Less valid information
Miss	X ≤ 3	No valid information

Table 2. Data classification
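
The labeling rule in Table 2 can be written as a short function. This is a minimal sketch: the thresholds come from the table, while the function name `classify_pattern` and the assumption that the Bragg point count is already available as an integer are hypothetical. How the count itself is obtained is not covered here.

```python
def classify_pattern(num_bragg_points):
    """Map a Bragg point count X to the Table 2 labels.

    Thresholds follow Table 2: X >= 10 is a hit, 4 <= X < 10 is a
    maybe, and X <= 3 is a miss.
    """
    if num_bragg_points < 0:
        raise ValueError("the number of Bragg points cannot be negative")
    if num_bragg_points >= 10:
        return "hit"
    if num_bragg_points >= 4:
        return "maybe"
    return "miss"


labels = [classify_pattern(count) for count in (0, 3, 4, 9, 10, 57)]
```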

Net	Depth / layers	Characteristics
AlexNet	8	Few layers; uses the ReLU activation function
Vgg16	16	Small convolution kernels, which speed up convergence
Inception-V1	22	Parallel computing; removes the fully connected layer
Inception-V3	46	Parallel computing; splits convolutions to reduce data size
ResNet101	101	Uses residual learning to optimize the learning objective
MobileNets-V1	28	Depthwise separable convolutions; introduces global hyperparameters

Table 3. Six convolutional neural networks

Sample	Verification set accuracy / %	Test set accuracy
L498 (Thermolysin)	62.2	7/10
LN84 (Photosystem II)	82.3	8/10
LN83 (Hydrogenase)	81.8	8/10
LO19 (Cyclophilin A)	78.0	9/10

Table 4. Verification set and test set accuracy of each sample based on MobileNets

Net	Label	LN83 (Hydrogenase)
		Hit	Maybe	Miss
MobileNets	Hit	0.919	0.070	0.011
	Maybe	0.168	0.701	0.131
	Miss	0.014	0.043	0.943
Inception-v1	Hit	0.935	0.043	0.022
	Maybe	0.350	0.416	0.234
	Miss	0.008	0.028	0.964
Inception-v3	Hit	0.958	0.029	0.013
	Maybe	0.547	0.343	0.109
	Miss	0.058	0.202	0.740
Vgg16	Hit	0.893	0.086	0.021
	Maybe	0.073	0.876	0.051
	Miss	0.020	0.141	0.840
ResNet	Hit	0.854	0.084	0.063
	Maybe	0.015	0.518	0.467
	Miss	0.001	0.004	0.995
AlexNet	Hit	0.907	0.014	0.079
	Maybe	0.927	0.022	0.051
	Miss	0.509	0.016	0.475

Table 5. Accuracy of verification set and test set using different networks based on LN83

Net	Hit/maybe	Miss
MobileNets	0.970	0.943
Inception-V1	0.944	0.964
Inception-V3	0.972	0.740
Vgg16	0.974	0.840
ResNet	0.873	0.955
AlexNet	0.925	0.475

Table 6. Accuracy of binary classification based on the LN83 sample
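
One way to relate the three-class results in Table 5 to the two-class view in Table 6 is to merge the "hit" and "maybe" columns. The sketch below does this for the MobileNets rows, under the assumption that each Table 5 row is a label and each column a predicted class; the names `MOBILENETS_LN83` and `to_binary` are hypothetical. It only merges columns per row. The combined hit/maybe figure in Table 6 (0.970) presumably also depends on the number of samples per label, which is not given here, so the sketch does not attempt to reproduce it.

```python
# MobileNets rows of Table 5 (LN83), read as label -> predicted fractions.
# Assumption: rows are labels and columns are predictions.
MOBILENETS_LN83 = {
    "hit": {"hit": 0.919, "maybe": 0.070, "miss": 0.011},
    "maybe": {"hit": 0.168, "maybe": 0.701, "miss": 0.131},
    "miss": {"hit": 0.014, "maybe": 0.043, "miss": 0.943},
}


def to_binary(row):
    """Merge the hit and maybe predictions into one hit/maybe class."""
    return {"hit/maybe": row["hit"] + row["maybe"], "miss": row["miss"]}


binary = {label: to_binary(row) for label, row in MOBILENETS_LN83.items()}

# Fraction of each label that lands in the usable (hit/maybe) class.
kept_hit = binary["hit"]["hit/maybe"]
kept_maybe = binary["maybe"]["hit/maybe"]
# Fraction of miss-labeled patterns that are correctly rejected.
rejected_miss = binary["miss"]["miss"]
```

The miss value obtained this way, 0.943, coincides with the MobileNets miss entry in Table 6.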
