• NUCLEAR TECHNIQUES
  • Vol. 46, Issue 3, 030101 (2023)
Zi HUI1, Li YU2,3, Huan ZHOU4, Lin TANG1, and Jianhua HE1,*
Author Affiliations
  • 1The Institute for Advanced Studies, Wuhan University, Wuhan 430072, China
  • 2Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China
  • 3University of Chinese Academy of Sciences, Beijing 100049, China
  • 4Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China
    DOI: 10.11889/j.0253-3219.2023.hjs.46.030101
    Zi HUI, Li YU, Huan ZHOU, Lin TANG, Jianhua HE. X-ray crystallography experimental data screening based on convolutional neural network algorithms[J]. NUCLEAR TECHNIQUES, 2023, 46(3): 030101
    Fig. 1. Comparison of LN83 diffraction pattern before (a) and after (b) gray value equalization
    Fig. 2. Diffraction pattern of protein crystal after gray value equalization
    Fig. 3. LN83 diffraction pattern image augmentation results: (a) original image, (b) flipped left to right, (c) rotated 90° counterclockwise, (d) rotated 25° counterclockwise and shifted 10 pixels to the right, (e) rotated 110° clockwise and shifted 5 pixels to the right and 5 pixels down, (f) rotated 60° clockwise
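The simpler operations named in the Fig. 3 caption (left-right flip, 90° counterclockwise rotation, pixel shift) can be sketched in plain Python as below. This is only an illustrative sketch, not the authors' code: the function names, the list-of-lists image representation, and the zero fill value for pixels shifted in from outside the frame are all assumptions. Rotations by arbitrary angles such as 25°, 60° or 110° need interpolation and are not covered here.

```python
from typing import List

# Assumed representation: a row-major grid of gray values.
Image = List[List[int]]


def flip_left_right(img: Image) -> Image:
    """Mirror every row, as in panel (b)."""
    return [row[::-1] for row in img]


def rotate_90_ccw(img: Image) -> Image:
    """Rotate by 90 degrees counterclockwise, as in panel (c)."""
    # Transpose the grid, then reverse the order of the rows.
    return [list(col) for col in zip(*img)][::-1]


def shift(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift dx pixels to the right and dy pixels down.

    Pixels moved outside the frame are dropped; vacated pixels get
    `fill` (an assumed choice, the source does not state the padding).
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out
```

Each function returns a new image and leaves its input untouched, so several augmented copies can be generated from one original pattern.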
    Fig. 4. Flow chart of convolutional neural network for training and prediction
    Fig. 5. Accuracy and running rate on the verification set and test set for different networks: (a) verification set accuracy, (b) test set accuracy, (c) verification set running rate, (d) test set running rate
    Fig. 6. t-SNE dimensionality reduction results of six convolutional neural networks (circles are "maybe" samples, crosses are "miss" samples, and pentagrams are "hit" samples): (a) MobileNets, (b) ResNet, (c) Inception-v1, (d) Inception-v3, (e) Vgg16, (f) AlexNet
    Fig. 7. Running rate of LN83 on GPU and CPU
    Fig. 8. Reliability distribution of MobileNets for hit/maybe samples (a) and miss samples (b)
    Fig. 9. Samples selected by MobileNets: (a) hit, (b) maybe, (c) miss

    | Dataset | Protein | Incident energy / keV | Instrument | Detector |
    |---|---|---|---|---|
    | LN83 | Hydrogenase | 9.498 | MFX | Rayonix |
    | LN84 | Photosystem II | 9.516 | MFX | Rayonix |
    | LO19 | Cyclophilin A | 9.442 | MFX | Rayonix |
    | L498 | Thermolysin | 9.773 | CXI | CSPAD |

    Table 1. Experimental data

    | Data type | Number of Bragg points (X) | Effective information content |
    |---|---|---|
    | Hit | X ≥ 10 | More valid information |
    | Maybe | 10 > X ≥ 4 | Less valid information |
    | Miss | X ≤ 3 | Lacks valid information |

    Table 2. Data classification
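The thresholds in Table 2 amount to a three-way rule on the Bragg point count X. A minimal sketch follows; the function name `classify_pattern` is hypothetical, and how the Bragg points themselves are counted is not described here, so the count is simply taken as an input.

```python
def classify_pattern(num_bragg_points: int) -> str:
    """Label a diffraction pattern by its Bragg point count X (Table 2).

    X >= 10      -> "hit"
    10 > X >= 4  -> "maybe"
    X <= 3       -> "miss"
    """
    if num_bragg_points < 0:
        raise ValueError("the number of Bragg points cannot be negative")
    if num_bragg_points >= 10:
        return "hit"
    if num_bragg_points >= 4:
        return "maybe"
    return "miss"
```

For example, a pattern with 9 Bragg points falls in the "maybe" class, while one with 3 is a "miss".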

    | Net | Depth / layers | Characteristic |
    |---|---|---|
    | AlexNet | 8 | Few layers; uses the ReLU activation function |
    | Vgg16 | 16 | Small convolution kernels to speed up convergence |
    | Inception-V1 | 22 | Parallel computation; removes the fully connected layer |
    | Inception-V3 | 46 | Parallel computation; splits convolutions to reduce data size |
    | ResNet101 | 101 | Uses a residual network to optimize the learning objective |
    | MobileNets-V1 | 28 | Depthwise separable convolutions; introduces global hyperparameters |

    Table 3. Six convolutional neural networks
    | Sample | Verification set accuracy / % | Test set accuracy |
    |---|---|---|
    | L498 Thermolysin | 62.2 | 7/10 |
    | LN84 Photosystem II | 82.3 | 8/10 |
    | LN83 Hydrogenase | 81.8 | 8/10 |
    | LO19 Cyclophilin A | 78.0 | 9/10 |

    Table 4. Verification set and test set accuracy of each sample based on MobileNets
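    | Net | Label | LN83 Hydrogenase: Hit | LN83 Hydrogenase: Maybe | LN83 Hydrogenase: Miss |
    |---|---|---|---|---|
    | MobileNets | Hit | 0.919 | 0.070 | 0.011 |
    | MobileNets | Maybe | 0.168 | 0.701 | 0.131 |
    | MobileNets | Miss | 0.014 | 0.043 | 0.943 |
    | Inception-v1 | Hit | 0.935 | 0.043 | 0.022 |
    | Inception-v1 | Maybe | 0.350 | 0.416 | 0.234 |
    | Inception-v1 | Miss | 0.008 | 0.028 | 0.964 |
    | Inception-v3 | Hit | 0.958 | 0.029 | 0.013 |
    | Inception-v3 | Maybe | 0.547 | 0.343 | 0.109 |
    | Inception-v3 | Miss | 0.058 | 0.202 | 0.740 |
    | Vgg16 | Hit | 0.893 | 0.086 | 0.021 |
    | Vgg16 | Maybe | 0.073 | 0.876 | 0.051 |
    | Vgg16 | Miss | 0.020 | 0.141 | 0.840 |
    | ResNet | Hit | 0.854 | 0.084 | 0.063 |
    | ResNet | Maybe | 0.015 | 0.518 | 0.467 |
    | ResNet | Miss | 0.001 | 0.004 | 0.995 |
    | AlexNet | Hit | 0.907 | 0.014 | 0.079 |
    | AlexNet | Maybe | 0.927 | 0.022 | 0.051 |
    | AlexNet | Miss | 0.509 | 0.016 | 0.475 |

    Table 5. Accuracy of verification set and test set using different networks based on LN83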
    | Net | Hit/maybe | Miss |
    |---|---|---|
    | MobileNets | 0.970 | 0.943 |
    | Inception-V1 | 0.944 | 0.964 |
    | Inception-V3 | 0.972 | 0.740 |
    | Vgg16 | 0.974 | 0.840 |
    | ResNet | 0.873 | 0.955 |
    | AlexNet | 0.925 | 0.475 |

    Table 6. Accuracy of two-class classification based on the LN83 sample
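One plausible way to obtain two-class figures like those in Table 6 from three-class predictions is to treat "hit" and "maybe" as a single class and count a prediction as correct when it stays on the right side of the hit/maybe versus miss split. The page does not state how Table 6 was computed, so the sketch below is an assumption, and the function name and the counts in the usage example are purely hypothetical.

```python
from typing import Dict

# counts[true_label][predicted_label] = number of samples (assumed layout)
Confusion = Dict[str, Dict[str, int]]


def two_class_accuracy(counts: Confusion) -> Dict[str, float]:
    """Collapse a hit/maybe/miss confusion matrix into two classes."""
    useful = ("hit", "maybe")
    # A hit or maybe sample is correct if predicted as either hit or maybe.
    useful_total = sum(sum(counts[t].values()) for t in useful)
    useful_correct = sum(counts[t][p] for t in useful for p in useful)
    # A miss sample is correct only if predicted as miss.
    miss_total = sum(counts["miss"].values())
    miss_correct = counts["miss"]["miss"]
    return {
        "hit/maybe": useful_correct / useful_total,
        "miss": miss_correct / miss_total,
    }


# Hypothetical counts, for illustration only (not data from the article).
example = {
    "hit": {"hit": 8, "maybe": 1, "miss": 1},
    "maybe": {"hit": 2, "maybe": 6, "miss": 2},
    "miss": {"hit": 1, "maybe": 1, "miss": 8},
}
result = two_class_accuracy(example)
```

Because the merged accuracy is computed from raw counts, it is weighted by how many hit and maybe samples there are, which is why per-class fractions alone do not determine it.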