• Photonics Research
  • Vol. 10, Issue 12, 2846 (2022)
Carlo M. Valensise1, Ivana Grecco2, Davide Pierangeli1,2,3,*, and Claudio Conti1,2,3
Author Affiliations
  • 1Enrico Fermi Research Center (CREF), 00184 Rome, Italy
  • 2Physics Department, Sapienza University of Rome, 00185 Rome, Italy
  • 3Institute for Complex Systems, National Research Council (ISC-CNR), 00185 Rome, Italy
    DOI: 10.1364/PRJ.472932
    Carlo M. Valensise, Ivana Grecco, Davide Pierangeli, Claudio Conti. Large-scale photonic natural language processing[J]. Photonics Research, 2022, 10(12): 2846
    Fig. 1. Three-dimensional PELM for language processing. (A) The text database entry is a paragraph of variable length. Text pre-processing: a sparse representation of the input paragraph is mapped into a Hadamard matrix with phase values in [0, π]. (B) The mask is encoded onto the optical wavefront by a phase-only SLM. Free-space propagation of the optical field maps the input data into a 3D intensity distribution (speckle-like volume). (C) Sampling the propagating laser beam in multiple far-field planes enables upscaling of the feature space. Intensities picked from all the spatial modes form the output layer H3D, which undergoes training via ridge regression. By using three planes (j=3), we obtain a network capacity C>10^10. (D) The example shows a binary text classification problem for large-scale rating.
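The pipeline of Fig. 1 can be sketched numerically. The snippet below is a minimal sketch, not the authors' implementation: a hashed bag-of-words encoder, a hypothetical 64×64 phase mask, and angular-spectrum free-space propagation stand in for the paper's Hadamard encoding, SLM, and optical bench; only the ridge-regression readout matches the description in the caption.

```python
# Minimal numerical sketch of the 3D-PELM forward pass (Fig. 1).
# All sizes, distances, and the text encoder are illustrative assumptions.
import numpy as np
from sklearn.linear_model import RidgeClassifier

N = 64  # phase-mask side length (hypothetical)

def text_to_phase_mask(paragraph, n=N):
    """Hash words onto an n x n grid and map normalized counts to phases in [0, pi]."""
    counts = np.zeros(n * n)
    for word in paragraph.lower().split():
        counts[hash(word) % (n * n)] += 1.0
    counts /= counts.max() + 1e-12
    return (np.pi * counts).reshape(n, n)

def propagate(field, dist, wavelength=633e-9, pixel=8e-6):
    """Angular-spectrum free-space propagation (evanescent components ignored)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel)
    FX, FY = np.meshgrid(fx, fx)
    kz = 2 * np.pi * np.sqrt(np.maximum(0.0, 1.0 / wavelength**2 - FX**2 - FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * dist))

def features_3d(paragraph, planes=(0.05, 0.10, 0.15)):
    """Concatenate intensities sampled on three propagation planes (j = 3)."""
    field = np.exp(1j * text_to_phase_mask(paragraph))
    return np.concatenate([(np.abs(propagate(field, z)) ** 2).ravel() for z in planes])

# Readout training on a labeled corpus (paragraphs, y):
# X = np.stack([features_3d(p) for p in paragraphs])
# readout = RidgeClassifier(alpha=1.0).fit(X, y)
```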
    Fig. 2. Photonic sentiment analysis. (A), (B) Training and test accuracy of the 3D-PELM on the IMDb dataset as a function of the number of output channels. The shaded area corresponds to the over-parameterized region. The configuration in (B) reaches very high accuracy in the over-parameterized region with a dataset limited to Ntrain=1186 training points. In (A), the same accuracy is reached in the under-parameterized region with Ntrain=12,278. Black horizontal lines correspond to the maximum test accuracy achieved (0.77). (C) IMDb classification accuracy by varying the number of features M and the training dataset size Ntrain. The boundary between the under- and over-parameterized regions (interpolation threshold), Ntrain=M, is characterized by a sharp accuracy drop (cyan contour line).
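The sharp drop at the interpolation threshold Ntrain=M is the double-descent behavior familiar from random-feature models trained with weakly regularized ridge regression. The sketch below reproduces the qualitative effect on synthetic data; the random projection and the intensity-like nonlinearity are assumptions standing in for the optical features of the experiment.

```python
# Double-descent sketch around the interpolation threshold Ntrain = M
# for a random-feature readout trained by ridge regression (synthetic data).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
L, Ntrain, Ntest = 100, 300, 1000
w_true = rng.normal(size=L)
X = rng.normal(size=(Ntrain + Ntest, L))
y = np.sign(X @ w_true + 0.5 * rng.normal(size=Ntrain + Ntest))

def test_accuracy(M, alpha=1e-8):
    W = rng.normal(size=(L, M)) / np.sqrt(L)   # fixed random projection
    H = np.abs(X @ W) ** 2                     # intensity-like nonlinearity
    reg = Ridge(alpha=alpha).fit(H[:Ntrain], y[:Ntrain])
    return np.mean(np.sign(reg.predict(H[Ntrain:])) == y[Ntrain:])

# Accuracy dips near M = Ntrain and recovers in the over-parameterized regime.
for M in (50, 150, 300, 600, 3000):
    print(M, test_accuracy(M))
```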
    Fig. 3. Performance at ultralarge scale. (A)–(C) Test accuracy as a function of M for different input sizes L. In all cases, the 3D-PELM performance saturates in the over-parameterized region, reaching a plateau. A linear fit of the data preceding the plateau shows that the onset of saturation is faster for datasets with a larger input space. The corresponding angular coefficient m is inset in each panel. (D) Test accuracy as a function of the training set size for M=0.8×10^5 and M=1.2×10^5.
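The angular coefficient m quoted in each panel is the slope of a straight-line fit to the accuracy-versus-M points that precede the plateau. A minimal version of that fit, on hypothetical (M, accuracy) pairs rather than the measured curves:

```python
# Linear fit of pre-plateau accuracy vs. M; the data points are hypothetical.
import numpy as np

M_vals = np.array([2e3, 5e3, 1e4, 2e4, 4e4])        # hypothetical pre-plateau M values
acc    = np.array([0.62, 0.66, 0.70, 0.73, 0.755])  # hypothetical test accuracies

m, intercept = np.polyfit(M_vals, acc, deg=1)        # angular coefficient m
print(f"slope m ~ {m:.2e} accuracy per output channel")
```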
    Fig. 4. Analysis of the IMDb accuracy. (A), (B) The comparison reports the accuracy for the experimental device (3D-PELM device), the simulated device (3D-PELM numerics), the random projection method with ridge regression (RP), the support vector machine (SVM), and a convolutional neural network (CNN) in both the under-parameterized (M=1×10^3) and over-parameterized (M=4×10^4) regimes, for (A) Ntrain=6700 and (B) Ntrain=1500. 8-bit numerical results, when applicable, refer to the over-parameterized regime.
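Two of the digital baselines in the comparison, random projection followed by ridge regression (RP) and an SVM, can be set up in a few lines. The sketch below assumes generic dense feature vectors X (e.g., a bag-of-words encoding of the IMDb reviews) and labels y; M=1×10^3 matches the under-parameterized point in the figure, while the hyperparameters, SVM kernel, and train/test split are illustrative, not the paper's.

```python
# Sketch of the RP (random projection + ridge) and SVM baselines of Fig. 4.
# X, y, the split size, and all hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

def rp_baseline(X, y, M=1_000, seed=0):
    """Random projection with an intensity-like nonlinearity, ridge readout."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], M)) / np.sqrt(X.shape[1])
    H = np.abs(X @ W) ** 2
    Xtr, Xte, ytr, yte = train_test_split(H, y, train_size=6700, random_state=0)
    return RidgeClassifier(alpha=1.0).fit(Xtr, ytr).score(Xte, yte)

def svm_baseline(X, y):
    """Linear SVM trained directly on the input features."""
    Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=6700, random_state=0)
    return LinearSVC(C=1.0).fit(Xtr, ytr).score(Xte, yte)
```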
    Working Principle           | M (output channels) | L (input size) | C (capacity) | Machine Learning Task        | Ref.
    Time-multiplexed cavity     | 1400                | 7129           | 10^7         | Regression                   | [39]
    Amplitude modulation        | 16,384              | 2000           | 10^8         | Human action recognition     | [27]
    Frequency multiplexing      | 200                 | 640            | 10^5         | Time series recovery         | [41]
    Optical multiple scattering | 50,000              | 64             | 10^6         | Chaotic series prediction    | [38]
    Amplitude Fourier filtering | 1024                | 43,263         | 10^7         | Image classification         | [30]
    Multimode fiber             | 240                 | 240            | 10^5         | Classification, regression   | [35]
    Free-space propagation      | 6400                | 784            | 10^6         | Classification, regression   | [34]
    3D optical field            | 120,000             | 131,044        | 10^10        | Natural language processing  | 3D-PELM
    Table 1. Maximum Network Capacity of Current Photonic Neuromorphic Computing Hardware for Supervised Learning