Citation: | ZHENG Yi-zhen, DAI Jian, ZHANG Tian, XU Kun. Multimodal feature fusion based on heterogeneous optical neural networks[J].Chinese Optics, 2023, 16(6): 1343-1355.doi:10.37188/CO.2023-0036 |
Current study on photonic neural networks mainly focuses on improving the performance of single-modal networks, while study on multimodal information processing is lacking. Compared with single-modal networks, multimodal learning utilizes complementary information between modalities. Therefore, multimodal learning can make the representation learned by the model more complete. In this paper, we propose a method that combines photonic neural networks and multimodal fusion techniques. First, a heterogeneous photonic neural network is constructed by combining a photonic convolutional neural network and a photonic artificial neural network, and multimodal data are processed by the heterogeneous photonic neural network. Second, the fusion performance is enhanced by introducing attention mechanism in the fusion stage. Ultimately, the accuracy of task classification is improved. In the MNIST dataset of handwritten digits classification task, the classification accuracy of the heterogeneous photonic neural network fused by the splicing method is 95.75%; the heterogeneous photonic neural network fused by introducing the attention mechanism is classified with an accuracy of 98.31%, which is better than many current advanced single-modal photonic neural networks. Compared with the electronic heterogeneous neural network, the training speed of the model is improved by 1.7 times; compared with the single-modality photonic neural network model, the heterogeneous photonic neural network can make the representation learned by the model more complete, thus effectively improving the classification accuracy of MNIST dataset of handwritten digits.
[1] |
王惠琴, 侯文斌, 黄瑞, 等. 基于深度学习的空间脉冲位置调制多分类检测器[J]. 中国光学,2023,16(2):415-424.
doi:10.37188/CO.2022-0106
WANG H Q, HOU W B, HUANG R,
et al. Spatial pulse position modulation multi-classification detector based on deep learning[J].
Chinese Optics, 2023, 16(2): 415-424. (in Chinese)
doi:10.37188/CO.2022-0106
|
[2] |
姜林奇, 宁春玉, 余海涛. 基于多尺度特征与通道特征融合的脑肿瘤良恶性分类模型[J]. 中国光学,2022,15(6):1339-1349.
doi:10.37188/CO.2022-0067
JIANG L Q, NING CH Y, YU H T,
et al. Classification model based on fusion of multi-scale feature and channel feature for benign and malignant brain tumors[J].
Chinese Optics, 2022, 15(6): 1339-1349. (in Chinese)
doi:10.37188/CO.2022-0067
|
[3] |
李冠楠, 石俊凯, 陈晓梅, 等. 基于机器学习的过焦扫描显微测量方法研究[J]. 中国光学,2022,15(4):703-711.
doi:10.37188/CO.2022-0009
LI G N, SHI J K, CHEN X M,
et al. Through-focus scanning optical microscopy measurement based on machine learning[J].
Chinese Optics, 2022, 15(4): 703-711. (in Chinese)
doi:10.37188/CO.2022-0009
|
[4] |
肖树林, 胡长虹, 高路尧, 等. 像元映射变分辨率光谱成像重构[J]. 中国光学,2022,15(5):1045-1054.
doi:10.37188/CO.2022-0108
XIAO SH L, HU CH H, GAO L Y,
et al. Pixel mapping variable-resolution spectral imaging reconstruction[J].
Chinese Optics, 2022, 15(5): 1045-1054. (in Chinese)
doi:10.37188/CO.2022-0108
|
[5] |
MARKRAM H, MULLER E, RAMASWAMY S,
et al. Reconstruction and simulation of neocortical microcircuitry[J].
Cell, 2015, 163(2): 456-492.
doi:10.1016/j.cell.2015.09.029
|
[6] |
GOODMAN J W, DIAS A R, WOODY L M. Fully parallel, high-speed incoherent optical method for performing discrete Fourier transforms[J].
Optics Letters, 1978, 2(1): 1-3.
doi:10.1364/OL.2.000001
|
[7] |
RECK M, ZEILINGER A, BERNSTEIN H J,
et al. Experimental realization of any discrete unitary operator[J].
Physical Review Letters, 1994, 73(1): 58-61.
doi:10.1103/PhysRevLett.73.58
|
[8] |
CLEMENTS W R, HUMPHREYS P C, METCALF B J,
et al. Optimal design for universal multiport interferometers[J].
Optica, 2016, 3(12): 1460-1465.
doi:10.1364/OPTICA.3.001460
|
[9] |
SHEN Y CH, HARRIS N C, SKIRLO S,
et al. Deep learning with coherent nanophotonic circuits[J].
Nature Photonics, 2017, 11(7): 441-446.
doi:10.1038/nphoton.2017.93
|
[10] |
ZHANG T, WANG J, LIU Q,
et al. Efficient spectrum prediction and inverse design for plasmonic waveguide systems based on artificial neural networks[J].
Photonics Research, 2019, 7(3): 368-380.
doi:10.1364/PRJ.7.000368
|
[11] |
BAGHERIAN H, SKIRLO S, SHEN Y CH,
et al. On-chip optical convolutional neural networks[J].
arXiv:, 1808, 03303: 2018.
|
[12] |
QU Y R, ZHU H ZH, SHEN Y CH,
et al. Inverse design of an integrated-nanophotonics optical neural network[J].
Science Bulletin, 2020, 65(14): 1177-1183.
doi:10.1016/j.scib.2020.03.042
|
[13] |
DAN Y H, FAN Z Y, SUN X J,
et al. All-type optical logic gates using plasmonic coding metamaterials and multi-objective optimization[J].
Optics Express, 2022, 30(7): 11633-11646.
doi:10.1364/OE.449280
|
[14] |
ZHANG CH, YANG Z CH, HE X D,
et al. Multimodal intelligence: representation learning, information fusion, and applications[J].
IEEE Journal of Selected Topics in Signal Processing, 2020, 14(3): 478-493.
doi:10.1109/JSTSP.2020.2987728
|
[15] |
HUANG Y, DU CH ZH, XUE Z H,
et al.. What makes multi-modal learning better than single (provably)[C].
35th Conference on Neural Information Processing Systems, NeurIPS, 2021: 10944-10956.
|
[16] |
PENG X K, WEI Y K, DENG A D,
et al.. Balanced multimodal learning via on-the-fly gradient modulation[C].
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2022: 8228-8237.
|
[17] |
RAMESH A, PAVLOV M, GOH G,
et al.. Zero-shot text-to-image generation[C].
Proceedings of the 38th International Conference on Machine Learning, ICML, 2021: 8821-8831.
|
[18] |
NAGRANI A, YANG SH, ARNAB A,
et al.. Attention bottlenecks for multimodal fusion[C].
35th Conference on Neural Information Processing Systems, NeurIPS, 2021: 14200-14213.
|
[19] |
TROSTEN D J, LØKSE S, JENSSEN R,
et al.. Reconsidering representation alignment for multi-view clustering[C].
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2021: 1255-1265.
|
[20] |
JIA CH, YANG Y F, XIA Y,
et al.. Scaling up visual and vision-language representation learning with noisy text supervision[C].
Proceedings of the 38th International Conference on Machine Learning, ICML, 2021: 4904-4916.
|
[21] |
ANASTASOPOULOS A, KUMAR S, LIAO H. Neural language modeling with visual features[J].
arXiv:, 1903, 02930: 2019.
|
[22] |
VIELZEUF V, LECHERVY A, PATEUX S,
et al.. Centralnet: a multilayer approach for multimodal fusion[C].
Proceedings of the European Conference on Computer Vision, Munich, 2019: 575-589.
|
[23] |
ZHANG H, GU M, JIANG X D,
et al. An optical neural chip for implementing complex-valued neural network[J].
Nature Communications, 2021, 12(1): 457.
doi:10.1038/s41467-020-20719-7
|
[24] |
WOO S, PARK J, LEE J Y,
et al.. CBAM: convolutional block attention module[C].
Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, 2018: 3-19.
|
[25] |
LIN X, RIVENSON Y, YARDIMCI N T,
et al. All-optical machine learning using diffractive deep neural networks[J].
Science, 2018, 361(6406): 1004-1008.
doi:10.1126/science.aat8084
|
[26] |
WU Q H, SUI X B, FEI Y H,
et al. Multi-layer optical Fourier neural network based on the convolution theorem[J].
AIP Advances, 2021, 11(5): 055012.
doi:10.1063/5.0055446
|
[27] |
FELDMANN J, YOUNGBLOOD N, KARPOV M,
et al. Parallel convolutional processing using an integrated photonic tensor core[J].
Nature, 2021, 589(7840): 52-58.
doi:10.1038/s41586-020-03070-1
|
[28] |
ZHANG D N, ZHANG Y J, ZHANG Y,
et al. Training and inference of optical neural networks with noise and low-bits control[J].
Applied Sciences, 2021, 11(8): 3692.
doi:10.3390/app11083692
|
[29] |
KRIEGESKORTE N. Deep neural networks: a new framework for modeling biological vision and brain information processing[J].
Annual Review of Vision Science, 2015, 1: 417-446.
doi:10.1146/annurev-vision-082114-035447
|
[30] |
GENG Y, HAN Z B, ZHANG CH Q,
et al.. Uncertainty-aware multi-view representation learning[C].
Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 7545-7553.
|
[31] |
JIA X D, JING X Y, ZHU X K,
et al. Semi-supervised multi-view deep discriminant representation learning[J].
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(7): 2496-2509.
doi:10.1109/TPAMI.2020.2973634
|
[32] |
HAN Z B, ZHANG CH Q, FU H ZH,
et al. Trusted multi-view classification with dynamic evidential fusion[J].
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2551-2566.
doi:10.1109/TPAMI.2022.3171983
|
[33] |
SHAO R, ZHANG G, GONG X. Generalized robust training scheme using genetic algorithm for optical neural networks with imprecise components[J].
Photonics Research, 2022, 10(8): 1868-1876.
doi:10.1364/PRJ.449570
|