基于异构光子神经网络的多模态特征融合

郑一臻; 戴键; 张天; 徐坤

doi:10.37188/CO.2023-0036

基于异构光子神经网络的多模态特征融合

doi: 10.37188/CO.2023-0036

北京邮电大学信息光子学与光通信国家重点实验室, 北京 100876

基金项目: 国家自然科学基金资助（No. 62171055，No. 61705015，No. 61625104，No. 61821001，No. 62135009，No. 61971065）；国家重点研发计划资助（No. 2019YFB1803504）；信息光子学与光通信国家重点实验室（北京邮电大学）基金资助（No. IPOC2020ZT08，No. IPOC2020ZT03）

详细信息

作者简介:
郑一臻（1996—），男，福建漳州人，硕士研究生，2019年于福建师范大学获得学士学位，主要从事智能光计算等方面研究。E-mail：2020111757@bupt.edu.cn

戴　键（1987—），男，安徽合肥人，北京邮电大学电子工程学院副教授，博士生导师，主要从事微波光子学、集成光子学等方面的研究。E-mail：daijian@bupt.edu.cn

张　天（1988—），女，湖北孝感人，北京邮电大学电子工程学院副教授，博士生导师，主要从事智能光计算、光子器件智能设计与优化、微纳光子学等方面的研究。E-mail：ztian@bupt.edu.cn

徐　坤（1973—），男，湖南人，北京邮电大学电子工程学院教授，博士生导师，主要从事信息光子学等方面的研究。E-mail：xukun@bupt.edu.cn

中图分类号: TP183
计量
- 文章访问数: 541
- HTML全文浏览量: 159
- PDF下载量: 178
- 被引次数: 0
出版历程
- 收稿日期: 2023-03-01
- 修回日期: 2023-04-04
- 网络出版日期: 2023-07-11

Multimodal feature fusion based on heterogeneous optical neural networks

State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing 100876, China

Funds: Supported by the National Natural Science Foundation of China (No. 62171055, No. 61705015, No. 61625104, No. 61821001, No. 62135009, No. 61971065); National Key Research and Development Program (No. 2019YFB1803504); the State Key Laboratory of Information Photonics and Optical Communications (Beijing University of Posts and Telecommunications) (No. IPOC2020ZT08, No. IPOC2020ZT03)

More Information

Corresponding author: ztian@bupt.edu.cn

摘要

摘要:
当前光子神经网络的研究主要集中在单一模态网络的性能提升上，而缺少对多模态信息处理的研究。与单一模态网络相比，多模态学习可以利用不同模态信息之间的互补性，因此，多模态学习可以使得模型学习到的表示更加完备。本文提出了将光子神经网络和多模态融合技术相结合的方法。首先，利用光子卷积神经网络和光子人工神经网络相结合构建异构光子神经网络，并通过异构光子神经网络处理多模态数据。其次，在融合阶段通过引入注意力机制提升融合效果，最终提高任务分类的准确率。在多模态手写数字数据集分类任务上，使用拼接方法融合的异构光子神经网络的分类准确率为95.75%；引入注意力机制融合的异构光子神经网络的分类准确率为98.31%，并且优于当前众多先进单一模态的光子神经网络。结果显示：与电子异构神经网络相比，该模型训练速度提升了1.7倍。与单一模态的光子神经网络模型相比，异构光子神经网络可以使得模型学习到的表示更加完备，从而有效地提高多模态手写数字数据集分类的准确率。
- 光子神经网络 /
- 多模态 /
- 注意力机制
Abstract:
Current study on photonic neural networks mainly focuses on improving the performance of single-modal networks, while study on multimodal information processing is lacking. Compared with single-modal networks, multimodal learning utilizes complementary information between modalities. Therefore, multimodal learning can make the representation learned by the model more complete. In this paper, we propose a method that combines photonic neural networks and multimodal fusion techniques. First, a heterogeneous photonic neural network is constructed by combining a photonic convolutional neural network and a photonic artificial neural network, and multimodal data are processed by the heterogeneous photonic neural network. Second, the fusion performance is enhanced by introducing attention mechanism in the fusion stage. Ultimately, the accuracy of task classification is improved. In the MNIST dataset of handwritten digits classification task, the classification accuracy of the heterogeneous photonic neural network fused by the splicing method is 95.75%; the heterogeneous photonic neural network fused by introducing the attention mechanism is classified with an accuracy of 98.31%, which is better than many current advanced single-modal photonic neural networks. Compared with the electronic heterogeneous neural network, the training speed of the model is improved by 1.7 times; compared with the single-modality photonic neural network model, the heterogeneous photonic neural network can make the representation learned by the model more complete, thus effectively improving the classification accuracy of MNIST dataset of handwritten digits.
- photonic neural network /
- multimodal /
- attention mechanism

HTML全文

图 1 异构光子神经网络的结构示意图

Figure 1. Schematic diagram of the structure of the heterogeneous photonic neural network

下载: 全尺寸图片幻灯片

图 2 光学卷积结果

Figure 2. Optical convolution results

下载: 全尺寸图片幻灯片

图 3 （a）AbsSquared非线性激活函数结构及（b）其测试结果

Figure 3. (a) AbsSquared nonlinear activation function structure and (b) the test results

下载: 全尺寸图片幻灯片

图 4 端口输出光功率波形图

Figure 4. Port output optical power waveform

下载: 全尺寸图片幻灯片

图 5 学习率和优化器的选择

Figure 5. Learning rate and optimizer selection

下载: 全尺寸图片幻灯片

图 6 空间注意力模块

Figure 6. Spatial attention module

下载: 全尺寸图片幻灯片

图 7 基于注意力机制的异构光子神经网络结构示意图

Figure 7. Schematic diagram of heterogeneous photonic neural network structure based on attention mechanism

下载: 全尺寸图片幻灯片

图 8 基于注意力机制的异构光子神经网络的学习率和优化器的选择

Figure 8. Learning rate and optimizer selection for heterogeneous photonic neural networks based on attentionmechanism

下载: 全尺寸图片幻灯片

图 9 随机高斯噪声对训练集准确率的影响

Figure 9. The effect of random Gaussian noise on the accuracy of the training set

下载: 全尺寸图片幻灯片

表 1 拼接融合的异构电子神经网络训练各部分时间占比

Table 1. Time share of each part of training for heterogeneous electronic neural networks with splicing and fusion

	正向传播	反向传播	参数更新时间	总时间
时间/s	6.39	7.58	1.19	15.16
占比/%	42.14	50.00	7.86	100.00

下载: 导出CSV

表 2 基于注意力机制融合的异构电子神经网络训练各部分时间占比

Table 2. Time share of each part of training of heterogeneous electronic neural networks based on the fusion of attention mechanisms

	正向传播	反向传播	参数更新时间	总时间
时间/s	6.53	8.18	1.09	15.80
占比/%	41.33	51.75	6.92	100.00

下载: 导出CSV

表 3 先进方法分类结果对比表

Table 3. Comparison of classification results of advanced methods

文献	准确率（%）	文献	准确率（%）
文献[25]	97.18	文献[29]	97.37
文献[26]	92.51	文献[30]	98.10
文献[27]	96.10	文献[31]	98.28
文献[28]	96.00	文献[32]	98.75
基于简单拼接融合的方法	95.75	基于简单拼接融合的方法	95.75
基于注意力机制融合的方法	98.31	基于注意力机制融合的方法	98.31

下载: 导出CSV

map

参考文献(33)

[1]	王惠琴, 侯文斌, 黄瑞, 等. 基于深度学习的空间脉冲位置调制多分类检测器[J]. 中国光学,2023,16(2):415-424. doi: 10.37188/CO.2022-0106 WANG H Q, HOU W B, HUANG R, et al. Spatial pulse position modulation multi-classification detector based on deep learning[J]. Chinese Optics, 2023, 16(2): 415-424. (in Chinese) doi: 10.37188/CO.2022-0106
[2]	姜林奇, 宁春玉, 余海涛. 基于多尺度特征与通道特征融合的脑肿瘤良恶性分类模型[J]. 中国光学,2022,15(6):1339-1349. doi: 10.37188/CO.2022-0067 JIANG L Q, NING CH Y, YU H T, et al. Classification model based on fusion of multi-scale feature and channel feature for benign and malignant brain tumors[J]. Chinese Optics, 2022, 15(6): 1339-1349. (in Chinese) doi: 10.37188/CO.2022-0067
[3]	李冠楠, 石俊凯, 陈晓梅, 等. 基于机器学习的过焦扫描显微测量方法研究[J]. 中国光学,2022,15(4):703-711. doi: 10.37188/CO.2022-0009 LI G N, SHI J K, CHEN X M, et al. Through-focus scanning optical microscopy measurement based on machine learning[J]. Chinese Optics, 2022, 15(4): 703-711. (in Chinese) doi: 10.37188/CO.2022-0009
[4]	肖树林, 胡长虹, 高路尧, 等. 像元映射变分辨率光谱成像重构[J]. 中国光学,2022,15(5):1045-1054. doi: 10.37188/CO.2022-0108 XIAO SH L, HU CH H, GAO L Y, et al. Pixel mapping variable-resolution spectral imaging reconstruction[J]. Chinese Optics, 2022, 15(5): 1045-1054. (in Chinese) doi: 10.37188/CO.2022-0108
[5]	MARKRAM H, MULLER E, RAMASWAMY S, et al. Reconstruction and simulation of neocortical microcircuitry[J]. Cell, 2015, 163(2): 456-492. doi: 10.1016/j.cell.2015.09.029
[6]	GOODMAN J W, DIAS A R, WOODY L M. Fully parallel, high-speed incoherent optical method for performing discrete Fourier transforms[J]. Optics Letters, 1978, 2(1): 1-3. doi: 10.1364/OL.2.000001
[7]	RECK M, ZEILINGER A, BERNSTEIN H J, et al. Experimental realization of any discrete unitary operator[J]. Physical Review Letters, 1994, 73(1): 58-61. doi: 10.1103/PhysRevLett.73.58
[8]	CLEMENTS W R, HUMPHREYS P C, METCALF B J, et al. Optimal design for universal multiport interferometers[J]. Optica, 2016, 3(12): 1460-1465. doi: 10.1364/OPTICA.3.001460
[9]	SHEN Y CH, HARRIS N C, SKIRLO S, et al. Deep learning with coherent nanophotonic circuits[J]. Nature Photonics, 2017, 11(7): 441-446. doi: 10.1038/nphoton.2017.93
[10]	ZHANG T, WANG J, LIU Q, et al. Efficient spectrum prediction and inverse design for plasmonic waveguide systems based on artificial neural networks[J]. Photonics Research, 2019, 7(3): 368-380. doi: 10.1364/PRJ.7.000368
[11]	BAGHERIAN H, SKIRLO S, SHEN Y CH, et al. On-chip optical convolutional neural networks[J]. arXiv:, 1808, 03303: 2018.
[12]	QU Y R, ZHU H ZH, SHEN Y CH, et al. Inverse design of an integrated-nanophotonics optical neural network[J]. Science Bulletin, 2020, 65(14): 1177-1183. doi: 10.1016/j.scib.2020.03.042
[13]	DAN Y H, FAN Z Y, SUN X J, et al. All-type optical logic gates using plasmonic coding metamaterials and multi-objective optimization[J]. Optics Express, 2022, 30(7): 11633-11646. doi: 10.1364/OE.449280
[14]	ZHANG CH, YANG Z CH, HE X D, et al. Multimodal intelligence: representation learning, information fusion, and applications[J]. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(3): 478-493. doi: 10.1109/JSTSP.2020.2987728
[15]	HUANG Y, DU CH ZH, XUE Z H, et al.. What makes multi-modal learning better than single (provably)[C]. 35th Conference on Neural Information Processing Systems, NeurIPS, 2021: 10944-10956.
[16]	PENG X K, WEI Y K, DENG A D, et al.. Balanced multimodal learning via on-the-fly gradient modulation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2022: 8228-8237.
[17]	RAMESH A, PAVLOV M, GOH G, et al.. Zero-shot text-to-image generation[C]. Proceedings of the 38th International Conference on Machine Learning, ICML, 2021: 8821-8831.
[18]	NAGRANI A, YANG SH, ARNAB A, et al.. Attention bottlenecks for multimodal fusion[C]. 35th Conference on Neural Information Processing Systems, NeurIPS, 2021: 14200-14213.
[19]	TROSTEN D J, LØKSE S, JENSSEN R, et al.. Reconsidering representation alignment for multi-view clustering[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2021: 1255-1265.
[20]	JIA CH, YANG Y F, XIA Y, et al.. Scaling up visual and vision-language representation learning with noisy text supervision[C]. Proceedings of the 38th International Conference on Machine Learning, ICML, 2021: 4904-4916.
[21]	ANASTASOPOULOS A, KUMAR S, LIAO H. Neural language modeling with visual features[J]. arXiv:, 1903, 02930: 2019.
[22]	VIELZEUF V, LECHERVY A, PATEUX S, et al.. Centralnet: a multilayer approach for multimodal fusion[C]. Proceedings of the European Conference on Computer Vision, Munich, 2019: 575-589.
[23]	ZHANG H, GU M, JIANG X D, et al. An optical neural chip for implementing complex-valued neural network[J]. Nature Communications, 2021, 12(1): 457. doi: 10.1038/s41467-020-20719-7
[24]	WOO S, PARK J, LEE J Y, et al.. CBAM: convolutional block attention module[C]. Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, 2018: 3-19.
[25]	LIN X, RIVENSON Y, YARDIMCI N T, et al. All-optical machine learning using diffractive deep neural networks[J]. Science, 2018, 361(6406): 1004-1008. doi: 10.1126/science.aat8084
[26]	WU Q H, SUI X B, FEI Y H, et al. Multi-layer optical Fourier neural network based on the convolution theorem[J]. AIP Advances, 2021, 11(5): 055012. doi: 10.1063/5.0055446
[27]	FELDMANN J, YOUNGBLOOD N, KARPOV M, et al. Parallel convolutional processing using an integrated photonic tensor core[J]. Nature, 2021, 589(7840): 52-58. doi: 10.1038/s41586-020-03070-1
[28]	ZHANG D N, ZHANG Y J, ZHANG Y, et al. Training and inference of optical neural networks with noise and low-bits control[J]. Applied Sciences, 2021, 11(8): 3692. doi: 10.3390/app11083692
[29]	KRIEGESKORTE N. Deep neural networks: a new framework for modeling biological vision and brain information processing[J]. Annual Review of Vision Science, 2015, 1: 417-446. doi: 10.1146/annurev-vision-082114-035447
[30]	GENG Y, HAN Z B, ZHANG CH Q, et al.. Uncertainty-aware multi-view representation learning[C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 7545-7553.
[31]	JIA X D, JING X Y, ZHU X K, et al. Semi-supervised multi-view deep discriminant representation learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(7): 2496-2509. doi: 10.1109/TPAMI.2020.2973634
[32]	HAN Z B, ZHANG CH Q, FU H ZH, et al. Trusted multi-view classification with dynamic evidential fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2551-2566. doi: 10.1109/TPAMI.2022.3171983
[33]	SHAO R, ZHANG G, GONG X. Generalized robust training scheme using genetic algorithm for optical neural networks with imprecise components[J]. Photonics Research, 2022, 10(8): 1868-1876. doi: 10.1364/PRJ.449570