Multimodal feature fusion based on heterogeneous optical neural networks

ZHENG Yi-zhen, DAI Jian, ZHANG Tian, XU Kun

Citation: ZHENG Yi-zhen, DAI Jian, ZHANG Tian, XU Kun. Multimodal feature fusion based on heterogeneous optical neural networks[J]. Chinese Optics, 2023, 16(6): 1343-1355. doi: 10.37188/CO.2023-0036

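    About the authors:

    ZHENG Yi-zhen (b. 1996), male, born in Zhangzhou, Fujian, is a master's student. He received his bachelor's degree from Fujian Normal University in 2019. His research focuses on intelligent optical computing. E-mail: 2020111757@bupt.edu.cn

    DAI Jian (b. 1987), male, born in Hefei, Anhui, is an associate professor and doctoral supervisor at the School of Electronic Engineering, Beijing University of Posts and Telecommunications. His research focuses on microwave photonics and integrated photonics. E-mail: daijian@bupt.edu.cn

    ZHANG Tian (b. 1988), female, born in Xiaogan, Hubei, is an associate professor and doctoral supervisor at the School of Electronic Engineering, Beijing University of Posts and Telecommunications. Her research focuses on intelligent optical computing, intelligent design and optimization of photonic devices, and micro/nano photonics. E-mail: ztian@bupt.edu.cn

    XU Kun (b. 1973), male, born in Hunan, is a professor and doctoral supervisor at the School of Electronic Engineering, Beijing University of Posts and Telecommunications. His research focuses on information photonics. E-mail: xukun@bupt.edu.cn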

  • CLC number: TP183

Funds: Supported by the National Natural Science Foundation of China (No. 62171055, No. 61705015, No. 61625104, No. 61821001, No. 62135009, No. 61971065); National Key Research and Development Program (No. 2019YFB1803504); the State Key Laboratory of Information Photonics and Optical Communications (Beijing University of Posts and Telecommunications) (No. IPOC2020ZT08, No. IPOC2020ZT03)
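  • Abstract:

    Current research on photonic neural networks focuses mainly on improving the performance of single-modality networks, while multimodal information processing has received little attention. Compared with single-modality networks, multimodal learning can exploit the complementarity between information from different modalities, so the representations a model learns are more complete. This paper proposes a method that combines photonic neural networks with multimodal fusion. First, a heterogeneous photonic neural network is built by combining a photonic convolutional neural network with a photonic artificial neural network, and this network is used to process multimodal data. Second, an attention mechanism is introduced at the fusion stage to improve the fusion and, ultimately, the classification accuracy. On a multimodal handwritten-digit classification task, the heterogeneous photonic neural network with concatenation fusion reaches a classification accuracy of 95.75%; with attention-based fusion it reaches 98.31%, outperforming many current state-of-the-art single-modality photonic neural networks. The results show that, compared with the heterogeneous electronic neural network, the model's training speed improves by a factor of 1.7. Compared with single-modality photonic neural network models, the heterogeneous photonic neural network learns more complete representations and thereby effectively improves classification accuracy on the multimodal handwritten-digit dataset.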
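    The two fusion strategies named in the abstract can be illustrated with a minimal electronic sketch. It is not the authors' implementation: it assumes the attention mechanism is a CBAM-style spatial attention map (channel-wise mean and max pooling, a small convolution, then a sigmoid) applied to the convolutional branch before concatenation, and every name, shape, and kernel value below is hypothetical.

```python
import math
from typing import List

Map2D = List[List[float]]      # H x W
FeatureMap = List[Map2D]       # C x H x W


def concat_fusion(a: List[float], b: List[float]) -> List[float]:
    """Concatenation fusion: join two modality feature vectors end to end."""
    return list(a) + list(b)


def spatial_attention(x: FeatureMap, kernel: List[Map2D], bias: float = 0.0) -> Map2D:
    """CBAM-style spatial attention map for a C x H x W feature map.

    kernel has shape 2 x k x k: one slice for the channel-mean map and one
    for the channel-max map. Its values are hypothetical, not trained.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0])
    pad = k // 2
    # Pool across channels at every pixel: a mean map and a max map.
    mean_map = [[sum(x[ch][i][j] for ch in range(c)) / c for j in range(w)]
                for i in range(h)]
    max_map = [[max(x[ch][i][j] for ch in range(c)) for j in range(w)]
               for i in range(h)]
    pooled = [mean_map, max_map]
    att = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = bias
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding
                            s += kernel[m][di][dj] * pooled[m][ii][jj]
            att[i][j] = 1.0 / (1.0 + math.exp(-s))       # sigmoid
    return att


def apply_attention(x: FeatureMap, att: Map2D) -> FeatureMap:
    """Rescale every channel of x by the same spatial attention map."""
    return [[[v * a for v, a in zip(row, arow)] for row, arow in zip(ch, att)]
            for ch in x]


def flatten(x: FeatureMap) -> List[float]:
    """Flatten a C x H x W feature map into one vector."""
    return [v for ch in x for row in ch for v in row]


# Toy inputs: a 2 x 2 x 2 feature map standing in for the convolutional
# branch and a 3-element vector standing in for the fully connected branch.
conv_features = [[[1.0, 0.0], [0.0, 2.0]], [[0.5, 0.5], [1.0, 0.0]]]
dense_features = [0.1, 0.2, 0.3]
# Hypothetical 3 x 3 kernels that pass only the centre pixel through.
kernel = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
          [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]

plain = concat_fusion(flatten(conv_features), dense_features)
att = spatial_attention(conv_features, kernel)
attended = concat_fusion(flatten(apply_attention(conv_features, att)), dense_features)
```

    In both cases the fused vector has the same length; the attention variant only rescales the convolutional features pixel by pixel before they are joined with the other modality, which is one common way an attention module can reweight what a classifier sees.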

     

  • Figure 1.  Schematic diagram of the structure of the heterogeneous photonic neural network

    Figure 2.  Optical convolution results

    Figure 3.  (a) Structure of the AbsSquared nonlinear activation function and (b) its test results

    Figure 4.  Optical power waveforms at the output ports

    Figure 5.  Learning rate and optimizer selection

    Figure 6.  Spatial attention module

    Figure 7.  Schematic diagram of the structure of the heterogeneous photonic neural network based on the attention mechanism

    Figure 8.  Learning rate and optimizer selection for the heterogeneous photonic neural network based on the attention mechanism

    Figure 9.  Effect of random Gaussian noise on training-set accuracy

    Table 1.  Time share of each training stage for the heterogeneous electronic neural network with concatenation fusion

                 Forward propagation   Backward propagation   Parameter update   Total
    Time/s       6.39                  7.58                   1.19               15.16
    Share/%      42.14                 50.00                  7.86               100.00

    Table 2.  Time share of each training stage for the heterogeneous electronic neural network with attention-based fusion

                 Forward propagation   Backward propagation   Parameter update   Total
    Time/s       6.53                  8.18                   1.09               15.80
    Share/%      41.33                 51.75                  6.92               100.00
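    The stage times in Tables 1 and 2 can be cross-checked with a few lines of arithmetic. The sketch below recomputes the percentage shares and also evaluates an Amdahl-style bound: the overall speedup if forward propagation took no time at all. That second calculation is a hypothetical reading only; this page does not say how the 1.7-fold training speedup quoted in the abstract was obtained, and the function names are illustrative.

```python
from typing import Dict

# Per-stage training times in seconds, copied from Tables 1 and 2.
TABLE_1 = {"forward": 6.39, "backward": 7.58, "update": 1.19}
TABLE_2 = {"forward": 6.53, "backward": 8.18, "update": 1.09}


def shares(times: Dict[str, float]) -> Dict[str, float]:
    """Percentage of the total time spent in each stage."""
    total = sum(times.values())
    return {name: 100.0 * t / total for name, t in times.items()}


def speedup_if_stage_free(times: Dict[str, float], stage: str) -> float:
    """Amdahl-style bound: overall speedup if one stage cost zero time."""
    total = sum(times.values())
    return total / (total - times[stage])


for label, table in (("Table 1", TABLE_1), ("Table 2", TABLE_2)):
    pct = {name: round(value, 2) for name, value in shares(table).items()}
    bound = round(speedup_if_stage_free(table, "forward"), 2)
    print(label, pct, "speedup bound with free forward pass:", bound)
```

    The recomputed shares agree with the tabulated ones to within a few hundredths of a percentage point, and under the stated assumption both bounds come out close to 1.7.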

    Table 3.  Comparison of classification results with state-of-the-art methods

    Method                                      Accuracy (%)
    Ref. [25]                                   97.18
    Ref. [26]                                   92.51
    Ref. [27]                                   96.10
    Ref. [28]                                   96.00
    Ref. [29]                                   97.37
    Ref. [30]                                   98.10
    Ref. [31]                                   98.28
    Ref. [32]                                   98.75
    Simple concatenation fusion (this work)     95.75
    Attention-based fusion (this work)          98.31
  • [1] WANG H Q, HOU W B, HUANG R, et al. Spatial pulse position modulation multi-classification detector based on deep learning[J]. Chinese Optics, 2023, 16(2): 415-424. (in Chinese) doi: 10.37188/CO.2022-0106
    [2] JIANG L Q, NING CH Y, YU H T. Classification model based on fusion of multi-scale feature and channel feature for benign and malignant brain tumors[J]. Chinese Optics, 2022, 15(6): 1339-1349. (in Chinese) doi: 10.37188/CO.2022-0067
    [3] LI G N, SHI J K, CHEN X M, et al. Through-focus scanning optical microscopy measurement based on machine learning[J]. Chinese Optics, 2022, 15(4): 703-711. (in Chinese) doi: 10.37188/CO.2022-0009
    [4] XIAO SH L, HU CH H, GAO L Y, et al. Pixel mapping variable-resolution spectral imaging reconstruction[J]. Chinese Optics, 2022, 15(5): 1045-1054. (in Chinese) doi: 10.37188/CO.2022-0108
    [5] MARKRAM H, MULLER E, RAMASWAMY S, et al. Reconstruction and simulation of neocortical microcircuitry[J]. Cell, 2015, 163(2): 456-492. doi: 10.1016/j.cell.2015.09.029
    [6] GOODMAN J W, DIAS A R, WOODY L M. Fully parallel, high-speed incoherent optical method for performing discrete Fourier transforms[J]. Optics Letters, 1978, 2(1): 1-3. doi: 10.1364/OL.2.000001
    [7] RECK M, ZEILINGER A, BERNSTEIN H J, et al. Experimental realization of any discrete unitary operator[J]. Physical Review Letters, 1994, 73(1): 58-61. doi: 10.1103/PhysRevLett.73.58
    [8] CLEMENTS W R, HUMPHREYS P C, METCALF B J, et al. Optimal design for universal multiport interferometers[J]. Optica, 2016, 3(12): 1460-1465. doi: 10.1364/OPTICA.3.001460
    [9] SHEN Y CH, HARRIS N C, SKIRLO S, et al. Deep learning with coherent nanophotonic circuits[J]. Nature Photonics, 2017, 11(7): 441-446. doi: 10.1038/nphoton.2017.93
    [10] ZHANG T, WANG J, LIU Q, et al. Efficient spectrum prediction and inverse design for plasmonic waveguide systems based on artificial neural networks[J]. Photonics Research, 2019, 7(3): 368-380. doi: 10.1364/PRJ.7.000368
    [11] BAGHERIAN H, SKIRLO S, SHEN Y CH, et al. On-chip optical convolutional neural networks[J]. arXiv: 1808.03303, 2018.
    [12] QU Y R, ZHU H ZH, SHEN Y CH, et al. Inverse design of an integrated-nanophotonics optical neural network[J]. Science Bulletin, 2020, 65(14): 1177-1183. doi: 10.1016/j.scib.2020.03.042
    [13] DAN Y H, FAN Z Y, SUN X J, et al. All-type optical logic gates using plasmonic coding metamaterials and multi-objective optimization[J]. Optics Express, 2022, 30(7): 11633-11646. doi: 10.1364/OE.449280
    [14] ZHANG CH, YANG Z CH, HE X D, et al. Multimodal intelligence: representation learning, information fusion, and applications[J]. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(3): 478-493. doi: 10.1109/JSTSP.2020.2987728
    [15] HUANG Y, DU CH ZH, XUE Z H, et al. What makes multi-modal learning better than single (provably)[C]. 35th Conference on Neural Information Processing Systems, NeurIPS, 2021: 10944-10956.
    [16] PENG X K, WEI Y K, DENG A D, et al. Balanced multimodal learning via on-the-fly gradient modulation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2022: 8228-8237.
    [17] RAMESH A, PAVLOV M, GOH G, et al. Zero-shot text-to-image generation[C]. Proceedings of the 38th International Conference on Machine Learning, ICML, 2021: 8821-8831.
    [18] NAGRANI A, YANG SH, ARNAB A, et al. Attention bottlenecks for multimodal fusion[C]. 35th Conference on Neural Information Processing Systems, NeurIPS, 2021: 14200-14213.
    [19] TROSTEN D J, LØKSE S, JENSSEN R, et al. Reconsidering representation alignment for multi-view clustering[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2021: 1255-1265.
    [20] JIA CH, YANG Y F, XIA Y, et al. Scaling up visual and vision-language representation learning with noisy text supervision[C]. Proceedings of the 38th International Conference on Machine Learning, ICML, 2021: 4904-4916.
    [21] ANASTASOPOULOS A, KUMAR S, LIAO H. Neural language modeling with visual features[J]. arXiv: 1903.02930, 2019.
    [22] VIELZEUF V, LECHERVY A, PATEUX S, et al. Centralnet: a multilayer approach for multimodal fusion[C]. Proceedings of the European Conference on Computer Vision, Munich, 2019: 575-589.
    [23] ZHANG H, GU M, JIANG X D, et al. An optical neural chip for implementing complex-valued neural network[J]. Nature Communications, 2021, 12(1): 457. doi: 10.1038/s41467-020-20719-7
    [24] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]. Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, 2018: 3-19.
    [25] LIN X, RIVENSON Y, YARDIMCI N T, et al. All-optical machine learning using diffractive deep neural networks[J]. Science, 2018, 361(6406): 1004-1008. doi: 10.1126/science.aat8084
    [26] WU Q H, SUI X B, FEI Y H, et al. Multi-layer optical Fourier neural network based on the convolution theorem[J]. AIP Advances, 2021, 11(5): 055012. doi: 10.1063/5.0055446
    [27] FELDMANN J, YOUNGBLOOD N, KARPOV M, et al. Parallel convolutional processing using an integrated photonic tensor core[J]. Nature, 2021, 589(7840): 52-58. doi: 10.1038/s41586-020-03070-1
    [28] ZHANG D N, ZHANG Y J, ZHANG Y, et al. Training and inference of optical neural networks with noise and low-bits control[J]. Applied Sciences, 2021, 11(8): 3692. doi: 10.3390/app11083692
    [29] KRIEGESKORTE N. Deep neural networks: a new framework for modeling biological vision and brain information processing[J]. Annual Review of Vision Science, 2015, 1: 417-446. doi: 10.1146/annurev-vision-082114-035447
    [30] GENG Y, HAN Z B, ZHANG CH Q, et al. Uncertainty-aware multi-view representation learning[C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 7545-7553.
    [31] JIA X D, JING X Y, ZHU X K, et al. Semi-supervised multi-view deep discriminant representation learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(7): 2496-2509. doi: 10.1109/TPAMI.2020.2973634
    [32] HAN Z B, ZHANG CH Q, FU H ZH, et al. Trusted multi-view classification with dynamic evidential fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2551-2566. doi: 10.1109/TPAMI.2022.3171983
    [33] SHAO R, ZHANG G, GONG X. Generalized robust training scheme using genetic algorithm for optical neural networks with imprecise components[J]. Photonics Research, 2022, 10(8): 1868-1876. doi: 10.1364/PRJ.449570
Publication history
  • Received:  2023-03-01
  • Revised:  2023-04-04
  • Published online:  2023-07-11
