一级黄色片免费播放|中国黄色视频播放片|日本三级a|可以直接考播黄片影视免费一级毛片

高級搜索

留言板

尊敬的讀者、作者、審稿人, 關(guān)于本刊的投稿、審稿、編輯和出版的任何問題, 您可以本頁添加留言。我們將盡快給您答復。謝謝您的支持!

姓名
郵箱
手機號碼
標題
留言內(nèi)容
驗證碼

魔方派:面向全同態(tài)加密的存算模運算加速器設(shè)計

李冰 劉懷駿 張偉功

李冰, 劉懷駿, 張偉功. 魔方派:面向全同態(tài)加密的存算模運算加速器設(shè)計[J]. 電子與信息學報, 2023, 45(9): 3302-3310. doi: 10.11999/JEIT230349
引用本文: 李冰, 劉懷駿, 張偉功. 魔方派:面向全同態(tài)加密的存算模運算加速器設(shè)計[J]. 電子與信息學報, 2023, 45(9): 3302-3310. doi: 10.11999/JEIT230349
LI Bing, LIU Huaijun, ZHANG Weigong. M2PI: Processing-in-Memory Modular Computing Accelerator for Full Homomorphic Encryption[J]. Journal of Electronics & Information Technology, 2023, 45(9): 3302-3310. doi: 10.11999/JEIT230349
Citation: LI Bing, LIU Huaijun, ZHANG Weigong. M2PI: Processing-in-Memory Modular Computing Accelerator for Full Homomorphic Encryption[J]. Journal of Electronics & Information Technology, 2023, 45(9): 3302-3310. doi: 10.11999/JEIT230349

魔方派:面向全同態(tài)加密的存算模運算加速器設(shè)計

doi: 10.11999/JEIT230349 cstr: 32379.14.JEIT230349
基金項目: 國家自然基金項目(62204164)
詳細信息
    作者簡介:

    李冰:女,博士,副研究員,研究方向為計算機系統(tǒng)結(jié)構(gòu)、存內(nèi)計算架構(gòu)芯片設(shè)計

    劉懷駿:男,碩士生,研究方向為計算機體系結(jié)構(gòu)、存內(nèi)計算、憶阻器

    張偉功:男,博士,研究員,研究方向為計算機體系結(jié)構(gòu)、高可靠嵌入式系統(tǒng)設(shè)計

    通訊作者:

    李冰 bing.li@cnu.edu.cn

  • 中圖分類號: TN401; TP391

M2PI: Processing-in-Memory Modular Computing Accelerator for Full Homomorphic Encryption

Funds: The National Natural Science Foundation of China (62204164)
  • 摘要: 全同態(tài)加密(FHE)由于其可以實現(xiàn)隱私數(shù)據(jù)的計算,大大提高了數(shù)據(jù)的安全性而在醫(yī)療診斷、云計算、機器學習等領(lǐng)域取得了廣泛的關(guān)注。但是全同態(tài)密碼高昂的計算代價阻礙了其廣泛應(yīng)用。即使經(jīng)過算法和軟件設(shè)計優(yōu)化,F(xiàn)HE全同態(tài)加密中一個整數(shù)明文的密文數(shù)據(jù)規(guī)??梢赃_到56 MByte,端側(cè)生成的密鑰最大都會達到11 k Byte。密文以及密鑰數(shù)據(jù)規(guī)模過大引起嚴重的計算和訪存瓶頸。存內(nèi)計算(PIM)是一個解決該問題的有效方案,其完全消除了內(nèi)存墻的延遲和功耗問題,在端側(cè)計算大規(guī)模數(shù)據(jù)時更具優(yōu)勢。利用存內(nèi)計算加速全同態(tài)計算的工作已經(jīng)被廣泛研究,但是全同態(tài)加密端側(cè)的執(zhí)行過程由于耗時的模運算也面臨著執(zhí)行時間的瓶頸。該文分析了BFV方案加密、解密、密鑰生成操作中各個關(guān)鍵算子的計算開銷,發(fā)現(xiàn)模計算的計算開銷平均占比達到了41%,延遲占比中訪存占97%,因此,該文提出一個名為魔方派(M2PI)的基于靜態(tài)隨機存取存儲器(SRAM)存內(nèi)計算的模運算加速器設(shè)計。實驗結(jié)果表明,該文所提加速器相比CPU中模計算有1.77倍的計算速度提升以及32.76倍能量的節(jié)省。
  • 圖  1  全同態(tài)加密完整過程

    圖  2  不同參數(shù)規(guī)模下各階段算子執(zhí)行時間對比

    圖  3  優(yōu)化前后模計算延遲對比

    圖  4  巴雷特模運算中計算和訪存時間對比

    圖  5  用于模計算的 PIM 陣列

    圖  6  按位邏輯計算

    圖  7  多比特邏輯運算

    圖  8  PIM 中實現(xiàn)計算

    圖  10  優(yōu)化前后各計算過程執(zhí)行速度對比

    圖  9  PIM 中執(zhí)行模運算與之前結(jié)果對比

    圖  11  優(yōu)化前后 4 個計算過程執(zhí)行能量節(jié)省

    算法 1 巴雷特模算法
     計算:$r = c{\text{ mod }}q$ (qn 位數(shù),2n)
     準備:模數(shù)q, 被模數(shù)c
     (1) $mu = {2^{2n}}/q$
     (2) ${t_1} = c > > n - 1$
     (3) ${t_2} = {t_1} \times mu$
     (4) ${t_3} = {t_2} > > n + 1$
     (5) $ {r_1} = c{\text{ }}\% {\text{ }}{2^{n + 1}} $
     (6) ${r_2} = ({t_3} \times q){\text{ \% }}{{\text{2}}^{n + 1}}$
     (7) $r = {r_1} - {r_2}$
     判斷:$r > q/2?(r - q):r$
     返回 r
    下載: 導出CSV
    算法 2 PIM中執(zhí)行巴雷特模算法
     計算:$r = c{\text{ mod }}q$ (qn bit數(shù))
     (1) $x = c > > n - 1$
     (2) $a = x < < n - 1$
     (3) $r = c - a$
     判斷:$r > q/2?(r - q):r$
     返回 r
    下載: 導出CSV
    算法 3 PIM 中實現(xiàn)減法
     輸入:c(被減數(shù)),$na$(${{a}}$對應(yīng)的補碼)
     輸出:${\text{Sub}}$(為最終減法結(jié)果)
     (1) $t = (((c + na)' + c)' + ((c + na)' + na)')'$
     (2) $b = ((t + b)' + (c + na)')'$
     (3) ${\text{Sub}} = ((t + (t + b)')' + (b + (t + b)')')'$
    下載: 導出CSV

    表  1  實驗參數(shù)設(shè)置[11,21]

    參數(shù)
    晶體管工藝65 nm
    單元類型10T-SRAM
    電源電壓(Vdd)1.1 V
    閾值電壓(Vth)166.3941e-3 V
    讀取能量293.3 fJ
    寫入能量316.3 fJ
    陣列大小128×512
    時鐘頻率100 MHz
    下載: 導出CSV
  • [1] RIVEST R L, SHAMIR A, and ADLEMAN L. A method for obtaining digital signatures and public-key cryptosystems[J]. Communications of the ACM, 1978, 21(2): 120–126. doi: 10.1145/359340.359342
    [2] GENTRY C. Fully homomorphic encryption using ideal lattices[C]. The Forty-First Annual ACM Symposium on Theory of Computing, Bethesda, USA, 2009: 169–178.
    [3] FAN Junfeng and VERCAUTEREN F. Somewhat practical fully homomorphic encryption[R]. Paper 2012/144, 2012.
    [4] MARCOLLA C, SUCASAS V, MANZANO M, et al. Survey on fully homomorphic encryption, theory, and applications[J]. Proceedings of the IEEE, 2022, 110(10): 1572–1609. doi: 10.1109/JPROC.2022.3205665
    [5] KIM S, KIM J, KIM M J, et al. BTS: An accelerator for bootstrappable fully homomorphic encryption[C]. The 49th Annual International Symposium on Computer Architecture, New York, USA, 2022: 711–725.
    [6] NEJATOLLAHI H, GUPTA S, IMANI M, et al. CryptoPIM: In-memory acceleration for lattice-based cryptographic hardware[C]. 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, USA, 2020: 1–6.
    [7] SAMARDZIC N, FELDMANN A, KRASTEV A, et al. F1: A fast and programmable accelerator for fully homomorphic encryption[C/OL]. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Greece, 2021: 238–252.
    [8] AHN J, HONG S, YOO S, et al. A scalable processing-in-memory accelerator for parallel graph processing[C]. The 42nd Annual International Symposium on Computer Architecture, Portland, USA, 2015: 105–117.
    [9] AGA S, JELOKA S, SUBRAMANIYAN A, et al. Compute caches[C]. 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, USA, 2017: 481–492.
    [10] CAO Zhengjun, WEI Ruizhong, and LIN Xiaodong. A fast modular reduction method[R]. Paper 2014/040, 2014.
    [11] CHEN Paiyu, PENG Xiaochen, and YU Shimeng. NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures[C]. 2017 IEEE International Electron Devices Meeting (IEDM), San Francisco, USA, 2017: 6.1. 1–6.1. 4.
    [12] HE Yuquan, QU Songyun, LIN Gangliang, et al. Processing-in-SRAM acceleration for ultra-low power visual 3D perception[C]. The 59th ACM/IEEE Design Automation Conference, San Francisco, USA, 2022: 295–300.
    [13] LI Dai, PAKALA A, and YANG Kaiyuan. MeNTT: A compact and efficient processing-in-memory number theoretic transform (NTT) accelerator[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022, 30(5): 579–588. doi: 10.1109/TVLSI.2022.3151321
    [14] REIS D, TAKESHITA J, JUNG T, et al. Computing-in-memory for performance and energy-efficient homomorphic encryption[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020, 28(11): 2300–2313. doi: 10.1109/TVLSI.2020.3017595
    [15] GUPTA S, CAMMAROTA R, and ROSING T Š. MemFHE: End-to-end computing with fully homomorphic encryption in memory[J]. ACM Transactions on Embedded Computing Systems, 2022, 30(5): 579−588.
    [16] Sarojaerabelli. py-FHE - A python library for fully homomorphic encryption[EB/OL]. https://github.com/sarojaerabelli/py-fhe, 2022.
    [17] ALBRECHT M, CHASE M, CHEN Hao, et al. Homomorphic encryption standard[M]. LAUTER K, DAI Wei, and LAINE K. Protecting Privacy Through Homomorphic Encryption. Cham: Springer, 2021: 31–62.
    [18] AKYEL K C, CHARLES H P, MOTTIN J, et al. DRC2: Dynamically reconfigurable computing circuit based on memory architecture[C]. 2016 IEEE International Conference on Rebooting Computing (ICRC), San Diego, USA, 2016: 1–8.
    [19] HASAN M, SAHA U K, HOSSAIN M S, et al. Low power design of a two bit mangitude comparator for high speed operation[C]. 2019 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 2019: 1–4.
    [20] TALATI N, GUPTA S, MANE P, et al. Logic design within memristive memories using memristor-aided loGIC (MAGIC)[J]. IEEE Transactions on Nanotechnology, 2016, 15(4): 635–650. doi: 10.1109/TNANO.2016.2570248
    [21] YANG Y, JEONG H, SONG S C, et al. Single bit-line 7T SRAM cell for near-threshold voltage operation with enhanced performance and energy in 14 nm FinFET technology[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2016, 63(7): 1023–1032. doi: 10.1109/TCSI.2016.2556118
  • 加載中
圖(11) / 表(4)
計量
  • 文章訪問數(shù):  859
  • HTML全文瀏覽量:  364
  • PDF下載量:  132
  • 被引次數(shù): 0
出版歷程
  • 收稿日期:  2023-04-27
  • 修回日期:  2023-08-18
  • 網(wǎng)絡(luò)出版日期:  2023-08-24
  • 刊出日期:  2023-09-27

目錄

    /

    返回文章
    返回