Light Field Angular Reconstruction Based on Template Alignment and Multi-stage Feature Learning
doi: 10.11999/JEIT240481 cstr: 32379.14.JEIT240481
1. Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
2. School of Science and Technology, Ningbo University, Ningbo 315300, China
Abstract: Existing light field image angular reconstruction methods exploit the intrinsic spatial-angular information of light field images to perform angular reconstruction, but they cannot handle sub-aperture image reconstruction for different viewpoint layers within one model, which makes it difficult to meet the requirements of light field image scalable coding. To address this, treating the viewpoint layers as sparse templates, this paper proposes a light field image angular reconstruction method in which a single model handles different angular sparse templates. The different angular sparse templates are regarded as different representations of the micro-lens array image, and template alignment integrates the input viewpoint layers into a micro-lens array image. A multi-stage feature learning scheme, following a micro-lens-array-level to sub-aperture-level feature learning strategy, processes the different input sparse templates, and a dedicated training mode enables the network to stably reference different angular sparse templates and reconstruct sub-aperture images at arbitrary angular positions. Experimental results show that the proposed method can effectively reference different sparse templates and flexibly reconstruct sub-aperture images at arbitrary angular positions, and that the proposed template alignment and training scheme can be applied to other light field image super-resolution methods to improve their ability to handle different angular sparse templates.
Objective  By placing a micro-lens array between the main lens and the imaging sensor, a light field camera captures both the intensity and the direction of light in a scene. However, owing to the limited sensor size, dense spatial sampling comes at the cost of sparse angular sampling during light field imaging. Consequently, angular super-resolution reconstruction of Light Field Images (LFIs) is essential. Existing deep learning-based LFI angular super-resolution methods typically obtain dense LFIs in one of two ways. Direct generation approaches model the correlation between the spatial and angular information of the sparse LFI and then upsample along the angular dimension to reconstruct the light field. Indirect approaches instead generate intermediate outputs and reconstruct the LFI by operating on these outputs together with the inputs. LFI coding methods based on sparse sampling generally select a subset of Sub-Aperture Images (SAIs) for compression and transmission, and use angular super-resolution to reconstruct the LFI at the decoder. In LFI scalable coding, the SAIs are divided into multiple viewpoint layers; some layers are transmitted according to the bit-rate allocation, while the remaining layers are reconstructed at the decoder. Although existing deep learning-based angular super-resolution methods yield promising results, they lack flexibility and generality with respect to the number and positions of the reference SAIs. This limits their ability to reconstruct SAIs at arbitrary viewpoints and makes them unsuitable for LFI scalable coding. To address this, a Light Field Angular Reconstruction method based on Template Alignment and multi-stage Feature learning (LFAR-TAF) is proposed, which handles different angular sparse templates with a single network model.

Methods  The pipeline consists of template alignment, Micro-Lens Array Image (MLAI) feature extraction, sub-aperture-level feature fusion, feature mapping to the target angular position, and SAI synthesis at the target angular position. First, the viewpoint layers used in LFI scalable coding are treated as different representations of the MLAI, referred to as light field sparse templates. To minimize the discrepancies between these sparse templates and reduce the fitting burden on a single network model, bilinear interpolation is employed to align the templates and generate the corresponding MLAIs. The MLAI Feature Learning (MLAIFL) module then uses a Residual Dense Block (RDB) to extract preliminary features from the MLAIs, mitigating the differences introduced by bilinear interpolation. Because MLAI feature extraction may partially disrupt the angular consistency of the LFI, a conversion mechanism from MLAI features to sub-aperture features is devised, implemented as an SAI-level Feature fusion (SAIF) module: the MLAI features are reorganized along the channel dimension to align with the SAI dimension, and three 1×1 convolutions and two agent attention mechanisms perform progressive fusion, supported by residual connections to accelerate convergence.
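The MLAI-level to SAI-level processing rests on rearranging a stack of sub-aperture views into a micro-lens array (macro-pixel) image and back, with the sparse template first brought to the full angular resolution. The following PyTorch sketch illustrates one plausible form of these two steps; the [U, V, C, H, W] tensor layout, the function names, and the use of F.interpolate over the angular dimensions are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def sais_to_mlai(sais: torch.Tensor) -> torch.Tensor:
    """Rearrange a [U, V, C, H, W] stack of sub-aperture images into a
    micro-lens array image [C, H*U, W*V] (one U x V macro-pixel per spatial site)."""
    U, V, C, H, W = sais.shape
    x = sais.permute(2, 3, 0, 4, 1)              # [C, H, U, W, V]
    return x.reshape(C, H * U, W * V)

def mlai_to_sais(mlai: torch.Tensor, U: int, V: int) -> torch.Tensor:
    """Inverse of sais_to_mlai: [C, H*U, W*V] -> [U, V, C, H, W]."""
    C, HU, WV = mlai.shape
    H, W = HU // U, WV // V
    x = mlai.reshape(C, H, U, W, V)
    return x.permute(2, 4, 0, 1, 3)              # [U, V, C, H, W]

def align_template(sparse_sais: torch.Tensor, target_ang: int = 7) -> torch.Tensor:
    """Bilinearly upsample a sparse angular grid (e.g. 3x3 views) to a
    target_ang x target_ang grid and fold it into an MLAI.
    sparse_sais: [u, v, C, H, W] with u, v <= target_ang."""
    u, v, C, H, W = sparse_sais.shape
    # Treat (u, v) as the "spatial" axes of a temporary tensor so that
    # F.interpolate interpolates along the angular dimensions only.
    x = sparse_sais.permute(2, 3, 4, 0, 1).reshape(C * H * W, 1, u, v)
    x = F.interpolate(x, size=(target_ang, target_ang),
                      mode='bilinear', align_corners=True)
    x = x.reshape(C, H, W, target_ang, target_ang).permute(3, 4, 0, 1, 2)
    return sais_to_mlai(x)

# Example: align a 3x3 sparse template of 64x64 RGB views into a 7x7 MLAI.
dense_mlai = align_template(torch.rand(3, 3, 3, 64, 64), target_ang=7)
print(dense_mlai.shape)  # torch.Size([3, 448, 448])
```

For the 3×3 to 7×7 case, align_corners=True places the nine reference views exactly on angular positions {0, 3, 6}, so the interpolation only fills the missing angular positions between them.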
In the feature mapping module, the extracted SAI features are mapped to the target angular position according to the given target angular coordinates. Specifically, the target angular coordinates are expanded by a recombination operator to match the spatial dimensions of the SAI features and are concatenated with them. The concatenated target angular information is then fused with the light field features using a 1×1 convolution and an RDB, and the fused features are fed into two further RDBs to generate intermediate convolution kernel weights and a bias map. In the SAI synthesis module for the target angular position, the viewpoints common to the different sparse templates serve as reference SAIs for the indirect synthesis, which keeps the proposed approach stable. The reference SAIs are convolved with non-shared convolution kernels of the same size, and the preliminary results are combined with the generated bias map to synthesize a detail-enhanced SAI at the target angular position.
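Read together, the feature mapping and synthesis stages amount to coordinate-conditioned kernel prediction: the target angular coordinates are broadcast to spatial maps and concatenated with the features, and the predicted per-pixel kernels (one set per reference SAI, weights not shared across views) and a bias map are applied to the reference SAIs. The sketch below shows one way such predicted kernels and bias could be applied; the shapes, the 5×5 kernel size, and the unfold-based adaptive convolution are assumptions for illustration rather than the paper's exact operators.

```python
import torch
import torch.nn.functional as F

def expand_target_coords(feat: torch.Tensor, u: float, v: float) -> torch.Tensor:
    """Broadcast the target angular coordinates (u, v) to spatial maps and
    concatenate them with the fused light field features: [B, C, H, W] -> [B, C+2, H, W]."""
    B, _, H, W = feat.shape
    coord = feat.new_tensor([u, v]).view(1, 2, 1, 1).expand(B, 2, H, W)
    return torch.cat([feat, coord], dim=1)

def synthesize_target_sai(refs: torch.Tensor, kernels: torch.Tensor,
                          bias: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Apply predicted per-pixel kernels (not shared across reference views)
    to the reference SAIs and add the predicted bias map.
    refs:    [B, N, C, H, W]   reference sub-aperture images
    kernels: [B, N, k*k, H, W] per-view, per-pixel kernel weights
    bias:    [B, C, H, W]      bias/detail map
    returns: [B, C, H, W]      synthesized target SAI."""
    B, N, C, H, W = refs.shape
    out = refs.new_zeros(B, C, H, W)
    for n in range(N):
        # k x k neighborhoods of reference view n: [B, C, k*k, H, W]
        patches = F.unfold(refs[:, n], kernel_size=k, padding=k // 2)
        patches = patches.view(B, C, k * k, H, W)
        # Weight each neighborhood with this view's predicted kernel and accumulate.
        out = out + (patches * kernels[:, n].unsqueeze(1)).sum(dim=2)
    return out + bias

# Example with dummy predictions: five reference views of a 64x64 RGB scene.
refs = torch.rand(1, 5, 3, 64, 64)
kernels = torch.softmax(torch.rand(1, 5, 25, 64, 64), dim=2) / 5   # normalized dummy weights
bias = torch.zeros(1, 3, 64, 64)
print(synthesize_target_sai(refs, kernels, bias, k=5).shape)  # torch.Size([1, 3, 64, 64])
```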
Results and Discussions  The performance of LFAR-TAF is evaluated on two publicly available natural-scene LFI datasets: the STFLytro dataset and Kalantari et al.'s dataset. To keep the training and test sets disjoint, the same partitioning as in current state-of-the-art approaches for natural-scene LFIs is adopted: 100 natural-scene LFIs (100 Scenes) from Kalantari et al.'s dataset are used for training, while the test set consists of 30 LFIs (30 Scenes) from Kalantari et al.'s dataset together with 15 Reflective scenes and 25 Occlusion scenes from the STFLytro dataset. LFAR-TAF is compared with six angular super-resolution reconstruction methods (ShearedEPI, Yeung et al.'s method, LFASR-geo, FS-GAF, DistgASR, and IRVAE) using PSNR and SSIM as objective quality metrics for 3×3→7×7 angular reconstruction. Experimental results show that LFAR-TAF achieves the highest objective quality on all three test sets. Notably, after training on angular reconstruction from 3×3 to 7×7, the proposed method can reconstruct SAIs at any viewpoint from either five reference SAIs or a 3×3 set of reference SAIs. Subjective visual comparisons further show that LFAR-TAF effectively restores the color and texture details of the target SAI from the reference SAIs. Ablation experiments reveal that removing either the MLAIFL or the SAIF module lowers the objective quality on the three test sets, with the loss being more pronounced when the MLAIFL module is omitted. This highlights the importance of MLAI feature learning in modeling the spatial and angular correlations of LFIs, while the SAIF module strengthens the conversion from micro-lens array features to sub-aperture features. In addition, coding experiments assess the practical performance of the proposed method in LFI coding. Two angular sparse templates (five SAIs and 3×3 SAIs) are tested on four scenes from the EPFL dataset. The results show that encoding five SAIs achieves high coding efficiency at lower bit rates, whereas encoding the nine SAIs of the 3×3 sparse template performs better at higher bit rates. These findings suggest that, to improve the compression efficiency of LFI scalable coding, different sparse templates can be selected according to the bit rate, and that LFAR-TAF reconstructs from the various sparse templates stably after a single training run.

Conclusions  The proposed LFAR-TAF handles different sparse templates with a single network model, flexibly reconstructing SAIs at any viewpoint from reference SAIs of varying number and position, which is particularly beneficial for LFI scalable coding. Moreover, the designed training approach can be applied to other LFI angular super-resolution methods to enhance their ability to handle diverse sparse templates.
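The single-model flexibility described above comes from exposing the network to all sparse templates during training. A minimal sketch of one way such template-randomized training could be organized is given below; the template definitions, the sampling scheme, the model interface, and the L1 loss are illustrative assumptions rather than the authors' exact training procedure, and the same wrapper can in principle be placed around other angular super-resolution networks.

```python
import random
import torch
import torch.nn.functional as F

# Hypothetical angular sparse templates on a 7x7 grid: each template lists the
# (u, v) reference views available to the network for that training sample.
TEMPLATES = {
    "3x3":  [(u, v) for u in (0, 3, 6) for v in (0, 3, 6)],
    "5pts": [(0, 0), (0, 6), (3, 3), (6, 0), (6, 6)],   # four corners + centre
}

def training_step(model, dense_lf, optimizer):
    """One template-randomized training step.
    dense_lf: [B, 7, 7, C, H, W] ground-truth dense light field."""
    template = TEMPLATES[random.choice(list(TEMPLATES))]
    mask = torch.zeros(7, 7, dtype=torch.bool)
    for (u, v) in template:
        mask[u, v] = True
    sparse_views = dense_lf[:, mask]                    # [B, |template|, C, H, W]
    # Pick a random non-reference view as the reconstruction target.
    candidates = [(u, v) for u in range(7) for v in range(7) if not mask[u, v]]
    tu, tv = random.choice(candidates)
    # 'model' is a placeholder interface: it takes the sparse views, the template
    # layout, and the target angular coordinates, and returns the predicted SAI.
    pred = model(sparse_views, template, target=(tu, tv))
    loss = F.l1_loss(pred, dense_lf[:, tu, tv])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```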
Key words: Light Field Image (LFI) / Angular reconstruction / Scalable coding / Sparse template
Table 2  Quantitative comparison of different light field angular super-resolution reconstruction methods on the 3×3→7×7 reconstruction task
Method            30 Scenes           Occlusion           Reflective
                  PSNR(dB)   SSIM     PSNR(dB)   SSIM     PSNR(dB)   SSIM
ShearedEPI[13]    42.74      0.9867   39.84      0.9819   40.32      0.9647
Yeung et al.[8]   44.53      0.9900   42.06      0.9870   42.56      0.9711
LFASR-geo[14]     44.16      0.9889   41.71      0.9866   42.04      0.9693
FS-GAF[15]        44.32      0.9890   41.94      0.9870   42.62      0.9706
DistgASR[11]      45.90      0.9968   43.88      0.9960   43.95      0.9887
IRVAE[16]         45.64      0.9967   43.62      0.9958   42.48      0.9881
LFAR-TAF          46.07      0.9970   44.06      0.9962   43.95      0.9895
Table 3  Validation of the proposed template alignment and training strategy on different angular reconstruction methods
Method          Reconstruction task   30 Scenes           Occlusion           Reflective
                                      PSNR(dB)   SSIM     PSNR(dB)   SSIM     PSNR(dB)   SSIM
DistgASR[11]    Angular 5→7×7         44.70      0.9959   41.84      0.9942   42.07      0.9849
IRVAE[16]       Angular 5→7×7         44.62      0.9959   41.87      0.9943   41.86      0.9850
LFAR-TAF        Angular 5→7×7         45.08      0.9960   42.53      0.9949   42.24      0.9841
DistgASR[11]    Angular 3×3→7×7       45.81      0.9967   43.73      0.9959   43.81      0.9883
IRVAE[16]       Angular 3×3→7×7       45.36      0.9965   43.07      0.9956   42.90      0.9874
LFAR-TAF        Angular 3×3→7×7       46.07      0.9970   44.06      0.9962   43.95      0.9895
Table 4  Ablation results for the core modules of the proposed method
Method         30 Scenes           Occlusion           Reflective
               PSNR(dB)   SSIM     PSNR(dB)   SSIM     PSNR(dB)   SSIM
w/o MLAIFL     45.31      0.9958   43.78      0.9960   43.28      0.9883
w/o SAIF       45.88      0.9968   43.78      0.9960   43.88      0.9895
LFAR-TAF       46.07      0.9970   44.06      0.9962   43.95      0.9895
[1] CHEN Yeyao, JIANG Gangyi, YU Mei, et al. HDR light field imaging of dynamic scenes: A learning-based method and a benchmark dataset[J]. Pattern Recognition, 2024, 150: 110313. doi: 10.1016/j.patcog.2024.110313.
[2] ZHAO Chun and JEON B. Compact representation of light field data for refocusing and focal stack reconstruction using depth adaptive multi-CNN[J]. IEEE Transactions on Computational Imaging, 2024, 10: 170–180. doi: 10.1109/TCI.2023.3347910.
[3] ATTAL B, HUANG Jiabin, ZOLLHOFER M, et al. Learning neural light fields with ray-space embedding[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 19787–19797. doi: 10.1109/CVPR52688.2022.01920.
[4] HUR J, LEE J Y, CHOI J, et al. I see-through you: A framework for removing foreground occlusion in both sparse and dense light field images[C]. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, 2023: 229–238. doi: 10.1109/WACV56688.2023.00031.
[5] DENG Huiping, CAO Zhaoyang, XIANG Sen, et al. Saliency detection based on context-aware cross-layer feature fusion for light field images[J]. Journal of Electronics & Information Technology, 2023, 45(12): 4489–4498. doi: 10.11999/JEIT221270.
[6] WU Yingchun, WANG Yumei, WANG Anhong, et al. Light field all-in-focus image fusion based on edge enhanced guided filtering[J]. Journal of Electronics & Information Technology, 2020, 42(9): 2293–2301. doi: 10.11999/JEIT190723.
[7] YOON Y, JEON H G, YOO D, et al. Learning a deep convolutional network for light-field image super-resolution[C]. 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 2015: 57–65. doi: 10.1109/ICCVW.2015.17.
[8] YEUNG H W F, HOU Junhui, CHEN Jie, et al. Fast light field reconstruction with deep coarse-to-fine modeling of spatial-angular clues[C]. 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 138–154. doi: 10.1007/978-3-030-01231-1_9.
[9] WU Gaochang, ZHAO Mandan, WANG Liangyong, et al. Light field reconstruction using deep convolutional network on EPI[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 1638–1646. doi: 10.1109/CVPR.2017.178.
[10] WANG Yunlong, LIU Fei, WANG Zilei, et al. End-to-end view synthesis for light field imaging with pseudo 4DCNN[C]. 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 340–355. doi: 10.1007/978-3-030-01216-8_21.
[11] WANG Yingqian, WANG Longguang, WU Gaochang, et al. Disentangling light fields for super-resolution and disparity estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 425–443. doi: 10.1109/TPAMI.2022.3152488.
[12] KALANTARI N K, WANG Tingchun, and RAMAMOORTHI R. Learning-based view synthesis for light field cameras[J]. ACM Transactions on Graphics, 2016, 35(6): 193. doi: 10.1145/2980179.2980251.
[13] WU Gaochang, LIU Yebin, DAI Qionghai, et al. Learning sheared EPI structure for light field reconstruction[J]. IEEE Transactions on Image Processing, 2019, 28(7): 3261–3273. doi: 10.1109/TIP.2019.2895463.
[14] JIN Jing, HOU Junhui, YUAN Hui, et al. Learning light field angular super-resolution via a geometry-aware network[C]. 2020 34th AAAI Conference on Artificial Intelligence, New York, USA, 2020: 11141–11148. doi: 10.1609/aaai.v34i07.6771.
[15] JIN Jing, HOU Junhui, CHEN Jie, et al. Deep coarse-to-fine dense light field reconstruction with flexible sampling and geometry-aware fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(4): 1819–1836. doi: 10.1109/TPAMI.2020.3026039.
[16] HAN Kang and XIANG Wei. Inference-reconstruction variational autoencoder for light field image reconstruction[J]. IEEE Transactions on Image Processing, 2022, 31: 5629–5644. doi: 10.1109/TIP.2022.3197976.
[17] ÇETINKAYA E, AMIRPOUR H, and TIMMERER C. LFC-SASR: Light field coding using spatial and angular super-resolution[C]. 2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Taipei City, China, 2022: 1–6. doi: 10.1109/ICMEW56448.2022.9859373.
[18] AMIRPOUR H, TIMMERER C, and GHANBARI M. SLFC: Scalable light field coding[C]. 2021 Data Compression Conference (DCC), Snowbird, USA, 2021: 43–52. doi: 10.1109/DCC50243.2021.00012.
[19] ZHANG Yulun, TIAN Yapeng, KONG Yu, et al. Residual dense network for image super-resolution[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018: 2472–2481. doi: 10.1109/CVPR.2018.00262.
[20] HAN Dongchen, YE Tianzhu, HAN Yizeng, et al. Agent attention: On the integration of softmax and linear attention[C]. 18th European Conference on Computer Vision, Milan, Italy, 2024: 124–140. doi: 10.1007/978-3-031-72973-7_8.
[21] RAJ A S, LOWNEY M, SHAH R, et al. Stanford Lytro light field archive[EB/OL]. http://lightfields.stanford.edu/LF2016.html, 2016.
[22] RERABEK M and EBRAHIMI T. New light field image dataset[C]. 8th International Conference on Quality of Multimedia Experience, Lisbon, Portugal, 2016.