Elevation Error Prediction Dataset Using Global Open-source Digital Elevation Model

YU Cuilin (余翠琳); WANG Qingsong (王青松); ZHONG Zixuan (鐘梓炫); ZHANG Junhao (張君豪); LAI Tao (賴濤); HUANG Haifeng (黃海風)

doi:10.11999/JEIT240062

cstr: 32379.14.JEIT240062

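School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen 518107, China

Funds: National Natural Science Foundation of China (62273365, 62071499); Xiaomi Young Talents Program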

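Author biographies:
YU Cuilin: female, PhD candidate. Research interests: cross-disciplinary applications of machine learning, multi-source information fusion, and remote sensing data processing.

WANG Qingsong: male, PhD, associate professor. Research interests: refined processing of remote sensing images, intelligent visual navigation, and cooperative detection, sensing, and information fusion.

LAI Tao: male, PhD, associate professor. Research interests: SAR imaging radar system design and information processing.

HUANG Haifeng: male, PhD, professor. Research interests: key technologies in space electronics and intelligent sensing.

Corresponding author:
WANG Qingsong, wangqs5@mail.sysu.edu.cn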

CLC number: TN957.52; TP7
Publication history
- Received: 2024-01-29
- Revised: 2024-06-18
- Published online: 2024-07-01
- Issue date: 2024-09-26

Abstract

Correcting Digital Elevation Models (DEMs) has long been an important topic in remote sensing and geoscience research, and the rapid development of machine learning in recent years offers new ways to correct DEM elevation errors. Machine learning and other artificial intelligence methods depend on large amounts of training data, yet no public, unified, large-scale, and standardized multi-source DEM elevation error prediction dataset covering large areas is currently available. To fill this gap, this paper releases the multi-source DEM Elevation Error Prediction Dataset (DEEP-Dataset). The dataset comprises four sub-datasets: two built on the TerraSAR-X add-on for Digital Elevation Measurements (TanDEM-X) DEM and the Advanced Land Observing Satellite World 3D-30 m (AW3D30) DEM for the Guangdong Province study area in China, and two built on the Shuttle Radar Topography Mission (SRTM) DEM and the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) DEM for the Northern Territory study area in Australia. The Guangdong Province study area contributes approximately 40 000 samples and the Northern Territory study area approximately 1 600 000. Each sample consists of ten features covering geospatial location, land cover type, and terrain morphology. Comparative experiments on machine learning model testing, DEM correction, and feature importance assessment validate the effectiveness of the DEEP-Dataset for model training and DEM correction, and show that the dataset is reasonable and rich.

Keywords: Digital Elevation Model (DEM); Artificial intelligence; Machine learning; Predictive datasets
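
The correction idea summarized in the abstract (learn to predict the elevation error, then remove the prediction from the DEM) can be illustrated with a deliberately simple stand-in model. The sketch below is not one of the models evaluated in the paper: it predicts the mean error per land cover class. It also assumes the sign convention error = DEM height minus reference height, which this page does not state; the function names are hypothetical.

```python
from collections import defaultdict


def fit_mean_error_by_class(samples):
    """Fit a toy model: mean elevation error for each land cover class.

    samples: iterable of (land_cover, elevation_error) pairs.
    Assumed convention: elevation_error = DEM height - reference height.
    """
    sums = defaultdict(lambda: [0.0, 0])  # class -> [sum of errors, count]
    for land_cover, err in samples:
        sums[land_cover][0] += err
        sums[land_cover][1] += 1
    return {cls: total / count for cls, (total, count) in sums.items()}


def correct_dem(dem_height, land_cover, model, default=0.0):
    """Subtract the predicted error from a DEM height.

    Classes not seen during fitting fall back to `default` (no correction).
    """
    return dem_height - model.get(land_cover, default)
```

A learned regressor over all ten features would replace `fit_mean_error_by_class`; the subtraction step stays the same under the assumed sign convention.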

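Figure 1 Distribution of ICESat-2 control points, TanDEM-X DEM, and AW3D30 DEM in the Guangdong Province study area

Figure 2 Distribution of ICESat-2 control points, SRTM DEM, and ASTER DEM in the Northern Territory study area, Australia

Figure 3 Construction workflow of the DEEP-Dataset

Figure 4 Comparison of elevation errors of the different DEMs before and after correction

Figure 5 Distributions of the terrain-factor features

Figure 6 Feature importance weights for the sub-datasets of the DEEP-Dataset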

Table 1 Basic attributes of the DEM and ICESat-2 products

Product	Sensor type	Spatial resolution (m)	Coordinate system	Latitude coverage
SRTM	Radar	30	WGS84	56°S to 60°N
ASTER	Optical	30	WGS84	83°S to 83°N
TanDEM-X	Radar	30	WGS84	90°S to 90°N
AW3D30	Optical	30	WGS84	84°S to 84°N
ICESat-2	Laser	-	WGS84	88°S to 88°N
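
As a small illustration of how the coverage column of Table 1 can be used, the sketch below checks which products cover a given latitude. The latitude bands are transcribed from the table; the dictionary and function names are hypothetical, and only latitude is checked (data voids and other gaps are ignored).

```python
# Latitude coverage in degrees (south negative), transcribed from Table 1.
COVERAGE = {
    "SRTM": (-56.0, 60.0),
    "ASTER": (-83.0, 83.0),
    "TanDEM-X": (-90.0, 90.0),
    "AW3D30": (-84.0, 84.0),
    "ICESat-2": (-88.0, 88.0),
}


def products_covering(lat_deg):
    """Return the products whose latitude band contains lat_deg."""
    return [
        name
        for name, (south, north) in COVERAGE.items()
        if south <= lat_deg <= north
    ]
```

For example, `products_covering(23.0)` (roughly the latitude of Guangdong Province) returns all five products, whereas at 70°N SRTM drops out.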

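Table 2 Overview of the DEEP-Dataset

Study area	Area (km²)	Terrain	DEM	Number of samples
Guangdong Province, China	179 725	High mountains, hills, tablelands, and plains	TanDEM-X	18 415
Guangdong Province, China	179 725	High mountains, hills, tablelands, and plains	AW3D30	18 439
Northern Territory, Australia	1 420 968	Plains, plateaus, mountains, and deserts	SRTM	795 391
Northern Territory, Australia	1 420 968	Plains, plateaus, mountains, and deserts	ASTER	795 495

Feature attributes (all sub-datasets): longitude, latitude, land cover type, slope, aspect, slope position, terrain relief, surface roughness, slope variability, aspect variability
Target variable (all sub-datasets): elevation error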
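
One way to picture a single sample from Table 2 is as a record with ten feature fields and one target field. The sketch below is only an illustration: the class name, field names, and field types are assumptions and do not describe the published file format. The sample counts are copied from the table.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class DeepSample:
    """One hypothetical DEEP-Dataset record: ten features plus the target."""
    longitude: float
    latitude: float
    land_cover: int            # land cover class code (assumed integer)
    slope: float
    aspect: float
    slope_position: int        # slope position class (assumed integer)
    relief: float
    roughness: float
    slope_variability: float
    aspect_variability: float
    elevation_error: float     # target variable

    def features(self):
        """Return the ten feature values, leaving out the target."""
        return astuple(self)[:-1]


# Sample counts per (study area, DEM), copied from Table 2.
SAMPLE_COUNTS = {
    ("Guangdong", "TanDEM-X"): 18415,
    ("Guangdong", "AW3D30"): 18439,
    ("Northern Territory", "SRTM"): 795391,
    ("Northern Territory", "ASTER"): 795495,
}


def total_samples(area):
    """Sum the sample counts of all sub-datasets in one study area."""
    return sum(n for (a, _), n in SAMPLE_COUNTS.items() if a == area)
```

The totals, 36 854 for Guangdong and 1 590 886 for the Northern Territory, match the approximate figures of 40 000 and 1 600 000 quoted in the abstract.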

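Table 3 Coarse screening criteria for ICESat-2 laser control points

Criterion	Reference value
Height difference relative to the original reference DEM	abs(h_te_best_fit - dem_h) < 30 m
Spread among the terrain-height statistics	max_diff(h_te_best_fit, h_te_interp, h_median) < 0.5
Absolute number and proportion of ground photons	n_te_photons > 50, ratio_te_photos > 50%
Cloud cover	cloud_flag_atm < 10%
Outlier removal on h_uncertainty	< 2 × RMSE(h_uncertainty)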
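
The criteria in Table 3 can be sketched as a two-stage filter over point records. This is a minimal sketch under several assumptions that the table does not settle: each point is a dict keyed by the names used in the table, `max_diff` is taken as the largest pairwise difference (maximum minus minimum), the ratio and the cloud value are stored as fractions, and the `h_uncertainty` outlier test is applied after the per-point checks.

```python
import math


def passes_point_checks(p):
    """Apply the four per-point criteria of Table 3 to one record (a dict)."""
    heights = (p["h_te_best_fit"], p["h_te_interp"], p["h_median"])
    return (
        abs(p["h_te_best_fit"] - p["dem_h"]) < 30.0   # agreement with reference DEM (m)
        and max(heights) - min(heights) < 0.5         # assumed meaning of max_diff
        and p["n_te_photons"] > 50                    # absolute ground photon count
        and p["ratio_te_photos"] > 0.5                # ground photon share, as a fraction
        and p["cloud_flag_atm"] < 0.10                # cloud value, assumed to be a fraction
    )


def coarse_screen(points):
    """Keep points that pass the per-point checks and the uncertainty test."""
    kept = [p for p in points if passes_point_checks(p)]
    if not kept:
        return []
    # RMSE of h_uncertainty over the surviving points (assumed ordering).
    rmse = math.sqrt(sum(p["h_uncertainty"] ** 2 for p in kept) / len(kept))
    return [p for p in kept if p["h_uncertainty"] < 2.0 * rmse]
```

Computing the RMSE only over points that already passed the other checks keeps a few very poor points from inflating the threshold; the source does not say which order was used.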

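Table 4 Feature attributes

Feature	Formula	Description	Symbol
Longitude	/	Angle measured east or west from the prime meridian.	X
Latitude	/	Angle measured north or south from the equator.	Y
Land cover type	/	Surface type covering the DEM cell; 9 classes such as forest, urban, and water.	$ \omega $
Slope	$ \arctan \left( \sqrt{ \left( \dfrac{\partial Z}{\partial X} \right)^2 + \left( \dfrac{\partial Z}{\partial Y} \right)^2 } \right) $	Degree of inclination and steepness of the surface, i.e. the ratio of elevation change to distance. $ Z $ is the elevation, and $ X $ and $ Y $ are the spatial coordinates in the east-west and north-south directions. $ \partial Z/\partial X $ and $ \partial Z/\partial Y $ are the rates of elevation change along the grid.	$ \theta $
Aspect	$ \operatorname{atan2}\left( \dfrac{\partial Z}{\partial Y}, \dfrac{\partial Z}{\partial X} \right) $	Direction of steepest descent at a point on the ground, i.e. the direction in which water would flow away from that point. atan2 is the two-argument arctangent, which resolves the aspect in all four quadrants.	$ \kappa $
Slope position	/	Height position of a point relative to its surrounding points, determined by analyzing neighboring slope and elevation values. There is no fixed formula; local maxima and minima must be found to identify ridges, valleys, and hillsides.	$ \psi $
Terrain relief	$ Z_{\text{ref}}(i, j) = Z_{\text{max}} - Z_{\text{min}} $	Difference between the highest elevation $ Z_{\text{max}} $ and the lowest elevation $ Z_{\text{min}} $ within a given area.	$ \alpha $
Surface roughness	$ \sqrt{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\left( Z_i - \overline{Z} \right)^2} $	Degree of irregularity of the ground surface, i.e. the magnitude of its undulation. $ Z_i $ are the elevations of the neighboring pixels, $ \overline{Z} $ is their mean, and $ n $ is the number of pixels.	$ \beta $
Slope variability	$ \sqrt{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\left( \theta_i - \overline{\theta} \right)^2} $	Rate of change of the ground slope over a local neighborhood. $ \theta_i $ are the slopes of the surrounding pixels, $ \overline{\theta} $ is the mean slope, and $ n $ is the number of surrounding pixels.	$ \varphi $
Aspect variability	$ \sqrt{\text{Var}(\cos(\kappa)) + \text{Var}(\sin(\kappa))} $	Rate of change of aspect, computed from the extracted aspect. $ \kappa $ is the aspect angle and Var is the variance.	$ \lambda $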

Table 5 Model testing and DEM correction results for the Guangdong Province study area, China

Metric	Before correction (m): TanDEM-X	Before correction (m): AW3D30	Method	Model test: TanDEM-X	Model test: AW3D30	After correction (m): TanDEM-X	After correction (m): AW3D30	Improvement (%): TanDEM-X	Improvement (%): AW3D30
MAE	4.734	3.094	RF	3.931	2.513	3.524	2.876	25.56	7.05
MAE	4.734	3.094	ET	3.879	2.522	3.025	2.471	36.10	20.14
MAE	4.734	3.094	ANN	4.157	2.821	4.614	2.672	2.53	13.64
MAE	4.734	3.094	BA	3.935	2.515	3.507	1.838	25.92	40.59
SD	8.388	4.711	RF	6.881	4.223	7.825	4.542	6.71	3.59
SD	8.388	4.711	ET	6.835	4.212	7.900	4.578	5.82	2.82
SD	8.388	4.711	ANN	7.282	4.775	8.376	4.760	0.14	-1.04
SD	8.388	4.711	BA	6.882	4.213	7.804	3.952	6.96	16.11
RMSE	8.388	4.712	RF	6.881	4.225	7.826	4.543	6.70	3.59
RMSE	8.388	4.712	ET	6.836	4.214	7.901	4.590	5.81	2.59
RMSE	8.388	4.712	ANN	7.287	4.782	8.381	4.761	0.08	-1.04
RMSE	8.388	4.712	BA	6.883	4.213	7.810	3.963	6.89	15.90
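
The improvement columns in Table 5 appear to be the relative reduction of each metric, (before - after) / before × 100. This formula is inferred from the tabulated values rather than stated on this page; the short sketch below, with a hypothetical function name, reproduces several entries.

```python
def improvement_percent(before, after):
    """Relative reduction of an error metric, in percent.

    A negative result means the metric got worse after correction.
    """
    return (before - after) / before * 100.0


# Values taken from Table 5 (TanDEM-X MAE with RF, AW3D30 SD with ANN).
rf_tandemx_mae = improvement_percent(4.734, 3.524)   # about 25.56
ann_aw3d30_sd = improvement_percent(4.711, 4.760)    # about -1.04
```

The negative value for ANN on AW3D30 corresponds to the one case in the table where correction slightly increased the SD.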

Table 6 Model testing and DEM correction results for the Northern Territory study area, Australia

Metric	Before correction (m): SRTM	Before correction (m): ASTER	Method	Model test: SRTM	Model test: ASTER	After correction (m): SRTM	After correction (m): ASTER	Improvement (%): SRTM	Improvement (%): ASTER
MAE	2.341	6.507	RF	0.892	1.998	2.036	3.282	13.03	49.56
MAE	2.341	6.507	ET	0.893	1.997	0.395	0.840	83.13	87.09
MAE	2.341	6.507	ANN	1.060	2.646	1.127	3.659	51.86	43.77
MAE	2.341	6.507	BA	0.883	1.917	0.559	1.319	76.12	79.73
SD	2.955	7.756	RF	1.314	2.800	2.586	4.289	12.49	44.70
SD	2.955	7.756	ET	1.315	2.813	0.972	2.361	67.11	69.56
SD	2.955	7.756	ANN	1.493	3.573	2.436	4.798	17.56	38.14
SD	2.955	7.756	BA	1.311	2.706	1.019	2.268	65.52	70.76
RMSE	2.960	7.762	RF	1.315	2.801	2.586	4.289	12.64	44.74
RMSE	2.960	7.762	ET	1.317	2.816	0.973	2.362	67.13	69.57
RMSE	2.960	7.762	ANN	1.494	3.573	2.437	4.807	17.67	38.07
RMSE	2.960	7.762	BA	1.312	2.708	1.020	2.269	65.54	70.77
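
For readers who want to recompute metrics like those in Table 6 from a list of elevation errors, the sketch below uses the standard definitions of MAE, SD, and RMSE. The population form of SD (division by n) is an assumption, as this page does not say which form the authors used, and the function name is hypothetical.

```python
import math


def error_metrics(errors):
    """Return (MAE, SD, RMSE) for a non-empty list of elevation errors in metres."""
    n = len(errors)
    mean = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)  # population SD (assumed)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, sd, rmse
```

With these definitions, RMSE² = SD² + (mean error)², so the near-identical SD and RMSE values in the table indicate that the mean error is small compared with the spread.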
