Missing Data Imputation Algorithm Based on a Modified Neural Process
Updated: 2021-05-27

Abstract: Missing data imputation is an important research topic in data analysis and processing, and it is especially difficult when only a small amount of data has been collected. To address this problem, a missing data imputation algorithm based on a modified neural process model is proposed, which effectively improves imputation performance on small datasets. First, each observed time series is represented individually, and a neural network produces the corresponding representation vectors. Second, the neural process model learns the distribution function of the data, and a correction coefficient α is introduced in the training stage so that the sampling rate of the training data is determined more precisely according to the data missing rate. Finally, an imputation step is added in which the trained model estimates the missing values. Simulation experiments on a sea surface temperature dataset and a Beijing PM2.5 concentration dataset show that the algorithm imputes well on small datasets and, compared with other algorithms, achieves a lower root mean square error (RMSE) at high missing rates.
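The three steps in the abstract map loosely onto a standard neural-process pipeline: encode observed points into representation vectors, learn a predictive distribution, then query it at the missing time stamps. The sketch below is a minimal, untrained illustration in pure Python under explicit assumptions. It swaps in a conditional neural process (CNP, the deterministic variant without a latent path) rather than the authors' modified neural process, and the rule `rate = alpha * (1 - missing_rate)` for the correction coefficient α is a hypothetical stand-in, because the abstract does not give the formula. All helper names (`sampling_rate`, `split_context_target`, `cnp_predict`, `gaussian_nll`) are illustrative only.

```python
import math
import random

# Hypothetical helper names throughout; a CNP sketch, not the authors' model.

def sampling_rate(missing_rate, alpha):
    """ASSUMED rule for the correction coefficient alpha: the share of observed
    points used as context during training is alpha * (1 - missing_rate),
    clipped to [0.05, 1.0]. The abstract does not state the actual formula."""
    return min(1.0, max(0.05, alpha * (1.0 - missing_rate)))

def split_context_target(observed, rate, rng):
    """Randomly split observed (t, y) pairs into a context set and a target set,
    keeping at least one point in each (requires len(observed) >= 2)."""
    k = min(len(observed) - 1, max(1, round(rate * len(observed))))
    shuffled = list(observed)
    rng.shuffle(shuffled)
    return shuffled[:k], shuffled[k:]

def init_layers(sizes, rng):
    """Random (weights, biases) for a fully connected net with the given layer sizes."""
    return [([[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def mlp(x, layers):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wj * xj for wj, xj in zip(row, x)) + bj for row, bj in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def cnp_predict(context, target_ts, enc, dec):
    """Encode each (t, y) context pair into a representation vector, average them
    into one summary r, then decode (r, t) into a Gaussian mean and std per target."""
    reps = [mlp([t, y], enc) for t, y in context]
    r = [sum(col) / len(reps) for col in zip(*reps)]
    preds = []
    for t in target_ts:
        mu, raw = mlp(r + [t], dec)
        preds.append((mu, 0.1 + 0.9 * softplus(raw)))  # std bounded below by 0.1
    return preds

def gaussian_nll(context, target, enc, dec):
    """Training objective: mean Gaussian negative log-likelihood of target values."""
    preds = cnp_predict(context, [t for t, _ in target], enc, dec)
    total = 0.0
    for (_, y), (mu, sigma) in zip(target, preds):
        total += 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)
    return total / len(target)

def rmse(pred, truth):
    """Root mean square error between predictions and ground truth."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(pred, truth)) / len(truth))

if __name__ == "__main__":
    rng = random.Random(0)
    n = 48
    series = [math.sin(2 * math.pi * i / 24) for i in range(n)]  # toy periodic signal
    missing = sorted(rng.sample(range(n), n // 2))                # 50% missing
    missing_set = set(missing)
    observed = [(i / (n - 1), series[i]) for i in range(n) if i not in missing_set]

    enc = init_layers([2, 16, 8], rng)   # (t, y) -> 8-dim representation
    dec = init_layers([9, 16, 2], rng)   # (r, t) -> (mean, raw std)

    # Training stage: alpha sets how many observed points act as context.
    rate = sampling_rate(missing_rate=0.5, alpha=1.2)
    context, target = split_context_target(observed, rate, rng)
    print("context size:", len(context), "NLL:", gaussian_nll(context, target, enc, dec))
    # Weight updates via gradient descent on this NLL are omitted.

    # Imputation stage: condition on all observed points, predict the missing ones.
    preds = cnp_predict(observed, [i / (n - 1) for i in missing], enc, dec)
    print("RMSE on missing entries:", rmse([mu for mu, _ in preds], [series[i] for i in missing]))
```

Gradient-based training of the encoder and decoder weights (minimizing `gaussian_nll` over many random context/target splits) is left out to keep the example dependency-free; an autodiff framework would normally handle it. At imputation time the model conditions on every observed point, which is why `cnp_predict` receives the full `observed` list rather than the training-time context subset chosen via α.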
Authors: SUN Xiaoli, GUO Yan, LI Ning, SONG Xiaoxiang (PLA Army Engineering University, Nanjing 210007, China)
Source: Journal of University of Chinese Academy of Sciences (《中国科学院大学学报》; indexed in CSCD and the Peking University core journal list), 2021, No. 2, pp. 280-287 (8 pages)
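Funding: Supported by the National Natural Science Foundation of China (61871400) and the Natural Science Foundation of Jiangsu Province (BK20171401).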
Keywords: missing data imputation; time series; modified neural process; correction coefficient