典型文献
Minimum Epsilon-Kernel Computation for Large-Scale Data Processing
文献摘要:
Kernel is a kind of data summary which is elaborately extracted from a large dataset.Given a problem,the solution obtained from the kernel is an approximate version of the solution obtained from the whole dataset with a provable approximate ratio.It is widely used in geometric optimization,clustering,and approximate query processing,etc.,for scaling them up to massive data.In this paper,we focus on the minimum ε-kernel(MK)computation that asks for a kernel of the smallest size for large-scale data processing.For the open problem presented by Wang et al.that whether the minimumε-coreset(MC)problem and the MK problem can be reduced to each other,we first formalize the MK problem and analyze its complexity.Due to the NP-hardness of the MK problem in three or higher dimensions,an approximate algorithm,namely Set Cover-Based Minimum e-Kernel algorithm(SCMK),is developed to solve it.We prove that the MC problem and the MK problem can be Turing-reduced to each other.Then,we discuss the update of MK under insertion and deletion operations,respectively.Finally,a randomized algorithm,called the Randomized Algorithm of Set Cover-Based Minimumε-Kernel algorithm(RA-SCMK),is utilized to further reduce the complexity of SCMK.The efficiency and effectiveness of SCMK and RA-SCMK are verified by experimental results on real-world and synthetic datasets.Experiments show that the kernel sizes of SCMK are 2x and 17.6x smaller than those of an ANN-based method on real-world and synthetic datasets,respectively.The speedup ratio of SCMK over the ANN-based method is 5.67 on synthetic datasets.RA-SCMK runs up to three times faster than SCMK on synthetic datasets.
文献关键词:
中图分类号:
作者姓名:
Hong-Jie Guo;Jian-Zhong Li;Hong Gao
作者机构:
School of Computer Science and Technology,Harbin Institute of Technology,Harbin 150001,China;Faculty of Computer Science and Control Engineering,Shenzhen Institute of Advanced Technology,Chinese Academy of Sciences,Shenzhen 518000,China
文献出处:
引用格式:
[1]Hong-Jie Guo;Jian-Zhong Li;Hong Gao-.Minimum Epsilon-Kernel Computation for Large-Scale Data Processing)[J].计算机科学技术学报(英文版),2022(06):1398-1411
A类:
coreset,formalize,SCMK
B类:
Minimum,Epsilon,Kernel,Computation,Large,Scale,Data,Processing,kind,summary,which,elaborately,extracted,from,large,Given,problem,solution,obtained,kernel,approximate,version,whole,provable,It,widely,used,geometric,optimization,clustering,query,processing,etc,scaling,them,massive,In,this,paper,we,focus,minimum,computation,that,asks,smallest,scale,For,open,presented,by,Wang,whether,MC,can,be,reduced,each,other,first,analyze,its,complexity,Due,NP,hardness,three,higher,dimensions,algorithm,namely,Set,Cover,Based,developed,solve,We,prove,Turing,Then,discuss,update,under,insertion,deletion,operations,respectively,Finally,randomized,called,Randomized,Algorithm,RA,utilized,further,efficiency,effectiveness,are,verified,experimental,results,real,world,synthetic,datasets,Experiments,show,sizes,2x,6x,smaller,than,those,ANN,method,speedup,runs,times,faster
AB值:
0.471667
相似文献
机标中图分类号,由域田数据科技根据网络公开资料自动分析生成,仅供学习研究参考。