Computer Engineering and Applications (计算机工程与应用), 2013, 49(2), p. 184
Research on classification for imbalanced dataset based on improved SMOTE
WANG Chaoxue (王超学)¹, PAN Zhengmao (潘正茂)¹, DONG Lili (董丽丽)¹, MA Chunsen (马春森)², ZHANG Xing (张星)¹
1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
2. Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
WANG Chaoxue, PAN Zhengmao, DONG Lili, et al. Research on classification for imbalanced dataset based on improved SMOTE. Computer Engineering and Applications, 2013, 49(2): 184-187.
Abstract: Based on an analysis of the shortcomings of SMOTE (Synthetic Minority Over-sampling Technique), an improved SMOTE (SSMOTE) is presented. The key to SSMOTE lies in introducing the concept of support and roulette wheel selection into SMOTE, and in making full use of the distribution information of heterogeneous nearest neighbors to achieve fine control over the quality and quantity of the synthesized minority class samples. SSMOTE and KNN (K-Nearest Neighbor) are combined to handle the classification problem on imbalanced datasets, and extensive experiments on the UCI datasets compare SSMOTE with algorithms from the pertinent literature. The simulation results show that SSMOTE synthesizes minority class samples effectively and, combined with KNN, yields better classification performance on imbalanced datasets.
Key words: imbalanced datasets; classification; support; roulette wheel selection; Synthetic Minority Over-sampling Technique (SMOTE)
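
The abstract names two ingredients, SMOTE-style synthesis and roulette wheel selection, without giving their details in this excerpt. The following is only a minimal sketch of how the two could fit together, not the authors' SSMOTE. In particular, the excerpt does not define how support is computed, so the hypothetical `weights` argument simply stands in for a per-sample support value supplied by the caller. The function names, the Euclidean distance, and the default `k` are likewise assumptions made for illustration.

```python
import random


def roulette_pick(weights, rng):
    """Return an index drawn with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    threshold = rng.random() * total
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if threshold < running:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def nearest_minority(idx, minority, k):
    """Indices of the k minority samples closest to minority[idx] (Euclidean)."""
    def dist2(j):
        return sum((a - b) ** 2 for a, b in zip(minority[idx], minority[j]))

    others = [j for j in range(len(minority)) if j != idx]
    return sorted(others, key=dist2)[:k]


def synthesize(minority, weights, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples.

    `weights` is a placeholder for the paper's support values (assumption):
    samples with a larger weight are chosen as seeds more often.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = roulette_pick(weights, rng)                   # seed sample
        j = rng.choice(nearest_minority(i, minority, k))  # one of its neighbors
        gap = rng.random()                                # position along the segment
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic
```

With uniform weights this reduces to choosing seed samples at random and interpolating toward a minority-class neighbor, as in plain SMOTE. Non-uniform weights shift the synthesis toward the samples the caller considers more useful, which is the kind of control over quantity that the abstract attributes to roulette wheel selection.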
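Abstract (Chinese version): To address the shortcomings of SMOTE (Synthetic Minority Over-sampling Technique) in synthesizing new minority class samples, an improved SMOTE algorithm (SSMOTE) is proposed. The key to the algorithm is that it introduces the concept of support and the roulette wheel selection technique into SMOTE and makes full use of the distribution information of heterogeneous nearest neighbors, achieving fine control over the quality and quantity of the synthesized minority class samples. SSMOTE is combined with KNN (K-Neares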