chap4_basic_classification (Classification)
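Confidence Interval for Accuracy

For large test sets ($N > 30$), the accuracy $acc$ is approximately normally distributed with mean $p$ and variance $p(1-p)/N$:

$$P\left(Z_{\alpha/2} < \frac{acc - p}{\sqrt{p(1-p)/N}} < Z_{1-\alpha/2}\right) = 1 - \alpha$$

The area under the standard normal curve between $Z_{\alpha/2}$ and $Z_{1-\alpha/2}$ is $1 - \alpha$. Solving for $p$ gives the confidence interval for $p$:

$$p = \frac{2N \cdot acc + Z_{\alpha/2}^2 \pm Z_{\alpha/2}\sqrt{Z_{\alpha/2}^2 + 4N \cdot acc - 4N \cdot acc^2}}{2\,(N + Z_{\alpha/2}^2)}$$

Consider a model that produces an accuracy of 80% when evaluated on 100 test instances:

- $N = 100$, $acc = 0.8$
- Let $1 - \alpha = 0.95$ (95% confidence)
- From the probability table, $Z_{\alpha/2} = 1.96$

| $1-\alpha$ | $Z$ |
|---|---|
| 0.99 | 2.58 |
| 0.98 | 2.33 |
| 0.95 | 1.96 |
| 0.90 | 1.65 |

With $acc = 0.8$ held fixed, the interval narrows as $N$ grows:

| $N$ | 50 | 100 | 500 | 1000 | 5000 |
|---|---|---|---|---|---|
| $p$ (lower) | 0.670 | 0.711 | 0.763 | 0.774 | 0.789 |
| $p$ (upper) | 0.888 | 0.866 | 0.833 | 0.824 | 0.811 |

Comparing Performance of 2 Models

Given two models, say M1 and M2, which is better?

- M1 is tested on D1 (size = $n_1$), found error rate = $e_1$
- M2 is tested on D2 (size = $n_2$), found error rate = $e_2$
- Assume D1 and D2 are independent

If $n_1$ and $n_2$ are sufficiently large, then

$$e_1 \sim N(\mu_1, \sigma_1), \qquad e_2 \sim N(\mu_2, \sigma_2)$$

Approximate:

$$\hat{\sigma}_i^2 = \frac{e_i(1 - e_i)}{n_i}$$

To test if the performance difference is statistically significant, let $d = e_1 - e_2$. Then $d \sim N(d_t, \sigma_t)$, where $d_t$ is the true difference. Since D1 and D2 are independent, their variances add up:

$$\sigma_t^2 = \sigma_1^2 + \sigma_2^2 \approx \hat{\sigma}_1^2 + \hat{\sigma}_2^2 = \frac{e_1(1 - e_1)}{n_1} + \frac{e_2(1 - e_2)}{n_2}$$

At the $(1-\alpha)$ confidence level,

$$d_t = d \pm Z_{\alpha/2}\,\hat{\sigma}_t$$

An Illustrative Example

Given:

- M1: $n_1 = 30$, $e_1 = 0.15$
- M2: $n_2 = 5000$, $e_2 = 0.25$
- $d = |e_2 - e_1| = 0.1$ (2-sided test)

$$\hat{\sigma}_t^2 = \frac{0.15\,(1 - 0.15)}{30} + \frac{0.25\,(1 - 0.25)}{5000} = 0.0043$$

At the 95% confidence level, $Z_{\alpha/2} = 1.96$:

$$d_t = 0.100 \pm 1.96 \times \sqrt{0.0043} = 0.100 \pm 0.128$$

The interval contains 0, so the difference may not be statistically significant.

Comparing Performance of 2 Algorithms

Each learning algorithm may produce $k$ models:

- L1 may produce M11, M12, …, M1k
- L2 may produce M21, M22, …, M2k

If the models are generated on the same test sets D1, D2, …, Dk (e.g., via cross-validation):

- For each set, compute $d_j = e_{1j} - e_{2j}$
- $d_j$ has mean $d_t$ and variance $\sigma_t^2$
- Estimate:

$$\hat{\sigma}_t^2 = \frac{\sum_{j=1}^{k} (d_j - \bar{d})^2}{k(k-1)}, \qquad d_t = \bar{d} \pm t_{1-\alpha,\,k-1}\,\hat{\sigma}_t$$

Computing Impurity Measure (missing value)

Before splitting:

$$\text{Entropy(Parent)} = -0.3\log_2(0.3) - 0.7\log_2(0.7) = 0.8813$$

Split on Refund:

- Entropy(Refund=Yes) = 0
- Entropy(Refund=No) = $-(2/6)\log_2(2/6) - (4/6)\log_2(4/6) = 0.9183$
- Entropy(Children) = $0.3\,(0) + 0.6\,(0.9183) = 0.551$
- Gain = $0.9 \times (0.8813 - 0.551) = 0.9 \times 0.3303 \approx 0.2973$, where 0.9 is the fraction of instances whose Refund value is known

Distribute Instances

Refund Y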
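The interval and gain calculations in this section can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names (`accuracy_interval`, `error_difference_interval`, `entropy`) are hypothetical, the default `z=1.96` assumes the 95% confidence level used in the examples, and the gain computation assumes the class proportions quoted in the Refund example.

```python
import math


def accuracy_interval(acc, n, z=1.96):
    """Confidence interval for the true accuracy p, given the observed
    accuracy `acc` on `n` test instances and the normal quantile `z`."""
    z2 = z * z
    center = 2 * n * acc + z2
    spread = z * math.sqrt(z2 + 4 * n * acc - 4 * n * acc * acc)
    denom = 2 * (n + z2)
    return (center - spread) / denom, (center + spread) / denom


def error_difference_interval(e1, n1, e2, n2, z=1.96):
    """Confidence interval for the true difference d_t between two error
    rates measured on independent test sets of sizes n1 and n2."""
    d = abs(e1 - e2)
    # Independent test sets: the estimated variances add up.
    variance = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    half_width = z * math.sqrt(variance)
    return d - half_width, d + half_width


def entropy(probs):
    """Entropy in bits of a class distribution; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Accuracy example: N = 100, acc = 0.8, 95% confidence.
low, high = accuracy_interval(0.8, 100)
print(round(low, 3), round(high, 3))  # about 0.711 and 0.867

# Two-model example: M1 (n1 = 30, e1 = 0.15) vs. M2 (n2 = 5000, e2 = 0.25).
d_low, d_high = error_difference_interval(0.15, 30, 0.25, 5000)
print(d_low < 0 < d_high)  # True: the interval contains 0

# Refund example: gain scaled by the fraction of known values (0.9).
parent = entropy([0.3, 0.7])
children = 0.3 * entropy([1.0]) + 0.6 * entropy([2 / 6, 4 / 6])
gain = 0.9 * (parent - children)
print(round(gain, 4))
```

Passing a different `z` (for example 2.58 for 99% confidence, per the table above) widens both intervals accordingly.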