Chapter13(PDF)-StanfordNLPGroup.pdf

  1. 1、本文档共35页,可阅读全部内容。
  2. 2、有哪些信誉好的足球投注网站(book118)网站文档一经付费(服务费),不意味着购买了该文档的版权,仅供个人/单位学习、研究之用,不得用于商业用途,未经授权,严禁复制、发行、汇编、翻译或者网络传播等,侵权必究。
  3. 3、本站所有内容均由合作方或网友上传,本站不对文档的完整性、权威性及其观点立场正确性做任何保证或承诺!文档内容仅供研究参考,付费前请自行鉴别。如您付费,意味着您自己接受本站规则且自行承担风险,本站不退款、不进行额外附加服务;查看《如何避免下载的几个坑》。如果您已付费下载过本站文档,您可以点击 这里二次下载
  4. 4、如文档侵犯商业秘密、侵犯著作权、侵犯人身权等,请点击“版权申诉”(推荐),也可以打举报电话:400-050-0827(电话支持时间:9:00-18:30)。
查看更多
Chapter13(PDF)-StanfordNLPGroup

DRAFT! ? April 1, 2009 Cambridge University Press. Feedback welcome. 13 Text classi?cation and Naive Bayes 253 STANDING QUERY C L A S S I FI C AT I O N ROUTING FI LT E R I N G TEXT CLASSIFICATION Thus far, this book has mainly discussed the process of ad hoc retrieval, where users have transient information needs that they try to address by posing one or more queries to a search engine. However, many users have ongoing information needs. For example, you might need to track developments in multicore computer chips. One way of doing this is to issue the query multicore AND computer AND chip against an index of recent newswire articles each morning. In this and the following two chapters we examine the question: How can this repetitive task be automated? To this end, many systems support standing queries. A standing query is like any other query except that it is periodically executed on a collection to which new documents are incrementally added over time. If your standing query is just multicore AND computer AND chip, you will tend to miss many relevant new articles which use other terms such as multicore processors. To achieve good recall, standing queries thus have to be re?ned over time and can gradually become quite complex. In this example, using a Boolean search engine with stemming, you might end up with a query like (multicore OR multi-core) AND (chip OR processor OR microprocessor). To capture the generality and scope of the problem space to which standing queries belong, we now introduce the general notion of a classi?cation problem. Given a set of classes, we seek to determine which class(es) a given object belongs to. In the example, the standing query serves to divide new newswire articles into the two classes: documents about multicore computer chips and documents not about multicore computer chips. We refer to this as two-class classi?cation. Classi?cation using standing queries is also called routing or ?lteringand will be discussed further in Sectio

文档评论(0)

***** + 关注
实名认证
内容提供者

该用户很懒,什么也没介绍

版权声明书
用户编号:8010045112000002

1亿VIP精品文档

相关文档