Motif identification neural design for rapid and sensitive protein family search.pdf


MOTIF IDENTIFICATION NEURAL DESIGN FOR RAPID AND SENSITIVE PROTEIN FAMILY SEARCH

Cathy H. Wu, Hsi-Lien Chen, Chin-Ju Lo and Jerry W. McLarty
Department of Epidemiology/Biomathematics
The University of Texas Health Center at Tyler
Tyler, TX 75710

Abstract

The accelerated growth of the molecular sequencing data has generated a pressing need for advanced sequence annotation tools. This paper reports a new method, termed MOTIFIND (Motif Identification Neural Design), for rapid and sensitive protein family identification. The method is extended from our previous gene classification artificial neural system and employs two new designs to enhance the detection of distant relationships. These include an n-gram term weighting algorithm for extracting local motif patterns, and integrated neural networks for combining global and local sequence information. The system has been tested with three protein families of electron transferases, namely cytochrome c, cytochrome b and flavodoxin, with a 100% sensitivity and more than 99.6% specificity. The accuracy of MOTIFIND is comparable to the BLAST database search method, but its speed is more than 20 times faster. The system is much more robust than the PROSITE search which is based on simple signature patterns. MOTIFIND also compares favorably with the BLIMPS search of BLOCKS in detecting fragmentary sequences lacking complete motif regions. The method has the potential to become a full-scale database search and sequence analysis tool.

Introduction

As technology improves and molecular sequencing data accumulate nearly exponentially, progress in the Human Genome Project will depend increasingly on the development of advanced computational tools for rapid and accurate annotation of genomic sequences. Currently, a database search for sequence similarities is the most direct computational means of deciphering codes that connect molecular sequences with protein structure and function [Doolittle, 1990]. There are good algorithm
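
The n-gram encoding mentioned in the abstract can be illustrated with a minimal sketch. This excerpt does not describe the term weighting algorithm itself, so the code below only counts overlapping n-grams in a sequence and converts the counts to relative frequencies. The function names `ngram_counts` and `ngram_frequencies`, the default `n=2`, and the input string are all hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams (substrings of length n) in a sequence.

    Assumption: plain sliding-window counting. The weighting scheme used
    by MOTIFIND is not given in this excerpt and is not reproduced here.
    """
    seq = sequence.upper()
    # range() is empty when the sequence is shorter than n, so the
    # result is an empty Counter in that case.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_frequencies(sequence, n=2):
    """Convert n-gram counts to relative frequencies that sum to 1."""
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy input, chosen only so that n-grams repeat.
    toy_sequence = "ACACA"
    print(ngram_counts(toy_sequence))
    print(ngram_frequencies(toy_sequence))
```

A vector of such counts or frequencies over a fixed n-gram vocabulary is one common way to give a neural network a fixed-length input for sequences of varying length. How local motif n-grams are weighted before being combined with global information is the part that would have to come from the full paper.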
