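Master's Student Reading Report
Title: Research on Discretized Streams in Spark
Author name:
Student ID:
Advisor: Bei Yijun
Major: Big Data 1502
School: School of Software
Submission date: March 2016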
Research on Discretized Streams in Spark
A Dissertation Submitted to
Zhejiang University
in partial fulfillment of the requirements for
the degree of
Master of Engineering
Major Subject: Software Engineering
Advisor: Bei Yijun
By
Zhejiang University, P.R. China
2016
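Chinese Abstract
This report describes Spark from three angles. It first gives a brief introduction to Spark: its development history, the language it is implemented in, and why Spark rather than Hadoop is used for big data processing.
It then describes Spark's strengths and weaknesses. In comparison with Hadoop, Spark is reported to be as much as 100 times faster for iterative computation, and it offers a richer API than Hadoop; these are its strengths. Its weaknesses are that it does not support complex SQL statistics, that it consumes a large amount of memory, and that its stability still falls short.
Finally, it introduces discretized streams, Spark's new model for stream data processing, and covers three aspects in detail: how the model overcomes two challenges, its computation model, and its handling of timing. Together these three points give the reader a thorough understanding of the discretized stream model.
Keywords: Spark, iterative processing, discretized streams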
Abstract
This report describes Spark from three aspects. First, it gives a brief introduction to Spark, covering its development history, the language in which it is implemented, and the reasons for processing big data with Spark instead of Hadoop.
Second, it describes the advantages and disadvantages of Spark. Compared with Hadoop, Spark is reported to be as much as 100 times faster for iterative computation, and it also provides a richer API than Hadoop. These are its advantages. Its disadvantages are that it does not support complex SQL statistics, that its memory consumption is high, and that its stability is still inadequate.
Finally, the report introduces discretized streams, the new stream processing model in Spark. It discusses three points in detail: how the model overcomes two challenges, its computation model, and its treatment of timing. Together these give readers a thorough understanding of the discretized stream model.
Keywords: Spark, iterative processing, discretized streams
1 Introduction to Spark
Spark was open-sourced by the UC Berkeley AMP Lab