–33Νε-高性能计算-上海交通大学.doc

Published 2017-03-16 from Tianjin


A GPU Based Parallel Method For Dynamic Collision Grid DSMC

WEN Minhua1+, LIN Xinhua1, Simon Chong Wee See 1,2
1 (High Performance Computing Center, Shanghai Jiao Tong University, Shanghai 200240, China)
2 (NVIDIA Corporation)
+ Corresponding author: Phn: +86 E-mail: wenminhua@sjtu.edu.cn

Abstract: The Direct Simulation Monte Carlo (DSMC) method is an important computational tool in rarefied gas dynamics. However, it has two main shortcomings: complex grid processing and high computational cost. The dynamic collision grid DSMC method generates collision grids adaptively from the flow field, which addresses the first shortcoming. To address the second, we port the dynamic collision grid DSMC method to the GPU using CUDA. In our parallel implementation, the GPU performs the bulk of the computation, while the CPU handles only initialization and output. A two-dimensional benchmark problem, supersonic flow over a flat plate, is run at several sizes to verify the correctness of the parallel program. For every problem size, a speedup of more than 10x is achieved on an NVIDIA Fermi C2050. For the same case, performance on NVIDIA's newly released Kepler K20 is about 1.3-1.6 times that on the Fermi C2050.

Key words: CUDA, GPU, Dynamic Collision Grid DSMC, Parallel Simulation

CLC number: TP39　　　Document code: B

Introduction

The continuum assumption is commonly used to model gas flows. When the gas is highly rarefied, however, the discrete nature of the particles becomes significant and the continuum assumption no longer holds; methods from rarefied gas dynamics must be used to obtain correct results. The governing equation for rarefied gases is the Boltzmann equation, which in its standard form reads:

$$\frac{\partial (nf)}{\partial t} + \mathbf{c}\cdot\frac{\partial (nf)}{\partial \mathbf{r}} + \mathbf{F}\cdot\frac{\partial (nf)}{\partial \mathbf{c}} = \int_{-\infty}^{\infty}\int_{0}^{4\pi} n^{2}\left(f^{*}f_{1}^{*} - f f_{1}\right) c_{r}\,\sigma\,\mathrm{d}\Omega\,\mathrm{d}\mathbf{c}_{1} \qquad (1)$$

where $f$ is the velocity distribution function, $n$ the number density, $\mathbf{c}$ the molecular velocity, $\mathbf{r}$ the position, $\mathbf{F}$ the external force per unit mass, $c_{r}$ the relative speed of a colliding pair, and $\sigma\,\mathrm{d}\Omega$ the differential collision cross-section; the subscript 1 marks the collision partner and the superscript $*$ marks post-collision values. It is an integro-differential equation whose collision term on the right-hand side is extremely complex, and it has many independent variables (as many as seven), so an analytical solution is practically impossible for general problems. In addition, for