Performance Libraries: Intel Math Kernel Library (MKL).ppt

Published 2017-02-16, Tianjin

Performance Libraries: Intel Math Kernel Library (MKL)

* Intel® Math Kernel Library Contents
Each of the BLAS routines comes in four data types: single- and double-precision real, and single- and double-precision complex. Almost all of the functions (with some exceptions) have identical functionality in each data type. The extended BLAS are a set of level 1 BLAS that support sparse data.

* Intel® Math Kernel Library Contents
Intel MKL's value-add to the LAPACK code includes a ready-built library (just building the LAPACK code takes some effort), threading of key portions of the functions, and optimization of key functions through the use of recursion.
The new Fourier transforms meet the needs of a far wider audience than did the previous radix-2 FFTs. This list shows key features. Optimization of the functions will continue for some time yet, but the complex transforms are well optimized for IPF-2 now.
VML and VSL offer improved performance over scalar implementations of the underlying functions, provided the user can vectorize the code.

* Roll Your Own/Dot Product
Roll Your Own: This is a simple, straightforward dot-product approach to matrix multiplication. Note that the innermost loop is a dot product and can therefore be replaced with a call to the dot product routine, which is shown in the second panel.

* DGEMV/DGEMM
The two innermost loops comprise a matrix-vector multiply, which can form the central operation of matrix multiplication.
DGEMV parameters: incx = 1; incy = ldb; alpha = 1.0; beta = 0.0; transa = t;
DGEMM parameters: alpha = 1.0; beta = 0.0;

* Intel® Math Kernel Library Optimizations in LAPACK*
Threading at higher levels (at the LAPACK factorization rather than at DGEMM, for instance) opens additional parallelization opportunities. The blocking strategy employed in traditional LAPACK can be extended to the factorization of the block columns to improve locality of reference and minimize vector operations. NETLIB LAPACK has numerous intrinsic function calls, which raises the need for run-time library support. All of these calls have been implemented within Intel MKL, so no run-time