最新消息: USBMI致力于为网友们分享Windows、安卓、IOS等主流手机系统相关的资讯以及评测、同时提供相关教程、应用、软件下载等服务。

稀疏矩阵向量乘的访存分析和优化

IT圈 admin 30浏览 0评论

2024年3月6日发(作者:苑建木)

稀疏矩阵向量乘的访存分析和优化

Memory Accessing Analysis of Sparse Matrix Vector Multiplication and

Optimization

Abstract: Sparse Matrix Vector multiplication (SpMV) is one of the most important kernel in computing science.

Both theoretical analysis and practices demonstrate that SpMV is a memory-intensive application. However

state-of-the-art compilers cannot make full use of the modern processors’ characters, the usage ratio of SpMV

bandwidth is only about 10%. In this paper, based on the memory access characters of current processors, we

optimize memory access in SpMV instruction pipeline using SIMD instruction. The experimental results show that

there are 63% and 89% performance improvement on Intel Nehalem and Sandy Bridge, 30% and 36%

improvement on AMD Opteron 6168 and Opteron 8374 compared with the standard SpMV implementation. As for

matrices in real applications, there are 10% performance gained on Nehalem and Sandy Bridge, nevertheless little

performance gain is achieved on the two AMD Opteron platforms.

Key words:

sparse matrix vector multiplication; SpMV; pipeline; SIMD; Memory Access

摘 要: 稀疏矩阵向量乘(SpMV)是科学计算中最重要的核心算法之一。理论分析和实际测试结果都表明,SpMV属于访存密集型应用。由于目前主流编译器尚不能充分利用现代处理器的访存特性,SpMV对带宽利用率仅为10%。本文通过探索现代处理器的访存特征,使用单指令流多数据流(SIMD)对SpMV进行流水线访存优化。实验表明与标准SpMV相比,优化后的SpMV在Intel Nehalem和Sandy Bridge架构上流水线性能分别有63%和89%的提升,在AMD Opteron 6168 和Opteron 8374 HE上分别有30%和36%的提升。SpMV在实际矩阵进行的性能测试中,Intel Nehalem和Sandy Bridge架构上均有10%的性能提高,而在两个AMD Opteron平台上性能提高不明显。

关键字:稀疏矩阵向量乘; SpMV; 流水线; SIMD; 访存

中图法分类号:

1 引言

很多科学和工程问题,如物理模拟(粒子碰撞,核爆炸模拟)、医学成像,通过建模,最终可以转化成线性方程组求解问题。由于计算时间的要求,大规模线性方程组往往采用迭代法求解,如Gauss-Seidel 迭代、Jacobi迭代、代数多重网格等。实际工程中,求解问题的矩阵规模一般很大、耗时较长,因此很有必要对其进行优化以提高工程效率。迭代方法的核心操作是矩阵向量乘(y

2024年3月6日发(作者:苑建木)

稀疏矩阵向量乘的访存分析和优化

Memory Accessing Analysis of Sparse Matrix Vector Multiplication and

Optimization

Abstract: Sparse Matrix Vector multiplication (SpMV) is one of the most important kernel in computing science.

Both theoretical analysis and practices demonstrate that SpMV is a memory-intensive application. However

state-of-the-art compilers cannot make full use of the modern processors’ characters, the usage ratio of SpMV

bandwidth is only about 10%. In this paper, based on the memory access characters of current processors, we

optimize memory access in SpMV instruction pipeline using SIMD instruction. The experimental results show that

there are 63% and 89% performance improvement on Intel Nehalem and Sandy Bridge, 30% and 36%

improvement on AMD Opteron 6168 and Opteron 8374 compared with the standard SpMV implementation. As for

matrices in real applications, there are 10% performance gained on Nehalem and Sandy Bridge, nevertheless little

performance gain is achieved on the two AMD Opteron platforms.

Key words:

sparse matrix vector multiplication; SpMV; pipeline; SIMD; Memory Access

摘 要: 稀疏矩阵向量乘(SpMV)是科学计算中最重要的核心算法之一。理论分析和实际测试结果都表明,SpMV属于访存密集型应用。由于目前主流编译器尚不能充分利用现代处理器的访存特性,SpMV对带宽利用率仅为10%。本文通过探索现代处理器的访存特征,使用单指令流多数据流(SIMD)对SpMV进行流水线访存优化。实验表明与标准SpMV相比,优化后的SpMV在Intel Nehalem和Sandy Bridge架构上流水线性能分别有63%和89%的提升,在AMD Opteron 6168 和Opteron 8374 HE上分别有30%和36%的提升。SpMV在实际矩阵进行的性能测试中,Intel Nehalem和Sandy Bridge架构上均有10%的性能提高,而在两个AMD Opteron平台上性能提高不明显。

关键字:稀疏矩阵向量乘; SpMV; 流水线; SIMD; 访存

中图法分类号:

1 引言

很多科学和工程问题,如物理模拟(粒子碰撞,核爆炸模拟)、医学成像,通过建模,最终可以转化成线性方程组求解问题。由于计算时间的要求,大规模线性方程组往往采用迭代法求解,如Gauss-Seidel 迭代、Jacobi迭代、代数多重网格等。实际工程中,求解问题的矩阵规模一般很大、耗时较长,因此很有必要对其进行优化以提高工程效率。迭代方法的核心操作是矩阵向量乘(y

发布评论

评论列表 (0)

  1. 暂无评论