稀疏矩阵向量乘的访存分析和优化-USB迷|专注于互联网分享

2024年3月6日发(作者：苑建木)

稀疏矩阵向量乘的访存分析和优化

Memory Accessing Analysis of Sparse Matrix Vector Multiplication and

Optimization

Abstract: Sparse Matrix Vector multiplication (SpMV) is one of the most important kernel in computing science.

Both theoretical analysis and practices demonstrate that SpMV is a memory-intensive application. However

state-of-the-art compilers cannot make full use of the modern processors’ characters, the usage ratio of SpMV

bandwidth is only about 10%. In this paper, based on the memory access characters of current processors, we

optimize memory access in SpMV instruction pipeline using SIMD instruction. The experimental results show that

there are 63% and 89% performance improvement on Intel Nehalem and Sandy Bridge, 30% and 36%

improvement on AMD Opteron 6168 and Opteron 8374 compared with the standard SpMV implementation. As for

matrices in real applications, there are 10% performance gained on Nehalem and Sandy Bridge, nevertheless little

performance gain is achieved on the two AMD Opteron platforms.

Key words:

sparse matrix vector multiplication; SpMV; pipeline; SIMD; Memory Access

摘要: 稀疏矩阵向量乘(SpMV)是科学计算中最重要的核心算法之一。理论分析和实际测试结果都表明，SpMV属于访存密集型应用。由于目前主流编译器尚不能充分利用现代处理器的访存特性，SpMV对带宽利用率仅为10%。本文通过探索现代处理器的访存特征，使用单指令流多数据流（SIMD）对SpMV进行流水线访存优化。实验表明与标准SpMV相比，优化后的SpMV在Intel Nehalem和Sandy Bridge架构上流水线性能分别有63%和89%的提升，在AMD Opteron 6168 和Opteron 8374 HE上分别有30%和36%的提升。SpMV在实际矩阵进行的性能测试中，Intel Nehalem和Sandy Bridge架构上均有10%的性能提高，而在两个AMD Opteron平台上性能提高不明显。

关键字：稀疏矩阵向量乘; SpMV; 流水线; SIMD; 访存

中图法分类号:

1 引言

很多科学和工程问题，如物理模拟(粒子碰撞，核爆炸模拟)、医学成像，通过建模，最终可以转化成线性方程组求解问题。由于计算时间的要求，大规模线性方程组往往采用迭代法求解，如Gauss-Seidel 迭代、Jacobi迭代、代数多重网格等。实际工程中，求解问题的矩阵规模一般很大、耗时较长，因此很有必要对其进行优化以提高工程效率。迭代方法的核心操作是矩阵向量乘（y