打算这几天上手MPI,开始学习MPI并行编程,目前的主要目的是能够快速使用MPI的API将现有程序并行化,例子代码均用C语言实现。

这篇主要记录使用自己用户在实验室的IBM刀片集群上搭建MPI并行环境以及简单的测试,具体什么是MPI这里就不多讲了,主要记录自己的MPI环境的搭建过程。

下载MPICH

现在MPICH官网上已经有MPICH3.2,但是安装了以后发现3.2的进程管理器并没有mpd而是Hydra Process Manager为了用上最新的我也就直接上最新版了肯定不会有错。

下下来的压缩包为: mpich-3.2.tar.gz

编译和安装

解压

1
2
tar zxvf mpich-3.2.tar.gz
cd mpich-3.2

编译

MPICH提供了很多的配置选项,可以通过如下命令查看:

1
./configure -h

使用默认的进程管理器以及支持调式,参数设置为:

1
2
3
./configure CC=gcc CXX=g++ --prefix=/data/home/zjshao/hpc/mpich-install --enable-debuginfo --enable-fast=all 2>&1 | tee c.txt
make
make install

然后就要经过漫长的编译过程 - -!。

设置环境变量:

1
export PATH=/data/home/zjshao/hpc/mpich-install:$PATH

配置和验证

MPI进程的创建、启动和管理需要借助进程管理器来完成,直观的讲PM就是MPI环境与操作系统的接口。
正常MPI会在多个主机上启动进程管理器进程,形成MPI运行时的环境,这样目标主机就需要配置集群环境支持节点间无密码登录,这里的原理主要是通过密钥进行身份验证的方式,由于实验室的集群上已经对这里进行了配置,我就不再进行配置了。

在安装目录下面的/example中有一个数值求pi的程序cpi可以帮助我们来运行验证MPI环境是否正确的安装和运行。

1
2
[zjshao@master examples]$ which mpiexec
~/hpc/mpich-install/bin/mpiexec

测试在主节点进行并行

1
2
3
4
5
6
7
[zjshao@master examples]$ mpiexec -n 4 ./cpi
Process 0 of 4 is on master.cluster
Process 1 of 4 is on master.cluster
Process 2 of 4 is on master.cluster
Process 3 of 4 is on master.cluster
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000315

在计算节点node01上运行6个进程

1
2
3
4
5
6
7
[zjshao@master examples]$ mpiexec -host node01 -n 4 ./cpi
Process 0 of 4 is on node01
Process 2 of 4 is on node01
Process 3 of 4 is on node01
Process 1 of 4 is on node01
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.000119

接下来通过文件制定环境中节点和进程信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
[zjshao@master examples]$ cat > hosts <<!
> node01:2
> node02:4
> node03:6
> !
[zjshao@master examples]$ mpiexec -machinefile hosts -n 8 cpi
Process 1 of 8 is on node01
Process 6 of 8 is on node03
Process 0 of 8 is on node01
Process 7 of 8 is on node03
Process 5 of 8 is on node02
Process 2 of 8 is on node02
Process 4 of 8 is on node02
Process 3 of 8 is on node02
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.003747

查看mpi编译器信息

1
2
3
4
5
6
7
8
9
[zjshao@master examples]$ mpichversion
MPICH Version: 3.2
MPICH Release date: Wed Nov 11 22:06:48 CST 2015
MPICH Device: ch3:nemesis
MPICH configure: CC=gcc CXX=g++ --prefix=/data/home/zjshao/hpc/mpich-install --enable-debuginfo --enable-fast=all
MPICH CC: gcc -DNDEBUG -DNVALGRIND -O2
MPICH CXX: g++ -DNDEBUG -DNVALGRIND -O2
MPICH F77: ifort -O2
MPICH FC: ifort -O2

其实看一下所谓的mpiccmpicxx等,其实是一个shell的脚本,他会根据在编译前配置时候的信息自动生成,包括编译器,编译参数,头文件地址和库文件路径等。
例如mpicxx中:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
...
# Directory locations: Fixed for any MPI implementation
prefix=/data/home/zjshao/hpc/mpich-install
exec_prefix=/data/home/zjshao/hpc/mpich-install
sysconfdir=/data/home/zjshao/hpc/mpich-install/etc
includedir=/data/home/zjshao/hpc/mpich-install/include
libdir=/data/home/zjshao/hpc/mpich-install/lib
# Default settings for compiler, flags, and libraries
CXX="g++"
MPICH_VERSION="3.2"
enable_wrapper_rpath="yes"
# How to pass a linker flag through the compiler.
wl="-Wl,"
# Static library suffix (normally "a").
libext="a"
# Shared library suffix (normally "so").
shlibext="so"
# Format of library name prefix.
libname_spec="lib\$name"
...
...

同理mpif90, mpicc都是。


好了,目前MPI的环境基本上已经可以正确的运行啦。

Comments