Error: not enough free space for shared memory

Typical error output:

Error: not enough free space for shared memory: need 11954331648, have 6760333312
KID0: Process received SIGBUS. Most likely cause: disk or shared memory full.
KID1: Process received SIGBUS. Most likely cause: disk or shared memory full.
Process received SIGBUS. Most likely cause: disk or shared memory full.

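Cause: the disk is out of space. On the node the job was submitted to, run the df command and check whether any entry in the Use% column has reached about 99%. That column shows how much of each disk partition is in use; 99% means the partition is effectively full, with no space left. Clean up the data on that partition.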
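The check described above can also be scripted. The following is only a minimal sketch, not part of the program that produced the error: it reads the "need" and "have" byte counts from the error line and compares the requested size against the free space on a few mount points. The list CANDIDATE_PATHS is a hypothetical choice (adjust it to the partitions df reports on your node), and the percentage is computed as used/total, which can differ slightly from the Use% value printed by df.

```python
import os
import re
import shutil

# Hypothetical mount points to inspect; replace with the ones df lists on the node.
CANDIDATE_PATHS = ["/", "/tmp", "/dev/shm"]


def parse_need_have(line):
    """Extract (need, have) byte counts from the 'not enough free space' line."""
    match = re.search(r"need\s+(\d+),\s*have\s+(\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def report(paths, need_bytes):
    """Return (path, used_percent, free_bytes, enough) for every path that exists."""
    rows = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip mount points that are absent on this machine
        usage = shutil.disk_usage(path)
        used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
        rows.append((path, used_percent, usage.free, usage.free >= need_bytes))
    return rows


if __name__ == "__main__":
    line = ("Error: not enough free space for shared memory: "
            "need 11954331648, have 6760333312")
    need, have = parse_need_have(line)
    print(f"need {need / 2**30:.1f} GiB, have {have / 2**30:.1f} GiB")
    for path, pct, free, enough in report(CANDIDATE_PATHS, need):
        status = "ok" if enough else "TOO SMALL"
        print(f"{path:12s} used {pct:5.1f}%  free {free / 2**30:8.1f} GiB  {status}")
```

Run it on the node where the job executed; any line marked TOO SMALL points at a partition that cannot hold the requested amount.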

The node information is printed near the top of the job's *.out file, for example:

 Parallel Execution: Process Information
 ==============================================================================
 Rank   Node Name                              NodeID   MyNodeRank  NodeMaster
    0   node08                                    0          0          0
    1   node08                                    0          1         -1
    2   node08                                    0          2         -1
    3   node08                                    0          3         -1
    4   node08                                    0          4         -1
    5   node08                                    0          5         -1
    6   node08                                    0          6         -1
    7   node08                                    0          7         -1
    8   node08                                    0          8         -1
    9   node08                                    0          9         -1
   10   node08                                    0         10         -1
   11   node08                                    0         11         -1
   12   node08                                    0         12         -1
   13   node08                                    0         13         -1
   14   node08                                    1          0          1
   15   node08                                    1          1         -1
   16   node08                                    1          2         -1
   17   node08                                    1          3         -1
   18   node08                                    1          4         -1
   19   node08                                    1          5         -1
   20   node08                                    1          6         -1
   21   node08                                    1          7         -1
   22   node08                                    1          8         -1
   23   node08                                    1          9         -1
   24   node08                                    1         10         -1
   25   node08                                    1         11         -1
   26   node08                                    1         12         -1
   27   node08                                    1         13         -1
 ==============================================================================

This shows that the job was submitted to and ran on node08, using 2 of its CPUs with 14 cores each.
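To find the node without reading the table by eye, the "Process Information" block can be parsed with a few lines of Python. This is a rough sketch that assumes the five-column layout shown above; the function names and the shortened SAMPLE text are illustrative only. It counts ranks per node name and per (node name, NodeID) pair, which in the example above corresponds to the two groups of 14 processes.

```python
from collections import Counter

# Shortened, illustrative excerpt in the same layout as the table in the *.out file.
SAMPLE = """
 Parallel Execution: Process Information
 ==============================================================================
 Rank   Node Name                              NodeID   MyNodeRank  NodeMaster
    0   node08                                    0          0          0
    1   node08                                    0          1         -1
    2   node08                                    1          0          1
    3   node08                                    1          1         -1
 ==============================================================================
"""


def parse_process_table(text):
    """Return (rank, node_name, node_id, my_node_rank, node_master) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # titles, separators, and the header do not have 5 fields
        rank, node, node_id, my_rank, master = parts
        try:
            rows.append((int(rank), node, int(node_id), int(my_rank), int(master)))
        except ValueError:
            continue  # a 5-field line that is not a data row
    return rows


def summarize(rows):
    """Count ranks per node name and per (node name, NodeID) group."""
    per_node = Counter(node for _, node, _, _, _ in rows)
    per_group = Counter((node, node_id) for _, node, node_id, _, _ in rows)
    return per_node, per_group


if __name__ == "__main__":
    per_node, per_group = summarize(parse_process_table(SAMPLE))
    for node, count in per_node.items():
        print(f"{node}: {count} ranks")
    for (node, node_id), count in sorted(per_group.items()):
        print(f"  {node} NodeID {node_id}: {count} ranks")
```

The node names it prints are the machines on which to run df.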