RELION Troubleshooting
Motion Correction
ERROR: TIFF support was not enabled during compilation
Symptoms
2019/5/14, v3.0.5, built from source on Ubuntu 16.04.6 LTS
While running motion correction in the RELION tutorial, using RELION's own implementation of MotionCor2 produced the error in the title.
Fix
- The libtiff development package needs to be installed.
- There was already an issue about this: https://github.com/3dem/relion/issues/383
- It is also mentioned in the README of relion.git: https://github.com/3dem/relion
- I overlooked it because it was not listed in the `sudo apt install ...` line. Lesson: read the manual properly.
# Move to the RELION root directory (the one containing cmake, src, etc.)
$ rm -r build/ install/
$ sudo apt install libtiff5-dev
$ mkdir build/ install/
$ cd build && cmake -DCMAKE_INSTALL_PREFIX=../install ..
$ make -j10
$ make install
(Normally the above alone should be enough to build RELION linked against libtiff. However, in environments where, say, EMAN2 was built with miniconda or anaconda, you may hit the painful phenomenon of miniconda hijacking CMake's search path.)
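If that happens, one possible workaround (a sketch, not verified here) is to deactivate conda while configuring and point CMake's FindTIFF module at the system libtiff explicitly. TIFF_INCLUDE_DIR and TIFF_LIBRARY are the standard FindTIFF cache variables; the library path below assumes a 64-bit Ubuntu layout.

# Keep conda out of the environment while configuring (if a conda env is active)
$ conda deactivate
# Tell FindTIFF exactly where the system libtiff lives
$ cd build && cmake -DCMAKE_INSTALL_PREFIX=../install \
      -DTIFF_INCLUDE_DIR=/usr/include \
      -DTIFF_LIBRARY=/usr/lib/x86_64-linux-gnu/libtiff.so ..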
2D classification
[[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 129
Symptoms
2019/5/21, v3.0.5, built from source on Ubuntu 16.04.6 LTS
Running the first 2D classification in the RELION 3 tutorial failed with an MPI-related error and aborted.
The full error text is below. DL-Box is the hostname of the machine.
[DL-Box:00925] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 129
[DL-Box:00926] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 129
[DL-Box:00927] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 129
[DL-Box:00928] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 129
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[DL-Box:926] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[DL-Box:925] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[DL-Box:928] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[DL-Box:927] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[DL-Box:00929] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 129
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[DL-Box:929] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Of the job parameters, the ones that might be relevant to MPI (including I/O and the like) are:
Combine iterations through disc?     == No
Use parallel disc I/O?               == Yes
Pre-read all particles into RAM?     == Yes
Which GPUs to use:                   == 0:1:2:3
Minimum dedicated cores per node:    == 1
Number of MPI procs:                 == 5
Number of pooled particles:          == 30
Number of threads:                   == 3
Copy particles to scratch directory: ==
Use GPU acceleration?                == Yes
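For reference, a rough sketch of the kind of command these settings translate into: RELION launches relion_refine_mpi through mpirun, so "Number of MPI procs == 5" becomes -n 5 and "Number of threads == 3" becomes --j 3. The input/output paths, --K, and --iter below are placeholders, not values from the actual job.

# Hypothetical reconstruction of the generated command:
#   --dont_combine_weights_via_disc <- "Combine iterations through disc? == No"
#   --preread_images                <- "Pre-read all particles into RAM? == Yes"
$ mpirun -n 5 relion_refine_mpi \
    --i Select/job005/particles.star --o Class2D/job006/run \
    --K 50 --iter 25 --ctf --dont_combine_weights_via_disc \
    --preread_images --pool 30 --j 3 --gpu "0:1:2:3"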
Fix
Try setting Use parallel disc I/O to No
- No change.
Checking MPI
$ mpirun --version
mpirun (Open MPI) 2.0.2
$ mpirun --np 5 --cpus-per-proc 3 ls
--------------------------------------------------------------------------
The following command line options and corresponding MCA parameter have
been deprecated and replaced as follows:

  Command line options:
    Deprecated:  --cpus-per-proc, -cpus-per-proc, --cpus-per-rank, -cpus-per-rank
    Replacement: --map-by <obj>:PE=N, default <obj>=NUMA

  Equivalent MCA parameter:
    Deprecated:  rmaps_base_cpus_per_proc
    Replacement: rmaps_base_mapping_policy=<obj>:PE=N, default <obj>=NUMA

The deprecated forms *will* disappear in a future version of Open MPI.
Please update to the new syntax.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  DL-Box

Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.

On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).

On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.

If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then you should contact the hwloc maintainers:
https://github.com/open-mpi/hwloc.

This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     CORE:IF-SUPPORTED
   Node:        DL-Box
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------
...?
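For what it's worth, the deprecation notice above spells out the replacement syntax (--map-by <obj>:PE=N, with NUMA as the default object), so the non-deprecated equivalent of the command should presumably look something like this (untested here):

$ mpirun --np 5 --map-by numa:PE=3 ls
# And if the core-oversubscription guard still trips, the last warning
# suggests adding the "overload-allowed" qualifier to the binding directive:
$ mpirun --np 5 --map-by numa:PE=3 --bind-to core:overload-allowed ls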
Run without any options, it executes in parallel just fine
$ mpirun echo 'hello'
hello
hello
hello
hello
hello
hello
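So the launcher itself works. The next thing worth checking (a hypothesis, not confirmed here) is whether RELION was linked against a different MPI installation than the mpirun on PATH; an MPI_Init failure before the communicator even exists is a classic symptom of mixing MPI implementations. A minimal check, assuming a standard install layout:

$ which mpirun                                 # which launcher is on PATH
$ ldd `which relion_refine_mpi` | grep -i mpi  # which libmpi RELION links against
$ mpicc --showme                               # what the Open MPI compiler wrapper uses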