Saturday, April 21, 2018

Resolving Compiler Issues with XGBoost GPU Install on Amazon Linux

GPU-accelerated xgboost has shown performance improvements, especially on data sets with a large number of features, when using the 'gpu_hist' tree_method. More information can be found here:

http://xgboost.readthedocs.io/en/latest/gpu/
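
For reference, here is a minimal sketch of how the GPU algorithm might be selected from the Python package. Only the 'gpu_hist' value of tree_method comes from this post; the other parameter values, the random data and the train_on_gpu helper name are illustrative assumptions, and the training call is only expected to work on an xgboost build compiled with GPU support.

```python
# Parameters for GPU training. Only tree_method='gpu_hist' comes from this post;
# the objective and max_depth values are illustrative assumptions.
params = {
    "tree_method": "gpu_hist",
    "objective": "binary:logistic",
    "max_depth": 6,
}


def train_on_gpu(num_rows=1000, num_features=500, num_rounds=10):
    """Train a small model on random data (hypothetical helper)."""
    # Imported here so the parameter dict above can be inspected
    # on a machine that does not have xgboost installed yet.
    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(0)
    X = rng.rand(num_rows, num_features)      # many features, as in the use case above
    y = rng.randint(0, 2, size=num_rows)      # random binary labels
    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(params, dtrain, num_boost_round=num_rounds)
```

Calling train_on_gpu() after a successful GPU build is a quick way to confirm that the compiled library actually accepts 'gpu_hist'.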

Installation on Ubuntu-based distributions is straightforward. The best results are obtained with the latest generation of Nvidia GPUs (AWS P3).
http://xgboost.readthedocs.io/en/latest/build.html#building-with-gpu-support

However, when compiling on Amazon Linux (including the deep learning image), the following error appears at the "cmake" step:
cmake .. -DUSE_CUDA=ON
-- The C compiler identification is GNU 7.2.1
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES) (found version "1.0")
-- Found OpenMP_CXX: -fopenmp (found version "3.1") 
...

CMake Error at /home/ec2-user/anaconda3/envs/python3/share/cmake-3.9/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES) (found
  version "1.0")


This error can be frustrating to correct if you focus on troubleshooting OpenMP itself. It turns out that OpenMP support is "embedded" in the compilers (gcc and g++), so different compiler versions come with different versions of the OpenMP implementation. In this case, the cmake output reports OpenMP_C as version 1.0 and OpenMP_CXX as version 3.1. Xgboost GPU will not compile when the versions mismatch.
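
The version comparison described above can be automated. The following is a rough sketch, not part of the xgboost build, that scans saved cmake output for the OpenMP_C and OpenMP_CXX lines and reports whether their versions differ. The function names are made up for illustration, and the regular expression assumes the message format shown in the log above.

```python
import re

# Matches the version cmake reports for each OpenMP language component,
# e.g. '-- Found OpenMP_CXX: -fopenmp (found version "3.1")'.
# Assumes the message format shown in the log above.
_OPENMP_LINE = re.compile(r'OpenMP_(CXX|C)\b.*\(found version "([\d.]+)"\)')


def openmp_versions(cmake_output):
    """Map each language ('C', 'CXX') to the OpenMP version cmake reported."""
    versions = {}
    for line in cmake_output.splitlines():
        match = _OPENMP_LINE.search(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def openmp_mismatch(cmake_output):
    """True when both versions are reported and they differ."""
    versions = openmp_versions(cmake_output)
    return (
        "C" in versions
        and "CXX" in versions
        and versions["C"] != versions["CXX"]
    )
```

Feeding it the two OpenMP lines from the failing run yields a mismatch (1.0 versus 3.1), which points at the compilers rather than at OpenMP itself.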

The mismatch arises because Amazon Linux comes with mismatched C and C++ compiler versions:
$ gcc --version
gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
$ g++ --version
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
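
The same check can be scripted so it runs before any build. This is a minimal sketch that assumes the version banner looks like the output above (a parenthesised vendor tag followed by the version number); the helper names are hypothetical.

```python
import re
import subprocess


def parse_gcc_version(banner):
    """Extract '7.2.1' from 'gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)'."""
    lines = banner.strip().splitlines()
    first_line = lines[0] if lines else ""
    # The version is the first dotted number after the closing parenthesis.
    match = re.search(r"\)\s+(\d+(?:\.\d+)+)", first_line)
    if match is None:
        raise ValueError("unrecognized version banner: %r" % first_line)
    return match.group(1)


def compiler_version(command):
    """Run '<command> --version' and return the parsed version string."""
    banner = subprocess.check_output([command, "--version"]).decode()
    return parse_gcc_version(banner)


def versions_match(c_banner, cxx_banner):
    """True when the C and C++ compilers report the same version."""
    return parse_gcc_version(c_banner) == parse_gcc_version(cxx_banner)
```

On the machine itself, comparing compiler_version("gcc") with compiler_version("g++") gives the same answer as reading the two banners by eye.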

To fix this problem, remove the default older version of cmake, remove gcc (twice, since it has two packages), then reinstall gcc and gcc-c++:
sudo yum remove cmake
sudo yum remove gcc -y
sudo yum remove gcc -y
sudo yum install gcc48 gcc48-cpp -y
sudo yum install gcc-c++

Also reinstall cmake (using your preferred method):
conda install -c anaconda cmake

Now cmake finds matching versions, and the xgboost GPU compile can proceed:
-- Found OpenMP_C: -fopenmp (found version "3.1")
-- Found OpenMP_CXX: -fopenmp (found version "3.1")

The lesson learned here: Amazon Linux may not come perfectly set up. Check the compiler environment before installing new software, especially when GPUs and parallel processing are involved.