Intel Threading Building Blocks

Summary

In this post, I will show how to solve a parallel computation task using Intel Threading Building Blocks.

Problem

On our deep learning platform, given an input set of several thousand images, we want to analyze the data path of a certain deep learning model. The analysis is identical for every image; for example, we want to collect the minimum and maximum values of a certain layer in the model. Say we only care about the output of the last Softmax layer, which is basically a 1-D array. The problem is then to collect the minimum and maximum values (two floats) of that layer over all the input images.

Settings

  1. The deep learning platform runs on a compiler designed in-house, which only supports C++. Thus, our solution has to be based on C++.
  2. Say that we have 5,000 images.
  3. The model is quite complex: it takes about 2 minutes to produce the output for one image. Processing the images sequentially, one by one, would therefore take about $2 \times 5{,}000 / 60 \approx 167 \text{ hours} \approx 7 \text{ days}$. We cannot afford to wait that long.

Research

Since every image is processed in the same way, the natural idea is to run the computations in parallel on multiple threads. Creating threads is easy in modern C++ thanks to the standard thread library (std::thread, available since C++11). However, it makes no sense to create 5,000 threads: the overhead of creating and destroying them would ruin the performance (see this reference).
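To put a rough number on the potential gain: assuming, purely for illustration, that the images are independent and that the work scales perfectly over $p$ worker threads (real runs will be slower because of memory and scheduling overhead), the sequential estimate from the Settings section becomes

$$T_p \approx \frac{2 \times 5{,}000}{60 \, p} \text{ hours}, \qquad \text{e.g. } p = 16: \; T_{16} \approx \frac{167}{16} \approx 10.4 \text{ hours}.$$

The value $p = 16$ is a hypothetical core count chosen for the example, not a figure from our setup.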

We could maintain a thread pool manually: first create $MAX_THREADS$ threads (a typical value is the number of cores), and every time a thread finishes, start a new one until all the images have been processed.
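The manual approach could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code we ended up with, and the function name run_on_fixed_threads is hypothetical. Note one deliberate difference from the description above: instead of starting a new thread each time one finishes, each thread in a fixed set keeps claiming the next unprocessed index from a shared atomic counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs work(i) for every i in [0, num_items) on a fixed number of threads.
// Each worker repeatedly claims the next unprocessed index from a shared
// atomic counter, so exactly num_threads threads are created in total.
inline void run_on_fixed_threads(std::size_t num_items, unsigned num_threads,
                                 const std::function<void(std::size_t)>& work) {
    assert(num_threads > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (;;) {
                // fetch_add hands every index to exactly one worker
                const std::size_t i = next.fetch_add(1);
                if (i >= num_items) break;
                work(i);
            }
        });
    }
    // wait until every item has been processed
    for (auto& w : workers) w.join();
}
```

A caller would pass something like the number of images, a thread count, and a callback that analyzes image i. Two caveats for this sketch: std::thread::hardware_concurrency() may return 0, so it needs a fallback before being used as the thread count, and an exception escaping from work would terminate the program. Handling such details is part of what a library does for you.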

It is not a good idea to reinvent the wheel. Two of the most popular libraries that implement thread pools are Intel Threading Building Blocks (TBB) and Microsoft Parallel Patterns Library (PPL).

After some research, I found that PPL is not well suited to multi-platform development, so we chose TBB.

Install and Build

I ran into a lot of problems before I got it to work. Here I only present the approach that succeeded, for you and for my own later reference.

  1. Install: the easiest way is to use apt-get.
    sudo apt-get install libtbb-dev

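    This installs the TBB headers and libraries into the standard system locations (/usr/include and /usr/lib/x86_64-linux-gnu in the CMake output shown later in this post), which is what lets find_package in CMake locate them.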

    Try not to build TBB from source: there is no make install target, and the scripts in TBB that set up the paths do not work well.

  2. CMake: to include the TBB headers and link against its .so files, you could write a well-defined Makefile by hand. We use CMake instead, which is more readable. However, for find_package to work, we need one more file, the FindTBB.cmake module; download it and place it in the project source directory. Then add the following to your CMakeLists.txt. The code is self-explanatory.
    ###############
    # Add FindTBB.cmake path file to the module path
    list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/")
    
    # Set RPATHS in executables
    set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib")
    
    # ==============================================================================
    #
    # Print input variables used by FindTBB.cmake
    
    message("CMAKE_SYSTEM_NAME = '${CMAKE_SYSTEM_NAME}'")
    message("CMAKE_BUILD_TYPE  = '${CMAKE_BUILD_TYPE}'")
    
    message("User Input Variables:")
    message("TBB_ROOT_DIR = '${TBB_ROOT_DIR}'")
    message("TBB_INCLUDE_DIR = '${TBB_INCLUDE_DIR}'")
    message("TBB_LIBRARY = '${TBB_LIBRARY}'")
    message("TBB_tbb_LIBRARY = '${TBB_tbb_LIBRARY}'")
    message("TBB_tbb_debug_LIBRARY = '${TBB_tbb_debug_LIBRARY}'")
    message("TBB_tbbmalloc_LIBRARY = '${TBB_tbbmalloc_LIBRARY}'")
    message("TBB_tbbmalloc_debug_LIBRARY = '${TBB_tbbmalloc_debug_LIBRARY}'")
    message("TBB_tbb_preview_LIBRARY = '${TBB_tbb_preview_LIBRARY}'")
    message("TBB_tbb_preview_debug_LIBRARY = '${TBB_tbb_preview_debug_LIBRARY}'")
    message("TBB_USE_DEBUG_BUILD = '${TBB_USE_DEBUG_BUILD}'")
    
    message("Environment Varaibles used by FindTBB:")
    message("TBB_INSTALL_DIR = '${TBB_INSTALL_DIR}'")
    message("TBBROOT         = '${TBBROOT}'")
    message("LIBRARY_PATH    = '${LIBRARY_PATH}'")
    
    #find_package(TBB COMPONENTS tbbmalloc tbbmalloc_proxy tbb_preview)
    find_package(TBB)
    
    # ==============================================================================
    # Print output variables from FindTBB.cmake
    
    set(TBB_SEARCH_COMPOMPONENTS tbb_preview tbbmalloc_proxy tbbmalloc tbb)
    
    message("FindTBB Result Variables:")
    message("TBB_FOUND = '${TBB_FOUND}'")
    message("TBB_tbbmalloc_FOUND = '${TBB_tbbmalloc_FOUND}'")
    message("TBB_tbbmalloc_proxy_FOUND = '${TBB_tbbmalloc_proxy_FOUND}'")
    message("TBB_tbb_preview_FOUND = '${TBB_tbb_preview_FOUND}'")
    message("TBB_VERSION = '${TBB_VERSION}'")
    message("TBB_VERSION_MAJOR = '${TBB_VERSION_MAJOR}'")
    message("TBB_VERSION_MINOR = '${TBB_VERSION_MINOR}'")
    message("TBB_INTERFACE_VERSION = '${TBB_INTERFACE_VERSION}'")
    foreach(_comp ${TBB_SEARCH_COMPOMPONENTS})
       message("TBB_${_comp}_LIBRARY_RELEASE = '${TBB_${_comp}_LIBRARY_RELEASE}'")
       message("TBB_${_comp}_LIBRARY_DEBUG = '${TBB_${_comp}_LIBRARY_DEBUG}'")
       message("TBB_${_comp}_LIBRARY = '${TBB_${_comp}_LIBRARY}'")
    endforeach()
    
    message("FindTBB Output Variables:")
    message("TBB_INCLUDE_DIRS = '${TBB_INCLUDE_DIRS}'")
    message("TBB_LIBRARIES_RELEASE = '${TBB_LIBRARIES_RELEASE}'")
    message("TBB_LIBRARIES_DEBUG = '${TBB_LIBRARIES_DEBUG}'")
    message("TBB_LIBRARIES = '${TBB_LIBRARIES}'")
    message("TBB_DEFINITIONS = '${TBB_DEFINITIONS}'")
    
    message("TBB_INCLUDE_DIRS = '${TBB_INCLUDE_DIRS}'")
    message("TBB_LIBRARIES_RELEASE = '${TBB_LIBRARIES_RELEASE}'")
    message("TBB_LIBRARIES_DEBUG = '${TBB_LIBRARIES_DEBUG}'")
    message("TBB_LIBRARIES = '${TBB_LIBRARIES}'")
    message("TBB_DEFINITIONS = '${TBB_DEFINITIONS}'")

    A successful CMake run prints output like the following. The variables to use for your own target are TBB_INCLUDE_DIRS and TBB_LIBRARIES.

    CMAKE_SYSTEM_NAME = 'Linux'
    CMAKE_BUILD_TYPE  = 'Debug'
    User Input Variables:
    TBB_ROOT_DIR = ''
    TBB_INCLUDE_DIR = ''
    TBB_LIBRARY = ''
    TBB_tbb_LIBRARY = ''
    TBB_tbb_debug_LIBRARY = ''
    TBB_tbbmalloc_LIBRARY = ''
    TBB_tbbmalloc_debug_LIBRARY = ''
    TBB_tbb_preview_LIBRARY = ''
    TBB_tbb_preview_debug_LIBRARY = ''
    TBB_USE_DEBUG_BUILD = ''
    Environment Varaibles used by FindTBB:
    TBB_INSTALL_DIR = ''
    TBBROOT         = ''
    LIBRARY_PATH    = ''
    FindTBB Result Variables:
    TBB_FOUND = 'TRUE'
    TBB_tbbmalloc_FOUND = ''
    TBB_tbbmalloc_proxy_FOUND = ''
    TBB_tbb_preview_FOUND = ''
    TBB_VERSION = '4.4'
    TBB_VERSION_MAJOR = '4'
    TBB_VERSION_MINOR = '4'
    TBB_INTERFACE_VERSION = '9002'
    TBB_tbb_preview_LIBRARY_RELEASE = ''
    TBB_tbb_preview_LIBRARY_DEBUG = ''
    TBB_tbb_preview_LIBRARY = ''
    TBB_tbbmalloc_proxy_LIBRARY_RELEASE = ''
    TBB_tbbmalloc_proxy_LIBRARY_DEBUG = ''
    TBB_tbbmalloc_proxy_LIBRARY = ''
    TBB_tbbmalloc_LIBRARY_RELEASE = ''
    TBB_tbbmalloc_LIBRARY_DEBUG = ''
    TBB_tbbmalloc_LIBRARY = ''
    TBB_tbb_LIBRARY_RELEASE = '/usr/lib/x86_64-linux-gnu/libtbb.so'
    TBB_tbb_LIBRARY_DEBUG = 'TBB_tbb_LIBRARY_DEBUG-NOTFOUND'
    TBB_tbb_LIBRARY = ''
    FindTBB Output Variables:
    TBB_INCLUDE_DIRS = '/usr/include'
    TBB_LIBRARIES_RELEASE = '/usr/lib/x86_64-linux-gnu/libtbb.so'
    TBB_LIBRARIES_DEBUG = ''
    TBB_LIBRARIES = '/usr/lib/x86_64-linux-gnu/libtbb.so'
    TBB_DEFINITIONS = ''
    TBB_INCLUDE_DIRS = '/usr/include'
    TBB_LIBRARIES_RELEASE = '/usr/lib/x86_64-linux-gnu/libtbb.so'
    TBB_LIBRARIES_DEBUG = ''
    TBB_LIBRARIES = '/usr/lib/x86_64-linux-gnu/libtbb.so'
    TBB_DEFINITIONS = ''

Usage Example

TBB has good documentation. Here we use parallel_for_each. Note that when updating the global analysis we must avoid a race condition (here I use a mutex).

#include <iostream>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

#include <tbb/parallel_for_each.h>

using namespace std;

// datapath_analysis_new::Layer, analyze_one_file and update are defined
// elsewhere in the project and are not shown in this post

// use global variables to simplify the code
unordered_map<string, datapath_analysis_new::Layer> layer_name_to_all_files_analysis;
mutex all_files_analysis_mutex;

void safe_analysis_update(unordered_map<string, datapath_analysis_new::Layer> layer_name_to_one_file_analysis) {
    // while a thread is updating all_files_analysis, other threads wait here;
    // the lock is released automatically when `lock` goes out of scope
    lock_guard<mutex> lock(all_files_analysis_mutex);
    // now we can update all_files_analysis safely
    if (layer_name_to_all_files_analysis.empty()) {
        // it is the first file
        layer_name_to_all_files_analysis = layer_name_to_one_file_analysis;
    } else {
        update(layer_name_to_all_files_analysis, layer_name_to_one_file_analysis);
    }
}

int main() {
    // paths of the input images; how the list is filled is omitted here
    vector<string> all_inputs;
    tbb::parallel_for_each(all_inputs.begin(), all_inputs.end(), [&](const string &input_file) {
        try {
            auto layer_name_to_one_file_analysis = analyze_one_file(input_file);
            safe_analysis_update(layer_name_to_one_file_analysis);
        } catch (const std::exception &e) {
            // report the failure and carry on with the remaining files
            cerr << e.what() << endl;
        }
    });
    return 0;
}
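The update helper and the Layer type are not shown above. As a rough illustration of the merging step, here is one plausible shape for it using only the standard library. All names here (MinMax, min_max_of, merge_into, SharedAnalysis) are hypothetical stand-ins, not the project's actual types.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for datapath_analysis_new::Layer: the smallest and
// largest value seen so far in one layer's output.
struct MinMax {
    float min = std::numeric_limits<float>::infinity();
    float max = -std::numeric_limits<float>::infinity();
};

// Min and max of one layer output (for example the 1-D Softmax array).
inline MinMax min_max_of(const std::vector<float>& values) {
    assert(!values.empty());
    MinMax r;
    for (float v : values) {
        r.min = std::min(r.min, v);
        r.max = std::max(r.max, v);
    }
    return r;
}

using Analysis = std::unordered_map<std::string, MinMax>;

// Folds the result of one file into the running totals, layer by layer.
// operator[] default-constructs a MinMax for a layer seen for the first
// time, so an empty `all` needs no special case.
inline void merge_into(Analysis& all, const Analysis& one) {
    for (const auto& kv : one) {
        MinMax& dst = all[kv.first];
        dst.min = std::min(dst.min, kv.second.min);
        dst.max = std::max(dst.max, kv.second.max);
    }
}

// Keeps the totals and the mutex together so that every access is locked.
class SharedAnalysis {
public:
    void update(const Analysis& one) {
        std::lock_guard<std::mutex> lock(mutex_);
        merge_into(all_, one);
    }
    Analysis snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return all_;
    }
private:
    mutable std::mutex mutex_;
    Analysis all_;
};
```

Because taking the minimum and maximum is associative and commutative, the order in which threads call update does not affect the final result. Wrapping the map and its mutex in one class is a design choice that avoids the global variables used above; the locking itself is the same idea.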
