Bruhnspace OpenCV optimization project

Bruhnspace is engaged in research on optimized compute on embedded devices. This includes core software stacks such as the AMD ROCm high-performance compute package for AMD APUs. Computer vision and machine learning packages such as OpenCV and TensorFlow Lite are also of great interest. Our work is usually published in peer-reviewed scientific journals. Our latest co-authored and co-sponsored paper on artificial computing in space can be read here.

In this project our team has re-enabled support for AMD clBLAS and clFFT in OpenCV 4.1.2. More precisely, we have resurrected it: support has been broken since 2014, when AMD open-sourced the closed-source clAmdBlas and clAmdFft libraries and renamed both the libraries and the associated symbols. Dr. Harris Gasparakis from AMD wrote a paper about OpenCV support back in 2013: http://developer.amd.com/wordpress/media/2013/07/opencv-cl_instructions-246.pdf

Update

We are glad to see that our work was picked up and carried forward by Mr Joe Howse and merged into OpenCV version 4.5.3. Check out the merge request here.

Problem description

The problem is rather simple: OpenCV does not recognize the AMD BLAS and AMD FFT functions, even when the OpenCV build process detects the libraries and enables support.

Checking OpenCV support on Linux (Ubuntu 18.04 is used here as a reference) quickly demonstrates that the libraries are not detected, and they have not been for a long time on any operating system (OS) distribution that uses the new clFFT and clBLAS libraries. In this case the OpenCV OpenCL test is performed on an Ubuntu 18.04 machine with the Mesa/libclc/LLVM backend.

$ opencv_version --opencl
4.1.2
OpenCL Platforms:
    Clover
        iGPU: AMD KABINI (DRM 3.35.0, 5.4.44-050444-generic, LLVM 10.0.1) (OpenCL 1.2 Mesa 20.1.1 (git-127c2be9c5))
Current OpenCL device:
    Type = iGPU
    Name = AMD KABINI (DRM 3.35.0, 5.4.44-050444-generic, LLVM 10.0.1)
    Version = OpenCL 1.2 Mesa 20.1.1 (git-127c2be9c5)
    Driver version = 20.1.1
    Address bits = 64
    Compute units = 2
    Max work group size = 256
    Local memory size = 32 KB
    Max memory allocation size = 901 MB 477 KB 204 B
    Double support = Yes
    Host unified memory = Yes
    Device extensions:
        cl_khr_byte_addressable_store
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_int64_base_atomics
        cl_khr_int64_extended_atomics
        cl_khr_fp64
        cl_khr_fp16
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 2
    Preferred vector width float = 4
    Preferred vector width double = 2
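The two lines that matter in this report are "Has AMD Blas" and "Has AMD Fft". As a minimal sketch, assuming the report has been captured into a string, the check could be automated as below. The helper readYesNoField and the Support enum are hypothetical names introduced for illustration; they are not part of OpenCV.

```cpp
#include <sstream>
#include <string>

// Result of looking up a "<key> = Yes|No" line in the report.
enum class Support { Yes, No, Unknown };

// Scans the text printed by `opencv_version --opencl` for a line of the
// form "<key> = <value>" (leading indentation is ignored).
// Hypothetical helper, not an OpenCV API.
Support readYesNoField(const std::string& report, const std::string& key) {
    const std::string prefix = key + " = ";
    std::istringstream lines(report);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t start = line.find_first_not_of(" \t");
        if (start == std::string::npos) continue;  // blank line
        if (line.compare(start, prefix.size(), prefix) != 0) continue;

        // Take the value and strip trailing whitespace or a stray '\r'.
        std::string value = line.substr(start + prefix.size());
        const std::size_t end = value.find_last_not_of(" \t\r");
        value = (end == std::string::npos) ? "" : value.substr(0, end + 1);

        if (value == "Yes") return Support::Yes;
        if (value == "No") return Support::No;
        return Support::Unknown;  // key found, value not recognized
    }
    return Support::Unknown;  // key not present in the report
}
```

Calling readYesNoField(report, "Has AMD Blas") on the report above would yield Support::No, which is the symptom this project addresses.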

Analysis

A quick comparison of the old clAmdBlas and clAmdFft libraries with the new clBLAS and clFFT libraries shows a change in symbol names. OpenCV, or any other software that made use of the old libraries, must therefore be updated to use the new symbols.

BLAS
clAmdBlasSetup() -> clblasSetup()
etc.

FFT
clAmdFftSetup() -> clfftSetup()
etc.

The library names themselves have also changed. If this is not reflected in the code, the libraries cannot be loaded and are useless.

clAmdBlas library name = libclAmdBlas.so
clAmdFft library name = libclAmdFft.Runtime.so

but

clBLAS library name = libclBLAS.so
clFFT library name = libclFFT.so
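One way to cope with such a rename is to try the new file name first and fall back to the old one. The sketch below illustrates that idea only; it is an assumption for illustration and not what OpenCV or our patches do. The names LibraryLoader, blasCandidates, fftCandidates and loadFirstAvailable are hypothetical, and the loader is passed in as a callable so that the selection logic stays independent of the platform.

```cpp
#include <functional>
#include <string>
#include <vector>

// A loader takes a library file name and returns a handle, or nullptr
// when the library cannot be opened. Hypothetical type for illustration.
using LibraryLoader = std::function<void*(const std::string&)>;

// Candidate file names: new open source name first, legacy name second.
std::vector<std::string> blasCandidates() {
    return {"libclBLAS.so", "libclAmdBlas.so"};
}

std::vector<std::string> fftCandidates() {
    return {"libclFFT.so", "libclAmdFft.Runtime.so"};
}

// Tries each candidate in order and returns the first non-null handle.
// If loadedName is given, it receives the name that was opened.
void* loadFirstAvailable(const std::vector<std::string>& candidates,
                         const LibraryLoader& load,
                         std::string* loadedName = nullptr) {
    for (const std::string& name : candidates) {
        if (void* handle = load(name)) {
            if (loadedName) *loadedName = name;
            return handle;
        }
    }
    return nullptr;  // none of the candidates could be opened
}
```

On Linux the loader passed in would typically be a small lambda wrapping dlopen with RTLD_LAZY | RTLD_GLOBAL. Keeping it a parameter also makes the fallback order easy to exercise with a fake loader.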

Solution to make OpenCV use the new clBLAS and clFFT

Our team searched the OpenCV source directory for the old symbols and library names and found that OpenCV has not been updated since 2014 to reflect the new AMD open source math libraries.

The file modules/core/src/opencl/runtime/opencl_clamdblas.cpp contains the loading mechanism of the AMD BLAS library:

h = dlopen("libclAmdBlas.so", RTLD_LAZY | RTLD_GLOBAL);

Similarly, the file modules/core/src/opencl/runtime/opencl_clamdfft.cpp contains the loading mechanism of the AMD FFT library:

h = dlopen("libclAmdFft.Runtime.so", RTLD_LAZY | RTLD_GLOBAL);

These library names clearly must be changed. But what about the symbol names?

The symbol definitions are found in the files modules/core/src/opencl/runtime/autogenerated/opencl_clamdblas_impl.hpp and modules/core/src/opencl/runtime/autogenerated/opencl_clamdfft_impl.hpp respectively.

These must also be changed to reflect the new symbol names. For example, the entry

static const struct DynamicFnEntry clAmdBlasSetup_definition = { "clAmdBlasSetup", (void**)&clAmdBlasSetup};

becomes the following, where only the string that is looked up in the library changes:

static const struct DynamicFnEntry clAmdBlasSetup_definition = { "clblasSetup", (void**)&clAmdBlasSetup};

These changes must be made for all enabled features in OpenCV. Many of the libraries' functions are not used by OpenCV.
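The renaming can be sketched as a simple prefix substitution. This assumes that every symbol follows the same pattern as the two Setup functions shown earlier (clAmdBlas becomes clblas, clAmdFft becomes clfft); that assumption should be checked against the clBLAS and clFFT headers before relying on it. The function name toNewSymbolName is hypothetical.

```cpp
#include <string>

// Maps a legacy AMD symbol name to its open source counterpart by
// swapping the prefix. Names with an unknown prefix are returned as-is.
// Hypothetical helper; the prefix rule is an assumption.
std::string toNewSymbolName(const std::string& legacy) {
    static const struct {
        const char* oldPrefix;
        const char* newPrefix;
    } kRenames[] = {
        {"clAmdBlas", "clblas"},
        {"clAmdFft", "clfft"},
    };

    for (const auto& rename : kRenames) {
        const std::string oldPrefix(rename.oldPrefix);
        if (legacy.compare(0, oldPrefix.size(), oldPrefix) == 0) {
            return std::string(rename.newPrefix) + legacy.substr(oldPrefix.size());
        }
    }
    return legacy;
}
```

A helper like this could be used to generate or cross-check the strings in the DynamicFnEntry tables, while the C++ identifiers on the right-hand side of each entry stay unchanged.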

Patches

Our team has put together patches that should work with almost any OpenCV version since this code has not changed since 2014. Please find two patches here for clBLAS and clFFT support in OpenCV.

Download patch for OpenCV clBLAS support.

Download patch for OpenCV clFFT support.

Results

Rebuilding OpenCV with support for clBLAS and clFFT in Ubuntu 18.04 shows that the libraries are now detected correctly again.

$ sudo apt install libclfft-dev libclblas-dev

Rebuild OpenCV (in our case, 4.1.2) with
-DWITH_OPENCL=ON \
-DWITH_OPENCLAMDBLAS=ON \
-DWITH_OPENCLAMDFFT=ON \
-DWITH_OPENCL_SVM=ON

The -DWITH_OPENCL_SVM=ON option is optional and only applies to supported architectures. We did not use it for this test, because Mesa Clover does not support OpenCL 2.0 SVM; this demo was built with -DWITH_OPENCL_SVM=OFF.

This gives the following output.

$ opencv_version --opencl
4.1.2
OpenCL Platforms:
    Clover
        iGPU: AMD KABINI (DRM 3.35.0, 5.4.44-050444-generic, LLVM 10.0.1) (OpenCL 1.2 Mesa 20.1.1 (git-127c2be9c5))
Current OpenCL device:
    Type = iGPU
    Name = AMD KABINI (DRM 3.35.0, 5.4.44-050444-generic, LLVM 10.0.1)
    Version = OpenCL 1.2 Mesa 20.1.1 (git-127c2be9c5)
    Driver version = 20.1.1
    Address bits = 64
    Compute units = 2
    Max work group size = 256
    Local memory size = 32 KB
    Max memory allocation size = 901 MB 477 KB 204 B
    Double support = Yes
    Host unified memory = Yes
    Device extensions:
        cl_khr_byte_addressable_store
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_int64_base_atomics
        cl_khr_int64_extended_atomics
        cl_khr_fp64
        cl_khr_fp16
    Has AMD Blas = Yes
    Has AMD Fft = Yes
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 2
    Preferred vector width float = 4
    Preferred vector width double = 2

OpenCV 4.1.2 with AMD clBLAS and clFFT support enabled again.