Bruhnspace is engaged in research on optimized compute on embedded devices. This includes core software stacks such as the AMD ROCm high-performance compute package for AMD APUs. Computer vision and machine learning packages such as OpenCV and TensorFlow Lite are also of great interest. Our work is usually published in peer-reviewed scientific journals. Our latest co-authored and co-sponsored paper on artificial computing in space can be read here.
In this project our team has re-enabled support for AMD clBLAS and clFFT in OpenCV 4.1.2. Or rather, resurrected it: support has been broken since 2014, when AMD open-sourced the closed-source clAmdBlas and clAmdFft libraries and renamed both the libraries and the associated symbols. Dr. Harris Gasparakis from AMD wrote a paper about OpenCV support back in 2013: http://developer.amd.com/wordpress/media/2013/07/opencv-cl_instructions-246.pdf
Update
We are glad to see that our work was picked up and carried forward by Mr Joe Howse and merged into OpenCV version 4.5.3. Check out the merge request here.
Problem description
The problem is rather simple: OpenCV does not recognize the AMD BLAS and AMD FFT functions, even when the OpenCV build process detects the libraries and enables support.
Checking OpenCV support on Linux (Ubuntu 18.04 is used here as a reference) quickly demonstrates that the libraries are not detected, and have not been for a long time on any operating system (OS) distribution using the new clFFT and clBLAS libraries. In this case the OpenCV OpenCL test is performed on an Ubuntu 18.04 machine with the MESA/libclc/LLVM backend.
$ opencv_version --opencl
4.1.2
OpenCL Platforms:
    Clover
        iGPU: AMD KABINI (DRM 3.35.0, 5.4.44-050444-generic, LLVM 10.0.1) (OpenCL 1.2 Mesa 20.1.1 (git-127c2be9c5))
Current OpenCL device:
    Type = iGPU
    Name = AMD KABINI (DRM 3.35.0, 5.4.44-050444-generic, LLVM 10.0.1)
    Version = OpenCL 1.2 Mesa 20.1.1 (git-127c2be9c5)
    Driver version = 20.1.1
    Address bits = 64
    Compute units = 2
    Max work group size = 256
    Local memory size = 32 KB
    Max memory allocation size = 901 MB 477 KB 204 B
    Double support = Yes
    Host unified memory = Yes
    Device extensions:
        cl_khr_byte_addressable_store
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_int64_base_atomics
        cl_khr_int64_extended_atomics
        cl_khr_fp64
        cl_khr_fp16
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 2
    Preferred vector width float = 4
    Preferred vector width double = 2
Analysis
A quick comparison of the old clAmdBlas and clAmdFft libraries with the new clBLAS and clFFT libraries shows a change in symbol names. OpenCV, or any other software that has made use of the old libraries, must therefore be updated to use the new symbols.
BLAS: clAmdBlasSetup() -> clblasSetup() etc.
FFT: clAmdFftSetup() -> clfftSetup() etc.
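The two renames listed above suggest a plain prefix substitution. The following is a minimal sketch under that assumption; it has not been checked against every symbol in the libraries, and the helper name toNewSymbolName is hypothetical (it is not part of OpenCV, clBLAS or clFFT).

```cpp
#include <string>

// Hypothetical helper: translate an old clAmdBlas/clAmdFft symbol name to its
// clBLAS/clFFT counterpart, assuming only the prefix changed.
std::string toNewSymbolName(const std::string& oldName) {
    // {old prefix, new prefix}, taken from the two examples in the text.
    static const char* const rules[][2] = {
        {"clAmdBlas", "clblas"},
        {"clAmdFft", "clfft"},
    };
    for (const auto& rule : rules) {
        const std::string oldPrefix = rule[0];
        if (oldName.compare(0, oldPrefix.size(), oldPrefix) == 0) {
            // Keep everything after the old prefix unchanged.
            return rule[1] + oldName.substr(oldPrefix.size());
        }
    }
    return oldName;  // no known prefix: leave the name untouched
}
```

A helper like this could be used to cross-check a hand-edited symbol table, for example by comparing toNewSymbolName("clAmdFftSetup") against the string actually placed in the source.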
The library names themselves have also changed, which makes the libraries useless to OpenCV unless the code reflects the new names.
clAmdBlas library name = libclAmdBlas.so
clAmdFft library name = libclAmdFft.Runtime.so
but
clBLAS library name = libclBLAS.so
clFFT library name = libclFFT.so
Solution to make OpenCV use the new clBLAS and clFFT
Our team searched the OpenCV source directory for the old symbols and library names and found that OpenCV has not been updated since 2014 to reflect the new AMD open source math libraries.
The file modules/core/src/opencl/runtime/opencl_clamdblas.cpp contains the loading mechanism of the AMD BLAS library:
h = dlopen("libclAmdBlas.so", RTLD_LAZY | RTLD_GLOBAL);
Similarly, the file modules/core/src/opencl/runtime/opencl_clamdfft.cpp contains the loading mechanism of the AMD FFT library:
h = dlopen("libclAmdFft.Runtime.so", RTLD_LAZY | RTLD_GLOBAL);
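As an illustration of the loading step, here is a minimal sketch of a loader that tries the new library name first and falls back to the old one, remembering which symbol prefix belongs to the library that was opened. This fallback is an assumption made for illustration only; the text here simply changes the names. The types LibraryFlavor, LoadResult and LibraryLoader and the function loadFirstAvailable are hypothetical, and the actual dlopen call is passed in from outside so the logic can be exercised without the libraries installed.

```cpp
#include <functional>
#include <string>
#include <vector>

// One packaging of a math library: the shared object to open and the prefix
// its exported functions carry.
struct LibraryFlavor {
    std::string fileName;
    std::string symbolPrefix;
};

// Stand-in for the real loader, e.g. a wrapper around
// dlopen(name, RTLD_LAZY | RTLD_GLOBAL) on Linux.
using LibraryLoader = std::function<void*(const char*)>;

struct LoadResult {
    void* handle;              // nullptr when no candidate could be opened
    std::string symbolPrefix;  // prefix to use when resolving functions
};

// Hypothetical helper: open the first candidate that loads.
LoadResult loadFirstAvailable(const std::vector<LibraryFlavor>& candidates,
                              const LibraryLoader& loader) {
    for (const LibraryFlavor& flavor : candidates) {
        if (void* handle = loader(flavor.fileName.c_str())) {
            return {handle, flavor.symbolPrefix};
        }
    }
    return {nullptr, ""};
}

// Library names and prefixes from the analysis above, new flavor first.
const std::vector<LibraryFlavor> kBlasFlavors = {
    {"libclBLAS.so", "clblas"},
    {"libclAmdBlas.so", "clAmdBlas"},
};
const std::vector<LibraryFlavor> kFftFlavors = {
    {"libclFFT.so", "clfft"},
    {"libclAmdFft.Runtime.so", "clAmdFft"},
};
```

Keeping the prefix next to the file name matters because an old library still exports the old symbols; opening libclAmdBlas.so and then asking it for clblasSetup would fail.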
These clearly must be changed. But what about the symbol names?
The symbol definitions are found in the files modules/core/src/opencl/runtime/autogenerated/opencl_clamdblas_impl.hpp and modules/core/src/opencl/runtime/autogenerated/opencl_clamdfft_impl.hpp, respectively.
These must also be changed to reflect the new symbol names. For example,
static const struct DynamicFnEntry clAmdBlasSetup_definition = { "clAmdBlasSetup", (void**)&clAmdBlasSetup};
becomes
static const struct DynamicFnEntry clAmdBlasSetup_definition = { "clblasSetup", (void**)&clAmdBlasSetup};
Only the string that is looked up in the library changes; the OpenCV-internal names stay the same.
These changes must be made for all features enabled in OpenCV. Many of the libraries' functions are not used.
Patches
Our team has put together patches that should work with almost any OpenCV version, since this code has not changed since 2014. Two patches, one for clBLAS and one for clFFT support in OpenCV, are available here.
Download patch for OpenCV clBLAS support.
Download patch for OpenCV clFFT support.
Results
Rebuilding OpenCV with support for clBLAS and clFFT in Ubuntu 18.04 shows that the libraries are now detected correctly again.
$ sudo apt install libclfft-dev libclblas-dev
Rebuild OpenCV (in our case, 4.1.2) with
-DWITH_OPENCL=ON \
-DWITH_OPENCLAMDBLAS=ON \
-DWITH_OPENCLAMDFFT=ON \
-DWITH_OPENCL_SVM=ON
(The SVM option is optional on supported architectures. We did not use it for this test, because MESA Clover does not support OpenCL 2.0 SVM; it was set to -DWITH_OPENCL_SVM=OFF for this demo.)
This gives the following output.
$ opencv_version --opencl
4.1.2
OpenCL Platforms:
    Clover
        iGPU: AMD KABINI (DRM 3.35.0, 5.4.44-050444-generic, LLVM 10.0.1) (OpenCL 1.2 Mesa 20.1.1 (git-127c2be9c5))
Current OpenCL device:
    Type = iGPU
    Name = AMD KABINI (DRM 3.35.0, 5.4.44-050444-generic, LLVM 10.0.1)
    Version = OpenCL 1.2 Mesa 20.1.1 (git-127c2be9c5)
    Driver version = 20.1.1
    Address bits = 64
    Compute units = 2
    Max work group size = 256
    Local memory size = 32 KB
    Max memory allocation size = 901 MB 477 KB 204 B
    Double support = Yes
    Host unified memory = Yes
    Device extensions:
        cl_khr_byte_addressable_store
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_int64_base_atomics
        cl_khr_int64_extended_atomics
        cl_khr_fp64
        cl_khr_fp16
    Has AMD Blas = Yes
    Has AMD Fft = Yes
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 2
    Preferred vector width float = 4
    Preferred vector width double = 2
OpenCV 4.1.2 with AMD clBLAS and clFFT support enabled again.