Hi all. I encountered some problems when building SwiftTransformer as a dependency of DistServe (https://github.com/LLMServe/DistServe).
The dependency versions on my machine are as follows:
gcc (Spack GCC) 12.3.0
(NVCC)
Cuda compilation tools, release 12.2, V12.2.128
Build cuda_12.2.r12.2/compiler.33053471_0
cmake version 3.27.7
My machine runs Ubuntu 20.04.1 LTS.
The installation instruction for DistServe includes the following commands:
git clone https://github.com/LLMServe/SwiftTransformer.git && cd SwiftTransformer && git submodule update --init --recursive
cmake -B build
cmake --build build -j$(nproc)
But when I execute cmake --build build -j$(nproc), it reports the following two errors caused by incorrect data types:
1):
[ 71%] Built target gmock_main
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc: In instantiation of 'void naiveGemmStridedBatched(cublasOperation_t, cublasOperation_t, int, int, int, T, const T*, long long int, const T*, long long int, T, T*, long long int, int) [with T = __half]':
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:120:27: required from 'void CublasWrapperTestSuite_gemmStridedBatched_Test<gtest_TypeParam_>::TestBody() [with gtest_TypeParam_ = __half]'
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:70:1: required from here
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: error: ambiguous overload for 'operator*' (operand types are 'const __half' and 'float')
50 | Carray[batch * stride_c + i * ldc + j] = alpha * sum + beta * Carray[batch * stride_c + i * ldc + j];
| ~~~~~~^~~~~
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long long unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long long int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(float, float)' (built-in)
This happens because alpha has type const __half while sum is a float, so the compiler cannot pick a single operator* overload for the two operands.
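To illustrate the ambiguity, here is a minimal host-only sketch. `FakeHalf` is a hypothetical stand-in, not the CUDA `__half` type; it only assumes what the candidate list in the error suggests, namely that the type converts implicitly to more than one arithmetic type. The function name `scaled_sum` is also made up for illustration.

```cpp
// Hypothetical stand-in for __half (assumption: like the type in the error,
// it converts implicitly to more than one arithmetic type).
struct FakeHalf {
    float value;
    explicit FakeHalf(float v) : value(v) {}
    operator float() const { return value; }
    operator int() const { return static_cast<int>(value); }
};

// Mirrors the shape of the failing expression: alpha * sum + beta * c.
// Writing `alpha * sum` directly (FakeHalf times float) would not compile:
// operator*(float, float) and operator*(int, float) are both viable through
// different conversion functions, so neither is preferred.
// An explicit conversion leaves exactly one candidate.
inline float scaled_sum(const FakeHalf& alpha, float sum,
                        const FakeHalf& beta, const FakeHalf& c) {
    return static_cast<float>(alpha) * sum
         + static_cast<float>(beta) * static_cast<float>(c);
}
```

For example, `scaled_sum(FakeHalf(2.0f), 3.0f, FakeHalf(0.5f), FakeHalf(4.0f))` evaluates to 8. This sketch does the arithmetic in float; casting the float operand to `T` instead is the other way to make both operands agree.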
2):
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:45: error: call of overloaded 'fabs(__half)' is ambiguous
93 | fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
| ~~~~^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/features.h:490,
from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/x86_64-pc-linux-gnu/bits/os_defines.h:39,
from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/x86_64-pc-linux-gnu/bits/c++config.h:655,
from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/functional:48,
from /scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:1:
/usr/include/bits/mathcalls.h:162:1: note: candidate: 'double fabs(double)'
162 | __MATHCALLX (fabs,, (_Mdouble_ __x), (__const__));
| ^~~~~~~~~~~
In file included from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/random:38,
from /scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:2:
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:241:3: note: candidate: 'constexpr float std::fabs(float)'
241 | fabs(float __x)
| ^~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:245:3: note: candidate: 'constexpr long double std::fabs(long double)'
245 | fabs(long double __x)
| ^~~~
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:75: error: call of overloaded 'fabs(__half)' is ambiguous
93 | fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
| ~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/bits/mathcalls.h:162:1: note: candidate: 'double fabs(double)'
162 | __MATHCALLX (fabs,, (_Mdouble_ __x), (__const__));
| ^~~~~~~~~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:241:3: note: candidate: 'constexpr float std::fabs(float)'
241 | fabs(float __x)
| ^~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:245:3: note: candidate: 'constexpr long double std::fabs(long double)'
245 | fabs(long double __x)
| ^~~~
This happens because answer[i]-reference[i] has type __half (more precisely, the template parameter T, which is __half in this instantiation), and none of the available fabs overloads takes a __half directly.
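One way to avoid the ambiguous call is to convert to float before calling fabs. The sketch below is illustrative only: the helper names `abs_as_float`, `abs_error`, and `rel_error` are hypothetical and do not exist in SwiftTransformer, and the sketch assumes that `T` can be converted to float with `static_cast`. Note that it subtracts after converting to float, rather than subtracting in `T` first.

```cpp
#include <cmath>

// Hypothetical helper: convert to float first so that exactly one
// std::fabs overload (the float one) matches.
template <typename T>
float abs_as_float(const T& x) {
    return std::fabs(static_cast<float>(x));
}

// Absolute error between an answer and a reference value, computed in float.
template <typename T>
float abs_error(const T& answer, const T& reference) {
    return std::fabs(static_cast<float>(answer) - static_cast<float>(reference));
}

// Relative error: absolute error divided by the magnitude of the reference.
template <typename T>
float rel_error(const T& answer, const T& reference) {
    return abs_error(answer, reference) / abs_as_float(reference);
}
```

With `T = float`, `abs_error(1.5f, 1.0f)` gives 0.5 and `rel_error(1.5f, 1.0f)` gives 0.5 as well.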
To work around these two problems, I made the following changes on my machine:
- Change line 49 (reported as line 50 in the error output above) in
.../SwiftTransformer/src/unittest/util/cublas_wrapper.cc into:
Carray[batch * stride_c + i * ldc + j] = alpha * static_cast<T>(sum) + beta * Carray[batch * stride_c + i * ldc + j];
- Change line 93 in
.../SwiftTransformer/src/unittest/util/../unittest_utils.h to:
fabs(static_cast<float>(answer[i]-reference[i])),
fabs(static_cast<float>(answer[i]-reference[i])) / fabs(static_cast<float>(reference[i])));
Now the build command completes successfully and the later pip install for DistServe works. However, I am not sure whether these changes add overhead, since converting __half to float increases the number of bits used for the value. (Sorry for my naivety in this field.)
If my modifications are reasonable, I can submit a pull request for them.
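On the overhead question, a rough sketch under stated assumptions: `static_cast<float>` produces a temporary float for the comparison and does not change how the arrays themselves are stored, and both edited files sit under src/unittest. `Fake16` below is a hypothetical 2-byte stand-in for `__half` (it keeps the upper 16 bits of a float, which is not the IEEE half format), and `max_abs_diff` is a made-up helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 16-bit stand-in for __half: keeps the upper 16 bits of a
// float. This is NOT the IEEE half format, only something 2 bytes wide.
struct Fake16 {
    std::uint16_t bits;
    explicit Fake16(float v) {
        std::uint32_t u;
        std::memcpy(&u, &v, sizeof u);
        bits = static_cast<std::uint16_t>(u >> 16);
    }
    explicit operator float() const {
        std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
        float v;
        std::memcpy(&v, &u, sizeof v);
        return v;
    }
};

static_assert(sizeof(Fake16) == 2, "stand-in should be 16 bits wide");

// Largest absolute difference, computed in float temporaries. The inputs
// are only read, so their element size stays at 2 bytes.
inline float max_abs_diff(const std::vector<Fake16>& a,
                          const std::vector<Fake16>& b) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        float d = static_cast<float>(a[i]) - static_cast<float>(b[i]);
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

In this sketch the only extra storage is a few float locals inside the comparison loop; the vectors keep their 16-bit elements.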