Data type error when building SwiftTransformer for DistServe #5

@HalberdOfPineapple

Hi all. I ran into some problems when building SwiftTransformer as a dependency of DistServe (https://github.com/LLMServe/DistServe).

The versions of the dependencies on my machine are as follows:

gcc (Spack GCC) 12.3.0

(NVCC)
Cuda compilation tools, release 12.2, V12.2.128
Build cuda_12.2.r12.2/compiler.33053471_0

cmake version 3.27.7

My machine runs Ubuntu 20.04.1 LTS.

The installation instruction for DistServe includes the following commands:

git clone https://github.com/LLMServe/SwiftTransformer.git && cd SwiftTransformer && git submodule update --init --recursive
cmake -B build
cmake --build build -j$(nproc)

But when I run cmake --build build -j$(nproc), it reports the following two errors, both caused by mismatched data types:
1):

[ 71%] Built target gmock_main
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc: In instantiation of 'void naiveGemmStridedBatched(cublasOperation_t, cublasOperation_t, int, int, int, T, const T*, long long int, const T*, long long int, T, T*, long long int, int) [with T = __half]':
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:120:27:   required from 'void CublasWrapperTestSuite_gemmStridedBatched_Test<gtest_TypeParam_>::TestBody() [with gtest_TypeParam_ = __half]'
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:70:1:   required from here
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: error: ambiguous overload for 'operator*' (operand types are 'const __half' and 'float')
   50 |                 Carray[batch * stride_c + i * ldc + j] = alpha * sum + beta * Carray[batch * stride_c + i * ldc + j];
      |                                                          ~~~~~~^~~~~
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long long unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long long int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(float, float)' (built-in)

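This happens because alpha has type T (here __half) while sum is a float. There is no operator* that takes exactly (__half, float), and judging by the candidate list, __half converts implicitly to several arithmetic types, so the compiler cannot decide which built-in operator* to use.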
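A minimal sketch of this kind of ambiguity, assuming a hypothetical stand-in type FakeHalf instead of the real __half (so it compiles without CUDA). FakeHalf, can_multiply, scale_and_add and check_workaround are illustrative names only, and the real header may differ in detail:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for __half (NOT the real cuda_fp16.h type). It converts
// implicitly to more than one arithmetic type, which is what the candidate
// list in the error suggests __half does.
struct FakeHalf {
    float value;
    explicit FakeHalf(float v) : value(v) {}
    operator float() const { return value; }
    operator int() const { return static_cast<int>(value); }
};

// Same-type arithmetic is unambiguous because these overloads match exactly.
inline FakeHalf operator*(FakeHalf a, FakeHalf b) { return FakeHalf(a.value * b.value); }
inline FakeHalf operator+(FakeHalf a, FakeHalf b) { return FakeHalf(a.value + b.value); }

// Compile-time probe: true only if `A * B` resolves to exactly one overload.
template <class A, class B, class = void>
struct can_multiply : std::false_type {};
template <class A, class B>
struct can_multiply<A, B, decltype(void(std::declval<A>() * std::declval<B>()))>
    : std::true_type {};

// FakeHalf * float: FakeHalf could become a float or an int, and neither
// built-in operator* is a better match, so overload resolution fails.
static_assert(!can_multiply<FakeHalf, float>::value, "mixed operands are ambiguous");
// FakeHalf * FakeHalf: the exact-match overload above wins.
static_assert(can_multiply<FakeHalf, FakeHalf>::value, "same-type operands are fine");

// The shape of the workaround: cast `sum` to T so both operands of * have type T.
template <class T>
T scale_and_add(T alpha, float sum, T beta, T c) {
    return alpha * static_cast<T>(sum) + beta * c;
}

// Usage: 2 * 3 + 0.5 * 4 == 8, computed entirely in FakeHalf.
inline void check_workaround() {
    FakeHalf r = scale_and_add(FakeHalf(2.0f), 3.0f, FakeHalf(0.5f), FakeHalf(4.0f));
    assert(r.value == 8.0f);
}
```

Once both operands have the same type, the exact-match operator is preferred over every built-in candidate, which is why a single cast is enough to resolve the error.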

2):

/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:45: error: call of overloaded 'fabs(__half)' is ambiguous
   93 |                                         fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
      |                                         ~~~~^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/features.h:490,
                 from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/x86_64-pc-linux-gnu/bits/os_defines.h:39,
                 from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/x86_64-pc-linux-gnu/bits/c++config.h:655,
                 from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/functional:48,
                 from /scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:1:
/usr/include/bits/mathcalls.h:162:1: note: candidate: 'double fabs(double)'
  162 | __MATHCALLX (fabs,, (_Mdouble_ __x), (__const__));
      | ^~~~~~~~~~~
In file included from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/random:38,
                 from /scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:2:
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:241:3: note: candidate: 'constexpr float std::fabs(float)'
  241 |   fabs(float __x)
      |   ^~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:245:3: note: candidate: 'constexpr long double std::fabs(long double)'
  245 |   fabs(long double __x)
      |   ^~~~
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:75: error: call of overloaded 'fabs(__half)' is ambiguous
   93 |                                         fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
      |                                                                       ~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/bits/mathcalls.h:162:1: note: candidate: 'double fabs(double)'
  162 | __MATHCALLX (fabs,, (_Mdouble_ __x), (__const__));
      | ^~~~~~~~~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:241:3: note: candidate: 'constexpr float std::fabs(float)'
  241 |   fabs(float __x)
      |   ^~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:245:3: note: candidate: 'constexpr long double std::fabs(long double)'
  245 |   fabs(long double __x)
      |   ^~~~

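This is because answer[i]-reference[i] has type __half (more precisely, the template parameter T, which is instantiated as __half here). None of the fabs overloads takes a __half, and the compiler cannot choose between the double, float, and long double candidates.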

To work around these two problems, I made the following changes locally:

  1. Change line 50 (the line reported in the first error) in .../SwiftTransformer/src/unittest/util/cublas_wrapper.cc to:
Carray[batch * stride_c + i * ldc + j] = alpha * static_cast<T>(sum) + beta * Carray[batch * stride_c + i * ldc + j];
  2. Change line 93 in .../SwiftTransformer/src/unittest/util/../unittest_utils.h to:
fabs(static_cast<float>(answer[i]-reference[i])), 
fabs(static_cast<float>(answer[i]-reference[i])) / fabs(static_cast<float>(reference[i])));
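The second change could also be factored into small helpers. The following is only a sketch: abs_error, rel_error and max_abs_error are hypothetical names that do not exist in SwiftTransformer, and unlike the change above it converts each operand to float before subtracting, which is a slight variation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SwiftTransformer): converts both values to
// float first, so std::fabs always receives a float and only one overload fits.
template <class T>
float abs_error(T answer, T reference) {
    return std::fabs(static_cast<float>(answer) - static_cast<float>(reference));
}

// Relative error; there is no guard against reference == 0, the same as in
// the expression being replaced.
template <class T>
float rel_error(T answer, T reference) {
    return abs_error(answer, reference) / std::fabs(static_cast<float>(reference));
}

// Largest absolute error over two equally sized buffers.
template <class T>
float max_abs_error(const std::vector<T>& answer, const std::vector<T>& reference) {
    assert(answer.size() == reference.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < answer.size(); ++i) {
        worst = std::fmax(worst, abs_error(answer[i], reference[i]));
    }
    return worst;
}
```

The casts only create temporary float values while the comparison runs; the buffers themselves keep their element type T.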

Now the build command completes successfully, and the subsequent pip install for DistServe works. However, I am not sure whether these changes introduce extra overhead, since converting __half to float increases the number of bits used for the value. (Sorry, I am new to this field.)
If my modifications are reasonable, I can submit a pull request for them.
