
@rraminen commented Feb 12, 2025

This PR contains the following two changes:

  1. On ROCm, `__forceinline__` needs to expand to `inline` together with the `always_inline` attribute.
  2. `extra_include_paths` is required to hipify the `quant_cuda` header files.
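A minimal sketch of change 1, assuming a guarded fallback definition is acceptable (the exact form used in the PR is not shown in this thread). The helper `float_to_bits` is hypothetical and written as plain host C++ so it can be compiled without a GPU toolchain; `__attribute__((always_inline))` is a GCC/Clang extension.

```cpp
#include <cstdint>
#include <cstring>

// Provide __forceinline__ as "inline" plus the always_inline attribute.
// The #ifndef guard is an assumption of this sketch: it keeps any
// definition already supplied by a CUDA or HIP header.
#ifndef __forceinline__
#define __forceinline__ inline __attribute__((always_inline))
#endif

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "float is expected to be 32 bits");

// Hypothetical helper: returns the IEEE-754 bit pattern of a float.
// std::memcpy avoids the strict-aliasing undefined behaviour that a
// pointer reinterpret_cast has in standard C++.
__forceinline__ std::uint32_t float_to_bits(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}
```

For example, `float_to_bits(1.0f)` yields `0x3F800000`. GCC documents that failure to inline a function marked `always_inline` is diagnosed as an error, so the attribute turns a missed inline into a build failure rather than a silent out-of-line call.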

@rraminen changed the title from "__forceinline__ needs inline and always_inline on ROCm" to "Enablement on ROCm" on Feb 14, 2025
@rraminen marked this pull request as ready for review on February 14, 2025 at 22:32
@rraminen (author) commented Mar 7, 2025

Hi @Tiiiger, could you please review this PR?

@k-artem commented Jul 22, 2025

Hi @rraminen, +1 for this PR. I have one proposal: instead of adding our own definition of `__forceinline__`, we can add

diff --git a/qtorch/quant/quant_cuda/bit_helper.cu b/qtorch/quant/quant_cuda/bit_helper.cu
index 794255f..c741d58 100644
--- a/qtorch/quant/quant_cuda/bit_helper.cu
+++ b/qtorch/quant/quant_cuda/bit_helper.cu
@@ -1,3 +1,5 @@
+#include <cuda.h>
+
 #define FLOAT_TO_BITS(x) (*reinterpret_cast<unsigned int*>(x))
 #define BITS_TO_FLOAT(x) (*reinterpret_cast<float*>(x))

diff --git a/qtorch/quant/quant_cuda/sim_helper.cu b/qtorch/quant/quant_cuda/sim_helper.cu
index d165793..5a81493 100644
--- a/qtorch/quant/quant_cuda/sim_helper.cu
+++ b/qtorch/quant/quant_cuda/sim_helper.cu
@@ -1,3 +1,4 @@
+#include <cuda.h>
 #include "quant_kernel.h"
 #include <cmath>

so that the definition provided by the HIP library is used instead.

@k-artem commented Jul 22, 2025

Hi @stevenygd, @Tiiiger, could you please take a look at this PR? Thanks in advance.

sfc-gh-truwase added a commit to deepspeedai/DeepSpeed that referenced this pull request Dec 10, 2025
The patch delivers several fixes for build issues in the CUDA part of the
DeepSpeed library.
The percentage of passing unit tests improved (tested on RDNA hardware,
gfx110x and gfx12x).

Before:
collected 5298 items / 15 skipped
2773 failed, 862 passed, 1665 skipped, 13 errors

After:
collected 5851 items / 11 skipped
4187 failed, 1373 passed, 292 skipped, 10 errors
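As a rough check of the improvement, assuming the percentage means passed tests divided by all reported outcomes (failed + passed + skipped + errors, which in both runs equals the collected items plus those skipped at collection):

```latex
\[
\text{Before: } \frac{862}{2773 + 862 + 1665 + 13} = \frac{862}{5313} \approx 16.2\%,
\qquad
\text{After: } \frac{1373}{4187 + 1373 + 292 + 10} = \frac{1373}{5862} \approx 23.4\%.
\]
```

Counting only failed and passed tests instead gives about 23.7% before and 24.7% after.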

Regarding testing of **fp_quantizer (DS_BUILD_FP_QUANTIZER)** via
`tests/unit/ops/fp_quantizer/test_fp_quant.py`: this test depends on
QPyTorch, which must be patched before running on AMD. Please apply
Tiiiger/QPyTorch#71.

---------

Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>