Enablement on ROCm #71

rraminen · 2025-02-12T22:52:09Z

This PR contains the following two changes:

__forceinline__ needs inline and always_inline on ROCm.
extra_include_paths is required for the hipification of the quant_cuda header files.
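
For readers unfamiliar with the first change, here is a minimal sketch of what such a definition might look like. It is not the code from this PR. The `__HIP__` guard is an assumption based on the commit titled "ifdef __HIP__", and `QT_FORCEINLINE`, `float_to_bits` and `bits_to_float` are hypothetical names used only for illustration. The helpers are host-only so the sketch also compiles with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstring>

// Assumption: the ROCm compiler defines __HIP__ when building HIP sources, and
// __forceinline__ may be undefined there unless a HIP runtime header was included.
#if defined(__HIP__) && !defined(__forceinline__)
#define __forceinline__ inline __attribute__((always_inline))
#endif

// Hypothetical wrapper macro so the same helpers build on CUDA, ROCm, or a host compiler.
#if defined(__HIP__) || defined(__CUDACC__)
#define QT_FORCEINLINE __forceinline__
#elif defined(__GNUC__) || defined(__clang__)
#define QT_FORCEINLINE inline __attribute__((always_inline))
#else
#define QT_FORCEINLINE inline
#endif

static_assert(sizeof(float) == sizeof(unsigned int),
              "this sketch assumes a 32-bit float and a 32-bit unsigned int");

// Reinterpret the bits of a float as an unsigned integer (host-side illustration).
QT_FORCEINLINE unsigned int float_to_bits(float x) {
    unsigned int bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return bits;
}

// Reinterpret an unsigned integer bit pattern as a float.
QT_FORCEINLINE float bits_to_float(unsigned int bits) {
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```

With IEEE 754 single precision, `float_to_bits(1.0f)` yields `0x3F800000`. The `!defined(__forceinline__)` check keeps the sketch from redefining the macro when a HIP header has already supplied it.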


rraminen · 2025-03-07T02:56:15Z

Hi @Tiiiger, could you please review this PR?

k-artem · 2025-07-22T13:17:34Z

Hi @rraminen, +1 for this PR. I have one proposal: instead of adding a definition of __forceinline__, we can add

diff --git a/qtorch/quant/quant_cuda/bit_helper.cu b/qtorch/quant/quant_cuda/bit_helper.cu
index 794255f..c741d58 100644
--- a/qtorch/quant/quant_cuda/bit_helper.cu
+++ b/qtorch/quant/quant_cuda/bit_helper.cu
@@ -1,3 +1,5 @@
+#include <cuda.h>
+
 #define FLOAT_TO_BITS(x) (*reinterpret_cast<unsigned int*>(x))
 #define BITS_TO_FLOAT(x) (*reinterpret_cast<float*>(x))

diff --git a/qtorch/quant/quant_cuda/sim_helper.cu b/qtorch/quant/quant_cuda/sim_helper.cu
index d165793..5a81493 100644
--- a/qtorch/quant/quant_cuda/sim_helper.cu
+++ b/qtorch/quant/quant_cuda/sim_helper.cu
@@ -1,3 +1,4 @@
+#include <cuda.h>
 #include "quant_kernel.h"
 #include <cmath>

in order to use the definition that the HIP library provides.

k-artem · 2025-07-22T13:35:05Z

Hi @stevenygd @Tiiiger, could you please take a look at this PR? Thanks in advance.

The patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library. The percentage of passed unit tests improved (tested on RDNA hardware, gfx110x and gfx12x).

Before:
collected 5298 items / 15 skipped
2773 failed, 862 passed, 1665 skipped, 13 errors

After:
collected 5851 items / 11 skipped
4187 failed, 1373 passed, 292 skipped, 10 errors

Regarding testing of **fp_quantizer(DS_BUILD_FP_QUANTIZER)** via `tests/unit/ops/fp_quantizer/test_fp_quant.py`: this test depends on QPyTorch, which should be patched before running on AMD. Please apply Tiiiger/QPyTorch#71.

---------

Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

rraminen added 3 commits February 12, 2025 22:47

__forceinline__ needs inline and always_inline on ROCm

271a33a

ifdef __HIP__

3799e5d

extra_include_paths is required for the hipification of quant_cuda he…

c542f51

…ader files.

rraminen changed the title ~~__forceinline__ needs inline and always_inline on ROCm~~ Enablement on ROCm Feb 14, 2025

rraminen marked this pull request as ready for review February 14, 2025 22:32

k-artem mentioned this pull request Aug 18, 2025

[AMD][ROCm] Improve support of AMD deepspeedai/DeepSpeed#7448

Merged
