[AMD][ROCm] Improve support of AMD #7448
Conversation
deepspeed/inference/v2/kernels/cutlass_ops/mixed_gemm/mixed_gemm.cu (review thread, outdated and resolved)
Force-pushed from 5851003 to 1dc6bb7
@hwchen2017 Kindly asking for a review now that I have fixed the issues from your comments.
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/utils_paralleldequant.cuh (review thread, outdated and resolved)
Force-pushed from 09b1953 to f2dbbb7
The patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library. The percentage of passing unit tests improved (tested on RDNA hardware, gfx110x and gfx12x).
Before: collected 5298 items / 15 skipped; 2773 failed, 862 passed, 1665 skipped, 13 errors
After: collected 5851 items / 11 skipped; 4187 failed, 1373 passed, 292 skipped, 10 errors
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
part 2 Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Force-pushed from f2dbbb7 to 77a7e06
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Force-pushed from 45a01df to 0946828
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
@k-artem, is this ready for final review? @hwchen2017, any remaining review requests?
@loadams, could you please help continue the review?
@tjruwase @loadams Can you please help move this PR forward? I believe we have addressed all review comments. This PR significantly improves DeepSpeed functionality on AMD hardware. Also, we discussed this a while ago, but I don't think we moved forward on it: how do we remove the DeepSpeed dependency on this inactive repo?
Related issue: #7216
@jithunnair-amd, yes, I will focus on this PR.
Apologies for this question hanging for so long. Since so much has changed over the past months, I think it might be worth having a chat on this. |
Sure, would you like to discuss here, or on a different platform, e.g. email? The gist of it is that we aren't aware of any alternatives to QPyTorch, so creating a DeepSpeed fork is the next best option for making updates to it. Currently, this lib is only used in unit tests (test_quantized_linear_module.py and test_fp_quant.py).
Got it. Unfortunately, we lack the bandwidth to maintain a QPyTorch fork. Moreover, our roadmap is to streamline the codebase by deprecating features, subject to bandwidth and community interest. Are you interested in maintaining such a fork?
@k-artem can you please address the formatting issues? |
Hi @sfc-gh-truwase, I checked it; it actually looks like a CI issue.
…extension
Details: deepspeedai#7448 (comment)
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Regarding testing of fp_quantizer (DS_BUILD_FP_QUANTIZER) via tests/unit/ops/fp_quantizer/test_fp_quant.py: this test depends on QPyTorch, which must be patched before running on AMD; please apply Tiiiger/QPyTorch#71.
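For readers reproducing this setup, the steps above could be sketched as the following shell session. This is an illustrative sequence, not part of the PR: it assumes QPyTorch is installed from source, that GitHub's `.diff` endpoint for pull request 71 applies cleanly with `git apply`, and that DeepSpeed is installed from a checkout where the `tests/` directory is available.

```shell
# Install a patched QPyTorch from source.
# The .diff URL is GitHub's standard raw-diff endpoint for a pull request.
git clone https://github.com/Tiiiger/QPyTorch.git
cd QPyTorch
curl -L https://github.com/Tiiiger/QPyTorch/pull/71.diff | git apply
pip install .
cd ..

# Pre-build the fp_quantizer op when installing DeepSpeed;
# DS_BUILD_* environment variables select which ops to compile ahead of time.
DS_BUILD_FP_QUANTIZER=1 pip install deepspeed

# Run the QPyTorch-dependent unit test mentioned above
# (from a DeepSpeed source checkout).
pytest tests/unit/ops/fp_quantizer/test_fp_quant.py
```

Installing the patch before the first test run matters because the failure mode is otherwise a ROCm-side build/runtime error inside QPyTorch rather than a clear skip.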