[AMD][ROCm] Improve support of AMD #7448
Conversation
deepspeed/inference/v2/kernels/cutlass_ops/mixed_gemm/mixed_gemm.cu (review thread, outdated and resolved)
Force-pushed from 5851003 to 1dc6bb7
@hwchen2017 Kindly asking for a review now that I have fixed the issues from your comments.
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/utils_paralleldequant.cuh (review thread, outdated and resolved)
Force-pushed from 09b1953 to f2dbbb7
The patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library. The percentage of passing unit tests improved (tested on RDNA hardware, gfx110x and gfx12x).
Before: collected 5298 items / 15 skipped; 2773 failed, 862 passed, 1665 skipped, 13 errors
After: collected 5851 items / 11 skipped; 4187 failed, 1373 passed, 292 skipped, 10 errors
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
part 2 Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Force-pushed from f2dbbb7 to 77a7e06
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Force-pushed from 45a01df to 0946828
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
@k-artem, is this ready for final review? @hwchen2017, any remaining review requests?
@loadams, could you please help continue the review?
@tjruwase @loadams Can you please help move this PR forward? I believe we have addressed all review comments. This PR significantly improves DeepSpeed functionality on AMD hardware. Also, we discussed this a while ago, but I don't think we moved forward on it: how do we remove the DeepSpeed dependency on this inactive repo?
Related issue: #7216
@jithunnair-amd, yes, I will focus on this PR.
Apologies for this question hanging for so long. Since so much has changed over the past months, I think it might be worth having a chat on this. |
Sure, would you like to discuss here, or on a different platform, e.g. email? The gist of it is that we aren't aware of any alternatives to QPyTorch, so creating a DeepSpeed fork is the next best option for making updates to it. Currently, this lib is only used in unit tests (test_quantized_linear_module.py and test_fp_quant.py).
Got it. Unfortunately, we lack the bandwidth to maintain a QPyTorch fork. Moreover, our roadmap is to streamline the codebase by deprecating features, subject to bandwidth and community interest. Are you interested in maintaining such a fork?
@k-artem can you please address the formatting issues? |
Hi @sfc-gh-truwase, I checked it; it actually looks like a CI issue.
…extension
Details: deepspeedai#7448 (comment)
Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com>
Regarding testing of fp_quantizer (DS_BUILD_FP_QUANTIZER) via tests/unit/ops/fp_quantizer/test_fp_quant.py: this test depends on QPyTorch, which must be patched before running on AMD; please apply Tiiiger/QPyTorch#71.
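For readers reproducing this setup, the steps above could be sketched as the following shell session. This is an illustrative sequence, not part of the PR: it assumes QPyTorch is installed from source, that GitHub's `.diff` endpoint for pull request 71 applies cleanly with `git apply`, and that DeepSpeed is installed from a checkout where the `tests/` directory is available.

```shell
# Install a patched QPyTorch from source.
# The .diff URL is GitHub's standard raw-diff endpoint for a pull request.
git clone https://github.com/Tiiiger/QPyTorch.git
cd QPyTorch
curl -L https://github.com/Tiiiger/QPyTorch/pull/71.diff | git apply
pip install .
cd ..

# Pre-build the fp_quantizer op when installing DeepSpeed;
# DS_BUILD_* environment variables select which ops to compile ahead of time.
DS_BUILD_FP_QUANTIZER=1 pip install deepspeed

# Run the QPyTorch-dependent unit test mentioned above
# (from a DeepSpeed source checkout).
pytest tests/unit/ops/fp_quantizer/test_fp_quant.py
```

Installing the patch before the first test run matters because the failure mode is otherwise a ROCm-side build/runtime error inside QPyTorch rather than a clear skip.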