@AdamNiederer commented Jun 4, 2024

This lets you run two unsupported-but-really-supported cards of different architectures together in the same program by giving each device its own GFX version override. Works great with llama.cpp on my 7900XT + 6600; I'm seeing a 72% perf uplift running LLaMA3-70B Q2 across the two cards (7.4 vs 4.3 tok/s).

Example usage (device 0 is RDNA3, device 1 is RDNA2):

HSA_OVERRIDE_GFX_VERSION_1="11.0.0" HSA_OVERRIDE_GFX_VERSION_2="10.3.0" ollama serve
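
For readers unfamiliar with the value format, here is a minimal sketch of how a `major.minor.step` string such as "11.0.0" could be split into its three components. `parse_gfx_version` is a hypothetical helper written for illustration; it is not taken from this PR or from the runtime, and the strict rejection of trailing characters is an assumption.

```c
#include <stdio.h>

/* Hypothetical helper (illustration only, not the runtime's code):
 * parses a "major.minor.step" string such as "11.0.0".
 * Returns 0 on success, -1 if the string is NULL or malformed. */
static int parse_gfx_version(const char *s, unsigned *major,
                             unsigned *minor, unsigned *step)
{
    char trailing;

    if (s == NULL)
        return -1;

    /* Exactly three conversions must succeed. The extra %c only
     * converts when something follows the third number, which makes
     * inputs like "11.0.0x" return 4 and be rejected. */
    if (sscanf(s, "%u.%u.%u%c", major, minor, step, &trailing) != 3)
        return -1;

    return 0;
}
```

Calling `parse_gfx_version("10.3.0", &major, &minor, &step)` would fill in 10, 3 and 0, while an incomplete string such as "11.0" is reported as an error.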

Contributor

This is cleaner than the strcat, I think:
snprintf(per_device_override_name, sizeof(per_device_override_name), "HSA_OVERRIDE_GFX_VERSION_%d", node_id);

I'll test this out internally and, if it works (and doesn't break any other flows), I'll get it pushed out, hopefully for ROCm 6.2.
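
To show the suggested `snprintf` call in context, here is a rough sketch of a per-device lookup built around it. This is not the code from the PR: `lookup_gfx_override` is a hypothetical name, the 64-byte buffer is an arbitrary choice, and the fallback to the global `HSA_OVERRIDE_GFX_VERSION` variable is an assumption about the intended behaviour.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (illustration only): returns the override string
 * for the given node, or NULL if none is set. */
static const char *lookup_gfx_override(int node_id)
{
    char per_device_override_name[64]; /* size chosen for this sketch */
    const char *value;

    /* snprintf bounds the write and reports the length it needed,
     * so a truncated variable name can be detected and rejected. */
    int n = snprintf(per_device_override_name,
                     sizeof(per_device_override_name),
                     "HSA_OVERRIDE_GFX_VERSION_%d", node_id);
    if (n < 0 || (size_t)n >= sizeof(per_device_override_name))
        return NULL;

    value = getenv(per_device_override_name);
    if (value != NULL)
        return value;

    /* Assumed fallback: use the global override when no per-device
     * variable exists for this node. */
    return getenv("HSA_OVERRIDE_GFX_VERSION");
}
```

Unlike chained `strcat` calls, a single `snprintf` formats the integer suffix and enforces the buffer limit in one step, which is presumably why it reads as cleaner here.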

Contributor Author

That's much better, thank you! Changed in da8055d. And thanks for giving it a spin internally!

@kentrussell
Contributor

Internal testing looks good, we'll try to get this released in ROCm 6.2. Thanks for your contribution!
