Ag-Cu commented Feb 17, 2025

Summary

This pull request introduces support for the RISC-V Vector Extension (RVV) with 128-bit vector size in the sonic-cpp library.

Key Changes

CMake Configuration:

Added a new build option ENABLE_RVV_128 to enable RVV support.
Updated set_arch_flags.cmake to include RVV-specific compile options.

RVV-Specific Implementations:

Added RVV-specific architecture files under include/sonic/internal/arch/rvv-128/.
Implemented various SIMD operations for RVV, including string processing, integer conversion, and JSON quoting.

Test Results

[----------] Global test environment tear-down
[==========] 175 tests from 25 test suites ran. (43717 ms total)
[  PASSED  ] 175 tests.

yintong.ustc@bytedance.com and others added 5 commits February 17, 2025 09:58
The build failed because the <algorithm> header was missing where std::remove
is used in the C++17 test section. Without it, the only remove() visible to the
compiler was the C library function remove(const char*), not the STL algorithm.

This change:
1. Adds explicit #include <algorithm> for STL algorithms

Fixes compilation error:
error: cannot convert 'ValueIterator' to 'const char*'
note: initializing argument 1 of 'int remove(const char*)'
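
The fix described above can be illustrated with a minimal sketch. The helper name `drop_value` is hypothetical and is not part of sonic-cpp or its tests; it only shows that once <algorithm> is included explicitly, the three-iterator std::remove is available and the erase-remove idiom compiles as intended.

```cpp
#include <algorithm>  // std::remove (the three-iterator STL algorithm)
#include <vector>

// Hypothetical helper for illustration only; not part of sonic-cpp.
// Erase-remove idiom: std::remove shifts the kept elements forward and
// returns the new logical end; erase() then trims the leftover tail.
inline std::vector<int> drop_value(std::vector<int> v, int value) {
  v.erase(std::remove(v.begin(), v.end(), value), v.end());
  return v;
}
```

If the <algorithm> include is dropped and only <cstdio> happens to be pulled in transitively, the same call can fail with an error like the one quoted above, because the only candidate left is remove(const char*).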
return __riscv_vreinterpret_v_u32m1_u16m1(v03);
}

static sonic_force_inline vuint16m1_t rvv_uzp2q_u16(vuint16m1_t a,

It seems this part limits the function to VLEN=128. vset stitches the two registers together at VLMAX, which produces wrong output for VLEN=256 and above. I have another implementation that has been tested (on VLEN=256/512) and may be helpful here:

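static sonic_force_inline vuint16m1_t rvv_uzp2q_u16(vuint16m1_t a, vuint16m1_t b) {
    // View each input as 4 x u32 and narrow with a 16-bit right shift (vl = 4),
    // which keeps the odd-indexed u16 elements of a and of b.
    a = __riscv_vlmul_ext_v_u16mf2_u16m1(__riscv_vnsrl_wx_u16mf2(__riscv_vreinterpret_v_u16m1_u32m1(a), 16, 4));
    b = __riscv_vlmul_ext_v_u16mf2_u16m1(__riscv_vnsrl_wx_u16mf2(__riscv_vreinterpret_v_u16m1_u32m1(b), 16, 4));
    // Slide b's 4 results up by 4 lanes into a (vl = 8): a's in lanes 0-3, b's in lanes 4-7.
    return __riscv_vslideup_vx_u16m1(a, b, 4, 8);
}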

Since we know a and b each contain 8 elements, we can just use vnsrl and vslideup to combine them.
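
For checking either version on hardware or in an emulator, a scalar reference model of the expected lane layout may help. This is only a sketch: the name `uzp2q_u16_ref` is hypothetical and not from the PR, and it assumes the little-endian lane order implied by the reinterpret-and-shift in the snippet above, where each 32-bit pair holds an even element in its low half and an odd element in its high half.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar reference model (hypothetical name, for illustration only).
// Mirrors the two steps of the suggested RVV version:
//   1. narrowing shift: from each 32-bit pair keep the high 16 bits,
//      i.e. the odd-indexed 16-bit element;
//   2. slide-up: place b's four results after a's four results.
inline std::array<std::uint16_t, 8> uzp2q_u16_ref(
    const std::array<std::uint16_t, 8>& a,
    const std::array<std::uint16_t, 8>& b) {
  std::array<std::uint16_t, 8> out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = a[2 * i + 1];      // lanes 0..3: odd elements of a
    out[i + 4] = b[2 * i + 1];  // lanes 4..7: odd elements of b
  }
  return out;
}
```

Because the model is independent of VLEN, comparing it against the intrinsic version on VLEN=128, 256, and 512 targets is one way to catch the stitching problem described above.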
