Description
I’m trying to understand the intended purpose of KernelBench, and the paper seems ambiguous between two possible scenarios:
- **Inline CUDA kernels:** designed to evaluate LLM-generated, from-scratch CUDA kernels that do not reuse PyTorch primitives (e.g., as in the AI CUDA Engineer workflow).
- **Forward kernels:** designed to evaluate LLM-generated wrappers or glue code that *do* call into existing PyTorch implementations (e.g., as in the CUDA-L1 workflow).
The paper does not clearly state which of these two scenarios KernelBench is meant to measure. As a result, it’s unclear whether submissions are expected to:
- Write fully self-contained CUDA kernels with no PyTorch calls, or
- Write high-level `forward()` functions that simply delegate to PyTorch’s optimized backend.
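For concreteness, here is a minimal sketch of what a submission under each interpretation might look like. The `ModelNew`-style class names and the ReLU task are my own illustration, not taken from the repository:

```python
import torch
import torch.nn as nn
from torch.utils.cpp_extension import load_inline

# --- Interpretation 1: from-scratch inline CUDA kernel, no PyTorch primitives ---
cuda_source = r"""
#include <torch/extension.h>

__global__ void relu_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] > 0.f ? in[i] : 0.f;
}

torch::Tensor relu_forward(torch::Tensor x) {
    auto out = torch::empty_like(x);
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    relu_kernel<<<blocks, threads>>>(x.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
"""

relu_ext = load_inline(
    name="custom_relu",
    cpp_sources="torch::Tensor relu_forward(torch::Tensor x);",
    cuda_sources=cuda_source,
    functions=["relu_forward"],
)

class ModelNewInline(nn.Module):
    """Computes ReLU with a hand-written CUDA kernel (assumes float32 CUDA input)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return relu_ext.relu_forward(x.contiguous())

# --- Interpretation 2: forward() wrapper delegating to PyTorch's backend ---
class ModelNewWrapper(nn.Module):
    """Computes the same ReLU by simply calling PyTorch's optimized primitive."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)
```

If both styles are accepted, it would also help to know how each is scored, since the second style trivially matches the reference implementation’s performance without writing any kernel code.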
Questions
- **What exactly does KernelBench measure?**
  - Inline/custom CUDA kernels only?
  - Forward wrappers calling PyTorch?
  - Both, with separate tracks?
- **What are the benchmarking requirements?**
  - Are PyTorch calls disallowed in the “inline CUDA” track?
  - If PyTorch calls are allowed, what level of originality is expected?
Please update the README and/or paper to explicitly define the two benchmarking modes (if both are intended), including:
- Allowed APIs and library calls for each track.
- Example submissions for each mode.
If only one mode is intended, please state that explicitly in both the paper and the repository documentation.