Describe the issue
Thank you for your inspiring work on SCBench!
I am currently interested in benchmarking several SOTA KV cache compression algorithms (such as ShadowKV and OmniKV) using SCBench. However, integrating these methods into the SCBench framework is proving quite challenging: because of the complexity of their original implementations, modifying their codebases to fully support the multi-turn / shared-context mechanisms defined in SCBench carries a substantial engineering cost.
Questions
Given these implementation constraints, I would like to ask for your advice on the validity of the following alternative evaluation strategies:
Single-turn Evaluation: Is it reasonable to utilize SCBench solely for single-turn evaluation (i.e., treating the first turn as the primary metric)? I understand that the core contribution of SCBench is analyzing the full lifecycle (especially cache reuse), but I am wondering if the dataset itself still holds value for validating compression quality in a single-turn setup compared to other benchmarks like LongBench or InfiniteBench.
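To be concrete, by "single-turn evaluation" I mean something like the sketch below: reducing each multi-turn SCBench sample to its first turn and scoring it as an independent long-context example. The field names (`context`, `turns`, `input`, `answer`) are illustrative placeholders, not the actual SCBench data schema.

```python
# Hypothetical sketch: keep only the first turn of each multi-turn
# sample and score it as a standalone long-context example.
# Field names are assumptions, not the real SCBench schema.

def to_single_turn(samples):
    """Reduce multi-turn samples to their first turn only."""
    reduced = []
    for sample in samples:
        first = sample["turns"][0]
        reduced.append({
            "prompt": sample["context"] + "\n\n" + first["input"],
            "answer": first["answer"],
        })
    return reduced

samples = [{
    "context": "<long shared context>",
    "turns": [
        {"input": "Q1?", "answer": "A1"},
        {"input": "Q2?", "answer": "A2"},  # dropped in this setup
    ],
}]
print(to_single_turn(samples)[0]["answer"])  # only the first turn survives
```

This obviously discards exactly the cache-reuse behavior SCBench is designed to probe, which is why I am asking whether the remaining single-turn signal is still meaningful.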
One-pass / Batch Evaluation: Would it be methodologically sound to concatenate the context with all the questions (e.g., Context + Q1 + Q2 + Q3...) into a single prompt and have the model generate all answers in one pass?
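For clarity, the one-pass strategy I have in mind would build a single prompt roughly like this. The numbering/answer-format instructions are my own assumptions about how one might make the batched answers parseable; SCBench does not prescribe this.

```python
# Hypothetical sketch of the one-pass strategy: concatenate the shared
# context with all questions and request all answers in one generation.
# The numbering scheme and answer format are assumptions for illustration.

def build_one_pass_prompt(context, questions):
    numbered = "\n".join(f"Q{i + 1}: {q}" for i, q in enumerate(questions))
    return (
        f"{context}\n\n{numbered}\n\n"
        "Answer every question above, one per line, "
        "formatted as 'A1: ...', 'A2: ...', and so on."
    )

prompt = build_one_pass_prompt(
    "<long shared context>",
    ["Who is the author?", "What year was it published?"],
)
print(prompt)
```

My concern with this setup is that later questions become visible while answering earlier ones, so it is not strictly equivalent to the sequential multi-turn protocol.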
Any insights or suggestions would be greatly appreciated.