
[Question]: Feasibility of evaluating complex KV compression methods via single-turn or batch-input modes #200

@ponytaill


Describe the issue

Thank you for your inspiring work on SCBench!

I am interested in benchmarking several SOTA KV cache compression algorithms (such as ShadowKV and OmniKV) with SCBench. However, integrating these methods into the SCBench framework is proving quite challenging. Because their original implementations are complex, modifying their codebases to fully support the multi-turn / shared-context mechanisms defined in SCBench carries a significant engineering cost.

Questions

Given these implementation constraints, I would like to ask for your advice on the validity of the following alternative evaluation strategies:

Single-turn Evaluation: Is it reasonable to use SCBench solely for single-turn evaluation (i.e., treating the first turn as the primary metric)? I understand that the core contribution of SCBench is analyzing the full KV cache lifecycle (especially cache reuse), but I am wondering whether the dataset itself still holds value for validating compression quality in a single-turn setup, compared to other benchmarks like LongBench or InfiniteBench.

One-pass / Batch Evaluation: Would it be methodologically sound to concatenate the context with all the questions (e.g., Context + Q1 + Q2 + Q3...) into a single prompt and have the model generate all answers in one pass?
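To make the two strategies concrete, here is a minimal sketch of the prompt layouts they imply. Everything in it is an assumption for illustration: the `Example` fields, the `Q<n>:` / `A<n>:` labelling, and the helper names are hypothetical and do not reflect SCBench's actual data schema or evaluation code.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    # Hypothetical record layout, not SCBench's real schema.
    context: str
    questions: List[str]


def single_turn_prompt(ex: Example) -> str:
    """Strategy 1: keep only the first question of a multi-turn example."""
    return f"{ex.context}\n\nQuestion: {ex.questions[0]}\nAnswer:"


def batch_prompt(ex: Example) -> str:
    """Strategy 2: context followed by all questions in a single prompt."""
    numbered = "\n".join(f"Q{i}: {q}" for i, q in enumerate(ex.questions, 1))
    instruction = "Answer each question on its own line as 'A<n>: <answer>'."
    return f"{ex.context}\n\n{instruction}\n{numbered}\n"


def parse_batch_output(text: str, n: int) -> List[str]:
    """Split a one-pass generation back into n per-question answers.

    Answers the model skipped stay as empty strings, so they can be
    scored as wrong instead of silently shifting later answers.
    """
    answers = [""] * n
    for m in re.finditer(r"^A(\d+):[ \t]*(.*)$", text, flags=re.MULTILINE):
        idx = int(m.group(1)) - 1
        if 0 <= idx < n:
            answers[idx] = m.group(2).strip()
    return answers
```

Note that in the batch layout every question is visible during prefill, so a query-aware compression method can select KV entries with knowledge of all queries at once. That differs from the multi-turn setting, where later queries are unknown when the cache is compressed, so scores from the two setups may not be directly comparable.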

Any insights or suggestions would be greatly appreciated.
