
[Bug]: Can't run run_infinitebench.py #195

@ruili-pml


Describe the bug

Hi,

Great work, and thanks for sharing the code. I was trying to run run_infinitebench.py under experiments/, but I got the following error:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/pdb.py", line 1723, in main
    pdb._runscript(mainpyfile)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/pdb.py", line 1583, in _runscript
    self.run(statement)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/bdb.py", line 598, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/home/ubuntu/MInference/experiments/infinite_bench/run_infinitebench.py", line 274, in <module>
    pred = get_pred(
  File "/home/ubuntu/MInference/experiments/infinite_bench/run_infinitebench.py", line 100, in get_pred
    outputs = model.generate(**input_tensors, generation_config=generation_config)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 2564, in generate
    result = decoding_method(
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 2756, in _sample
    model_kwargs = self._get_initial_cache_position(cur_len, input_ids.device, model_kwargs)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 1833, in _get_initial_cache_position
    past_length = cache.get_seq_length()
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/minference/modules/kvcompression.py", line 439, in get_seq_length
    if len(self.key_cache) <= layer_idx:
AttributeError: 'DynamicCacheWithRepeat' object has no attribute 'key_cache'

If it helps, I'm using transformers 4.57.1 and vllm 0.11.0.
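
My guess is that this is a transformers version mismatch: get_seq_length in minference/modules/kvcompression.py reads self.key_cache, which newer transformers releases no longer set on DynamicCache. As a rough illustration of what I mean (just a sketch on my side, not an official fix, and assuming DynamicCacheWithRepeat inherits from transformers' DynamicCache), a guard like the following avoids the missing attribute:

from transformers.cache_utils import DynamicCache


class DynamicCacheCompat(DynamicCache):
    # Sketch of a version-tolerant get_seq_length; DynamicCacheWithRepeat would
    # presumably need the same kind of guard.
    def get_seq_length(self, layer_idx: int = 0) -> int:
        key_cache = getattr(self, "key_cache", None)
        if key_cache is None:
            # Newer transformers: no per-layer key_cache list, defer to upstream.
            return super().get_seq_length(layer_idx)
        # Older transformers layout: parallel per-layer lists of key tensors.
        if len(key_cache) <= layer_idx:
            return 0
        return key_cache[layer_idx].shape[-2]

Or is pinning transformers to an older release that still exposes key_cache the intended setup?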

In addition, is it necessary to install vllm_flash_attn? The code does

from vllm_flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache

and that package only works with fairly old versions of torch and vllm.
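
Would a fallback like the one below be a reasonable alternative? This is only a sketch on my side, assuming the standalone flash-attn package is installed and recent enough to export the same two functions:

try:
    # Prefer the vllm-bundled kernels if they are installed.
    from vllm_flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache
except ImportError:
    # Fall back to the standalone flash-attn package.
    from flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache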

It would be great if you could take a look.

Thanks,
Rui
