Description
Describe the bug
Hi,
Great work and thanks for sharing the code. I was trying to run run_infinitebench.py in experiments/infinite_bench but got this error:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/pdb.py", line 1723, in main
pdb._runscript(mainpyfile)
File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/pdb.py", line 1583, in _runscript
self.run(statement)
File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/bdb.py", line 598, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/home/ubuntu/MInference/experiments/infinite_bench/run_infinitebench.py", line 274, in <module>
pred = get_pred(
File "/home/ubuntu/MInference/experiments/infinite_bench/run_infinitebench.py", line 100, in get_pred
outputs = model.generate(**input_tensors, generation_config=generation_config)
File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 2564, in generate
result = decoding_method(
File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 2756, in _sample
model_kwargs = self._get_initial_cache_position(cur_len, input_ids.device, model_kwargs)
File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 1833, in _get_initial_cache_position
past_length = cache.get_seq_length()
File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/minference/modules/kvcompression.py", line 439, in get_seq_length
if len(self.key_cache) <= layer_idx:
AttributeError: 'DynamicCacheWithRepeat' object has no attribute 'key_cache'
If it helps, I'm using transformers 4.57.1 and vllm 0.11.0.
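For what it's worth, a quick check along these lines (just a sketch, not code from MInference) might confirm whether the base DynamicCache in this transformers version still exposes the old attribute that kvcompression.py reads; I suspect the cache internals were reorganized in recent transformers releases:

```python
# Sanity check (illustration only, not part of MInference): if recent
# transformers releases no longer expose DynamicCache.key_cache, a subclass
# that reads self.key_cache directly will hit the AttributeError above.
from transformers.cache_utils import DynamicCache

cache = DynamicCache()
print(hasattr(cache, "key_cache"))  # False would match the traceback above
```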
In addition, is it necessary to install vllm-flash-attn? The code imports it here:
from vllm_flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache
That package only works with fairly old versions of torch and vllm.
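If a soft dependency is acceptable, a fallback import along these lines might avoid pinning old torch/vllm, assuming the standalone flash-attn package exposes compatible signatures (I have not verified this inside MInference):

```python
# Sketch of an optional import: prefer vllm's bundled kernels, fall back to
# the standalone flash-attn package if they are not installed.
try:
    from vllm_flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache
except ImportError:
    from flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache
```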
It would be great if you could take a look.
Thanks,
Rui