forked from abetlen/llama-cpp-python
Hey, thanks for maintaining! I recently tried Next and Nemotron-3-Nano-30B-A3B-Q4_K_M. Both failed after the second completion (the first one always works) with something like:
```
init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
 - the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 16
 - the tokens for sequence 0 in the input batch have a starting position of Y = 2
it is required that the sequence positions remain consecutive: Y = X + 1
decode: failed to initialize batch
llama_decode: failed to decode, ret = -1
```
Is it just something on my setup? Can you run these newer models successfully?
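For context, the check the log is complaining about is a KV-cache position invariant: the first token of a new batch for a sequence must continue directly from the last position already stored in the cache for that sequence (Y = X + 1). A minimal Python sketch of that invariant — an illustration of the rule, not llama.cpp's actual code:

```python
def check_batch_positions(last_kv_pos: int, batch_start_pos: int) -> None:
    """Model the consecutive-position check from the log above:
    the batch must start at Y = X + 1, where X is the last position
    already stored in the KV cache for the sequence."""
    expected = last_kv_pos + 1
    if batch_start_pos != expected:
        raise ValueError(
            f"inconsistent sequence positions: last KV position X = {last_kv_pos}, "
            f"batch starts at Y = {batch_start_pos}; required Y = X + 1"
        )

# The failing case reported here: the cache holds positions up to X = 16,
# but the second completion's batch starts at Y = 2.
try:
    check_batch_positions(last_kv_pos=16, batch_start_pos=2)
except ValueError as e:
    print(e)
```

This suggests the second completion is decoding with stale KV-cache state from the first one (e.g. the cache was only partially cleared, or positions were not reset between calls), rather than anything model-specific.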