Running the JVM with this command line:
java -server -Xmx26g --enable-preview --add-modules jdk.incubator.vector -jar llama3.jar --model Llama-3.2-3B-Instruct-Q8_0.gguf --chat -n 75000 --temperature 0
under JDK 25 GraalVM EA on Windows 11 (Ryzen 7, 32 GB RAM), and also with the 3.2-1B model on both Win11 and Ubuntu 22.04 (Ryzen 9 HX370), with the same VM options except -Xmx96g and -n 128000 since that machine has 128 GB. In every case the output quality collapses around the 7-10k token mark: the models either repeat an endless loop of doggerel or issue only sullen two- or three-word repetitive responses. I'm going to try 3.1-8B under Ubuntu and see what happens, but so far it has consistently become deranged at right around the same 7-10k token point.
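For reference, the Ubuntu 22.04 / 128 GB run is the same invocation with only the heap and context flags changed; a sketch, assuming the 1B model file follows the same naming and quantization convention as the 3B one above:

# 1B model filename assumed to mirror the 3B Q8_0 naming
java -server -Xmx96g --enable-preview --add-modules jdk.incubator.vector -jar llama3.jar --model Llama-3.2-1B-Instruct-Q8_0.gguf --chat -n 128000 --temperature 0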