I ran this code with the command of
python carve_sigil.py \
name=llama2_gcg_sys \
wandb.tags=[extraction] \
model=llama2-7b-chat \
optimizer=gcg \
sigil.num_tokens=32 \
sigil=sysrepeater
I evaluated the result prompt with 50 test samples in awesome-chatgpt-prompts dataset, but only 3 samples was correctly responsed (correctly repeat the system prompt). I don't know if it is normal or I just did something wrong. Could you please offer me the best setting of prompt extraction attack or tell me where I went wrong? Thank you very much.