diff --git a/README.md b/README.md
index 5787648..5710012 100644
--- a/README.md
+++ b/README.md
@@ -46,10 +46,11 @@ DeepMath implements both. The model learns to generate short Python snippets, wh
- Inference: based on [SmolAgents](https://github.com/huggingface/smolagents/), a math agent was created. vLLM is used as the inference engine.
- Training: based on the GRPO trainer in [TRL](https://github.com/huggingface/trl), we modified TRL's vLLM client and server to generate GRPO completions using our DeepMath agent.
-
-
-
-Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.
-
+
+
+
+Figure 1: The vLLM client and server were modified so that GRPO candidates are generated by the DeepMath agent, while still using vLLM as the backend.
+
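+How this wiring works, as a minimal self-contained sketch: the trainer-side client asks the server for a group of GRPO candidates, and the server produces each one with an agent rollout rather than a plain sample. All names here (`grpo_candidates`, `agent_rollout`) and the stub backend are illustrative stand-ins, not the actual TRL or SmolAgents API.
+
+```python
+from typing import Callable, List
+
+def agent_rollout(prompt: str, generate: Callable[[str], str]) -> str:
+    """Stand-in for the DeepMath agent loop (expanded in Figure 2 below)."""
+    return generate(prompt)
+
+def grpo_candidates(prompt: str, n: int,
+                    generate: Callable[[str], str]) -> List[str]:
+    """Server side: one agent rollout per requested GRPO completion."""
+    return [agent_rollout(prompt, generate) for _ in range(n)]
+
+# Client side (trainer): request a group of candidates for one prompt.
+fake_backend = lambda p: p + " ... the answer is 42."  # stands in for vLLM
+print(grpo_candidates("What is 6 * 7?", n=4, generate=fake_backend))
+```
+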
- **Agent Interface:** During inference, the model can output normal tokens or special agent calls containing Python snippets.
@@ -63,10 +64,9 @@ DeepMath implements both. The model learns to generate short Python snippets, wh
- **Interpretability:** Snippets are readable and auditable.
-
-
-
-Figure 2: Output example where python code is generated, evaluated and the answer is inserted into the trace and used for context.
-
+
+
+Figure 2: Example output in which Python code is generated and evaluated, and the result is inserted into the trace as context for subsequent generation.
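+
+The loop the figure describes, as a minimal sketch: it assumes agent calls are delimited by explicit tags; the `<python>`/`<output>` tag names and the unsandboxed `exec` are illustrative simplifications, not DeepMath's actual interface.
+
+```python
+import io
+import re
+from contextlib import redirect_stdout
+
+CALL_RE = re.compile(r"<python>(.*?)</python>", re.DOTALL)  # assumed tag format
+
+def execute_snippet(code: str) -> str:
+    """Run a generated snippet and capture what it prints."""
+    buf = io.StringIO()
+    with redirect_stdout(buf):
+        exec(code, {})  # a real system would sandbox and time-limit this
+    return buf.getvalue().strip()
+
+def step(trace: str, model_output: str) -> str:
+    """Append the model output; if it contains an agent call, execute it and
+    insert the result into the trace so later generation can condition on it."""
+    trace += model_output
+    match = CALL_RE.search(model_output)
+    if match:
+        trace += f"\n<output>{execute_snippet(match.group(1))}</output>\n"
+    return trace
+
+# Example: the model emits a snippet instead of computing in text.
+print(step("Compute 17 * 23.\n", "<python>print(17 * 23)</python>"))
+```
+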
## Training with GRPO
@@ -92,7 +92,13 @@ We benchmarked DeepMath against baselines on four datasets. Metrics include:
- **Mean output length** (brevity).
+
+
+
+- We compare a baseline configuration ([Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507), no agent) with our DeepMath model. As an ablation, we evaluate our agentic framework running with the untrained Qwen3 model, denoted **+Agent**. We also examine whether GRPO training (for agentic use) improves non-agentic inference, denoted **+GRPO**. The two ablations are therefore independent, not additive.
+
+- We observe that agentic inference reduces output lengths, with mixed effects on accuracy. The DeepMath model, which is both GRPO-trained and run in agentic mode, achieves the highest accuracy with shortened traces. We conclude that **both GRPO training and agentic inference are needed** for the best results.
**Key Insight:** DeepMath reduces output length by up to **66%** while improving accuracy on challenging datasets.
diff --git a/assets/trl-grpo-vllm-deepmath.png b/assets/trl-grpo-vllm-deepmath.png
index 79d69ee..2281c47 100644
Binary files a/assets/trl-grpo-vllm-deepmath.png and b/assets/trl-grpo-vllm-deepmath.png differ