Conversation

@ankrgyl (Contributor) commented Aug 8, 2025

No description provided.

github-actions bot commented Aug 8, 2025

Braintrust eval report

‼️ Autoevals (gpt-5-1754668222)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 78.3% (+4pp) | 20 🟢 | 9 🔴 |
| Start | 1754668221.91s | - | - |
| End | 1754668229.86s | - | - |
| Duration | 7.95s (+2.86s) | 48 🟢 | 171 🔴 |
| Llm_duration | 9.92s (+5.62s) | - | 119 🔴 |
| Prompt_tokens | 317.7tok (-209.2tok) | 119 🟢 | - |
| Completion_tokens | 249.19tok (+203.62tok) | - | 119 🔴 |
| Total_tokens | 566.89tok (-5.58tok) | 72 🟢 | 47 🔴 |
| Estimated_cost | 0$ (0$) | 119 🟢 | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
Expand to see errors
Error: Unknown score choice undefined
    at parseResponse (/home/runner/work/autoevals/autoevals/jsdist/index.js:430:11)
    at OpenAIClassifier (/home/runner/work/autoevals/autoevals/jsdist/index.js:406:10)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Object.ClosedQA (/home/runner/work/autoevals/autoevals/jsdist/index.js:456:12)
    at async /home/runner/work/autoevals/autoevals/node_modules/.pnpm/braintrust@0.0.140/node_modules/braintrust/dist/index.js:20435:22
    at async /home/runner/work/autoevals/autoevals/node_modules/.pnpm/braintrust@0.0.140/node_modules/braintrust/dist/index.js:20394:26
    at async Object.task (eval at <anonymous> (/home/runner/work/autoevals/autoevals/node_modules/.pnpm/braintrust@0.0.140/node_modules/braintrust/dist/cli.js:29191:5), <anonymous>:4191:20)
    at async rootSpan.traced.name (/home/runner/work/autoevals/autoevals/node_modules/.pnpm/braintrust@0.0.140/node_modules/braintrust/dist/cli.js:24576:26)
    at async callback (/home/runner/work/autoevals/autoevals/node_modules/.pnpm/braintrust@0.0.140/node_modules/braintrust/dist/cli.js:24572:11)
    at async /home/runner/work/autoevals/autoevals/node_modules/.pnpm/braintrust@0.0.140/node_modules/braintrust/dist/cli.js:24707:16
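
The message `Unknown score choice undefined` suggests that the classifier tried to map a model-selected choice to a numeric score and received no choice at all. The following is a minimal sketch of that kind of lookup, not the autoevals source: the function name `scoreForChoice`, the `ChoiceScores` type, and the `"Y"`/`"N"` labels are hypothetical and used only for illustration.

```typescript
// Hypothetical sketch of a choice-to-score lookup; NOT the actual autoevals code.
// Maps each allowed choice label to the numeric score it stands for.
type ChoiceScores = Map<string, number>;

function scoreForChoice(
  choice: string | undefined,
  choiceScores: ChoiceScores,
): number {
  // A missing choice (for example, the model response did not contain the
  // expected field) and an unrecognized label both fail the lookup.
  const score = choice === undefined ? undefined : choiceScores.get(choice);
  if (score === undefined) {
    // With choice === undefined this interpolates to
    // "Unknown score choice undefined", matching the message in the trace.
    throw new Error(`Unknown score choice ${choice}`);
  }
  return score;
}

// Example (assumed labels): scoreForChoice("Y", new Map([["Y", 1], ["N", 0]]))
// returns 1, while scoreForChoice(undefined, ...) throws.
```

Under these assumptions, the error points at the shape of the model response rather than at the score table: a model that answers in a different format than the parser expects would leave the choice unset.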
