Can I use https://github.com/microsoft/llm-as-judge as the reward for the search step in GRPO training, by integrating it into the search tool?

Here is some pseudocode generated with OpenAI. Is it correct?
```
from agl import emit_reward, span

def search_tool(query):
    docs = real_search(query)

    # 1) Build the judge input
    judge_input = {
        "query": query,
        "docs": [{"title": d.title, "snippet": d.snippet} for d in docs[:5]],
    }

    # 2) Call llm-as-judge to score the results (returns 1-10)
    score_1_10 = llm_as_judge(judge_input, rubric="relevance+coverage")
    reward = (score_1_10 - 1) / 9.0  # -> 0~1

    # 3) Emit the reward (step-level)
    emit_reward(name="search_quality", value=reward)

    return docs
```
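
To make the question more concrete, here is a minimal, self-contained sketch of what I expect to happen to the judge score after it is produced. It only covers the arithmetic: mapping a 1-10 score to 0-1 (with clamping, in case the judge returns something out of range) and the group-relative standardization that GRPO applies to rewards from rollouts of the same prompt. The function names `normalize_judge_score` and `group_relative_advantages` and the example scores are my own assumptions for illustration, not part of llm-as-judge or any training library, and using the population standard deviation is a choice made here; implementations may differ.

```python
from statistics import mean, pstdev


def normalize_judge_score(score, low=1.0, high=10.0):
    """Map a judge score on [low, high] to [0, 1], clamping out-of-range values."""
    clamped = max(low, min(high, float(score)))
    return (clamped - low) / (high - low)


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts for the same prompt.

    Hypothetical helper: subtracts the group mean and divides by the
    population standard deviation (plus eps to avoid division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up judge scores (1-10) for four rollouts of the same query.
judge_scores = [1, 4, 7, 10]
rewards = [normalize_judge_score(s) for s in judge_scores]
advantages = group_relative_advantages(rewards)
```

If every rollout in a group gets the same judge score, all advantages come out as zero, so the judge rubric needs enough resolution to separate good searches from bad ones.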

If you speak Chinese, we can also talk on WeChat: johnsongzc

Can I use https://github.com/microsoft/llm-as-judge as the search reward for GRPO training by combining it in? #362
