
Can I integrate https://github.com/microsoft/llm-as-judge and use it as the search reward for GRPO training? #362

@johnson7788

Description


Below is pseudocode generated with an OpenAI model. Is it correct?

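from agl import emit_reward, span

def search_tool(query):
    # real_search() and llm_as_judge() are placeholders for your own
    # search backend and judge wrapper; they are not defined here.
    docs = real_search(query)

    # 1) Build the judge input from the top 5 results
    judge_input = {
        "query": query,
        "docs": [{"title": d.title, "snippet": d.snippet} for d in docs[:5]],
    }

    # 2) Ask llm-as-judge for a score (expected range 1~10)
    score_1_10 = llm_as_judge(judge_input, rubric="relevance+coverage")
    # Clamp in case the judge returns a value outside 1~10
    score_1_10 = min(max(float(score_1_10), 1.0), 10.0)
    reward = (score_1_10 - 1) / 9.0  # maps 1~10 to 0~1

    # 3) Emit the reward (step-level)
    emit_reward(name="search_quality", value=reward)

    return docs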
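For the GRPO side of the question, here is a minimal, framework-free sketch of how judge scores could turn into training advantages. It assumes a rollout may call the search tool several times and that those step-level rewards are averaged into a single trajectory reward; that aggregation rule and the helper names (`normalize_judge_score`, `trajectory_reward`, `group_relative_advantages`) are hypothetical and are not part of agl or llm-as-judge. The advantage step follows the usual GRPO recipe of standardizing rewards within a group of rollouts sampled for the same prompt (population std here; some implementations use the sample std).

```python
from statistics import mean, pstdev


def normalize_judge_score(score: float, lo: float = 1.0, hi: float = 10.0) -> float:
    """Map a judge score on [lo, hi] to [0, 1], clamping out-of-range values."""
    clamped = min(max(score, lo), hi)
    return (clamped - lo) / (hi - lo)


def trajectory_reward(step_rewards: list[float]) -> float:
    """Hypothetical aggregation: average the step-level search rewards of one rollout.

    A rollout that never called the search tool gets 0.0 (an assumption).
    """
    return mean(step_rewards) if step_rewards else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # If every rollout scored the same, sigma is 0 and all advantages are 0.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: one group of 4 rollouts for the same query. Each inner list holds
# the 1-10 judge scores from that rollout's search calls (made-up numbers).
group_scores = [[7, 9], [3], [], [10, 10, 4]]
rewards = [
    trajectory_reward([normalize_judge_score(s) for s in steps])
    for steps in group_scores
]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Because GRPO standardizes rewards inside each group, the affine mapping `(score_1_10 - 1) / 9.0` from the snippet above changes the advantages only through `eps`. Its practical value is a readable 0 to 1 scale, which matters if you later add this search reward to other rewards (for example, a final-answer reward) before standardizing.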

Chinese speakers can also reach me on WeChat: johnsongzc

Metadata

    Labels: question (Question about a feature or some usage), topic/rag