Skip to content

Conversation

@adshmh
Copy link
Contributor

@adshmh adshmh commented May 9, 2025

Summary

  • JUDGE: new framework for building QoS systems on PATH.

Primary Changes:

  • Introduce the core concepts of JUDGE: 4 Contexts of operation
  • Define and implement key components for implementing the 4 defined key contexts.
  • Define new proto files as data containers for JUDGE
  • Separate Data Container and logic functionality in components

Secondary Changes:

  • Automate observations
  • Automate Request processing
  • Automate Response processing
  • Automate Edge cases processing (e.g. protocol-level errors)

Issue

Repetitive/boiler plate code in qos package, which masked the actual QoS logic.

  • Issue or PR: #{ISSUE_OR_PR_NUMBER}

Type of change

Select one or more from the following:

  • New feature, functionality or library
  • Bug fix
  • Code health or cleanup
  • Documentation
  • Other (specify)

QoS Checklist

E2E Validation & Tests

  • make path_up
  • make test_e2e_evm_shannon
  • make test_e2e_evm_morse

Observability

  • 1. make path_up
  • 2. Run one of the following:
    • For Shannon with anvil: make test_request__shannon_relay_util_100
    • For Morse with F00C: make test_request__morse_relay_util_100
  • 3. Visit PATH Relay Grafana Dashboard to view results

Sanity Checklist

  • I have updated the GitHub Issue assignees, reviewers, labels, project, iteration and milestone
  • For docs, I have run make docusaurus_start
  • For code, I have run make test_all
  • For configurations, I have updated the documentation
  • I added TODOs where applicable

@adshmh adshmh added this to the Portal -> PATH Migration milestone May 9, 2025
@adshmh adshmh self-assigned this May 9, 2025
@adshmh adshmh added enhancement qos Intended to improve quality of service labels May 9, 2025
@adshmh
Copy link
Contributor Author

adshmh commented May 9, 2025

Please note the UPNEXT items below:

  • Architecture.md doc.
  • toolkit package implementation (JUDGE works without it, but this will make QoS implementation even easier).
  • Resolving conflicts
  • Remove the Draft status.

@adshmh adshmh requested review from Olshansk and commoddity May 9, 2025 19:05
Copy link
Contributor

@commoddity commoddity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm signing off for the day and leaving this as a partial review that I will pick up tomorrow.

@adshmh just want to say that I love the design and direction of this. IMO this is a major improvements and I can really see how it's incorporating all of the learnings we've made in developing the QoS system from 0 to where it's at now.

This is definitely the right direction and I'm excited to see the finished PR. 🚀

}

// RequestError contains details about a request error
message RequestError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file be called request_error?

@@ -0,0 +1,3 @@
package evm

// TODO_IN_THIS_PR: use the framework's EndpointQualityCheckContext for adding QoS checks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adshmh I'm confused about the purpose of the example_evm folder. Is this meant to be an in-progress development folder or is this something that will be included in the final version of JUDGE?


```go
// Create a QoS specification with probes for each method
spec := toolkit.QoSSpec{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far this design looks amazing; very intuitive.

Is the final idea for configuring PATH that we would setup a series of toolkit.QoSSpec to replace the current service configurations in config/service_qos_config.go?

// and provides a standardized way to return HTTP responses to clients.
type ClientHTTPResponse struct {
StatusCode int
Headers map[string]string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we can't use http.Header as the type here?

endpointChecksToPerform []*jsonrpc.Request
}

func (ctx *EndpointQualityChecksContext) buildEndpointQualityCheckContexts() []gateway.RequestQoSContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#PUC for this method.

"github.com/buildwithgrove/path/protocol"
)

// TODO_FUTURE(@adshmh): Rank qualified endpoints, e.g. based on latency, for selection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO_FUTURE(@adshmh): Rank qualified endpoints, e.g. based on latency, for selection.
// TODO_FUTURE(@adshmh,@commoddity): Rank qualified endpoints, e.g. based on latency, for selection.

I have a functional PR that does this already so I'm happy to take on this work once JUDGE is merged to main.


// request's journal.
// Provide read-only access to the request, e.g. the JSONRPC method.
*requestJournal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the name 'journal` for this logic.

customSelector EndpointSelector

// Endpoints disqualified from current selection context.
disqualifiedEndpoints map[protocol.EndpointAddr]struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice to provide this alongside valid endpoints.

}

// Call the endpoint filters on the endpoint.
// Endpoint will be disqualified if any filter reutrns an error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Endpoint will be disqualified if any filter reutrns an error.
// Endpoint will be disqualified if any filter returns an error.


// TODO_IN_THIS_PR: sort out the scope of fields and methods: private/public on private structs.
//
// requestQoSContext holds the context for a request through its lifecycl.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// requestQoSContext holds the context for a request through its lifecycl.
// requestQoSContext holds the context for a request through its lifecycle.

Copy link
Contributor

@commoddity commoddity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished my first round of review.

First of, want to reiterate that I really love the direction of this PR. IT's a great improvement all around over the repetitive code we currently have. Great work and I'm looking forward to reviewing the finished PR.

Not requesting changes since I know it's a WIP so just leaving some observations and comments on the current state. 🔥

paramsToUpdate := ctx.stateUpdater(ctx)

// Update the state parameters through the service state.
return ctx.ServiceState.updateParameters(paramsToUpdate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the above comment re: ServiceState being read-only, it's a little confusing that there is an updateParameters method here. Is it possible to clarify this in comments?

// queryResults maps keys to query results for this endpoint.
// The map key is the method of the JSONRPC request for which the query result was built.
// Examples:
// - "eth_blockNumber": &EndpointQueryResult{IntValues: {"blockNumber": 0x1234}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comment here; makes it nice and clear what the data actually is.

parsedJSONRPCResponse *jsonrpc.Response

// The set of values/attributes extracted from the endpoint query and the endpoint's parsed JSONRPC response.
// e.g. for a Solana `getEpochInfo` request, the custom service could derive two endpoint attributes as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again great comment here.

ErrorKind: EndpointErrKindInvalidResult,
Description: description,
RecommendedSanction: &Sanction{
Type: SanctionTypeTemporary,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#PUC on why this is always SanctionTypeTemporary.

EDIT - nm just saw PermanentSanction below. Should we maybe call this TemporarilySanctionEndpoint or something similar?

SanctionTypePermanent // Permanent exclusion
)

// Sanction represents a recommendation to limit endpoint usage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a strong opinion but perhaps this file would benefit from more detail for the reader on what exactly a "sanction" is in the context of JUDGE? ie. making this comment more explicit that a sanction means not selecting an endpoint.

Comment on lines +34 to +39
if len(eqr.StrValues) > 0 {
obs.StringValues = make(map[string]string)
}
for key, value := range eqr.StrValues {
obs.StringValues[key] = value
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(eqr.StrValues) > 0 {
obs.StringValues = make(map[string]string)
}
for key, value := range eqr.StrValues {
obs.StringValues[key] = value
}
if len(eqr.StrValues) > 0 {
obs.StringValues = make(map[string]string,)
maps.Copy(obs.StringValues, eqr.StrValues)
}

Comment on lines +42 to +47
if len(eqr.IntValues) > 0 {
obs.IntValues = make(map[string]int64)
}
for key, value := range eqr.IntValues {
obs.IntValues[key] = int64(value)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason that eqr.IntValues must be map[string]int but obs.IntValues is map[string]int64?

If it's not possible to change feel free to ignore the below suggestion.

Suggested change
if len(eqr.IntValues) > 0 {
obs.IntValues = make(map[string]int64)
}
for key, value := range eqr.IntValues {
obs.IntValues[key] = int64(value)
}
if len(eqr.IntValues) > 0 {
obs.IntValues = make(map[string]int64)
maps.Copy(obs.IntValues, eqr.IntValues)
}

Comment on lines +76 to +90
// Copy string values
if len(observation.StringValues) > 0 {
result.StrValues = make(map[string]string)
}
for key, value := range observation.StringValues {
result.StrValues[key] = value
}

// Copy int values
if len(observation.IntValues) > 0 {
result.IntValues = make(map[string]int)
}
for key, value := range observation.IntValues {
result.IntValues[key] = int(value)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto the above. Feel free to ignore if result.IntValue cannot be map[string]int64.

Suggested change
// Copy string values
if len(observation.StringValues) > 0 {
result.StrValues = make(map[string]string)
}
for key, value := range observation.StringValues {
result.StrValues[key] = value
}
// Copy int values
if len(observation.IntValues) > 0 {
result.IntValues = make(map[string]int)
}
for key, value := range observation.IntValues {
result.IntValues[key] = int(value)
}
// Copy string values
if len(observation.StringValues) > 0 {
result.StrValues = make(map[string]string)
maps.Copy(result.StrValues, observation.StringValues)s
}
// Copy int values
if len(observation.IntValues) > 0 {
result.IntValues = make(map[string]int)
maps.Copy(result.IntValues, observation.IntValues)
}

// mu protects the state map from concurrent access
mu sync.RWMutex

// stateParameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps examples of these values could be would be helpful in the comment.


}

func (s *ServiceState) GetConsensusParam(paramName string) (map[string]int, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#PUC for this since it's a bit more complicated that the simple values. Specifically what it's used for (eg. archival checks).

Copy link
Contributor

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adshmh I'm leaving a comment for posterity.

After you tend/reply to @commoddity's comments AND merge with main, I'll aim to do a first round next week.

Thank you in advance for maintaining and upkeeping such a major PR - I know it's not easy!

Copy link
Contributor

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adshmh I'm leaving a comment for posterity.

After you tend/reply to @commoddity's comments AND merge with main, I'll aim to do a first round next week.

Thank you in advance for maintaining and upkeeping such a major PR - I know it's not easy!

Copy link
Contributor

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adshmh I'm leaving a comment for posterity.

After you tend/reply to @commoddity's comments AND merge with main, I'll aim to do a first round next week.

Thank you in advance for maintaining and upkeeping such a major PR - I know it's not easy!

@Olshansk Olshansk changed the title JUDGE QoS framework v0 [WIP] JUDGE QoS framework v0 Jul 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qos Intended to improve quality of service

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants