Skip to content

Conversation

@moizpgedge
Copy link
Contributor

@moizpgedge moizpgedge commented Dec 10, 2025

Summary

Adds validation to the get-join-options endpoint to ensure incoming host IDs are unique within the cluster. When a control plane server attempts to join with a duplicate host ID, it now returns a clear invalid_input error that properly propagates back to the caller of the join-cluster endpoint.

Changes

  • Added validateHostIDUniqueness() function in validate.go to check if a host ID already exists
  • Added ErrHostAlreadyExists error definition in errors.go
  • Updated GetJoinOptions() handler to validate host ID uniqueness before creating credentials
  • Added invalid_input error to get-join-options endpoint in API design
  • Updated generated error encoder to handle invalid_input with HTTP 400 Bad Request

Testing

Manual testing steps:

  1. Initialize cluster and join host-2:h
    restish control-plane-local-1 init-cluster
    2. Test duplicate host ID detection:
   TOKEN_VALUE=$(restish control-plane-local-1 get-join-token | jq -r '.token')
   curl -X POST "http://localhost:3000/v1/internal/cluster/join-options" \
     -H "Content-Type: application/json" \
     -d "{\"token\": \"$TOKEN_VALUE\", \"host_id\": \"host-2\", \"hostname\": \"test\", \"ipv4_address\": \"127.0.0.1\", \"embedded_etcd_enabled\": true}" | jq
      Expected: HTTP 400 with `{"name": "invalid_input", "message": "host with ID \"host-2\" already exists in the cluster"}`
  1. Verify error propagates through join-cluster endpoint when called with duplicate host ID

Checklist

Notes for Reviewers

  • Validation occurs before etcd.AddHost() to avoid creating credentials for duplicate hosts
  • Error handling follows existing patterns in the codebase
  • Manual updates were made to generated files (encode_decode.go and types.go) to support invalid_input error; these match what the generator would produce
  • The validation function treats any error from GetHost() (except success) as "host doesn't exist", which is correct for uniqueness checking
  • Error properly propagates through HTTP client back to users calling join-cluster

- Add validateHostIDUniqueness function in validate.go to check if host ID already exists
- Add ErrHostAlreadyExists error definition in errors.go
- Update GetJoinOptions handler to validate host ID before creating credentials
- Add invalid_input error support to get-join-options endpoint in API design
- Update generated error encoder to handle invalid_input with HTTP 400

Fixes PLAT-205: get-join-options endpoint should validate incoming host ID uniqueness
Remove explicit error handling for non-NotFound errors. The function
now only checks if host exists (err == nil) and returns error for
duplicates. Any other error (including NotFound) is treated as
host not existing, which is valid for uniqueness check.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds validation to prevent duplicate host IDs when control plane servers attempt to join a cluster. When a duplicate host ID is detected, the system now returns a clear invalid_input error instead of allowing the operation to proceed.

  • Added host ID uniqueness validation in the get-join-options endpoint
  • Introduced new error type ErrHostAlreadyExists with invalid_input classification
  • Updated API design and generated HTTP handlers to support the new error response

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
server/internal/api/apiv1/validate.go Adds validateHostIDUniqueness() function to check for duplicate host IDs
server/internal/api/apiv1/post_init_handlers.go Integrates uniqueness validation into GetJoinOptions() handler
server/internal/api/apiv1/errors.go Defines ErrHostAlreadyExists error constant
api/apiv1/gen/http/control_plane/server/types.go Adds response body type for invalid_input error
api/apiv1/gen/http/control_plane/server/encode_decode.go Adds HTTP 400 encoder for invalid_input error
api/apiv1/design/api.go Adds invalid_input error to API design specification

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 437 to 443
_, err := hostSvc.GetHost(ctx, hostID)
if err == nil {
// Host already exists - this is a duplicate
return fmt.Errorf("host with ID %q already exists in the cluster", hostID)
}
// Host doesn't exist (storage.ErrNotFound) - good, host ID is unique
return nil
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function assumes any error from GetHost() means the host doesn't exist, but errors other than 'not found' (such as connection failures or permission errors) should be propagated to the caller rather than being silently treated as success. Check if the error is specifically storage.ErrNotFound before returning nil, and return other errors.

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Copilot. Rather than an if statement, this is a good place to use a switch, errors.Is, as well as the new ErrHostAlreadyExists error that you added:

	switch {
	case errors.Is(err, storage.ErrNotFound):
		return nil
	case err != nil:
		return fmt.Errorf("failed to check for existing host: %w", err)
	default:
		return ErrHostAlreadyExists
	}

@moizpgedge moizpgedge marked this pull request as draft December 11, 2025 15:07
Copy link
Member

@jason-lynch jason-lynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far! I've requested a few small changes.

Comment on lines 437 to 443
_, err := hostSvc.GetHost(ctx, hostID)
if err == nil {
// Host already exists - this is a duplicate
return fmt.Errorf("host with ID %q already exists in the cluster", hostID)
}
// Host doesn't exist (storage.ErrNotFound) - good, host ID is unique
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Copilot. Rather than an if statement, this is a good place to use a switch, errors.Is, as well as the new ErrHostAlreadyExists error that you added:

	switch {
	case errors.Is(err, storage.ErrNotFound):
		return nil
	case err != nil:
		return fmt.Errorf("failed to check for existing host: %w", err)
	default:
		return ErrHostAlreadyExists
	}


// Validate that the host ID is unique in the cluster
if err := validateHostIDUniqueness(ctx, s.hostSvc, hostID); err != nil {
return nil, makeInvalidInputErr(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take my other advice in validate.go, you should change this to

Suggested change
return nil, makeInvalidInputErr(err)
return nil, apiErr(err)

This works because ErrHostAlreadyExists is already an invalid input error. So when validateHostIDUniqueness returns ErrHostAlreadyExists, apiErr will return it as-is. And when validateHostIDUniqueness returns any other error, apiErr will wrap that error in a server error, which is what we want for any unexpected errors.

g.Payload(ClusterJoinRequest)
g.Error("cluster_not_initialized")
g.Error("invalid_join_token")
g.Error("invalid_input")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add this error to the join-cluster endpoint below and re-run make -C api generate? This error is required in both places so that we return the correct error to the client.

- Add invalid_input error to join-cluster endpoint in API design
- Create ErrHostAlreadyExistsWithID function in errors.go to return error with specific host ID
- Update validateHostIDUniqueness to use new error function and properly handle storage errors
- Error message now includes the specific host ID for better client feedback
Copy link
Member

@jason-lynch jason-lynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for visibility - I've requested a few further changes in Slack.

- Remove unused ErrHostAlreadyExists constant
- Change ErrHostAlreadyExistsWithID to return *api.APIError instead of generic error
- Use apiErr() instead of makeInvalidInputErr() in GetJoinOptions handler
- Improve error message format (remove escaped quotes for better readability)

This ensures proper error routing:
- Duplicate host ID errors return HTTP 400 (invalid_input)
- Connection failures return HTTP 500 (server_error)
- Error messages are user-friendly and consistent with API patterns
@moizpgedge moizpgedge force-pushed the Fix/PLAT-205/get-join-options-endpoint-should-validate-incoming-host-ID-uniqueness branch from 39a8253 to e5e9833 Compare December 15, 2025 18:30
@moizpgedge moizpgedge marked this pull request as ready for review December 15, 2025 18:55
@moizpgedge moizpgedge merged commit f342675 into main Dec 15, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants