feat: cluster API support for Verda Cloud Python SDK #70
Merged
25 commits
0d0e7ab
Add Clusters API wrapper
claude d6b9918
feat: clusters api
814d02e
fix: unit test
99668e8
fix: polishing
7f86615
fix: format, lint and unit test fixing
471e089
fix: integration tests
f009010
fix: review fixes
d2c3a04
fix: full features cluster example, add integration test
f5c275c
fix: unit tests
7f9a1c5
fix: revert to the correct OS images in prod
0a94fab
remove "scale" verb
shamrin 4f52e85
calling public API every 2 seconds is too much
shamrin f57532b
add TODO comment about backoff logic reuse
shamrin 1e53597
keyword-only args, optional description, order to match instances.create
shamrin f65b848
TODO comment in _instances module
shamrin b4c5e67
import ClusterStatus directly
shamrin d3a6203
use ClusterStatus.RUNNING
shamrin b6c08c8
Update examples/clusters_example.py
shamrin a9e3e60
clean cluster example
shamrin c3ddff6
use isinstance
shamrin 1c6ffd3
do not send not yet supported 'delete' action for multiple clusters
shamrin 7597d89
make ruff happy
shamrin 37f9c32
remove dummy function
shamrin 09e138b
remove unneeded returns
shamrin e34d36b
remove unnecessary url variable
shamrin
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,137 @@ | ||
| """ | ||
| Example demonstrating how to use the Clusters API. | ||
|
|
||
| This example shows how to: | ||
| - Create a new compute cluster | ||
| - List all clusters | ||
| - Get a specific cluster by ID | ||
| - Get cluster nodes | ||
| - Delete a cluster | ||
| """ | ||
|
|
||
| import os | ||
| import time | ||
|
|
||
| from verda import VerdaClient | ||
| from verda.constants import Actions, ClusterStatus, Locations | ||
|
|
||
| # Get credentials from environment variables | ||
| CLIENT_ID = os.environ.get('VERDA_CLIENT_ID') | ||
| CLIENT_SECRET = os.environ.get('VERDA_CLIENT_SECRET') | ||
| BASE_URL = os.environ.get('VERDA_BASE_URL', 'https://api.verda.com/v1') | ||
|
|
||
| # Create client | ||
| verda = VerdaClient(CLIENT_ID, CLIENT_SECRET, base_url=BASE_URL) | ||
|
|
||
|
|
||
| def create_cluster_example(): | ||
| """Create a new compute cluster.""" | ||
| # Get SSH keys | ||
| ssh_keys = [key.id for key in verda.ssh_keys.get()] | ||
|
|
||
| # Check if cluster type is available | ||
| if not verda.clusters.is_available('16B200', Locations.FIN_03): | ||
| raise ValueError('Cluster type 16B200 is not available in FIN_03') | ||
|
|
||
| # Get available images for cluster type | ||
| images = verda.clusters.get_cluster_images('16B200') | ||
| if 'ubuntu-22.04-cuda-12.9-cluster' not in images: | ||
| raise ValueError('Ubuntu 22.04 CUDA 12.9 cluster image is not supported for 16B200') | ||
|
|
||
| # Create a 16B200 cluster | ||
| cluster = verda.clusters.create( | ||
| hostname='my-compute-cluster', | ||
| cluster_type='16B200', | ||
| image='ubuntu-22.04-cuda-12.9-cluster', | ||
| description='Example compute cluster for distributed training', | ||
| ssh_key_ids=ssh_keys, | ||
| location=Locations.FIN_03, | ||
| shared_volume_name='my-shared-volume', | ||
| shared_volume_size=30000, | ||
| wait_for_status=None, | ||
| ) | ||
|
|
||
| print(f'Creating cluster: {cluster.id}') | ||
| print(f'Cluster hostname: {cluster.hostname}') | ||
| print(f'Cluster status: {cluster.status}') | ||
| print(f'Cluster cluster_type: {cluster.cluster_type}') | ||
| print(f'Location: {cluster.location}') | ||
|
|
||
| # Wait for cluster to enter RUNNING status | ||
| while cluster.status != ClusterStatus.RUNNING: | ||
| time.sleep(30) | ||
| print(f'Waiting for cluster to enter RUNNING status... (status: {cluster.status})') | ||
| cluster = verda.clusters.get_by_id(cluster.id) | ||
|
|
||
| print(f'Public IP: {cluster.ip}') | ||
| print('Cluster is now running and ready to use!') | ||
|
|
||
| return cluster | ||
|
|
||
|
|
||
| def list_clusters_example(): | ||
| """List all clusters.""" | ||
| # Get all clusters | ||
| clusters = verda.clusters.get() | ||
|
|
||
| print(f'\nFound {len(clusters)} cluster(s):') | ||
| for cluster in clusters: | ||
| print( | ||
| f' - {cluster.hostname} ({cluster.id}): {cluster.status} - {len(cluster.worker_nodes)} nodes' | ||
| ) | ||
|
|
||
| # Get clusters with specific status | ||
| running_clusters = verda.clusters.get(status=ClusterStatus.RUNNING) | ||
| print(f'\nFound {len(running_clusters)} running cluster(s)') | ||
|
|
||
| return clusters | ||
|
|
||
|
|
||
| def get_cluster_by_id_example(cluster_id: str): | ||
| """Get a specific cluster by ID.""" | ||
| cluster = verda.clusters.get_by_id(cluster_id) | ||
|
|
||
| print('\nCluster details:') | ||
| print(f' ID: {cluster.id}') | ||
| print(f' Name: {cluster.hostname}') | ||
| print(f' Description: {cluster.description}') | ||
| print(f' Status: {cluster.status}') | ||
| print(f' Cluster type: {cluster.cluster_type}') | ||
| print(f' Created at: {cluster.created_at}') | ||
| print(f' Public IP: {cluster.ip}') | ||
| print(f' Worker nodes: {len(cluster.worker_nodes)}') | ||
|
|
||
| return cluster | ||
|
|
||
|
|
||
| def delete_cluster_example(cluster_id: str): | ||
| """Delete a cluster.""" | ||
| print(f'\nDeleting cluster {cluster_id}...') | ||
|
|
||
| verda.clusters.action(cluster_id, Actions.DELETE) | ||
|
Collaborator
maybe nicer to use the
||
|
|
||
| print('Cluster deleted successfully') | ||
|
|
||
|
|
||
| def main(): | ||
| """Run all cluster examples.""" | ||
| print('=== Clusters API Example ===\n') | ||
|
|
||
| print('Creating a new cluster...') | ||
| cluster = create_cluster_example() | ||
| cluster_id = cluster.id | ||
|
|
||
| print('\nListing all clusters...') | ||
| list_clusters_example() | ||
|
|
||
| print('\nGetting cluster details...') | ||
| get_cluster_by_id_example(cluster_id) | ||
|
|
||
| print('\nDeleting the cluster...') | ||
| delete_cluster_example(cluster_id) | ||
|
|
||
| print('\n=== Example completed successfully ===') | ||
|
|
||
|
|
||
| if __name__ == '__main__': | ||
| main() | ||
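The wait loop in `create_cluster_example` polls every 30 seconds with no upper bound, and the commit list mentions a TODO about reusing backoff logic. A minimal sketch of a bounded poll with backoff follows. It is not the SDK's implementation: the helper name `wait_for_status`, its parameters, and the default delays are all assumptions made for illustration, and it assumes the status can be compared with `==` as the example does.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str,
    *,
    timeout: float = 3600.0,
    initial_delay: float = 30.0,
    max_delay: float = 120.0,
    backoff: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it returns target, or raise TimeoutError.

    Hypothetical helper: the name and defaults are illustrative only.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status == target:
            return status
        # Give up instead of sleeping past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(
                f'did not reach {target!r} within {timeout} s (last status: {status!r})'
            )
        sleep(delay)
        # Grow the delay geometrically, capped at max_delay.
        delay = min(delay * backoff, max_delay)
```

In the example above, this could be called with `lambda: verda.clusters.get_by_id(cluster.id).status` as `fetch_status` and `ClusterStatus.RUNNING` as `target`, replacing the unbounded `while` loop.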
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| import logging | ||
| import os | ||
|
|
||
| import pytest | ||
|
|
||
| from verda import VerdaClient | ||
| from verda.constants import Locations | ||
|
|
||
| logging.basicConfig(level=logging.DEBUG) | ||
| logger = logging.getLogger() | ||
|
|
||
|
|
||
| IN_GITHUB_ACTIONS = os.getenv('GITHUB_ACTIONS') == 'true' | ||
|
|
||
|
|
||
| @pytest.mark.skipif(IN_GITHUB_ACTIONS, reason="Test doesn't work in Github Actions.") | ||
| @pytest.mark.withoutresponses | ||
| class TestClusters: | ||
| def test_create_cluster(self, verda_client: VerdaClient): | ||
| # get ssh key | ||
| ssh_key = verda_client.ssh_keys.get()[0] | ||
|
|
||
| if not verda_client.clusters.is_available('16B200', Locations.FIN_03): | ||
| raise ValueError('Cluster type 16B200 is not available in FIN_03') | ||
| logger.debug('[x] Cluster type 16B200 is available in FIN_03') | ||
|
|
||
| availabilities = verda_client.clusters.get_availabilities(Locations.FIN_03) | ||
| assert len(availabilities) > 0 | ||
| assert '16B200' in availabilities | ||
| logger.debug( | ||
| '[x] Cluster type 16B200 is one of the available cluster types in FIN_03: %s', | ||
| availabilities, | ||
| ) | ||
|
|
||
| images = verda_client.clusters.get_cluster_images('16B200') | ||
| assert len(images) > 0 | ||
| assert 'ubuntu-22.04-cuda-12.9-cluster' in images | ||
| logger.debug('[x] Ubuntu 22.04 CUDA 12.9 cluster image is supported for 16B200') | ||
|
|
||
| # create cluster | ||
| cluster = verda_client.clusters.create( | ||
| hostname='test-instance', | ||
| location=Locations.FIN_03, | ||
| cluster_type='16B200', | ||
| description='test instance', | ||
| image='ubuntu-22.04-cuda-12.9-cluster', | ||
| ssh_key_ids=[ssh_key.id], | ||
| # Set to None to not wait for provisioning but return immediately | ||
| wait_for_status=verda_client.constants.cluster_status.PROVISIONING, | ||
| ) | ||
|
|
||
| # assert cluster is created | ||
| assert cluster.id is not None | ||
| assert ( | ||
| cluster.status == verda_client.constants.cluster_status.PROVISIONING | ||
| or cluster.status == verda_client.constants.cluster_status.RUNNING | ||
| ) | ||
|
|
||
| # If still provisioning, we don't have worker nodes yet and ip is not available | ||
| if cluster.status != verda_client.constants.cluster_status.PROVISIONING: | ||
huksley marked this conversation as resolved.
|
||
| assert cluster.worker_nodes is not None | ||
| assert len(cluster.worker_nodes) == 2 | ||
| assert cluster.ip is not None | ||
|
|
||
| # Now we need to wait for RUNNING status to connect to the jumphost (public IP is available) | ||
| # After that, we can connect to the jumphost and run commands on the cluster nodes: | ||
| # | ||
| # ssh -i ssh_key.pem root@<public_ip> | ||
| # | ||
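In the portion of the test shown here, the cluster created by `test_create_cluster` is never deleted, so a failed assertion could leave a 16B200 cluster running. One way to guard against that is a small context manager that always runs a delete step. This is a sketch only: `cleanup_after` is a hypothetical helper, not part of the SDK or of this PR.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar('T')


@contextmanager
def cleanup_after(create: Callable[[], T], delete: Callable[[T], None]) -> Iterator[T]:
    """Create a resource, yield it, and always call delete() afterwards.

    Hypothetical helper: both callables are supplied by the caller.
    """
    resource = create()
    try:
        yield resource
    finally:
        # Runs even if the body of the with-block raises.
        delete(resource)
```

In the test, `create` could wrap the `verda_client.clusters.create(...)` call, and `delete` could call `verda_client.clusters.action(cluster.id, Actions.DELETE)` as the example file does, with `Actions` imported from `verda.constants`.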
Empty file.