-You may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your GCP quotas before proceeding.
+
+For large models you may need to request a quota increase for specific machine types and GPUs in the region where you plan to deploy the model. Check your GCP quotas before proceeding.
## Step 1: Setting Up Magemaker for GCP
@@ -17,11 +17,11 @@ Run the following command to configure Magemaker for GCP Vertex AI deployment:
magemaker --cloud gcp
```
-This initializes Magemaker with the necessary configurations for deploying models to Vertex AI.
+This initializes Magemaker with the necessary environment variables (PROJECT_ID, GCLOUD_REGION, etc.) and creates/updates your `.env` file.
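+
+After the wizard completes, the GCP-related entries in `.env` look roughly like this (values below are placeholders, not real credentials):
+
+```bash
+PROJECT_ID="my-gcp-project"
+GCLOUD_REGION="us-central1"
+```
+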
## Step 2: YAML-based Deployment
-For reproducible deployments, use YAML configuration:
+For reproducible deployments, use a YAML configuration:
```sh
magemaker --deploy .magemaker_config/your-model.yaml
@@ -33,92 +33,79 @@ Example YAML for GCP deployment:
deployment: !Deployment
destination: gcp
endpoint_name: llama3-endpoint
+ instance_type: n1-standard-8 # a.k.a. machine_type in Vertex AI UI
+ accelerator_type: NVIDIA_T4 # GPU type
accelerator_count: 1
- instance_type: n1-standard-8
- accelerator_type: NVIDIA_T4
- num_gpus: 1
- quantization: null
+ num_gpus: 1 # optional – overrides accelerator_count if set
+ quantization: null # optional – bf16 / int8 / bitsandbytes
models:
- !Model
id: meta-llama/Meta-Llama-3-8B-Instruct
- location: null
- predict: null
source: huggingface
task: text-generation
+ predict: null # optional inference parameters
version: null
+ location: null # optional custom artefact path
```
+
- For gated models like llama from Meta, you have to accept terms of use for model on hugging face and adding Hugging face token to the environment are necessary for deployment to go through.
+For gated models like Llama you must (1) accept the terms of use on Hugging Face and (2) add your `HUGGING_FACE_HUB_KEY` to the `.env` file.
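+
+For example, the token line in `.env` looks like this (the value shown is a placeholder):
+
+```bash
+HUGGING_FACE_HUB_KEY="hf_xxxxxxxxxxxxxxxx"
+```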
-
-### Selecting an Appropriate Instance
-For Llama 3, a machine type such as `n1-standard-8` with an attached NVIDIA T4 GPU (`NVIDIA_T4`) is a suitable configuration for most use cases. Adjust the instance type and GPU based on your workload requirements.
+### Choosing an Instance Type
+A common configuration for Llama 3 (8B) is `n1-standard-8` with one T4 GPU. Adjust `instance_type`, `accelerator_type`, and `accelerator_count` according to your latency and cost requirements.
-If you encounter quota issues, submit a quota increase request in the GCP console under "IAM & Admin > Quotas" for the specific GPU type in your deployment region.
+If you encounter quota issues, submit a quota increase request in the GCP console under **IAM & Admin → Quotas** for the specific GPU type and region.
## Step 3: Querying the Deployed Model
-Once the deployment is complete, note down the endpoint id.
-
-You can use the interactive dropdown menu to quickly query the model.
+After deployment completes, note the **endpoint ID** printed by Magemaker.
-### Querying Models
-
-From the dropdown, select `Query a Model Endpoint` to see the list of model endpoints. Press space to select the endpoint you want to query. Enter your query in the text box and press enter to get the response.
+### Option A – Interactive Dropdown
+Use `Query a Model Endpoint` from the Magemaker menu to send prompts directly from your terminal.

-Or you can use the following code:
-```python
-from google.cloud import aiplatform
-from google.protobuf import json_format
-from google.protobuf.struct_pb2 import Value
-import json
+### Option B – Python (Vertex AI REST)
+If you prefer code, you can call the endpoint with a simple REST helper:
+
+```python
from dotenv import dotenv_values
+import google.auth
+import google.auth.transport.requests
+import requests
-def query_vertexai_endpoint_rest(
- endpoint_id: str,
- input_text: str,
- token_path: str = None
-):
- import google.auth
- import google.auth.transport.requests
- import requests
+def query_vertexai_endpoint_rest(endpoint_id: str, input_text: str, token_path: str | None = None):
+ """Send a text-generation request to a Vertex AI endpoint."""
- # TODO: this will have to come from config files
- project_id = dotenv_values('.env').get('PROJECT_ID')
- location = dotenv_values('.env').get('GCLOUD_REGION')
+ project_id = dotenv_values(".env").get("PROJECT_ID")
+ location = dotenv_values(".env").get("GCLOUD_REGION")
-
- # Get credentials
+ # Obtain an access token
if token_path:
- credentials, project = google.auth.load_credentials_from_file(token_path)
+ credentials, _ = google.auth.load_credentials_from_file(token_path)
else:
- credentials, project = google.auth.default()
-
- # Refresh token
- auth_req = google.auth.transport.requests.Request()
- credentials.refresh(auth_req)
-
- # Prepare headers and URL
+ credentials, _ = google.auth.default()
+ credentials.refresh(google.auth.transport.requests.Request())
+
headers = {
"Authorization": f"Bearer {credentials.token}",
"Content-Type": "application/json"
}
-
- url = f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}:predict"
-
- # Prepare payload
+
+ url = (
+ f"https://{location}-aiplatform.googleapis.com/v1/projects/"
+ f"{project_id}/locations/{location}/endpoints/{endpoint_id}:predict"
+ )
+
payload = {
"instances": [
{
"inputs": input_text,
- # TODO: this also needs to come from configs
"parameters": {
"max_new_tokens": 100,
"temperature": 0.7,
@@ -127,20 +114,28 @@ def query_vertexai_endpoint_rest(
}
]
}
-
- # Make request
- response = requests.post(url, headers=headers, json=payload)
- print('Raw Response Content:', response.content.decode())
+ response = requests.post(url, headers=headers, json=payload, timeout=300)
+ response.raise_for_status()
return response.json()
-endpoint_id="your-endpoint-id-here"
-input_text='What are you?"'
-resp = query_vertexai_endpoint_rest(endpoint_id=endpoint_id, input_text=input_text)
-print(resp)
+# Example usage
+if __name__ == "__main__":
+ ENDPOINT_ID = "your-endpoint-id-here"
+ prompt = "What are you?"
+ print(query_vertexai_endpoint_rest(ENDPOINT_ID, prompt))
```
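+
+For a quick smoke test from the shell, the same request can be sent with `curl` (a sketch assuming the gcloud CLI is authenticated; replace the region, project, and endpoint ID placeholders with your own values):
+
+```sh
+curl -X POST \
+  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
+  -H "Content-Type: application/json" \
+  -d '{"instances": [{"inputs": "What are you?", "parameters": {"max_new_tokens": 100, "temperature": 0.7}}]}' \
+  "https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/endpoints/your-endpoint-id:predict"
+```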
-## Conclusion
-You have successfully deployed and queried Llama 3 on GCP Vertex AI using Magemaker's interactive dropdown menu. For any questions or feedback, feel free to contact us at [support@slashml.com](mailto:support@slashml.com).
+### Option C – Built-in Helper
+Magemaker also ships with a helper function that wraps the above logic:
+
+```python
+from magemaker.gcp.query_endpoint import query_vertexai_endpoint
+answer = query_vertexai_endpoint(endpoint_name="your-endpoint-id-here", query="What are you?")
+print(answer)
+```
+
+## Conclusion
+You have successfully deployed and queried Llama 3 on GCP Vertex AI using Magemaker. For questions or feedback, reach us at [support@slashml.com](mailto:support@slashml.com).
From 6e426784ce9dc38a885291f131686a28bf38473c Mon Sep 17 00:00:00 2001
From: "pr-test1[bot]" <226697212+pr-test1[bot]@users.noreply.github.com>
Date: Tue, 16 Sep 2025 13:32:49 +0000
Subject: [PATCH 16/18] docs: sync updated_readme.md with latest code
---
updated_readme.md | 191 +++++++++++++++++-----------------------------
1 file changed, 70 insertions(+), 121 deletions(-)
diff --git a/updated_readme.md b/updated_readme.md
index bcfc60b..1e76877 100644
--- a/updated_readme.md
+++ b/updated_readme.md
@@ -1,4 +1,3 @@
-
@@ -7,7 +6,7 @@
Magemaker v0.1, by SlashML
- Deploy open source AI models to AWS in minutes.
+ Deploy open source AI models to AWS, GCP, and Azure in minutes.
@@ -21,8 +20,7 @@
About Magemaker
-
- Getting Started
+ Getting Started
- Prerequisites
- Installation
@@ -39,93 +37,81 @@
## About Magemaker
-Magemaker is a Python tool that simplifies the process of deploying an open source AI model to your own cloud. Instead of spending hours digging through documentation to figure out how to get AWS working, Magemaker lets you deploy open source AI models directly from the command line.
+Magemaker is a Python tool that simplifies the process of deploying an open-source AI model to **your** cloud. Instead of spending hours digging through platform-specific docs, Magemaker spins up production-ready endpoints on AWS SageMaker, Google Cloud Vertex AI, or Azure Machine Learning from a single CLI command.
-Choose a model from Hugging Face or SageMaker, and Magemaker will spin up a SageMaker instance with a ready-to-query endpoint in minutes.
+Choose a model from Hugging Face or AWS JumpStart, or point Magemaker at your own model artefacts, and an endpoint will be online in minutes.
## Getting Started
-Magemaker works with AWS. Azure and GCP support are coming soon!
-
-To get a local copy up and running follow these simple steps.
+Magemaker currently supports **AWS, GCP and Azure**. The first run guides you through the minimal configuration required for your chosen provider(s).
### Prerequisites
-* Python
-* An AWS account
-* Quota for AWS SageMaker instances (by default, you get 2 instances of ml.m5.xlarge for free)
-* Certain Hugging Face models (e.g. Llama2) require an access token ([hf docs](https://huggingface.co/docs/hub/en/models-gated#access-gated-models-as-a-user))
-
-### Configuration
-
-**Step 1: Set up AWS and SageMaker**
-
-To get started, you’ll need an AWS account which you can create at https://aws.amazon.com/. Then you’ll need to create access keys for SageMaker.
-
-We wrote up the steps in [Google Doc](https://docs.google.com/document/d/1NvA6uZmppsYzaOdkcgNTRl7Nb4LbpP9Koc4H_t5xNSg/edit?tab=t.0#heading=h.farbxuv3zrzm) as well.
+* Python 3.11+ (3.12 is currently unsupported due to an Azure SDK issue)
+* Cloud account(s) & sufficient quota:
+ * AWS for SageMaker
+ * GCP for Vertex AI
+ * Azure for Azure ML
+* CLI tools (optional but recommended):
+ * AWS CLI, Google Cloud SDK, Azure CLI
+* Hugging Face account & token for gated models (e.g. Llama-3)
-
-
-### Installing the package
-
-**Step 1**
+### Installation
```sh
pip install magemaker
```
-**Step 2: Running magemaker**
-
-Run it by simply doing the following:
+### First-time configuration
+Run Magemaker with your desired cloud flag. The wizard collects credentials (or points you to the relevant CLI login command), writes them to a local `.env`, and verifies quota.
```sh
-magemaker
+magemaker --cloud [aws|gcp|azure|all]
```
-If this is your first time running this command. It will configure the AWS client so you’re ready to start deploying models. You’ll be prompted to enter your Access Key and Secret here. You can also specify your AWS region. The default is us-east-1. You only need to change this if your SageMaker instance quota is in a different region.
-
-Once configured, it will create a `.env` file and save the credentials there. You can also add your Hugging Face Hub Token to this file if you have one.
-
-```sh
-HUGGING_FACE_HUB_KEY="KeyValueHere"
+Typical `.env` variables (auto-generated):
+```bash
+AWS_ACCESS_KEY_ID="..." # if aws selected
+AWS_SECRET_ACCESS_KEY="..."
+AWS_REGION="us-east-1"
+PROJECT_ID="my-gcp-project" # if gcp selected
+GCLOUD_REGION="us-central1"
+AZURE_SUBSCRIPTION_ID="..." # if azure selected
+AZURE_RESOURCE_GROUP="ml-resources"
+AZURE_WORKSPACE_NAME="ml-workspace"
+AZURE_REGION="eastus"
+HUGGING_FACE_HUB_KEY="hf_..." # optional
```
(back to top)
-
## Using Magemaker
-### Deploying models from dropdown
-
-When you run `magemaker` comamnd it will give you an interactive menu to deploy models. You can choose from a dropdown of models to deploy.
+### Interactive dropdown
+`magemaker --cloud ...` opens an interactive menu where you can:
+* Deploy a new model
+* List / delete active endpoints
+* Query an endpoint directly from the terminal
-#### Deploying Hugging Face models
-If you're deploying with Hugging Face, copy/paste the full model name from Hugging Face. For example, `google-bert/bert-base-uncased`. Note that you’ll need larger, more expensive instance types in order to run bigger models. It takes anywhere from 2 minutes (for smaller models) to 10+ minutes (for large models) to spin up the instance with your model.
+### YAML-based deployment (recommended)
+For CI/CD or reproducibility, pass a YAML config:
-#### Deploying Sagemaker models
-If you are deploying a Sagemaker model, select a framework and search from a model. If you a deploying a custom model, provide either a valid S3 path or a local path (and the tool will automatically upload it for you). Once deployed, we will generate a YAML file with the deployment and model in the `CONFIG_DIR=.magemaker_config` folder. You can modify the path to this folder by setting the `CONFIG_DIR` environment variable.
-
-#### Deploy using a yaml file
-We recommend deploying through a yaml file for reproducability and IAC. From the cli, you can deploy a model without going through all the menus. You can even integrate us with your Github Actions to deploy on PR merge. Deploy via YAML files simply by passing the `--deploy` option with local path like so:
-
-```
-magemaker --deploy .magemaker_config/bert-base-uncased.yaml
+```sh
+magemaker --deploy .magemaker_config/bert-aws.yaml
```
-Following is a sample yaml file for deploying a model the same google bert model mentioned above:
-
+Example – **AWS SageMaker (Hugging Face)**
```yaml
deployment: !Deployment
destination: aws
- # Endpoint name matches model_id for querying atm.
- endpoint_name: test-bert-uncased
+ endpoint_name: bert-uncased-dev
instance_count: 1
instance_type: ml.m5.xlarge
@@ -135,83 +121,46 @@ models:
source: huggingface
```
-Following is a yaml file for deploying a llama model from HF:
+Example – **GCP Vertex AI**
```yaml
deployment: !Deployment
- destination: aws
- endpoint_name: test-llama2-7b
- instance_count: 1
- instance_type: ml.g5.12xlarge
- num_gpus: 4
- # quantization: bitsandbytes
+ destination: gcp
+ endpoint_name: llama3-gcp
+ accelerator_count: 1
+ instance_type: n1-standard-8
+ accelerator_type: NVIDIA_T4
models:
- !Model
id: meta-llama/Meta-Llama-3-8B-Instruct
source: huggingface
- predict:
- temperature: 0.9
- top_p: 0.9
- top_k: 20
- max_new_tokens: 250
```
-#### Fine-tuning a model using a yaml file
-
-You can also fine-tune a model using a yaml file, by using the `train` option in the command and passing path to the yaml file
-
-`
-magemaker --train .magemaker_config/train-bert.yaml
-`
-
-Here is an example yaml file for fine-tuning a hugging-face model:
-
+Example – **Azure ML**
```yaml
-training: !Training
- destination: aws
- instance_type: ml.p3.2xlarge
+deployment: !Deployment
+ destination: azure
+ endpoint_name: llama3-azure
instance_count: 1
- training_input_path: s3://jumpstart-cache-prod-us-east-1/training-datasets/tc/data.csv
- hyperparameters: !Hyperparameters
- epochs: 1
- per_device_train_batch_size: 32
- learning_rate: 0.01
+ instance_type: Standard_NC24ads_A100_v4
models:
- !Model
- id: meta-textgeneration-llama-3-8b-instruct
+ id: meta-llama-meta-llama-3-8b-instruct # Azure model-catalog id
source: huggingface
```
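+
+Assuming the file above is saved as `.magemaker_config/llama3-azure.yaml`, deploying it follows the same pattern:
+
+```sh
+magemaker --deploy .magemaker_config/llama3-azure.yaml
+```
+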
+### AWS JumpStart & Custom models
+Deploy marketplace models or your own artefacts stored locally or in S3; see the [JumpStart & Custom Models](docs/concepts/jumpstart-custom-models.mdx) guide.
-
-
-
-If you’re using the `ml.m5.xlarge` instance type, here are some small Hugging Face models that work great:
-
-
-
-**Model: [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)**
-
-- **Type:** Fill Mask: tries to complete your sentence like Madlibs
-- **Query format:** text string with `[MASK]` somewhere in it that you wish for the transformer to fill
--
-
-
-
-**Model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)**
-
-- **Type:** Feature extraction: turns text into a 384d vector embedding for semantic search / clustering
-- **Query format:** "*type out a sentence like this one.*"
-
-
-
-
-
-### Deactivating models
-
-Any model endpoints you spin up will run continuously unless you deactivate them! Make sure to delete endpoints you’re no longer using so you don’t keep getting charged for your SageMaker instance.
+### Fine-tuning
+```sh
+magemaker --train .magemaker_config/train-bert.yaml
+```
+YAML follows the `!Training` schema; only AWS is supported for training today.
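+
+A minimal `!Training` config (adapted from the fine-tuning example previously in this README; the training data path and hyperparameters are illustrative) looks like:
+
+```yaml
+training: !Training
+  destination: aws
+  instance_type: ml.p3.2xlarge
+  instance_count: 1
+  training_input_path: s3://jumpstart-cache-prod-us-east-1/training-datasets/tc/data.csv
+  hyperparameters: !Hyperparameters
+    epochs: 1
+    per_device_train_batch_size: 32
+    learning_rate: 0.01
+
+models:
+- !Model
+  id: meta-textgeneration-llama-3-8b-instruct
+  source: huggingface
+```
+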
+### Deactivating endpoints
+Endpoints accrue cloud charges until deleted. Use the menu option **Delete a Model Endpoint** or delete the endpoint from your cloud console.
(back to top)
@@ -220,10 +169,10 @@ Any model endpoints you spin up will run continuously unless you deactivate them
## What we're working on next
-- [ ] More robust error handling for various edge cases
-- [ ] Verbose logging
-- [ ] Enabling / disabling autoscaling
-- [ ] Deployment to Azure and GCP
+- [ ] Enhanced error handling & verbose logging
+- [ ] Autoscaling controls
+- [ ] Streaming support in OpenAI-compatible proxy
+- [ ] Additional cloud-specific optimisations
(back to top)
@@ -232,9 +181,9 @@ Any model endpoints you spin up will run continuously unless you deactivate them
## Known issues
-- [ ] Querying within Magemaker currently only works with text-based model - doesn’t work with multimodal, image generation, etc.
-- [ ] Deleting a model is not instant, it may show up briefly after it was queued for deletion
-- [ ] Deploying the same model within the same minute will break
+- Query helper currently supports text-based pipelines only
+- Endpoint deletion is asynchronous; an endpoint may still appear active for a short time after the delete request
+- Deploying the same model repeatedly within ~60 seconds can fail due to name collision (timestamp workaround in progress)
(back to top)
@@ -251,6 +200,6 @@ Distributed under the Apache 2.0 License. See `LICENSE` for more information.
## Contact
-You can reach us, faizan & jneid, at [support@slashml.com](mailto:support@slashml.com).
+Questions, bugs, ideas? Reach us (Faizan & Jneid) at [support@slashml.com](mailto:support@slashml.com).
-We’d love to hear from you! We’re excited to learn how we can make this more valuable for the community and welcome any and all feedback and suggestions.
+We’d love your feedback!
From 99600ebb987bd98d096617af13d5a6fb9ee6f90d Mon Sep 17 00:00:00 2001
From: "pr-test1[bot]" <226697212+pr-test1[bot]@users.noreply.github.com>
Date: Tue, 16 Sep 2025 13:32:50 +0000
Subject: [PATCH 17/18] docs: create concepts/jumpstart-custom-models.mdx
---
concepts/jumpstart-custom-models.mdx | 88 ++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
create mode 100644 concepts/jumpstart-custom-models.mdx
diff --git a/concepts/jumpstart-custom-models.mdx b/concepts/jumpstart-custom-models.mdx
new file mode 100644
index 0000000..1df3461
--- /dev/null
+++ b/concepts/jumpstart-custom-models.mdx
@@ -0,0 +1,88 @@
+---
+title: JumpStart & Custom Models
+description: Deploy AWS JumpStart marketplace and custom models with Magemaker
+---
+
+## Overview
+Besides Hugging Face models, Magemaker can now deploy two additional model types **on AWS SageMaker**:
+
+1. **JumpStart models** – pre-packaged models provided by Amazon or third-party sellers.
+2. **Custom models** – your own fine-tuned artefacts (.tar.gz or directory) stored locally or in S3.
+
+
+
+---
+## 1 · Deploying a JumpStart Model
+
+### 1.1 Find the *model_id*
+Browse the [JumpStart Model Zoo](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-model-zoo.html) or call:
+```python
+from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
+list_jumpstart_models()
+```
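+
+If the full list is overwhelming, the SageMaker SDK also accepts a filter expression, e.g. `list_jumpstart_models(filter="task == text2text")` (filter syntax as documented in the SageMaker Python SDK).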
+
+### 1.2 Create a YAML
+```yaml
+models:
+- !Model
+ id: huggingface-text2text-flan-t5-large # JumpStart model_id
+ source: sagemaker
+
+deployment: !Deployment
+ destination: aws
+ instance_type: ml.g5.2xlarge
+ instance_count: 1
+```
+
+### 1.3 Deploy
+```bash
+magemaker --deploy .magemaker_config/flan-t5.yaml
+```
+Magemaker handles the EULA acceptance (`accept_eula=True`) automatically.
+
+---
+## 2 · Deploying a Custom Model
+Custom models are useful when you have already trained a model locally or with SageMaker Training.
+
+### 2.1 Package Artifacts
+- For **Hugging Face** style, create a directory containing `config.json`, `tokenizer.json`, etc.
+- Optionally compress to `model.tar.gz`.
+
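+For example, a typical packaging step (directory name and paths are placeholders) would be:
+
+```bash
+# bundle the model directory into a tarball SageMaker can consume
+tar -czf model.tar.gz -C ./my-finetuned-model .
+```
+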
+### 2.2 Upload (optional)
+If your artefact is not yet on S3, Magemaker will upload it for you.
+
+### 2.3 YAML Example
+```yaml
+models:
+- !Model
+ id: my-distilbert-finetuned
+ source: huggingface
+ location: ./artifacts/distilbert.tar.gz # or s3://bucket/key
+
+deployment: !Deployment
+ destination: aws
+ instance_type: ml.m5.xlarge
+```
+
+### 2.4 Deploy
+```bash
+magemaker --deploy .magemaker_config/my-distilbert.yaml
+```
+Magemaker will:
+1. Upload the local artefact to `s3://<your-bucket>/models/my-distilbert-finetuned/` (if needed)
+2. Create a `HuggingFaceModel` pointing to that S3 path
+3. Spin up the endpoint.
+
+---
+## Tips & Quotas
+- **GPU quota** – JumpStart LLMs often need a GPU instance (`ml.g5.*`). Request a quota increase first.
+- **Endpoint names** – If `endpoint_name` is omitted, Magemaker appends a timestamp to ensure uniqueness.
+- **Cost control** – Delete endpoints when not in use: `magemaker --cloud aws → Delete a Model Endpoint`.
+
+---
+## FAQ
+**Q: Do JumpStart models work on GCP / Azure?**
+*A: Not yet. JumpStart is an AWS-specific marketplace.*
+
+**Q: Can I pass inference parameters to a JumpStart model?**
+*A: Yes. Use the same `predict:` block as with Hugging Face models.*
From 277c5e61d0e2c6c16f2f31ae2f3c7151ccb5c8a1 Mon Sep 17 00:00:00 2001
From: "pr-test1[bot]" <226697212+pr-test1[bot]@users.noreply.github.com>
Date: Tue, 16 Sep 2025 13:32:52 +0000
Subject: [PATCH 18/18] docs: create concepts/openai-proxy.mdx
---
concepts/openai-proxy.mdx | 60 +++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
create mode 100644 concepts/openai-proxy.mdx
diff --git a/concepts/openai-proxy.mdx b/concepts/openai-proxy.mdx
new file mode 100644
index 0000000..d44ccd2
--- /dev/null
+++ b/concepts/openai-proxy.mdx
@@ -0,0 +1,60 @@
+---
+title: OpenAI-Compatible Proxy
+description: Use your SageMaker / Vertex AI / AzureML endpoint with OpenAI SDKs via Magemaker's built-in proxy
+---
+
+## Overview
+Magemaker ships with a lightweight FastAPI server (`server.py`) that exposes a **Chat Completion**-compatible API. This lets you swap OpenAI’s base URL for `http://<your-host>:8000` and keep using the official `openai` Python / JS SDKs, LangChain, or any other OpenAI-compatible client.
+
+
+The proxy is an **optional** component. You still deploy models the usual way; the server only forwards requests to the selected endpoint under the hood.
+
+
+## 1 · Start the proxy
+```bash
+python -m magemaker.server # default: 0.0.0.0:8000
+```
+
+Environment variables from your `.env` file are automatically loaded, so the server knows how to reach SageMaker / Vertex AI / Azure ML.
+
+You should see:
+```
+INFO: Started server process […]
+INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+```
+
+## 2 · Point the OpenAI SDK to the proxy
+```python
+import openai
+
+openai.api_key = "sk-ignore" # any non-empty value works
+openai.base_url = "http://localhost:8000/v1" # important – include /v1
+
+response = openai.chat.completions.create(
+ model="your-endpoint-name", # equals SageMaker / Vertex / Azure endpoint
+ messages=[{"role": "user", "content": "Hello!"}]
+)
+print(response.choices[0].message.content)
+```
+
+### Supported routes
+| Method | Path | Description |
+| ------ | ------------------- | ----------------------------------- |
+| POST | `/v1/chat/completions` | Forwards the request to the backing endpoint |
+| GET | `/endpoints` | Lists all active endpoints found via Magemaker |
+
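+The same route can be exercised without an SDK; a minimal `curl` sketch (endpoint name and prompt are placeholders):
+
+```bash
+curl -X POST http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "your-endpoint-name", "messages": [{"role": "user", "content": "Hello!"}]}'
+```
+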
+## 3 · Authentication & CORS
+The proxy does **not** enforce authentication. Run it behind a reverse proxy (NGINX, CloudFront, APIM, etc.) if you need tokens, rate-limiting, or HTTPS termination.
+
+## 4 · Deployment suggestions
+- **Docker**: Build an image with your `.env` baked in and deploy to ECS / Cloud Run / ACI.
+- **Systemd**: On a small EC2/VPS, create a service that starts `python -m magemaker.server` on boot.
+
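+A minimal Dockerfile sketch for the first option (the base image, port, and `.env` handling are assumptions to adapt to your setup):
+
+```dockerfile
+FROM python:3.11-slim
+WORKDIR /app
+RUN pip install --no-cache-dir magemaker
+# bake in credentials for the target cloud (or mount them at runtime instead)
+COPY .env .env
+EXPOSE 8000
+CMD ["python", "-m", "magemaker.server"]
+```
+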
+## 5 · Limitations
+1. Only the `chat/completions` endpoint is implemented at present.
+2. Streaming (`stream=True`) is not yet supported – responses are returned once inference completes.
+3. The proxy currently forwards to a **single** cloud endpoint per request; multi-model routing is on the roadmap.
+
+
+Do **not** expose the proxy publicly without additional security controls – anyone could burn cloud inference credits.
+