18 commits
f0684cb
docs: sync CONTRIBUTING.md with latest code
pr-test1[bot] Sep 16, 2025
501414e
docs: sync about.mdx with latest code
pr-test1[bot] Sep 16, 2025
e5c2fd9
docs: sync concepts/contributing.mdx with latest code
pr-test1[bot] Sep 16, 2025
43661e0
docs: sync concepts/deployment.mdx with latest code
pr-test1[bot] Sep 16, 2025
7b66f55
docs: sync concepts/fine-tuning.mdx with latest code
pr-test1[bot] Sep 16, 2025
f865987
docs: sync concepts/models.mdx with latest code
pr-test1[bot] Sep 16, 2025
5cb4781
docs: sync configuration/AWS.mdx with latest code
pr-test1[bot] Sep 16, 2025
bfd4022
docs: sync configuration/Azure.mdx with latest code
pr-test1[bot] Sep 16, 2025
6e920f2
docs: sync configuration/Environment.mdx with latest code
pr-test1[bot] Sep 16, 2025
584be61
docs: sync getting_started.md with latest code
pr-test1[bot] Sep 16, 2025
159d91a
docs: sync installation.mdx with latest code
pr-test1[bot] Sep 16, 2025
67b73c8
docs: sync mint.json with latest code
pr-test1[bot] Sep 16, 2025
f7fc473
docs: sync tutorials/deploying-llama-3-to-aws.mdx with latest code
pr-test1[bot] Sep 16, 2025
3545ff9
docs: sync tutorials/deploying-llama-3-to-azure.mdx with latest code
pr-test1[bot] Sep 16, 2025
889126b
docs: sync tutorials/deploying-llama-3-to-gcp.mdx with latest code
pr-test1[bot] Sep 16, 2025
6e42678
docs: sync updated_readme.md with latest code
pr-test1[bot] Sep 16, 2025
99600eb
docs: create concepts/jumpstart-custom-models.mdx
pr-test1[bot] Sep 16, 2025
277c5e6
docs: create concepts/openai-proxy.mdx
pr-test1[bot] Sep 16, 2025
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -65,4 +65,4 @@ By contributing, you agree that your contributions will be licensed under the Ap

## Questions?

Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing!
Feel free to contact us at [support@slashml.com](mailto:support@slashml.com) if you have any questions about contributing!
6 changes: 3 additions & 3 deletions about.mdx
@@ -22,9 +22,9 @@ Do submit your feature requests at https://magemaker.featurebase.app/
- Querying within Magemaker currently only works with text-based models
- Deleting a model is not instant, it may show up briefly after deletion
- Deploying the same model within the same minute will break
- Hugging-face models on Azure have different Ids than their Hugging-face counterparts. Follow the steps specified in the quick-start guide to find the relevant models
- For Azure deploying models other than Hugging-face is not supported yet.
- Python3.13 is not supported because of an open-issue by Azure. https://github.com/Azure/azure-sdk-for-python/issues/37600
- Hugging Face models on Azure have different IDs than their Hugging Face counterparts. Follow the steps specified in the quick-start guide to find the relevant models.
- For Azure, deploying models other than Hugging Face is not supported yet.
- Python 3.12 is **not** supported because of an open issue in the Azure SDK (see https://github.com/Azure/azure-sdk-for-python/issues/37600).


If there is anything we missed, do point them out at https://magemaker.featurebase.app/
2 changes: 1 addition & 1 deletion concepts/contributing.mdx
@@ -165,4 +165,4 @@ We are committed to providing a welcoming and inclusive experience for everyone.

## License

By contributing to Magemaker, you agree that your contributions will be licensed under the Apache 2.0 License.
By contributing to Magemaker, you agree that your contributions will be licensed under the Apache 2.0 License.
10 changes: 5 additions & 5 deletions concepts/deployment.mdx
@@ -62,8 +62,8 @@ deployment: !Deployment
destination: gcp
endpoint_name: opt-125m-gcp
instance_count: 1
machine_type: n1-standard-4
accelerator_type: NVIDIA_TESLA_T4
instance_type: n1-standard-4
accelerator_type: NVIDIA_TESLA_T4 # or NVIDIA_L4
accelerator_count: 1

models:
@@ -113,6 +113,7 @@ deployment: !Deployment
instance_count: 1
instance_type: ml.g5.12xlarge
num_gpus: 4
# quantization: bitsandbytes # Optional

models:
- !Model
@@ -202,10 +203,9 @@ Choose your instance type based on your model's requirements:
4. Set up monitoring and alerting for your endpoints

<Warning>
Make sure you setup budget monitory and alerts to avoid unexpected charges.
Make sure you set up budget monitoring and alerts to avoid unexpected charges.
</Warning>


## Troubleshooting Deployments

Common issues and their solutions:
@@ -225,4 +225,4 @@ Common issues and their solutions:
- Verify model ID and version
- Check instance memory requirements
- Validate Hugging Face token if required
- Endpoing deployed but deployment failed. Check the logs, and do report this to us if you see this issue.
- Endpoint deployed but deployment failed. Check the logs, and report this to us if you see this issue.
115 changes: 76 additions & 39 deletions concepts/fine-tuning.mdx
@@ -5,7 +5,7 @@ description: Guide to fine-tuning models with Magemaker

## Fine-tuning Overview

Fine-tuning allows you to adapt pre-trained models to your specific use case. Magemaker simplifies this process through YAML configuration.
Fine-tuning allows you to adapt pre-trained models to your specific use case. Currently, Magemaker supports fine-tuning **on AWS SageMaker only** (GCP and Azure fine-tuning are on the roadmap). The workflow is entirely YAML-driven so it can be automated in CI/CD pipelines.

### Basic Command

@@ -19,19 +19,25 @@ magemaker --train .magemaker_config/train-config.yaml

```yaml
training: !Training
destination: aws
instance_type: ml.p3.2xlarge
destination: aws # only "aws" is supported for now
instance_type: ml.p3.2xlarge # GPU instance for training
instance_count: 1
training_input_path: s3://your-bucket/training-data.csv
training_input_path: s3://your-bucket/training-data.csv # points to your dataset

models:
- !Model
id: your-model-id
source: huggingface
- !Model
id: your-model-id
source: huggingface
```

- **destination** – must be `aws` at the moment.
- **training_input_path** – S3 URI that Magemaker will pass directly to SageMaker.
- **instance_type / instance_count** – any SageMaker training instance type is supported.

### Advanced Configuration

Beyond the basics, you can supply custom hyperparameters. If omitted, Magemaker will attempt to infer sensible defaults based on the model family (see `get_hyperparameters_for_model()` in the codebase).

```yaml
training: !Training
destination: aws
@@ -49,20 +55,38 @@ training: !Training
save_steps: 1000
```

<Note>
If you omit `hyperparameters`, Magemaker will fall back to task-specific defaults. For example, text-generation models automatically receive the hyperparameters returned by `get_hyperparameters_for_model()`.
</Note>

### Optional Parameters

The `Training` schema also supports the following optional fields, all of which have sensible defaults:

| Field | Description |
| -------------------- | ------------------------------------------------------------------------- |
| `output_path` | S3 URI where training artifacts should be stored |
| `max_run` | Maximum training job runtime in seconds |
| `volume_size_in_gb` | Size of the EBS volume attached to the training instance |
| `spot` | `true/false` – use SageMaker Spot Training to save costs |
| `checkpoint_s3_uri` | S3 URI for incremental checkpoints (only relevant if `spot: true`) |

*(See the `Training` Pydantic model in `magemaker/schemas/training.py` for the full list.)*
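
As an illustrative sketch (the field names come from the table above; the bucket paths and values are assumptions), a fuller training config might look like this:

```yaml
training: !Training
  destination: aws
  instance_type: ml.p3.2xlarge
  instance_count: 1
  training_input_path: s3://your-bucket/training-data.csv
  output_path: s3://your-bucket/training-output/      # where SageMaker stores artifacts
  max_run: 86400                                      # stop the job after 24 hours
  volume_size_in_gb: 100                              # EBS volume attached to the instance
  spot: true                                          # use Spot Training to save costs
  checkpoint_s3_uri: s3://your-bucket/checkpoints/    # lets interrupted Spot jobs resume
```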

## Data Preparation

### Supported Formats

<CardGroup>
<Card title="CSV Format" icon="file-csv">
- Simple tabular data
- Easy to prepare
- Simple tabular data<br />
- Easy to prepare<br />
- Good for classification tasks
</Card>

<Card title="JSON Lines" icon="file-code">
- Flexible data format
- Good for complex inputs
- Flexible data format<br />
- Good for complex inputs<br />
- Supports nested structures
</Card>
</CardGroup>
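
For reference, a JSON Lines file is simply one JSON object per line. The `prompt`/`completion` field names below are illustrative assumptions; use whatever schema your target model expects:

```json
{"prompt": "Summarize: Magemaker deploys models to SageMaker.", "completion": "Magemaker simplifies SageMaker deployments."}
{"prompt": "Summarize: Fine-tuning adapts a pre-trained model.", "completion": "Fine-tuning customizes pre-trained models."}
```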
@@ -71,48 +95,43 @@ training: !Training

<Steps>
<Step title="Prepare Data">
Format your data according to model requirements
Format your dataset according to the model requirements (e.g., one JSON line per training example).
</Step>
<Step title="Upload to S3">
Use AWS CLI or console to upload data
Use the AWS CLI or console to upload your dataset:
```bash
aws s3 cp local/path/to/data.csv s3://your-bucket/data.csv
```
</Step>
<Step title="Configure Path">
Specify S3 path in training configuration
Reference the S3 URI in `training_input_path` of your YAML file.
</Step>
</Steps>

## Instance Selection

### Training Instance Types

Choose based on:
- Dataset size
- Model size
- Training time requirements
- Cost constraints
Choosing the right training instance impacts both training time and cost.

Popular choices:
- ml.p3.2xlarge (1 GPU)
- ml.p3.8xlarge (4 GPUs)
- ml.p3.16xlarge (8 GPUs)

## Hyperparameter Tuning
| Instance | GPUs | Typical use-case |
| ------------------- | ---- | -------------------------------------------- |
| `ml.p3.2xlarge` | 1 | Small to medium models (<7B parameters) |
| `ml.p3.8xlarge` | 4 | Larger models / shorter turnaround |
| `ml.p3.16xlarge` | 8 | Large-scale training / distributed workloads |

### Basic Parameters
<Warning>
Always check your current SageMaker GPU quota and request increases if necessary.
</Warning>
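
One way to check your current limits (an illustration using the AWS CLI, not part of Magemaker itself; exact quota names vary by account and region) is the Service Quotas API:

```bash
aws service-quotas list-service-quotas \
  --service-code sagemaker \
  --query "Quotas[?contains(QuotaName, 'ml.p3.2xlarge')].{Name:QuotaName,Value:Value}"
```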

```yaml
hyperparameters: !Hyperparameters
epochs: 3
learning_rate: 2e-5
batch_size: 32
```
## Hyperparameter Tuning

### Advanced Tuning
While you can hard-code values, you can also pass ranges to let SageMaker perform hyperparameter tuning (HPO). Use the `min`, `max`, or `values` keys as shown below:

```yaml
hyperparameters: !Hyperparameters
epochs: 3
learning_rate:
learning_rate:
min: 1e-5
max: 1e-4
scaling: log
@@ -122,9 +141,27 @@

## Monitoring Training

### CloudWatch Metrics
Magemaker streams CloudWatch metrics for every training job. Key metrics include:

- `Train/Loss`
- `Eval/Loss`
- `LearningRate`
- `GPUUtilization`

You can access logs directly in the SageMaker console or via the AWS CLI:

Available metrics:
- Loss
- Learning rate
- GPU utilization
```bash
aws logs tail /aws/sagemaker/TrainingJobs --follow --since 1h
```

<Note>
Job status (Started, InProgress, Completed, Failed) is also surfaced in the CLI output.
</Note>
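
You can also poll a specific job yourself with the AWS CLI; the job name below is a placeholder:

```bash
aws sagemaker describe-training-job \
  --training-job-name <your-training-job-name> \
  --query TrainingJobStatus
```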

## Cleaning Up

Training jobs store artifacts (model checkpoints, logs) in S3. Delete these objects when no longer needed to avoid storage costs:

```bash
aws s3 rm --recursive s3://your-bucket/<training-job-name>
```
88 changes: 88 additions & 0 deletions concepts/jumpstart-custom-models.mdx
@@ -0,0 +1,88 @@
---
title: JumpStart & Custom Models
description: Deploy AWS JumpStart marketplace and custom models with Magemaker
---

## Overview
Besides Hugging Face models, Magemaker can now deploy two additional model types **on AWS SageMaker**:

1. **JumpStart models** – pre-packaged models provided by Amazon or third-party sellers.
2. **Custom models** – your own fine-tuned artifacts (.tar.gz or directory) stored locally or in S3.

<GithubContribTag />

---
## 1 · Deploying a JumpStart Model

### 1.1 Find the *model_id*
Browse the [JumpStart Model Zoo](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-model-zoo.html) or call:
```python
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models
list_jumpstart_models()
```

### 1.2 Create a YAML
```yaml
models:
- !Model
id: huggingface-text2text-flan-t5-large # JumpStart model_id
source: sagemaker

deployment: !Deployment
destination: aws
instance_type: ml.g5.2xlarge
instance_count: 1
```

### 1.3 Deploy
```bash
magemaker --deploy .magemaker_config/flan-t5.yaml
```
Magemaker handles the EULA acceptance (`accept_eula=True`) automatically.
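
Under the hood this is roughly equivalent to the following SageMaker SDK calls. This is a sketch for intuition, not Magemaker's exact implementation:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Illustrative only: deploy a JumpStart model directly with the SageMaker SDK.
# Assumes AWS credentials and a SageMaker execution role are already configured.
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-large")
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    accept_eula=True,  # gated models require explicit EULA acceptance
)
```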

---
## 2 · Deploying a Custom Model
Custom models are useful when you have already trained a model locally or with SageMaker Training.

### 2.1 Package Artifacts
- For **Hugging Face**-style models, create a directory containing `config.json`, `tokenizer.json`, etc.
- Optionally compress to `model.tar.gz`.
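
For example, assuming your fine-tuned files live in `./artifacts/distilbert/` (a hypothetical path), you could package and optionally upload them yourself:

```bash
# Archive from inside the model directory so the files sit at the tarball root
tar -czvf distilbert.tar.gz -C ./artifacts/distilbert .

# Optional: upload manually instead of letting Magemaker do it
aws s3 cp distilbert.tar.gz s3://your-bucket/models/distilbert.tar.gz
```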

### 2.2 Upload (optional)
If your artifact is not yet on S3, Magemaker will upload it for you.

### 2.3 YAML Example
```yaml
models:
- !Model
id: my-distilbert-finetuned
source: huggingface
location: ./artifacts/distilbert.tar.gz # or s3://bucket/key

deployment: !Deployment
destination: aws
instance_type: ml.m5.xlarge
```

### 2.4 Deploy
```bash
magemaker --deploy .magemaker_config/my-distilbert.yaml
```
Magemaker will:
1. Upload the local artifact to `s3://<default-bucket>/models/my-distilbert-finetuned/` (if needed)
2. Create a `HuggingFaceModel` pointing to that S3 path
3. Spin up the endpoint.
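
Conceptually, steps 2 and 3 map onto the SageMaker Python SDK roughly as follows. The container versions are assumptions; this is a sketch, not Magemaker's actual code:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN outside SageMaker

# Point the model at the uploaded artifact (path assumed for illustration)
model = HuggingFaceModel(
    model_data="s3://your-bucket/models/my-distilbert-finetuned/model.tar.gz",
    role=role,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```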

---
## Tips & Quotas
- **GPU quota** – JumpStart LLMs often need a GPU instance (`ml.g5.*`). Request a quota increase first.
- **Endpoint names** – If `endpoint_name` is omitted, Magemaker appends a timestamp to ensure uniqueness.
- **Cost control** – Delete endpoints when not in use: `magemaker --cloud aws → Delete a Model Endpoint`.

---
## FAQ
**Q: Do JumpStart models work on GCP / Azure?**
*A: Not yet. JumpStart is an AWS-specific marketplace.*

**Q: Can I pass inference parameters to a JumpStart model?**
*A: Yes. Use the same `predict:` block as with Hugging Face models.*