Merge pull request #2007 from quixoticmonk/d-improve-awscc_sagemaker_cluster

docs: added examples for awscc_sagemaker_cluster
ewbankkit authored Sep 12, 2024
2 parents 9c572a6 + e76b40b commit 61ad654
Showing 14 changed files with 1,480 additions and 0 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,4 +1,9 @@
## 1.14.0 (Unreleased)

FEATURES:

* **New Resource:** `awscc_sagemaker_cluster`

## 1.13.0 (September 5, 2024)

FEATURES:
224 changes: 224 additions & 0 deletions docs/resources/sagemaker_cluster.md
@@ -0,0 +1,224 @@
---
page_title: "awscc_sagemaker_cluster Resource - terraform-provider-awscc"
subcategory: ""
description: |-
Resource Type definition for AWS::SageMaker::Cluster
---

# awscc_sagemaker_cluster (Resource)

Resource Type definition for AWS::SageMaker::Cluster

## Example Usage

### Basic usage
The following example creates a SageMaker HyperPod cluster. Sample lifecycle scripts are available at https://github.com/aws-samples/awsome-distributed-training/tree/main/1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config.

```terraform
resource "awscc_sagemaker_cluster" "example" {
  cluster_name = "example"
  instance_groups = [
    {
      execution_role      = awscc_iam_role.example.arn
      instance_count      = 1
      instance_type       = "ml.c5.2xlarge"
      instance_group_name = "example"
      life_cycle_config = {
        source_s3_uri = "s3://${aws_s3_bucket.example.id}/config/"
        on_create     = "on_create_noop.sh"
      }
      instance_storage_configs = [{
        ebs_volume_config = {
          volume_size_in_gb = 30
        }
      }]
    }
  ]

  tags = [{
    key   = "ModifiedBy"
    value = "AWSCC"
  }]
}

resource "aws_s3_bucket" "example" {
  bucket = "example"
}

resource "aws_s3_object" "script" {
  bucket = aws_s3_bucket.example.id
  key    = "config/on_create_noop.sh"
  source = "on_create_noop.sh"
}

resource "aws_s3_object" "params" {
  bucket = aws_s3_bucket.example.id
  key    = "config/provisioning_parameters.json"
  source = "provisioning_parameters.json"
}
```
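
The example above references `awscc_iam_role.example` without defining it. A minimal sketch of such a role might look like the following. The role name and the choice of the AWS managed policy `AmazonSageMakerClusterInstanceRolePolicy` are assumptions for illustration, not part of the original example, and read access to the lifecycle script bucket may need to be granted separately.

```terraform
# Hypothetical execution role for the instance group; the name is illustrative.
resource "awscc_iam_role" "example" {
  role_name = "example-hyperpod-execution-role"

  # Trust policy that lets the SageMaker service assume the role.
  assume_role_policy_document = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "sagemaker.amazonaws.com" }
    }]
  })

  # Assumption: the AWS managed policy for HyperPod instances is sufficient here.
  managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonSageMakerClusterInstanceRolePolicy"
  ]
}
```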

### EKS orchestrator
The following example creates a SageMaker HyperPod cluster that uses an existing EKS cluster as the orchestrator.

```terraform
# awscc_iam_role.example, var.sg_id, and var.subnet_id are assumed to be defined elsewhere.
resource "awscc_sagemaker_cluster" "this" {
  cluster_name = "example"
  instance_groups = [
    {
      execution_role      = awscc_iam_role.example.arn
      instance_count      = 1
      instance_type       = "ml.c5.2xlarge"
      instance_group_name = "example"
      life_cycle_config = {
        # Points at the bucket and prefix that hold the uploaded lifecycle scripts.
        source_s3_uri = "s3://${aws_s3_bucket.example.id}/config/"
        on_create     = "on_create_noop.sh"
      }
    }
  ]
  orchestrator = {
    eks = {
      cluster_arn = "arn:${data.aws_partition.current.partition}:eks:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:cluster/hyperpod-eks-example"
    }
  }
  vpc_config = {
    security_group_ids = [var.sg_id]
    subnets            = [var.subnet_id]
  }

  tags = [{
    key   = "ModifiedBy"
    value = "AWSCC"
  }]
}

resource "aws_s3_bucket" "example" {
  bucket = "example"
}

resource "aws_s3_object" "script" {
  bucket = aws_s3_bucket.example.id
  key    = "config/on_create_noop.sh"
  source = "on_create_noop.sh"
}

resource "aws_s3_object" "params" {
  bucket = aws_s3_bucket.example.id
  key    = "config/provisioning_parameters.json"
  source = "provisioning_parameters.json"
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
data "aws_partition" "current" {}
```
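
The EKS example reads `var.sg_id` and `var.subnet_id`, which it does not declare. One way to declare them is sketched below; the descriptions are assumptions about how the values are used, not part of the original example.

```terraform
# Hypothetical input variables for the EKS orchestrator example.
variable "sg_id" {
  type        = string
  description = "ID of the security group attached to the HyperPod cluster nodes."
}

variable "subnet_id" {
  type        = string
  description = "ID of the subnet in which the HyperPod cluster nodes are placed."
}
```

The values can then be supplied through a `.tfvars` file or `-var` flags at plan time.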

<!-- schema generated by tfplugindocs -->
## Schema

### Required

- `instance_groups` (Attributes List) The instance groups of the SageMaker HyperPod cluster. (see [below for nested schema](#nestedatt--instance_groups))

### Optional

- `cluster_name` (String) The name of the HyperPod Cluster.
- `node_recovery` (String) The node auto-recovery mode. If set to `Automatic`, faulty nodes will be replaced or rebooted when a failure is detected. If set to `None`, nodes will only be labelled when a fault is detected.
- `orchestrator` (Attributes) Specifies parameter(s) specific to the orchestrator, e.g. specify the EKS cluster. (see [below for nested schema](#nestedatt--orchestrator))
- `tags` (Attributes Set) Custom tags for managing the SageMaker HyperPod cluster as an AWS resource. You can add tags to your cluster in the same way you add them in other AWS services that support tagging. (see [below for nested schema](#nestedatt--tags))
- `vpc_config` (Attributes) Specifies an Amazon Virtual Private Cloud (VPC) that your SageMaker jobs, hosted models, and compute resources have access to. You can control access to and from your resources by configuring a VPC. (see [below for nested schema](#nestedatt--vpc_config))

### Read-Only

- `cluster_arn` (String) The Amazon Resource Name (ARN) of the HyperPod Cluster.
- `cluster_status` (String) The status of the HyperPod Cluster.
- `creation_time` (String) The time at which the HyperPod cluster was created.
- `failure_message` (String) The failure message of the HyperPod Cluster.
- `id` (String) Uniquely identifies the resource.
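
As a small illustration of the read-only attributes, the sketch below exposes two of them as outputs. It assumes a resource named `awscc_sagemaker_cluster.example`, as in the basic usage example; the output names are hypothetical.

```terraform
# Expose read-only attributes of the cluster defined in the basic usage example.
output "sagemaker_cluster_arn" {
  description = "ARN of the HyperPod cluster."
  value       = awscc_sagemaker_cluster.example.cluster_arn
}

output "sagemaker_cluster_status" {
  description = "Status of the HyperPod cluster as last read by Terraform."
  value       = awscc_sagemaker_cluster.example.cluster_status
}
```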

<a id="nestedatt--instance_groups"></a>
### Nested Schema for `instance_groups`

Required:

- `execution_role` (String) The execution role for the instance group to assume.
- `instance_count` (Number) The number of instances you specified to add to the instance group of a SageMaker HyperPod cluster.
- `instance_group_name` (String) The name of the instance group of a SageMaker HyperPod cluster.
- `instance_type` (String) The instance type of the instance group of a SageMaker HyperPod cluster.
- `life_cycle_config` (Attributes) The lifecycle configuration for a SageMaker HyperPod cluster. (see [below for nested schema](#nestedatt--instance_groups--life_cycle_config))

Optional:

- `current_count` (Number) The number of instances that are currently in the instance group of a SageMaker HyperPod cluster.
- `instance_storage_configs` (Attributes List) The instance storage configuration for the instance group. (see [below for nested schema](#nestedatt--instance_groups--instance_storage_configs))
- `on_start_deep_health_checks` (List of String) Nodes will undergo advanced stress test to detect and replace faulty instances, based on the type of deep health check(s) passed in.
- `threads_per_core` (Number) The number you specified for ThreadsPerCore in CreateCluster for enabling or disabling multithreading. For instance types that support multithreading, you can specify 1 for disabling multithreading and 2 for enabling multithreading.
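
To show where some of the optional settings go, the sketch below sets `node_recovery` on the cluster and `threads_per_core` on an instance group. The value `Automatic` is an assumption based on the CloudFormation `AWS::SageMaker::Cluster` schema and should be checked against the current schema. The resource name and both variables are hypothetical placeholders.

```terraform
# Hypothetical inputs; supply values that exist in your account.
variable "execution_role_arn" {
  type        = string
  description = "ARN of the IAM role the instance group assumes."
}

variable "lifecycle_s3_uri" {
  type        = string
  description = "S3 URI prefix that contains the lifecycle scripts."
}

resource "awscc_sagemaker_cluster" "tuned" {
  cluster_name = "example-tuned"

  # Assumption: "Automatic" enables node auto-recovery.
  node_recovery = "Automatic"

  instance_groups = [
    {
      execution_role      = var.execution_role_arn
      instance_count      = 1
      instance_type       = "ml.c5.2xlarge"
      instance_group_name = "example"

      # 1 disables multithreading on instance types that support it.
      threads_per_core = 1

      life_cycle_config = {
        source_s3_uri = var.lifecycle_s3_uri
        on_create     = "on_create_noop.sh"
      }
    }
  ]
}
```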

<a id="nestedatt--instance_groups--life_cycle_config"></a>
### Nested Schema for `instance_groups.life_cycle_config`

Required:

- `on_create` (String) The file name of the entrypoint script of lifecycle scripts under SourceS3Uri. This entrypoint script runs during cluster creation.
- `source_s3_uri` (String) An Amazon S3 bucket path where your lifecycle scripts are stored.


<a id="nestedatt--instance_groups--instance_storage_configs"></a>
### Nested Schema for `instance_groups.instance_storage_configs`

Optional:

- `ebs_volume_config` (Attributes) Defines the configuration for attaching additional Amazon Elastic Block Store (EBS) volumes to the instances in the SageMaker HyperPod cluster instance group. The additional EBS volume is attached to each instance within the SageMaker HyperPod cluster instance group and mounted to /opt/sagemaker. (see [below for nested schema](#nestedatt--instance_groups--instance_storage_configs--ebs_volume_config))

<a id="nestedatt--instance_groups--instance_storage_configs--ebs_volume_config"></a>
### Nested Schema for `instance_groups.instance_storage_configs.ebs_volume_config`

Optional:

- `volume_size_in_gb` (Number) The size in gigabytes (GB) of the additional EBS volume to be attached to the instances in the SageMaker HyperPod cluster instance group. The additional EBS volume is attached to each instance within the SageMaker HyperPod cluster instance group and mounted to /opt/sagemaker.




<a id="nestedatt--orchestrator"></a>
### Nested Schema for `orchestrator`

Required:

- `eks` (Attributes) Specifies parameter(s) related to EKS as orchestrator, e.g. the EKS cluster that nodes will attach to. (see [below for nested schema](#nestedatt--orchestrator--eks))

<a id="nestedatt--orchestrator--eks"></a>
### Nested Schema for `orchestrator.eks`

Required:

- `cluster_arn` (String) The ARN of the EKS cluster, such as arn:aws:eks:us-west-2:123456789012:cluster/my-eks-cluster



<a id="nestedatt--tags"></a>
### Nested Schema for `tags`

Required:

- `key` (String) The key name of the tag. You can specify a value that is 1 to 128 Unicode characters in length and cannot be prefixed with aws:. You can use any of the following characters: the set of Unicode letters, digits, whitespace, _, ., /, =, +, and -.
- `value` (String) The value for the tag. You can specify a value that is 0 to 256 Unicode characters in length and cannot be prefixed with aws:. You can use any of the following characters: the set of Unicode letters, digits, whitespace, _, ., /, =, +, and -.


<a id="nestedatt--vpc_config"></a>
### Nested Schema for `vpc_config`

Required:

- `security_group_ids` (List of String) The VPC security group IDs, in the form sg-xxxxxxxx. Specify the security groups for the VPC that is specified in the Subnets field.
- `subnets` (List of String) The IDs of the subnets in the VPC to which you want to connect your training job or model.

## Import

Import is supported using the following syntax:

```shell
$ terraform import awscc_sagemaker_cluster.example "cluster_arn"
```
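
On Terraform 1.5 or later, the same import can also be expressed declaratively with an `import` block. The sketch below assumes the resource address from the command above; the ARN is a made-up placeholder and must be replaced with the real cluster ARN.

```terraform
# Declarative import (Terraform 1.5+). The ARN below is a placeholder.
import {
  to = awscc_sagemaker_cluster.example
  id = "arn:aws:sagemaker:us-west-2:123456789012:cluster/abcdefghijkl"
}
```

Running `terraform plan` then shows the planned import alongside any configuration drift.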
1 change: 1 addition & 0 deletions examples/resources/awscc_sagemaker_cluster/import.sh
@@ -0,0 +1 @@
$ terraform import awscc_sagemaker_cluster.example "cluster_arn"
@@ -0,0 +1,50 @@
# awscc_iam_role.example, var.sg_id, and var.subnet_id are assumed to be defined elsewhere.
resource "awscc_sagemaker_cluster" "this" {
  cluster_name = "example"
  instance_groups = [
    {
      execution_role      = awscc_iam_role.example.arn
      instance_count      = 1
      instance_type       = "ml.c5.2xlarge"
      instance_group_name = "example"
      life_cycle_config = {
        # Points at the bucket and prefix that hold the uploaded lifecycle scripts.
        source_s3_uri = "s3://${aws_s3_bucket.example.id}/config/"
        on_create     = "on_create_noop.sh"
      }
    }
  ]
  orchestrator = {
    eks = {
      cluster_arn = "arn:${data.aws_partition.current.partition}:eks:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:cluster/hyperpod-eks-example"
    }
  }
  vpc_config = {
    security_group_ids = [var.sg_id]
    subnets            = [var.subnet_id]
  }

  tags = [{
    key   = "ModifiedBy"
    value = "AWSCC"
  }]

}

resource "aws_s3_bucket" "example" {
  bucket = "example"
}

resource "aws_s3_object" "script" {
  bucket = aws_s3_bucket.example.id
  key    = "config/on_create_noop.sh"
  source = "on_create_noop.sh"
}

resource "aws_s3_object" "params" {
  bucket = aws_s3_bucket.example.id
  key    = "config/provisioning_parameters.json"
  source = "provisioning_parameters.json"
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
data "aws_partition" "current" {}
@@ -0,0 +1,42 @@
resource "awscc_sagemaker_cluster" "example" {
  cluster_name = "example"
  instance_groups = [
    {
      execution_role      = awscc_iam_role.example.arn
      instance_count      = 1
      instance_type       = "ml.c5.2xlarge"
      instance_group_name = "example"
      life_cycle_config = {
        source_s3_uri = "s3://${aws_s3_bucket.example.id}/config/"
        on_create     = "on_create_noop.sh"
      }
      instance_storage_configs = [{
        ebs_volume_config = {
          volume_size_in_gb = 30
        }
      }]
    }
  ]

  tags = [{
    key   = "ModifiedBy"
    value = "AWSCC"
  }]

}

resource "aws_s3_bucket" "example" {
  bucket = "example"
}

resource "aws_s3_object" "script" {
  bucket = aws_s3_bucket.example.id
  key    = "config/on_create_noop.sh"
  source = "on_create_noop.sh"
}

resource "aws_s3_object" "params" {
  bucket = aws_s3_bucket.example.id
  key    = "config/provisioning_parameters.json"
  source = "provisioning_parameters.json"
}