AutoScalingGroups support #57

Closed
ackward opened this issue Nov 30, 2020 · 13 comments · Fixed by #58
Labels: backend, feature-request, help wanted, release-3.4

Comments

Contributor

ackward commented Nov 30, 2020

Hi!

First of all, thanks and congrats for this superb tool :)

I don't know if it's too much work, but I miss ASG (Auto Scaling group) support. Most of my EC2 instances are part of an ASG, since nonprod and prod follow the same design model. We have scheduled actions that stop them in the evening and start them in the morning, but that's inefficient: most of them could be left stopped or, even better, never started automatically, letting the developers (or app owners) start them from a web frontend.

aws-power-toggle could stop an ASG just by setting its desired capacity (DesiredCapacity) to 0, and start the instances by setting it to 1 (or to the original desired size, if it was cached). This would be managed by the same tags as for EC2.
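
A minimal sketch of that stop/start idea, assuming a plain in-memory dict as the cache. The names `stop_asg`, `start_asg`, and `set_capacity` are hypothetical; `set_capacity` stands in for whatever call actually updates the ASG's desired capacity, and none of this is taken from the aws-power-toggle code.

```python
from typing import Callable, Dict

# Hypothetical cache: ASG name -> desired capacity seen before the stop.
CapacityCache = Dict[str, int]


def stop_asg(name: str, current: int, cache: CapacityCache,
             set_capacity: Callable[[str, int], None]) -> None:
    """Scale the ASG to zero, remembering a non-zero capacity for later."""
    if current > 0:
        cache[name] = current
    set_capacity(name, 0)


def start_asg(name: str, cache: CapacityCache,
              set_capacity: Callable[[str, int], None]) -> int:
    """Restore the cached capacity, falling back to 1 when nothing is cached."""
    target = cache.get(name, 1)
    set_capacity(name, target)
    return target


# Usage with a fake backend that only records the requested capacities.
calls = []
cache: CapacityCache = {}


def record(name: str, capacity: int) -> None:
    calls.append((name, capacity))


stop_asg("web-asg", 3, cache, record)
restored = start_asg("web-asg", cache, record)
fresh = start_asg("never-seen-asg", cache, record)
```

Passing the backend in as a callable keeps the capacity bookkeeping testable without touching AWS.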

Owner

gbolo commented Dec 1, 2020

Hi @ackward

Thanks for the feature request. I personally have not worked with ASGs before, but I'll take a look at this for you. Your design thoughts around the implementation make total sense to me as well. I'll dig through the API and see how much effort this would take. Hopefully I won't have to make any UI changes.

gbolo self-assigned this Dec 1, 2020
gbolo added the enhancement, backend, feature-request, and help wanted labels and removed the enhancement label Dec 1, 2020
Owner

gbolo commented Dec 1, 2020

Looked into this. My thoughts so far:

I see an API call for retrieving ASGs, but unlike the EC2 instance call it does not let me filter by tags. This seems a bit inefficient to me, especially when there are a lot of ASGs that are not tagged for aws-power-toggle (not too big of a deal, though). I will also have to write some paging logic, since a single call returns at most 100 ASGs (I don't know whether that's a small number or not).

Once I have the ASGs that I filtered manually by tags, I can add a new object to the cache to keep track of them. If an ASG has a desired count greater than 0, I can cache that too. When a user powers down an environment, I can make an API call with the ASG name and set the desired count to 0 (while still keeping the original non-zero value if there was one). I must then loop through all instances and also issue a shutdown, since I am currently not associating instances with an ASG (even though AWS is). I could spend some time on this and create the association myself in the cache; this would allow me to avoid issuing shutdown calls to instances that belong to an ASG whose desired count I have already set to 0. @ackward please confirm if this logic makes any sense :)

https://godoc.org/github.com/aws/aws-sdk-go-v2/service/autoscaling#Client.DescribeAutoScalingGroups

func (c *Client) DescribeAutoScalingGroups(ctx context.Context, params *DescribeAutoScalingGroupsInput, optFns ...func(*Options)) (*DescribeAutoScalingGroupsOutput, error)
type DescribeAutoScalingGroupsInput struct {

    // The names of the Auto Scaling groups. By default, you can only specify up to 50
    // names. You can optionally increase this limit using the MaxRecords parameter. If
    // you omit this parameter, all Auto Scaling groups are described.
    AutoScalingGroupNames []string

    // The maximum number of items to return with this call. The default value is 50
    // and the maximum value is 100.
    MaxRecords *int32

    // The token for the next set of items to return. (You received this token from a
    // previous call.)
    NextToken *string
}

type DescribeAutoScalingGroupsOutput struct {

    // The groups.
    //
    // This member is required.
    AutoScalingGroups []types.AutoScalingGroup

    // A string that indicates that the response contains more items than can be
    // returned in a single response. To receive additional items, specify this string
    // for the NextToken value when requesting the next set of items. This value is
    // null when there are no more items to return.
    NextToken *string

    // Metadata pertaining to the operation's result.
    ResultMetadata middleware.Metadata
}
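
The paging and manual tag filtering described above can be sketched as follows. This is an illustration only: `iter_asgs`, `has_tag`, and `fake_describe` are hypothetical names, the fake backend stands in for the real `DescribeAutoScalingGroups` call, and the `power-toggle-enabled` tag key is the one used in the Python snippet later in this thread.

```python
from typing import Callable, Iterator, Optional


def iter_asgs(describe: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every group, following NextToken until it is absent."""
    token = None
    while True:
        page = describe(token)
        yield from page["AutoScalingGroups"]
        token = page.get("NextToken")
        if not token:
            break


def has_tag(asg: dict, key: str, value: str) -> bool:
    """The describe call has no tag filter here, so filter client-side."""
    return any(t["Key"] == key and t["Value"] == value
               for t in asg.get("Tags", []))


# Fake two-page backend standing in for the real API call.
_ENABLED = [{"Key": "power-toggle-enabled", "Value": "true"}]
_PAGES = {
    None: {
        "AutoScalingGroups": [
            {"AutoScalingGroupName": "web", "Tags": _ENABLED},
            {"AutoScalingGroupName": "batch", "Tags": []},
        ],
        "NextToken": "page-2",
    },
    "page-2": {
        "AutoScalingGroups": [
            {"AutoScalingGroupName": "api", "Tags": _ENABLED},
        ],
    },
}


def fake_describe(token: Optional[str]) -> dict:
    return _PAGES[token]


enabled = [asg["AutoScalingGroupName"]
           for asg in iter_asgs(fake_describe)
           if has_tag(asg, "power-toggle-enabled", "true")]
```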

https://godoc.org/github.com/aws/aws-sdk-go-v2/service/autoscaling#Client.SetDesiredCapacity

func (c *Client) SetDesiredCapacity(ctx context.Context, params *SetDesiredCapacityInput, optFns ...func(*Options)) (*SetDesiredCapacityOutput, error)
type SetDesiredCapacityInput struct {

    // The name of the Auto Scaling group.
    //
    // This member is required.
    AutoScalingGroupName *string

    // The desired capacity is the initial capacity of the Auto Scaling group after
    // this operation completes and the capacity it attempts to maintain.
    //
    // This member is required.
    DesiredCapacity *int32

    // Indicates whether Amazon EC2 Auto Scaling waits for the cooldown period to
    // complete before initiating a scaling activity to set your Auto Scaling group to
    // its new capacity. By default, Amazon EC2 Auto Scaling does not honor the
    // cooldown period during manual scaling activities.
    HonorCooldown *bool
}

So it's certainly doable, but it would take significant time for me to properly implement since I have to experiment with ASGs as well as write the code. I will provide further updates when I make any progress (but no promises). Obviously help would be greatly appreciated if anyone else is interested in this feature.

Contributor Author

ackward commented Dec 1, 2020

Thanks for looking into it :)
I think it can be simpler.

  1. Instances in an ASG are still running EC2 instances, so if they have the power-toggle tag set to true and the env tag, they are already collected. The main difference is that you shouldn't stop/start them manually, as they are managed by the ASG.
    This is how they appear now, like any other EC2 instance (because in the end that is what they are):
    image

  2. If an EC2 instance is part of an ASG, it has a tag called "aws:autoscaling:groupName" with the ASG's group name.

  3. The main operational difference is that stopping an ASG terminates its EC2 instances, so the app doesn't see them anymore and they can't be started.

  4. Also, instead of listing N instances per ASG, they can be grouped into one (virtual) EC2 record with the sum of all their cores and memory, since the individual EC2 IDs are not going to be used. (Technically an ASG can have different instance types. That isn't our case, as I think it's too complex to manage, but it can happen, so the instance type could be mismatched.)

IMHO the app shouldn't poll ASGs. When polling EC2 it should detect the AWS tag and then cache those ASGs, or create that virtual EC2 record. These virtual EC2 records (they have an ASG group name instead of an EC2 ID) should be maintained even if there are no EC2 instances running (ASG scaled to 0).
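
A rough sketch of that grouping, assuming each polled instance is a dict with hypothetical `vcpu`, `memory_gb`, and `tags` fields (this is not the tool's real data model). `group_by_asg` builds one virtual record per ASG from the `aws:autoscaling:groupName` tag, and `merge_into_cache` keeps a zeroed record for ASGs that no longer have any instance.

```python
from typing import Dict, List

ASG_TAG = "aws:autoscaling:groupName"


def group_by_asg(instances: List[dict]) -> Dict[str, dict]:
    """Build one virtual record per ASG, summing cores and memory."""
    records: Dict[str, dict] = {}
    for inst in instances:
        group = inst.get("tags", {}).get(ASG_TAG)
        if group is None:
            continue  # standalone instance, handled by the normal EC2 path
        rec = records.setdefault(
            group, {"instance_count": 0, "vcpu": 0, "memory_gb": 0.0})
        rec["instance_count"] += 1
        rec["vcpu"] += inst["vcpu"]
        rec["memory_gb"] += inst["memory_gb"]
    return records


def merge_into_cache(cache: Dict[str, dict], fresh: Dict[str, dict]) -> None:
    """Refresh ASGs seen in this poll; keep the others as empty records."""
    for name in list(cache):
        if name not in fresh:
            cache[name] = {"instance_count": 0, "vcpu": 0, "memory_gb": 0.0}
    cache.update(fresh)


instances = [
    {"id": "i-1", "vcpu": 2, "memory_gb": 4.0, "tags": {ASG_TAG: "web"}},
    {"id": "i-2", "vcpu": 2, "memory_gb": 4.0, "tags": {ASG_TAG: "web"}},
    {"id": "i-3", "vcpu": 4, "memory_gb": 16.0, "tags": {}},
]
cache: Dict[str, dict] = {}
merge_into_cache(cache, group_by_asg(instances))      # ASG "web" is running
running = dict(cache["web"])
merge_into_cache(cache, group_by_asg(instances[2:]))  # "web" scaled to 0
stopped = cache["web"]
```

Note that the cache only survives while the process runs, so a record for an ASG scaled to 0 is lost on restart.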

Start and stop use the same handler as for EC2; it only has to detect that the record is virtual (no ID, or the AWS group name exists) and then execute the ASG on/off code instead of the default EC2 one.
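
One way such a shared handler could dispatch, as a sketch only. The record fields (`asg_name`, `desired_capacity`, `instance_id`) and the two backend callables are hypothetical stand-ins for the real EC2 and Auto Scaling calls.

```python
from typing import Callable


def toggle(record: dict, action: str,
           ec2_backend: Callable[[str, str], None],
           asg_backend: Callable[[str, int], None]) -> str:
    """Route a start/stop request to the ASG or the plain EC2 code path."""
    if action not in ("start", "stop"):
        raise ValueError(f"unknown action: {action}")
    if record.get("asg_name"):
        # Virtual record: change the group's desired capacity instead of
        # touching individual instances.
        capacity = 0 if action == "stop" else record.get("desired_capacity", 1)
        asg_backend(record["asg_name"], capacity)
        return "asg"
    ec2_backend(record["instance_id"], action)
    return "ec2"


# Fake backends that only record what they were asked to do.
ec2_calls, asg_calls = [], []


def fake_ec2(instance_id: str, action: str) -> None:
    ec2_calls.append((instance_id, action))


def fake_asg(name: str, capacity: int) -> None:
    asg_calls.append((name, capacity))


virtual = {"asg_name": "web", "desired_capacity": 3}
paths = [
    toggle(virtual, "stop", fake_ec2, fake_asg),
    toggle(virtual, "start", fake_ec2, fake_asg),
    toggle({"instance_id": "i-1"}, "stop", fake_ec2, fake_asg),
]
```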

Stopping is just a matter of setting the desired capacity to 0, given the AutoScalingGroupName *string (i.e. a property of the virtual record).
Starting will work as long as the virtual EC2 records aren't deleted (ASG instances are terminated when not in use), so you can still start them in the UI.
In this model, for an ASG to be listed you first need at least one EC2 instance running (to get the data), but I don't think it's much of a problem to have it running at least once.

There shouldn't be any need to loop through all the instances to shut them down. If they are part of an ASG, they will be terminated when the desired capacity is set to 0. If it is set to 1 or more, the ASG will create them, and the EC2 poll routine plus the ASG detection code will work seamlessly.

Also, there shouldn't be any need to modify the frontend, as these changes are backend-only and there shouldn't be any format difference if you show the ASG as an EC2 instance.

Just a few ideas; the main one is that you have the ASG group name as a tag on the EC2 instance.

Owner

gbolo commented Dec 2, 2020

Thanks for the detailed implementation plan @ackward, that really helped in shedding more light on how to handle this. That extra tag aws:autoscaling:groupName makes a huge difference.

I can see one problem though. Let's say an ASG has been shut down (so its desired capacity is set to 0), and then we restart aws-power-toggle. Unfortunately, we only discover environments based on instances (in the current design, anyway), so if we don't poll for ASGs we can't interact with that ASG again, since no instance will have the aws:autoscaling:groupName tag. How should we deal with this situation?

Contributor Author

ackward commented Dec 2, 2020

Yes, that's correct.
IMHO the easiest and most elegant way is to use a DynamoDB table. Save only the ASG (no need to save EC2 data) in the table whenever the cache is updated with an ASG, and read the table when the app starts. Add a property for the DynamoDB table name to the YAML config file, and enable this persistence only if it's declared.
A quick cost estimate with 10,000 writes and 1,000 reads per month and 1 GB of storage comes to around $0.29 per month.

My plan is to run the tool on ECS Fargate (using environment variables to override the YAML config file), with everything declared inside a CFT; the IAM role and the table will also be provisioned in a single CloudFormation stack.

Owner

gbolo commented Dec 2, 2020

Yea I don't think adding a state requirement is worth it just for this feature. I'm pretty happy keeping it stateless. I can work on adding this feature as experimental (can be enabled by config flag). It will poll the ASGs and present them as a single instance with combined cpu/memory stats.
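
For illustration, presenting a polled ASG as a single instance with combined stats might look like the sketch below. The ASG dict mimics the shape of a describe response (`AutoScalingGroupName`, `DesiredCapacity`, `Instances`), while `asg_as_instance` and the small `SPECS` lookup table are assumptions made for this example, not part of the tool.

```python
from typing import Dict, Tuple

# Illustrative lookup only: instance type -> (vCPUs, memory in GiB).
SPECS: Dict[str, Tuple[int, float]] = {
    "t3.micro": (2, 1.0),
    "m5.large": (2, 8.0),
}


def asg_as_instance(asg: dict) -> dict:
    """Collapse an ASG into one virtual instance with summed resources."""
    vcpu, memory_gib, types = 0, 0.0, set()
    for inst in asg.get("Instances", []):
        # Unknown types contribute nothing rather than failing the poll.
        cores, gib = SPECS.get(inst["InstanceType"], (0, 0.0))
        vcpu += cores
        memory_gib += gib
        types.add(inst["InstanceType"])
    return {
        "name": asg["AutoScalingGroupName"],
        "state": "running" if asg["DesiredCapacity"] > 0 else "stopped",
        "vcpu": vcpu,
        "memory_gib": memory_gib,
        # An ASG may mix instance types, so report all of them.
        "instance_types": sorted(types),
    }


mixed = asg_as_instance({
    "AutoScalingGroupName": "web",
    "DesiredCapacity": 2,
    "Instances": [
        {"InstanceId": "i-1", "InstanceType": "t3.micro"},
        {"InstanceId": "i-2", "InstanceType": "m5.large"},
    ],
})
idle = asg_as_instance({
    "AutoScalingGroupName": "batch",
    "DesiredCapacity": 0,
    "Instances": [],
})
```

Because the record is derived from the ASG itself, a group scaled to 0 still shows up (as `stopped`) after a restart, with no stored state.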

I will look into this when I get some time, I will also need to experiment a bit with ASGs ;)

Contributor Author

ackward commented Dec 2, 2020

Thanks for your time. If you are interested in the CFT for ECS Fargate, I can upload it somewhere.
I don't dare to show code in Go (a pity, as I could have contributed to the project), but in Python polling those ASGs looks something like:

import boto3

profile = "default"  # assumption: name of a local AWS CLI profile
session = boto3.Session(profile_name=profile)
auto = session.client("autoscaling", region_name="eu-west-1")

# Page through every ASG, 100 (the API maximum) per call
paginator = auto.get_paginator("describe_auto_scaling_groups")
for page in paginator.paginate(PaginationConfig={"PageSize": 100}):
    for asg in page["AutoScalingGroups"]:
        # Keep only the ASGs tagged power-toggle-enabled=true
        tags = asg.get("Tags", [])
        if any(t["Key"] == "power-toggle-enabled" and t["Value"] == "true" for t in tags):
            print(asg["AutoScalingGroupName"])

The tag "power-toggle-enabled" must be declared on the ASG itself, not only in the LaunchTemplate or LaunchConfiguration.
The asg variable has the following structure. If its EC2 instances are stopped, DesiredCapacity is 0 and Instances is [] (an empty list). The instance type can vary, as an ASG supports weighted sets, e.g. 60% on-demand of type X and 40% spot of type Y (we don't use that, but it's possible; I mention it just to avoid bugs from assuming all instances are alike).

image

Contributor Author

ackward commented Dec 3, 2020

FYI, before you spend a lot of time on this: I've managed to get it working. It needs quite a bit of cleanup and documentation, and don't expect great code quality (it's my first time coding in Go, and the last time I did something in a C-like language was 20 years ago).
I'll push it to a fork so you can decide what to do with it.

Owner

gbolo commented Dec 4, 2020

Hey @ackward, I would love to review your code and both welcome and appreciate your contributions. Please feel free to push a fork at your earliest convenience.

Contributor Author

ackward commented Dec 4, 2020

Toggle by env is still not supported; I need to look at it.
The code at the moment can be found in: master...ackward:asg_support

Owner

gbolo commented Dec 7, 2020

> Toggle by env is still not supported; I need to look at it.
> The code at the moment can be found in: master...ackward:asg_support

Thanks @ackward
Sorry for the delay, I'm on "vacation" but I'll take a look shortly.

Owner

gbolo commented Dec 7, 2020

Hi @ackward, I looked over your branch and added some stuff (I couldn't commit it to your fork). Your code was great BTW. Please try building it and see if it works to your liking: https://github.com/gbolo/aws-power-toggle/tree/asg_support

gbolo added a commit that referenced this issue Dec 9, 2020
* ASG Support

* CloudFormation Example

* fix env functionality

* fix ever-increasing cache table when ASG support is turned off

* fix broken test case

* update docs to describe ASG functionality

Co-authored-by: n0251612 <g.gainza@libertyseguros.es>
Co-authored-by: ackward <ackward@gmail.com>
Owner

gbolo commented Dec 9, 2020

@ackward I merged this into the release-3.4 branch, so this feature will be available starting in that version. Thanks for the contribution, greatly appreciated 👍

gbolo closed this as completed Dec 9, 2020
gbolo mentioned this issue Dec 9, 2020

2 participants