How to configure an AWS Auto Scaling group for scaling up/down using Terraform

I have an ECS cluster (launch type: EC2) backed by an Auto Scaling group. Let's say the group has a maximum of 10 instances, the desired count of running instances is 6, and each instance has 2 deployed services. I now want to configure the Auto Scaling group to scale up/down dynamically based on the service counts, meaning:
If the desired count of a deployed service is 6 and I update a service to increase its replica count, the cluster must scale up towards the maximum of 10 instances to make room for the new replicas, and if I decrease the replica count, the cluster must terminate the instances that are no longer used.
Why do I need this?
Because I don't want any instance sitting in Status: Active that I am not using. As far as I understand, I will pay for any unused instance that is still active, so if you have a better idea, or if my assumption is wrong, please tell me.
Here is my configuration:
resource "aws_autoscaling_policy" "asg_policy" {
name = "asg-policy"
scaling_adjustment = 1
policy_type = "SimpleScaling"
adjustment_type = "ChangeInCapacity"
cooldown = 100
autoscaling_group_name = aws_autoscaling_group.ecs_asg.name
}
resource "aws_autoscaling_group" "ecs_asg" {
name = "ecs-asg"
vpc_zone_identifier = ["${aws_subnet.public_1.id}", "${aws_subnet.public_2.id}", "${aws_subnet.public_3.id}"]
launch_configuration = aws_launch_configuration.ecs_launch_config.name
desired_capacity = 6
min_size = 0
max_size = 10
health_check_grace_period = 100
health_check_type = "ELB"
force_delete = true
target_group_arns = [aws_lb_target_group.asg_tg.arn]
termination_policies = ["OldestInstance"]
}
I tried to configure the asg_policy, but it does not seem to work as expected.
I also tried adjusting the max/min sizes, but that did not work either.
Can anyone help? Thanks.

Related

How to configure a CloudWatch alarm to evaluate once every X minutes

I would like to configure a CloudWatch alarm to:
sum the last 30 minutes of the ApplicationRequestsTotal metric once every 30 minutes
alarm if the sum is equal to 0
I have configured the custom CloudWatch ApplicationRequestsTotal metric to emit once every 60 seconds for my service.
I have configured the alarm as:
{
  "MetricAlarms": [
    {
      "AlarmName": "radio-silence-alarm",
      "AlarmDescription": "Alarm if 0 or less requests are received for 1 consecutive period(s) of 30 minutes.",
      "ActionsEnabled": true,
      "OKActions": [],
      "InsufficientDataActions": [],
      "MetricName": "ApplicationRequestsTotal",
      "Namespace": "AWS/ElasticBeanstalk",
      "Statistic": "Sum",
      "Dimensions": [
        {
          "Name": "EnvironmentName",
          "Value": "service-environment"
        }
      ],
      "Period": 1800,
      "EvaluationPeriods": 1,
      "Threshold": 0.0,
      "ComparisonOperator": "LessThanOrEqualToThreshold",
      "TreatMissingData": "missing"
    }
  ],
  "CompositeAlarms": []
}
I have set up many alarms like this and each one seems to:
sum the last 30 minutes of ApplicationRequestsTotal metric once EVERY minute
For example, this service started getting 0 ApplicationRequestsTotal at 8:36 AM, and right at 9:06 AM CloudWatch triggered an alarm.
The aws cloudwatch describe-alarm-history output for the above time period:
{
  "AlarmName": "radio-silence-alarm",
  "AlarmType": "MetricAlarm",
  "Timestamp": "2021-09-29T09:06:37.929000+00:00",
  "HistoryItemType": "StateUpdate",
  "HistorySummary": "Alarm updated from OK to ALARM",
  "HistoryData": "{
    "version": "1.0",
    "oldState": {
      "stateValue": "OK",
      "stateReason": "Threshold Crossed: 1 datapoint [42.0 (22/09/21 08:17:00)] was not less than or equal to the threshold (0.0).",
      "stateReasonData": {
        "version": "1.0",
        "queryDate": "2021-09-22T08:47:37.930+0000",
        "startDate": "2021-09-22T08:17:00.000+0000",
        "statistic": "Sum",
        "period": 1800,
        "recentDatapoints": [
          42.0
        ],
        "threshold": 0.0,
        "evaluatedDatapoints": [
          {
            "timestamp": "2021-09-22T08:17:00.000+0000",
            "sampleCount": 30.0,
            "value": 42.0
          }
        ]
      }
    },
    "newState": {
      "stateValue": "ALARM",
      "stateReason": "Threshold Crossed: 1 datapoint [0.0 (29/09/21 08:36:00)] was less than or equal to the threshold (0.0).",
      "stateReasonData": {
        "version": "1.0",
        "queryDate": "2021-09-29T09:06:37.926+0000",
        "startDate": "2021-09-29T08:36:00.000+0000",
        "statistic": "Sum",
        "period": 1800,
        "recentDatapoints": [
          0.0
        ],
        "threshold": 0.0,
        "evaluatedDatapoints": [
          {
            "timestamp": "2021-09-29T08:36:00.000+0000",
            "sampleCount": 30.0,
            "value": 0.0
          }
        ]
      }
    }
  }"
}
What have I configured incorrectly?
That is not how Amazon CloudWatch works.
When creating an Alarm in CloudWatch, you specify:
A metric (eg CPU Utilization, or perhaps a Custom Metric being sent to CloudWatch)
A time period (eg the previous 30 minutes)
An aggregation method (eg Average, Sum, Count)
For example, CloudWatch can trigger an Alarm if the Average of the metric was exceeded over the previous 30 minutes. This is continually evaluated as a sliding window. It does not look at metrics in distinct 30-minute blocks.
Using your example, it would send an alert whenever the Sum of the metric is zero for the previous 30 minutes, on a continual basis.
I think your answer can be found directly in the documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
Quoting the docs:
When you create an alarm, you specify three settings to enable CloudWatch to evaluate when to change the alarm state:
Period is the length of time to evaluate the metric or expression to create each individual data point for an alarm. It is expressed in seconds. If you choose one minute as the period, the alarm evaluates the metric once per minute.
Evaluation Periods is the number of the most recent periods, or data points, to evaluate when determining alarm state.
Datapoints to Alarm is the number of data points within the Evaluation Periods that must be breaching to cause the alarm to go to the ALARM state. The breaching data points don't have to be consecutive, but they must all be within the last number of data points equal to Evaluation Period.
When you configure Evaluation Periods and Datapoints to Alarm as different values, you're setting an "M out of N" alarm. Datapoints to Alarm is ("M") and Evaluation Periods is ("N"). The evaluation interval is the number of data points multiplied by the period. For example, if you configure 4 out of 5 data points with a period of 1 minute, the evaluation interval is 5 minutes. If you configure 3 out of 3 data points with a period of 10 minutes, the evaluation interval is 30 minutes.
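To make those three settings concrete, here is a rough Terraform sketch of the alarm from the question (an illustration only, assuming the Terraform AWS provider; datapoints_to_alarm is spelled out even though it defaults to evaluation_periods):
resource "aws_cloudwatch_metric_alarm" "radio_silence" {
  alarm_name          = "radio-silence-alarm"
  namespace           = "AWS/ElasticBeanstalk"
  metric_name         = "ApplicationRequestsTotal"
  dimensions = {
    EnvironmentName = "service-environment"
  }
  statistic           = "Sum"
  period              = 1800   # each data point aggregates 30 minutes of the metric
  evaluation_periods  = 1      # evaluate only the most recent data point ("N")
  datapoints_to_alarm = 1      # one breaching data point is enough to alarm ("M")
  threshold           = 0
  comparison_operator = "LessThanOrEqualToThreshold"
  treat_missing_data  = "missing"
}
With period = 1800 and 1 out of 1 data points, the evaluation interval is 30 minutes, but CloudWatch re-evaluates that sliding 30-minute window continuously rather than once per distinct 30-minute block, which matches the behavior observed in the question.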

AWS Schedule AutoScaling via terraform (timezone issue)

I have common Terraform code for different regions in my AWS account, and likewise a common autoscaling block for all my regions.
Now I want to do time-based scheduling for my ASG, but the timezones differ between regions and I can't handle that in Terraform, because I do not see any option to specify a timezone in the aws_autoscaling_schedule resource.
The following is my code:
resource "aws_autoscaling_schedule" "schedule1" {
scheduled_action_name = "scale_down"
min_size = var.min_size
max_size = var.max_size
recurrence = "0 18 * * 5"
desired_capacity = var.min_size
autoscaling_group_name = aws_autoscaling_group.example_asg.name
}
If you look at the recurrence, I want to scale down at 6 PM in each region's local time, but I can't achieve that because the default timezone is UTC.
PS: I have a different vars file for every region, and I get max_size and min_size separately for each region.
I tried manipulating the recurrence by passing a variable in every region's var file, but it doesn't seem to work; I get an error because the variable is not interpolated inside the recurrence string:
resource "aws_autoscaling_schedule" "schedule1" {
scheduled_action_name = "scale_down"
min_size = var.min_size
max_size = var.max_size
recurrence = "0 var.temp_var * * 5"
desired_capacity = var.min_size
autoscaling_group_name = aws_autoscaling_group.example_asg.name
}
Any idea how I can achieve this? Any help is appreciated.
Thanks!
To fix the error and correctly pass var.temp_var into recurrence, use Terraform interpolation:
recurrence = "0 ${var.temp_var} * * 5"

AWS EC2 spot instance availability

I am using the API call request_spot_instances to create a spot instance without specifying any Availability Zone; normally a random AZ is picked by the API. The spot request sometimes returns a no-capacity status, even though I can successfully request a spot instance in another AZ through the AWS console. What is the proper way to check the availability of spot capacity for a specific instance type before calling request_spot_instances?
There is no public API to check Spot Instance availability. Having said that, you can still achieve what you want by following these steps:
Use request_spot_fleet instead, and configure it to launch a single instance.
Be flexible with the instance types you use; pick as many as you can and include them in the request. To help you pick the instances, check the Spot Instance Advisor for instance interruption and savings rates.
In the Spot Fleet request, set AllocationStrategy to capacityOptimized; this allows the fleet to allocate capacity from the most available Spot capacity pools among your listed instances and reduces the likelihood of Spot interruptions.
Don't set a max price (SpotPrice); the default Spot instance price will be used. The pricing model for Spot has changed and it's no longer based on bidding, so Spot prices are more stable and don't fluctuate.
This may be a bit overkill for what you are looking for, but with parts of this code you can get the spot price history for the last hour (the window can be changed). It gives you the instance type, the AZ, and additional information. From there you can loop through the instance types by AZ; if a spot instance doesn't come up in, say, 30 seconds, try the next AZ.
And to Ahmed's point in his answer, this information can be used in the spot_fleet_request instead of looping through the AZs. If you pass the wrong AZ or subnet in the spot fleet request, it may pass the dry-run API call but still fail the real call. Just a heads up on that if you are using the DryRun parameter.
Here's the output of the code that follows:
In [740]: df_spot_instance_options
Out[740]:
AvailabilityZone InstanceType SpotPrice MemSize vCPUs CurrentGeneration Processor
0 us-east-1d t3.nano 0.002 512 2 True [x86_64]
1 us-east-1b t3.nano 0.002 512 2 True [x86_64]
2 us-east-1a t3.nano 0.002 512 2 True [x86_64]
3 us-east-1c t3.nano 0.002 512 2 True [x86_64]
4 us-east-1d t3a.nano 0.002 512 2 True [x86_64]
.. ... ... ... ... ... ... ...
995 us-east-1a p2.16xlarge 4.320 749568 64 True [x86_64]
996 us-east-1b p2.16xlarge 4.320 749568 64 True [x86_64]
997 us-east-1c p2.16xlarge 4.320 749568 64 True [x86_64]
998 us-east-1d p2.16xlarge 14.400 749568 64 True [x86_64]
999 us-east-1c p3dn.24xlarge 9.540 786432 96 True [x86_64]
[1000 rows x 7 columns]
And here's the code:
import boto3
import pandas as pd
from datetime import datetime, timedelta

ec2c = boto3.client('ec2')
ec2r = boto3.resource('ec2')

#### The rest of this code maps the instance details to spot price in case you are looking for certain memory or cpu
paginator = ec2c.get_paginator('describe_instance_types')
response_iterator = paginator.paginate( )
df_hold_list = []
for page in response_iterator:
df_hold_list.append(pd.DataFrame(page['InstanceTypes']))
df_instance_specs = pd.concat(df_hold_list, axis=0).reset_index(drop=True)
df_instance_specs['Spot'] = df_instance_specs['SupportedUsageClasses'].apply(lambda x: 1 if 'spot' in x else 0)
df_instance_spot_specs = df_instance_specs.loc[df_instance_specs['Spot']==1].reset_index(drop=True)
# unpack the memory and cpu dictionaries
df_instance_spot_specs['MemSize'] = df_instance_spot_specs['MemoryInfo'].apply(lambda x: x.get('SizeInMiB'))
df_instance_spot_specs['vCPUs'] = df_instance_spot_specs['VCpuInfo'].apply(lambda x: x.get('DefaultVCpus'))
df_instance_spot_specs['Processor'] = df_instance_spot_specs['ProcessorInfo'].apply(lambda x: x.get('SupportedArchitectures'))
# collect every spot-capable instance type
instance_list = df_instance_spot_specs['InstanceType'].unique().tolist()
#---------------------------------------------------------------------------------------------------------------------
# You can use this section by itself to get the instance type and availability zone and loop through the instances you want
# just modify instance_list with the one instance you want information for
#look only in us-east-1
client = boto3.client('ec2', region_name='us-east-1')
prices = client.describe_spot_price_history(
InstanceTypes=instance_list,
ProductDescriptions=['Linux/UNIX', 'Linux/UNIX (Amazon VPC)'],
StartTime=(datetime.now() -
timedelta(hours=1)).isoformat(),
# AvailabilityZone='us-east-1a'
MaxResults=1000)
df_spot_prices = pd.DataFrame(prices['SpotPriceHistory'])
df_spot_prices['SpotPrice'] = df_spot_prices['SpotPrice'].astype('float')
df_spot_prices.sort_values('SpotPrice', inplace=True)
#---------------------------------------------------------------------------------------------------------------------
# merge memory size and cpu information into this dataframe
df_spot_instance_options = df_spot_prices[['AvailabilityZone', 'InstanceType', 'SpotPrice']].merge(df_instance_spot_specs[['InstanceType', 'MemSize', 'vCPUs',
'CurrentGeneration', 'Processor']], left_on='InstanceType', right_on='InstanceType')

Dynamically distribute EC2s in available Subnets via Terraform [duplicate]

This question already has an answer here:
Terraform list element out of bounds?
(1 answer)
Closed 3 years ago.
The requirement is to create EC2s from the dynamically given list instance_names and distribute them evenly across the available subnets of the VPC.
I have tried looping and conditional statements with little luck.
Use Case 01 - (In a VPC with two subnets) If we are creating 2 servers, one EC2 should be in subnet 'a' and the other in subnet 'b'.
Use Case 02 - (In a VPC with two subnets) If we are creating 3 servers, two EC2s need to be in subnet 'a' and the other EC2 in subnet 'b'.
Control Code
module "servers" {
source = "modules/aws-ec2"
instance_type = "t2.micro"
instance_names = ["server01", "server02", "server03"]
subnet_ids = module.prod.private_subnets
}
Module
resource "aws_instance" "instance" {
count = length(var.instance_names)
subnet_id = var.subnet_ids[count.index]
tags = {
Name = var.instance_names[count.index]
}
}
You can use element to loop around the subnet_ids list and get the correct id for each aws_instance.
In the docs you can see that element will give you the desired effect because:
If the given index is greater than the length of the list then the index is "wrapped around" by taking the index modulo the length of the list
Use Case 1
-> server01 - subnet 'a' <-> element(subnet_ids,0)
-> server02 - subnet 'b' <-> element(subnet_ids,1)
Use Case 2
-> server01 | subnet 'a' <-> element(subnet_ids,0)
-> server02 | subnet 'b' <-> element(subnet_ids,1)
# wrap around the subnet id list back to the first id
-> server03 | subnet 'a' <-> element(subnet_ids,2)
-> server04 | subnet 'b' <-> element(subnet_ids,3)
-> etc.
So the following update to the code should work:
resource "aws_instance" "instance" {
count = length(var.instance_names)
subnet_id = element(var.subnet_ids, count.index)
tags = {
Name = var.instance_names[count.index]
}
}
I found an interesting answer from Tom Lime to a similar question and derived an answer from it for this scenario. In the module, provide the following logic for subnet_id:
subnet_id = "${var.subnet_ids[ count.index % length(var.subnet_ids) ]}"

AWS Powershell - Get-CWMetricStatistics for EBS Read IOPS

I'm having some issues getting IOPS stats for EBS volumes, using this code:
Get-CWMetricList -Namespace AWS/EC2 | Select-Object * -Unique
Get-CWMetricList -Namespace AWS/EBS | Select-Object * -Unique
$StartDate = (Get-Date).AddDays(-3)
$EndDate = Get-Date
$ReadIOPS = Get-CWMetricStatistics -Namespace "AWS/EC2" -MetricName "DiskReadOps" -UtcStartTime $StartDate -UtcEndTime $EndDate -Period 300 -Statistics @("Average")
$ReadIOPS.Datapoints.Count
$ReadIOPS = Get-CWMetricStatistics -Namespace "AWS/EBS" -MetricName "VolumeReadOps" -UtcStartTime $StartDate -UtcEndTime $EndDate -Period 300 -Statistics @("Average")
$ReadIOPS.Datapoints.Count
The top 2 lines show that the namespace/metric names are correct. The rest should show that the first query, in the AWS/EC2 namespace, gets data, while the second, in the AWS/EBS namespace, doesn't.
The ultimate goal is to add a -Dimension parameter and grab all read/write IOPS for a particular volume. This is why the AWS/EC2 namespace doesn't work, as I need to specify a volume ID and not an instance ID.
Any ideas why I'm not picking up any datapoints on the latter query?
It turns out that EBS stats require a volume ID to be specified, although this is not called out and no error is raised when it is missing.
I had stripped out the dimension to cast as wide a net as possible and get back to basics while troubleshooting. Adding it back in fixed the issue:
i.e. this works:
$Volume = 'vol-blah'
$dimension1 = New-Object Amazon.CloudWatch.Model.Dimension
$dimension1.set_Name("VolumeId")
$dimension1.set_Value($Volume)
$ReadIOPS = Get-CWMetricStatistics -Namespace "AWS/EBS" -MetricName "VolumeReadOps" -UtcStartTime $StartDate -UtcEndTime $EndDate -Period 300 -Statistics @("Average") -Dimension $dimension1
$ReadIOPS.Datapoints.Count