Redis Autoscaling on EC2 - amazon-web-services

I need help designing a network that meets the following requirements:
Core network resources are duplicated in at least 2 regions
Network traffic is routed to the appropriate region based on user location
Subnetworks are appropriately sized and secured
All devices exist such that networks can be connected to the internet and to each other
Networks are tolerant of internet events and are designed to be highly available
The design should include every device that processes information in this architecture, even devices I do not directly control (routes, firewalls, NAT gateways, internet gateways, etc.).
I also need help automating the creation and teardown of this service. I can use any tool like Terraform, CloudFormation, Ansible, or Chef cookbooks to deploy, as long as it is expressed in code and/or configuration.
Assume the following:
The VPC and subnets already exist
Only local network access is required for all resources, and the architecture needs to scale up and down using an Auto Scaling group with a launch configuration when a load threshold is met, with alerts sent through CloudWatch/SNS (see the sketch below).
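For the scaling requirement, here is a minimal sketch of what the EC2 side could look like; the AMI variable, instance type, subnet list and sizes are placeholder assumptions, and the SNS topic is the one declared further down. A matching scale-down policy and low-CPU alarm would mirror the scale-up pair.

resource "aws_launch_configuration" "app" {
  name_prefix   = "app-"        # placeholder name
  image_id      = var.ami_id    # assumed variable
  instance_type = "t3.micro"    # example instance type

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  name                 = "app-asg"
  launch_configuration = aws_launch_configuration.app.name
  vpc_zone_identifier  = var.subnet_ids # the existing subnets, as assumed
  min_size             = 1
  max_size             = 4
  desired_capacity     = 1
}

resource "aws_autoscaling_policy" "scale_up" {
  name                   = "scale-up"
  autoscaling_group_name = aws_autoscaling_group.app.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 300
}

# Alarm that both triggers the scale-up policy and notifies the SNS topic
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "app-high-cpu"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 75
  evaluation_periods  = 2
  period              = 120
  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.app.name
  }
  alarm_actions = [
    aws_autoscaling_policy.scale_up.arn,
    aws_sns_topic.global.arn,
  ]
}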

Does the following Terraform automation code help? This is what I am going to try. Please let me know your input.
Usage
resource "aws_sns_topic" "global" {
...
}
resource "aws_elasticache_subnet_group" "redis" {
...
}
resource "aws_elasticache_parameter_group" "redis" {
...
}
module "cache" {
source = "github.com/nazeerahamed79/terraform-aws-redis-elasticache"
vpc_id = "vpc-20f74844"
cache_identifier = "cache"
automatic_failover_enabled = "false"
desired_clusters = "1"
instance_type = "cache.t2.micro"
engine_version = "3.2.4"
parameter_group = "${aws_elasticache_parameter_group.redis.name}"
subnet_group = "${aws_elasticache_subnet_group.redis.name}"
maintenance_window = "sun:02:30-sun:03:30"
notification_topic_arn = "${aws_sns_topic.global.arn}"
alarm_cpu_threshold = "75"
alarm_memory_threshold = "10000000"
alarm_actions = ["${aws_sns_topic.global.arn}"]
project = "Redis_deployment"
environment = "Redis_deployment"
}
Variables
vpc_id - ID of VPC meant to house the cache
project - Name of the project making use of the cluster (default: Redis_deployment)
environment - Name of environment the cluster is targeted for (default: Unknown)
cache_identifier - Name used as ElastiCache cluster ID
automatic_failover_enabled - Flag to determine if automatic failover should be enabled
desired_clusters - Number of cache clusters in replication group
instance_type - Instance type for cache instance (default: cache.t2.micro)
engine_version - Cache engine version (default: 3.2.4)
parameter_group - Cache parameter group name (default: redis3.2)
subnet_group - Cache subnet group name
maintenance_window - Time window to reserve for maintenance
notification_topic_arn - ARN to notify when cache events occur
alarm_cpu_threshold - CPU alarm threshold as a percentage (default: 75)
alarm_memory_threshold - Free memory alarm threshold in bytes (default: 10000000)
alarm_actions - ARNs to notify via CloudWatch when alarm thresholds are triggered (see the alarm sketch after the Outputs section)
Outputs
id - The replication group ID
cache_security_group_id - Security group ID of the cache cluster
port - Port of replication group leader
endpoint - Public DNS name of replication group leader
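The alarm_cpu_threshold and alarm_memory_threshold variables above presumably translate into CloudWatch alarms on the ElastiCache metrics. As a point of reference, a free-memory alarm could look roughly like the sketch below; the alarm name and CacheClusterId are illustrative assumptions, not the module's actual internals, and a CPU alarm on the CPUUtilization metric would be analogous.

resource "aws_cloudwatch_metric_alarm" "cache_memory" {
  alarm_name          = "cache-memory-low"
  namespace           = "AWS/ElastiCache"
  metric_name         = "FreeableMemory"
  statistic           = "Average"
  comparison_operator = "LessThanThreshold"
  threshold           = 10000000 # alarm_memory_threshold, in bytes
  evaluation_periods  = 1
  period              = 300
  dimensions = {
    CacheClusterId = "cache-001" # placeholder cluster member ID
  }
  alarm_actions = [aws_sns_topic.global.arn]
}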

Related

Create new security group for Redshift and apply using Terraform

I'm quite new to Terraform, and struggling with something.
I'm playing around with Redshift for a personal project, and I want to update the inbound security rules for the default security group which is applied to Redshift when it's created.
If I were doing it in the AWS Console, I'd add a new inbound rule with Type set to All Traffic and Source set to Anywhere-IPv4, which adds 0.0.0.0/0.
Below in main.tf I've tried to create a new security group and apply that to Redshift, but I get a VPC-by-Default customers cannot use cluster security groups error.
What is it I'm doing wrong?
resource "aws_redshift_cluster" "redshift" {
cluster_identifier = "redshift-cluster-pipeline"
skip_final_snapshot = true terraform destroy
master_username = "awsuser"
master_password = var.db_password
node_type = "dc2.large"
cluster_type = "single-node"
publicly_accessible = "true"
iam_roles = [aws_iam_role.redshift_role.arn]
cluster_security_groups = [aws_redshift_security_group.redshift-sg.name]
}
resource "aws_redshift_security_group" "redshift-sg" {
name = "redshift-sg"
ingress {
cidr = "0.0.0.0/0"
}
The documentation for the Terraform resource aws_redshift_security_group states:
Creates a new Amazon Redshift security group. You use security groups
to control access to non-VPC clusters
The error message you are receiving is clearly stating that you are using the wrong type of security group: you need to use a VPC security group instead. Once you create the appropriate VPC security group, set it on the aws_redshift_cluster resource via the vpc_security_group_ids property.
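A minimal sketch of that approach, assuming the cluster lives in a VPC you can reference (the security group name and the data source are illustrative):

resource "aws_security_group" "redshift_sg" {
  name   = "redshift-sg"
  vpc_id = data.aws_vpc.default.id # assumed lookup of the target VPC

  # Open to the world, as in the question; restrict this in real use
  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_redshift_cluster" "redshift" {
  cluster_identifier     = "redshift-cluster-pipeline"
  skip_final_snapshot    = true
  master_username        = "awsuser"
  master_password        = var.db_password
  node_type              = "dc2.large"
  cluster_type           = "single-node"
  publicly_accessible    = true
  iam_roles              = [aws_iam_role.redshift_role.arn]
  vpc_security_group_ids = [aws_security_group.redshift_sg.id] # VPC security group instead of a cluster security group
}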

Exposing an ECS Service to the net

I have created an ECS cluster and a number of services, but I want one of the services to be accessible from the outside world. That service will then interact with the other services.
Created an ECS cluster
Created services
Created the apps and loaded them into Docker containers
I updated the security group to allow outside access
But under network interfaces in my console I can't find any reference to the security group I created, even though the security groups themselves were created and are there.
resource "aws_ecs_service" "my_service" {
name = "my_service"
cluster = aws_ecs_cluster.fetcher_service.id
task_definition = "${aws_ecs_task_definition.my_service.family}:${max(aws_ecs_task_definition.my_service.revision, data.aws_ecs_task_definition.my_service.revision)}"
desired_count = 0
network_configuration {
subnets = var.vpc_subnet_ids
security_groups = var.zuul_my_group_ids
assign_public_ip = true
}
}
Am I missing any steps?
If the desired count is set to 0, no containers will be spun up in the first place and no network interfaces will be allocated. Maybe that's the issue.
Set the desired count to something larger than zero to test this.
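For example, a minimal change to the service above (everything else unchanged):

resource "aws_ecs_service" "my_service" {
  # ... same configuration as above ...
  desired_count = 1 # at least one task, so an ENI is created and the security group appears on it
}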
Thanks to LRutten's answer, I set the desired count to 1, and under network interfaces I now see an interface associated with my security group for that ECS service.

Adding new AWS EBS Volume to ASG in same AZ

OK, so I am trying to attach an EBS volume, which I have created using Terraform, to an ASG's instance using userdata. The issue is that the two end up in different AZs, so the attach fails. Below are the steps I am trying that fail:
resource "aws_ebs_volume" "this" {
for_each = var.ebs_block_device
size = lookup(each.value,"volume_size", null)
type = lookup(each.value,"volume_type", null)
iops = lookup(each.value, "iops", null)
encrypted = lookup(each.value, "volume_encrypt", null)
kms_key_id = lookup(each.value, "kms_key_id", null)
availability_zone = join(",",random_shuffle.az.result)
}
In the above resource, I am using the random provider to pick one AZ from a list of AZs, and the same list is provided to the ASG resource below:
resource "aws_autoscaling_group" "this" {
desired_capacity = var.desired_capacity
launch_configuration = aws_launch_configuration.this.id
max_size = var.max_size
min_size = var.min_size
name = var.name
vpc_zone_identifier = var.subnet_ids // <------ HERE
health_check_grace_period = var.health_check_grace_period
load_balancers = var.load_balancer_names
target_group_arns = var.target_group_arns
tag {
key = "Name"
value = var.name
propagate_at_launch = true
}
}
And here is the userdata I am using:
# IMDSv2 token, then the ID of the instance running this userdata
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
instanceId=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 attach-volume --volume-id ${ebs_volume_id} --instance-id $instanceId --device /dev/nvme1n1
The above will attach the newly created volume, as I am passing the ${ebs_volume_id} output of the resource above.
But it is failing because the instance and the volume are in different AZs.
Can anyone suggest a better solution than hardcoding the AZ on both the ASG and the volume?
I'd have to understand more about what you're trying to do to solve this with just the AWS provider and Terraform. And honestly, most ideas are going to be a bit complex.
You could have an ASG per AZ. Otherwise, the ASG is going to pick some AZ at each launch, and you'll end up with more instances in one AZ than you have volumes, and volumes in other AZs with no instances to attach to.
So you could create a number of volumes per AZ and an ASG per AZ. Then the userdata should list all the volumes in the instance's AZ that are not attached to an instance, pick the ID of the first unattached volume, and attach it (a sketch of such userdata follows below). If all of them are attached, you should trigger your alerting, because you have more instances than volumes.
Any attempt to do this with a single ASG is really an attempt at writing your own ASG, but done in a way that fights with your actual ASG.
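A rough sketch of that userdata, assuming the instance profile allows ec2:DescribeVolumes and ec2:AttachVolume, the AWS CLI is installed, and the device name is a placeholder:

#!/bin/bash
set -euo pipefail

# IMDSv2 token and instance metadata
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/availability-zone)
export AWS_DEFAULT_REGION="${AZ%?}" # region = AZ minus its trailing letter

# First unattached ("available") volume in this instance's AZ
VOLUME_ID=$(aws ec2 describe-volumes \
  --filters "Name=availability-zone,Values=$AZ" "Name=status,Values=available" \
  --query 'Volumes[0].VolumeId' --output text)

if [ "$VOLUME_ID" = "None" ]; then
  echo "No unattached volume left in $AZ - more instances than volumes" >&2
  # trigger your alerting here, e.g. publish to an SNS topic
  exit 1
fi

aws ec2 attach-volume --volume-id "$VOLUME_ID" --instance-id "$INSTANCE_ID" --device /dev/sdf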
But there is a company that offers managing this as a service. They also help you run the instances as spot instances to save cost: https://spot.io/
Their elastigroup resource is an ASG managed by them, so you won't have an AWS ASG anymore, but it has some interesting stateful configurations.
From their documentation: instance persistence is supported via the following configurations; all values are boolean. For more information on instance persistence, see: Stateful configuration
persist_root_device - (Optional) Boolean, should the instance maintain its root device volumes.
persist_block_devices - (Optional) Boolean, should the instance maintain its Data volumes.
persist_private_ip - (Optional) Boolean, should the instance maintain its private IP.
block_devices_mode - (Optional) String, determine the way we attach the data volumes to the data devices, possible values: "reattach" and "onLaunch" (default is onLaunch).
private_ips - (Optional) List of Private IPs to associate to the group instances.(e.g. "172.1.1.0"). Please note: This setting will only apply if persistence.persist_private_ip is set to true
stateful_deallocation {
  should_delete_images             = false
  should_delete_network_interfaces = false
  should_delete_volumes            = false
  should_delete_snapshots          = false
}
This allows you to have an autoscaler that preserves volumes and handles the complexities for you.

Terraform - how to use the same load balancer between multiple AWS ecs_service resources?

I had a question about creating a service on AWS ECS using Terraform, and would appreciate any and all feedback, especially since I'm an AWS newbie.
I have several services in the same cluster (each service is a machine learning model). The traffic isn't that high, so I would like the same load balancer to route requests to the different services (based on a request header which specifies the model to use).
I was trying to create the services using Terraform (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_service) but I'm having a hard time understanding the load_balancer configuration. There is no option to choose the ARN or ID of a specific load balancer, which makes me think that a separate Load Balancer is created for each service - and that sounds expensive :)
Has anyone had any experience with this, who can tell me what is wrong with my reasoning?
Thanks a lot for reading!
Fred, the answer is in the link to the documentation you've posted; let me walk you through it.
Here is how two ECS services can use one Application Load Balancer: each service gets its own target group and a routing rule on the shared load balancer.
The scenario below describes the configuration for one of the services; it is analogous for the second one, and the only thing you wouldn't need to repeat is the Load Balancer declaration.
You can define the following:
# First let's define the Application LB
resource "aws_lb" "unique" {
  name               = "unique-lb"
  internal           = false
  load_balancer_type = "application"
  ... # the rest of the config goes here
}

# Now let's create the target group for service one
resource "aws_lb_target_group" "serviceonetg" {
  name     = "tg-for-service-one"
  port     = 8080 # example value
  protocol = "HTTP"
  ... # the rest of the config goes here
}
# Now create the link between the LB and the Target Group.
# The listener holds the default action; a separate listener rule
# forwards traffic for the HTTP path /serviceone to the target group.
resource "aws_lb_listener" "alb_listener" {
  load_balancer_arn = aws_lb.unique.arn # Here is the LB ARN
  port              = 80
  protocol          = "HTTP"

  default_action {
    target_group_arn = aws_lb_target_group.serviceonetg.arn # Here is the TG ARN
    type             = "forward"
  }
}

resource "aws_lb_listener_rule" "serviceone" {
  listener_arn = aws_lb_listener.alb_listener.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.serviceonetg.arn
  }

  condition {
    path_pattern {
      values = ["/serviceone"]
    }
  }
}
# As a last step, you need to link your service with the target group.
resource "aws_ecs_service" "service_one" {
  ... # prior configuration goes here

  load_balancer {
    target_group_arn = aws_lb_target_group.serviceonetg.arn # Here you link the service with the TG
    container_name   = "myservice1"
    container_port   = 8080
  }

  ... # the rest of the config goes here
}
As a side note, I would template the repeating parts for the services using data structures, so that you can use count or for_each to describe the Target Groups, Listener Rules and Services only once and let the templating do the rest (a sketch follows below). Basically, follow the DRY principle.
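For example, a minimal for_each sketch, under the assumption that each service can be described by a name and a container port (the variable shape and names are illustrative):

variable "services" {
  # e.g. { serviceone = 8080, servicetwo = 8081 }
  type = map(number)
}

resource "aws_lb_target_group" "tg" {
  for_each = var.services

  name        = "tg-for-${each.key}"
  port        = each.value
  protocol    = "HTTP"
  target_type = "ip"       # for awsvpc / Fargate tasks
  vpc_id      = var.vpc_id # assumed variable
}

resource "aws_lb_listener_rule" "rule" {
  for_each = var.services

  listener_arn = aws_lb_listener.alb_listener.arn
  # priorities must be unique; index() gives a stable offset per key
  priority     = 100 + index(keys(var.services), each.key)

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tg[each.key].arn
  }

  condition {
    path_pattern {
      values = ["/${each.key}"]
    }
  }
}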
I hope this can help you.

Insufficient capacity in availability zone on AWS

I got the following error from AWS today.
"We currently do not have sufficient m3.large capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get m3.large capacity by not specifying an Availability Zone in your request or choosing us-east-1e, us-east-1b."
What does this mean exactly? It sounds like AWS doesn't have the physical resources to allocate me the virtual resources that I need. That seems unbelievable though.
What's the solution? Is there an easy way to change the availability zone of an instance?
Or do I need to create an AMI and restore it in a new availability zone?
This is not a new issue. You cannot change the availability zone of an existing instance. The best option is to create an AMI and relaunch the instance in a new AZ, as you have already said; you would then have everything in place. If you want to go across regions, see this: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/CopyingAMIs.html
You can also try getting reserved instances, which guarantee that the capacity is available to you all the time.
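A rough sketch of that workflow with the AWS CLI (the IDs below are placeholders):

# Create an AMI from the existing instance
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "my-server-backup"

# Launch a new instance from that AMI in an AZ that has capacity
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type m3.large \
  --placement AvailabilityZone=us-east-1b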
I fixed this error by fixing my aws_region and availability_zone values. Once I added aws_subnet_ids, the error message showed me exactly which zone my EC2 instance was being created in.
variable "availability_zone" {
default = "ap-southeast-2c"
}
variable "aws_region" {
description = "EC2 Region for the VPC"
default = "ap-southeast-2c"
}
data "aws_vpc" "default" {
default = true
}
data "aws_subnet_ids" "all" {
vpc_id = "${data.aws_vpc.default.id}"
}
resource "aws_instance" "ec2" {
....
subnet_id = "${element(data.aws_subnet_ids.all.ids, 0)}"
availability_zone = "${var.availability_zone}"
}