I'm having an issue with AWS. I deployed an application using Terraform, but when I try to destroy it, the process doesn't finish because of a subnet. That subnet was associated with an EC2 instance that no longer exists.
If I try to remove it via the AWS console, it says there is a network interface using that subnet. OK, but when I try to remove the network interface, it says it is in use, even though the only thing that could be using it, the EC2 instance, was terminated. Would you know how I can get rid of this network interface?
Thanks in advance!
I tried to remove the components individually in the AWS console, without success.
I think I figured out what happened. When I first ran terraform apply, I had set up two availability zones. But then I decided to use just one availability zone, because I only wanted to run one instance of the application. The catch is that an ALB requires at least two availability zones (and it doesn't make much sense to put a load balancer in front of a single app instance anyway). When I ran terraform apply with the new configuration, it applied the change only partially, leaving the ALB behind, and the ALB's network interface was what kept the subnet in use.
After removing the ELB from the Terraform configuration, everything worked fine!
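For anyone who hits the same symptom, a quick way to see what is still holding the subnet is to list its network interfaces and look at who owns each attachment. A boto3 sketch (the subnet ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# List every network interface still sitting in the stuck subnet
# (subnet ID below is a placeholder) and show who owns each attachment.
resp = ec2.describe_network_interfaces(
    Filters=[{"Name": "subnet-id", "Values": ["subnet-0123456789abcdef0"]}]
)
for eni in resp["NetworkInterfaces"]:
    attachment = eni.get("Attachment", {})
    print(
        eni["NetworkInterfaceId"],
        eni["Status"],
        eni.get("Description", ""),              # an ELB announces itself here
        attachment.get("InstanceOwnerId", "unattached"),
    )
```

An interface held by a load balancer typically shows a Description starting with "ELB" and an attachment owner of amazon-elb, which is the giveaway that Terraform state and reality have diverged.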
I recently took over an architecture from a 3rd party to help a client. I'm new to AWS, so this is probably simple, and I just couldn't find it in the docs or on Stack Overflow. They had an existing EC2 instance with both a Node app and a React app deployed, from different repos. Each was deployed using its own pipeline. The source, build, and deploy steps were working for both, and I verified the artifacts were being generated and stored in S3. The load balancer had a target group that hit a single machine in one subnet. The app was running just fine until this morning, and I'm trying to figure out if it's something I did.
My goal this morning was to spin up a new EC2 instance (for which I have the keys, so I can connect directly), a new load balancer that pointed to my machine, and space in S3 for new pipelines I created to store artifacts. I created an AMI from their EC2 instance with the running app and used it to provision my own on the same subnet as their instance. I used the existing security group for my machine. I created a target group to target my machine for use with my load balancer. I created a load balancer to route traffic to this new machine. I then created two pipelines, similar to theirs, but with different artifact locations in S3, and a source of my own repo where I have a copy of the code. I got deployments through the pipeline to work. Everything was great until I was about to test my system, when I was informed their app was down.
I tried hitting it and got a 502 Bad Gateway. I checked the load balancer: it sees traffic coming in, but returns a 502 for every response. I checked the target group, and it now shows their EC2 instance as unhealthy. I tried rebooting the machine, but it's still unhealthy. Then I tried creating another copy of their machine in another subnet and ensured it was targeted by the target group, but the new instance showed up as unhealthy as well. I can't SSH into the machine because I don't have the key used to create the EC2 instance. If anyone knows where I should look to bring it back online, I'd be forever in your debt.
I undid everything I created this morning, stopping my EC2 instance and deleting my load balancer, but their app is still returning a 502 and their target group still shows the instance as unhealthy.
These are some things to help you debug:
You first need to access the EC2 instance directly, not through the load balancer, and check that the application is running. If the instance is in a private subnet, you can start another EC2 instance with a public IP and use it as a bastion host.
You will need SSH access to the EC2 machine at some point so that you can look at the logs. This question has answers on how to replace the key pair.
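In the meantime, one thing you can check without SSH access is why the target group considers the instance unhealthy. A boto3 sketch (the target group ARN below is a placeholder):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Ask the target group why it considers each target unhealthy.
# The target group ARN below is a placeholder.
health = elbv2.describe_target_health(
    TargetGroupArn=(
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/their-app/0123456789abcdef"
    )
)
for desc in health["TargetHealthDescriptions"]:
    state = desc["TargetHealth"]
    print(
        desc["Target"]["Id"],
        state["State"],
        state.get("Reason", ""),        # e.g. Target.Timeout
        state.get("Description", ""),
    )
```

As a rough guide: a reason of Target.Timeout usually points at the security group or health check port, while Target.ResponseCodeMismatch means the app is up but answering with the wrong status code.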
I have an Elastic Beanstalk server behind an Application Load Balancer, all inside a VPC. The first call to the server after leaving it alone for a while takes a very long time. It's almost as if the instance is being booted right then, instead of already being on...
This issue does not present locally, nor outside of a VPC; it only happens in the VPC on AWS, so something in my configuration must be off.
The VPC has 3 public and 3 private subnets in the same availability zones, and the public subnets all have auto-assign public IP turned on.
I've assigned these in the network settings of my Elastic Beanstalk environment, attaching the public subnets to the public load balancer and the private subnets to the private instances.
I've set the auto-scaling, load-balanced group to a minimum of 3 instances, and confirmed they're running.
Despite this, after leaving the site alone for a while, the first new call to the server consistently takes over one minute; everything after that works great. I assume I'm just missing something small but cannot figure out what it is...
Thanks in advance!
I am convinced this is not an application issue because, on first load, the call takes over one minute, but on subsequent loads it's near-instant, and this behavior is consistent across days. Locally, I never have this issue. Outside a VPC, I never have this issue.
[screenshot: first/slow load (after leaving the app alone overnight)]
[screenshot: second/fast load (refreshing right after the above)]
UPDATE
AWS support suggested I disassociate the subnets from my route tables. I did that, and now all subnets, public and private, show the Main route table as their current route table. Now, though, instead of taking a long time, all calls to my server are failing!
I tried attaching the internet gateway in that VPC to the route table via an edge association, but I'm getting this error:
Route table contains unsupported route destination. The unsupported route destination is less specific than or non-overlapping with VPC local CIDR
There is one public subnet with a CIDR overlapping the one on the internet gateway route (10.1.0.0/24 on the subnet and 10.1.0.0/24 on the gateway). I tried manually associating that subnet with the Main route table, but I still get the same error.
I think this is similar to another question. Assuming you are using Auto Scaling, most likely you will want to add a lifecycle hook. You can do that in Elastic Beanstalk as shown here, by editing the .ebextensions/as-hook.config file; you will use the EC2_INSTANCE_LAUNCHING transition. In your hook, make a request similar to what your application serves, which ensures that the instance is ready to start serving traffic.
If you still have issues, you could use further debugging techniques such as X-Ray, but you would then need to run the agent and configure the SDK for your app, which is a fair amount of work, so it may not be worth the effort. If it is simply the first request that takes a long time, hooks are the way to go.
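To illustrate the "make a request" part of the hook, here is a sketch of a warm-up script you could run on the instance at boot. It assumes a lifecycle hook named "warm-up" (hypothetical) has already been created on the Auto Scaling group for the autoscaling:EC2_INSTANCE_LAUNCHING transition, and that your app exposes a /health endpoint (also an assumption):

```python
import time
import urllib.request

import boto3

APP_HEALTH_URL = "http://localhost/health"   # assumption: your app has one
ASG_NAME = "my-eb-asg"                       # placeholder
HOOK_NAME = "warm-up"                        # placeholder, created beforehand

def wait_until_warm(timeout=300):
    """Poll the app locally until it answers 200, or give up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if urllib.request.urlopen(APP_HEALTH_URL, timeout=5).status == 200:
                return True
        except OSError:
            pass  # app not up yet
        time.sleep(5)
    return False

# IMDSv1 shown for brevity; IMDSv2-only instances need the token flow.
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=5
).read().decode()

# Release the lifecycle hook: CONTINUE puts the instance in service,
# ABANDON terminates it and lets the group try again with a fresh one.
boto3.client("autoscaling").complete_lifecycle_action(
    LifecycleHookName=HOOK_NAME,
    AutoScalingGroupName=ASG_NAME,
    InstanceId=instance_id,
    LifecycleActionResult="CONTINUE" if wait_until_warm() else "ABANDON",
)
```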
I'm using a CloudFormation stack that deploys 3 EC2 VMs. Each needs to be configured to be able to discover the other 2, either via IP or hostname, doesn't matter.
Amazon's private internal DNS seems very unhelpful, because it's based on the IP address, which can't be known at provisioning time. As a result, I can't configure the nodes with just what I know at CloudFormation stack time.
As far as I can tell, I have a couple of options. All of them seem to me more complex than necessary - are there other options?
1. Use Route53: set up a private DNS hosted zone, make an entry for each of the VMs attached to its network interface, and then, by naming the entries, I'd know ahead of time the private DNS name I assigned to each.
2. Stand up yet another service to have the 3 VMs "phone home" once initialized, which could then report back to them who is ready.
3. Come up with some other VM-based shell magic and do something goofy like using nmap to scan the local subnet for machines alive on a certain port.
On other clouds I've used (like GCP), when you provision a VM it gets an internal DNS name based on its resource name in the deploy template, which makes this kind of problem trivial. Boy, I wish I had that here.
What's the best approach here? (1) seems straightforward but requires people using my stack to have extra permissions they don't really need. (2) means extra resource usage that's kind of wasted. (3) seems... well, goofy.
Use Route53, set up a private DNS hosted zone, make an entry for each of the VMs which is attached to their network interface, and then by naming the entries
This is the best solution, but there's a simpler implementation.
Give each of your machines a "resource name".
In the CloudFormation stack, create an AWS::Route53::RecordSet resource that associates a hostname based on that "resource name" with the EC2 instance, referencing the instance via its logical ID.
Inside your application, use the resource-name-based hostname to access the other instance(s).
An alternative may be to use an Application Load Balancer, with your application instances in separate target groups. The various EC2 instances then send all traffic through the ALB, so you only have one reference to propagate (and it can be stored in the UserData for each EC2 instance). But that's a lot more work.
This assumes that you already have the private hosted zone set up.
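To make the RecordSet step concrete, here is a minimal sketch of such a template, built as a Python dict and handed to boto3. The AMI, subnet, hosted zone ID, and the internal.example.com zone name are all placeholders. Because the record name is fixed at authoring time, every node can be told its peers' hostnames before the stack ever runs:

```python
import json

import boto3

# All IDs below (AMI, subnet, hosted zone) are placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "Node1": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-0123456789abcdef0",
                "InstanceType": "t3.micro",
                "SubnetId": "subnet-0123456789abcdef0",
            },
        },
        # The hostname is known up front ("node1.internal.example.com").
        # Repeat Node2/Node3 and their records the same way.
        "Node1Record": {
            "Type": "AWS::Route53::RecordSet",
            "Properties": {
                "HostedZoneId": "Z0123456789EXAMPLE",
                "Name": "node1.internal.example.com.",
                "Type": "A",
                "TTL": "60",
                "ResourceRecords": [{"Fn::GetAtt": ["Node1", "PrivateIp"]}],
            },
        },
    },
}

boto3.client("cloudformation").create_stack(
    StackName="three-node-cluster",
    TemplateBody=json.dumps(template),
)
```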
I think what you are talking about is known as service discovery.
If you deploy the EC2 instances in the same subnet of the same VPC, with a security group that allows the port they want to communicate over, they will be "discoverable" to each other.
You can then take this a step further: if autoscaling is on the group and machines die and respawn, they can write their IPs into a registry (e.g. DynamoDB) so that other machines know where to find them, as in the sketch below.
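A minimal sketch of that registry idea, assuming a DynamoDB table named cluster-members (hypothetical) with a string partition key node_id. Run at boot, it records the instance's private IP where peers can look it up:

```python
import urllib.request

import boto3

# Fetch this instance's identity from the instance metadata service (IMDSv2).
token_req = urllib.request.Request(
    "http://169.254.169.254/latest/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
)
token = urllib.request.urlopen(token_req).read().decode()

def meta(path):
    req = urllib.request.Request(
        f"http://169.254.169.254/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    return urllib.request.urlopen(req).read().decode()

# Register ourselves so peers (and replacements after a scale event) can
# look us up. Table name and key are assumptions.
boto3.resource("dynamodb").Table("cluster-members").put_item(
    Item={"node_id": meta("instance-id"), "private_ip": meta("local-ipv4")}
)
```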
After CodeDeploy clones the Auto Scaling group, it leaves the load balancer field empty. This leads to the following problem: when an instance's web server dies, the ELB has no way to mark the instance as down, so the instance is not replaced automatically.
However, if I set the load balancer manually, it works fine afterwards.
I watched how the new ASG is cloned. There is an option to suspend some processes while an instance is booting, so as I understand it, CodeDeploy suspends all actions related to the ELB because it uses its own scripts to detach old instances from the ELB and attach new ones.
[screenshot: fresh ASG]
I don't use any custom attach or detach scripts myself.
Otherwise the deployment runs OK, and new instances are created correctly.
I spoke with the team, and apparently what's going on is that CodeDeploy now manages the load balancer for you. It is very confusing for customers not to see the ELB associated with that Auto Scaling group. This allows CodeDeploy to control the process and make sure that the deployment finishes before binding the instance to the load balancer.
-Asaf
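In the meantime, if you'd rather have the group itself use ELB health checks, re-attaching by hand is a one-off call. A boto3 sketch (the ELB and group names are placeholders, and this assumes a Classic ELB):

```python
import boto3

# Placeholder names: substitute your own ASG and Classic ELB.
asg = boto3.client("autoscaling")
asg.attach_load_balancers(
    AutoScalingGroupName="my-deploy-group",
    LoadBalancerNames=["my-elb"],
)
asg.update_auto_scaling_group(
    AutoScalingGroupName="my-deploy-group",
    HealthCheckType="ELB",          # replace instances the ELB reports unhealthy
    HealthCheckGracePeriod=300,     # give new instances time to boot first
)
```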
I have an Amazon EC2 instance with Auto Scaling and a load balancer.
I deployed an application and configured Apache.
Everything went fine, but then Amazon for some reason terminated my instance and started a new one, and I lost all the code and configuration there.
What should I do?
Maybe attach an EBS volume and deploy everything there? But my Apache server is installed on the root volume.
Can anyone help me?
If you are using autoscaling, instances will be terminated if they become unhealthy. To use autoscaling effectively, you should not keep any persistent data on the instance itself; this is called a shared-nothing architecture.
What you want to do is create an AMI that contains your application, or the tools to bootstrap it, and use that AMI in the launch configuration for your Auto Scaling group. Then, if a new instance gets launched, whether due to a failure or a need to scale, your application comes back up without any interaction from you, as in the sketch below.
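A rough sketch of that flow with boto3 (all IDs and names below are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")
asg = boto3.client("autoscaling")

# 1. Bake an AMI from the instance you configured by hand.
image = ec2.create_image(InstanceId="i-0123456789abcdef0", Name="app-v1")
ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])

# 2. Create a launch configuration that uses the new AMI.
asg.create_launch_configuration(
    LaunchConfigurationName="app-v1",
    ImageId=image["ImageId"],
    InstanceType="t3.micro",
)

# 3. Point the Auto Scaling group at it; future launches use the baked AMI.
asg.update_auto_scaling_group(
    AutoScalingGroupName="app-asg",
    LaunchConfigurationName="app-v1",
)
```

Re-bake the AMI (or move the setup into user data) whenever the application or Apache configuration changes, so replacement instances never lag behind.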