I'm using a CloudFormation stack that deploys 3 EC2 VMs. Each needs to be configured to be able to discover the other 2, either via IP or hostname, doesn't matter.
Amazon's private internal DNS seems very unhelpful, because it's based on the IP address, which can't be known at provisioning time. As a result, I can't configure the nodes with just what I know at CloudFormation stack time.
As far as I can tell, I have a few options, all of which seem more complex than necessary - are there others?
1. Use Route53: set up a private DNS hosted zone, make an entry for each of the VMs which is attached to their network interface, and then by naming the entries, I should know ahead of time the private DNS I assign to them.
2. Stand up yet another service to have the 3 VMs "phone home" once initialized, which could then report back to them who is ready.
3. Come up with some other VM-based shell magic, and do something goofy like using nmap to scan the local subnet for machines alive on a certain port.
On other clouds I've used (like GCP) when you provision a VM it gets an internal DNS name based on its resource name in the deploy template, which makes this kind of problem extremely trivial. Boy I wish I had that.
What's the best approach here? (1) seems straightforward, but requires people using my stack to have extra permissions they don't really need. (2) is extra resource usage that's kinda wasted. (3) Seems...well goofy.
Use Route53, set up a private DNS hosted zone, make an entry for each of the VMs which is attached to their network interface, and then by naming the entries
This is the best solution, but there's a simpler implementation.
Give each of your machines a "resource name".
In the CloudFormation stack, create an AWS::Route53::RecordSet resource that associates a hostname based on that "resource name" with the EC2 instance via its logical ID.
Inside your application, use the resource-name-based hostname to access the other instance(s).
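For reference, here is a minimal sketch of one such record, written as a shell heredoc appending to a hypothetical template.yaml. Node1 is a made-up EC2 instance logical ID, PrivateZone a made-up parameter holding the hosted zone ID, and internal.example a made-up zone name - adjust all of these to your stack.

cat >> template.yaml <<'EOF'
  Node1Record:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref PrivateZone      # pre-existing private hosted zone (see note below)
      Name: node1.internal.example.       # known before the stack is ever deployed
      Type: A
      TTL: "60"
      ResourceRecords:
        - !GetAtt Node1.PrivateIp         # Node1 = the EC2 instance's logical ID
EOF

One record per instance, and the application can always reach its peers at node1/node2/node3.internal.example regardless of which private IPs they were given.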
An alternative may be to use an Application Load Balancer, with your application instances in separate target groups. The various EC2 instances then send all traffic through the ALB, so you only have one reference that you need to propagate (and it can be stored in the UserData for the EC2 instance). But that's a lot more work.
This assumes that you already have the private hosted zone set up.
I think what you are talking about is known as service discovery.
If you deploy the EC2 instances in the same subnet in the same VPC, with the same security group allowing the port they want to communicate over, they will be "discoverable" to each other.
You can then take this a step further. If autoscaling is on the group and machines die and respawn, they can write their IPs into a registry (e.g. DynamoDB) so that other machines know where to find them.
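As a rough sketch of that self-registration (the table name "nodes" and its instance_id key are made up here, and the instance profile is assumed to allow dynamodb:PutItem), something like this could run from user data on each instance:

# Look up this instance's ID and private IP from the metadata service (IMDSv2).
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
IP=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/local-ipv4)
ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)

# Register in a (hypothetical) DynamoDB table with a string partition key "instance_id".
aws dynamodb put-item \
  --table-name nodes \
  --item "{\"instance_id\": {\"S\": \"$ID\"}, \"ip\": {\"S\": \"$IP\"}}"

Peers then only need dynamodb:Scan/Query permission on that table to discover the current members.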
Related
I'm looking for the best way to get access to a service running in container in ECS cluster "A" from another container running in ECS cluster "B".
I don't want to make any ports public.
Currently I've found a way to make it work within the same VPC - by adding the security group of cluster "B"'s instance to an inbound rule of cluster "A"'s security group; after that, services from cluster "A" are reachable from containers running in "B" via private IP address.
But that requires the security rule to be added (which is not convenient) and won't work across regions. Maybe there's a better solution that covers both cases - same VPC and region, and different VPCs and regions?
The most flexible solution for your problem is to rely on some kind of service discovery. The AWS-native options would be Route 53 Service Registry or AWS Cloud Map. The latter is newer and the one recommended in the docs. Check out these two links:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html
https://aws.amazon.com/blogs/aws/amazon-ecs-service-discovery/
You could go for open source solutions like Consul.
All this could be overkill if you just need to link two individual containers. In this case you could create a small script that could be deployed as a Lambda that queries the AWS API and retrieves the target info.
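For instance, such a script (or the Lambda) could resolve a target container's private IP with two API calls. The cluster and service names below are placeholders, and this assumes the service uses awsvpc networking:

# Sketch: find the private IP of a running task of "my-service" in cluster "A".
TASK_ARN=$(aws ecs list-tasks --cluster A --service-name my-service \
  --desired-status RUNNING --query 'taskArns[0]' --output text)

# For awsvpc tasks the ENI attachment carries the private IPv4 address.
aws ecs describe-tasks --cluster A --tasks "$TASK_ARN" \
  --query "tasks[0].attachments[0].details[?name=='privateIPv4Address'].value" \
  --output text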
Edit: Since you want to expose multiple ports on the same service, you could also use a load balancer and declare multiple target groups for your service. This way you can communicate between containers via the load balancer. Note that this can lead to increased costs, because traffic goes through the LB.
Here is an answer that talks about this approach: https://stackoverflow.com/a/57778058/7391331
To avoid adding custom security rules, you could simply set up VPC peering between regions, which would allow instances in VPC 1 from Region A to reach instances in VPC 2 from Region B. This document describes how such connectivity may be established; the same document also covers linking VPCs in the same region.
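A sketch of the cross-region variant with the CLI - all IDs, regions, and CIDRs below are placeholders:

# Request a peering connection from VPC 1 (Region A) to VPC 2 (Region B).
aws ec2 create-vpc-peering-connection --region eu-west-1 \
  --vpc-id vpc-11111111 --peer-vpc-id vpc-22222222 --peer-region us-east-1

# Accept it from the other side (use the pcx- ID returned above).
aws ec2 accept-vpc-peering-connection --region us-east-1 \
  --vpc-peering-connection-id pcx-33333333

# Route VPC 1's traffic for VPC 2's CIDR through the peering connection.
aws ec2 create-route --region eu-west-1 --route-table-id rtb-aaaaaaaa \
  --destination-cidr-block 10.1.0.0/16 --vpc-peering-connection-id pcx-33333333
# Repeat the route for the reverse direction, and open the security groups to the peer CIDR.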
A new api service we use requires that we give them a list of all the IP addresses our calls will be coming from; if we make an api call from any other IP address, the call will fail.
This question has been asked before here, but I'm wondering if in 2019 there is any simpler/easier/lower cost solution.
Our Setup
Elastic Beanstalk, which currently scales to anywhere from 5 - 50 ec2 instances for our web application based on traffic
An Application Load Balancer
Also have a worker tier, which would be available for use if that might be helpful
Typically these api calls would be coming from any of our web tier ec2 instances, as the calls will be based on a user interaction. We can of course set up something different, e.g. have the worker tier make the calls
Solutions I've Found
Give each ec2 instance an elastic (static) ip address. This is not a great solution for us, because as we hopefully continue to scale the number of ip addresses needed will continue to grow {ref}
Set up two NAT instances (one not being sufficient as it would be a single point of failure). I'm hoping there is something simpler and lower cost than this option. {ref} {ref}
Create new ec2 instances and put them behind a Network Load Balancer. Again, complex and costly. {ref}
Are there any new, easier, less costly solutions? I have never used AWS Lambda before; maybe it is possible to run Lambda functions all from one IP address? I don't have many ideas beyond that at this point. Thanks for your time.
A NAT is the best solution, and shouldn't cost you much more than a web-server.
The simplest way to use a NAT is the NAT Gateway. Pricing depends on region, but it's around $0.05/hour, which is a little more than the price of a t3.medium EC2 instance. You're also charged a per-GB rate for data, which can add up quickly. On the positive side, Amazon manages the infrastructure for you, including patches and high-availability.
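For reference, a minimal sketch of standing one up (subnet and route table IDs are placeholders; the gateway lives in a public subnet and the private route table points at it):

# Allocate an Elastic IP for the NAT Gateway -- this becomes the fixed outbound address.
ALLOC_ID=$(aws ec2 allocate-address --domain vpc --query AllocationId --output text)

# Create the NAT Gateway in a public subnet.
NAT_ID=$(aws ec2 create-nat-gateway --subnet-id subnet-public1 \
  --allocation-id "$ALLOC_ID" --query 'NatGateway.NatGatewayId' --output text)

# Send the private subnets' internet-bound traffic through it.
aws ec2 create-route --route-table-id rtb-private1 \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id "$NAT_ID"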
A NAT Instance is an EC2 instance running a specially-configured AMI. You could probably get away with running this on a t3.micro instance, at $0.01 per hour, which is probably much less than any of your webservers. You will be responsible for applying patches and waking up in the middle of the night if anything goes wrong.
You can probably get away with a single NAT, of either type. You will pay for cross-AZ traffic by doing this ($0.01/GB), so it will be false economy if you move a lot of data across the NAT. It's a tossup on whether you'll get higher availability from two NATs, because you can only reference one at a time in your routing tables. So if one goes down you'll have to update the routing tables to point at the other, which will probably take as much time as bringing up a new instance.
You can't use a Lambda, because it needs to have a permanent IP address assignment and you can't control that with Lambda. You could write your own proxy server, running on EC2, but the costs for that are the same as a NAT Instance.
Here is prescriptive guidance from AWS: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/generate-a-static-outbound-ip-address-using-a-lambda-function-amazon-vpc-and-a-serverless-architecture.html
"This pattern describes how to generate a static outbound IP address in the Amazon Web Services (AWS) Cloud by using a serverless architecture..."
Essentially, you have an AWS Lambda function that uses an Elastic IP address as the outbound IP address. In the guidance, you will create "a Lambda function and a virtual private cloud (VPC) that routes outbound traffic through an internet gateway with a static IP address. To use the static IP address, you attach the Lambda function to the VPC and its subnets. "
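As I read the pattern, the fixed address comes from the egress path (a NAT/Elastic IP) configured for the private subnets the function sits in; the CLI-level gist of the attachment itself looks roughly like this, with placeholder names and IDs:

# Sketch: attach an existing function to private subnets whose route table
# egresses via a NAT gateway holding the Elastic IP.
aws lambda update-function-configuration \
  --function-name my-api-caller \
  --vpc-config SubnetIds=subnet-private1,subnet-private2,SecurityGroupIds=sg-12345678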
I am running a DB cluster with two instances at Amazon RDS Aurora. One instance is the master, the other instance is a read-only replica. The purpose of the replica is to allow a third party application access to certain tables of the database for reporting. Therefore, the reporting tool accesses the read-only cluster endpoint, which works perfectly fine. In order to achieve zero-downtime maintenance, AWS promotes the "replica" to be the "master" at any time. That's pretty cool and does not affect the reporting tool, because it accesses the cluster-ro endpoint, which always routes the traffic to the correct (read-only) replica.
However, this means I have to enable the "Publicly accessible: Yes" flag on both instances, so that the reporting tool (which is located outside the VPC) has access to all instances, because I can not predict which instance becomes the master or replica, correct?
I'd prefer, that the "master" instance (whatever instance that is) can only be accessed from inside the VPC. How can I achieve that?
My understanding is that every change I make on the "master" instance is automatically applied to the replica(s), including adding/removing security groups, for example. So if I open the firewall to allow access to the replica(s) for the reporting tool, the same IP addresses can also access the normal cluster endpoint and instance (not only the cluster-ro endpoint). How can I prevent that?
You will need to build something custom for this, unfortunately. A few options to consider from a design point of view:
An Aurora cluster shares security group settings across all instances, as you called out. If you want custom settings, you could make the whole cluster VPC-only and then have either ALBs or EC2 proxies forward requests to your DB instance(s). You can then run multiple of these "proxies" and associate a separate security group with each.
One big callout with this sort of architecture is that you need to handle failovers cleanly. Your proxies should always talk to the cluster endpoints and never to the instance endpoints, as instances can change from READER to WRITER behind the scenes. For example, ALBs do not let you create listeners that forward requests to a DNS name; they only work with IPs. This means you would need additional infrastructure to monitor the IPs of readers and writers and keep the load balancer updated (a rough sketch of that follows below).
EC2 proxies are definitely a better option for such a design, with the caveat of added cost. I can go into more details if you have specific questions around this setup. This is definitely a summary of the approach, and not prod ready.
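As a very rough sketch of the "keep the load balancer updated" part (assuming an IP-type target group on a network load balancer, since database traffic is plain TCP; the endpoint and target group ARN are placeholders), a small scheduled job could do something like:

# Resolve the current address behind the cluster-ro endpoint...
READER_IP=$(dig +short my-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com | tail -n 1)

# ...and (re)register it with the target group the listener forwards to.
aws elbv2 register-targets \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/aurora-ro/0123456789abcdef \
  --targets Id="$READER_IP",Port=3306
# A real job would also deregister IPs that are no longer returned by DNS.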
On the same note, why can't you use read-restricted DB users instead and keep the cluster public (with SSL enabled, of course)?
For our project we need a static IP binding to our Google Cloud VM instance due to IP whitelisting.
Since it's a managed group preemptible, the VM will terminate once in a while.
However, when it terminates I see in the operations log compute.instances.preempted directly followed by compute.instances.repair.recreateInstance with the note:
Instance Group Manager 'xxx' initiated recreateInstance on instance 'xxx'. Reason: instance's intent is RUNNING but instance's status is STOPPING.
After that, a delete and an insert operation follow in order to restore the instance.
The documentation states:
You can simulate an instance preemption by stopping the instance.
In that case, the IP address stays attached when the VM is started again.
A) So my question, is it possible to have the instance group manager stop and start the VM in the event of preemption, instead of recreating? Since recreating means that the static IP will be detached and needs to be manually attached each time.
B) If option A is not possible, how can I attach the static IP address automatically so that I don't have to attach it manually when the VM is recreated? I'd rather not have an extra NAT VM instance to take care of this problem.
Thanks in advance!
I figured out a workaround to this (specifically, keeping a static IP address assigned to a preemptible VM instance between recreations), with the caveat that your managed instance group has the following properties:
Not autoscaling.
Max group size of 1 (i.e. there is only ever meant to be one VM in this group)
Autohealing is default (i.e. only recreates VMs after they are terminated).
The steps you need to follow are:
1. Reserve a static IP.
2. Create an instance template, configured as preemptible.
3. Create your managed group, assigning your template to the group.
4. Wait for the group to spin up your VM.
5. After the VM has spun up, assign the static IP that you reserved in step 1 to the VM.
6. Create a new instance template derived from the VM instance via gcloud (see https://cloud.google.com/compute/docs/instance-templates/create-instance-templates#gcloud_1).
7. View the newly created instance template in the Console, and note that you see your External IP assigned to the template.
8. Update the MIG (Managed Instance Group) to use the new template created in step 6.
9. Perform a proactive rolling update on the MIG using the Replace method (see the gcloud sketch after the template command below).
10. Confirm that your VM was recreated with the same name, that the disks were preserved (or not, depending on how you configured the disks in your original template), and that the VM maintained its IP address.
Regarding step 6, my gcloud command looked like this:
gcloud compute instance-templates create vm-template-with-static-ip \
--source-instance=source-vm-id \
--source-instance-zone=us-east4-c
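For completeness, steps 8 and 9 can also be done from gcloud. my-mig and the zone are placeholders here, and you should double-check the replacement-method behavior if you rely on the instance keeping its name (step 10):

# Point the MIG at the template created in step 6 (step 8)...
gcloud compute instance-groups managed set-instance-template my-mig \
  --template=vm-template-with-static-ip --zone=us-east4-c

# ...then proactively replace the running VM so it picks up the new template (step 9).
gcloud compute instance-groups managed rolling-action replace my-mig \
  --zone=us-east4-c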
Almost goes without saying, this sort of setup is only useful if you want to:
Minimize your costs by using a single preemptible VM.
Not have to deal with the hassle of turning on a VM again after it's been preempted, ensuring as much uptime as possible.
If you don't mind turning the VM back on manually (and possibly not noticing it's been shut down for who knows how long) after it has been preempted, then do yourself a favor: skip the MIG and just stand up the single VM.
Answering your questions:
(A) It is not possible at the moment, and I am not sure if it will ever be possible. By design preemptible VMs are deleted to make space for normal VMs (if there are capacity constraints in the given zone) or regularly to differentiate them from normal VMs. In the latter case preemption might seem like a start/stop event, but in the former it may take a substantial amount of time before the VM is recreated.
(B) At the moment there is no good way to achieve this in general.
If you have a special case where your group has only one instance, you can hardcode the IP address in the Instance Template.
Otherwise at the moment the only solution I can think of (other than using a Load Balancer) is to write a startup script that would attach the NAT IP.
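A sketch of what such a startup script could look like (the reserved address name my-static-ip and its region are placeholders, the access-config name may differ on your instances, and the VM's service account needs permission to modify instances and read addresses):

#! /bin/bash
# Look up this instance's name and zone from the metadata server.
NAME=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/name")
ZONE=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/zone" | awk -F/ '{print $NF}')
IP=$(gcloud compute addresses describe my-static-ip \
  --region=us-east4 --format='value(address)')

# Swap the ephemeral external IP for the reserved one.
gcloud compute instances delete-access-config "$NAME" --zone "$ZONE" \
  --access-config-name "external-nat" || true
gcloud compute instances add-access-config "$NAME" --zone "$ZONE" \
  --access-config-name "external-nat" --address "$IP"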
I've found one way to ensure that all VMs in your network have the same outgoing IP address: using Cloud NAT you can assign a static IP which all VMs will use. There is a downside, though:
GCP forwards traffic using Cloud NAT only when there are no other matching routes or paths for the traffic. Cloud NAT is not used in the following cases, even if it is configured:
You configure an external IP on a VM's interface. If you configure an external IP on a VM's interface, IP packets with the VM's internal IP as the source IP will use the VM's external IP to reach the Internet. NAT will not be performed on such packets. However, alias IP ranges assigned to the interface can still use NAT because they cannot use the external IP to reach the Internet. With this configuration, you can connect directly to a GKE VM via SSH, and yet have the GKE pods/containers use Cloud NAT to reach the Internet.
Note that making a VM accessible via a load balancer external IP does not prevent a VM from using NAT, as long as the VM network interface itself does not have an external IP address.
Removing the VM's external IP also prevents direct SSH access to the VM, even SSH access from the Cloud Console itself. The quote above shows an alternative with a load balancer; another option is a bastion host, but neither directly solves access from, for example, Kubernetes/kubectl.
If that's no problem for you, this is the way to go.
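For reference, setting Cloud NAT up with a reserved address is only a few commands (router, NAT, and address names plus the region are placeholders):

# Reserve the static address that all NAT'd egress will use.
gcloud compute addresses create my-nat-ip --region=us-east4

# Cloud NAT is attached to a Cloud Router in the same region and network.
gcloud compute routers create my-router --network=default --region=us-east4
gcloud compute routers nats create my-nat --router=my-router --region=us-east4 \
  --nat-all-subnet-ip-ranges --nat-external-ip-pool=my-nat-ip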
One solution is to let the instances have dynamically chosen ephemeral IPs, but set the group as the target of a load balancer with a static IP. This way, even when instances are created or destroyed, the LB acts as a frontend that keeps the IP constant over time.
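A rough sketch of that with a regional (target-pool based) network load balancer - names, region, and port are placeholders:

# Reserve the static frontend address.
gcloud compute addresses create my-lb-ip --region=us-east4

# Create a target pool and have the managed instance group feed it.
gcloud compute target-pools create my-pool --region=us-east4
gcloud compute instance-groups managed set-target-pools my-mig \
  --target-pools=my-pool --zone=us-east4-c

# Forwarding rule: static IP -> target pool -> whichever instance currently exists.
gcloud compute forwarding-rules create my-rule --region=us-east4 \
  --address=my-lb-ip --ports=443 --target-pool=my-pool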
Right now our small-ish business has 3 clients who we have assigned to 3 elastic IPs in Amazon Web Services (AWS).
If we restart an instance no one loses access because the IPs are the same after restart.
Is there a way to handle expanding to 3 more clients without having things fall apart if there's a restart?
I'm trying to request more IPs, but they suggest it depends on our architecture, and I'm not sure what architecture they're looking for (or why some would warrant more elastic IPs than others or if this is an unchecked suggestion box).
I realize this is a very basic question, but googling around only gets me uninformative docs straight from the vendor's mouth.
EDIT:
There is a lot of content on the interwebs (mostly old) about AWS supporting IPv6, but that functionality appears to be deprecated.
You can request more EIPs in the short run; depending on your account, you get up to 5 EIPs per region by default. You should also consider using name-based URLs and assigning each of your clients a subdomain, for example,
clientA.example.com
clientB.example.com
clientC.example.com
This way you will not need an additional IP for every client you add. Depending on your traffic, one EC2 instance can serve many clients, and as you scale you can put multiple EC2 instances behind an AWS Elastic Load Balancer, which will scale to serve many more clients.
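For example, each client's subdomain can simply be a DNS record pointing at the shared load balancer (the hosted zone ID and ELB DNS name below are placeholders):

# Sketch: CNAME clientA.example.com to the load balancer's DNS name.
aws route53 change-resource-record-sets --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "clientA.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "my-elb-123456.us-east-1.elb.amazonaws.com"}]
      }
    }]
  }'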
If clients want their servers kept separate and will pay for them, you can allocate as many EIPs as you need (after a limit increase). You should also consider separating the database into one database instance per client, which is probably what clients want even more than separate IPs.
For IPv6, a quick workaround would be to use a front-end ELB that supports both IPv6 and IPv4.
If you use elastic IPs from VPC, you get 5 per region for an AWS account. See Amazon VPC Limits.
So you can go to the console and select VPC, then click on Elastic IPs and allocate one. Once allocated, associate it with the relevant instance.
So, at least for now, you can solve the problem if you are not bothered about region.
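The console steps above map to two CLI calls, if that's easier to script (the instance ID is a placeholder):

# Allocate a new Elastic IP in the VPC scope...
ALLOC_ID=$(aws ec2 allocate-address --domain vpc --query AllocationId --output text)

# ...and associate it with the instance that needs a stable address.
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id "$ALLOC_ID"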