I want to build a cluster system within our AWS environment. The cluster will have a master node and several slaves. The slaves will connect to the master over a TCP/IP connection. There may be several clusters in our organization's AWS environment (e.g. dev1, dev2, qa1, qa2, etc.).
For this particular technology, the slaves must somehow discover the IP address of the master node. What is the best practice in doing this? I had a few ideas:
Put the entire cluster in some sort of NAT'd subnet and have the master node always at a known address (e.g. 192.168.0.1)
Require some sort of domain name for each cluster and use DNS.
Use Eureka instead of DNS.
There may be more ideas. I'm somewhat new to AWS but not new to network topologies, so I may be going in the wrong direction. #1 above sounds like the easiest thing to do. Are there any other ideas?
You can set arbitrary key-value tags on your EC2 instances. So for example you could tag your instances with class=master and class=slave when you create them. Then, the other instances can use the EC2 API (via the AWS CLI or one of the AWS SDKs) to list the instances with a certain tag and get the master's IP address. Here's an example using the AWS CLI:
aws ec2 describe-instances --filters Name=tag:class,Values=master \
    --query 'Reservations[*].Instances[*].PrivateIpAddress' --output text
which would return the private IP address of the master.
Another approach I've seen is to have the master write its own IP address to a file in an S3 bucket, and have the slave nodes read the master's IP from that file. It could also be done with a database instead; any storage medium/location reachable by all participants will do.
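The tag-lookup approach above can be sketched with boto3. This is a minimal sketch, assuming boto3 is installed, credentials and region are configured, and the tag key `class` matches however you tagged your instances; the response parser is pure logic over the documented `describe_instances` response shape:

```python
def extract_private_ips(response):
    """Pull PrivateIpAddress values out of a describe_instances response."""
    ips = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PrivateIpAddress")
            if ip:
                ips.append(ip)
    return ips

def find_master_ip(ec2=None):
    """Ask EC2 for running instances tagged class=master (tag is an assumption)."""
    if ec2 is None:
        import boto3  # deferred import: the parser above works without AWS
        ec2 = boto3.client("ec2")
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:class", "Values": ["master"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ips = extract_private_ips(response)
    return ips[0] if ips else None
```

Filtering on instance-state-name avoids picking up a terminated master that still carries the tag.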
Related
I am working with the AWS CLI for the first time and need some help.
I want a query that lists EC2 instances along with their attached volumes and each volume's type and size, using the AWS CLI.
Can you please help?
You would use describe-instances to obtain a list of all Amazon EC2 instances in your account in a particular Region.
You can use describe-volumes to obtain a list of Amazon EBS Volumes. There is an Attachments field that lists which EC2 instances are connected to each volume, and also fields for Size and VolumeType.
Given that you would need to join the Instance and Volume information together, it might be easier to do this from a programming language like Python rather than using the AWS CLI.
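The join described above can be sketched in Python. The helpers are pure logic over the documented `describe_instances` / `describe_volumes` response shapes; in practice you would feed them the output of the corresponding boto3 calls (boto3 and configured credentials are assumptions):

```python
def index_volumes_by_instance(volumes_response):
    """Map InstanceId -> list of (VolumeId, VolumeType, Size) from describe_volumes."""
    by_instance = {}
    for vol in volumes_response.get("Volumes", []):
        for att in vol.get("Attachments", []):
            by_instance.setdefault(att["InstanceId"], []).append(
                (vol["VolumeId"], vol["VolumeType"], vol["Size"])
            )
    return by_instance

def instances_with_volumes(instances_response, volumes_response):
    """For every instance, list its attached volumes (empty list if none)."""
    vols = index_volumes_by_instance(volumes_response)
    report = {}
    for reservation in instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            report[instance["InstanceId"]] = vols.get(instance["InstanceId"], [])
    return report
```

With real data this would be `instances_with_volumes(ec2.describe_instances(), ec2.describe_volumes())`.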
Does anyone know of an AWS CLI command that will list any running instance (run against a particular region) that doesn't have a snapshot available?
The closest command I've found to try would be something like:
aws ec2 describe-snapshots --owner-ids self --query 'Snapshots[]' --region=us-east-1
I didn't actually get anything back from it, just:
-------------------
|DescribeSnapshots|
+-----------------+
This is supposed to list every EBS snapshot I own -- so I would have to subtract those from the entire EC2 inventory to reveal the instances without snapshots.
Hence, I would like a command that shows running EC2 instances without any snapshots available, so I can put something in place going forward.
Amazon EBS Snapshots are associated with Amazon EBS Volumes, which are associated with Amazon EC2 instances.
Therefore, you would need to write a program using an AWS SDK (I'd use Python, but there are many available) that would:
Obtain a list of all EBS Snapshots (make sure you use the equivalent of --owner-ids self); the return data will include the associated EBS VolumeId
Obtain a list of all EBS Volumes, in which the return data will include Attachments.InstanceId
Obtain a list of all running EC2 instances
Do a bit of looping logic to find Volumes without Snapshots, and then determine which instances are associated to those Volumes.
Note that rather than finding "instances without snapshots" it has to find "instances that have volumes without snapshots".
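The looping logic above can be sketched as pure Python over the three result sets. Feeding in real data (via boto3 `describe_snapshots`, `describe_volumes`, and `describe_instances`) is assumed; the field names follow the EC2 API:

```python
def instances_missing_snapshots(snapshots, volumes, running_instance_ids):
    """Return running instance IDs that have at least one attached
    volume with no snapshot.

    snapshots: list of dicts with a VolumeId key (from describe_snapshots)
    volumes:   list of dicts with VolumeId and Attachments (from describe_volumes)
    running_instance_ids: set of IDs of running instances
    """
    snapshotted_volumes = {snap["VolumeId"] for snap in snapshots}
    flagged = set()
    for vol in volumes:
        if vol["VolumeId"] in snapshotted_volumes:
            continue  # this volume has at least one snapshot
        for att in vol.get("Attachments", []):
            if att["InstanceId"] in running_instance_ids:
                flagged.add(att["InstanceId"])
    return flagged
```

Note that this deliberately implements "instances that have volumes without snapshots", per the caveat above.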
I don't think there is a CLI command that will do this by default. You could tag your snapshots with their instance IDs, for example, and then query snapshots by filtering on those tags. Otherwise you will have to use an AWS SDK and write a custom script that gets all instances and then checks whether their volumes have snapshots.
I have an AWS account which is used for development. Because the developers are in one timezone, we switch off the resources after hours to reduce costs.
Is it possible to temporarily switch off nodes in an ElastiCache cluster? All I found in the CLI reference was 'delete-cluster':
http://docs.aws.amazon.com/cli/latest/reference/elasticache/index.html
ElastiCache clusters cannot be stopped. They can only be deleted and recreated. You can use this pattern to avoid paying for time when you're not using the cluster.
If you are using a Redis ElastiCache cluster, you can create a snapshot as the cluster is being deleted. Then, you can restore the cluster from the snapshot when you create it. This way, you preserve the data in the cluster.
The cluster endpoints are derived from a combination of:
the cluster ID,
the region,
the AWS account.
So as long as you delete and re-create clusters with those parts held constant, the clusters will keep the same endpoint.
At this time there is not a way to STOP an EMR cluster in the same sense you can with EC2 instances. The EMR cluster uses instance-store volumes, and the EC2 start/stop feature relies on the use of EBS volumes, which are not appropriate for high-performance, low-latency HDFS utilization.
The best way to simulate this behavior is to store the data in S3, ingest it as a startup step of the cluster, then save back to S3 when done.
Documentation Reference:
https://forums.aws.amazon.com/thread.jspa?threadID=149772
Hope it helps.
EDIT1:
If you want to maintain the same DNS name, you can use the API/CLI to update the Elasticsearch domain configuration.
Reference:
http://docs.aws.amazon.com/cli/latest/reference/es/update-elasticsearch-domain-config.html
Is it possible to do Auto Scaling with static IPs in AWS? The newly created instances should either have a pre-defined IP or pick from a pool of pre-defined IPs.
We are trying to set up ZooKeeper in production, with 5 ZooKeeper instances. Each one should have a static IP, which is hard-coded in the Kafka AMI/databag that we use. It should also support Auto Scaling, so that if one of the ZooKeeper nodes goes down, a new one is spawned with the same IP or with an IP from the pool. For this we have decided to go with 1 ZooKeeper instance per Auto Scaling group, but the problem is with the IP.
If this is the wrong way, please suggest the right way. Thanks in advance!
One method would be to maintain a user data script on each instance, and have each instance assign itself an Elastic IP from a set of EIPs reserved for this purpose. This user data script would be referenced in the ASG's Launch Configuration, and would run on launch.
Say the user data script is called "/scripts/assignEIP.sh"; using the AWS CLI, you would have it consult the pool to see which EIPs are available and which are already in use, then assign itself one of the available EIPs.
For ease of IP management, you could keep the pool of IPs in a simple text properties file on S3, and have the instance download and consult that list when the instance starts.
Keep in mind that each instance will need to be assigned an IAM instance profile that allows it to describe and associate EIPs.
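The pool-selection step can be sketched in Python. This is a minimal sketch: the pool list would come from your properties file on S3, and the second argument is the documented shape of boto3 `describe_addresses` output (an associated EIP carries an AssociationId and/or InstanceId):

```python
def pick_available_eip(pool, addresses_response):
    """Return the first EIP from the pool that is not currently associated.

    pool: list of Elastic IP strings (e.g. read from the S3 properties file)
    addresses_response: output of ec2.describe_addresses()
    """
    in_use = {
        addr["PublicIp"]
        for addr in addresses_response.get("Addresses", [])
        if "AssociationId" in addr or "InstanceId" in addr
    }
    for ip in pool:
        if ip not in in_use:
            return ip
    return None  # pool exhausted; the launch script should fail loudly here
```

The instance would then associate the chosen address with itself (e.g. via `ec2.associate_address`), which is where the IAM instance profile permissions come in.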
I'm trying to write a script to stop several instances in our test environment on Friday and have them start back up on Monday, to save some cost.
Is there a way to stop instances by IP addresses (and not by instance ID), or some other way I don't know about? (The reason being that instance ID's may change if an instance had to be deleted and recreated.)
This is a zero-code solution:
Put your instances into Auto Scaling groups and add a shutdown and startup schedule on the Auto Scaling group. This can be done in the AWS console.
This can also be automated using the AWS CLI.
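A boto3 sketch of the scheduled approach, scaling the group to 0 on Friday evening and back up on Monday morning. The group name, capacities, and cron expressions are placeholder assumptions; Recurrence is standard cron syntax in UTC:

```python
def scheduled_action_params(group, name, recurrence, capacity):
    """Build kwargs for autoscaling put_scheduled_update_group_action."""
    return {
        "AutoScalingGroupName": group,
        "ScheduledActionName": name,
        "Recurrence": recurrence,  # cron syntax, evaluated in UTC
        "MinSize": capacity,
        "MaxSize": capacity,
        "DesiredCapacity": capacity,
    }

def schedule_weekend_shutdown(group, client=None):
    """Register a Friday scale-to-zero and a Monday scale-up action."""
    if client is None:
        import boto3  # deferred import: the builder above works without AWS
        client = boto3.client("autoscaling")
    client.put_scheduled_update_group_action(
        **scheduled_action_params(group, "friday-stop", "0 18 * * 5", 0))
    client.put_scheduled_update_group_action(
        **scheduled_action_params(group, "monday-start", "0 6 * * 1", 1))
```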
Use EC2 Tags to give your instances key/value tag pairs, then write a script using Boto which searches for instances with the right tags, and then terminates them.
You could also use Boto to list instances matching the specific IP address, and terminate them that way.
But... IP addresses are dynamically assigned (unless you are using Elastic IPs). So why not make a note of the instance IDs when launching the instances, instead of the IP address?
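The tag-search approach can be sketched with boto3 (boto3 and credentials assumed; the tag key/value are placeholders you'd replace with your own). The filter helper is pure logic over the `describe_instances` response:

```python
def instance_ids_with_tag(response, key, value):
    """Collect InstanceIds whose tags include key=value."""
    ids = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(key) == value:
                ids.append(instance["InstanceId"])
    return ids

def stop_tagged_instances(key, value, ec2=None):
    """Stop (not terminate) every instance carrying the given tag."""
    if ec2 is None:
        import boto3  # deferred import: the filter above works without AWS
        ec2 = boto3.client("ec2")
    ids = instance_ids_with_tag(ec2.describe_instances(), key, value)
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids
```

This uses `stop_instances` rather than `terminate_instances`, since the goal here is a Friday-to-Monday pause; stopped instances keep their instance IDs.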