How to query for the NodeIds in an AWS Elasticsearch cluster? - amazon-web-services

I'm trying to use Boto3 to return data about the nodes in an AWS Elasticsearch cluster, such as free storage space, CPU usage, etc. I know that the IDs of the nodes in the cluster can change when the cluster is restarted, so I don't want to hardcode them. Is there a way to return a list of the NodeIds present in the cluster?

Node information can only be extracted from the ES cluster through Elasticsearch API calls.
To extract those values, use the https://elasticsearch-py.readthedocs.io/en/master/index.html library to connect to the ES domain. Then execute this API call (GET /_nodes): https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-info.html.
Thanks
Ashish
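A minimal sketch of that approach in Python, using only the standard library rather than elasticsearch-py. It assumes the domain's access policy allows unsigned HTTP requests from your host (for example, an IP-based policy); a domain restricted to IAM principals would need signed requests instead. The endpoint URL and the sample response are placeholders. The GET /_nodes response keys its "nodes" object by node ID, so the IDs are simply that object's keys.

```python
import json
import urllib.request


def fetch_nodes_info(endpoint: str) -> dict:
    """Call GET /_nodes on the domain endpoint and return the parsed JSON.

    Assumption: the domain access policy permits unsigned requests.
    """
    with urllib.request.urlopen(f"{endpoint.rstrip('/')}/_nodes") as resp:
        return json.load(resp)


def extract_node_ids(nodes_info: dict) -> list:
    """Return the node IDs, which are the keys of the "nodes" object."""
    return sorted(nodes_info.get("nodes", {}))


# Trimmed, made-up response in the shape GET /_nodes returns.
sample_response = {
    "cluster_name": "example",
    "nodes": {
        "b1Xk": {"name": "node-a"},
        "a9Qz": {"name": "node-b"},
    },
}
node_ids = extract_node_ids(sample_response)
print(node_ids)

# Against a real domain (placeholder URL):
# node_ids = extract_node_ids(
#     fetch_nodes_info("https://my-domain.us-east-1.es.amazonaws.com")
# )
```

With elasticsearch-py, es.nodes.info() returns the same structure, so extract_node_ids can be applied to its result unchanged.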

Related

What is the difference between nodes, cluster, and database in redshift

Reading through the AWS documentation, I'm quite confused by these three concepts:
Cluster: composed of one or more compute nodes; contains one or more databases
Compute node: run the query execution plans and transmit data among themselves to serve these queries
Database: User data is stored on the compute nodes
With this it is easy to assume that a compute node and a database are the same, isn't it? But when creating a Redshift cluster, one section of the form is named "database configuration" yet seems to refer to the cluster. Below is an image of it. If my understanding of the documentation is correct, database configuration should refer to the compute nodes and not the cluster.
With these, what exactly is a cluster, database, and a compute node?
With this it is easy to assume that a compute node and a database is the same, isn't it?
No, that's not the case. You can have a single-node Redshift cluster with multiple databases, or a single (large) database hosted on multiple compute nodes.
Basically, a node refers to the hardware layer of Redshift, while a database refers to the software layer only.
Your screenshot shows only a default database called dev. You can create many more if you want. All hosted on the same cluster.
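To see the hardware/software split in practice, here is a rough Python sketch that summarizes a Redshift describe_clusters response. The sample response is made up and trimmed to the fields used. Note that DBName only reports the database created together with the cluster; databases added later with SQL do not appear in this API response.

```python
def summarize_clusters(response: dict) -> list:
    """Pull out the hardware (nodes) and software (initial database) facts per cluster."""
    return [
        {
            "cluster": c["ClusterIdentifier"],
            "node_type": c["NodeType"],          # hardware layer
            "node_count": c["NumberOfNodes"],    # hardware layer
            "initial_database": c["DBName"],     # software layer
        }
        for c in response.get("Clusters", [])
    ]


# Made-up response, trimmed to the fields used above.
sample = {
    "Clusters": [
        {
            "ClusterIdentifier": "example-cluster",
            "NodeType": "dc2.large",
            "NumberOfNodes": 2,
            "DBName": "dev",
        }
    ]
}
summary = summarize_clusters(sample)
print(summary)

# With real credentials:
# import boto3
# summary = summarize_clusters(boto3.client("redshift").describe_clusters())
```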

Is it possible to stop nodes in AWS ElastiCache cluster

I have an AWS account which is used for development. Because the developers are in one timezone, we switch off the resources after hours to conserve usage.
Is it possible to temporarily switch off nodes in an ElastiCache cluster? All I found in the CLI reference was 'delete cluster':
http://docs.aws.amazon.com/cli/latest/reference/elasticache/index.html
ElastiCache clusters cannot be stopped. They can only be deleted and recreated. You can use this pattern to avoid paying for time when you're not using the cluster.
If you are using a Redis ElastiCache cluster, you can create a snapshot as the cluster is being deleted. Then, you can restore the cluster from the snapshot when you create it. This way, you preserve the data in the cluster.
The cluster endpoints are derived from a combination of
the cluster IDs,
the region,
the AWS account.
So as long as you delete and re-create clusters with those parts being constant, then the clusters will maintain the same endpoint.
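The delete-with-snapshot and restore pattern could be sketched in Python roughly as follows. The two helpers only build the arguments for the Boto3 ElastiCache calls delete_cache_cluster and create_cache_cluster; the cluster ID, snapshot name, and node type are hypothetical, and a single-node Redis cluster is assumed.

```python
def delete_request(cluster_id: str, snapshot_name: str) -> dict:
    """Arguments for delete_cache_cluster that take a final snapshot first."""
    return {
        "CacheClusterId": cluster_id,
        "FinalSnapshotIdentifier": snapshot_name,
    }


def restore_request(cluster_id: str, snapshot_name: str, node_type: str) -> dict:
    """Arguments for create_cache_cluster that seed a new cluster from a snapshot.

    Reusing the same cluster_id keeps the endpoint stable, as described above.
    """
    return {
        "CacheClusterId": cluster_id,
        "Engine": "redis",
        "CacheNodeType": node_type,
        "NumCacheNodes": 1,  # assumption: single-node Redis cluster
        "SnapshotName": snapshot_name,
    }


evening = delete_request("dev-cache", "dev-cache-evening")
morning = restore_request("dev-cache", "dev-cache-evening", "cache.t3.micro")
print(evening)
print(morning)

# With real credentials (hypothetical names):
# import boto3
# client = boto3.client("elasticache")
# client.delete_cache_cluster(**evening)
# client.get_waiter("cache_cluster_deleted").wait(CacheClusterId="dev-cache")
# client.create_cache_cluster(**morning)
```

The wait between the two calls matters: the cluster ID cannot be reused until the deletion has finished. Snapshot names must also be unique, so a nightly job would need to vary the name or delete the previous snapshot.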
At this time there is no way to STOP an EMR cluster in the same sense you can with EC2 instances. The EMR cluster uses instance-store volumes, and the EC2 start/stop feature relies on the use of EBS volumes, which are not appropriate for high-performance, low-latency HDFS utilization.
The best way to simulate this behavior is to store the data in S3, ingest it as a startup step of the cluster, and then save it back to S3 when done.
Documentation Reference:
https://forums.aws.amazon.com/thread.jspa?threadID=149772
Hope it helps.
EDIT1:
If you want to maintain the same DNS name, you can use the API/CLI to update the Elasticsearch domain configuration.
Reference:
http://docs.aws.amazon.com/cli/latest/reference/es/update-elasticsearch-domain-config.html
Hope it helps.

How to use AWS memcached with multiple nodes

I am new to AWS, and I want to store some temporary data on memcached. My memcached cluster has two nodes, one in us-east-1a and one in us-east-1b. It stores data on the 2 nodes but does not sync them. Is there any way I can get all the data from both nodes at once instead of going into the 2 nodes one by one?
edit: I use telnet to connect to the endpoint of memcached
Memcached clusters do not replicate data between nodes; only Redis does that, and only when it is set up for replication. That is why you need to use the Configuration Endpoint for Memcached: it knows all the nodes in your cluster.
The Memcached engine supports partitioning your data across multiple nodes
A Memcached cluster is a logical grouping of one or more ElastiCache Nodes. Data is partitioned across the nodes in a Memcached cluster.
For a Memcached cluster, if you use Automatic Discovery, you can use the cluster's configuration endpoint to configure your Memcached client. This means you must use a client that supports Automatic Discovery.
If you don't use Automatic Discovery, you must configure your client to use the individual node endpoints for reads and writes. You must also keep track of them as you add and remove nodes.
Reference: Finding the Endpoints for a Memcached Cluster (Console)
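To illustrate why each key lives on only one node, here is a simplified Python sketch of client-side partitioning. It is an assumption-laden toy: real Memcached clients typically use consistent hashing rather than a plain modulo, and the node endpoints below are hypothetical. The point is only that the client, not the server, decides which node holds a key, so connecting to one node with telnet shows only that node's share of the data.

```python
import hashlib


def node_for_key(key: str, nodes: list) -> str:
    """Pick the node that owns a key by hashing the key (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]


# Hypothetical node endpoints for a two-node cluster.
nodes = ["node-0001.example:11211", "node-0002.example:11211"]

for key in ("session:1", "session:2", "session:3"):
    # Every client using the same scheme and node list reaches the same node.
    print(key, "->", node_for_key(key, nodes))
```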

How much does storage increase when I add an instance to an Amazon Elasticsearch cluster?

When you're running out of space on an Amazon Elasticsearch cluster the documentation recommends: "If you are not using EBS, add additional nodes to your cluster configuration."
source: https://aws.amazon.com/premiumsupport/knowledge-center/add-storage-elasticsearch/
But I'm not able to find any explanation as to "how much" does that increase the storage? Does it literally double the storage going from one instance to two?
Tangential follow-up: When you add another instance to a cluster does it automatically re-balance the existing indexes or do you have to rebuild them?
If you go from one instance to two of the same type, you do indeed double the raw storage. Try that and see if it solves your storage space issue.
Regarding your follow-up question, when new nodes join the cluster, ES will automatically rebalance the shards to the new nodes. Automatic rebalancing is one of ES' nicest features.
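Be aware that Elasticsearch never places a replica on the same node as its primary. If your indices are configured with 1 replica (check your index settings and the Amazon Elasticsearch Service docs), a single-node cluster leaves those replicas unassigned. Once the second node joins, the cluster will automatically allocate the replicas there, which can consume most of your added disk space.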
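The arithmetic behind this can be sketched in a few lines of Python. This is a simplification under stated assumptions: all nodes have equal storage, shards balance evenly, and overhead such as reserved disk space is ignored. The function name and the numbers are illustrative only.

```python
def unique_data_capacity_gb(node_count: int, storage_per_node_gb: float,
                            replicas_per_primary: int) -> float:
    """Rough capacity left for unique (primary) data across the cluster."""
    total = node_count * storage_per_node_gb
    # A replica is never allocated on the same node as its primary,
    # so at most node_count - 1 copies of each replica can be placed.
    placed_replicas = min(replicas_per_primary, node_count - 1)
    return total / (1 + placed_replicas)


# One 100 GB node, 1 replica configured but unassignable.
print(unique_data_capacity_gb(1, 100, 1))  # 100.0
# Two 100 GB nodes, 1 replica now allocated: raw storage doubled, usable did not.
print(unique_data_capacity_gb(2, 100, 1))  # 100.0
# Two 100 GB nodes with replicas disabled.
print(unique_data_capacity_gb(2, 100, 0))  # 200.0
```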

How to identify a master node in an AWS cluster

I want to make a cluster system within an AWS enterprise. The cluster will have a master node and several slaves. The slaves will connect to the master using a TCP/IP connection. There may be several clusters in our organization's AWS enterprise (eg dev1, dev2, qa1, qa2, etc).
For this particular technology, the slaves must somehow discover the IP address of the master node. What is the best practice in doing this? I had a few ideas:
1. Put the entire cluster in some sort of NAT'd subnet and have the master node always at a known address (e.g. 192.168.0.1).
2. Require some sort of domain name for each cluster and use DNS.
3. Use Eureka instead of DNS.
There may be more ideas. I'm somewhat new to AWS but not new to network topologies, so I may be going in the wrong direction. #1 sounds like the easiest thing to do. Are there any other ideas?
You can set arbitrary key-value pairs (tags) on your EC2 instances. For example, you could tag your instances with class=master and class=slave when you create them. The other instances can then use the EC2 API (through the AWS CLI or one of the AWS SDKs) to list the instances with a certain tag and get their IP addresses. Here's an example using the AWS CLI:
aws ec2 describe-instances --filters "Name=tag:class,Values=master" \
--query 'Reservations[*].Instances[*].PrivateIpAddress' --output text
which would return the private IP address of the master.
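The same lookup from Python might look like the sketch below. The tag name and value follow the CLI example; the instance-state-name filter is an addition not in the original answer, included so that stopped or terminated masters are skipped. The sample response is made up and trimmed to the fields used.

```python
def private_ips(response: dict) -> list:
    """Collect private IPs from a describe_instances response."""
    return [
        inst["PrivateIpAddress"]
        for res in response.get("Reservations", [])
        for inst in res.get("Instances", [])
        if "PrivateIpAddress" in inst
    ]


# Filters for the master node; the state filter is an extra assumption.
MASTER_FILTERS = [
    {"Name": "tag:class", "Values": ["master"]},
    {"Name": "instance-state-name", "Values": ["running"]},
]

# Made-up response, trimmed to the fields used above.
sample = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-0123456789abcdef0",
                        "PrivateIpAddress": "10.0.1.15"}]}
    ]
}
print(private_ips(sample))

# With real credentials:
# import boto3
# response = boto3.client("ec2").describe_instances(Filters=MASTER_FILTERS)
# master_ips = private_ips(response)
```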
Another approach I've seen is to have the master write its own IP to a file in an S3 bucket, then have the nodes read the master's IP from that same file. It could also be done using a database instead; any storage medium or location reachable by all participants will do.
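A rough Python sketch of the S3 variant follows. The two helpers use the put_object and get_object calls of a Boto3 S3 client; the bucket name and key are hypothetical. InMemoryS3 is only a stand-in so the sketch runs without AWS credentials; in practice you would pass boto3.client("s3") instead.

```python
import io


def publish_master_ip(s3, bucket: str, key: str, ip: str) -> None:
    """Run on the master: write its own IP to a well-known object."""
    s3.put_object(Bucket=bucket, Key=key, Body=ip.encode("utf-8"))


def read_master_ip(s3, bucket: str, key: str) -> str:
    """Run on the other nodes: read the master's IP from the same object."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    return body.read().decode("utf-8").strip()


class InMemoryS3:
    """Stand-in for boto3.client("s3") so the example runs offline."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


s3 = InMemoryS3()
# Hypothetical bucket and key; one key per cluster (dev1, dev2, qa1, ...).
publish_master_ip(s3, "example-cluster-config", "dev1/master-ip", "10.0.1.15")
print(read_master_ip(s3, "example-cluster-config", "dev1/master-ip"))
```

Using one key per cluster name keeps several clusters (dev1, dev2, qa1, and so on) from overwriting each other's entries.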