I have a AWS Redshift Cluster dc2.8xlarge and currently I am paying huge bill each month for running the cluster 24/7.
Is there a way I can automate the cluster uptime so that the cluster will be running in day time and I can stop the cluster at 8PM in evening and again start it in 8AM in morning.
Update: Stop/Start is now available. See: Amazon Redshift launches pause and resume
Amazon Redshift does not have a Start/Stop concept. However, there are a few options...
You could resize the cluster so that it is a lower-cost. A Redshift Cluster is sized for Compute and for Storage. You could reduce the number of nodes as long as you retain enough nodes for your Storage needs.
Also, Amazon Redshift has introduced RA3 nodes with managed storage enabling independent compute and storage scaling, which means you might be able scale-down to a single node. (This is a new node type, I'm not sure of how it works.)
Another option is to take a Snapshot and Shutdown the cluster. This will result in no costs for the cluster (but the Snapshot will be charged). Then, create a new cluster from the Snapshot when you want the cluster again.
Scheduling the above can be done in Amazon CloudWatch Events, which can trigger an AWS Lambda function. Within the function, you can make the necessary API calls to the Amazon Redshift service.
If you are concerned with the general cost of your cluster, you might want to downside from the dc2.8xlarge. You could either use multiple dc2.large nodes, or even consider a move to ds2.xlarge, which is a lower cost per TB of data stored.
good news :)
Now we can able to pause and resume the Redshift cluster (both Console and CLI)
check out the link:
https://aws.amazon.com/blogs/big-data/lower-your-costs-with-the-new-pause-and-resume-actions-on-amazon-redshift/
Now we can pause and resume an AWS Redshift cluster.
We can also schedule the pause and the resume, which is a very important feature to check on the costs.
Link: https://aws.amazon.com/blogs/big-data/lower-your-costs-with-the-new-pause-and-resume-actions-on-amazon-redshift/
This will help you in automating the cluster uptime & downtime so that the cluster will be running in day time and is paused automatically at a specific time in the evening and again start in the morning automatically.
its pretty easy to use opensource https://cloudcustodian.io to automate nightime/weekend off hours on redshift and other aws resources.
Related
We are running an EMR cluster with spot instances as task nodes. The EMR cluster is executing spark jobs which sometimes run for several hours. Interruptions of spot instances can cause the failure of the spark job which then requires us to restart the job entirely.
I can see that there is some basic information on the "Frequency of interruption" on AWS Spot Advisor - However, this data seems to be very generic, I can't see historic trends and I also miss the probability of interruption based on how long the spot instance is running (which should have a significant impact on the probability of interruption).
Is this data available somewhere? Or are there other data points that can be used as proxy?
I found this Github issue which provides a link to this JSON file in Spot Advisor S3 bucket that includes interruption rates.
https://spot-bid-advisor.s3.amazonaws.com/spot-advisor-data.json
we are looking to get some clarity on Cloud Data Fusion pricing. It looks like if we create a Cloud Data Fusion instance, as long as the instance is alive, we incur hourly rates. This can be quite high: $1100 per month for development and $3000 per enterprise instance.
https://cloud.google.com/data-fusion/pricing
There seems to be no way to stop an instance - this was confirmed by support, only delete.
However, the pricing talks of development vs execution. Wondering if we can avoid the instance charges once we are done deploying a pipeline. Not clear if this is possible or even a deployed pipeline requires an instance.
Thanks.
You can deploy your pipeline in 2 modes:
Either Cloud Data fusion create an ephemeral cluster, deploy your pipeline and tear down the cluster at the end -> Here you need to keep Data Fusion to tear down the cluster. But you can delete it before
Or run the pipeline on an existing cluster. This time, after the pipeline deployment and start, you can shut down the instance.
I agree, it's not clear but you can deduce this when you know how work an Hadoop cluster.
Note: don't forget to export your pipeline before deleting the instance
Note2: the instance also offer trigger scheduling to run the pipeline. Of course, if you delete the instance this feature is useless for you!
well I have couple of questions. I have a aurora cluster with a single MySQL RDS instance which has 450GB of data. we use this cluster only when we are doing some specific testing.so I want to delete this cluster but keep its data available to me so I can make a new cluster whenever we need any testing to be done.
there are couple of ways this can be done as far as I know
take a snapshot of the cluster and restore the cluster from the
snapshot whenever required.
backup the cluster to s3 and restore the
cluster from s3 when required
which way is more faster and which one is more cost efficient?
can an entire cluster be restored from s3 if so what are the steps involved ? , I found the aws documentation bit too messy.
If we stop a aurora cluster, it again automatically restarts within 7 days , is there a way to prevent this automatic restart and keep it stopped when it is not required and start when required ?
Does Athena have a gigantic cluster of machines ready to take queries from users and run them against their data? Are they using a specific open-source cluster management software for this?
I believe AWS will never disclose how they operate Athena service. However, as Athena is managed PrestoDB the overall design can be deduced based on that.
PrestoDB does not require cluster manager like YARN, Messos. It has own planner and scheduler that is able to run SQL physical plan on worker nodes.
I assume that AWS within each availability zone maintains PrestoDB coordinator connected to data catalog(AWS Glue) and set of presto worker. Workers are elastic and autoscaled. In case of inactivity, they're downscaled, but when the burst of activity occurs new workers added to the cluster.
I have an AWS account which is used for development. Because the developers are in one timezone, we switch off the resources after hours to conserve usage.
Is it possible to temporarily switch off nodes in elasticache cluster? all i found in cli reference was 'delete cluster':
http://docs.aws.amazon.com/cli/latest/reference/elasticache/index.html
ElastiCache clusters cannot be stopped. They can only be deleted and recreated. You can use this pattern to avoid paying for time when you're not using the cluster.
If you are using a Redis ElastiCache cluster, you can create a snapshot as the cluster is being deleted. Then, you can restore the cluster from the snapshot when you create it. This way, you preserve the data in the cluster.
The cluster endpoints are derived from a combination of
the cluster IDs,
the region,
the AWS account.
So as long as you delete and re-create clusters with those parts being constant, then the clusters will maintain the same endpoint.
At this time there is not a way to STOP and EMR cluster in the same
sense you can with EC2 instances. The EMR cluster uses instance-store
volumes and the EC2 start/stop feature relies on the use of EBS
volumes which are not appropriate for high-performance, low-latency
HDFS utilization.
The best way to simulate this behavior is to store the data in S3 and
then just ingest as a start up step of the cluster then save back to
S3 when done.
Documentation Reference:
https://forums.aws.amazon.com/thread.jspa?threadID=149772
Hope it helps.
EDIT1:
If you want to maintain the same dns, you can use the API/CLI to update the elastic cluster.
Reference:
http://docs.aws.amazon.com/cli/latest/reference/es/update-elasticsearch-domain-config.html
Hope it helps.