What query to run to determine Amazon Athena version?

I'd like to determine what version of Amazon Athena I'm connected to by running a query. Is this possible? If so, what is the query?
Searching Google, SO, and the AWS docs has not turned up an answer.

Amazon Redshift launches as a cluster, with virtual machines dedicated to that specific cluster. The cluster must be explicitly upgraded between versions because it runs continuously and is accessible by only one AWS account. Think of it as software running on your own virtual machines.
From Amazon Redshift Clusters:
Amazon Redshift provides a setting, Allow Version Upgrade, to specify whether to automatically upgrade the Amazon Redshift engine in your cluster if a new version of the engine becomes available.
Amazon Athena, however, is a fully-managed service. There is no cluster to be created -- you simply provide your query and it uses the metastore to know where to find data. Think of it just like Amazon S3 -- many servers provide access to multiple AWS customers simultaneously.
From Amazon Athena – Interactive SQL Queries for Data in Amazon S3:
Behind the scenes, Athena parallelizes your query, spreads it out across hundreds or thousands of cores, and delivers results in seconds.
As a fully-managed service, there is only ever one version of Amazon Athena, which is the version that is currently available.
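To make the "you simply provide your query" point concrete, here is a minimal sketch of submitting a query with boto3; the region, database, and S3 output location are placeholders, not values from the question:

    import time
    import boto3

    # Athena is fully managed: you submit SQL, poll for completion, and
    # fetch results. Region, database, and output bucket are placeholders.
    athena = boto3.client("athena", region_name="us-east-1")

    response = athena.start_query_execution(
        QueryString="SELECT 1",
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )

    query_id = response["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)
        print(rows["ResultSet"]["Rows"])

Notice there is nothing version-related to ask for anywhere in this flow.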

Related

Implement incremental load from AWS RDS SQL Server to Amazon S3

I need to transfer data from a DWH (AWS RDS MS SQL Server) to Amazon S3. Data in the DWH can be updated, not only added, and it changes every 10-15 minutes. Has anyone built such a pipeline?
AWS Database Migration Service can help you with that; it also supports ongoing replication, and SQL Server is fully supported.
There's a full walkthrough solution here: https://dms-immersionday.workshop.aws/en/sqlserver-s3.html
They use SQL Server on EC2, but you can change the source to RDS.
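If you script the DMS side, a rough sketch with boto3 might look like the following; the ARNs and table mapping are placeholders and assume the source/target endpoints and replication instance already exist:

    import boto3

    dms = boto3.client("dms", region_name="us-east-1")

    # "full-load-and-cdc" copies the existing data once, then keeps
    # replicating ongoing changes (inserts and updates) to the target.
    # All ARNs below are placeholders for resources created beforehand.
    task = dms.create_replication_task(
        ReplicationTaskIdentifier="dwh-to-s3",
        SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
        TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
        ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
        MigrationType="full-load-and-cdc",
        TableMappings=(
            '{"rules": [{"rule-type": "selection", "rule-id": "1", '
            '"rule-name": "all-dbo-tables", "object-locator": '
            '{"schema-name": "dbo", "table-name": "%"}, '
            '"rule-action": "include"}]}'
        ),
    )

    dms.start_replication_task(
        ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
        StartReplicationTaskType="start-replication",
    )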

Fail back to AWS from Azure Site Recovery

We are implementing a solution where we replicate EC2 instances (VMs) from AWS to Azure using Azure Site Recovery. Please note we are not migrating to Azure; we only want to set up replication from AWS to Azure for disaster recovery purposes.
As per the article:
AWS instances are treated like physical, on-premises computers.
Once we have set everything up, like enabling replication, we should be able to see the replicated VMs under Replicated Items in ASR. As per my understanding, when we run a failover, Azure VMs are created from the replicated data in Azure storage.
Now, when the primary AWS site is available again, what happens when we want to fail back to AWS?
As we are treating/considering AWS instances as physical, as per the article:
"Failback from Azure to an on-premises physical server isn't supported. You can only fail back to a VMware virtual machine."
Now the question is: will we be able to fail back to the primary AWS site the way we can fail back to VMware?
I answered your query here: https://learn.microsoft.com/en-us/answers/questions/51568/fail-back-to-aws-from-azure-site-recovery.html#answer-51979
Quoting from the link:
Only failover is supported for AWS machines. As you have already mentioned in the quote above, failback is not supported for AWS machines using Azure Site Recovery.
As long as Azure receives data from the AWS machines, it will process it into recovery points, and those points will be available for you to fail over to Azure, enabling business continuity. In your case specifically, you can fail over to Azure, but going back to AWS will not be an option through Azure Site Recovery.

Which time series database to use for consistent dev prod (aws) parity

I deploy my app to AWS.
On AWS there is RDS, which supports industry-standard DBMSs like PostgreSQL/MySQL/Oracle.
These DBMSs can be made available on a development machine (Docker) as well, making it easy to achieve dev/prod parity.
I'm looking for a time-series-specialized database with which I can achieve dev/prod parity as well.
AWS has Timestream, which is specialized for time series, but I don't know of a local equivalent database for it.
There are probably some EC2-hosted databases that would work, but I'd prefer to be lazy and have Amazon take care of managing the database cluster for me.
What options do I have?
Apache Druid is a very good time-series database that can be deployed on local development environments and on multiple cloud environments easily.
Druid is offered as a fully managed cloud service, on AWS, by Imply.
The fully managed variant of Druid is called Imply Cloud.
More information: https://imply.io/product/imply-cloud
You should try Amazon Timestream.
It is a nonrelational, fully managed service built specifically to collect, store, and process time-series data. The arrival of masses of IoT data is expected to push time-series technology into wider use; that's why Amazon came out with Timestream.
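To give a feel for the API, here is a minimal sketch of writing one data point with boto3; the database, table, and dimension names are made-up examples:

    import time
    import boto3

    # Timestream splits its API into write and query clients; this is the write side.
    ts_write = boto3.client("timestream-write", region_name="us-east-1")

    # Database, table, and dimension names are made-up examples.
    ts_write.write_records(
        DatabaseName="iot",
        TableName="sensor_readings",
        Records=[{
            "Dimensions": [{"Name": "device_id", "Value": "sensor-42"}],
            "MeasureName": "temperature",
            "MeasureValue": "21.7",
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),  # epoch milliseconds
            "TimeUnit": "MILLISECONDS",
        }],
    )

Note that, unlike Druid, Timestream has no local or self-hosted variant, so on its own it does not solve the dev/prod parity part of the question.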

Does your Amazon Redshift database need be in the same region as your Machine Learning model?

When trying to use Amazon Redshift to create a datasource for my Machine Learning model, I encountered the following error when testing the access of my IAM role:
There is no '' cluster, or the cluster is not in the same region as your Amazon ML service. Specify a cluster in the same region as the Amazon ML service.
Is there any way around this? It would be a huge pain, since all of our development team's data is stored in a region that Machine Learning doesn't work in.
That's an interesting situation to be in.
What you can probably do:
1) Wait for Amazon Web Services to support Amazon ML in your preferred region (that could be a long wait, though).
2) Or create a backup plan for your Redshift data (see the sketch below). Amazon Redshift provides default tools to back up your cluster via snapshots to Amazon Simple Storage Service (Amazon S3). These snapshots can be restored in any AZ in that region or transferred automatically to other regions, wherever you want (in your case, the region where your ML is running).
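If you go the snapshot route, cross-region snapshot copying can be enabled per cluster. A rough sketch with boto3; the cluster identifier and regions are placeholders:

    import boto3

    redshift = boto3.client("redshift", region_name="us-west-2")

    # Automatically copy this cluster's snapshots to the region where
    # Amazon ML runs. Cluster identifier and regions are placeholders.
    redshift.enable_snapshot_copy(
        ClusterIdentifier="dev-cluster",
        DestinationRegion="us-east-1",
        RetentionPeriod=7,  # days to retain the copied snapshots
    )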
There is probably no other way to use Amazon ML with Redshift when they are in different regions.
Hope it helps!

Simplest way to get data from AWS mysql RDS to AWS Elasticsearch?

I have data in an AWS RDS database, and I would like to pipe it over to an AWS ES instance, preferably updating once an hour or so.
On my local machine, with a local mysql database and Elasticsearch database, it was easy to set this up using Logstash.
Is there a "native" AWS way to do the same thing? Or do I need to set up an EC2 server and install Logstash on it myself?
You can achieve the same thing with your local Logstash: simply point your jdbc input at your RDS database and the elasticsearch output at your AWS ES instance. If you need to run this regularly, then yes, you'd need to set up a small instance to run Logstash on.
A more "native" AWS solution to achieve the same thing would include the use of Amazon Kinesis and AWS Lambda.
Here's a good article explaining how to connect it all together, namely:
how to stream RDS data into a Kinesis Stream
how to configure a Lambda function to handle the stream
how to push the data to your AWS ES instance
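A hedged sketch of the Lambda piece, assuming the stream carries JSON rows: the Elasticsearch endpoint and index are placeholders, and a plain HTTP call stands in for whatever request signing or access policy your ES domain requires:

    import base64
    import json
    import urllib.request

    # Placeholder AWS ES endpoint and index. A real domain will typically
    # also need SigV4 request signing or an access policy for the Lambda role.
    ES_URL = "https://my-domain.us-east-1.es.amazonaws.com/rds-data/_doc"

    def handler(event, context):
        # Kinesis delivers record payloads base64-encoded; decode and index each.
        for record in event["Records"]:
            payload = base64.b64decode(record["kinesis"]["data"])
            doc = json.loads(payload)
            req = urllib.request.Request(
                ES_URL,
                data=json.dumps(doc).encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
            urllib.request.urlopen(req)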
Take a look at Amazon DMS. It's usually used for DB migrations; however, it also supports continuous data replication. This might simplify the process and be cost-effective.
You can use AWS Database Migration Service to perform continuous data replication. Continuous data replication has a multitude of use cases including Disaster Recovery instance synchronization, geographic database distribution and Dev/Test environment synchronization. You can use DMS for both homogeneous and heterogeneous data replications for all supported database engines. The source or destination databases can be located in your own premises outside of AWS, running on an Amazon EC2 instance, or it can be an Amazon RDS database. You can replicate data from a single database to one or more target databases or data from multiple source databases can be consolidated and replicated to one or more target databases.
https://aws.amazon.com/dms/