I have a .sql file filled with Athena queries.
Is there a way I can tell Athena to run the sql queries saved in s3://my-bucket/path/to/queries.sql?
In MySQL can do something like this (based in SO answer), but curious if possible in Athena
mysql> source \home\user\Desktop\test.sql;
Is there a way I can tell Athena to run the sql queries saved in s3://my-bucket/path/to/queries.sql?
I think there is no direct way to tell Athena to run query stored in S3.
In MySQL can do something like this (based in SO answer), but curious if possible in Athena.
If you want to do it at all, then yes, you should be able to run the query using AWS CLI.
Your steps should be look like this.
Get the query from S3 using CLI and store in temp variable
Pass the query stored in a temp variable to Athena Query CLI
Hope this will help.
Related
I have been working with AWS Athena for a while and need to do create a backup and version control of the views. I'm trying to build an automation for the backup to run daily and get all the views.
I tried to find a way to copy all the views created in Athena using boto3, but I couldn't find a way to do that. With Dbeaver I can see and export the views SQL script but from what I've seen only one at a time which not serve the goal.
I'm open for any way.
I try to find answer to my question in boto3 documentation and Dbeaver documentation. read thread on stack over flow and some google search did not took me so far.
Views and Tables are stored in the AWS Glue Data Catalog.
You can Query the AWS Glue Data Catalog - Amazon Athena to obtain information about tables, partitions, columns, etc.
However, if you want to obtain the DDL that was used to create the views, you will probably need to use SHOW CREATE TABLE [db_name.]table_name:
Analyzes an existing table named table_name to generate the query that created it.
Have you tried using get_query_results in boto3? get_query_results
Is it possible to get the last updated timestamp of a Athena Table stored as a CSV file format in S3 location using Spark SQL query.
If yes, can someone please provide more information on it.
There are multiple ways to do this.
Use the athena jdbc driver and do a spark read where the format is jdbc. In this read you will provide your "select max(timestamp) from table" query. Then as the next step just save to s3 fcrom the spark dataframe
You can skip the jdbc read altogther and just use boto3 to run the above query. It would be a combination of start_query_execution and get_query_results. You can then save this to s3 as well.
I've crawled a couple of XML files on S3 using AWS Glue, using a simple XML classifier:
However, when I try running any query on that data using AWS Athena, I get the following error (note that it's the simplest possible query I'm doing here):
HIVE_UNKNOWN_ERROR: Unable to create input format
Note that Athena can see my tables and it can see the columns, it just can't query them:
I noticed that there is someone with the same problem on the AWS Discussion forums: Athena XML Query Give HIVE Unknown Error but it got no love from anyone.
I know there is a similar question here about this error but the query in question targeted an RDS database, unlike an S3 bucket like I have here.
Has anyone got a solution for this?
Sadly at this time 12/2018 Athena cannot query XML input which is hard to understand when you may hear that Athena along with AWS Glue can query xml.
What output you are seeing from the AWS crawler is correct though, just not what you think its doing! For example after your crawler has run and you see the tables, but cannot execute any Athena queries. Go into your AWS Glue Catalog and at the right click tables, click your table, edit properties it will look something like this:
Notice how input format is null? If you have any other tables you can look at their properties or refer back to the input formatters documentation for Athena. This is the error you recieve.
Solutions:
convert your data to text/json/avro/other supported formats prior to upload
create a AWS glue job which converts a source to target from xml to target supported Athena format(compressed hopefully with ORC/Parquet)
I have a column in Dynamo DB which is backed up in S3, and is something like this:
{"attributes":
{"m":{"isBirthday":{"s":"0"},
"party":{"m":{"cake":{"bOOL":false},"pepsi":{"bOOL":false},"chips":{"bOOL":false},"fries":{"bOOL":false},"puffs":{"bOOL":false}}},"gift":{"s":"yes"}}},
"createdDate":{"n":"1521772435189"},
"modifiedDate":{"n":"1521772435189"}}
I need to use this S3 file in Athena and run query on the items in this "attributes".
I have tried multiple options to map it and then retrieve the values. However it doesnot work.
Please someone let me know how do I map it to my athena table to run query on this.
I created a database and some tables with Data on AWS Athena and would like to rename the database without deleting and re-creating the tables and database. Is there a way to do this? I tried the standard SQL alter database but it doesn't seem to work.
thanks!
I'm afraid there is no way to do this according to this official forum thread. You would need to remove the database and re-create it. However, since Athena does not store any data by itself, deleting a table or a database won't impact your data stored on S3. Therefore, if you kept all the scripts that create external tables, re-creating a database should be fairly quick thing to do.
Athena doesn't support renaming database. You need to recreate database with a new name.
You can use Presto which is an open source version of Athena and Presto supports more DDL queries.