How can I save the SQL script of an AWS Athena view with boto3/Python?

I have been working with AWS Athena for a while and need to create a backup and version control of the views. I'm trying to build an automated backup that runs daily and captures all the views.
I tried to find a way to copy all the views created in Athena using boto3, but I couldn't find one. With DBeaver I can see and export a view's SQL script, but from what I've seen only one at a time, which does not serve the goal.
I'm open to any approach.
I tried to find an answer in the boto3 and DBeaver documentation, read threads on Stack Overflow, and did some Google searching, but none of that took me very far.

Views and Tables are stored in the AWS Glue Data Catalog.
You can query the AWS Glue Data Catalog from Amazon Athena to obtain information about tables, partitions, columns, etc.
However, if you want to obtain the DDL that was used to create the views, you will probably need to use SHOW CREATE TABLE [db_name.]table_name (or SHOW CREATE VIEW for views):
Analyzes an existing table named table_name to generate the query that created it.
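
A minimal boto3 sketch of that approach, assuming a database named my_database and an S3 output location for Athena query results (both placeholders, as is the local file path); it lists the views through the Glue Data Catalog and saves each view's DDL to a local .sql file:

import time
import boto3

DATABASE = "my_database"            # placeholder database name
OUTPUT = "s3://my-athena-results/"  # placeholder Athena output location

athena = boto3.client("athena")
glue = boto3.client("glue")

def run_query(sql):
    # Start an Athena query and block until it reaches a terminal state.
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query failed: {sql}")
    return qid

# Athena views are registered in the Glue Data Catalog as VIRTUAL_VIEW.
views = []
for page in glue.get_paginator("get_tables").paginate(DatabaseName=DATABASE):
    views += [t["Name"] for t in page["TableList"] if t.get("TableType") == "VIRTUAL_VIEW"]

# Fetch the DDL of each view and write it out, one .sql file per view.
for view in views:
    qid = run_query(f"SHOW CREATE VIEW {view}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    ddl = "\n".join(row["Data"][0].get("VarCharValue", "") for row in rows)
    with open(f"{view}.sql", "w") as f:
        f.write(ddl + ";\n")

Running this on a schedule (cron, EventBridge + Lambda, etc.) and committing the resulting files to git would give you the daily backup and version control described in the question.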

Have you tried using get_query_results in boto3?
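
Note that get_query_results only returns data once the query has finished, so the usual pattern is start_query_execution, then poll get_query_execution, then read the rows. A bare-bones sketch; the view, database, and result bucket names are placeholders:

import time
import boto3

athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString="SHOW CREATE VIEW my_view",                             # placeholder view
    QueryExecutionContext={"Database": "my_database"},                  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder bucket
)["QueryExecutionId"]

# Poll until Athena reports a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Each row of a SHOW statement holds one line of output in its first column.
for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
    print(row["Data"][0].get("VarCharValue", ""))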

Related

Is it possible to retrieve the Timestream schema via the API?

The AWS console for Timestream can show you the schema for tables, but I cannot find any way to retrieve this information programmatically. While the schema is dynamically created from the write requests, it would be useful to check the schema to validate that writes are going to generate the same measure types.
As shown in this document, you can run the command below; it will return the table's schema.
DESCRIBE "database"."tablename"
With the right profile set up in your environment, you can run this query from the CLI with the command:
aws timestream-query query --query-string 'DESCRIBE "db-name"."tb-name"' --no-paginate
This should return the table schema.
You can also retrieve the same data by running this query from code in TypeScript/JavaScript, Python, Java, or C#, using the Timestream Query client from the AWS SDK.
NOTE:
If the table is empty, this query will return an empty result, so make sure you have data in your db/table before querying for its schema, ok? Good luck!
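
For what it's worth, the same DESCRIBE query can be issued from Python through boto3's timestream-query client; the database and table names below are placeholders:

import boto3

tsq = boto3.client("timestream-query")

# DESCRIBE returns one row per column: its name, type, and whether it
# is a dimension, the measure name, or a measure value.
resp = tsq.query(QueryString='DESCRIBE "my-database"."my-table"')
for row in resp["Rows"]:
    print([cell.get("ScalarValue") for cell in row["Data"]])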

Copy tables and views from Athena

Hi, I need to copy tables and views from one Athena instance to another (I'm not using Glue). How can I do this via the AWS Console or via the boto3/pyathena API without losing any data? Can't find anything in the documentation :(
Athena tables are built on top of files stored in S3. If you need the data, you will have to export those files.
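
That said, the table and view definitions normally live in the Glue Data Catalog even if you don't run any Glue jobs, so one possible approach is to copy the catalog entries with boto3 and copy the underlying S3 files separately. A rough sketch, assuming one set of credentials can reach both catalogs and the target database already exists; the database names and regions are placeholders:

import boto3

SRC_DB, DST_DB = "source_db", "target_db"  # placeholder database names

src = boto3.client("glue", region_name="us-east-1")  # placeholder source region
dst = boto3.client("glue", region_name="eu-west-1")  # placeholder target region

# TableInput accepts only a subset of the fields get_tables returns,
# so strip the read-only metadata before re-creating each table.
ALLOWED = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "ViewOriginalText", "ViewExpandedText",
    "TableType", "Parameters",
}

for page in src.get_paginator("get_tables").paginate(DatabaseName=SRC_DB):
    for table in page["TableList"]:
        table_input = {k: v for k, v in table.items() if k in ALLOWED}
        dst.create_table(DatabaseName=DST_DB, TableInput=table_input)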

Have you managed to get results when running the query SELECT * FROM information_schema.views in AWS Athena?

Is there a bug in the information_schema.views implementation in AWS Athena?
I'm getting 0 rows returned when running the query SELECT * FROM information_schema.views in AWS Athena, even though the database I'm querying has tables and views in it.
Wanted to check if anyone else is facing the same issue.
I'm trying to fetch the view definition script for ALL views in the AWS Athena database as a single result set instead of using the SHOW CREATE VIEW statement.
You should use
SHOW TABLES IN example_database;
to list the tables (views included) in an Athena database,
and then loop through them with a DESCRIBE query to fetch the details of each one; a sketch follows below.
Hope this will help!
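
A sketch of that loop with boto3; the database and output location are placeholders, and error handling is omitted for brevity:

import time
import boto3

DATABASE = "example_database"       # placeholder database name
OUTPUT = "s3://my-athena-results/"  # placeholder Athena output location

athena = boto3.client("athena")

def run(sql):
    # Run an Athena query, wait for it to finish, and return its rows.
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT},
    )["QueryExecutionId"]
    while athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
        time.sleep(1)
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

# SHOW TABLES lists views as well; DESCRIBE then returns the details
# of each one in turn.
for row in run(f"SHOW TABLES IN {DATABASE}"):
    name = row["Data"][0].get("VarCharValue", "")
    print(f"-- {name}")
    for line in run(f"DESCRIBE {DATABASE}.{name}"):
        print(line["Data"][0].get("VarCharValue", ""))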

Can I run Athena query from sql file stored in S3

I have a .sql file filled with Athena queries.
Is there a way I can tell Athena to run the sql queries saved in s3://my-bucket/path/to/queries.sql?
In MySQL you can do something like this (based on an SO answer), but I'm curious if it's possible in Athena:
mysql> source \home\user\Desktop\test.sql;
Is there a way I can tell Athena to run the sql queries saved in s3://my-bucket/path/to/queries.sql?
I think there is no direct way to tell Athena to run a query stored in S3.
In MySQL you can do something like this (based on an SO answer), but curious if possible in Athena.
If you want to do it at all, then yes, you should be able to run the query using the AWS CLI.
Your steps would look like this:
Get the query from S3 using the CLI and store it in a temp variable
Pass the query stored in the temp variable to the Athena query CLI
Hope this will help; a boto3 version of the same two steps is sketched below.
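
A minimal sketch of those two steps, assuming the file holds semicolon-separated statements; the bucket, key, database, and output location are placeholders, and note that Athena accepts only one statement per execution:

import time
import boto3

s3 = boto3.client("s3")
athena = boto3.client("athena")

# Step 1: fetch the .sql file from S3 (placeholder bucket and key).
body = s3.get_object(Bucket="my-bucket", Key="path/to/queries.sql")["Body"].read().decode("utf-8")

# Step 2: Athena runs one statement at a time, so split on ';' and submit
# each non-empty statement. (Naive split: breaks on ';' inside strings.)
for statement in (s.strip() for s in body.split(";")):
    if not statement:
        continue
    qid = athena.start_query_execution(
        QueryString=statement,
        QueryExecutionContext={"Database": "my_database"},                  # placeholder
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder
    )["QueryExecutionId"]
    # Wait for each statement to finish before running the next one.
    while athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
        time.sleep(1)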

AWS Glue: Do I really need a Crawler for new content?

What I understand from the AWS Glue docs is that a crawler will help crawl and discover new data. However, I noticed that once I have crawled once, if new data goes into S3, the data is already discovered when I query the Data Catalog from Athena, for example. So, can I say I do not need a crawler to crawl every time new data is added, unless there are new schemas?
In fact, if I know the schema of the files, I can just manually create the table and do without a crawler, am I correct?
If data is partitioned by some keys (placed in sub-folders, like /data/year=2018/month=11/day=2), then you need a crawler to register newly added partitions (i.e. /day=3) in the Data Catalog to be able to query them via Athena.
However, if the data is not partitioned, or comes into already registered partitions, then there is no need to run a crawler.
As an alternative to running a crawler, you can discover and register new partitions by running the Athena command MSCK REPAIR TABLE <table>, or by registering them manually (sketched below).
The easiest way to create a table in the Data Catalog is running a crawler. But if you know the schema and have the patience to compose a CREATE TABLE Athena query, or to fill in all the fields via the AWS Glue console, then you can go that way as well.
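
If you'd rather register a single new partition by hand, it is one Athena DDL statement. A sketch with boto3 using the folder layout from the example above; the table, database, and bucket names are placeholders:

import boto3

athena = boto3.client("athena")

# Register the newly arrived /data/year=2018/month=11/day=3 folder
# without running a crawler.
athena.start_query_execution(
    QueryString=(
        "ALTER TABLE my_table ADD IF NOT EXISTS "
        "PARTITION (year='2018', month='11', day='3') "
        "LOCATION 's3://my-bucket/data/year=2018/month=11/day=3/'"
    ),
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)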
If you have the schema, then you don't need to use the crawler, and you might get better results (the crawler assumes partition columns are strings, for example).
As Yuriy says, remember to run MSCK REPAIR TABLE or register new partitions manually.
MSCK can time out if you've added a lot of partitions. If it does, keep running it until it completes normally.