How to switch databases in AWS Athena using JDBC driver?

I am trying to execute a SQL statement against Athena using SQL Workbench. What would be the solution for switching databases in Athena, or more generally in Athena through the JDBC driver?
use AwsDataCatalog.geoosm
An error occurred when executing the SQL command: use AwsDataCatalog.geoosm
[Simba]AthenaJDBC An error has been thrown from the AWS Athena client.
line 1:19: mismatched input '.' expecting <EOF> [Execution ID not available]
[SQL State=HY000, DB Errorcode=100071]
1 statement failed.
Execution time: 0.18s
My SQL syntax comes from the Presto documentation, which, as I understand it, is the syntax used by Athena.
8.39. USE
Synopsis
USE catalog.schema
USE schema

The USE statement, which is supported in Presto, is not supported in Athena at this time.
For cross-database queries, however, the lower-case awsdatacatalog.geoosm actually works.
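Since USE is unavailable, the practical workaround is to fully qualify table names with the lower-case catalog. A minimal sketch, where planet is a hypothetical table name in the geoosm database:

-- Qualify the table as catalog.database.table instead of switching
-- databases with USE; note the lower-case catalog name.
-- "planet" is a hypothetical table name.
SELECT *
FROM awsdatacatalog.geoosm.planet
LIMIT 10;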

Related

AWS Athena JavaScript SDK - Create table from query result (CTAS) - Specify output format

I am trying to use the AWS JavaScript Node.js SDK to make a query using AWS Athena and store the results in a table in AWS Glue in Parquet format (not just a CSV file).
If I am using the console, it is pretty simple with a CTAS query:
CREATE TABLE tablename
WITH (
external_location = 's3://bucket/tablename/',
FORMAT = 'parquet')
AS
SELECT *
FROM source
But with the AWS Athena JavaScript SDK I am only able to set an output file destination using the Workgroup or Output parameters and make a basic SELECT query; the results are output to a CSV file and are not indexed properly in AWS Glue, which breaks the bigger process they are part of. If I try to call that query using the JavaScript SDK I get:
Table properties [FORMAT] are not supported.
I would be able to call that DDL statement using the Java SDK's JDBC driver connection option.
Is anyone familiar with a solution or workaround with the JavaScript SDK for Node.js?
There is no difference between running the SQL you posted in the Athena web console, the AWS SDK for JavaScript, the AWS SDK for Java, or the JDBC driver. None of these process the SQL themselves; only the Athena service reads it, so if the SQL works in one of them it will work in all of them.
Check your SQL and make sure you are really running the same statement in your code as you tried in the web console. If they are indeed the same, the error is somewhere else in your code, so post that too.
Update: the problem is the upper-case FORMAT. If you paste the code you posted into the Athena web console, it bugs out and doesn't run the query, but if you run it through the CLI or an SDK you get the error you posted. You did not run the same SQL in the console as in the SDK; if you had, you would have gotten the same error in both.
Use lower case format and it will work, as shown below.
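For reference, this is the CTAS from the question with only the property name changed to lower case; the bucket, table, and source names are the question's own placeholders:

CREATE TABLE tablename
WITH (
external_location = 's3://bucket/tablename/',
format = 'parquet') -- lower case: Athena treats the property name as case sensitive
AS
SELECT *
FROM source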
This is definitely a bug in Athena; these properties should not be case sensitive.

Can I run an Athena query from a SQL file stored in S3?

I have a .sql file filled with Athena queries.
Is there a way I can tell Athena to run the sql queries saved in s3://my-bucket/path/to/queries.sql?
In MySQL you can do something like this (based on an SO answer), but I am curious whether it is possible in Athena:
mysql> source \home\user\Desktop\test.sql;
Is there a way I can tell Athena to run the sql queries saved in s3://my-bucket/path/to/queries.sql?
I think there is no direct way to tell Athena to run a query stored in S3.
In MySQL you can do something like this (based on an SO answer), but curious if possible in Athena.
If you want to do it at all, then yes, you should be able to run the query using the AWS CLI. Your steps would look like this (see the sketch after this list):
1. Get the query from S3 using the CLI and store it in a temp variable.
2. Pass the query stored in the temp variable to the Athena query CLI.
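A minimal sketch of those two steps with the AWS CLI, assuming the file holds a single statement (start-query-execution runs one statement at a time); the database name and result location are placeholders:

# Step 1: stream the query text from S3 into a shell variable.
QUERY=$(aws s3 cp s3://my-bucket/path/to/queries.sql -)

# Step 2: hand that text to Athena; substitute your own database
# and output location.
aws athena start-query-execution \
  --query-string "$QUERY" \
  --query-execution-context Database=my_database \
  --result-configuration OutputLocation=s3://my-bucket/athena-results/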
Hope this will help.

How to execute a schema (database) rename in Athena?

I am trying to execute a SQL statement against Athena using SQL Workbench. I have executed several queries, so I know I have a working connection, in case that is the first question. What would be the solution for renaming a database in Athena, or maybe in Athena through the JDBC driver?
alter schema geoosm rename to geo_osm
An error occurred when executing the SQL command: alter schema geoosm rename to geo_osm
[Simba]AthenaJDBC An error has been thrown from the AWS Athena client.
line 1:24: mismatched input 'rename' expecting 'SET' [Execution ID not available]
[SQL State=HY000, DB Errorcode=100071]
1 statement failed.
Execution time: 0.27s
My SQL syntax comes from the Presto documentation, which, as I understand it, is the syntax used by Athena.
8.1. ALTER SCHEMA
Synopsis
ALTER SCHEMA name RENAME TO new_name
Sorry, but there is no way to rename a database in AWS Athena. Fortunately, table data and table definitions are two completely different things in Athena.
You can just create a new database with the right name, generate the DDL for each of your tables, and execute it against the new database.
The "new" tables in the new database will still point to the same location, so there is nothing to worry about.

AWS Athena - How to Parameterize the SQL query

I want to provide runtime values to the query in SELECT and CREATE TABLE statements. What are the ways to parameterize Athena SQL queries?
I tried the PREPARE and EXECUTE statements from Presto; however, they do not work in the Athena console. Do we need an external script, like Python, to call them?
PREPARE my_select1
FROM SELECT * FROM nation WHERE regionkey = ?;
EXECUTE my_select1 USING 1;
The SQL and HiveQL Reference documentation does not list PREPARE or EXECUTE as available commands.
You would need to fully construct your SELECT statement before sending it to Amazon Athena.
Update: you have to upgrade to Athena engine version 2; this now seems to be supported as of 2021-03-12, though I can't find an official announcement:
https://docs.aws.amazon.com/athena/latest/ug/querying-with-prepared-statements.html
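With engine version 2 the PREPARE/EXECUTE syntax from the question works, and a prepared statement can be dropped when no longer needed. A minimal sketch, assuming my_select1 has been prepared as in the question:

-- Run the prepared statement with a different binding...
EXECUTE my_select1 USING 2;

-- ...and drop it once finished.
DEALLOCATE PREPARE my_select1;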
Athena does not support parameterized queries. However, you can create user-defined functions that you can call in the body of a query; refer to the Athena UDF documentation to know more.
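A minimal sketch of calling a Lambda-backed UDF inline in a query; the function name lowercase and the Lambda name my-udf-lambda are hypothetical:

-- Declare the external (Lambda-backed) function for this query,
-- then call it in the SELECT.
USING EXTERNAL FUNCTION lowercase(input VARCHAR) RETURNS VARCHAR
LAMBDA 'my-udf-lambda'
SELECT lowercase('ATHENA');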

Storing error messages to Redshift through Data Pipeline

I am trying to run a SQL activity in a Redshift cluster through Data Pipeline. After the SQL activity, a few logs need to be written to a table in Redshift, such as the number of rows affected and the error message (if any).
Requirement:
If the SQL activity finishes successfully, the mentioned table should be written with the 'error' column as null;
if the SQL activity fails on any error, that particular error message needs to be written into the 'error' column of the Redshift table.
Can we achieve this through the pipeline? If yes, how?
Thanks,
Ravi.
Unfortunately you cannot do this directly with SqlActivity in Data Pipeline. The workaround is to write a Java program (or any executable) that does what you want and schedule it via Data Pipeline using ShellCommandActivity.
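A minimal sketch of such an executable in shell, assuming psql connectivity to the cluster and a pre-existing etl_log(ran_at, error) table; real code would also need to escape quotes in the captured error text:

# Run the activity's SQL file and capture only its error output
# ("activity.sql", $REDSHIFT_CONN, and "etl_log" are hypothetical).
ERR=$(psql "$REDSHIFT_CONN" -f activity.sql 2>&1 >/dev/null) || true

# Record the outcome: NULL in the error column on success,
# the captured message on failure.
psql "$REDSHIFT_CONN" -c \
  "INSERT INTO etl_log (ran_at, error) VALUES (getdate(), NULLIF('$ERR', ''));"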