Has anyone tried connecting to AWS Athena from Oracle Data Integrator?
I have been trying this for a long time but am not able to find the appropriate JDBC connection string.
These are the steps I have followed, from
https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html#jdbc-url-format
Downloaded the AthenaJDBC42_2.0.7.jar driver from AWS
Copied it into the userlib directory of ODI
Created a new technology in ODI
Tried to add a data server, but was not able to form the JDBC URL.
Sample JDBC URL format (which isn't working):
jdbc:awsathena://AwsRegion=[Region];User=[AccessKey];Password=[SecretKey];S3OutputLocation=[Output];
Please can anyone help? Thanks.
This is a shorter version of the JDBC code I implemented for Athena. This was just a POC, and we want to go with the AWS SDK rather than JDBC, though that is less important here.
package com.poc.aws.athena;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class AthenaJDBC {

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Load the Simba Athena JDBC driver.
        Class.forName("com.simba.athena.jdbc.Driver");
        // Replace the region, keys, and S3 output location with your own values.
        Connection connection = DriverManager.getConnection(
                "jdbc:awsathena://AwsRegion=us-east-1;User=EXAMPLEKEY;"
                + "Password=EXAMPLESECRETKEY;S3OutputLocation=s3://example-bucket-name-us-east-1;");
        Statement statement = connection.createStatement();
        // ExampleConstants.ATHENA_SAMPLE_QUERY comes from the AWS JDBC sample code;
        // any valid SQL string works here.
        ResultSet queryResults = statement.executeQuery(ExampleConstants.ATHENA_SAMPLE_QUERY);
        System.out.println(queryResults.next());
    }
}
The only important point here is the URL:
jdbc:awsathena://AwsRegion=us-east-1;User=EXAMPLEKEY;Password=EXAMPLESECRETKEY;S3OutputLocation=s3://example-bucket-name-us-east-1;
us-east-1 must be replaced with your actual region, e.g. us-west-1.
EXAMPLEKEY must be replaced with your AWS access key that has Athena access.
EXAMPLESECRETKEY must be replaced with your AWS secret key that has Athena access.
example-bucket-name-us-east-1 must be replaced with an S3 bucket that the above keys have write access to.
There are other keys the Simba driver supports, but they are less important here.
I hope this helps.
Sorry, I missed posting an answer on this.
It all worked fine after configuring an Athena JDBC connection in ODI as below and providing the four key values when connecting.
JDBC URL: jdbc:awsathena://athena.eu-west-2.amazonaws.com:443;AWSCredentialsProviderArguments=ACCESSKEYID,SECRETACCESSKEY,SESSIONTOKEN
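For reference, the same URL can be exercised outside ODI with plain JDBC, which helps separate driver issues from ODI configuration issues. A minimal sketch, assuming the AthenaJDBC42 jar is on the classpath; the endpoint and the three credential values are placeholders, and Athena may additionally require an S3OutputLocation depending on your driver settings:

import java.sql.Connection;
import java.sql.DriverManager;

public class AthenaUrlCheck {
    public static void main(String[] args) throws Exception {
        // Simba Athena driver from AthenaJDBC42_2.0.7.jar.
        Class.forName("com.simba.athena.jdbc.Driver");
        // Same URL format that worked in ODI; ACCESSKEYID, SECRETACCESSKEY and
        // SESSIONTOKEN are placeholders to replace with real values.
        String url = "jdbc:awsathena://athena.eu-west-2.amazonaws.com:443;"
                + "AWSCredentialsProviderArguments=ACCESSKEYID,SECRETACCESSKEY,SESSIONTOKEN";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}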
I am using JDBC to connect to Athena for a specific workgroup, but it redirects to the primary workgroup by default.
Below is the code snippet:
Properties info = new Properties();
info.put("user", "access-key");
info.put("password", "secret-access-key");
info.put("WorkGroup", "test");
info.put("schema", "testschema");
info.put("s3_staging_dir", "s3://bucket/athena/temp");
info.put("aws_credentials_provider_class", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");

Class.forName("com.simba.athena.jdbc.Driver");
Connection connection = DriverManager.getConnection("jdbc:awsathena://athena.<region>.amazonaws.com:443/", info);
As you can see, I am using "WorkGroup" as the key for the properties. I also tried "workgroup" and "work-group". It does not redirect to the specified workgroup; it always goes to the default one, i.e. the primary workgroup.
Kindly help. Thanks.
If you look at the release notes of the Athena JDBC driver, workgroup support was added in v2.0.7.
If your jar is below this version, it will not work. Try upgrading the library to 2.0.7 or above.
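As a quick check with a 2.0.7+ jar, a minimal sketch of passing the workgroup, assuming the property name Workgroup (the spelling used by the driver documentation); the keys, bucket, and region are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class AthenaWorkgroupCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("com.simba.athena.jdbc.Driver");
        Properties info = new Properties();
        info.put("user", "access-key");
        info.put("password", "secret-access-key");
        info.put("Workgroup", "test"); // only honored from driver 2.0.7 onwards
        info.put("s3_staging_dir", "s3://bucket/athena/temp");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:awsathena://athena.us-east-1.amazonaws.com:443/", info)) {
            // Run a trivial query, then confirm in the Athena console's query
            // history which workgroup it actually ran under.
            conn.createStatement().executeQuery("SELECT 1");
        }
    }
}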
You also need to enable the "Override client-side settings" option in the workgroup, then rerun the query via JDBC.
Check this doc for more information.
I want to import the CSVs in my S3 bucket into my MySQL RDS instance using JDBC. It is a one-time process, not an ongoing one. I am interested in knowing the end-to-end process.
Since you mentioned it is a one-time activity, I would suggest a direct CSV import into MySQL rather than using JDBC, unless there is a specific reason that you have not mentioned in the question.
Here is an approach you could use:
Loop over your files in S3, then use the following command to import the data into MySQL RDS (a Java sketch of the whole loop follows this answer):
mysqlimport [options] db_name textfile1 [textfile2 ...]
Please refer to the following for more details:
https://dev.mysql.com/doc/en/mysqlimport.html
I hope this gives you a way to move forward. If I'm missing something, reframe your question and I can reattempt the answer.
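Since the question mentions Java, here is a hedged sketch of that loop: list the CSVs with the AWS SDK for Java (v1), download each one, and shell out to mysqlimport. The bucket, prefix, RDS endpoint, and credentials are hypothetical placeholders; note that mysqlimport derives the target table name from the file name (minus extension):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import java.io.File;

public class S3CsvToMysql {
    public static void main(String[] args) throws Exception {
        String bucket = "example-bucket"; // hypothetical bucket name
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Loop over the CSV files under a prefix (first page of results only).
        for (S3ObjectSummary obj : s3.listObjectsV2(bucket, "csv/").getObjectSummaries()) {
            // mysqlimport loads each file into the table named after the file,
            // so customers.csv must target a table called customers.
            File local = new File("/tmp", new File(obj.getKey()).getName());
            s3.getObject(new GetObjectRequest(bucket, obj.getKey()), local);
            new ProcessBuilder("mysqlimport",
                    "--host=example.rds.amazonaws.com", // hypothetical RDS endpoint
                    "--user=admin", "--password=secret",
                    "--local", "--fields-terminated-by=,", "--ignore-lines=1",
                    "db_name", local.getAbsolutePath())
                    .inheritIO().start().waitFor();
        }
    }
}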
I am trying to connect Athena with Apache Zeppelin. I need to handle secret_key, Access_key, and Session_token, and I am finding it hard to establish the connection with the Zeppelin JDBC interpreter.
I am following the steps mentioned in this blog.
If anyone can help me establish the connection with the AWS session token approach, that would be helpful.
Thank You
The main docs for this are here:
https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html
I found there are two driver versions, 1.1.0 and 1.0.1. I could only get Zeppelin working with 1.1.0, and the links on that page don't go to that file; the only way to get it was using the aws s3 cp command, e.g.:
aws s3 cp s3://athena-downloads/drivers/AthenaJDBC41-1.1.0.jar .
although I've given feedback on that page, so it should be fixed soon.
Regarding the parameters: use default.user for the Access_Key and default.password for the Secret_key. default.driver should be com.amazonaws.athena.jdbc.AthenaDriver.
default.s3_staging_dir is the bucket where CSV results are written, so it needs to match your Athena settings.
There is no mention of where you might put a session token; however, you could always try putting it on the JDBC connection string (which goes in the default.url parameter value), e.g.:
jdbc:awsathena://athena.{REGION}.amazonaws.com:443?SessionToken=blahblahsomethingrealsessiontokengoeshere
but of course, replace {REGION} with the actual AWS region and use your real session token.
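To sanity-check those settings outside Zeppelin, here is a minimal sketch using the 1.1.0 driver with the same parameters the interpreter would receive. The keys and staging bucket are placeholders, and the SessionToken URL parameter is only the untested suggestion from above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class AthenaSessionTokenCheck {
    public static void main(String[] args) throws Exception {
        // Driver class from AthenaJDBC41-1.1.0.jar, matching default.driver above.
        Class.forName("com.amazonaws.athena.jdbc.AthenaDriver");
        Properties props = new Properties();
        props.put("user", "ACCESS_KEY");                         // placeholder
        props.put("password", "SECRET_KEY");                     // placeholder
        props.put("s3_staging_dir", "s3://example-bucket/tmp/"); // must match Athena settings
        // SessionToken on the URL is the untested suggestion from above.
        String url = "jdbc:awsathena://athena.us-east-1.amazonaws.com:443"
                + "?SessionToken=blahblahsomethingrealsessiontokengoeshere";
        try (Connection conn = DriverManager.getConnection(url, props)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}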
I'm using the latest version of IntelliJ and I've just created a cluster in Amazon Redshift. How do I connect IntelliJ to Redshift so that I can query it from my favorite IDE?
1. Download a JDBC driver: http://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html#download-jdbc-driver
2. In IntelliJ: View | Tool Windows | Database
3. Click "Data Source Properties".
4. Click Add (+) and select "Database Driver".
5. Uncheck "JDBC drivers", add the JDBC driver, select a class from the dropdown, and select the PostgreSQL dialect.
6. Add a new connection, and use this data source for your connection (+ | Data Source | RedShift).
7. Set URL templates:
jdbc:redshift://[{host::localhost}[:{port::5439}]][/{database::postgres}?][\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
jdbc:redshift://\[{host:ipv6:\:\:1}\][:{port::5439}][/{database::postgres}?][\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
jdbc:redshift:{database::postgres}[\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
You can connect IntelliJ to Redshift using the JDBC driver supplied by Amazon. In the Redshift console, go to "Connect Client" to get the driver.
Then, in the IntelliJ Data Source window, add the JAR as a Driver file, and use the following settings:
Class: com.amazon.redshift.jdbc41.Driver
URL template: jdbc:redshift://{host}:{port}/{database}
Common Pitfalls:
If the driver file is not readable or marked as in quarantine by OS X, you won't be able to select the driver class.
For a more detailed guide, see this blog post: Connecting IntelliJ to Redshift
Note: There is no native Redshift support in IntelliJ yet. IntelliJ Issue DBE-1459
Update for 2019: I've just created a PostgreSQL connection and then filled in the usual Redshift settings (don't forget port 5439); no need to download Amazon's JDBC driver.
The only small issue is that the syntax check doesn't know Redshift-specific syntax such as AS and some functions, but queries execute correctly.
Update for 2020: PyCharm (and possibly all other JetBrains IDEs) now supports connecting to Redshift through IAM AWS credentials without manual driver installation.
Here are the detailed setup instructions:
Grant a redshift:GetClusterCredentials permission to your AWS user. Either create and attach a new policy (docs) or use an existing one such as AmazonRedshiftFullAccess (not recommended: too permissive).
Create an AWS access key (access key id + secret access key pair) for your user (docs).
Create a text configuration file ~/.aws/credentials (no extension) with the following content (docs):
[default] # arbitrary profile name, will be used later
region = <your region>
aws_access_key_id = <your access key id> # created on the previous step
aws_secret_access_key = <your secret access key>
Create a new PyCharm database connection of type Amazon Redshift and set it up (docs):
Choose connection type = IAM cluster/region (right under the «General» tab of the connection settings window).
Authentication = AWS Profile
User = {your AWS login}
Profile = default or the one you have used in credentials file.
The credentials can possibly be provided through AccessKeyID/SecretAccessKey connection settings on the «Advanced» tab but it did not work for me (due to NullPointerException if Profile field is empty).
Database = {your database}; choose an existing one to avoid nondescriptive errors from the driver.
Region = {your region}
Cluster = {cluster name}, get it from Redshift AWS console.
Set up the connection:
Check necessary databases in the «Schemas» tab.
«Advanced» tab: AutoCreate = true (literal lowercase true as the setting value). This will automatically create a new database user with your AWS login.
Test connection.
I currently have a file in S3. I would like to issue commands using the Java AWS SDK, to take this data and place it into a RedShift table. If the table does not exist I would like to also create the table. I have been unable to find any clear examples on how to do this so I am wondering if I am going about it the wrong way? Should I be using standard postgres java connectors instead of the AWS SDK?
Connect (http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-in-code.html#connecting-in-code-java) and submit your CREATE TABLE and COPY commands.
The answer above serves most of the purpose.
I would like to post working Java JDBC code that does exactly this: COPY from S3 into a Redshift table. I hope it will help others.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class RedShiftJDBC {
    public static void main(String[] args) {
        Connection conn = null;
        Statement statement = null;
        try {
            // The PostgreSQL driver works too; in that case use the postgresql URL
            // below instead of the redshift one.
            // Class.forName("org.postgresql.Driver");
            // Make sure the appropriate Redshift JDBC driver jar is on the classpath.
            Class.forName("com.amazon.redshift.jdbc42.Driver");
            Properties props = new Properties();
            props.setProperty("user", "username***");
            props.setProperty("password", "password****");
            System.out.println("\n\nconnecting to database...\n\n");
            // In case you are using the PostgreSQL JDBC driver:
            // conn = DriverManager.getConnection("jdbc:postgresql://********8-your-to-redshift.redshift.amazonaws.com:5439/example-database", props);
            conn = DriverManager.getConnection("jdbc:redshift://********url-to-redshift.redshift.amazonaws.com:5439/example-database", props);
            System.out.println("\n\nConnection made!\n\n");
            statement = conn.createStatement();
            String command = "COPY my_table from 's3://path/to/csv/example.csv' CREDENTIALS 'aws_access_key_id=******;aws_secret_access_key=********' CSV DELIMITER ',' ignoreheader 1";
            System.out.println("\n\nExecuting...\n\n");
            statement.executeUpdate(command);
            // You must commit if you really want the data saved; otherwise it will
            // not be visible when you query from another session.
            conn.commit();
            System.out.println("\n\nThat's all: COPY using simple JDBC.\n\n");
            statement.close();
            conn.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}