Connect SAS to AWS Athena

I am trying to establish a connection between SAS & AWS Athena.
I am working on RHEL 6.7; the Java version is 1.8.0_71.
Could someone please advise how to configure this?
So far, after some reading of "Accessing Amazon Athena with JDBC", I have tried a naive 'maybe it will work' approach of setting up a DSN in the odbc.ini files (outside of SAS): I downloaded the Athena JDBC jar file and tried configuring the connection in a similar way to what I did for EMR.
odbc.ini:
[ODBC]
# Specify any global ODBC configuration here such as ODBC tracing.
[ODBC Data Sources]
ATHENA=Amazon Athena JDBC Driver
[ATHENA]
Driver=/opt/amazon/hiveodbc/lib/64/AthenaJDBC41-1.1.0.jar
HOST=jdbc:awsathena://athena.eu-west-1.amazonaws.com:443?s3_staging_dir=s3://aws-athena-query-results/sas/
odbcinst.ini:
[ODBC Drivers]
Amazon Athena JDBC Driver=Installed
[Amazon Athena JDBC Driver]
Description=Amazon Athena JDBC Driver
Driver=/opt/amazon/hiveodbc/lib/64/AthenaJDBC41-1.1.0.jar
## The option below is for using unixODBC when compiled with -DSQL_WCHART_CONVERT.
## Execute 'odbc_config --cflags' to determine if you need to uncomment it.
# IconvEncoding=UCS-4LE
iODBC throws the following:
iODBC Demonstration program
This program shows an interactive SQL processor
Driver Manager: 03.52.0709.0909
Enter ODBC connect string (? shows list): DSN=ATHENA
1: SQLDriverConnect = [iODBC][Driver Manager]/opt/amazon/hiveodbc/lib/64/AthenaJDBC41-1.1.0.jar: invalid ELF header (0) SQLSTATE=00000
2: SQLDriverConnect = [iODBC][Driver Manager]Specified driver could not be loaded (0) SQLSTATE=IM003
Any suggestions would be much appreciated!
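The "invalid ELF header" message hints at why this fails: the ODBC driver manager can only load a native shared library (.so), and AthenaJDBC41-1.1.0.jar is a Java JDBC driver, not an ODBC driver. As a rough, hedged sketch of what an ODBC-based setup might look like instead (the driver name, the .so path and the keys below are assumptions based on the Simba Athena ODBC driver, not a tested configuration):
odbcinst.ini:
[Simba Athena ODBC Driver]
Description=Simba Athena ODBC Driver
# Placeholder path -- point this at the installed native .so, not the JDBC jar
Driver=/opt/simba/athenaodbc/lib/64/libathenaodbc_sb64.so
odbc.ini:
[ATHENA]
Driver=Simba Athena ODBC Driver
AwsRegion=eu-west-1
S3OutputLocation=s3://aws-athena-query-results/sas/
AuthenticationType=IAM Credentials
With such a DSN in place, SAS/ACCESS to ODBC would typically be pointed at it with something along the lines of libname ath odbc datasrc=ATHENA; (assuming an ODBC engine, rather than JDBC, is licensed).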

Related

How to use SPARQL UPDATE UNLOAD to delete data from AWS Neptune DB

I want to use the UNLOAD command to unload data (a .ttl file) from the Neptune graph DB.
I also tried a curl command in a JupyterLab Python terminal against the Neptune graph DB. Could anyone please give an example of how to use the UNLOAD command? I am using this documentation:
https://docs.aws.amazon.com/neptune/latest/userguide/sparql-api-reference-unload.html
%%sparql
UNLOAD <https://s3.amazonaws.com/bucket-name/folder-name/file.ttl> FROM GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>
and also
!curl https://cluster-endpoint:8182/sparql \
-d "update=UNLOAD <https://s3.amazonaws.com/bucket-name/folder-name/file.ttl>"
In both cases it shows me an HTTP connect timeout while loading from S3...
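For reference, a minimal sketch of how the SPARQL 1.1 update endpoint is usually invoked with curl, with the update body URL-encoded; the cluster endpoint, bucket and graph are placeholders carried over from the question, and the connect timeout itself usually points at Neptune not being able to reach S3 (for example, a missing S3 VPC endpoint) rather than at the request syntax:
# Endpoint and S3 URL are placeholders; --data-urlencode encodes the SPARQL update body
curl -X POST https://cluster-endpoint:8182/sparql \
  --data-urlencode "update=UNLOAD <https://s3.amazonaws.com/bucket-name/folder-name/file.ttl> FROM GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>"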

How to connect to Amazon Athena using Simba ODBC

I am attempting to connect to Athena from RStudio using DBI::dbConnect, and I am having issues with opening the driver.
con <- DBI::dbConnect(
odbc::odbc(),
Driver = "[Simba Athena ODBC Driver]",
S3OutputLocation = "[s3://bucket-folder/]",
AwsRegion = "[region]",
AuthenticationType = "IAM Credentials",
Schema = "[schema]",
UID = rstudioapi::askForPassword("AWS Access Key"),
PWD = rstudioapi::askForPassword("AWS Secret Key"))
Error: nanodbc/nanodbc.cpp:983: 00000: [unixODBC][Driver Manager]Can't open lib '[Simba Athena ODBC Driver]' : file not found
In addition, this code returns nothing.
sort((unique(odbcListDrivers()[[1]])))
character(0)
It appears that my ODBC driver is inaccessible or incorrectly installed, but I am having trouble understanding why. I have downloaded the driver and can see it in my library.
Any insight is greatly appreciated!
The function arguments look strange. Remove the [] from Driver, S3OutputLocation and AwsRegion.
I solved it by checking the list of drivers that R recognizes using odbc::odbcListDrivers(), then adjusting the name of the Driver argument accordingly. If R still cannot identify the driver, setting ODBCSYSINI=/folder_that_contains_odbcinst.ini/ in .Renviron solved it for me.
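Putting both answers together, a sketch of the corrected call; the driver name, region and schema below are placeholders, and the Driver value must match an entry reported by odbc::odbcListDrivers():
# All values are placeholders; note there are no square brackets around them
con <- DBI::dbConnect(
  odbc::odbc(),
  Driver             = "Simba Athena ODBC Driver",   # must match odbcListDrivers() output
  S3OutputLocation   = "s3://bucket-folder/",
  AwsRegion          = "eu-west-1",
  AuthenticationType = "IAM Credentials",
  Schema             = "default",
  UID                = rstudioapi::askForPassword("AWS Access Key"),
  PWD                = rstudioapi::askForPassword("AWS Secret Key"))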

Spark org.postgresql.Driver not found even though it's configured on EMR

I am trying to write a pyspark data frame to a Postgres database with the following code:
mode = "overwrite"
url = "jdbc:postgresql://host/database"
properties = {"user": "user","password": "password","driver": "org.postgresql.Driver"}
dfTestWrite.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
However I am getting the following error:
An error occurred while calling o236.jdbc.
: java.lang.ClassNotFoundException: org.postgresql.Driver
I've found a few SO questions that address a similar issue but haven't found anything that helps. I followed the AWS docs here to add the configuration, and from the EMR console it looks as though it was successful.
What am I doing wrong?
The document you followed adds a database connector for Presto; it is not a way to add a JDBC driver to Spark. A connector is not the same thing as a driver.
You should download the PostgreSQL JDBC driver and place it in Spark's lib directory, or somewhere you can point to via configuration.
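As an illustration, a minimal sketch of the configuration route, assuming the driver jar has already been downloaded to the cluster (the path and version are placeholders; depending on how the job is launched, the jar may instead need to be supplied at submit time with spark-submit --jars so that executors can see it):
from pyspark.sql import SparkSession

# Jar path/version are placeholders; spark.jars puts the driver jar on the classpath
spark = (SparkSession.builder
         .appName("postgres-write")
         .config("spark.jars", "/home/hadoop/postgresql-42.2.5.jar")
         .getOrCreate())

url = "jdbc:postgresql://host/database"
properties = {"user": "user", "password": "password", "driver": "org.postgresql.Driver"}
# dfTestWrite is the DataFrame from the question
dfTestWrite.write.jdbc(url=url, table="test_result", mode="overwrite", properties=properties)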

Issue connecting to Databricks table from Azure Data Factory using the Spark odbc connector

We have managed to get a valid connection from Azure Data Factory to our Azure Databricks cluster using the Spark (ODBC) connector. We get the expected list of tables, but when querying a specific table we get an exception.
ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code:
'0' error message:
'com.databricks.backend.daemon.data.common.InvalidMountException:
Error while using path xxxx for resolving path xxxx within mount at
'/mnt/xxxx'.'.. Activity ID:050ac7b5-3e3f-4c8f-bcd1-106b158231f3
In our case the Databricks tables are mounted Parquet files stored in Azure Data Lake Storage Gen2, which is related to the above exception. Any suggestions on how to solve this issue?
PS: the same error appears when connecting from Power BI Desktop.
Thanks
Bart
In your configuration to mount the lake, can you add this setting:
"fs.azure.createRemoteFileSystemDuringInitialization": "true"
I haven't tried your exact scenario - however this solved a similar problem for me using Databricks-Connect.
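For context, a sketch of where that setting would sit in a typical ADLS Gen2 mount call; the OAuth keys, secret scope and names below are placeholders for whatever the existing mount already uses:
# All values are placeholders; the relevant addition is the createRemoteFileSystemDuringInitialization key
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<application-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<client-secret>"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
  "fs.azure.createRemoteFileSystemDuringInitialization": "true"
}

dbutils.fs.mount(
  source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
  mount_point="/mnt/xxxx",
  extra_configs=configs)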

LIBNAME connection to SAS/ACCESS to Redshift throwing CLI error

We recently got a SAS/ACCESS to Redshift license and are trying to connect to the Redshift database directly, without ODBC.
This is the libname statement I am using:
libname A1 redshift server='XXX' port=5439 user='YYYY' password='ZZZZ' Database='RRRR';
It is, however, throwing the following error:
ERROR: CLI error trying to establish connection: [unixODBC][Driver
Manager]Can't open lib 'SAS ACCESS to Amazon Redshift' : file not
found ERROR: Error in the LIBNAME statement.
Is there any configuration we need to do before using SAS/ACCESS to Redshift?
Thanks in advance.
Yes - you need to configure your ODBC settings.
You can find a very detailed guide here and additional guidance here.
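Loosely speaking, that error means the unixODBC driver manager is being asked to load a driver literally named 'SAS ACCESS to Amazon Redshift' and cannot map it to a shared library. A very rough sketch of the kind of configuration those guides walk through; every path below is a placeholder and depends on where the SAS-shipped Redshift ODBC driver actually lives:
odbcinst.ini:
[SAS ACCESS to Amazon Redshift]
Description=Redshift ODBC driver shipped with SAS/ACCESS
# Placeholder path -- point this at the installed driver shared library
Driver=/path/to/sas/redshiftodbc/lib/64/libredshiftodbc64.so
Environment (set before SAS starts, for example in sasenv_local):
export ODBCINI=/path/to/odbc.ini
export ODBCSYSINI=/path/to/directory_containing_odbcinst.ini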