How to connect Superset with AWS athena? - amazon-web-services

Has anyone tried connecting superset to AWS athena ?
I was able to connect to redshift by using SQLAlchemy URI:
postgresql://username:password#xxxx.redshift.amazonaws.com:port/dbname
but I am having hard time connecting to AWS athena. AWS has JDBC driver (http://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html) but I can't figure out how to use it with superset. Any example ?

In case someone else would come here:
awsathena+jdbc://username:password#xxxx.redshift.amazonaws.com:port/dbname
This is from the superset documentation.

We tried installing superset with PyAthena JDBC & REST. Our experience with PyAthena (REST) is far better than PyAthenaJDBC, would recommend to use same in production.
Install PyAthena (pure python library, java is not needed)
pip install "PyAthena>1.2.0"
Access database by creating connection url
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}#athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
I found this article, a good guide on deploying superset.

Take a look at this github PR
You'll want to install PyAthenaJDBC package into pip. The driver that you are referring to is a Java driver, which is great, but Superset is largely a Python application, so it'll need a python driver to connect/interact with Athena.
The above answer is correct, but you'll want to install that package to ensure that you actually can connect to athena.

You must define a property s3_staging_dir when you connect to Athena's Driver.
Example: s3_staging_dir=s3://your_bucket

I got it to work using: PyAthenaJDBC (python 3.6.7) with these steps:
1) Make sure you have the PyAthenaJDBC pkg. installed:
pip install "PyAthenaJDBC>1.0.9"
2) Restart superset
3) Download the JDBC driver: from aws driver download I used the AthenaJDBC41-2.0.6.jar version
Example driver download url, Note: I saved my driver in /drivers/
wget https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC_2.0.6/AthenaJDBC41_2.0.6.jar
4) Add the data-source to superset:
awsathena+jdbc://AWS_KEY:AWS_SECRET#athena.us-west-2.amazonaws.com/mydb?s3_staging_dir=s3://path/to/my/data/&driver_path=/drivers/AthenaJDBC41_2.0.6.jar
Note: If superset is running on ECS / EC2 you can assign an IAM role, and remove the AWS KEY/SECRET from the URI, Example raw connection URI below:
awsathena+jdbc://{aws_key}:{aws_secret}#athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&driver_path={driver_path}
Much more info here:

Official guidance from Superset:
https://superset.apache.org/docs/databases/athena
awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}#athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...
You need to have some tweaks by yourself. This worked for me after many hours of reading posts on 2021-12-12:
awsathena+rest://{secret id}:{secret access key}#athena.ap-southeast-1.amazonaws.com/test?s3_staging_dir=s3://{your bucket where Athena query result is stored}/test/&work_group=primary
Note that in my example:
"schema_name = test": You must see a database named "test" under Athena \ Query Editor \ Database at this point. It is created in Glue Console \ Data Catalog \ Database with a crawler or manual.
s3://{your bucket}/{path if needed}/test: you need to go to Athena \ Workgroups, select a workgroup and check the setting if it turned on the "Query result location" or not. In my case, the name of the workgroup is "primary", the query result of the "test" database will be stored in s3://{your bucket where Athena query result is stored}/test/
Make sure you have installed these under Python Virtual Environment:
pip install "PyAthenaJDBC>1.0.9"
pip install "PyAthena>1.2.0"
See how to create Superset under Python Env:
https://superset.apache.org/docs/installation/installing-superset-from-scratch
Security Group:
(I got this instruction from here: https://www.youtube.com/watch?v=vzuPQPRcT-0)
I build Superset on the EC2 Instance. Therefore, you need to check out the security group setting. Because it relates to EC2 service, Athena service, and the website which Superset is running in UI.
In my case, I have turned on all these settings to make sure it can run the first time. Then you can narrow down the setting later.
Custom TCP - TCP - 8088 - ::/0 ; 0.0.0.0/0
HTTP - TCP - 80 - ::/0 ; 0.0.0.0/0
SSH - TCP - 22 - ::/0 ; 0.0.0.0/0
Custom ICMP - IPv4 - Echo Request - N/A - 0.0.0.0/0
All ICMP - IPv6 - IPv6 ICMP - All - ::/0
All ICMP - IPv6 - IPv6 ICMP - All - 0.0.0.0/0

After lot of hustle managed to create connection string which works, note that all key & s3 path needs to be encoded, below format works for me
awsathena+rest://{encoded aws_access_key_id}:{encoded aws_secret_access_key}#athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={encoded s3_staging_dir}
You can use below code to generate connection string, save it to file & run
from urllib.parse import quote_plus
conn_str = "awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}#athena.{region_name}.amazonaws.com:443/"\
"{schema_name}?s3_staging_dir={s3_staging_dir}"
print(conn_str.format(
aws_access_key_id=quote_plus("{aws_access_key_id}"),
aws_secret_access_key=quote_plus("{aws_secret_access_key}"),
region_name="{region_name}",
schema_name="{schema_name}",
s3_staging_dir=quote_plus("{s3_staging_dir}")))

Related

Dynamodb local web shell does not load

I am running DynamoDB locally using the instructions here. To remove potential docker networking issues I am using the "Download Locally" version of the instructions. Before running dynamo locally I run aws configure to set some fake values for AWS access, secret, and region, and here is the output:
$ aws configure
AWS Access Key ID [****************fake]:
AWS Secret Access Key [****************ake2]:
Default region name [local]:
Default output format [json]:
here is the output of running dynamo locally:
$ java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
Initializing DynamoDB Local with the following configuration:
Port: 8000
InMemory: false
DbPath: null
SharedDb: true
shouldDelayTransientStatuses: false
CorsParams: *
I can confirm that the DynamoDB is running locally successfully by listing tables using aws cli
$ aws dynamodb list-tables --endpoint-url http://localhost:8000
{
"TableNames": []
}
but when I visit http://localhost:8000/shell in my browser, this is the error I get and the page does not load.
I tried running curl on the shell to see if I can get a more useful error message:
$ curl http://localhost:8000/shell
{
"__type":"com.amazonaws.dynamodb.v20120810#MissingAuthenticationToken",
"Message":"Request must contain either a valid (registered) AWS access key ID or X.509 certificate."}%
I tried looking up the error above, but I don't have much choice in doing setup when running the shell merely in the browser. Any help is appreciated on how I can run the Dynamodb javascript web shell with this setting.
Software versions:
aws cli: aws-cli/2.4.7 Python/3.9.9 Darwin/20.6.0 source/x86_64 prompt/off
OS: MacOS Big Sur 11.6.2 (20G314)
DynamoDB Local Web Shell was deprecated with version 1.16.X and is not available any longer from 1.17.X to latest. There are no immediate plans for a new Web Shell to be introduced.
You can download an old version of DynamoDB Local < 1.17.X should you wish to use the Web Shell.
Available versions:
aws s3 ls s3://dynamodb-local-frankfurt/
Download most recent working version with Web Shell:
aws s3 ls s3://dynamodb-local-frankfurt/dynamodb_local_2021-04-27.tar.gz .
The next release of DynamoDB Local will have an updated README indicating its deprecation
As I answered in DynamoDB local http://localhost:8000/shell this appears to be a regression in new versions of DynamoDB Local, where the shell mysteriously stopped working, whereas in versions from a year ago it does work.
Somebody should report it to Amazon. If there is some flag that new versions require you to set to enable the shell, it isn't documented anywhere that I can find.
Update JAVA to the latest version and voila, it works!

not able to connect superset to druid

I have Druid and superset running locally, but I am not able to connect them together. I have sample data wikiticker in Druid. I already installed pydruid with pip3: pip3 install pydruid (I am not sure if I need to install this to any particular location). I have also installed superset using docker-compose locally using This Link, However, I am not able to connect Druid with Superset. I went to Data->Databases->add database. In Connection, I gave Database name as Druid and not sure what to give in SQLALCHEMY URI*
. I tried these:
druid//admin:admin#localhost:8082/wikiticker
pydruid//admin:admin#localhost:8082/wikiticker
druid://admin:admin#localhost:8082/druid/v2/sql
but nothing is working.
As far as I know, Druid has no built-in authentication. The SQLALCHEMY_URI string should be druid+https://localhost:8082/druid/v2/sql/ (or druid+http://localhost:8082/druid/v2/sql/ if you're using HTTP).
As per documentation the connection string should look like this (third variant in the question):
druid://<User>:<password>#<Host>:<Port-default-9088>/druid/v2/sql
Why you cannot connect might be, because of your docker setup. In the context of your superset docker container localhost refers to that particular docker container. For example the database and the redis cache are referred to as db and redis for the connection setup within the docker-compose.yml and the environment variables set in .env.
So you could extend the docker-compose.yml to include the druid container, named druid as well and then connect to it like this:
druid://admin:admin#druid:PORTTHATYOUEXPOSED/druid/v2/sql
There is a good chance that you didn't add the Root Certificate. You can either do that or disable SSL verification. See the documentation here: https://superset.apache.org/docs/databases/druid

_tds.InterfaceError when trying to connect to Azure Data Warehouse through Python 2.7 and ctds

I'm trying to connect a python 2.7 script to Azure SQL Data Warehouse.
The coding part is done and the test cases work in our development environment. We're are coding in Python 2.7 in MacOS X and connecting to ADW via ctds.
The problem appears when we deploy on our Azure Kubernetes pod (running Debian 9).
When we try to instantiate a connection this way:
# init a connection
self._connection = ctds.connect(
server='myserver.database.windows.net',
port=1433,
user="my_user#myserver.database.windows.net",
timeout=1200,
password="XXXXXXXX",
database="my_db",
autocommit=True
)
we get an exception that only prints the user name
my_user#myserver.database.windows.net
the type of the exception is
_tds.InterfaceError
The code deployed is the exact same and also the requirements are.
The documentation we found for this exception is almost non-existent.
Do you guys recognize it? Do you know how can we go around it?
We also tried in our old AWS instances of EC2 and AWS Kubernetes (which rans the same OS as the Azure ones) and it also doesn't work.
We managed to connect to ADW via sqlcmd, so that proves the pod can in fact connect (I guess).
EDIT: SOLVED. JUST CHANGED TO PYODBC
def connection(self):
""":rtype: pyodbc.Connection"""
if self._connection is None:
env = '' # whichever way you have to identify it
# init a connection
driver = '/usr/local/lib/libmsodbcsql.17.dylib' if env == 'dev' else '{ODBC Driver 17 for SQL Server}' # my dev env is MacOS and my prod is Debian 9
connection_string = 'Driver={driver};Server=tcp:{server},{port};Database={db};Uid={user};Pwd={password};Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;'.format(
driver=driver,
server='myserver.database.windows.net',
port=1433,
db='mydb',
user='myuser#myserver',
password='XXXXXXXXXXXX'
)
self._connection = pyodbc.connect(connection_string, autocommit=True)
return self._connection
As Ron says, pyodbc is recommended because it enables you to use a Microsoft-supported ODBC Driver.
I'm going to go ahead and guess that ctds is failing on redirect, and you need to force your server into "proxy" mode. See: Azure SQL Connectivity Architecture
EG
# Get SQL Server ID
sqlserverid=$(az sql server show -n sql-server-name -g sql-server-group --query 'id' -o tsv)
# Set URI
id="$sqlserverid/connectionPolicies/Default"
# Get current connection policy
az resource show --ids $id
# Update connection policy
az resource update --ids $id --set properties.connectionType=Proxy

Cannot connect to endpoint dynamo db windows 10

I've been following this tutorial (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.DownloadingAndRunning.html) on how to set up a downloadable DynamoDB on my computer, but have been coming across an issue when I try to connect to a local host.
I have checked my host file and everything seems to be ok...
I am using Windows 10 cmd and these are the outputs on my command line:
C:\Users\Desktop\dynamodb_local_latest>java -
D"java.library.path=./DynamoDBLocal_lib" -jar DynamoDBLocal.jar
Initializing DynamoDB Local with the following configuration:
Port: 8000
InMemory: false
DbPath: null
SharedDb: false
shouldDelayTransientStatuses: false
CorsParams: *
C:\Users\Desktop\dynamodb_local_latest>aws dynamodb list-tables --endpoint-
url http://localhost:8000
Could not connect to the endpoint URL: "http://localhost:8000/"
C:\Users\Desktop\dynamodb_local_latest>
Any help will be greatly appreciated!
You must run 'aws configure' and set the required parameters (even if you're only using a local dynamo db emulator, just ignore the access/secret keys)
In addition to running aws configure as mentioned in #J.S.'s answer, you will need to ensure Dynamo DB is running. I recently had this error, where the service had shut down and I didn't realize it. If this is your case, make sure to restart it by going to the folder it is installed in and running java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb &

InfluxDB Cannot see databases from localhost:8083 + Cannot access Command Line Interface

Please feel free to redirect me to any other place if this isn't the right one for this question.
Problem: When I log to the administration panel : "localhost:8083" with "root" "root" I cannot see the existing databases nor the data in it. Also, I have no way to access InfluxDB from the command line.
Also the line sudo /etc/init.d/influxdb start does not work for my setup. I have to go into /etc/init.d/ and run sudo ./influxdb start -config=config.toml in order to get the server running.
I've installed influxDB v0.8 from https://influxdb.com/docs/v0.8/introduction/installation.html for Ubuntu 14.04.
I've been developing a Clojure program using the Capacitor API just to get started and interact with InfluxDB. It runs well, I can create delete, insert and query a database without problems.
netstat -anp | grep LISTEN confirms me that ports 8083 8086 8090 and 8099 are listening.
I've been Googling all around but cannot manage to get a solution.
Thanks for the support and enjoy building things !
Problem solved: the database weren't visible in firefox but everything is visible in Chromium!
Why couldn't I access the CLI ? I was expecting the v0.8 to behave exactly like the v0.9.
You help was appreciated anyway !
For InfluxDB 0.9 the CLI could be started with:
/opt/influxdb/influx
then you can display available databases:
Connected to http://localhost:8086 version 0.9.1
InfluxDB shell 0.9.1
> show databases
name: databases
---------------
name
collectd
graphite
> use collectd
Using database collectd
> show series limit 5
You can try creating new database from CLI:
> CREATE DATABASE mydb
or with curl command:
curl -G 'http://localhost:8086/query' --data-urlencode "q=CREATE DATABASE mydb"
Web UI should be available on http://localhost:8083