AWS Athena + boto3: How am I supposed to execute named queries? - amazon-web-services

Concerning the following draft script I would like to know: How can I execute the named query I created?
I can access the query via the browser interface, but would like to execute it via the Session.
Here the answer is to use the client.start_query_execution(...) command. But what's the point, when that runs not the named query I created but a non-named query with the same query_string? Or am I missing something essential in how to use this?
import boto3
sess = boto3.session.Session(
    region_name=region,
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY
)
athenaclient = sess.client('athena')
query_string = '''
SELECT *
FROM "ccindex"."ccindex"
WHERE crawl = 'CC-MAIN-2020-34'
AND subset = 'warc'
AND url_host_tld = 'de'
AND url_query IS NULL
AND url_path like '%impressum%'
LIMIT 20000
'''
resp = athenaclient.create_named_query(
    Name='filter-ccindex-de',
    Description='Filter *.de/impressum websites of Common Crawl index',
    Database='ccindex',
    QueryString=query_string
)

I don't think there is a direct option to pass a named query to start_query_execution. But this can be achieved with get_named_query, which accepts the NamedQueryId of the named query (returned by create_named_query) and returns the QueryString in its response.
Then you can read QueryString from that response and pass it to start_query_execution.
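A minimal sketch of that lookup-then-execute flow, under some assumptions: the helper names find_named_query_id and run_named_query are hypothetical, client is an Athena client such as boto3.client('athena'), and output_location is an S3 URI you own for query results. Looking the ID up by name assumes one page of list_named_queries fits into a single batch_get_named_query call.

```python
def find_named_query_id(client, name):
    """Return the NamedQueryId of the saved query called `name`, or None."""
    token = None
    while True:
        kwargs = {"NextToken": token} if token else {}
        page = client.list_named_queries(**kwargs)
        ids = page.get("NamedQueryIds", [])
        if ids:
            # Resolve the IDs of this page to full named-query records.
            batch = client.batch_get_named_query(NamedQueryIds=ids)
            for named in batch.get("NamedQueries", []):
                if named["Name"] == name:
                    return named["NamedQueryId"]
        token = page.get("NextToken")
        if not token:
            return None


def run_named_query(client, named_query_id, output_location):
    """Fetch the SQL text of a named query and start an execution of it."""
    named = client.get_named_query(NamedQueryId=named_query_id)["NamedQuery"]
    response = client.start_query_execution(
        QueryString=named["QueryString"],
        QueryExecutionContext={"Database": named["Database"]},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

With the client from the question this would be called roughly as run_named_query(athenaclient, find_named_query_id(athenaclient, 'filter-ccindex-de'), 's3://my-bucket/athena-results/'), where the bucket path is a placeholder. The execution that starts is still an ordinary query execution; the named query only serves as a stored copy of the SQL text.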

Related

Athena query executed through boto3 python client gives smaller result compared to query executed through AWS cli

I want to execute a very simple query through Athena.
Query: select * from information_schema.tables
When I execute the query using the boto3 client with the following code:
...
def run_query(query_string):
    query_execution_context = {"Catalog": "awsdatacatalog", "Database": "information_schema"}
    response = athena_client.start_query_execution(
        QueryString=query_string, QueryExecutionContext=query_execution_context, WorkGroup="primary"
    )
    return response
query_string_get_tables = "select * from information_schema.tables"
response = run_query(query_string_get_tables)
I get back a result of 9 rows in 0.6s.
When I then go to the AWS console and rerun the same query I get back a result of 500 rows in 6s.
The result from the AWS console is correct. How can I get the same result using the boto3 client?
EDIT:
I downloaded the query history and compared the query string. As you can see they are exactly the same. I also removed the QueryExecutionContext in the boto3 client call but this doesn't change anything. Besides, I tried all combinations of single/double quotes.
Query history:
37b72ac5-3223-496f-8293-79eab8a661a0,select * from information_schema.tables,2022-12-02T18:23:09.738-08:00,SUCCEEDED,6.503 sec,39.01 KB,Athena engine version 2,'-
9d3a274a-8109-4988-aaf8-bba9c8733208,select * from information_schema.tables,2022-12-02T18:14:11.385-08:00,SUCCEEDED,520 ms,0.67 KB,Athena engine version 2,'-
As mentioned in the comments, using boto3 directly takes some effort: you have to call start_query_execution, wait for the query to complete, and then call get_query_results (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_results).
To make your life easier, you can use the open-source library AWS SDK for pandas (awswrangler, formerly AWS Data Wrangler). With this library you can get the results in a blocking manner:
import awswrangler as wr
# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")
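For comparison, the start / wait / fetch steps with plain boto3 could look roughly like this. It is a sketch, not a definitive implementation: run_and_fetch is a hypothetical helper, client is assumed to be boto3.client('athena'), and output_location is assumed to be an S3 URI for results (you can drop ResultConfiguration if your workgroup already defines one).

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_and_fetch(client, query_string, output_location, poll_seconds=1.0):
    """Start a query, block until it finishes, and return all result rows."""
    start = client.start_query_execution(
        QueryString=query_string,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended as {status['State']}: {reason}")

    # Results are paginated; follow NextToken until every page is read.
    rows, token = [], None
    while True:
        kwargs = {"QueryExecutionId": query_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([col.get("VarCharValue") for col in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
```

For a SELECT statement the first returned row normally holds the column names, so the row count is one higher than the number of data rows. Reading only the first page of get_query_results is one way to end up with fewer rows than the console shows.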

How to query parameters with API Gateway to lambda

I made a REST API with AWS Lambda + API Gateway.
My API Gateway's Integration Request is of type LAMBDA_PROXY,
and I use params in Lambda like this (myparam is a list):
def lambda_handler(event, context):
    # TODO implement
    try:
        myparam = event['multiValueQueryStringParameters']['param1']
        #...
I tested my REST API in python like this.
url = 'https://***.amazonaws.com/default/myAPI'
param = {'param1':['1','2']}
res = requests.get(url=url,params=param).json()
print(res)
It works. But when I tried another way, like this:
url = 'https://***.amazonaws.com/default/myAPI?param1=1,2'
res = requests.get(url=url).json()
print(res)
This way it didn't work.
How should I pass the query parameters if I want to put them directly into the URL?
Those two requests are not equivalent. To prove it, we can print the formatted URL of the first request:
url = 'https://***.amazonaws.com/default/myAPI'
param = {'param1':['1','2']}
res = requests.get(url=url, params=param)  # keep the Response object; do not call .json() here
# Print the request URL
print(res.request.url)
This will print something like:
https://***.amazonaws.com/default/myAPI?param1=1&param1=2
So, in your second snippet, you probably want to build your URL as follows:
url = 'https://***.amazonaws.com/default/myAPI?param1=1&param1=2'
res = requests.get(url=url).json()
print(res)
If you want to separate your parameters with commas, the value for param1 will be a single string ('1,2'), not a list.
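The difference between the two URL styles can be checked with the standard library alone, without sending a request. The sketch below also shows one possible way to accept both styles on the Lambda side; get_multi_param is a hypothetical helper, and the event shape is assumed to be the LAMBDA_PROXY format used in the question.

```python
from urllib.parse import parse_qs, urlencode

# A list value is encoded as one key=value pair per element.
repeated = urlencode({"param1": ["1", "2"]}, doseq=True)

# Parsing shows how each URL style is interpreted.
parsed_repeated = parse_qs(repeated)      # two separate values
parsed_comma = parse_qs("param1=1,2")     # one value that contains a comma


def get_multi_param(event, name):
    """Read a query parameter from a Lambda proxy event, accepting both
    repeated keys (?p=1&p=2) and comma-separated values (?p=1,2)."""
    # The field may be missing or None when the request has no query string.
    multi = event.get("multiValueQueryStringParameters") or {}
    values = []
    for raw in multi.get(name, []):
        values.extend(part for part in raw.split(",") if part)
    return values
```

Splitting on commas inside the handler is a design choice: it only makes sense if a comma can never be part of a legitimate value.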

How to get an error of query to athena via boto3?

Does boto3 have any method which allows one to get the text of the error if the query failed? get_query_execution returns a status of the query only.
You can get the error message from the 'StateChangeReason' field of response['QueryExecution']['Status'].
As per get_query_execution documentation:
StateChangeReason (string) --
Further detail about the status of the query.
import boto3
client = boto3.client('athena')
failed_query_id = '08adbf00-5f14-4d54-9311-fd55e2024781'
response = client.get_query_execution(QueryExecutionId=failed_query_id)
# The status is nested under the 'QueryExecution' key of the response
print(response['QueryExecution']['Status']['StateChangeReason'])
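A slightly more defensive variant, as a sketch: get_query_error is a hypothetical helper, and client is assumed to be boto3.client('athena'). It checks the state first and uses .get(), because the reason text is not guaranteed to be present for every execution.

```python
def get_query_error(client, query_execution_id):
    """Return the failure reason of a query, or None if it did not fail."""
    response = client.get_query_execution(QueryExecutionId=query_execution_id)
    status = response["QueryExecution"]["Status"]
    if status["State"] != "FAILED":
        return None
    # Fall back to an empty string if no reason text was supplied.
    return status.get("StateChangeReason", "")
```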

Presigned URL for DynamoDB put_item

There are a few examples for the way to pre-sign the URL of an S3 request, but I couldn't find any working example to pre-sign other services in AWS.
I'm trying to write an item to DynamoDB using the Python SDK boto3. The SDK includes the option to generate the pre-signed URL here. I'm trying to make it work and I'm getting a URL, but the URL responds with 404 and the item does not appear in the DynamoDB table.
import json
import boto3
ddb_client = boto3.client('dynamodb')
response = ddb_client.put_item(
    TableName='mutes',
    Item={
        'email': {'S':'g#g.c'},
        'until': {'N': '123'}
    }
)
print("PutItem succeeded:")
print(json.dumps(response, indent=4))
This code is working directly. But when I try to presign it:
ddb_client = boto3.client('dynamodb')
params = {
    'TableName': 'mutes',
    'Item': {
        'email': {'S': 'g#g.c'},
        'until': {'N': '1234'}
    }
}
response = ddb_client.generate_presigned_url('put_item', Params=params)
and check the URL:
import requests
r = requests.post(response)
r
I'm getting: Response [404]
Any hint on how to get it working? I checked the IAM permissions, and they are giving full access to DynamoDB.
Please note that you can sign a request to DynamoDB using Python, as you can see here: https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html#sig-v4-examples-post . But for some reason, the implementation in the boto3 library doesn't do that. Using the boto3 library is much easier than the code in that example, as I don't need to provide the credentials to the function.
You send an empty post request. You should add the data to the request:
import requests
r = requests.post(response, data = params)
I think this is the issue, and that is why you are receiving a 404.
They recommend using Cognito for authentication instead of IAM for these cases.

pass query param as dictionary with many values in postman

I am using Postman to hit my APIs. I have a question about sending query params through Postman's params. In my API I am getting the params using services = request.GET.get('services') and then returning a response for the services.
My question is: if I have more than one service, like 'A', 'B', 'C', how can I send these services in params using Postman?
views.py
class SomeAPIView(ModelViewSet):
    def get_queryset(self):
        services = self.request.GET.get('services')
        print(services) # getting services
        print(type(services)) #type is string
        response_list = []
        for service in services:
            result = API(service=service)
            response_list.append(result)
        return response_list
I want to get a list of services and then iterate over that list to return a response for each service.
It depends on how you would use this API entry in production.
First, the preferred way in Django REST framework is to use request.query_params to obtain query parameters. Also, you should pass a default value to get() so that the code does not fail on None when no 'services' parameter is sent.
Then, if your services parameter contains names or ids of some objects, you can simply pass it as a GET parameter, as in http://someurl?services=A,B,C, or in the tab named 'Params' in Postman. request.query_params.get('services', '') will then return the string 'A,B,C'. Now you can split it on ',' with the string's split method, e.g. services_names = services.split(',').
In any case, GET query parameters always arrive as str values.
For your example, it may look like:
class SomeAPIView(ModelViewSet):
    def get_queryset(self):
        # Note the key is 'services'; a misspelled key would silently return the default
        services = self.request.query_params.get('services', '').split(',')
        print(services) # getting services
        print(type(services)) # now it will be List[str]
        response_list = []
        for service in services:
            result = API(service=service)
            response_list.append(result)
        return response_list
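One edge case in the split approach: ''.split(',') returns [''], so a request without any services would still loop once with an empty name. A small sketch of a parsing helper that avoids this; parse_services is a hypothetical name, not part of Django or Postman.

```python
def parse_services(raw):
    """Split a comma-separated query value into a clean list of names."""
    if not raw:
        return []
    # Trim surrounding whitespace and drop empty entries such as 'A,,B' or 'A,'.
    return [name.strip() for name in raw.split(",") if name.strip()]
```

In the view this could be used as services = parse_services(self.request.query_params.get('services')). If you would rather send repeated keys (?services=A&services=B) from Postman, request.query_params.getlist('services') returns them as a list directly.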