Lambda Python request athena error OutputLocation - python-2.7

I'm working with AWS Lambda, and I would like to run a simple query in Athena and store the results in an S3 bucket.
My code:
import boto3
def lambda_handler(event, context):
    query_1 = "SELECT * FROM test_athena_laurent.stage limit 5;"
    database = "test_athena_laurent"
    s3_output = "s3://athena-laurent-result/lambda/"
    client = boto3.client('athena')
    response = client.start_query_execution(
        QueryString=query_1,
        ClientRequestToken='string',
        QueryExecutionContext={
            'Database': database
        },
        ResultConfiguration={
            'OutputLocation': 's3://athena-laurent-result/lambda/'
        }
    )
    return response
It works in Spyder (Python 2.7), but in AWS Lambda I get this error:
Parameter validation failed:
Invalid length for parameter ClientRequestToken, value: 6, valid range: 32-inf: ParamValidationError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 18, in lambda_handler
'OutputLocation': 's3://athena-laurent-result/lambda/'
I think it doesn't understand my S3 path, and I don't know why.
Thanks

ClientRequestToken (string) --
A unique case-sensitive string used to ensure the request to create the query is idempotent (executes only once). If another StartQueryExecution request is received, the same response is returned and another query is not created. If a parameter has changed, for example, the QueryString , an error is returned. [Boto3 Docs]
This field is autopopulated if not provided.
If you provide a string value for ClientRequestToken, make sure its length is between 32 and 128 characters.

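Per @Tomalak's point, ClientRequestToken is a string, and the value 'string' is only 6 characters long, which is what the validation error reports. However, per the documentation I just linked, you don't need to pass it at all when using the SDK.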
This token is listed as not required because AWS SDKs (for example the AWS SDK for Java) auto-generate the token for users. If you are not using the AWS SDK or the AWS CLI, you must provide this token or the action will fail.
So, I would refactor as such:
import boto3
def lambda_handler(event, context):
    query_1 = "SELECT * FROM some_database.some_table limit 5;"
    database = "some_database"
    s3_output = "s3://some_bucket/some_tag/"
    client = boto3.client('athena')
    response = client.start_query_execution(
        QueryString=query_1,
        QueryExecutionContext={
            'Database': database
        },
        ResultConfiguration={
            'OutputLocation': s3_output
        }
    )
    return response
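If you do want to pass ClientRequestToken yourself, one option is to generate it rather than hard-code it. The following is a minimal sketch, not part of the original answer; make_client_request_token is a hypothetical helper name, and it relies only on the fact that a UUID4 string is 36 characters long, which falls inside the 32 to 128 range quoted above.

```python
import uuid


def make_client_request_token():
    """Return a random token string (hypothetical helper).

    str(uuid.uuid4()) is always 36 characters long, so it satisfies
    the 32 to 128 character limit on ClientRequestToken.
    """
    return str(uuid.uuid4())


token = make_client_request_token()
print(len(token))
```

The token could then be passed as ClientRequestToken=token in start_query_execution. Note that a fresh token on every call gives up the idempotency the parameter exists for; reuse the same token if a retry must not start a second query.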

Related

Linking dialogflow to aws dynamodb

I am new to AWS and am following a tutorial to read data from my DynamoDB table into Dialogflow to use as a response. I get a "Webhook call failed" error:
UNAVAILABLE, State: URL_UNREACHABLE, Reason: UNREACHABLE_5xx, HTTP
status code: 500.
Someone suggested it is due to the Lambda function not working (the function is linked to an API Gateway endpoint that I set as the webhook in Dialogflow). When I ran the function, I got this response:
{
"errorMessage": "'queryResult'",
"errorType": "KeyError",
"requestId": "d51caf49-1cf2-4428-b42a-d8dbe9cdb4f8",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 18, in lambda_handler\n distance=event['queryResult']['parameters']['distance']\n"
]
}
The error is shown above and my code is below. Someone said it has to do with how I am accessing keys in my Python dictionary, but I fail to understand why.
import boto3
from boto3.dynamodb.conditions import Key
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('places')
def findPlaces(distance):
    response = table.get_item(Key={'distance': distance})
    if 'Item' in response:
        return {'fulfillmentText': response['Item']['place']}
    else:
        return {'fulfillmentText': 'No Places within given distance'}
def lambda_handler(event, context):
    distance = event['queryResult']['parameters']['distance']
    return findPlaces(int(distance))
I also need help with storing data from Dialogflow responses back to DynamoDB. Thank you.
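One possible cause of the KeyError, offered as a hedged sketch rather than a confirmed diagnosis: if the API Gateway method uses Lambda proxy integration, the Dialogflow request JSON arrives as a string under event['body'], so event['queryResult'] does not exist at the top level. A test event run from the Lambda console fails the same way if it lacks that key. The helper name extract_distance is hypothetical.

```python
import json


def extract_distance(event):
    """Return the 'distance' parameter from a Dialogflow webhook request.

    Assumption: the event is either the raw Dialogflow payload, or an
    API Gateway proxy event whose 'body' holds that payload as a JSON string.
    """
    if 'queryResult' not in event and event.get('body'):
        event = json.loads(event['body'])
    return event['queryResult']['parameters']['distance']


# Both shapes yield the same value.
direct_event = {'queryResult': {'parameters': {'distance': 5}}}
proxy_event = {'body': json.dumps(direct_event)}
print(extract_distance(direct_event), extract_distance(proxy_event))
```

With proxy integration the handler's return value would also need to be wrapped in a dictionary with 'statusCode' and a JSON string 'body', otherwise API Gateway answers with a 5xx error.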

How to Copy Large Files From AWS S3 bucket to another S3 buckets using boto3 Python API?

How can large files be copied from one S3 bucket to another with the boto3 Python API? If we use client.copy(), it fails with the error "An error occurred (InvalidArgument) when calling the UploadPartCopy operation: Range specified is not valid for source object of size:"
According to the AWS S3 boto3 API documentation, we should use a multipart upload. I searched but could not find a clear, precise answer to my question. After reading the boto3 API documentation thoroughly, I found the answer, shown here. This code also works with multi-threading.
If you use multi-threading, create an s3_client in each thread. I tested this method, and it worked for copying terabytes of data from one S3 bucket to another.
Code to get s3_client:
import boto3
def get_session_client():
    # session = boto3.session.Session(profile_name="default")
    session = boto3.session.Session()
    client = session.client("s3")
    return session, client
# get_current_thread_name, info_log and error_log are the author's own helpers (not shown).
def copy_with_multipart(local_s3_client, src_bucket, target_bucket, key, object_size):
    current_thread_name = get_current_thread_name()
    try:
        initiate_multipart = local_s3_client.create_multipart_upload(
            Bucket=target_bucket,
            Key=key
        )
        upload_id = initiate_multipart['UploadId']
        # 5 MB part size, the S3 minimum for every part except the last.
        # With S3's 10,000-part limit this caps a single object at about 48.8 GiB;
        # use a larger part size for bigger objects.
        part_size = 5 * 1024 * 1024
        byte_position = 0
        part_num = 1
        parts_etags = []
        while byte_position < object_size:
            # The last part might be smaller than part_size, so make sure
            # that last_byte isn't beyond the end of the object.
            last_byte = min(byte_position + part_size - 1, object_size - 1)
            copy_source_range = f"bytes={byte_position}-{last_byte}"
            # Copy this part
            try:
                info_log(f"{current_thread_name} Creating upload_part_copy source_range: {copy_source_range}")
                response = local_s3_client.upload_part_copy(
                    Bucket=target_bucket,
                    CopySource={'Bucket': src_bucket, 'Key': key},
                    CopySourceRange=copy_source_range,
                    Key=key,
                    PartNumber=part_num,
                    UploadId=upload_id
                )
            except Exception:
                error_log(f"{current_thread_name} Error while CREATING UPLOAD_PART_COPY for key {key}")
                raise
            parts_etags.append({"ETag": response["CopyPartResult"]["ETag"], "PartNumber": part_num})
            part_num += 1
            byte_position += part_size
        # All parts copied: ask S3 to assemble them into the target object.
        try:
            response = local_s3_client.complete_multipart_upload(
                Bucket=target_bucket,
                Key=key,
                MultipartUpload={
                    'Parts': parts_etags
                },
                UploadId=upload_id
            )
            info_log(f"{current_thread_name} {key} COMPLETE_MULTIPART_UPLOAD COMPLETED SUCCESSFULLY, response={response} !!!!")
        except Exception:
            error_log(f"{current_thread_name} Error while CREATING COMPLETE_MULTIPART_UPLOAD for key {key}")
            raise
    except Exception:
        error_log(f"{current_thread_name} Error while CREATING CREATE_MULTIPART_UPLOAD for key {key}")
        raise
Invoking multipart method:
_, local_s3_client = get_session_client()
copy_with_multipart(local_s3_client, src_bucket_name, target_bucket_name, key, src_object_size)
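The byte-range arithmetic is the part that is easiest to get wrong, so it can help to isolate and check it without touching S3. The sketch below is an illustration only; iter_copy_ranges is a hypothetical helper, not part of boto3, that produces the same PartNumber and CopySourceRange values as the loop above.

```python
def iter_copy_ranges(object_size, part_size=5 * 1024 * 1024):
    """Yield (part_number, copy_source_range) pairs covering object_size bytes.

    Ranges are inclusive at both ends, as CopySourceRange expects,
    and the final range is clipped to the last byte of the object.
    """
    part_number = 1
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1
        yield part_number, f"bytes={start}-{end}"
        part_number += 1


# A 12-byte object split into 5-byte parts gives three ranges.
for number, byte_range in iter_copy_ranges(12, part_size=5):
    print(number, byte_range)
```

Each pair could be fed to upload_part_copy as PartNumber and CopySourceRange. An object of size 0 yields no ranges, so such objects need a plain copy instead of a multipart one.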

Internal Server Error when querying endpoint

I have a very simple Lambda function that I created in AWS. Please see below.
import json
print('Loading function')
def lambda_handler(event, context):
    #1. Parse out query string params
    userChestSize = event['userChestSize']
    print('userChestSize= ' + userChestSize)
    #2. Construct the body of the response object
    transactionResponse = {}
    transactionResponse['userChestSize'] = userChestSize
    transactionResponse['message'] = 'Hello from Lambda'
    #3. Construct http response object
    responseObject = {}
    responseObject['statusCode'] = 200
    responseObject['headers'] = {}
    responseObject['headers']['Content-Type'] = 'application/json'
    responseObject['body'] = json.dumps(transactionResponse)
    #4. Return the response object
    return responseObject
Then I created a simple API with a GET method. It generated an endpoint link for me to test my Lambda function. When I use my link https://abcdefgh.execute-api.us-east-2.amazonaws.com/TestStage?userChestSize=30
I get
{"message": "Internal server error"}
The CloudWatch log has the following error:
'userChestSize': KeyError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 7, in lambda_handler
userChestSize = event['userChestSize']
KeyError: 'userChestSize'
What am I doing wrong? I followed the basic instructions to create the Lambda function and the API Gateway.

event['userChestSize'] does not exist. I suggest logging the entire event object so you can see what is actually in the event.
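As a hedged illustration of that advice: assuming the GET method uses Lambda proxy integration, query string values arrive under event['queryStringParameters'] rather than at the top level of the event. The sketch below logs the event and reads the parameter defensively; the 400 response for a missing parameter is an added assumption, not something from the question.

```python
import json


def lambda_handler(event, context):
    # Log the whole event first to see its real shape.
    print(json.dumps(event))

    # Assumption: Lambda proxy integration. 'queryStringParameters' is None
    # when the URL carries no query string, hence the "or {}".
    params = event.get('queryStringParameters') or {}
    user_chest_size = params.get('userChestSize')

    if user_chest_size is None:
        status = 400
        body = {'message': 'userChestSize is required'}
    else:
        status = 200
        body = {'userChestSize': user_chest_size, 'message': 'Hello from Lambda'}

    return {
        'statusCode': status,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body),
    }
```

If the method uses a non-proxy integration instead, the event shape is determined by the mapping template, which is one more reason to inspect the logged event before indexing into it.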

returning JSON response from AWS Glue Pythonshell job to the boto3 caller

Is there a way to send a JSON response (a dictionary of outputs) from an AWS Glue pythonshell job, similar to returning a JSON response from AWS Lambda?
I am calling a Glue pythonshell job as shown below:
response = glue.start_job_run(
    JobName='test_metrics',
    Arguments={
        '--test_metrics': 'test_metrics',
        '--s3_target_path_key': 's3://my_target',
        '--s3_target_path_value': 's3://my_target_value'})
print(response)
The response I get is a 200, which only indicates that the Glue start_job_run call succeeded. From the documentation, the only results I can see are those that a Glue job writes to S3 or to some other database.
I tried adding return {'result':'some_string'} at the end of my Glue pythonshell job to test whether it works, using the code below.
import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv,
                          ['JOB_NAME',
                           's3_target_path_key',
                           's3_target_path_value'])
print("Target path key is: ", args['s3_target_path_key'])
print("Target Path value is: ", args['s3_target_path_value'])
return {'result': "some_string"}
But it throws the error SyntaxError: 'return' outside function

Glue is not designed to return a response, because it is expected to run long-running operations. Blocking while waiting for the response of a long-running task is not the right approach in itself. Instead, you can use a launch job (service 1) -> execute job (service 2) -> get result (service 3) pattern. The job (service 2) can send its JSON response to the AWS service (service 3) that you want it to trigger; for example, if you invoke a Lambda function from the Glue job, you can send the JSON response to it.
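A variation on this pattern, sketched under assumptions: the job writes its result dictionary to S3 as JSON, and the caller polls the run state and then reads that object. The function names, bucket and key are hypothetical; the clients are passed in so the logic can be exercised without AWS. The calls used (put_object, get_object, get_job_run) are standard boto3 client methods.

```python
import json
import time


def write_job_result(s3, bucket, key, result):
    """Inside the Glue job: store the result dictionary as a JSON object in S3."""
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(result))


def wait_for_job_result(glue, s3, job_name, run_id, bucket, key, poll_seconds=30):
    """In the caller: poll the job run, then load the JSON the job wrote."""
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)['JobRun']
        state = run['JobRunState']
        if state == 'SUCCEEDED':
            body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
            return json.loads(body)
        if state in ('FAILED', 'STOPPED', 'TIMEOUT', 'ERROR'):
            raise RuntimeError(f"Glue job run {run_id} ended in state {state}")
        time.sleep(poll_seconds)
```

With real clients this might be called as wait_for_job_result(boto3.client('glue'), boto3.client('s3'), 'test_metrics', response['JobRunId'], bucket, key), where response is the start_job_run result. Polling still blocks the caller, so for long jobs an event-driven trigger, as the answer suggests, is likely the better fit.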

aws boto3 client Stubber help stubbing unit tests

I'm trying to write some unit tests for AWS RDS. Currently, the start and stop RDS API calls have not yet been implemented in moto. I tried just mocking out boto3 but ran into all sorts of weird issues. I did some googling and found http://botocore.readthedocs.io/en/latest/reference/stubber.html
So I have tried to implement the example for RDS, but the code appears to behave like the normal client even though I have stubbed it. I'm not sure what's going on or whether I am stubbing correctly.
from LambdaRdsStartStop.lambda_function import lambda_handler
from LambdaRdsStartStop.lambda_function import AWS_REGION
def tests_turn_db_on_when_cw_event_matches_tag_value(self, mock_boto):
    client = boto3.client('rds', AWS_REGION)
    stubber = Stubber(client)
    response = {u'DBInstances': [some copy pasted real data here], extra_info_about_call: extra_info}
    stubber.add_response('describe_db_instances', response, {})
    with stubber:
        r = client.describe_db_instances()
        lambda_handler({u'AutoStart': u'10:00:00+10:00/mon'}, 'context')
So the mocking works for the first line inside the stubber block, and r is returned as my stubbed data. But when execution enters the lambda_handler method in my lambda_function.py, which creates its own client, that client behaves like a normal unstubbed client:
lambda_function.py
def lambda_handler(event, context):
    rds_client = boto3.client('rds', region_name=AWS_REGION)
    rds_instances = rds_client.describe_db_instances()
error output:
File "D:\dev\projects\virtual_envs\rds_sloth\lib\site-packages\botocore\auth.py", line 340, in add_auth
raise NoCredentialsError
NoCredentialsError: Unable to locate credentials

You will need to patch boto3 where it is called in the routine that you are testing, so that the handler receives the stubbed client instead of creating a new one. Also, Stubber responses appear to be consumed on each call, so every stubbed call needs its own add_response, as below:
import boto3
import mock  # on Python 3: from unittest import mock
from botocore.stub import Stubber
def tests_turn_db_on_when_cw_event_matches_tag_value(self, mock_boto):
    client = boto3.client('rds', AWS_REGION)
    stubber = Stubber(client)
    # response data below should match aws documentation otherwise more errors due to botocore error handling
    response = {u'DBInstances': [{'DBInstanceIdentifier': 'rds_response1'}, {'DBInstanceIdentifier': 'rds_response2'}]}
    stubber.add_response('describe_db_instances', response, {})
    stubber.add_response('describe_db_instances', response, {})
    # Patch boto3 in the module where lambda_handler looks it up
    # (the module path follows the imports shown in the question).
    with mock.patch('LambdaRdsStartStop.lambda_function.boto3') as mock_boto3:
        with stubber:
            r = client.describe_db_instances()  # first add_response consumed here
            mock_boto3.client.return_value = client
            response = lambda_handler({u'AutoStart': u'10:00:00+10:00/mon'}, 'context')  # second add_response would be consumed here
            # self.assertEqual(r, response)
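The reason the patch is needed can be shown without AWS libraries. In the sketch below, the name boto3 is deliberately a stand-in object, not the real package, so the example runs anywhere; the region string and the helper name run_handler_with_fake_client are hypothetical. It only illustrates the "patch where it is used" idea: the handler builds its own client, so the factory it calls has to be replaced.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the boto3 module as seen from the handler's module (NOT the real boto3).
boto3 = SimpleNamespace(client=lambda service, region_name=None: None)


def lambda_handler(event, context):
    # The handler creates its own client, so a stub built in the test is never used
    # unless boto3.client itself is replaced.
    rds_client = boto3.client('rds', region_name='us-east-1')
    return rds_client.describe_db_instances()


def run_handler_with_fake_client():
    fake_client = mock.Mock()
    fake_client.describe_db_instances.return_value = {
        'DBInstances': [{'DBInstanceIdentifier': 'rds_response1'}]
    }
    # Replace the factory the handler calls, for the duration of the block only.
    with mock.patch.object(boto3, 'client', return_value=fake_client) as fake_factory:
        result = lambda_handler({}, 'context')
    return result, fake_factory
```

In a real test the fake client would be the Stubber-wrapped boto3 client from the answer above, and the patch target would be the boto3 name inside the module that defines lambda_handler.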