I have a Lambda function which queries Athena and loads the results into S3. My query is shown below. Currently, when I run the Lambda function twice, I end up with two files (with random, auto-generated names) under a prefix called 'current_date.csv'. What I want is for the second run to overwrite the first file. Is it possible to overwrite 's3://abc/def/', or how can I specify the key (CSV file) name? Or is there a way to set an S3 rule so that one prefix can only hold one file?
import boto3

def lambda_handler(event, context):
    client = boto3.client('athena')

    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': 's3://abc/def/current_date.csv',
        }
    )
    return response
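Athena treats OutputLocation as a prefix and always names the result object after the QueryExecutionId, so the file name itself can't be chosen directly. One workaround, sketched below under the assumption that OutputLocation is set to 's3://abc/def/' and that a fixed key such as 'def/current_date.csv' is acceptable, is to wait for the query to finish and then copy the generated result object over the fixed key so each run overwrites the previous one:

import time
import boto3

athena = boto3.client('athena')
s3 = boto3.client('s3')

BUCKET = 'abc'                        # bucket from the question
PREFIX = 'def'                        # prefix used as OutputLocation
TARGET_KEY = 'def/current_date.csv'   # hypothetical fixed key to overwrite

def copy_result_to_fixed_key(query_execution_id):
    # Wait until Athena reports a terminal state for the query
    while True:
        state = athena.get_query_execution(
            QueryExecutionId=query_execution_id
        )['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            break
        time.sleep(1)
    if state != 'SUCCEEDED':
        raise RuntimeError('Query ended in state ' + state)

    # Athena wrote the result as <prefix>/<QueryExecutionId>.csv
    source_key = '{}/{}.csv'.format(PREFIX, query_execution_id)

    # Copy it over the fixed key; each run overwrites the previous file
    s3.copy_object(
        Bucket=BUCKET,
        Key=TARGET_KEY,
        CopySource={'Bucket': BUCKET, 'Key': source_key}
    )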
I understand I can write an S3 Object Lambda which can transparently alter the returned S3 contents on the fly during retrieval, without modifying the original object.
I understand also that Athena can read json or csv files from S3.
My question is, can both of these capabilities be combined, so that Athena queries would read data which is transparently altered on the fly via S3 Object Lambda prior to being parsed by Athena?
SOME CODE
Suppose I write a CSV file:
hello
world
Then I write an S3 Object lambda:
import boto3
import requests

def lambda_handler(event, context):
    print(event)

    object_get_context = event["getObjectContext"]
    request_route = object_get_context["outputRoute"]
    request_token = object_get_context["outputToken"]
    s3_url = object_get_context["inputS3Url"]

    # Get object from S3
    response = requests.get(s3_url)
    original_object = response.content.decode('utf-8')

    # Transform object
    transformed_object = original_object.upper()

    # Write object back to S3 Object Lambda
    s3 = boto3.client('s3')
    s3.write_get_object_response(
        Body=transformed_object,
        RequestRoute=request_route,
        RequestToken=request_token)

    return {'status_code': 200}
(astute readers will notice this is the example from the AWS docs)
Now suppose I create an Athena EXTERNAL table and write this query:
SELECT * from hello
How can I ensure that the Athena query will return WORLD instead of world in this scenario?
I need to add an S3 trigger to the Lambda function in the source code itself instead of creating the trigger in the AWS console. I need that trigger to read a file when it is uploaded to a particular folder of an S3 bucket. I have already done this by creating the S3 trigger in the console with the help of a prefix. Can someone help with creating this S3 trigger in the Lambda function source code itself? Below is the source code of the Lambda function that reads the file.
import json
import urllib.parse
import boto3

print('Loading function')

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + response['ContentType'])
        return response['ContentType']
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
The AWS Lambda function would presumably only be run when the trigger detects a new object.
However, you are asking how to create the trigger from the AWS Lambda function, but that function will only be run when a trigger exists.
So it is a cart-before-the-horse situation.
Instead, you can create the trigger in Amazon S3 from the AWS Lambda console or via an API call, but this is done as part of the definition of the Lambda function -- it can't be done from within the source code of the function, since it isn't run until the trigger already exists!
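If the goal is to create the trigger programmatically (from a deployment script or another process, not from inside the triggered function itself), a rough sketch using boto3 might look like the following. The bucket name, prefix, and function ARN here are placeholders, and the Lambda function must first grant s3.amazonaws.com permission to invoke it:

import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

BUCKET = 'my-input-bucket'    # hypothetical bucket name
PREFIX = 'uploads/'           # hypothetical folder/prefix
FUNCTION_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:read-file'  # hypothetical ARN

# Allow S3 to invoke the Lambda function
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId='s3-invoke-permission',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::' + BUCKET
)

# Create the S3 event notification (the "trigger").
# Note: this call replaces the bucket's existing notification configuration.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': FUNCTION_ARN,
                'Events': ['s3:ObjectCreated:*'],
                'Filter': {
                    'Key': {
                        'FilterRules': [
                            {'Name': 'prefix', 'Value': PREFIX}
                        ]
                    }
                }
            }
        ]
    }
)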
import boto3
import json

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = "cloud-translate-output"
    key = "key value"
    try:
        data = s3.get_object(Bucket=bucket, Key=key)
        json_data = data["Body"].read()
        return {
            "response_code": 200,
            "data": str(json_data)
        }
    except Exception as e:
        print(e)
        raise e
I'm making an iOS app with Xcode.
I want to use AWS to bring data from S3 into the app, in the order app - API Gateway - Lambda - S3. Is there a way to consume that data in the app through the API? And if I upload the data to bucket number 1 of S3, CloudFormation should translate the uploaded text file and automatically save it to bucket number 2; I then want to load the text file stored in bucket number 2 back into the app through Lambda. Since I don't have the key value, is there a way to do this using only the name of the bucket?
if I upload the data to bucket number 1 of S3, CloudFormation should translate the uploaded text file and automatically save it to bucket number 2
Sadly, this is not how CloudFormation works. It can't automatically read or translate files from buckets, or upload them to new buckets.
I would stick with a Lambda function. It is better suited to such tasks.
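A minimal sketch of that approach, assuming an S3 trigger on bucket number 1 invokes the function, that Amazon Translate does the translation, and that the output bucket name and language codes used here are placeholders:

import urllib.parse
import boto3

s3 = boto3.client('s3')
translate = boto3.client('translate')

OUTPUT_BUCKET = 'cloud-translate-output'   # assumption: bucket number 2

def lambda_handler(event, context):
    # Object that was just uploaded to bucket number 1
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Read the uploaded text file
    text = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')

    # Translate it (language codes are placeholders)
    result = translate.translate_text(
        Text=text,
        SourceLanguageCode='en',
        TargetLanguageCode='ko'
    )

    # Save the translated text under the same key in bucket number 2
    s3.put_object(
        Bucket=OUTPUT_BUCKET,
        Key=key,
        Body=result['TranslatedText'].encode('utf-8')
    )

Note that translate_text only handles small documents; for larger files the batch translation jobs would be needed instead.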
I have three s3 buckets that are invoking a lambda function whenever there is a change in the content of specific objects inside the buckets.
Does anyone know if it is possible, using boto3, to retrieve those objects that have triggered the function?
Thanks!
UPDATE
I would like to get the objects that have triggered the lambda function from the response contents. I have tried to get it from the response of the get_function method of the lambda client but to no avail:
import boto3
lam = boto3.client('lambda')
response = lam.get_function(FunctionName='mylambdafunction')
Here's some sample code to retrieve the object that triggered the AWS Lambda function invocation:
import urllib.parse
import boto3

# Connect to S3
s3_client = boto3.client('s3')

# This handler is executed every time the Lambda function is triggered
def lambda_handler(event, context):
    # Get the bucket and object key from the Event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    localFilename = '/tmp/foo.txt'

    # Download the file from S3 to the local filesystem
    s3_client.download_file(bucket, key, localFilename)

    # Do other stuff here
Basically, it extracts the Bucket and Key (filename) from the event data that is passed to the function, then calls download_file().
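For reference, a trimmed-down version of the event that S3 passes to the function looks roughly like this (bucket and key values are placeholders); the fields used above are Records[0].s3.bucket.name and Records[0].s3.object.key:

# Trimmed example of an S3 "ObjectCreated" notification event (values are placeholders)
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "uploads/foo.txt", "size": 1024}
            }
        }
    ]
}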
My Requirement
I want to create a CloudWatch metric from Athena query results.
Example
I want to create a metric like user_count for each day.
In Athena, I will write an SQL query like this
select date,count(distinct user) as count from users_table group by 1
In the Athena editor I can see the result, but I want to see these results as a metric in CloudWatch.
CloudWatch-Metric-Name ==> user_count
Dimensions ==> Date,count
If I have this CloudWatch metric and these dimensions, I can easily create a monitoring dashboard and send alerts.
Can anyone suggest a way to do this?
You can use CloudWatch custom widgets, see "Run Amazon Athena queries" in Samples.
It's somewhat involved, but you can use a Lambda for this. In a nutshell:
1. Setup your query in Athena and make sure it works using the Athena console.
2. Create a Lambda that:
   - Runs your Athena query
   - Pulls the query results from S3
   - Parses the query results
   - Sends the query results to CloudWatch as a metric
3. Use EventBridge to run your Lambda on a recurring basis
Here's an example Lambda function in Python that does step #2. Note that the Lambda function will need IAM permissions to run queries in Athena, read the results from S3, and then put a metric into CloudWatch.
import time
import boto3

query = 'select count(*) from mytable'
DATABASE = 'default'
bucket = 'BUCKET_NAME'
path = 'yourpath'

def lambda_handler(event, context):
    # Run query in Athena
    client = boto3.client('athena')
    output = "s3://{}/{}".format(bucket, path)

    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': output,
        }
    )

    # The S3 file name uses the QueryExecutionId, so
    # grab it here so we can pull the S3 file.
    qeid = response["QueryExecutionId"]

    # Occasionally Athena hasn't written the file
    # before the lambda tries to pull it out of S3, so pause a few seconds.
    # Note: You are charged for the time the lambda is running.
    # A more elegant but more complicated solution would try to get the
    # file first, then sleep.
    time.sleep(3)

    # Get the query result from S3.
    s3 = boto3.client('s3')
    objectkey = path + "/" + qeid + ".csv"

    # Load the object's contents
    file_content = s3.get_object(
        Bucket=bucket,
        Key=objectkey)["Body"].read()

    # Split the file into lines
    lines = file_content.decode().splitlines()

    # Get the second line in the file (the first line is the CSV header)
    count = lines[1]

    # Remove double quotes
    count = count.replace("\"", "")

    # Convert string to int since CloudWatch wants a numeric value
    count = int(count)

    # Post the query result as a CloudWatch metric
    cloudwatch = boto3.client('cloudwatch')
    response = cloudwatch.put_metric_data(
        MetricData=[
            {
                'MetricName': 'MyMetric',
                'Dimensions': [
                    {
                        'Name': 'DIM1',
                        'Value': 'dim1'
                    },
                ],
                'Unit': 'None',
                'Value': count
            },
        ],
        Namespace='MyMetricNS'
    )
    return response
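For step #3, the recurring schedule can be created in the EventBridge console, or programmatically. A rough sketch with boto3 is below; the rule name, schedule expression, and function ARN are placeholders, and EventBridge also needs permission to invoke the function:

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

FUNCTION_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:athena-metric'  # placeholder
RULE_NAME = 'athena-metric-daily'                                              # placeholder

# Create (or update) a scheduled rule that fires once a day
events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression='rate(1 day)',
    State='ENABLED'
)

# Allow EventBridge to invoke the Lambda function
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId='eventbridge-invoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com'
)

# Point the rule at the Lambda function
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{'Id': 'athena-metric-lambda', 'Arn': FUNCTION_ARN}]
)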