Creating a CloudWatch metric from Athena query results - amazon-web-services

My Requirement
I want to create a CloudWatch-Metric from Athena query results.
Example
I want to create a metric like user_count of each day.
In Athena, I will write an SQL query like this
select date,count(distinct user) as count from users_table group by 1
In the Athena editor I can see the result, but I want to see these results as a metric in CloudWatch.
CloudWatch-Metric-Name ==> user_count
Dimensions ==> Date,count
If I have this CloudWatch metric and dimensions, I can easily create a monitoring dashboard and send alerts.
Can anyone suggest a way to do this?

You can use CloudWatch custom widgets, see "Run Amazon Athena queries" in Samples.

It's somewhat involved, but you can use a Lambda for this. In a nutshell:
1. Set up your query in Athena and make sure it works using the Athena console.
2. Create a Lambda that:
   - Runs your Athena query
   - Pulls the query results from S3
   - Parses the query results
   - Sends the query results to CloudWatch as a metric
3. Use EventBridge to run your Lambda on a recurring basis
Here's an example Lambda function in Python that does step #2. Note that the Lambda function will need IAM permissions to run queries in Athena, read the results from S3, and then put a metric into CloudWatch.
import time
import boto3
query = 'select count(*) from mytable'
DATABASE = 'default'
bucket = 'BUCKET_NAME'
path = 'yourpath'
def lambda_handler(event, context):
    # Run query in Athena
    client = boto3.client('athena')
    output = "s3://{}/{}".format(bucket, path)
    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': output,
        }
    )
    # S3 file name uses the QueryExecutionId so
    # grab it here so we can pull the S3 file.
    qeid = response["QueryExecutionId"]
    # Occasionally Athena hasn't written the file
    # before the Lambda tries to pull it out of S3, so pause a few seconds.
    # Note: You are charged for time the Lambda is running.
    # A more elegant but more complicated solution would try to get the
    # file first then sleep.
    time.sleep(3)
    # Get query result from S3.
    s3 = boto3.client('s3')
    objectkey = path + "/" + qeid + ".csv"
    # load object as file
    file_content = s3.get_object(
        Bucket=bucket,
        Key=objectkey)["Body"].read()
    # split file into lines
    lines = file_content.decode().splitlines()
    # get the second line in file (the first line is the header row)
    count = lines[1]
    # remove double quotes
    count = count.replace("\"", "")
    # convert string to int since CloudWatch wants a numeric value
    count = int(count)
    # post query results as a CloudWatch metric
    cloudwatch = boto3.client('cloudwatch')
    response = cloudwatch.put_metric_data(
        MetricData=[
            {
                'MetricName': 'MyMetric',
                'Dimensions': [
                    {
                        'Name': 'DIM1',
                        'Value': 'dim1'
                    },
                ],
                'Unit': 'None',
                'Value': count
            },
        ],
        Namespace='MyMetricNS'
    )
    return response
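The code comments above mention that a fixed time.sleep(3) is a shortcut. A minimal sketch of one alternative, assuming you poll the Athena get_query_execution call until the query reports a terminal state, could look like this. The helper name wait_for_query and its poll_seconds and max_polls parameters are illustrative, not part of the original answer.

```python
import time

# States Athena reports once a query has stopped running.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena, query_execution_id, poll_seconds=1.0, max_polls=60):
    """Poll Athena until the query reaches a terminal state and return that state.

    `athena` is expected to be a boto3 Athena client (or any object exposing
    get_query_execution with the same response shape).
    """
    for _ in range(max_polls):
        response = athena.get_query_execution(QueryExecutionId=query_execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        # Still QUEUED or RUNNING: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(
        "Athena query {} did not finish in time".format(query_execution_id)
    )
```

Inside lambda_handler this would replace the sleep: call wait_for_query(client, qeid) and only read the CSV from S3 when the returned state is "SUCCEEDED". Keep the total polling time below the Lambda timeout.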

Related

AWS Overwrite the S3 file using lambda

I have a Lambda function which queries Athena and loads the results into S3. My code is shown below. Currently, when I run the Lambda function twice, I get 2 files (with random, auto-generated names) under an object called 'current_date.csv'. What I want is for the second run to overwrite the first file. Is it possible to overwrite 's3://abc/def/', or what can I do to specify the key (CSV file) name? Or is there a way to set an S3 rule saying one object can only have 1 file?
def lambda_handler(event, context):
    client = boto3.client('athena')
    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': 's3://abc/def/current_date.csv',
        }
    )
    return response

How do I use AWS Lambda to trigger Comprehend with S3?

I'm currently using AWS Lambda to trigger an Amazon Comprehend job, but the code only runs sentiment analysis on a single piece of text.
import boto3
def lambda_handler(event, context):
    s3 = boto3.client("s3")
    bucket = "bucketName"
    key = "textName.txt"
    file = s3.get_object(Bucket=bucket, Key=key)
    # decode the bytes; str() on bytes would produce "b'...'" instead of the text
    analysisdata = file['Body'].read().decode("utf-8")
    comprehend = boto3.client("comprehend")
    sentiment = comprehend.detect_sentiment(Text=analysisdata, LanguageCode="en")
    print(sentiment)
    return 'Sentiment detected'
I want to run a file where each line in the text file is a new piece of text to analyze with sentiment analysis (it's an option if you manually enter text into Comprehend). Is there a way to alter this code to do that, and have the output sentiment analysis file placed into that same S3 bucket? Thank you in advance.
It looks like you can use start_sentiment_detection_job():
response = client.start_sentiment_detection_job(
    InputDataConfig={
        'S3Uri': 'string',
        'InputFormat': 'ONE_DOC_PER_FILE'|'ONE_DOC_PER_LINE',
        'DocumentReaderConfig': {
            'DocumentReadAction': 'TEXTRACT_DETECT_DOCUMENT_TEXT'|'TEXTRACT_ANALYZE_DOCUMENT',
            'DocumentReadMode': 'SERVICE_DEFAULT'|'FORCE_DOCUMENT_READ_ACTION',
            'FeatureTypes': [
                'TABLES'|'FORMS',
            ]
        }
    },
    OutputDataConfig={
        'S3Uri': 'string',
        'KmsKeyId': 'string'
    },
    ...
)
It can read from an object in Amazon S3 (S3Uri) and store the output in an S3 object.
It looks like you could use 'InputFormat': 'ONE_DOC_PER_LINE' to meet your requirements.
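As a rough illustration of that suggestion, the sketch below fills in the request for a one-document-per-line job. The bucket and key reuse the names from the question; the output prefix, the job name, and the role ARN are placeholders I made up, and the role must be one that Comprehend is allowed to assume to read and write your bucket.

```python
def build_sentiment_job_request(input_uri, output_uri, role_arn,
                                job_name="sentiment-per-line"):
    """Build keyword arguments for start_sentiment_detection_job.

    Each line of the input object is treated as a separate document.
    """
    return {
        "JobName": job_name,  # hypothetical job name
        "LanguageCode": "en",
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": {
            "S3Uri": input_uri,
            "InputFormat": "ONE_DOC_PER_LINE",
        },
        "OutputDataConfig": {"S3Uri": output_uri},
    }


def lambda_handler(event, context):
    # Imported here so the builder above can be exercised without AWS access.
    import boto3

    comprehend = boto3.client("comprehend")
    request = build_sentiment_job_request(
        input_uri="s3://bucketName/textName.txt",
        output_uri="s3://bucketName/sentiment-output/",  # placeholder prefix
        role_arn="arn:aws:iam::123456789012:role/ComprehendDataAccess",  # placeholder
    )
    response = comprehend.start_sentiment_detection_job(**request)
    # The job runs asynchronously; the results appear under output_uri later.
    return response["JobId"]
```

Note that this starts an asynchronous job, so the Lambda returns before the analysis finishes, unlike the synchronous detect_sentiment call in the question.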

Export DynamoDB metrics logs to S3 or CloudWatch

I'm trying to use DynamoDB metrics logs in an external observability tool.
To do that, I'll need to get these log data from S3 or CloudWatch log groups (not from Insights or CloudTrail).
For this reason, if there isn't a way to use CloudWatch, I'll need to export these metric logs from DynamoDB to S3, and from there export to CloudWatch or try to get the data directly from S3.
Do you know if this is possible?
You could try using Logstash; it has input plugins for CloudWatch and S3:
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-cloudwatch.html
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html
AWS publishes DynamoDB metrics (table operation, table, and account) to CloudWatch metrics. You can also create as many metrics as you need. If you use Python, you can read them with boto3. The CloudWatch client has this method:
get_metric_data
Try this with your metrics:
from datetime import date, datetime, timedelta
import boto3
cloudwatch_client = boto3.client('cloudwatch')
yesterday = date.today() - timedelta(days=1)
today = date.today()
response = cloudwatch_client.get_metric_data(
    MetricDataQueries=[
        {
            'Id': 'some_request',
            'MetricStat': {
                'Metric': {
                    # DynamoDB metrics live in the AWS/DynamoDB namespace
                    'Namespace': 'AWS/DynamoDB',
                    'MetricName': 'metric_name',
                    'Dimensions': []
                },
                'Period': 3600,
                'Stat': 'Sum',
            }
        },
    ],
    StartTime=datetime(yesterday.year, yesterday.month, yesterday.day),
    EndTime=datetime(today.year, today.month, today.day),
)
print(response)

How to see progress when using Glue to export DynamoDB table

I'm trying to export every item in a DynamoDB table to S3. I found this tutorial https://aws.amazon.com/blogs/big-data/how-to-export-an-amazon-dynamodb-table-to-amazon-s3-using-aws-step-functions-and-aws-glue/ and followed the example. Basically,
table = glueContext.create_dynamic_frame.from_options(
    "dynamodb",
    connection_options={
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": read_percentage,
        "dynamodb.splits": splits
    }
)
glueContext.write_dynamic_frame.from_options(
    frame=table,
    connection_type="s3",
    connection_options={
        "path": output_path
    },
    format=output_format,
    transformation_ctx="datasink"
)
I tested it on a tiny table in a non-prod environment and it works fine. But my DynamoDB table in production is over 400 GB, with 200 million items. I suppose it'll take a while, but I have no idea how long to expect. Hours, or even days? Is there any way to show progress? For example, showing a count of how many items have been processed. I don't want to blindly start this job and wait.
One way would be to enable continuous logging for your AWS Glue Job to monitor its progress.
Another way would be to trigger a Lambda function whenever a file has been stored in S3, using Amazon S3 event notifications.
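A minimal sketch of that second idea, assuming the Lambda is subscribed to object-created notifications on the output bucket, is shown below. It only reports how many output objects and bytes have arrived, not how many DynamoDB items were processed; the function names are illustrative.

```python
def summarize_s3_event(event):
    """Return (object_count, total_bytes) for the objects named in one S3 event."""
    records = event.get("Records", [])
    # Each S3 notification record carries the object key and size under "s3".
    sizes = [r["s3"]["object"].get("size", 0) for r in records if "s3" in r]
    return len(sizes), sum(sizes)


def lambda_handler(event, context):
    count, total_bytes = summarize_s3_event(event)
    # Output printed by a Lambda function ends up in CloudWatch Logs.
    print("Glue export progress: {} new object(s), {} bytes".format(count, total_bytes))
    return {"objects": count, "bytes": total_bytes}
```

Comparing the accumulated bytes against the table size gives only a rough progress indicator, since the exported format will not match the DynamoDB storage size exactly.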
Did you try the custom waiter class within the AWS docs?
For instance, a custom waiter for a Glue job should look something like this:
class JobCompleteWaiter(CustomWaiter):
    def __init__(self, client):
        super().__init__(
            "JobComplete",
            "get_job_run",
            "JobRun.JobRunState",
            {"SUCCEEDED": WaitState.SUCCEEDED, "FAILED": WaitState.FAILED},
            client,
            max_tries=100,
        )
    def wait(self, JobName, RunId):
        self._wait(JobName=JobName, RunId=RunId)
According to the boto3 docs, a job run can be in one of several states, including: 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'
So I chose to check whether it was SUCCEEDED or FAILED.
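If you would rather not depend on the CustomWaiter helper, a plain polling loop over get_job_run is a possible alternative. This is only a sketch: the function name, the poll interval, and the choice of which states count as finished are my assumptions, based on the state list above.

```python
import time

# States from the list above after which the run will not change any more.
FINISHED_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def wait_for_job_run(glue, job_name, run_id, poll_seconds=30, max_tries=100):
    """Poll a Glue job run, printing its state, until it finishes.

    `glue` is expected to be a boto3 Glue client (or an object with a
    compatible get_job_run method). Returns the final JobRunState.
    """
    for attempt in range(1, max_tries + 1):
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        print("attempt {}: job run {} is {}".format(attempt, run_id, state))
        if state in FINISHED_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("Glue job run {} still not finished".format(run_id))
```

This tells you whether the run is still going, but like the waiter it does not report how many items have been exported so far.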

Lambda function not triggered by dynamodb update: Key Error

I am attempting to load a simple transactions.txt table into an S3 bucket where a Lambda function reads the file and populates DynamoDB tables for Customers and Transactions. This all works fine. However, I also have a Lambda function that is supposed to read the Transactions table as it is populated, sum up the transaction totals by customer, and insert them into another DynamoDB table, TransactionTotal.
My TotalNotifier Lambda function throws a "KeyError" regarding 'NewImage'. I believe the code is fine, and I have tried changing the type of Streams from 'New and Old' to just 'New' for the Transactions table and still encounter the same error.
from __future__ import print_function
import json, boto3
# Connect to SNS
sns = boto3.client('sns')
alertTopic = 'HighBalanceAlert'
snsTopicArn = [t['TopicArn'] for t in sns.list_topics()['Topics'] if t['TopicArn'].endswith(':' + alertTopic)][0]
# Connect to DynamoDB
dynamodb = boto3.resource('dynamodb')
transactionTotalTableName = 'TransactionTotal'
transactionsTotalTable = dynamodb.Table(transactionTotalTableName)
# This handler is executed every time the Lambda function is triggered
def lambda_handler(event, context):
    # Show the incoming event in the debug log
    print("Event received by Lambda function: " + json.dumps(event, indent=2))
    # For each transaction added, calculate the new Transactions Total
    for record in event['Records']:
        customerId = record['dynamodb']['NewImage']['CustomerId']['S']
        transactionAmount = int(record['dynamodb']['NewImage']['TransactionAmount']['N'])
        # Update the customer's total in the TransactionTotal DynamoDB table
        response = transactionsTotalTable.update_item(
            Key={
                'CustomerId': customerId
            },
            UpdateExpression="add accountBalance :val",
            ExpressionAttributeValues={
                ':val': transactionAmount
            },
            ReturnValues="UPDATED_NEW"
        )
Here is a sample error from the CloudWatch log:
'NewImage': KeyError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 30, in lambda_handler
    customerId = record['dynamodb']['NewImage']['CustomerId']['S']
KeyError: 'NewImage'
To elaborate on Oluwafemi's comment, you're likely experiencing this error when receiving a REMOVE event. Regardless of whether your stream is configured for new and old images, or just new images, a REMOVE record has no NewImage key, since there is no new image. Check out the example events in the AWS docs.
A check on the value of record['eventName'] before reading NewImage should solve the issue.
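A sketch of what that check could look like is below. It assumes only newly inserted transactions should be added to the total, so it keeps INSERT records and skips both REMOVE and MODIFY; the helper name inserted_transactions is illustrative.

```python
def inserted_transactions(event):
    """Return (customer_id, amount) pairs for INSERT records in a stream event.

    REMOVE records carry no NewImage, so they are skipped instead of
    raising KeyError. MODIFY records are skipped too (an assumption, to
    avoid counting the same transaction twice).
    """
    results = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        customer_id = image["CustomerId"]["S"]
        amount = int(image["TransactionAmount"]["N"])
        results.append((customer_id, amount))
    return results
```

In the handler from the question, the for loop would iterate over inserted_transactions(event) and call update_item for each pair.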