How to get a glue crawler event state? - amazon-web-services

I am following this doc https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/ to set up an auto-trigger on Lambda when the crawler finishes. The event pattern I set in CloudWatch is:
{
  "detail": {
    "crawlerName": [
      "reddit_movie"
    ],
    "state": [
      "Succeeded"
    ]
  },
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "source": [
    "aws.glue"
  ]
}
I add a Lambda function as the target for this rule in CloudWatch.
I manually trigger the crawler, but it does not trigger the Lambda after it finishes. From the crawler log I can see:
04:36:28
[6c8450a5-970a-4190-bd2b-829a82d67fdf] INFO : Table redditmovies_bb008c32d0d970f0465f47490123f749 in database video has been updated with new schema

04:36:30
[6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Finished writing to Catalog

04:37:37
[6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Crawler has finished running and is in state READY
Does the above log mean the crawler finished successfully? How can I find out why the Lambda function is not triggered by the crawler?
And how can I debug this issue? Which log should I look at?

The following works -
CloudWatch Event rule -
{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "detail": {
    "state": [
      "Succeeded"
    ]
  }
}
Sample lambda -
import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    try:
        if event and 'detail' in event and event['detail'] and 'crawlerName' in event['detail']:
            crawler_name = event['detail']['crawlerName']
            print('Received event from crawlerName - {0}'.format(crawler_name))
            crawler = glue.get_crawler(Name=crawler_name)
            print('Received crawler from glue - {0}'.format(str(crawler)))
            database = crawler['Crawler']['DatabaseName']
    except Exception as e:
        print('Error handling events from crawler. Details - {0}'.format(e))
        raise e
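If you prefer to create the rule and wire it to the Lambda target programmatically rather than in the console, a rough boto3 sketch is below; the rule name, function name and ARNs are placeholders rather than values from the original setup.
import json
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"state": ["Succeeded"]}
}

# Create (or update) the rule with the event pattern shown above.
rule = events.put_rule(
    Name='glue-crawler-succeeded',   # assumed rule name
    EventPattern=json.dumps(pattern),
    State='ENABLED'
)

# Point the rule at the Lambda function (placeholder ARN).
events.put_targets(
    Rule='glue-crawler-succeeded',
    Targets=[{'Id': 'crawler-lambda', 'Arn': '<lambda-function-arn>'}]
)

# Allow CloudWatch Events / EventBridge to invoke the function.
lambda_client.add_permission(
    FunctionName='<lambda-function-name>',   # placeholder function name
    StatementId='allow-eventbridge-invoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn']
)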
At first I followed the link https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/ and it didn't work. I found that the Python Lambda script in the link is not correct if you paste it directly: the handler body ends up without indentation. Please check your Lambda.
The Python Lambda copied from the link:
import boto3
client = boto3.client('glue')
def lambda_handler(event, context):
response = client.start_job_run(JobName = 'MyTestJob')
We need to fix it as below, with the start_job_run call indented inside the handler:
import boto3

client = boto3.client('glue')

def lambda_handler(event, context):
    response = client.start_job_run(JobName='MyTestJob')
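Not from the linked article, but a slightly more defensive variant of the same handler that only starts the job when the incoming event reports a successful crawler run ('MyTestJob' remains a placeholder job name):
import boto3

client = boto3.client('glue')

def lambda_handler(event, context):
    # Start the job only for a successful crawler run.
    detail = event.get('detail', {})
    if detail.get('state') == 'Succeeded':
        response = client.start_job_run(JobName='MyTestJob')
        print('Started job run: {0}'.format(response['JobRunId']))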

Related

AWS Chatbot and EventBridge for Glue Job State Changes Error - Event received is not supported

I am trying to set up AWS Chatbot with Slack integration to display error messages for changes in states (errors) for AWS Glue. I have set up an AWS EventBridge event pattern to catch Glue Job State Changes as follows:
{
  "source": ["aws.glue"],
  "detail-type": ["Glue Job State Change"],
  "detail": {
    "state": [
      "FAILED"
    ]
  }
}
This successfully catches all failed Glue Jobs and I have set up an AWS SNS topic as the target using the input transformer.
Input Transformer Input Path
{"jobname":"$.detail.jobName","jobrunid":"$.detail.jobRunId","jobstate":"$.detail.state"}
Input Transformer Input Template
"{\"detail-type\": \"Glue Job <job-name> has entered the state <job-state> with the message <message>.\"}"
AWS SNS has a subscription endpoint to the AWS Chatbot, which fails to send the notification to Slack.
AWS Chatbot CloudWatch logs after an event using Input Transformer
Event received is not supported (see https://docs.aws.amazon.com/chatbot/latest/adminguide/related-services.html ):
{
  "subscribeUrl": null,
  "type": "Notification",
  "signatureVersion": "1",
  "signature": <signature>,
  "topicArn": <topic-arn>,
  "signingCertUrl": <signing-cert-url>,
  "messageId": <message-id>,
  "message": "{\"detail-type\": \"Glue Job MyJob has entered the state FAILED with the message SystemExit: None.\"}",
  "subject": null,
  "unsubscribeUrl": <unsubscribe-url>,
  "timestamp": "2022-03-02T12:17:16.879Z",
  "token": null
}
When the input is set to 'Matched Events' in the AWS EventBridge target settings, the Slack notification is sent, however it lacks any details.
Slack Notification
Glue Job State Change | eu-west-1 | Account: <account>
Glue Job State Change
AWS EventBridge Matched Events JSON Output
{
  "Type" : "Notification",
  "MessageId" : <message-id>,
  "TopicArn" : <topic-arn>,
  "Message" : "{\"detail-type\": [\"Glue Job State Change\"]}",
  "Timestamp" : "2022-03-02T11:17:52.443Z",
  "SignatureVersion" : "1",
  "Signature" : <signature>,
  "SigningCertURL" : <signing-cert-url>,
  "UnsubscribeURL" : <unsubscribe-url>
}
There are very few differences between the two JSON outputs, yet the input-transformer version is considered an unsupported event. Is it possible to generate a custom message when using AWS Chatbot for errors?
The best solution was to create a Lambda function as the target of the AWS EventBridge rule, which performs a POST to a Slack webhook.
# Import modules
import logging
import json
import urllib3

# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Define Lambda function
def lambda_handler(event, context):
    http = urllib3.PoolManager()
    url = <url>
    link = <glue-studio-monitoring-link>
    message = f"A Glue Job {event['detail']['jobName']} with Job Run ID {event['detail']['jobRunId']} has entered the state {event['detail']['state']} with error message: {event['detail']['message']}. Visit the link for job monitoring {link}"
    logger.info(message)
    headers = {"Content-type": "application/json"}
    data = {'text': message}
    response = http.request('POST',
                            url,
                            body=json.dumps(data),
                            headers=headers,
                            retries=False)
    logger.info(response.status)
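To exercise the function from the Lambda console, a stripped-down test event along these lines can be used; the field names mirror the Glue Job State Change events shown earlier in this question, and the jobRunId value is made up:
{
  "version": "0",
  "detail-type": "Glue Job State Change",
  "source": "aws.glue",
  "region": "eu-west-1",
  "detail": {
    "jobName": "MyJob",
    "jobRunId": "jr_0123456789abcdef",
    "state": "FAILED",
    "message": "SystemExit: None"
  }
}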

Is it possible to extract "instanceId" from EventBridge event data, and use it as Target Value?

I was able to set up AutoScaling events as rules in EventBridge to trigger SSM commands, but I've noticed that with my chosen Target Value the event is passed to all my active EC2 instances. My Target key is a tag shared by those instances, so my mistake makes sense now.
I'm pretty new to EventBridge, so I was wondering if there's a way to actually target the instance that triggered the AutoScaling event (as in extracting the "InstanceId" that's present in the event data and using that as my new Target Value). I saw the Input Transformer, but I think that just transforms the event data to pass to the target.
Thanks!
EDIT - help with js code for Lambda + SSM RunCommand
I realize I can achieve this by setting EventBridge to invoke a Lambda function instead of the SSM RunCommand directly. Can anyone help with the JavaScript code to call a shell command on the EC2 instance specified in the event data (event.detail.EC2InstanceId)? I can't seem to find a relevant and up-to-date base template online, and I'm not familiar enough with JS or Lambda. Any help is greatly appreciated! Thanks
Sample of Event data, as per aws docs
{
  "version": "0",
  "id": "12345678-1234-1234-1234-123456789012",
  "detail-type": "EC2 Instance Launch Successful",
  "source": "aws.autoscaling",
  "account": "123456789012",
  "time": "yyyy-mm-ddThh:mm:ssZ",
  "region": "us-west-2",
  "resources": [
    "auto-scaling-group-arn",
    "instance-arn"
  ],
  "detail": {
    "StatusCode": "InProgress",
    "Description": "Launching a new EC2 instance: i-12345678",
    "AutoScalingGroupName": "my-auto-scaling-group",
    "ActivityId": "87654321-4321-4321-4321-210987654321",
    "Details": {
      "Availability Zone": "us-west-2b",
      "Subnet ID": "subnet-12345678"
    },
    "RequestId": "12345678-1234-1234-1234-123456789012",
    "StatusMessage": "",
    "EndTime": "yyyy-mm-ddThh:mm:ssZ",
    "EC2InstanceId": "i-1234567890abcdef0",
    "StartTime": "yyyy-mm-ddThh:mm:ssZ",
    "Cause": "description-text"
  }
}
Edit 2 - my Lambda code so far
'use strict'
const ssm = new (require('aws-sdk/clients/ssm'))()
exports.handler = async (event) => {
  const instanceId = event.detail.EC2InstanceId
  var params = {
    DocumentName: "AWS-RunShellScript",
    InstanceIds: [ instanceId ],
    TimeoutSeconds: 30,
    Parameters: {
      commands: ["/path/to/my/ec2/script.sh"],
      workingDirectory: [],
      executionTimeout: ["15"]
    }
  };
  const data = await ssm.sendCommand(params).promise()
  const response = {
    statusCode: 200,
    body: "Run Command success",
  };
  return response;
}
Yes, but through Lambda
EventBridge -> Lambda (using SSM api) -> EC2
Thank you @Sándor Bakos for helping me out!! My JavaScript ended up not working for some reason, so I ended up just using part of the Python code linked in the comments.
1. Add ssm:SendCommand permission:
After I let Lambda create a basic role during function creation, I added an inline policy to allow Systems Manager's SendCommand. This needs access to your documents/*, instances/* and managed-instances/* resources (a rough sketch of such a policy follows the code below).
2. code - python 3.9
import boto3

def lambda_handler(event=None, context=None):
    try:
        client = boto3.client('ssm')
        instance_id = event['detail']['EC2InstanceId']
        command = '/path/to/my/script.sh'
        client.send_command(
            InstanceIds=[instance_id],
            DocumentName='AWS-RunShellScript',
            Parameters={
                'commands': [command],
                'executionTimeout': ['60']
            }
        )
    except Exception as e:
        print('Failed to send command: {0}'.format(e))
        raise
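For reference, the inline policy mentioned in step 1 might look roughly like the following; the wildcards are assumptions and should be tightened to your account, region and documents:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:SendCommand",
      "Resource": [
        "arn:aws:ssm:*:*:document/*",
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ssm:*:*:managed-instance/*"
      ]
    }
  ]
}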
You can do this without using Lambda, as I just did, by using EventBridge's input transformers.
I specified a new automation document that called the document I was trying to use (AWS-ApplyAnsiblePlaybooks).
My document called out the InstanceId as a parameter and is passed this by the input transformer from EventBridge. I had to pass the event into Lambda just to see how to parse the JSON event object to get the desired instance ID - this ended up being
$.detail.EC2InstanceId
(it was coming from an autoscaling group).
I then passed it into a template that was used for the runbook
{"InstanceId":[<instance>]}
This template was read in my runbook as a parameter.
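Putting the path and the template together, the input transformer configuration on the EventBridge target looks roughly like this (the map key "instance" is just an arbitrary name):
{
  "InputPathsMap": {
    "instance": "$.detail.EC2InstanceId"
  },
  "InputTemplate": "{\"InstanceId\": [<instance>]}"
}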
These were the SSM playbook inputs I used to run the AWS-ApplyAnsiblePlaybooks document; I just mapped each parameter to the specified parameters in the nested playbook:
"inputs": {
"InstanceIds": ["{{ InstanceId }}"],
"DocumentName": "AWS-ApplyAnsiblePlaybooks",
"Parameters": {
"SourceType": "S3",
"SourceInfo": {"path": "https://testansiblebucketab.s3.amazonaws.com/"},
"InstallDependencies": "True",
"PlaybookFile": "ansible-test.yml",
"ExtraVariables": "SSM=True",
"Check": "False",
"Verbose": "-v",
"TimeoutSeconds": "3600"
}
See the document below for reference. They used a document that was already set up to receive the variable
https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-tutorial-eventbridge-input-transformers.html
This is the full automation playbook I used, most of the parameters are defaults from the nested playbook:
{
  "description": "Runs Ansible Playbook on Launch Success Instances",
  "schemaVersion": "0.3",
  "assumeRole": "<Place your automation role ARN here>",
  "parameters": {
    "InstanceId": {
      "type": "String",
      "description": "(Required) The ID of the Amazon EC2 instance."
    }
  },
  "mainSteps": [
    {
      "name": "RunAnsiblePlaybook",
      "action": "aws:runCommand",
      "inputs": {
        "InstanceIds": ["{{ InstanceId }}"],
        "DocumentName": "AWS-ApplyAnsiblePlaybooks",
        "Parameters": {
          "SourceType": "S3",
          "SourceInfo": {"path": "https://testansiblebucketab.s3.amazonaws.com/"},
          "InstallDependencies": "True",
          "PlaybookFile": "ansible-test.yml",
          "ExtraVariables": "SSM=True",
          "Check": "False",
          "Verbose": "-v",
          "TimeoutSeconds": "3600"
        }
      }
    }
  ]
}
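Not part of the original answer, but if you would rather register the automation document above with the SDK than the console, a minimal boto3 sketch could look like this (it assumes the JSON is saved as runbook.json, and the document name is made up):
import boto3

ssm = boto3.client('ssm')

with open('runbook.json') as f:
    content = f.read()

# Register the automation runbook so EventBridge can use it as a target.
ssm.create_document(
    Content=content,
    Name='RunAnsibleOnLaunchSuccess',   # assumed document name
    DocumentType='Automation',
    DocumentFormat='JSON'
)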

Lambda to process SQS messages not triggered

I have a lambda function that's supposed to process my SQS messages. However, it doesn't seem to be triggered automatically even though I have my SQS as its event trigger. Below is my code.
import json
import boto3

def lambda_handler(event, context):
    max = 2  # Number of messages to process at a time
    if "max" in event:
        max = int(event["max"])
    else:
        max = 1
    sqs = boto3.resource('sqs')  # Get access to resource
    queue = sqs.get_queue_by_name(QueueName='mysqs.fifo')  # Get queue by name
    count = 0
    # Process messages
    for message in queue.receive_messages(MaxNumberOfMessages=max):
        body = json.loads(message.body)  # Attribute list
        print("test")
        payload = message[body]
        print(str(payload))
        count += 1
        message.delete()
    return {
        'statusCode': 200,
        'body': str(count)
    }
When an AWS Lambda function is configured with Amazon SQS as a trigger, the messages are passed in via the event variable. The code inside the function should not call Amazon SQS directly.
Here is an example of an event being passed to the function:
{
  "Records": [
    {
      "messageId": "11d6ee51-4cc7-4302-9e22-7cd8afdaadf5",
      "receiptHandle": "AQEBBX8nesZEXmkhsmZeyIE8iQAMig7qw...",
      "body": "Test message.",
      "attributes": {
        "ApproximateReceiveCount": "1",
        "SentTimestamp": "1573251510774",
        "SequenceNumber": "18849496460467696128",
        "MessageGroupId": "1",
        "SenderId": "AIDAIO23YVJENQZJOL4VO",
        "MessageDeduplicationId": "1",
        "ApproximateFirstReceiveTimestamp": "1573251510774"
      },
      "messageAttributes": {},
      "md5OfBody": "e4e68fb7bd0e697a0ae8f1bb342846b3",
      "eventSource": "aws:sqs",
      "eventSourceARN": "arn:aws:sqs:us-east-2:123456789012:fifo.fifo",
      "awsRegion": "us-east-2"
    }
  ]
}
Your code could then access the messages with:
for record in event['Records']:
    payload = record['body']
    print(payload)
    count += 1
There is no need to delete the message -- it will be automatically deleted once the Lambda function successfully executes.
See: Using AWS Lambda with Amazon SQS - AWS Lambda
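Putting that together, a minimal handler for the SQS trigger could look like the sketch below; it assumes the message body is JSON, as in the original code, so drop the json.loads call if the body is plain text.
import json

def lambda_handler(event, context):
    count = 0
    for record in event['Records']:
        # Each record's body arrives as a string; parse it if it is JSON.
        payload = json.loads(record['body'])
        print(str(payload))
        count += 1
    # No need to delete messages; Lambda does that on successful return.
    return {
        'statusCode': 200,
        'body': str(count)
    }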

How to check Spark step status programmatically (submitted on EMR cluster)?

I created a simple step function as follows :
Start -> Start EMR cluster & submit job -> End
I want to find a mechanism to identify whether my Spark step completed successfully or not.
I am able to start EMR cluster and attach a spark job to it, which successfully completes and terminates the cluster.
Followed steps in this link :
Creating AWS EMR cluster with spark step using lambda function fails with "Local file does not exist"
Now I am looking to get the status; the job poller will tell me whether the EMR cluster was created successfully or not.
I am looking for ways to find out the Spark job status:
import json
import boto3

def lambda_handler(event, context):
    conn = boto3.client("emr")
    cluster_id = conn.run_job_flow(
        Name='xyz',
        ServiceRole='xyz',
        JobFlowRole='asd',
        VisibleToAllUsers=True,
        LogUri='<location>',
        ReleaseLabel='emr-5.16.0',
        Instances={
            'Ec2SubnetId': 'xyz',
            'InstanceGroups': [
                {
                    'Name': 'Master',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm4.xlarge',
                    'InstanceCount': 1,
                }
            ],
            'KeepJobFlowAliveWhenNoSteps': False,
            'TerminationProtected': False,
        },
        Applications=[
            {
                'Name': 'Spark'
            },
            {
                'Name': 'Hadoop'
            }
        ],
        Steps=[{
            'Name': "mystep",
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 'jar',
                'Args': [
                    <insert args>, jar, mainclass
                ]
            }
        }]
    )
    return cluster_id
You can use the CLI or SDK to list all steps for the cluster and then describe a particular step to get its status.
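The answer above gives no code, but a small boto3 sketch of that approach might look like this; the cluster and step IDs are the values returned by run_job_flow and add_job_flow_steps.
import boto3

emr = boto3.client('emr')

def get_step_states(cluster_id):
    # List every step on the cluster with its current state.
    steps = emr.list_steps(ClusterId=cluster_id)
    return {s['Id']: s['Status']['State'] for s in steps['Steps']}

def get_step_state(cluster_id, step_id):
    # States include PENDING, RUNNING, COMPLETED, CANCELLED, FAILED, INTERRUPTED.
    step = emr.describe_step(ClusterId=cluster_id, StepId=step_id)
    return step['Step']['Status']['State']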

Get emails whenever a file is uploaded on s3 bucket using serverless

I want to get emails whenever a file is uploaded to an S3 bucket, as described in the title above. I am using Serverless. The issue is that the event I created on S3 only gives me a notification in the S3 AWS console, and I don't know how to configure a CloudWatch event on S3 to trigger Lambda. So if someone knows how to trigger events on S3 using CloudWatch, I am all ears.
Here is my code:
import json
import boto3
import logging
import sys
import os
import traceback
from botocore.exceptions import ClientError
from pprint import pprint
from time import strftime, gmtime

email_from = '*****@******.com'
email_to = '******@******.com'
#email_cc = '********@gmail.com'
email_subject = 'new event on s3 '
email_body = 'a new file is uploaded'

# Set up simple logging for INFO
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def sthree(event, context):
    """Send email whenever a file is uploaded to S3"""
    body = {}
    status_code = 200
    try:
        s3 = boto3.client('s3')
        ses = boto3.client('ses')
        response = ses.send_email(
            Source=email_from,
            Destination={'ToAddresses': [email_to]},
            Message={'Subject': {'Data': email_subject},
                     'Body': {'Text': {'Data': email_body}}}
        )
    except ClientError as e:
        logger.error(e)
        status_code = 500
    response = {
        "statusCode": status_code,
        "body": json.dumps(body)
    }
    return response
and here is my serverless.yml file
service: aws-python # NOTE: update this with your service name

plugins:
  - serverless-external-s3-event

provider:
  name: aws
  runtime: python2.7
  stage: dev
  region: us-east-1
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - s3:*
        - "ses:SendEmail"
        - "ses:SendRawEmail"
        - "s3:PutBucketNotification"
      Resource: "*"

functions:
  sthree:
    handler: handler.sthree
    description: send mail whenever a file is uploaded on S3
    events:
      - s3:
          bucket: cartegie-nirmine
          event: s3:ObjectCreated:*
          rules:
            - prefix: uploads/
            - suffix: .jpg
      - cloudwatchEvent:
          description: 'CloudWatch Event triggered '
          event:
            source:
              - "aws.S3"
            detail-type:
              - "S3 event Notification"
          enabled: true
If your goal is just to receive email notifications for operations on an S3 bucket, then you don't need Lambda functions for that. For the use case mentioned in the question, you can achieve this with an SNS topic and S3 events. I will mention the steps to follow from the console (the same can be achieved via the SDK or CLI; a boto3 sketch follows the steps below).
1) Create a topic using the SNS console.
2) Subscribe to the topic. Use email as the communication protocol and provide your email ID.
3) You will get an email requesting you to confirm your subscription to the topic. Confirm the subscription.
4) IMPORTANT: Replace the access policy of the topic with the policy below:
{
  "Version": "2008-10-17",
  "Id": "__default_policy_ID",
  "Statement": [
    {
      "Sid": "__default_statement_ID",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "SNS:Publish",
      "Resource": "sns-topic-arn",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:*:*:s3-bucket-name"
        }
      }
    }
  ]
}
Basically you are giving permission for your S3 bucket to publish to the SNS topic.
Replace sns-topic-arn with the ARN of the topic you created above.
Replace s3-bucket-name with the name of the bucket for which you want to receive notifications.
5) Go to the S3 console. Click on your S3 bucket and open the Properties tab.
6) Under Advanced settings, click on the Events card.
7) Click Add Notification and enter the values: select the required S3 events to monitor and the SNS topic you created.
8) Click Save. Now you should start receiving notifications to your email.
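As noted above, the same notification configuration can also be set up with the SDK instead of the console. A minimal boto3 sketch, reusing the placeholder names from the policy above:
import boto3

s3 = boto3.client('s3')

# Attach the SNS topic to the bucket for object-created events.
s3.put_bucket_notification_configuration(
    Bucket='s3-bucket-name',
    NotificationConfiguration={
        'TopicConfigurations': [
            {
                'TopicArn': 'sns-topic-arn',
                'Events': ['s3:ObjectCreated:*']
            }
        ]
    }
)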