Boto3 - Create S3 'object created' notification to trigger a Lambda function

How do I use boto3 to simulate the Add Event Source action in the Event Sources tab of the AWS Console?
I want to programmatically create a trigger so that when an object is created in MyBucket, it calls the MyLambda function (qualified with an alias).
The relevant API call I see in the Boto3 documentation is create_event_source_mapping, but it explicitly states that it is only for the AWS pull model, while I think S3 belongs to the push model. Anyway, I tried using it and it didn't work.
Scenarios:
Passing a prefix filter would be nice too.

I was looking at the wrong side. This is configured on S3, as a bucket notification, not on the Lambda function:
import boto3
s3 = boto3.resource('s3')
bucket_name = 'mybucket'
bucket_notification = s3.BucketNotification(bucket_name)
response = bucket_notification.put(
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:033333333:function:mylambda:staging',
                'Events': [
                    's3:ObjectCreated:*'
                ],
            },
        ]
    }
)
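The question also asked for a prefix filter, and S3 can only call the function once the function's resource policy allows it. The sketch below builds the notification configuration with a `Filter` rule and grants S3 permission with `add_permission` before applying it. It assumes a hypothetical `uploads/` prefix and your own account ID; the helper names are illustrative. Note that `put_bucket_notification_configuration` replaces the bucket's whole notification configuration, so any notifications already on the bucket must be included.

```python
import re


def build_lambda_notification(function_arn, prefix=None, suffix=None,
                              events=("s3:ObjectCreated:*",)):
    """Return a NotificationConfiguration that sends matching events to one function."""
    config = {"LambdaFunctionArn": function_arn, "Events": list(events)}
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    if rules:
        config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [config]}


def subscribe_lambda_to_bucket(s3_client, lambda_client, bucket, function_arn,
                               account_id, prefix=None):
    """Allow S3 to invoke the (aliased) function, then attach the notification."""
    # Statement IDs only allow letters, digits, '-' and '_'
    statement_id = "s3-invoke-" + re.sub(r"[^A-Za-z0-9_-]", "-", bucket)
    lambda_client.add_permission(
        FunctionName=function_arn,  # may include the alias, e.g. ...:mylambda:staging
        StatementId=statement_id,
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn=f"arn:aws:s3:::{bucket}",
        SourceAccount=account_id,
    )
    # Overwrites any notification configuration already on the bucket
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_lambda_notification(function_arn, prefix=prefix),
    )
```

It would be called as `subscribe_lambda_to_bucket(boto3.client('s3'), boto3.client('lambda'), 'mybucket', arn, account_id, prefix='uploads/')`. Running it twice for the same bucket fails with a `ResourceConflictException`, because the statement ID already exists in the function policy.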

Related

Can I create Slack subscriptions to an AWS SNS topic?

I'm trying to create an SNS topic in AWS and subscribe a Lambda function to it that will send notifications to Slack apps/users.
I read this article:
https://aws.amazon.com/premiumsupport/knowledge-center/sns-lambda-webhooks-chime-slack-teams/
which describes how to do it using this Lambda code:
#!/usr/bin/python3.6
import urllib3
import json
http = urllib3.PoolManager()
def lambda_handler(event, context):
    url = "https://hooks.slack.com/services/xxxxxxx"
    msg = {
        "channel": "#CHANNEL_NAME",
        "username": "WEBHOOK_USERNAME",
        "text": event['Records'][0]['Sns']['Message'],
        "icon_emoji": ""
    }
    encoded_msg = json.dumps(msg).encode('utf-8')
    resp = http.request('POST', url, body=encoded_msg)
    print({
        "message": event['Records'][0]['Sns']['Message'],
        "status_code": resp.status,
        "response": resp.data
    })
The problem is that with this implementation I have to create a Lambda function for every user.
I want to subscribe multiple Slack apps/users to one SNS topic.
Is there a way of doing that without creating a lambda function for each one?
You really DON'T need Lambda. SNS and Slack alone are enough.
I found a way to integrate AWS SNS with Slack WITHOUT AWS Lambda or AWS Chatbot. With this approach you can also confirm the subscription easily.
The video below shows all the steps clearly:
https://www.youtube.com/watch?v=CszzQcPAqNM
Steps to follow:
Create a Slack channel or use an existing one.
Create a workflow, selecting Webhook as its trigger.
Create a variable named "SubscribeURL". The name is very important.
Add that variable to the message body of the workflow, then publish the workflow and copy its URL.
Add that URL as a subscription of the SNS topic. You will see the subscription URL in the Slack channel.
Follow the URL to complete the subscription.
Come back to the workflow and change the "SubscribeURL" variable to "Message".
Then publish a message to SNS; you will see it in the Slack channel.
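Because one SNS topic can have many HTTPS subscriptions, the same setup can be repeated for each Slack workflow, which covers the multiple-users case without any Lambda. The variable names matter because SNS posts a JSON body whose top-level `SubscribeURL` field carries the confirmation link and whose `Message` field carries each published message. A minimal sketch that creates the subscriptions with boto3, assuming you already have a topic ARN and a list of workflow webhook URLs (placeholders, and the helper names are illustrative):

```python
def https_subscription_requests(topic_arn, webhook_urls):
    """Build one SNS Subscribe request per Slack workflow webhook URL."""
    return [
        {
            "TopicArn": topic_arn,
            "Protocol": "https",
            "Endpoint": url,
            # Return the real ARN even while the subscription is pending confirmation
            "ReturnSubscriptionArn": True,
        }
        for url in webhook_urls
    ]


def subscribe_webhooks(sns_client, topic_arn, webhook_urls):
    """Subscribe every webhook; each one still needs its SubscribeURL visited."""
    return [
        sns_client.subscribe(**request)["SubscriptionArn"]
        for request in https_subscription_requests(topic_arn, webhook_urls)
    ]
```

It would be run as `subscribe_webhooks(boto3.client('sns'), topic_arn, urls)`, after which each subscription is confirmed from its Slack channel as described in the steps.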
Hi, I would say you should use a for loop over a list of all the users. Either list them manually in the Lambda function or get them with an API call to Slack, e.g. this one: https://api.slack.com/methods/users.list
#!/usr/bin/python3.6
import urllib3
import json
http = urllib3.PoolManager()
def lambda_handler(event, context):
    url = "https://hooks.slack.com/services/xxxxxxx"
    userlist = ["name1", "name2"]
    # post the same SNS message once per user
    for user in userlist:
        msg = {
            # not sure if the hash has to be here; newer app-based webhooks may ignore channel overrides
            "channel": "#" + user,
            "username": "WEBHOOK_USERNAME",
            "text": event['Records'][0]['Sns']['Message'],
            "icon_emoji": ""
        }
        encoded_msg = json.dumps(msg).encode('utf-8')
        resp = http.request('POST', url, body=encoded_msg)
        print({
            "message": event['Records'][0]['Sns']['Message'],
            "status_code": resp.status,
            "response": resp.data
        })
Another option is to set up email forwarding into Slack for the users, see:
https://slack.com/help/articles/206819278-Send-emails-to-Slack
Then you can simply add those email addresses as subscribers to the SNS topic. You can filter which messages each receiver gets with a subscription filter policy.
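A minimal sketch of that setup, assuming a hypothetical message attribute named `team`: each Slack email address is subscribed with a filter policy, and the publisher sets the attribute so that SNS delivers each message only to matching subscribers. The helpers just build the request arguments for the boto3 SNS client.

```python
import json


def email_subscription_request(topic_arn, address, teams):
    """SNS Subscribe request for a Slack email address that only accepts some teams."""
    # By default, filter policies are matched against message attributes
    policy = {"team": list(teams)}
    return {
        "TopicArn": topic_arn,
        "Protocol": "email",
        "Endpoint": address,
        "Attributes": {"FilterPolicy": json.dumps(policy)},
    }


def publish_request(topic_arn, message, team):
    """SNS Publish request tagged with the (hypothetical) 'team' attribute."""
    return {
        "TopicArn": topic_arn,
        "Message": message,
        "MessageAttributes": {"team": {"DataType": "String", "StringValue": team}},
    }
```

With `sns = boto3.client('sns')`, these would be passed as `sns.subscribe(**email_subscription_request(...))` and `sns.publish(**publish_request(...))`. Email subscriptions still have to be confirmed from the confirmation email that arrives in Slack.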

AWS Lambda with dynamic triggers for S3 buckets

I have an S3 bucket named "archive_A". I have created a Lambda function (Python) that is triggered by any object "creation" or "permanently delete" in the bucket, retrieves the object's metadata, and inserts it into DynamoDB.
For the S3 bucket archive_A, I have manually added the triggers, one for "creation" and another for "permanently delete", to my Lambda function via the console.
import boto3
from urllib.parse import unquote_plus
from uuid import uuid4
dynamodb = boto3.resource('dynamodb')
def lambda_handler(event, context):
    dynamoTable = dynamodb.Table('S3metadata')
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        # object keys in S3 event notifications are URL-encoded
        object_key = unquote_plus(record['s3']['object']['key'])
        size = record['s3']['object'].get('size', -1)  # size is absent for delete events
        event_name = record['eventName']
        event_time = record['eventTime']
        dynamoTable.put_item(
            Item={'Resource_id': str(uuid4()), 'Bucket': bucket_name, 'Object': object_key,
                  'Size': size, 'Event': event_name, 'EventTime': event_time})
In the future there could be more S3 buckets like archive_B, archive_C, etc. In that case I would have to keep adding triggers manually for each S3 bucket, which is a bit cumbersome.
Is there any dynamic way of adding triggers to the Lambda function for S3 buckets named "archive_*", so that any future S3 bucket with a name like "archive_G" automatically gets the triggers?
Please advise. I am quite new to AWS, so an example would be easier to follow.
There is no in-built way to automatically add triggers for new buckets.
You could probably create an Amazon EventBridge rule that triggers on CreateBucket and calls an AWS Lambda function with details of the new bucket.
That Lambda function could then programmatically add a trigger on your existing Lambda function.
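A rough sketch of that approach. It assumes a CloudTrail trail is logging management events (EventBridge needs this to see API calls such as CreateBucket), a hypothetical environment variable `TARGET_FUNCTION_ARN` holding the metadata function's ARN, and a name prefix of `archive`. New bucket names cannot contain underscores or uppercase letters, so "archive_*" would in practice be something like "archive-*". `EVENT_PATTERN` is the pattern for the EventBridge rule, and `lambda_handler` is the function the rule targets; its execution role needs `lambda:AddPermission` and `s3:PutBucketNotification`.

```python
import os
import re

BUCKET_PREFIX = "archive"  # hypothetical naming convention
TARGET_FUNCTION_ARN = os.environ.get("TARGET_FUNCTION_ARN", "")

# Event pattern for the EventBridge rule that fires when an archive* bucket is created
EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["CreateBucket"],
        "requestParameters": {"bucketName": [{"prefix": BUCKET_PREFIX}]},
    },
}


def bucket_from_event(event):
    """Name of the bucket created by the CreateBucket call."""
    return event["detail"]["requestParameters"]["bucketName"]


def metadata_notification(function_arn):
    """Mirrors the two console triggers: object created and permanently deleted."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:Delete"],
            }
        ]
    }


def lambda_handler(event, context):
    bucket = bucket_from_event(event)
    if not bucket.startswith(BUCKET_PREFIX):
        return None  # defensive check in case the rule pattern is broader
    import boto3  # bundled with the Lambda runtime; imported here so the helpers work without it
    boto3.client("lambda").add_permission(
        FunctionName=TARGET_FUNCTION_ARN,
        StatementId="s3-invoke-" + re.sub(r"[^A-Za-z0-9_-]", "-", bucket),
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn=f"arn:aws:s3:::{bucket}",
        SourceAccount=event["account"],
    )
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=metadata_notification(TARGET_FUNCTION_ARN),
    )
    return bucket
```

The rule itself could be created with `boto3.client('events').put_rule(Name=..., EventPattern=json.dumps(EVENT_PATTERN))`, with this function added as its target.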

Set up S3 Bucket level Events using AWS CloudFormation

I am trying to write an AWS CloudFormation template that attaches an event to an existing S3 bucket, so that a Lambda function is triggered whenever a new file is put into a specific directory within the bucket. I am using the following YAML as a base for the template but cannot get it working.
---
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  SETRULE:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: bucket-name
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:Put
            Filter:
              S3Key:
                Rules:
                  - Name: prefix
                    Value: directory/in/bucket
            Function: arn:aws:lambda:us-east-1:XXXXXXXXXX:function:lambda-function-trigger
            Input: '{ CONFIGS_INPUT }'
I have tried rewriting this template a number of different ways to no success.
Since you mentioned that the buckets already exist, this is not going to work. You can use CloudFormation this way, but only to create a new bucket; it cannot modify an existing bucket that was not created by that template in the first place.
If you don't want to recreate your infrastructure, it might be easier to use a script that subscribes the Lambda function to each of the buckets. As long as you have a list of buckets and the Lambda function, you are ready to go.
Here is a Python 3 script, assuming that we have:
2 buckets called test-bucket-jkg2 and test-bucket-x1gf
lambda function with arn: arn:aws:lambda:us-east-1:605189564693:function:my_func
There are 2 steps to make this work. First, you need to add a function policy that allows the S3 service to invoke the function. Second, you loop through the buckets one by one, subscribing the Lambda function to each of them.
import boto3
s3_client = boto3.client("s3")
lambda_client = boto3.client('lambda')
buckets = ["test-bucket-jkg2", "test-bucket-x1gf"]
lambda_function_arn = "arn:aws:lambda:us-east-1:605189564693:function:my_func"
# create a function policy that will permit s3 service to
# execute this lambda function
# note that you should specify SourceAccount and SourceArn to limit who (which account/bucket) can
# execute this function - you will need to loop through the buckets to achieve
# this, at least you should specify SourceAccount
try:
    response = lambda_client.add_permission(
        FunctionName=lambda_function_arn,
        # statement IDs may only contain letters, digits, '-' and '_'
        StatementId="allow-s3-to-execute-this-function",
        Action='lambda:InvokeFunction',
        Principal='s3.amazonaws.com',
        # SourceAccount="your account",
        # SourceArn="bucket's arn"
    )
    print(response)
except Exception as e:
    print(e)
# loop through all buckets and subscribe lambda function
# to each one of them
for bucket in buckets:
    print("putting config to bucket: ", bucket)
    try:
        response = s3_client.put_bucket_notification_configuration(
            Bucket=bucket,
            NotificationConfiguration={
                'LambdaFunctionConfigurations': [
                    {
                        'LambdaFunctionArn': lambda_function_arn,
                        'Events': [
                            's3:ObjectCreated:*'
                        ]
                    }
                ]
            }
        )
        print(response)
    except Exception as e:
        print(e)
You could write a custom resource to do this; in fact, that's what I ended up doing at work for the same problem. At the simplest level, define a Lambda function that takes a bucket notification configuration and calls the PutBucketNotificationConfiguration API with the data passed to it.
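A minimal sketch of such a custom resource function, assuming the template passes hypothetical `BucketName` and `NotificationConfiguration` properties already in the shape the S3 API expects. It applies the configuration on Create/Update, clears it on Delete, and always reports back through the pre-signed `ResponseURL`; if no response is sent, the stack waits until the custom resource times out.

```python
import json
import urllib.request


def build_cfn_response(event, status, reason, physical_id):
    """Body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See the CloudWatch Logs for details",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": {},
    }


def send_response(response_url, body):
    """PUT the result to the pre-signed S3 URL CloudFormation supplied."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        response_url, data=data, method="PUT", headers={"Content-Type": ""}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def lambda_handler(event, context):
    props = event.get("ResourceProperties", {})
    bucket = props.get("BucketName", "")  # hypothetical property names
    status, reason = "SUCCESS", ""
    try:
        import boto3  # bundled with the Lambda runtime
        config = props.get("NotificationConfiguration", {})
        if event["RequestType"] == "Delete":
            config = {}  # an empty configuration removes the bucket's notifications
        boto3.client("s3").put_bucket_notification_configuration(
            Bucket=bucket, NotificationConfiguration=config
        )
    except Exception as exc:  # report failures instead of leaving the stack waiting
        status, reason = "FAILED", str(exc)
    send_response(event["ResponseURL"],
                  build_cfn_response(event, status, reason, f"{bucket}-notifications"))
```

Keeping the physical ID stable per bucket means an Update is treated as an in-place change rather than a replacement.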
If you want to control different notifications from different CloudFormation templates, it's a bit more complex. Your custom resource Lambda will need to read the existing notifications from S3 and then update them based on the data passed to it from CloudFormation.
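The read-modify-write step could look like this sketch. It assumes every Lambda configuration carries an `Id`, so that an update can replace its previous version while notifications owned by other templates are kept. `get_bucket_notification_configuration` also returns `ResponseMetadata`, which has to be dropped before the result is written back.

```python
NOTIFICATION_KEYS = (
    "TopicConfigurations",
    "QueueConfigurations",
    "LambdaFunctionConfigurations",
    "EventBridgeConfiguration",
)


def merge_lambda_configs(existing, new_configs):
    """Keep other notifications, replacing Lambda configs whose Id is being redefined."""
    merged = {key: existing[key] for key in NOTIFICATION_KEYS if key in existing}
    new_ids = {config["Id"] for config in new_configs}
    kept = [
        config
        for config in merged.get("LambdaFunctionConfigurations", [])
        if config.get("Id") not in new_ids
    ]
    merged["LambdaFunctionConfigurations"] = kept + list(new_configs)
    return merged


def update_bucket_notifications(s3_client, bucket, new_configs):
    """Read the current configuration, merge in new_configs, and write it back."""
    existing = s3_client.get_bucket_notification_configuration(Bucket=bucket)
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=merge_lambda_configs(existing, new_configs),
    )
```

There is no locking here, so two stacks updating the same bucket at the same moment could still overwrite each other.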

Tagging an EMR cluster via an AWS Lambda triggered by a CloudWatch event rule

I need to catch the RunJobFlow event in my CloudWatch event rule in order to tag starting AWS EMR clusters.
I'm looking for this event because I need the username and account information.
Any idea?
Thanks
Calls to the ListClusters, DescribeCluster, and RunJobFlow actions generate entries in CloudTrail log files.
Every log entry contains information about who generated the request. For example, if a request is made to create and run a new job flow (RunJobFlow), CloudTrail logs the user identity of the person or service that made the request.
https://docs.aws.amazon.com/emr/latest/ManagementGuide/logging_emr_api_calls.html#understanding_emr_log_file_entries
Here is a sample snippet to get the username using Python Boto3.
import boto3
cloudtrail = boto3.client("cloudtrail")
response = cloudtrail.lookup_events(
    LookupAttributes=[
        {
            'AttributeKey': 'EventName',
            'AttributeValue': 'RunJobFlow'
        }
    ],
)
for event in response.get("Events"):
    print(event.get("Username"))
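`lookup_events` returns results a page at a time (with a `NextToken`), and the full CloudTrail record sits in each entry's `CloudTrailEvent` field as a JSON string. A sketch that walks every page with the boto3 paginator and pulls out the account and cluster ID as well as the username; the helper names are illustrative. Event history lookups only cover the last 90 days.

```python
import json


def summarize_run_job_flow(ct_event):
    """Username, account and cluster ID from one LookupEvents entry."""
    record = json.loads(ct_event["CloudTrailEvent"])
    identity = record.get("userIdentity", {})
    response = record.get("responseElements") or {}  # null when the call failed
    return {
        "username": ct_event.get("Username"),
        "account": identity.get("accountId") or record.get("recipientAccountId"),
        "cluster_id": response.get("jobFlowId"),
    }


def run_job_flow_calls(cloudtrail_client):
    """Yield a summary for every RunJobFlow call CloudTrail can look up."""
    paginator = cloudtrail_client.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "RunJobFlow"}]
    )
    for page in pages:
        for ct_event in page["Events"]:
            yield summarize_run_job_flow(ct_event)
```

Usage would be `for call in run_job_flow_calls(boto3.client("cloudtrail")): print(call)`.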
The username and cluster details can be retrieved from the RunJobFlow event itself. An easier solution is to use a CloudWatch event rule with a Lambda function as its target to fetch this information, and then take further action as required. Example below:
Event pattern to use with the CloudWatch event rule:
{
  "source": ["aws.elasticmapreduce"],
  "detail": {
    "eventName": ["RunJobFlow"]
  }
}
Lambda code snippet
import json
def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))
    user = event['detail']['userIdentity']['userName']
    cluster_id = event['detail']['responseElements']['jobFlowId']
    region = event['region']
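To finish what the question asks for, the handler can tag the new cluster with `add_tags` from the EMR API. A sketch with hypothetical tag keys `CreatedBy` and `CreatorAccount`. It also falls back to the role name for callers using assumed roles, whose `userIdentity` has no top-level `userName`, so the lookup above would raise a `KeyError` for them.

```python
def caller_name(identity):
    """Best-effort name of whoever called RunJobFlow."""
    if identity.get("userName"):  # IAM users
        return identity["userName"]
    # Assumed roles expose the role name via sessionContext.sessionIssuer
    issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
    return issuer.get("userName") or identity.get("arn", "unknown")


def build_tags(user, account):
    """EMR tag list; the keys are hypothetical, pick your own convention."""
    return [
        {"Key": "CreatedBy", "Value": user},
        {"Key": "CreatorAccount", "Value": account},
    ]


def lambda_handler(event, context):
    detail = event["detail"]
    cluster_id = (detail.get("responseElements") or {}).get("jobFlowId")
    if not cluster_id:
        return None  # the RunJobFlow call failed, so there is no cluster to tag
    tags = build_tags(caller_name(detail["userIdentity"]), event["account"])
    import boto3  # bundled with the Lambda runtime; imported here so the helpers run without it
    boto3.client("emr", region_name=event["region"]).add_tags(ResourceId=cluster_id, Tags=tags)
    return tags
```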

How do you full text search an Amazon S3 bucket?

I have a bucket on S3 with a large number of text files.
I want to search for some text within the files. They contain raw data only.
Each text file has a different name.
For example, I have these files:
abc/myfolder/abac.txt
xyx/myfolder1/axc.txt
and I want to search for text like "I am human" in the above files.
How to achieve this? Is it even possible?
The only way to do this will be via CloudSearch, which can use S3 as a data source. It builds an index from the data for fast retrieval. This should work very well, but thoroughly check the pricing model to make sure it won't be too costly for you.
The alternative is as Jack said: you'd otherwise need to transfer the files out of S3 to an EC2 instance and build a search application there.
Since October 1st, 2015, Amazon has offered another search service, Amazon Elasticsearch Service; in more or less the same vein as CloudSearch, you can stream data to it from Amazon S3 buckets.
It works with a Lambda function: any new data sent to an S3 bucket triggers an event notification to the Lambda function, which updates the ES index.
All steps are well detailed in the Amazon documentation, with Java and JavaScript examples.
At a high level, setting up to stream data to Amazon ES requires the following steps:
Creating an Amazon S3 bucket and an Amazon ES domain
Creating a Lambda deployment package.
Configuring a Lambda function.
Granting authorization to stream data to Amazon ES.
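A Python sketch of the Lambda side of those steps, under stated assumptions: the index name `s3-files` is hypothetical, objects are UTF-8 text, and sending the body to the domain's `_bulk` endpoint (which normally requires SigV4-signed requests) is left out. It shows two details that are easy to get wrong: S3 event keys arrive URL-encoded, and the bulk API expects newline-delimited JSON that ends with a newline.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """(bucket, key) pairs from an S3 event notification, with keys decoded."""
    return [
        (record["s3"]["bucket"]["name"], unquote_plus(record["s3"]["object"]["key"]))
        for record in event.get("Records", [])
    ]


def bulk_index_body(index, docs):
    """NDJSON body for the _bulk API: an action line followed by each document."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def lambda_handler(event, context):
    import boto3  # bundled with the Lambda runtime
    s3 = boto3.client("s3")
    docs = []
    for bucket, key in s3_objects_from_event(event):
        text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        docs.append((f"{bucket}/{key}", {"bucket": bucket, "key": key, "content": text}))
    # Sending is omitted: POST this body, signed, to https://YOUR-DOMAIN-ENDPOINT/_bulk
    return bulk_index_body("s3-files", docs)
```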
Although not an AWS native service, there is Mixpeek, which runs text extraction like Tika, Tesseract and ImageAI on your S3 files then places them in a Lucene index to make them searchable.
You integrate it as follows:
Download the module: https://github.com/mixpeek/mixpeek-python
Import the module and your API keys:
from mixpeek import Mixpeek, S3
from config import mixpeek_api_key, aws
Instantiate the S3 class (which uses boto3 and requests):
s3 = S3(
    aws_access_key_id=aws['aws_access_key_id'],
    aws_secret_access_key=aws['aws_secret_access_key'],
    region_name='us-east-2',
    mixpeek_api_key=mixpeek_api_key
)
Upload one or more existing S3 files:
# upload all S3 files in bucket "demo"
s3.upload_all(bucket_name="demo")
# upload one single file called "prescription.pdf" in bucket "demo"
s3.upload_one(s3_file_name="prescription.pdf", bucket_name="demo")
Now simply search using the Mixpeek module:
# mixpeek api direct
mix = Mixpeek(
    api_key=mixpeek_api_key
)
# search
result = mix.search(query="Heartgard")
print(result)
Where result can be:
[
  {
    "_id": "REDACTED",
    "api_key": "REDACTED",
    "highlights": [
      {
        "path": "document_str",
        "score": 0.8759502172470093,
        "texts": [
          {
            "type": "text",
            "value": "Vetco Prescription\nVetcoClinics.com\n\nCustomer:\n\nAddress: Canine\n\nPhone: Australian Shepherd\n\nDate of Service: 2 Years 8 Months\n\nPrescription\nExpiration Date:\n\nWeight: 41.75\n\nSex: Female\n\n℞ "
          },
          {
            "type": "hit",
            "value": "Heartgard"
          },
          {
            "type": "text",
            "value": " Plus Green 26-50 lbs (Ivermectin 135 mcg/Pyrantel 114 mg)\n\nInstructions: Give one chewable tablet by mouth once monthly for protection against heartworms, and the treatment and\ncontrol of roundworms, and hookworms. "
          }
        ]
      }
    ],
    "metadata": {
      "date_inserted": "2021-10-07 03:19:23.632000",
      "filename": "prescription.pdf"
    },
    "score": 0.13313256204128265
  }
]
Then you parse the results.
You can use Filestash (disclaimer: I'm the author): install your own instance and connect it to your S3 bucket. Give it a bit of time to index everything if you have a lot of data, and you should be good.
If you have an EMR cluster, create a Spark application and do the search there. We did this; it works as a distributed search.
I know this is really old, but hopefully someone finds my solution handy.
This is a Python script using boto3.
import boto3
# Credentials come from the default chain (environment, ~/.aws, or an IAM role);
# avoid hard-coding access keys in scripts.
client = boto3.client('s3')
bucket_name = 'my.bucket.name'
bucket_prefix = '2022/05/'
search_for = 'looking#emailaddress.com'
search_results = []       # list of {key: file text} maps for matching files
search_results_keys = []  # just the matching keys
# list_objects_v2 returns at most 1000 keys per call, so use a paginator
paginator = client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name, Prefix=bucket_prefix):
    for item in page.get('Contents', []):  # 'Contents' is missing when nothing matches
        key = item['Key']
        obj = client.get_object(Bucket=bucket_name, Key=key)
        body = obj['Body'].read().decode("utf-8")
        if search_for in body:
            search_results.append({key: body})
            search_results_keys.append(key)
# YOU CAN EITHER PRINT THE KEY (FILE NAME/DIRECTORY), OR A MAP WHERE THE KEY IS THE FILE NAME/DIRECTORY AND THE VALUE IS THE TEXT OF THE FILE
print(search_results)
print(search_results_keys)
There is a serverless and cheaper option available:
Use AWS Glue to turn the text files into a table.
Use AWS Athena to run SQL queries on top of it.
I would recommend storing the data as Parquet on S3; this makes the data on S3 much smaller and the queries much faster!
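Once the table exists, the search is a single SQL query. A sketch using the boto3 Athena client, assuming a hypothetical table whose rows are the lines of the text files in one string column `line` (for example one created with `CREATE EXTERNAL TABLE ... (line string) LOCATION 's3://...'`) and an S3 location for query results. Athena's `"$path"` pseudo-column reports which file each matching line came from.

```python
import time


def search_query(database, table, phrase):
    """SQL returning every line that contains `phrase`, plus the file it came from."""
    escaped = phrase.replace("'", "''")  # escape single quotes for the SQL literal
    return (
        f'SELECT "$path" AS file, line FROM "{database}"."{table}" '
        f"WHERE line LIKE '%{escaped}%'"
    )


def run_search(athena_client, database, table, phrase, output_location):
    """Run the query and return [file, line] rows (first result page only)."""
    execution = athena_client.start_query_execution(
        QueryString=search_query(database, table, phrase),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = execution["QueryExecutionId"]
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} finished as {state}")
    rows = athena_client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # The first row holds the column names
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows[1:]]
```

Large result sets span several pages of `get_query_results`; the `get_query_results` paginator would be needed to read them all.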