How to download newly uploaded files from S3 to EC2 every time - amazon-web-services

I have an S3 bucket which will receive new files throughout the day. I want to download these to my EC2 instance every time a new file is uploaded to the bucket.
I have read that it's possible using SQS, SNS, or Lambda. Which is the easiest of them all? I need the file to be downloaded as early as possible once it is uploaded into the bucket.
EDIT
I will basically be getting PNG images in the bucket every few seconds or minutes. Every time a new image is uploaded, I want to download it onto the instance, which is already running, and do some AI processing on it. As the images keep coming into the bucket, I want to constantly download them onto the EC2 instance and process them as soon as possible.
This is my code in the Lambda function so far.
import boto3
import json
import time

def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    # print(event)
    s3 = boto3.client("s3")
    client = boto3.client("ec2")
    ssm = boto3.client("ssm")
    instanceid = "******"
    if event:
        file_obj = event["Records"][0]
        # print(file_obj)
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        response = ssm.send_command(
            InstanceIds=[instanceid],
            DocumentName="AWS-RunShellScript",
            Parameters={
                "commands": [f"aws s3 cp {filename} ."]
            },  # replace command_to_be_executed with command
        )
        # fetching command id for the output
        command_id = response["Command"]["CommandId"]
        time.sleep(3)
        # fetching command output
        output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
        print(output)
    return
However, I am getting the following error:
Test Event Name
test
Response
{
"errorMessage": "2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds"
}
Function Logs
START RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Version: $LATEST
END RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936
REPORT RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Duration: 3003.58 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 87 MB Init Duration: 314.81 ms
2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds
Request ID
88dbe51b-53d6-4c06-8c16-207698b3a936
When I remove all the lines related to SSM, it works fine. Is there a permission issue, or is there a problem with the code?
EDIT2
My code is working, but I don't see any output or change in my EC2 instance. I should be seeing an empty text file in the home directory, but I don't see anything.
Code
import boto3
import json
import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    # print(event)
    s3 = boto3.client("s3")
    client = boto3.client("ec2")
    ssm = boto3.client("ssm")
    instanceid = "******"
    print("HI")
    if event:
        file_obj = event["Records"][0]
        # print(file_obj)
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        print("sending")
        try:
            response = ssm.send_command(
                InstanceIds=[instanceid],
                DocumentName="AWS-RunShellScript",
                Parameters={
                    "commands": ["touch hi.txt"]
                },  # replace command_to_be_executed with command
            )
            # fetching command id for the output
            command_id = response["Command"]["CommandId"]
            time.sleep(3)
            # fetching command output
            output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
            print(output)
        except Exception as e:
            logger.error(e)
            raise e

There are several ways. One would be to set up S3 event notifications to invoke a Lambda function. The Lambda function would then use SSM Run Command to execute an AWS CLI S3 command on your instance to download the file from S3.
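As a rough, untested sketch of that approach (the instance ID and the /home/ec2-user/ target directory below are placeholders, and it assumes the instance profile allows aws s3 cp from the bucket), the Lambda would pass the full s3://bucket/key URI to Run Command:
import time
import boto3

ssm = boto3.client("ssm")

def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    # Run the AWS CLI on the instance via SSM Run Command.
    response = ssm.send_command(
        InstanceIds=["i-0123456789abcdef0"],  # placeholder instance ID
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [f"aws s3 cp s3://{bucket}/{key} /home/ec2-user/"]},
    )
    command_id = response["Command"]["CommandId"]
    time.sleep(3)  # give the command a moment to run before fetching its output
    print(ssm.get_command_invocation(CommandId=command_id, InstanceId="i-0123456789abcdef0"))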

I don't know why Lambda is being recommended here. What you need is simple: an S3 object-created event notification -> SQS, and a job on your EC2 instance watching a long-polling queue.
Here is an example of such a Python script. Note that the object key arrives URL-encoded inside the JSON message body, but it will be there. I haven't tested this, but it should be pretty close.
import json
import urllib.parse
import boto3

def main() -> None:
    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    while True:
        res = sqs.receive_message(
            QueueUrl="yourQueue",
            WaitTimeSeconds=20,
        )
        for msg in res.get("Messages", []):
            # The S3 event arrives as JSON in the message body; object keys are URL-encoded.
            body = json.loads(msg["Body"])
            for record in body.get("Records", []):
                key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
                s3.download_file("yourBucket", key, "local/file/path")
            # Delete the message so it is not delivered again.
            sqs.delete_message(QueueUrl="yourQueue", ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    main()

You can use S3 Event Notifications, which react to a new file arriving in the S3 bucket.
The destinations supported by S3 events are SNS, SQS, and AWS Lambda.
You can use Lambda directly as the destination, as described by @Marcin.
You can use SQS as a queue with a Lambda behind it pulling from the queue. This gives you capabilities such as a dead-letter queue. You can then pull messages from the queue using different methods:
AWS CLI
AWS SDK
You can use SNS with different things behind it (you can have many of these destinations in a row, which is the fan-out pattern):
an SQS queue to manage the files
an email to notify
a Lambda function
...
You can find more explanation in this article: https://aws.plainenglish.io/system-design-s3-events-to-lambda-vs-s3-events-to-sqs-sns-to-lambda-2d41477d1cc9
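As a small sketch of wiring the S3-to-SQS option up with boto3 (the bucket name and queue ARN are placeholders, and it assumes the queue's access policy already allows S3 to send messages):
import boto3

s3 = boto3.client("s3")

# Send an event to the queue for every object created in the bucket.
s3.put_bucket_notification_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:eu-west-1:123456789012:my-queue",  # placeholder ARN
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)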

Related

Save file into EC2 directly through Lambda

I've made a Lambda function that stores a binary file into S3, and it works fine.
Now I would like to save this file directly onto my EC2 instance's storage volume instead.
I searched a lot, but I still don't understand whether it's possible. Do you know?
I've already made an SSH connection (inside the Lambda) to run SSH commands, but I don't know how to use it in my case, or whether it is the right way to save my data. Do you have any ideas?
I know that it is possible to connect S3 to EC2, but first I would like to understand the possibility above.
Thanks
I made a solution (Python):
Using Boto3 and the Paramiko package, I build an SSH client to the EC2 instance and then move my file to S3 with the AWS CLI.
If it is useful for anyone, I post part of the code below:
import json
import boto3
import paramiko

def lambda_handler(event, context):
    # My parameters
    myBucket = "lorem"
    myPemKeyFile = "lorem.pem"
    myEc2Username = "lorem"
    ec2_client = boto3.client('ec2')
    s3_client = boto3.client("s3")
    OutFileName = "lorem.txt"
    # PREPARING FOR SSH CLIENT
    try:
        # GETTING INSTANCE INFORMATION
        describeInstance = ec2_client.describe_instances()
        hostPublicIP = []
        # fetching the public IP address of the running instances
        for i in describeInstance['Reservations']:
            for instance in i['Instances']:
                if instance["State"]["Name"] == "running":
                    hostPublicIP.append(instance['PublicIpAddress'])
        # print(hostPublicIP)
        # DOWNLOADING PEM FILE FROM S3
        s3_client.download_file(myBucket, myPemKeyFile, '/tmp/file.pem')
        # reading pem file and creating key object
        key = paramiko.RSAKey.from_private_key_file("/tmp/file.pem")
        # CREATING paramiko.SSHClient
        ssh_client = paramiko.SSHClient()
        # setting policy to connect to unknown hosts
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        host = hostPublicIP[0]
        # print("Connecting to : " + host)
        # connecting to the server
        ssh_client.connect(hostname=host, username=myEc2Username, pkey=key)
        # print("Connected to :" + host)
    except:
        raise Exception('Oops, there was a crash preparing the SSH client! 500')
    # MOVING FILE INTO S3
    commands = [
        "aws s3 mv ~/directoryFrom/" + OutFileName + " s3://" + myBucket + "/" + OutFileName
    ]
    try:
        for command in commands:
            stdin, stdout, stderr = ssh_client.exec_command(command)
            SSHout = stdout.read()
    except:
        raise Exception('Oops, something happened to the SSH client. Move file to S3 did not run. 500')

AWS Lambda Function Handler Error Says There Is Not Enough Values To Unpack

I have a Lambda function (created in Boto3) which is triggered by an SQS message. The Lambda function is meant to take the objects uploaded to S3 and process them with AWS Transcribe. The Lambda function is being triggered but I'm receiving the following error:
{
"errorMessage": "Bad handler 'lambda_handler': not enough values to unpack (expected 2, got 1)",
"errorType": "Runtime.MalformedHandlerName"
}
Function Logs
START RequestId: e6080a7f-b5b7-4995-a469-351c144bb93e Version: $LATEST
[ERROR] Runtime.MalformedHandlerName: Bad handler 'lambda_handler': not enough values to unpack (expected 2, got 1)
END RequestId: e6080a7f-b5b7-4995-a469-351c144bb93e
REPORT RequestId: e6080a7f-b5b7-4995-a469-351c144bb93e Duration: 1.64 ms Billed Duration: 2 ms Memory Size: 500 MB Max Memory Used: 50 MB
Request ID
e6080a7f-b5b7-4995-a469-351c144bb93e
This is where I create my Lambda function in Boto3:
response = l.create_function(
    FunctionName = lambda_name,
    Runtime = 'python3.7',
    Role = lambda_role,
    Handler = 'lambda_handler',
    Code = {
        'ZipFile': open('./transcribe.zip', 'rb').read()
    },
    Description = 'Function to parse content from SQS message and pass content to Transcribe.',
    Timeout = 123,
    MemorySize = 500,
    Publish = True,
    PackageType = 'Zip',
)
And this is what the Lambda function looks like in AWS console:
from __future__ import print_function
import time
import boto3

def lambda_handler(event, context):
    transcribe = boto3.client('transcribe')
    job_name = "testJob"
    job_uri = "https://my-bucket1729788.s3.eu-west-2.amazonaws.com/Audio3.wav"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        MediaFormat='wav',
        LanguageCode='en-US'
    )
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        print("Not ready yet...")
        time.sleep(5)
    print(status)
I'm really not sure where I'm going wrong as I don't find the documentation particularly helpful, so any help is appreciated. Thanks.
My function wasn't receiving either the context or the event when I tried running:
def lambda_handler(event, context):
    print("Event: {}".format(event))
    print("Context: {}".format(context))
I needed to change the 'Handler' setting on runtime settings from 'lambda_handler' to 'transcribe.lambda_handler', as transcribe.py was the name of the file.
Your handler setting does not resolve to a function in your Lambda function's code. Please look at the documentation for Python Lambda function handlers.
I believe the issue is that your handler path is not set up correctly in the create_function call. It should be something like <file_name>.<handler_function>; note the . in the path. What you have in the post is just lambda_handler, with no dot.
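For example, since the file in the zip is transcribe.py, the create_function call from the question would presumably need the handler written like this (a sketch based on the code above, other arguments unchanged):
response = l.create_function(
    FunctionName = lambda_name,
    Runtime = 'python3.7',
    Role = lambda_role,
    Handler = 'transcribe.lambda_handler',  # <file name without .py>.<handler function name>
    Code = {'ZipFile': open('./transcribe.zip', 'rb').read()},
    Timeout = 123,
    MemorySize = 500,
    Publish = True,
    PackageType = 'Zip',
)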

How to troubleshoot and solve lambda function issue?

import sys
import botocore
import boto3
from botocore.exceptions import ClientError

def lambda_handler(event, context):
    # TODO implement
    rds = boto3.client('rds')
    lambdaFunc = boto3.client('lambda')
    print('Trying to get Environment variable')
    try:
        funcResponse = lambdaFunc.get_function_configuration(
            FunctionName='RDSInstanceStart'
        )
        # print(funcResponse)
        DBinstance = funcResponse['Environment']['Variables']['DBInstanceName']
        print('Starting RDS service for DBInstance : ' + DBinstance)
    except ClientError as e:
        print(e)
    try:
        response = rds.start_db_instance(
            DBInstanceIdentifier=DBinstance
        )
        print('Success :: ')
        return response
    except ClientError as e:
        print(e)
    return {
        'message': "Script execution completed. See Cloudwatch logs for complete output"
    }
I have a running RDS instance (db.t2.micro, MSSQL Server). Every day I start and stop it using a Lambda function. It was working fine previously, but today I unexpectedly faced an issue:
the RDS instance is not being started automatically by the Lambda function. I checked the error log, but there is nothing unusual; it looks like the usual daily log. I am unable to troubleshoot and solve the issue. Can anyone tell me about this issue?
FYI, a shortened version would be:
import boto3
import os

def lambda_handler(event, context):
    rds_client = boto3.client('rds')
    response = rds_client.start_db_instance(DBInstanceIdentifier=os.environ['DBInstanceName'])
    print(response)
You can see the logs of each Lambda invocation in CloudWatch, or via AWS Lambda -> Monitoring -> View logs in CloudWatch. This opens a page with the logs of each Lambda call.
If there are no logs, it means the Lambda is not being invoked.
You can check whether the roles and policies assigned to the Lambda are correct.
You should print the response of the API you use to start the DB (e.g. start-db-instance). The response will be printed to the CloudWatch log.
https://docs.aws.amazon.com/cli/latest/reference/rds/start-db-instance.html
For later automation, you might want to create a metric filter on the Lambda's CloudWatch Logs for a certain keyword, such as:
"\"DBInstanceStatus\": \"starting\""
An alarm can be created as well, with a threshold of, say, < 1. If the keyword is not found in a log, the metric will push no value (you can customize this under the advanced options), the alarm will go into INSUFFICIENT_DATA, and you can set up an SNS notification for INSUFFICIENT_DATA.
You can also tweak the alarm to treat missing data as bad; then the alarm will transition to the ALARM state when the metric filter does not match the incoming logs.
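A rough sketch of that setup with boto3, following the treat-missing-data-as-bad variant and using the keyword from the answer above (the log group, filter, metric, and alarm names, and the SNS topic ARN, are placeholders):
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Count log lines showing that the instance actually entered the "starting" state.
logs.put_metric_filter(
    logGroupName="/aws/lambda/RDSInstanceStart",  # placeholder log group
    filterName="rds-starting-filter",
    filterPattern='"\\"DBInstanceStatus\\": \\"starting\\""',
    metricTransformations=[{
        "metricName": "RDSInstanceStarting",
        "metricNamespace": "Custom/RDS",
        "metricValue": "1",
    }],
)

# Alarm if the keyword never shows up; missing data is treated as breaching.
cloudwatch.put_metric_alarm(
    AlarmName="rds-did-not-start",
    Namespace="Custom/RDS",
    MetricName="RDSInstanceStarting",
    Statistic="Sum",
    Period=86400,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="breaching",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:my-alerts"],  # placeholder topic ARN
)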

returning JSON response from AWS Glue Pythonshell job to the boto3 caller

Is there a way to send a JSON response (a dictionary of outputs) from an AWS Glue Python shell job, similar to returning a JSON response from AWS Lambda?
I am calling a Glue Python shell job like below:
response = glue.start_job_run(
    JobName = 'test_metrics',
    Arguments = {
        '--test_metrics': 'test_metrics',
        '--s3_target_path_key': 's3://my_target',
        '--s3_target_path_value': 's3://my_target_value'})
print(response)
The response I get is a 200, stating that the Glue start_job_run was a success. From the documentation, all I see is that the result of a Glue job is either written to S3 or to some other database.
I tried adding return {'result':'some_string'} at the end of my Glue Python shell job to test whether it works, with the code below.
import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv,
                          ['JOB_NAME',
                           's3_target_path_key',
                           's3_target_path_value'])
print("Target path key is: ", args['s3_target_path_key'])
print("Target Path value is: ", args['s3_target_path_value'])

return {'result': "some_string"}
But it throws the error SyntaxError: 'return' outside function.
Glue is not designed to return a response, as it is expected to run long-running operations. Blocking on a response from a long-running task is not the right approach in itself. Instead, you can use a launch job (service 1) -> execute job (service 2) -> get result (service 3) pattern. You can send a JSON response from service 2 (the Glue job) to the AWS service (service 3) that you want to launch from it, e.g. if you launch a Lambda from the Glue job, you can send a JSON response to it.
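A minimal sketch of that last idea, assuming a downstream Lambda named process_metrics_result exists and the Glue job's role is allowed to invoke it; the Glue Python shell job pushes its result forward instead of returning it:
import json
import sys

import boto3
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['JOB_NAME', 's3_target_path_key', 's3_target_path_value'])

result = {'result': 'some_string', 'target_key': args['s3_target_path_key']}

# Hand the result to the next service instead of returning it from the job.
boto3.client('lambda').invoke(
    FunctionName='process_metrics_result',  # placeholder downstream Lambda
    InvocationType='Event',                 # fire-and-forget
    Payload=json.dumps(result).encode('utf-8'),
)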

Python Lambda function to capture AWS cloud watch logs of AWS MQ and send to kinesis

I got a Python script from put_records() only accepts keyword arguments in Kinesis boto3 Python API, which loads JSON files into a Kinesis stream.
My architecture is something like this (architecture diagram omitted).
In the AWS console, I have created a Lambda function with the above code added to it.
How do I integrate or tell my Lambda function to wake up every minute? Do I need to capture the messages via CloudWatch Events? If so, how?
I got this solution from the link below.
Python script:
import time
import boto3
import stomp

kinesis_client = boto3.client('kinesis')

class Listener(stomp.ConnectionListener):
    def on_error(self, headers, message):
        print('received an error "%s"' % message)

    def on_message(self, headers, message):
        print('received a message "%s"' % message)
        kinesis_client.put_record(
            StreamName='inter-lambda',
            Data=u'{}\r\n'.format(message).encode('utf-8'),
            PartitionKey='0'
        )

def handler(event, context):
    conn = stomp.Connection(host_and_ports=[('localhost', 61616)])
    conn.set_listener('', Listener(conn))
    conn.start()
    conn.connect(login='user', passcode='pass')
    conn.subscribe(destination='A.B.C.D', ack='auto')
    print('Waiting for messages...')
    time.sleep(10)
    conn.close()
    return ''
https://github.com/aws-samples/amazonmq-invoke-aws-lambda
You can schedule Lambda functions to run using CloudWatch Events.
Another alternative may be to subscribe to the log events and deliver them to Lambda.
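A rough sketch of the scheduling option with boto3 (the function name, rule name, and ARN below are placeholders for the Lambda shown above):
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

function_arn = "arn:aws:lambda:eu-west-1:123456789012:function:mq-to-kinesis"  # placeholder

# Fire the rule every minute.
rule = events.put_rule(Name="mq-poll-every-minute", ScheduleExpression="rate(1 minute)")

# Allow CloudWatch Events to invoke the function.
lambda_client.add_permission(
    FunctionName="mq-to-kinesis",
    StatementId="allow-cloudwatch-events",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

# Point the rule at the function.
events.put_targets(Rule="mq-poll-every-minute", Targets=[{"Id": "1", "Arn": function_arn}])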