Save file into EC2 directly through Lambda - amazon-web-services

I've made a Lambda function that stores a binary file in S3, and it works fine.
Now I would like to save this file directly to my EC2 instance's storage volume instead.
I searched a lot but couldn't work out whether this is possible. Do you know?
I've already opened an SSH connection (inside the Lambda) to run SSH commands, but I don't know how to use it in my case, or whether it is the right way to save my data. Do you have any ideas?
I know it's possible to connect S3 to EC2, but first I would like to understand whether the approach above is possible.
Thanks

I found a solution (Python):
Using the Boto3 and Paramiko packages, I build an SSH client to the EC2 instance and then run the AWS CLI on the instance to move my file to S3.
In case it's useful for anyone, I'm posting part of the code below:
import json
import boto3
import paramiko
def lambda_handler(event, context):
    # My parameters
    myBucket = "lorem"
    myPemKeyFile = "lorem.pem"
    myEc2Username = "lorem"
    ec2_client = boto3.client('ec2')
    s3_client = boto3.client("s3")
    OutFileName = "lorem.txt"
    # PREPARING FOR SSH CLIENT
    try:
        # GETTING INSTANCE INFORMATION
        describeInstance = ec2_client.describe_instances()
        hostPublicIP = []
        # fetching the public IP address of the running instances
        for i in describeInstance['Reservations']:
            for instance in i['Instances']:
                if instance["State"]["Name"] == "running":
                    hostPublicIP.append(instance['PublicIpAddress'])
        #print(hostPublicIP)
        # DOWNLOADING PEM FILE FROM S3 (/tmp is the only writable path in Lambda)
        s3_client.download_file(myBucket, myPemKeyFile, '/tmp/file.pem')
        # reading the pem file and creating the key object
        key = paramiko.RSAKey.from_private_key_file("/tmp/file.pem")
        # CREATING paramiko.SSHClient
        ssh_client = paramiko.SSHClient()
        # setting the policy to connect to an unknown host
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        host = hostPublicIP[0]
        #print("Connecting to : " + host)
        # connecting to the server
        ssh_client.connect(hostname=host, username=myEc2Username, pkey=key)
        #print("Connected to :" + host)
    except Exception as e:
        raise Exception('OPS, there was a crash preparing the SSH client!! 500') from e
    # MOVING FILE INTO S3
    commands = [
        "aws s3 mv ~/directoryFrom/" + OutFileName + " s3://" + myBucket + "/" + OutFileName
    ]
    try:
        for command in commands:
            stdin, stdout, stderr = ssh_client.exec_command(command)
            SSHout = stdout.read()
    except Exception as e:
        raise Exception('OPS, something happened to the SSH client. Move file to S3 didn\'t run 500') from e
    finally:
        ssh_client.close()
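The question originally asked about writing the file straight onto the instance. Since the Lambda already holds a Paramiko SSH connection, one option (not the author's code) is to push the bytes over SFTP on that same connection, while it is still open. A minimal sketch, assuming `ssh_client` is the connected `paramiko.SSHClient` from the code above; the helper name `save_bytes_to_ec2` and the remote directory are hypothetical, and the directory must already exist on the instance:

```python
import io
import posixpath


def save_bytes_to_ec2(ssh_client, data: bytes, remote_dir: str, file_name: str) -> str:
    """Write `data` to remote_dir/file_name on the instance over SFTP.

    `ssh_client` is an already connected paramiko.SSHClient (assumption);
    remote_dir must already exist on the instance.
    """
    remote_path = posixpath.join(remote_dir, file_name)  # EC2 Linux paths use "/"
    sftp = ssh_client.open_sftp()  # SFTP session over the existing SSH transport
    try:
        # putfo streams a file-like object to the remote path
        sftp.putfo(io.BytesIO(data), remote_path)
    finally:
        sftp.close()
    return remote_path
```

For example, `save_bytes_to_ec2(ssh_client, my_bytes, "/home/ec2-user/uploads", OutFileName)` (hypothetical path and `my_bytes`) would write the file onto the instance's volume without going through S3 at all.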

Related

Airflow SSHOperator: How To Securely Access Pem File Across Tasks?

We are running Airflow via AWS's managed MWAA offering. As part of that offering, AWS provides a tutorial on securely using the SSHOperator together with AWS Secrets Manager. The gist of how their solution works is:
1. Run a task that fetches the pem file from a Secrets Manager location and stores it on the filesystem at /tmp/mypem.pem.
2. In the SSH Connection, include extra information that specifies the file location:
{"key_file":"/tmp/mypem.pem"}
3. Use the SSH Connection in the SSHOperator.
In short the workflow is supposed to be:
Task1 gets the pem -> Task2 uses the pem via the SSHOperator
All of this is great in theory, but it doesn't actually work, because Task1 may run on a different node from Task2, so Task2 can't access the /tmp/mypem.pem file that Task1 wrote. According to AWS Support, AWS is aware of this limitation, so we need another way to do this.
Question
How can we securely store and access a pem file that can then be used by Tasks running on different nodes via the SSHOperator?
I ran into the same problem. I extended the SSHOperator to do both steps in one call.
In AWS Secrets Manager, two secrets are added for Airflow to retrieve at execution time:
{variables_prefix}/airflow-user-ssh-key : the value of the private key
{connections_prefix}/ssh_airflow_user : ssh://replace.user@replace.remote.host?key_file=%2Ftmp%2Fairflow-user-ssh-key
from typing import Optional, Sequence
from os.path import basename, splitext
from airflow.models import Variable
from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow.providers.ssh.hooks.ssh import SSHHook
class SSHOperator(SSHOperator):
    """
    SSHOperator to execute commands on given remote host using the ssh_hook.
    :param ssh_conn_id: :ref:`ssh connection id<howto/connection:ssh>`
        from airflow Connections.
    :param ssh_key_var: name of Variable holding private key.
        Creates "/tmp/{variable_name}" to use in SSH connection.
        May also be inferred from "key_file" in "extras" in "ssh_conn_id".
    :param remote_host: remote host to connect (templated)
        Nullable. If provided, it will replace the `remote_host` which was
        defined in `ssh_hook` or predefined in the connection of `ssh_conn_id`.
    :param command: command to execute on remote host. (templated)
    :param timeout: (deprecated) timeout (in seconds) for executing the command. The default is 10 seconds.
        Use conn_timeout and cmd_timeout parameters instead.
    :param environment: a dict of shell environment variables. Note that the
        server will reject them silently if `AcceptEnv` is not set in SSH config.
    :param get_pty: request a pseudo-terminal from the server. Set to ``True``
        to have the remote process killed upon task timeout.
        The default is ``False`` but note that `get_pty` is forced to ``True``
        when the `command` starts with ``sudo``.
    """
    template_fields: Sequence[str] = ("command", "remote_host")
    template_ext: Sequence[str] = (".sh",)
    template_fields_renderers = {"command": "bash"}
    def __init__(
        self,
        *,
        ssh_conn_id: Optional[str] = None,
        ssh_key_var: Optional[str] = None,
        remote_host: Optional[str] = None,
        command: Optional[str] = None,
        timeout: Optional[int] = None,
        environment: Optional[dict] = None,
        get_pty: bool = False,
        **kwargs,
    ) -> None:
        super().__init__(
            ssh_conn_id=ssh_conn_id,
            remote_host=remote_host,
            command=command,
            timeout=timeout,
            environment=environment,
            get_pty=get_pty,
            **kwargs,
        )
        if ssh_key_var is None:
            # infer the Variable name from the connection's key_file extra
            key_file = SSHHook(ssh_conn_id=self.ssh_conn_id).key_file
            key_filename = basename(key_file)
            key_filename_no_extension = splitext(key_filename)[0]
            self.ssh_key_var = key_filename_no_extension
        else:
            self.ssh_key_var = ssh_key_var
    def import_ssh_key(self):
        # write the private key on the worker that runs this task
        with open(f"/tmp/{self.ssh_key_var}", "w") as file:
            file.write(Variable.get(self.ssh_key_var))
    def execute(self, context):
        self.import_ssh_key()
        # return the parent's result so it is still pushed to XCom
        return super().execute(context)
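How the two secrets line up can be checked offline. The sketch below is a hypothetical helper (`key_var_from_conn_uri`, not part of Airflow) that reproduces the operator's inference from a connection URI of the form shown above, assuming the user and host are separated by `@`: it decodes the `key_file` query parameter and strips the directory and extension, which yields the Variable name that `import_ssh_key` reads and the `/tmp` path it writes.

```python
from os.path import basename, splitext
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def key_var_from_conn_uri(uri: str) -> Tuple[str, str]:
    """Return (key_file, Variable name) the way the custom SSHOperator infers them."""
    # parse_qs percent-decodes values, so %2Ftmp%2F... becomes /tmp/...
    query = parse_qs(urlsplit(uri).query)
    key_file = query["key_file"][0]
    # same steps as __init__: strip the directory, then the extension
    return key_file, splitext(basename(key_file))[0]
```

With the example URI, the Variable is `airflow-user-ssh-key` and the key is written to `/tmp/airflow-user-ssh-key`, which is exactly the `key_file` the connection points at, so both steps happen on the same worker.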
The answer by holly is good. I'm sharing a different way I solved this problem: I converted the SSH Connection into a URI, stored it in Secrets Manager under the expected connections path, and everything worked with the SSHOperator. Below are the general steps I took.
Generate an encoded URI
import json
from pathlib import Path
from airflow.models.connection import Connection
# read the private key from the local pem file
pem = Path("/my/pem/file").read_text()
myconn = Connection(
    conn_id="connX",
    conn_type="ssh",
    host="10.x.y.z",
    login="mylogin",
    extra=json.dumps(dict(private_key=pem)),
)
print(myconn.get_uri())
Store that URI in Secrets Manager under the environment's configured connections path. The important detail is to enter the value in the Plaintext field as the bare URI, without a key/value pair. Example:
secret name airflow/connections/connX, with only the URI value under Plaintext
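The same step can be scripted instead of done in the console. A minimal sketch, assuming the environment's connections prefix is `airflow/connections` (as in the example above) and that `secrets_client` is a boto3 Secrets Manager client; the helper name `store_connection_uri` is hypothetical. Passing the URI as `SecretString` stores it as plain text, not as a JSON key/value object.

```python
def store_connection_uri(secrets_client, conn_id: str, uri: str,
                         prefix: str = "airflow/connections") -> str:
    """Store an Airflow connection URI as a plaintext Secrets Manager secret."""
    name = f"{prefix}/{conn_id}"  # assumed to match the configured connections_prefix
    # SecretString holds the bare URI, not a {"key": "value"} JSON document
    secrets_client.create_secret(Name=name, SecretString=uri)
    return name
```

Usage would look like `store_connection_uri(boto3.client("secretsmanager"), "connX", myconn.get_uri())`.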
Now in the SSHOperator you can reference this connection Id like any other.
remote_task = SSHOperator(
    task_id="ssh_and_execute_command",
    ssh_conn_id="connX",
    command="whoami",
)

How to download newly uploaded files from S3 to EC2 every time

I have an S3 bucket that receives new files throughout the day. I want to download each new file to my EC2 instance every time one is uploaded to the bucket.
I have read that this is possible using SQS, SNS or Lambda. Which is the easiest? I need each file to be downloaded as soon as possible after it is uploaded to the bucket.
EDIT
I will basically be getting PNG images in the bucket every few seconds or minutes. Every time a new image is uploaded, I want to download it to the instance, which is already running, and do some AI processing on it. As images keep arriving in the bucket, I want to keep downloading them to the EC2 instance and process each one as soon as possible.
This is my code in the Lambda function so far.
import boto3
import json
import time
def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    #print(event)
    s3 = boto3.client("s3")
    client = boto3.client("ec2")
    ssm = boto3.client("ssm")
    instanceid = "******"
    if event:
        file_obj = event["Records"][0]
        #print(file_obj)
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        response = ssm.send_command(
            InstanceIds=[instanceid],
            DocumentName="AWS-RunShellScript",
            Parameters={
                # the source must be a full S3 URI, not just the object key
                "commands": [f"aws s3 cp s3://{bucketname}/{filename} ."]
            },
        )
        # fetching command id for the output
        command_id = response["Command"]["CommandId"]
        time.sleep(3)
        # fetching command output
        output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
        print(output)
    return
However I am getting the following error
Test Event Name
test
Response
{
"errorMessage": "2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds"
}
Function Logs
START RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Version: $LATEST
END RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936
REPORT RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Duration: 3003.58 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 87 MB Init Duration: 314.81 ms
2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds
Request ID
88dbe51b-53d6-4c06-8c16-207698b3a936
When I remove all the lines related to ssm, it works fine. Is there any permission issue or is there any problem with the code?
EDIT2
My code now runs, but I don't see any output or change on my EC2 instance. I expected to see an empty text file in the home directory, but I don't see anything.
Code
import boto3
import json
import time
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    #print(event)
    s3 = boto3.client("s3")
    client = boto3.client("ec2")
    ssm = boto3.client("ssm")
    instanceid = "******"
    print("HI")
    if event:
        file_obj = event["Records"][0]
        #print(file_obj)
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        print("sending")
        try:
            response = ssm.send_command(
                InstanceIds=[instanceid],
                DocumentName="AWS-RunShellScript",
                Parameters={
                    "commands": ["touch hi.txt"]
                },
            )
            # fetching command id for the output
            command_id = response["Command"]["CommandId"]
            time.sleep(3)
            # fetching command output
            output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
            print(output)
        except Exception as e:
            logger.error(e)
            raise
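The log above shows the function stopping at exactly 3.00 seconds, which is Lambda's default timeout, while the handler itself sleeps for 3 seconds before asking for the result. Raising the function timeout in the Lambda configuration and polling until the command finishes, instead of a fixed `time.sleep(3)`, avoids both the timeout and the error `get_command_invocation` raises when called too early. The returned invocation's `StandardOutputContent` and `StandardErrorContent` then show what actually happened on the instance. A sketch with a hypothetical helper name, assuming the Lambda timeout has been set above `timeout_s`:

```python
import time


def wait_for_command(ssm, command_id: str, instance_id: str,
                     timeout_s: float = 60.0, poll_s: float = 1.0) -> dict:
    """Poll GetCommandInvocation until the Run Command reaches a final status."""
    running = {"Pending", "InProgress", "Delayed", "Cancelling"}
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            out = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
        except ssm.exceptions.InvocationDoesNotExist:
            out = None  # not registered yet right after send_command
        if out is not None and out["Status"] not in running:
            return out  # Success, Failed, TimedOut, Cancelled, ...
        if time.monotonic() >= deadline:
            raise TimeoutError(f"command {command_id} still running after {timeout_s} s")
        time.sleep(poll_s)
```

In the handler this would replace the `time.sleep(3)` and `get_command_invocation` lines: `output = wait_for_command(ssm, command_id, instanceid)`.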
There are several ways. One would be to set up S3 event notifications to invoke a Lambda function. The Lambda function would then use SSM Run Command to execute an AWS CLI S3 command on your instance to download the file from S3.
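The Lambda side of that approach can be sketched as two small functions: one turns the S3 event into `aws s3 cp` commands (decoding the URL-encoded object key and building the full `s3://bucket/key` source), and one hands them to the `AWS-RunShellScript` document without waiting. The function names and the destination directory are hypothetical; this assumes the instance runs the SSM Agent with an instance profile that can read the bucket, and that the Lambda role may call `ssm:SendCommand`.

```python
import shlex
from urllib.parse import unquote_plus


def build_download_commands(event: dict, dest_dir: str) -> list:
    """Turn an S3 event into one `aws s3 cp` command per uploaded object."""
    commands = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # keys arrive URL-encoded in S3 notifications (a space becomes "+")
        key = unquote_plus(record["s3"]["object"]["key"])
        src = shlex.quote(f"s3://{bucket}/{key}")  # quote for the remote shell
        dst = shlex.quote(dest_dir.rstrip("/") + "/")
        commands.append(f"aws s3 cp {src} {dst}")
    return commands


def send_download(ssm, instance_id: str, commands: list) -> str:
    """Ask SSM Run Command to run the commands on the instance; don't wait."""
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )
    return response["Command"]["CommandId"]
```

Inside the handler: `send_download(boto3.client("ssm"), instanceid, build_download_commands(event, "/home/ec2-user/images"))`. Returning right after `send_command` keeps the Lambda well inside its timeout.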
I don't know why there is any recommendation of Lambda here. What you need is simple: S3 object created event notification -> SQS and some job on your EC2 instance watching a long polling queue.
Here is an example of such a python script. You need to sort out how the object key is encoded in the event, but it will be there. I haven't tested this, but it should be pretty close.
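import json
from urllib.parse import unquote_plus
import boto3
def main() -> None:
    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    queue_url = "yourQueue"
    while True:
        # long polling: wait up to 20 s for a message
        res = sqs.receive_message(
            QueueUrl=queue_url,
            WaitTimeSeconds=20,
        )
        for msg in res.get("Messages", []):
            # the S3 event is a JSON string in the message body; keys are URL-encoded
            for record in json.loads(msg["Body"]).get("Records", []):
                key = unquote_plus(record["s3"]["object"]["key"])
                s3.download_file("yourBucket", key, "local/file/path")
            # delete the message so it is not received again
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
if __name__ == "__main__":
    main()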
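The script relies on the bucket actually sending "object created" events to the queue. A minimal sketch of that wiring, assuming an S3 client `s3`, the placeholder bucket name used above, a queue ARN you supply, and a `.png` suffix filter since the question mentions PNG images; the helper name is hypothetical. Note that this call replaces the bucket's entire notification configuration, and the queue's access policy must allow `s3.amazonaws.com` to send messages to it.

```python
def notify_queue_on_upload(s3, bucket: str, queue_arn: str, suffix: str = ".png") -> dict:
    """Send an SQS message for every new object whose key ends with `suffix`."""
    config = {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                # only notify for keys ending in the given suffix
                "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}},
            }
        ]
    }
    # overwrites any existing notification configuration on the bucket
    s3.put_bucket_notification_configuration(Bucket=bucket, NotificationConfiguration=config)
    return config
```

For example: `notify_queue_on_upload(boto3.client("s3"), "yourBucket", "arn:aws:sqs:...:yourQueue")`.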
You can use S3 Event Notifications, which react to a new file arriving in the S3 bucket.
The destinations supported by S3 event notifications are SNS, SQS and AWS Lambda.
You can use Lambda directly as the destination, as described by @Marcin.
You can use SQS as a queue, with a Lambda function behind it pulling from the queue. This gives you capabilities such as a dead-letter queue. You can also pull messages from the queue using other methods:
AWS CLI
AWS SDK
You can use SNS with different subscribers behind it (you can have several of these destinations at once, which is the fan-out pattern):
an SQS queue to manage the files
an email to notify
a Lambda function
...
You can find more explanation in this article: https://aws.plainenglish.io/system-design-s3-events-to-lambda-vs-s3-events-to-sqs-sns-to-lambda-2d41477d1cc9
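One practical detail of the SNS fan-out option: when an SQS queue is subscribed to the SNS topic without raw message delivery, each SQS message body is an SNS envelope whose `Message` field contains the S3 event as a JSON string. A consumer that should work with both S3 -> SQS and S3 -> SNS -> SQS therefore has to unwrap it first. A minimal sketch (hypothetical helper name) that returns the bucket and decoded key for each record:

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_body(body: str) -> list:
    """Return (bucket, key) pairs from an SQS message body.

    Handles a plain S3 event (S3 -> SQS) and an SNS envelope
    (S3 -> SNS -> SQS without raw message delivery).
    """
    payload = json.loads(body)
    if payload.get("Type") == "Notification" and "Message" in payload:
        payload = json.loads(payload["Message"])  # unwrap the SNS envelope
    # test events such as s3:TestEvent have no "Records" and yield nothing
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in payload.get("Records", [])
    ]
```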

Swisscom Appcloud S3 Connection reset by peer

We have a Django web service that uses Swisscom AppCloud's S3 solution. So far we had no problems, but without any change to the application we are now getting ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer')) errors when we try to upload files. We are using boto3 1.4.4.
Edit:
The error occurs after somewhere between 10 and 30 s. When I try from my local development machine, it works.
from django.conf import settings
from boto3 import session
from botocore.exceptions import ClientError
class S3Client(object):
    def __init__(self):
        s3_session = session.Session()
        self.s3_client = s3_session.client(
            service_name='s3',
            aws_access_key_id=settings.AWS_ACCESS_KEY,
            aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
            endpoint_url=settings.S3_ENDPOINT,
        )
    .
    .
    .
    def add_file(self, bucket, fileobj, file_name):
        self.s3_client.upload_fileobj(fileobj, bucket, file_name)
        url = self.s3_client.generate_presigned_url(
            ClientMethod='get_object',
            Params={
                'Bucket': bucket,
                'Key': file_name
            },
            ExpiresIn=60*60*24*365*10  # ExpiresIn is in seconds: signed for 10 years. Should be enough..
        )
        url, signature = self._split_signed_url(url)
        return url, signature, file_name
Could this be a version problem or anything else on our side?
Edit:
I ran some tests with s3cmd: I can list the buckets I have access to, but for every other command, such as listing all objects or just listing the objects in one bucket, I get Retrying failed request: / ([Errno 54] Connection reset by peer)
After some investigation I found the error:
Swisscom's implementation of S3 is somehow not up-to-date with Amazon's. To solve the problem I had to downgrade botocore from 1.5.78 to 1.5.62.

Errno 11004 getaddrinfo failed error in connecting to Amazon S3 bucket

I am trying to use the boto (ver 2.43.0) library in Python to connect to S3, but I keep getting socket.gaierror: [Errno 11004] when I try to do this:
from boto.s3.connection import S3Connection
access_key = 'accesskey_here'
secret_key = 'secretkey_here'
conn = S3Connection(access_key, secret_key)
mybucket = conn.get_bucket('s3://diap.prod.us-east-1.mybucket/')
print("success!")
I can connect to and access folders in mybucket using AWS CLI by using a command like this in Windows:
> aws s3 ls s3://diap.prod.us-east-1.mybucket/
<list of folders in mybucket will be here>
or using software like CloudBerry or S3Browser.
Is there something that I am doing wrong here to access S3 bucket and folders properly?
get_bucket() expects a bucket name, not an s3:// URI:
get_bucket(bucket_name, validate=True, headers=None)
Try:
mybucket = conn.get_bucket('diap.prod.us-east-1.mybucket')
If it doesn't work, show the full stack trace.
[Update]: boto has a bug with bucket names that contain dots. Update your boto config:
[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat
Or
from boto.s3.connection import S3Connection, OrdinaryCallingFormat
conn = S3Connection(access_key, secret_key, calling_format=OrdinaryCallingFormat())
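Since get_bucket() wants the bare bucket name, a small helper can turn the s3:// URI that works with the AWS CLI into the bucket name (and the folder prefix, if any). A stdlib-only sketch; the function name is hypothetical:

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/prefix/' into ('bucket', 'prefix/')."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # netloc is the bucket; the path (minus its leading "/") is the key prefix
    return parts.netloc, parts.path.lstrip("/")
```

For example, `bucket_name, prefix = split_s3_uri('s3://diap.prod.us-east-1.mybucket/')` gives `'diap.prod.us-east-1.mybucket'` and an empty prefix, and `conn.get_bucket(bucket_name)` can then be combined with the OrdinaryCallingFormat fix above.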

Start EC2 Windows instance and logon using boto

I want to start a Windows EC2 instance and log on using my credentials. The following script creates an EC2 instance and waits until it is running.
The problem is that after this I have to go to the AWS console manually, download the remote desktop shortcut and then log on using my Windows credentials (I am using my own AMI, which has my credentials saved). What I want is for boto to start my machine without going to the AWS console. Do you have any idea how to do this?
import boto
import boto.ec2
from settings import AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY
from settings import BUCKET_NAME
import time
import os
conn = boto.ec2.connect_to_region("us-west-2",
                                  aws_access_key_id=AWS_ACCESS_KEY,
                                  aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
# Create an instance
reservation = conn.run_instances(
    'ami-c8910***',
    key_name='*****',
    instance_type='t1.micro',
    security_groups=['R***rFarm'])
instance = reservation.instances[0]
# wait until the EC2 instance is running
while instance.state != 'running':
    time.sleep(5)
    instance.update()  # Updates instance metadata
    print("Instance state: %s" % (instance.state))
print("instance %s done!" % (instance.id))
The remote desktop shortcut is a simple text file with a ".rdp" file extension. So you can create it yourself:
if instance.platform == u'windows':
    # the with block closes the file even if a write fails
    with open("%s.rdp" % (instance.ip_address), "w") as fobj:
        fobj.write("auto connect:i:1\n")
        fobj.write("full address:s:%s\n" % (instance.ip_address))
        fobj.write("username:s:Administrator\n")
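To skip the console entirely, the generated shortcut can also be opened right away with the Windows Remote Desktop client, `mstsc`, which accepts an .rdp file as its argument. A sketch with hypothetical helper names, using the same three settings as the snippet above and assuming the script runs on Windows and the credentials are saved, as the question states:

```python
import subprocess
import sys
from typing import Optional


def rdp_contents(ip_address: str, username: str = "Administrator") -> str:
    """Text of a minimal .rdp shortcut (same settings as the snippet above)."""
    return (
        "auto connect:i:1\n"
        f"full address:s:{ip_address}\n"
        f"username:s:{username}\n"
    )


def write_rdp(ip_address: str, path: Optional[str] = None,
              username: str = "Administrator") -> str:
    """Write the shortcut (default name '<ip>.rdp') and return its path."""
    path = path or f"{ip_address}.rdp"
    with open(path, "w") as fobj:
        fobj.write(rdp_contents(ip_address, username))
    return path


def open_rdp(path: str) -> None:
    """Launch the Windows Remote Desktop client on the .rdp file."""
    if sys.platform != "win32":
        raise OSError("mstsc is only available on Windows")
    subprocess.Popen(["mstsc", path])  # does not block the script
```

After the wait loop: `open_rdp(write_rdp(instance.ip_address))`.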