I have a working python/boto script which posts a message to my AWS SQS queue. The message body, however, is hardcoded into the script.
I created a file called ~/file which contains two values:
$ cat ~/file
Username 'encrypted_password_string'
I would like my boto script (see below) to send a message to my AWS SQS queue that contains these two values.
Can anyone please advise how to modify my script below so that the message body sent to SQS contains the contents of the file ~/file? Please also take note of the special characters that exist within an encrypted password string.
Example:
~/file
username d5MopV/EsfSKk8BExCyLHFwNfBrOTzQ1
#!/usr/bin/env python
conf = {
    "sqs-access-key": "xxxx",
    "sqs-secret-key": "xxxx",
    "sqs-queue-name": "UserPassChange",
    "sqs-region": "xxxx",
    "sqs-path": "sqssend"
}
import boto.sqs
conn = boto.sqs.connect_to_region(
    conf.get('sqs-region'),
    aws_access_key_id=conf.get('sqs-access-key'),
    aws_secret_access_key=conf.get('sqs-secret-key')
)
q = conn.create_queue(conf.get('sqs-queue-name'))
from boto.sqs.message import RawMessage
m = RawMessage()
m.set_body('hardcoded message')
retval = q.write(m)
print 'added message, got retval: %s' % retval
One way to get it working:
In the script I added
import commands
import json
then added
USERNAME = commands.getoutput("echo $(who am i | awk '{print $1}')")
PASS = commands.getoutput("cat /tmp/.s")
and then added these values to my message body:
MSG = RawMessage()
MSG.set_body(json.dumps({'pass': PASS, 'user': USERNAME}))
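An untested variation that reads ~/file directly instead of shelling out (it reuses the queue object q from the script above and assumes ~/file holds the username and the encrypted password separated by whitespace; json.dumps keeps the special characters intact):

import json
import os
from boto.sqs.message import RawMessage

# read the two whitespace-separated values from ~/file
with open(os.path.expanduser('~/file')) as f:
    username, password = f.read().split(None, 1)

MSG = RawMessage()
# JSON-encode so characters like '/' and '+' in the encrypted password survive intact
MSG.set_body(json.dumps({'user': username, 'pass': password.strip()}))
q.write(MSG)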
The following example shows how to use Boto3 to send a file to a receiver.
test_sqs.py
import boto3
from moto import mock_sqs

@mock_sqs
def test_sqs():
    sqs = boto3.resource('sqs', 'us-east-1')
    queue = sqs.create_queue(QueueName='votes')
    queue.send_message(MessageBody=open('beer.txt').read())
    messages = queue.receive_messages()
    assert len(messages) == 1
    assert messages[0].body == 'tasty\n'
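Outside of the moto test, the same idea applied to the ~/file from the question would look roughly like this with boto3 (a sketch; it assumes the UserPassChange queue already exists, the region is filled in, and JSON is used so the special characters in the encrypted password survive intact):

import json
import os
import boto3

sqs = boto3.resource('sqs', region_name='us-east-1')  # assumed region
queue = sqs.get_queue_by_name(QueueName='UserPassChange')

# read the two whitespace-separated values from ~/file
with open(os.path.expanduser('~/file')) as f:
    username, password = f.read().split(None, 1)

queue.send_message(MessageBody=json.dumps({'user': username, 'pass': password.strip()}))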
I have an S3 bucket which will receive new files throughout the day. I want to download these to my EC2 instance every time a new file is uploaded to the bucket.
I have read that it's possible using SQS, SNS or Lambda. Which is the easiest of them all? I need the file to be downloaded as early as possible once it is uploaded into the bucket.
EDIT
I basically will be getting PNG images in the bucket every few seconds or minutes. Every time a new image is uploaded, I want to download it onto the instance, which is already running, and do some AI processing. As the images will keep coming into the bucket, I want to constantly download them to the EC2 instance and process them as soon as possible.
This is my code in the Lambda function so far.
import boto3
import json
import time  # needed for time.sleep below

def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    # print(event)
    s3 = boto3.client("s3")
    client = boto3.client("ec2")
    ssm = boto3.client("ssm")
    instanceid = "******"
    if event:
        file_obj = event["Records"][0]
        # print(file_obj)
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        response = ssm.send_command(
            InstanceIds=[instanceid],
            DocumentName="AWS-RunShellScript",
            Parameters={
                "commands": [f"aws s3 cp {filename} ."]
            },  # replace command_to_be_executed with command
        )
        # fetching command id for the output
        command_id = response["Command"]["CommandId"]
        time.sleep(3)
        # fetching command output
        output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
        print(output)
    return
However, I am getting the following error:
Test Event Name
test
Response
{
"errorMessage": "2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds"
}
Function Logs
START RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Version: $LATEST
END RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936
REPORT RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Duration: 3003.58 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 87 MB Init Duration: 314.81 ms
2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds
Request ID
88dbe51b-53d6-4c06-8c16-207698b3a936
When I remove all the lines related to SSM, it works fine. Is there a permission issue, or is there a problem with the code?
EDIT2
My code is working, but I don't see any output or change on my EC2 instance. I should be seeing an empty text file in the home directory, but I don't see anything.
Code
import boto3
import json
import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    # print(event)
    s3 = boto3.client("s3")
    client = boto3.client("ec2")
    ssm = boto3.client("ssm")
    instanceid = "******"
    print("HI")
    if event:
        file_obj = event["Records"][0]
        # print(file_obj)
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        print("sending")
        try:
            response = ssm.send_command(
                InstanceIds=[instanceid],
                DocumentName="AWS-RunShellScript",
                Parameters={
                    "commands": ["touch hi.txt"]
                },  # replace command_to_be_executed with command
            )
            # fetching command id for the output
            command_id = response["Command"]["CommandId"]
            time.sleep(3)
            # fetching command output
            output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
            print(output)
        except Exception as e:
            logger.error(e)
            raise e
There are several ways. One would be to set up S3 notifications to invoke a Lambda function. The Lambda function would then use SSM Run Command to execute an AWS CLI S3 command on your instance to download the file from S3. As for the error in your first attempt: "Task timed out after 3.00 seconds" is the default Lambda timeout, and time.sleep(3) in the handler alone already uses all of it, so raise the function timeout in the Lambda configuration.
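A rough, untested sketch of that flow (the instance ID is a placeholder, the instance needs the AWS CLI plus an instance profile allowing s3:GetObject, and the command uses the full s3:// URI so the CLI knows which bucket to copy from):

import time
import boto3

ssm = boto3.client("ssm")
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID

def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    # run the AWS CLI on the instance via SSM Run Command
    response = ssm.send_command(
        InstanceIds=[INSTANCE_ID],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [f"aws s3 cp 's3://{bucket}/{key}' /home/ec2-user/"]},
    )
    command_id = response["Command"]["CommandId"]
    time.sleep(3)  # give the command a moment to finish before polling its output
    output = ssm.get_command_invocation(CommandId=command_id, InstanceId=INSTANCE_ID)
    print(output["Status"])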
I don't know why Lambda is being recommended here. What you need is simple: an S3 object-created event notification -> SQS, and a job on your EC2 instance watching a long-polling queue.
Here is an example of such a Python script. You need to sort out how the object key is encoded in the event, but it will be there. I haven't tested this, but it should be pretty close.
import boto3

def main() -> None:
    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    while True:
        res = sqs.receive_message(
            QueueUrl="yourQueue",
            WaitTimeSeconds=20,
        )
        for msg in res.get("Messages", []):
            s3.download_file("yourBucket", msg["key"], "local/file/path")

if __name__ == "__main__":
    main()
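A slightly fuller, also untested sketch of the same loop, showing where the S3 notification JSON and the message deletion fit (note that the object key arrives URL-encoded in the event):

import json
import boto3
from urllib.parse import unquote_plus

def poll_queue(queue_url: str, download_dir: str = "/tmp") -> None:
    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    while True:
        res = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20)
        for msg in res.get("Messages", []):
            # the S3 notification arrives as a JSON document in the message body
            body = json.loads(msg["Body"])
            for record in body.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                # object keys are URL-encoded in the event
                key = unquote_plus(record["s3"]["object"]["key"])
                local_path = f"{download_dir}/{key.rsplit('/', 1)[-1]}"
                s3.download_file(bucket, key, local_path)
            # delete the message so it is not redelivered after the visibility timeout
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])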
You can use S3 Event Notifications, which react to new files arriving in the S3 bucket.
The destinations supported by S3 event notifications are SNS, SQS and AWS Lambda.
You can use Lambda directly as the destination, as described by @Marcin.
You can use SQS as a queue with a Lambda behind it pulling from the queue. This gives you extra capabilities such as a dead-letter queue. You can then pull messages from the queue using different methods:
AWS CLI
AWS SDK
You can use SNS with different destinations behind it (you can have several of these destinations at once, which is the fan-out pattern):
an SQS queue to manage the files
an email to notify
a Lambda function
...
You can find more explanation in this article: https://aws.plainenglish.io/system-design-s3-events-to-lambda-vs-s3-events-to-sqs-sns-to-lambda-2d41477d1cc9
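If you go the SQS route, the bucket-side notification can be wired up with boto3 roughly as below (a sketch; the bucket name and queue ARN are placeholders, and the queue's access policy must already allow S3 to send messages to it):

import boto3

s3 = boto3.client("s3")
# register an ObjectCreated notification that targets the SQS queue
s3.put_bucket_notification_configuration(
    Bucket="your-bucket-name",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:your-queue",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)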
I am very new to programming. I am working on a pipeline to analyze DMARC report files that are sent to my email account and that I am manually placing in an S3 bucket. The goal of this task is to download, extract, and analyze the files using parsedmarc: https://github.com/domainaware/parsedmarc The part I'm having difficulty with is writing a conditional statement to extract .gz files when the target file is not a .zip file. I'm assuming the gzip library will be sufficient for this purpose. Here is the code I have so far. I'm using Python 3 and the boto3 library for AWS. Any help is appreciated!
import parsedmarc
import pprint
import json
import boto3
import zipfile
import gzip

pp = pprint.PrettyPrinter(indent=2)

def main():
    # Set default session profile and region for the sandbox account. Access keys are pulled from ~/.aws/config and ~/.aws/credentials.
    # The 'profile_name' value comes from the header for the account in question in ~/.aws/config and ~/.aws/credentials
    boto3.setup_default_session(region_name="aws-region-goes-here")
    boto3.setup_default_session(profile_name="aws-account-profile-name-goes-here")
    # Define the s3 resource, the bucket name, and the file to download. It's hardcoded for now...
    s3_resource = boto3.resource('s3')
    s3_resource.Bucket('dmarc-parsing').download_file('source-dmarc-report-filename.zip', '/home/user/dmarc/parseme.zip')
    # Use the zipfile python library to extract the file into its raw state.
    with zipfile.ZipFile('/home/user/dmarc/parseme.zip', 'r') as zip_ref:
        zip_ref.extractall('/home/user/dmarc')
    # Ingest all locations for xml file source
    dmarc_report_directory = '/home/user/dmarc/'
    dmarc_report_file = 'parseme.xml'
    """I need an if statement here for extracting .gz files if the file type is not .zip. The contents of every archive are .xml files"""
    # Set report output variables using functions in parsedmarc. Variable set to equal the output
    pd_report_output = parsedmarc.parse_aggregate_report_file(_input=f"{dmarc_report_directory}{dmarc_report_file}")
    # use jsonify to make the output in json format
    pd_report_jsonified = json.loads(json.dumps(pd_report_output))
    dkim_status = pd_report_jsonified['records'][0]['policy_evaluated']['dkim']
    spf_status = pd_report_jsonified['records'][0]['policy_evaluated']['spf']
    if dkim_status == 'fail' or spf_status == 'fail':
        print(f"{dmarc_report_file} reports failure. oh crap. report:")
    else:
        print(f"{dmarc_report_file} passes. great. report:")
    pp.pprint(pd_report_jsonified['records'][0]['auth_results'])

if __name__ == "__main__":
    main()
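One way to write the missing conditional, as an untested sketch (it writes the decompressed .gz content out as parseme.xml to match the hard-coded filename above):

import gzip
import shutil
import zipfile

def extract_report(archive_path: str, output_dir: str) -> None:
    """Extract a DMARC report archive, whether it is a .zip or a .gz file."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path, 'r') as zip_ref:
            zip_ref.extractall(output_dir)
    elif archive_path.endswith('.gz'):
        # a .gz archive holds a single compressed file; write it out as parseme.xml
        with gzip.open(archive_path, 'rb') as gz_ref, \
                open(f"{output_dir}/parseme.xml", 'wb') as out_file:
            shutil.copyfileobj(gz_ref, out_file)
    else:
        raise ValueError(f"Unsupported archive type: {archive_path}")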
Here is the code using the parsedmarc.parse_aggregate_report_xml method I found. Hope this helps others in parsing these reports:
import parsedmarc
import pprint
import json
import boto3
import zipfile
import gzip

pp = pprint.PrettyPrinter(indent=2)

def main():
    # Set default session profile and region for account. Access keys are pulled from ~/.aws/config and ~/.aws/credentials.
    # The 'profile_name' value comes from the header for the account in question in ~/.aws/config and ~/.aws/credentials
    boto3.setup_default_session(profile_name="aws_profile_name_goes_here", region_name="region_goes_here")
    source_file = 'filename_in_s3_bucket.zip'
    destination_directory = '/tmp/'
    destination_file = 'compressed_report_file'
    # Define the s3 resource, the bucket name, and the file to download. It's hardcoded for now...
    s3_resource = boto3.resource('s3')
    s3_resource.Bucket('bucket-name-for-dmarc-report-files').download_file(source_file, f"{destination_directory}{destination_file}")
    # Extract xml
    outputxml = parsedmarc.extract_xml(f"{destination_directory}{destination_file}")
    # run parse dmarc analysis & convert output to json
    pd_report_output = parsedmarc.parse_aggregate_report_xml(outputxml)
    pd_report_jsonified = json.loads(json.dumps(pd_report_output))
    # loop through results and find relevant status info and pass fail status
    dmarc_report_status = ''
    for record in pd_report_jsonified['records']:
        if False in record['alignment'].values():
            dmarc_report_status = 'Failed'
            # ************ add logic for interpreting results
    # if fail, publish to sns
    if dmarc_report_status == 'Failed':
        message = "Your dmarc report failed at least one check. Review the log for details"
        sns_resource = boto3.resource('sns')
        sns_topic = sns_resource.Topic('arn:aws:sns:us-west-2:112896196555:TestDMARC')
        sns_publish_response = sns_topic.publish(Message=message)

if __name__ == "__main__":
    main()
I have created a Flask application and it consists of two Celery tasks.
Task 1: Generate a file through a process
Task 2: Email the generated file
Normally task 1 needs more time than task 2. I want to execute task 1 and then task 2, but the problem is that both start to execute at the same time inside Celery.
How can I resolve this issue?
# imports used by the tasks below
import os
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders

@celery.task(name='celery_example.process')
def process(a, b, c, d, e, f):
    command = 'rnx2rtkp -p ' + a + ' -f ' + b + ' -m ' + c + ' -n -o oout.pos ' + d + ' ' + e + ' ' + f
    os.system(command)
    return 'Successfully created POS file'

@celery.task(name='celery_example.emailfile')
def emailfile(recipientemail):
    email_user = ''
    email_password = ''
    subject = 'subject'
    msg = MIMEMultipart()
    msg['From'] = email_user
    msg['To'] = recipientemail
    msg['Subject'] = subject
    body = 'This is your Post-Processed position file'
    msg.attach(MIMEText(body, 'plain'))
    filename = 'oout.pos'
    attachment = open(filename, 'rb')
    part = MIMEBase('application', 'octet-stream')
    part.set_payload(attachment.read())
    encoders.encode_base64(part)
    part.add_header('Content-Disposition', "attachment; filename= " + filename)
    msg.attach(part)
    text = msg.as_string()
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.starttls()
    server.login(email_user, email_password)
    server.sendmail(email_user, recipientemail, text)
    server.quit()
    return 'Email has been successfully sent'
This is the app.route
@app.route('/pp.php', methods=['GET', 'POST'])
def pp():
    pp = My1Form()
    target = os.path.join(APP_ROOT)
    print(target)
    for fileBase in request.files.getlist("fileBase"):
        print(fileBase)
        filename = fileBase.filename
        destination = "/".join([target, filename])
        print(destination)
        fileBase.save(destination)
    for fileObsRover in request.files.getlist("fileObsRover"):
        print(fileObsRover)
        filename = fileObsRover.filename
        destination = "/".join([target, filename])
        print(destination)
        fileObsRover.save(destination)
    for fileNavRover in request.files.getlist("fileNavRover"):
        print(fileNavRover)
        filename = fileNavRover.filename
        destination = "/".join([target, filename])
        print(destination)
        fileNavRover.save(destination)
    a = fileObsRover.filename
    b = fileBase.filename
    c = fileNavRover.filename
    elevation = pp.ema.data
    Freq = pp.frq.data
    posMode = pp.pmode.data
    emailAdd = pp.email.data
    process.delay(posMode, Freq, elevation, a, b, c)
    emailfile.delay(emailAdd)
    return render_template('results.html', email=pp.email.data, Name=pp.Name.data, ema=elevation, frq=Freq, pmode=posMode, fileBase=a)
    return render_template('pp.php', pp=pp)
As it currently stands, your code does the following:
# schedule process to run asynchronously
process.delay(posMode,Freq,elevation,a,b,c)
# schedule emailfile to run asynchronously
emailfile.delay(emailAdd)
Both of these will immediately be picked up by workers and executed. You have provided nothing to inform Celery that emailfile should wait until process is complete.
Instead you should:
alter the signature of emailfile to take a leading parameter that will receive the output of a successful process call; then
call process using link.
For example:
deferred = process.apply_async(
    (posMode, Freq, elevation, a, b, c),
    link=emailfile.s(emailAdd))
deferred.get()
An alternative to using link, but semantically identical in this case, would be to use a chain.
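For completeness, an untested sketch of the chain variant, assuming emailfile is changed to take the result of process as its first argument:

from celery import chain

# emailfile would need a signature like: def emailfile(process_result, recipientemail): ...
# the result of process is passed to emailfile as its first argument
chain(
    process.s(posMode, Freq, elevation, a, b, c),
    emailfile.s(emailAdd),
).apply_async()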
Is there a way via the API to export Mailgun's logs to a local file for long-term storage? We need to keep our mailing logs beyond the 30 days Mailgun retains them.
Thanks!
You can only request 300 events at a time, so you'll have to keep fetching the next page until you run out of results. You can then do whatever you'd like with the log items, such as generating a CSV or adding them to your database. Check out https://documentation.mailgun.com/en/latest/api-events.html#events for the API docs. Here's an example in Python:
import requests
import csv
from datetime import datetime, timedelta

DATETIME_FORMAT = '%d %B %Y %H:%M:%S -0000'

def get_logs(start_date, end_date, next_url=None):
    if next_url:
        logs = requests.get(next_url, auth=("api", [YOUR MAILGUN ACCESS KEY]))
    else:
        logs = requests.get(
            'https://api.mailgun.net/v3/{0}/events'.format(
                [YOUR MAILGUN SERVER NAME]
            ),
            auth=("api", [YOUR MAILGUN ACCESS KEY]),
            params={"begin": start_date.strftime(DATETIME_FORMAT),
                    "end": end_date.strftime(DATETIME_FORMAT),
                    "ascending": "yes",
                    "pretty": "yes",
                    "limit": 300,
                    "event": "accepted"}
        )
    return logs.json()

start = datetime.now() - timedelta(2)
end = timezone.now() - timedelta(1)

log_items = []
current_page = get_logs(start, end)
while current_page.get('items'):
    items = current_page.get('items')
    log_items.extend(items)
    next_url = current_page.get('paging').get('next', None)
    current_page = get_logs(start, end, next_url=next_url)

keys = log_items[0].keys()
with open('mailgun{0}.csv'.format(start.strftime('%Y-%M-%d')), 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(log_items)
There's a simple Python script to retrieve logs for a domain; however, I haven't checked whether it hits the Events API rather than the now-deprecated Logs API...
https://github.com/getupcloud/python-mailgunlog
The original answer doesn't work without modifications. Here is the updated code that works:
#!/usr/bin/env python3
# Uses the Mailgun API to save logs to JSON file
# Set environment variables MAILGUN_API_KEY and MAILGUN_SERVER
# Optionally set MAILGUN_LOG_DAYS to number of days to retrieve logs for
# Based on https://stackoverflow.com/a/49825979
# See API guide https://documentation.mailgun.com/en/latest/api-intro.html#introduction

import os
import json
import requests
from datetime import datetime, timedelta
from email import utils

DAYS_TO_GET = int(os.environ.get("MAILGUN_LOG_DAYS", 7))  # cast to int, since environment variables are strings
MAILGUN_API_KEY = os.environ.get("MAILGUN_API_KEY")
MAILGUN_SERVER = os.environ.get("MAILGUN_SERVER")
if not MAILGUN_API_KEY or not MAILGUN_SERVER:
    print("Set environment variable MAILGUN_API_KEY and MAILGUN_SERVER")
    exit(1)

ITEMS_PER_PAGE = 300  # API is limited to 300

def get_logs(start_date, next_url=None):
    if next_url:
        print(f"Getting next batch of {ITEMS_PER_PAGE} from {next_url}...")
        response = requests.get(next_url, auth=("api", MAILGUN_API_KEY))
    else:
        url = 'https://api.mailgun.net/v3/{0}/events'.format(MAILGUN_SERVER)
        start_date_formatted = utils.format_datetime(start_date)  # Mailgun wants it in RFC 2822
        print(f"Getting first batch of {ITEMS_PER_PAGE} from {url} since {start_date_formatted}...")
        response = requests.get(
            url,
            auth=("api", MAILGUN_API_KEY),
            params={"begin": start_date_formatted,
                    "ascending": "yes",
                    "pretty": "yes",
                    "limit": ITEMS_PER_PAGE,
                    "event": "accepted"}
        )
    response.raise_for_status()
    return response.json()

start = datetime.now() - timedelta(DAYS_TO_GET)
log_items = []
current_page = get_logs(start)
while current_page.get('items'):
    items = current_page.get('items')
    log_items.extend(items)
    print(f"Retrieved {len(items)} records for a total of {len(log_items)}")
    next_url = current_page.get('paging').get('next', None)
    current_page = get_logs(start, next_url=next_url)

file_out = f"mailgun-logs-{MAILGUN_SERVER}_{start.strftime('%Y-%m-%d')}_to_{datetime.now().strftime('%Y-%m-%d')}.json"
print(f"Writing out {file_out}")
with open(file_out, 'w') as file_out_handle:
    json.dump(log_items, file_out_handle, indent=4)
print("Done.")
You can have a look at MailgunLogger.
It's an open source project that can easily be deployed via Docker to fetch and store Mailgun events in a database. It features a dead simple, although rudimentary, search and allows you to add multiple accounts/domains.
Run via Docker:
docker run -d -p 5050:5050 \
-e "ML_DB_USER=username" \
-e "ML_DB_PASSWORD=password" \
-e "ML_DB_NAME=mailgun_logger" \
-e "ML_DB_HOST=my_db_host" \
--name mailgun_logger jackjoe/mailgun_logger
From there on, the interface guides you to configure everything.
In the OP's case, this project can be used in a more headless fashion, where you only use the database instead of the provided UI.
You can use Skyvia for exporting logs from Mailgun for long-term storage. Skyvia is a cloud tool for automatic Mailgun CSV import/export with powerful transformations. You can also export Mailgun ListMembers, Templates, Tags, etc. to CSV automatically on a schedule.
My Python code looks like this:
import json
import boto.sqs
import boto
from boto.sqs.connection import SQSConnection
from boto.sqs.message import Message
from boto.sqs.message import RawMessage

sqs = boto.connect_sqs(aws_access_key_id='XXXXXXXXXXXXXXX', aws_secret_access_key='XXXXXXXXXXXXXXXXX')
q = sqs.create_queue("Nishantqueue")  # already present
q.set_message_class(RawMessage)
results = q.get_messages()
ret = "Got %s result(s) this time.\n\n" % len(results)
for result in results:
    msg = json.loads(result.get_body())
    ret += "Message: %s\n" % msg['message']
ret += "\n... done."
print ret
My SQS queue contains at least 5 to 6 messages. When I execute this, I get the following output on every run; the code isn't able to pull the messages from the queue:
Got 0 result(s) this time.
...done.
I am sure I am missing something in the loop, but I couldn't find it.
Your code is retrieving messages from an Amazon SQS queue, but it doesn't seem to be deleting them. This means that messages will be invisible for a period of time (specified by the visibility_timeout parameter), after which they will reappear. The expectation is that if a message is not deleted within this time, then it has failed to be processed and should reappear on the queue to try again.
Here's some code that pulls a message from a queue, then deletes it after processing. Note the visibility_timeout specified when a message is retrieved. It is using read() to simply return one message:
#!/usr/bin/python27
import boto, boto.sqs
from boto.sqs.message import Message

# Connect to Queue
q_conn = boto.sqs.connect_to_region("ap-southeast-2")
q = q_conn.get_queue('queue-name')

# Get a message
m = q.read(visibility_timeout=15)
if m == None:
    print "No message!"
else:
    print m.get_body()
    q.delete_message(m)
It's possible that your messages were invisible ("in-flight") when you tried to retrieve them.
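For reference, here is an untested sketch of the question's loop with the deletion added (get_messages can also fetch several messages per call and set the visibility timeout; adjust the region and queue name to yours, and it assumes the message bodies are JSON as in the question):

import json
import boto.sqs
from boto.sqs.message import RawMessage

conn = boto.sqs.connect_to_region("ap-southeast-2")  # adjust to your region
q = conn.get_queue("Nishantqueue")
q.set_message_class(RawMessage)

# fetch up to 10 messages at once and keep them invisible for 15 seconds
for result in q.get_messages(num_messages=10, visibility_timeout=15):
    msg = json.loads(result.get_body())
    # ... process msg here ...
    q.delete_message(result)  # delete so the message does not reappear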