How to get requests with serverless lambda - flask

I'm trying to get data from a URL as a file and serve it back with the right mimetype.
I've tried a lot of different options; this is some of the Python Flask code I currently have:
## download video
@app.route('/download/<string:resource>')
def download(resource):
    asset = getasset(resource)
    # headers = {"Content-Type": "application/octet-stream", "Accept-Encoding": "gzip, deflate, br", "Accept": "*/*"}
    response = requests.get(asset['downloads']['h264_720'], stream=True)
    # length = response.headers.get('Content-Length')

    def exhaust(response):
        while True:
            response.raw.decode_content = True
            out = response.raw.read(1024 * 1024)
            if not out:
                break
            yield out

    if IS_OFFLINE:
        return Response(exhaust(response), mimetype='video/mp4')
    else:
        return Response(base64.b64decode(exhaust(response)), mimetype='video/mp4')
Offline the response is fine when reviewing it locally with "serverless wsgi serve --stage dev".
Online the response is different (after doing "serverless deploy --stage dev")...
Please have a look at the image: on the left is the correct MP4 video file, on the right a file that is bigger and not an MP4 file.
It has something to do with base64.b64encode(r.content), but there is more to it.
I started off with this function:
### download video
@app.route('/download/<string:resource>')
def download(resource):
    asset = getasset(resource)
    r = requests.get(asset['downloads']['h264_720'], stream=True)
    if IS_OFFLINE:
        return Response(r.content, mimetype='video/mp4')
    else:
        return Response(base64.b64decode(r.content), mimetype='video/mp4')
This results in a file that looks like this and is only 200 bytes:
ftypisomisomiso2avc1mp41moovlmvhdTtraktkhd8edtselst8treftmcdmdiamdhd2UhdlrvideVideoHandlerjminfvmhddinfdrefurlstblstsdavc1HH9avcCMgMPfxrhcolrnclxpaspbtrtq+Vsttsstss3estscstszOBNC7468x69G8BClAiBBKGHAEArLiDGuc=
It has some of the first characters that I can see in the correct file.
Does anyone know what's going on and how to fix it?
I did manage to reproduce the issue locally:
import requests
import base64

url = 'to a video file...'
r = requests.get(url)

with open("test.mp4", "wb") as out_file:
    # reproducing the issue with this
    base64_bytes = base64.b64encode(r.content)
    out_file.write(base64_bytes)
    # writing the decoded bytes instead produces the correct output:
    # message_bytes = base64.b64decode(base64_bytes)
    # out_file.write(message_bytes)

OK, I found the issue and added this to my serverless.yml:
provider:
  name: aws
  runtime: python3.9
  ### fix:
  apiGateway:
    binaryMediaTypes:
      - '*/*'
  ###
Source: https://github.com/dherault/serverless-offline/issues/464
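With binaryMediaTypes set to '*/*', API Gateway treats the response as binary and the serverless-wsgi layer handles the base64 step, so the route itself no longer needs to branch on IS_OFFLINE. Below is a minimal, untested sketch of how the route could then look; it assumes the same getasset() helper and 'h264_720' download key as in the question:

import requests
from flask import Flask, Response

app = Flask(__name__)

@app.route('/download/<string:resource>')
def download(resource):
    # getasset() and the 'h264_720' key are assumptions carried over from the question
    asset = getasset(resource)
    upstream = requests.get(asset['downloads']['h264_720'], stream=True)

    def generate():
        # stream the upstream body in 1 MB chunks instead of buffering it all in memory
        for chunk in upstream.iter_content(chunk_size=1024 * 1024):
            if chunk:
                yield chunk

    # the same code path now works offline and deployed, because API Gateway
    # (not the application) takes care of the binary/base64 conversion
    return Response(generate(), mimetype='video/mp4')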


Why is the file uploaded to AWS S3 0B in size?

I am developing a web application with Flask as the backend and Nuxt.js as the frontend. I receive an image file from the frontend and can save it to my Flask directory structure locally. The file is OK and the image is shown if I open it. Now I want to upload this image to AWS S3 instead of saving it to my disk. I use the boto3 SDK; here is my code:
Here is my save_picture method, which opens the image file and resizes it. I had the save call, but commented it out to avoid saving the file to disk, as I want it only on S3.
def save_picture(object_id, form_picture, path):
    if form_picture is None:
        return None
    random_hex = token_hex(8)
    filename = form_picture.filename
    if '.' not in filename:
        return None
    extension = filename.rsplit('.', 1)[1].lower()
    if not allowed_file(extension, form_picture):
        return None
    picture_fn = f'{object_id}_{random_hex}.{extension}'
    picture_path = current_app.config['UPLOAD_FOLDER'] / path / picture_fn
    # resizing image and saving the small version
    output_size = (1280, 720)
    i = Image.open(form_picture)
    i.thumbnail(output_size)
    # i.save(picture_path)
    return picture_fn
image_name = save_picture(object_id=new_object.id, form_picture=file, path=f'{object_type}_images')

s3 = boto3.client(
    's3',
    aws_access_key_id=current_app.config['AWS_ACCESS_KEY'],
    aws_secret_access_key=current_app.config['AWS_SECRET_ACCESS_KEY']
)

print(file)  # this prints <FileStorage: 'Capture.JPG' ('image/jpeg')>, so the file is ok

try:
    s3.upload_fileobj(
        file,
        current_app.config['AWS_BUCKET_NAME'],
        image_name,
        ExtraArgs={
            'ContentType': file.content_type
        }
    )
except Exception as e:
    print(e)
    return make_response({'msg': 'Something went wrong.'}, 500)
I can see the uploaded file in my S3 bucket, but it shows 0 B in size, and if I download it, it says it cannot be viewed.
I have tried different access policies in S3, as well as many tutorials online; nothing seems to help. Changing the version of S3 to v3 when creating the client breaks the whole system, and the file is not uploaded at all due to an access error.
What could be the reason for this upload failure? Is it the AWS config or something else?
Thank you!
Thanks to @jarmod, I tried skipping the image processing and it worked. I am now resizing the image, saving it to disk, opening the saved image (not the initial file), and sending that to S3. I then delete the image on disk as I don't need it.
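The likely root cause is that Image.open(form_picture) reads the incoming FileStorage stream, so by the time upload_fileobj(file, ...) runs the file pointer is at the end and S3 receives an empty body. If you want to avoid the disk round trip, here is a sketch of an in-memory alternative; the function name and arguments are illustrative and not part of the original code:

import io

from PIL import Image


def upload_thumbnail_to_s3(file, bucket, key, s3_client):
    # `file` is assumed to be the Werkzeug FileStorage from the question.
    # Image.open() consumes its stream, which is why passing the same object
    # to upload_fileobj afterwards uploads 0 bytes.
    image = Image.open(file)
    image.thumbnail((1280, 720))

    # Write the resized image to an in-memory buffer instead of to disk.
    buffer = io.BytesIO()
    image.save(buffer, format=image.format or 'JPEG')
    buffer.seek(0)  # rewind so upload_fileobj reads from the start

    s3_client.upload_fileobj(
        buffer,
        bucket,
        key,
        ExtraArgs={'ContentType': file.content_type},
    )

Alternatively, calling file.seek(0) before upload_fileobj would at least upload the original, un-resized image.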

Retrieving results from Mturk Sandbox

I'm working on retrieving my HIT results from my local computer. I followed the template of get_results.py, entered my key_id and access_key correctly, and installed xmltodict, but I still got an error message. Could anyone help me figure out why? Here is my HIT address in case anyone needs the format of my HIT: https://workersandbox.mturk.com/mturk/preview?groupId=3MKP0VNPM2VVY0K5UTNZX9OO9Q8RJE
import boto3

# Sandbox endpoint used by the get_results.py template
MTURK_SANDBOX = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'

mturk = boto3.client('mturk',
    aws_access_key_id="PASTE_YOUR_IAM_USER_ACCESS_KEY",
    aws_secret_access_key="PASTE_YOUR_IAM_USER_SECRET_KEY",
    region_name='us-east-1',
    endpoint_url=MTURK_SANDBOX
)

# You will need the following library
# to help parse the XML answers supplied from MTurk.
# Install it in your local environment with
#   pip install xmltodict
import xmltodict

# Use the hit_id previously created
hit_id = 'PASTE_IN_YOUR_HIT_ID'

# We are only publishing this task to one Worker,
# so we will get back an array with one item if it has been completed
worker_results = mturk.list_assignments_for_hit(HITId=hit_id, AssignmentStatuses=['Submitted'])
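The post does not include the actual error text, but note that MTURK_SANDBOX has to be defined (as added above) before the client is created. For reference, this is roughly how the official get_results.py template parses the returned assignments with xmltodict; the exact answer-field layout depends on your HIT's question form, so treat it as a sketch:

if worker_results['NumResults'] > 0:
    for assignment in worker_results['Assignments']:
        xml_doc = xmltodict.parse(assignment['Answer'])
        answers = xml_doc['QuestionFormAnswers']['Answer']
        # xmltodict returns a dict for a single answer field and a list for several
        if not isinstance(answers, list):
            answers = [answers]
        for answer_field in answers:
            print("For input field: " + answer_field['QuestionIdentifier'])
            print("Submitted answer: " + answer_field['FreeText'])
else:
    print("No results ready yet")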

Python Requests: How can I properly submit a multipart/form POST using a file name

I have taken a look at other questions related to multipart/form POST requests in Python but unfortunately, they don't seem to address my exact question. Basically, I normally use CURL in order to hit an API service that allows me to upload zip files in order to create HTML5 assets. The CURL command I use looks like this:
curl -X POST -H "Authorization: api: 222111" --form "type=html" --form "file=Folder1/Folder2/example.zip" "https://example.api.com/upload?ins_id=123"
I am trying to use a python script to iterate through a folder of zip files in order to upload all of these files and receive a "media ID" back. This is what my script looks like:
import os
import requests
import json

ins_id = raw_input("Please enter your member ID: ")
auth = raw_input("Please enter your API authorization token: ")

for filename in os.listdir("zips"):
    if filename.endswith(".zip"):
        file_path = os.path.abspath(filename)
        url = "https://example.api.com/upload?ins_id=" + str(ins_id)
        header = {"Authorization": auth}
        response = requests.post(url, headers=header,
                                 files={"form_type": (None, "html"),
                                        "form_file_upload": (None, str(file_path))})
        api_response = response.json()
        print api_response
This API service requires the file path to be included when submitting the POST. However, when I use this script, the response indicates that "file not provided". Am I including this information correctly in my script?
Thanks.
Update:
I think I am heading in the right direction now (thanks to the answer provided), but now I receive an error message stating that there is "no such file or directory". My thinking is that I am not using os.path correctly, but even if I change my code to use "relpath" I still get the same message. My script is in a folder, and I have a completely different folder called "zips" (in the same directory) which is where all of my zip files are stored.
To upload files with the requests library, you should pass an open file handle in the files argument, as described in the documentation. This is the corresponding example that I have taken from there:
url = 'http://httpbin.org/post'
files = {'file': open('path_to_your_file', 'rb')}
r = requests.post(url, files=files)
If we integrate this in your script, it would look as follows (I also made it slightly more pythonic):
import os
import requests
import json

folder = 'zips'

ins_id = raw_input("Please enter your member ID: ")
auth = raw_input("Please enter your API authorization token: ")

url = "https://example.api.com/upload?ins_id=" + str(ins_id)
header = {"Authorization": auth}

for filename in os.listdir(folder):
    if not filename.endswith(".zip"):
        continue
    file_path = os.path.abspath(os.path.join(folder, filename))
    response = requests.post(
        url, headers=header,
        files={"form_type": (None, "html"),
               "form_file_upload": open(file_path, 'rb')}
    )
    api_response = response.json()
    print api_response
As I don't have the API endpoint, I can't actually test this code block, but it should be something along these lines.
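One refinement worth mentioning, still assuming the field names and URL above are right for this API: opening each zip inside a with block guarantees the handle is closed after every request, and joining the folder into the path is what resolves the "no such file or directory" error from the update (os.path.abspath(filename) alone resolves relative to the script's working directory, not the zips folder). This sketch reuses url, header, and folder from the snippet above:

for filename in os.listdir(folder):
    if not filename.endswith(".zip"):
        continue
    # join the folder first; abspath on the bare filename points at the wrong place
    file_path = os.path.join(folder, filename)
    with open(file_path, 'rb') as zip_file:
        response = requests.post(
            url, headers=header,
            files={"form_type": (None, "html"),
                   "form_file_upload": zip_file}
        )
    print(response.json())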

Exporting Logs from Mailgun for LTS

Is there a way via the API to export Mailgun's logs to a local file for long-term storage? We need to keep our mailing logs for longer than the 30 days Mailgun provides.
Thanks!
You can only request 300 events at a time, so you'll have to keep fetching the next page until you run out of results. You can then do whatever you'd like with the log items, such as generate a CSV or add items to your database. Check out https://documentation.mailgun.com/en/latest/api-events.html#events for the API docs. Here's an example in Python:
import requests
import csv
from datetime import datetime, timedelta

DATETIME_FORMAT = '%d %B %Y %H:%M:%S -0000'

def get_logs(start_date, end_date, next_url=None):
    if next_url:
        logs = requests.get(next_url, auth=("api", [YOUR MAILGUN ACCESS KEY]))
    else:
        logs = requests.get(
            'https://api.mailgun.net/v3/{0}/events'.format(
                [YOUR MAILGUN SERVER NAME]
            ),
            auth=("api", [YOUR MAILGUN ACCESS KEY]),
            params={"begin": start_date.strftime(DATETIME_FORMAT),
                    "end": end_date.strftime(DATETIME_FORMAT),
                    "ascending": "yes",
                    "pretty": "yes",
                    "limit": 300,
                    "event": "accepted"}
        )
    return logs.json()

start = datetime.now() - timedelta(2)
end = datetime.now() - timedelta(1)

log_items = []
current_page = get_logs(start, end)

while current_page.get('items'):
    items = current_page.get('items')
    log_items.extend(items)
    next_url = current_page.get('paging').get('next', None)
    current_page = get_logs(start, end, next_url=next_url)

keys = log_items[0].keys()
with open('mailgun{0}.csv'.format(start.strftime('%Y-%m-%d')), 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(log_items)
There's a simple Python script to retrieve logs for a domain, however I haven't checked whether it hits the Events API instead of the now-deprecated Logs API...
https://github.com/getupcloud/python-mailgunlog
The original answer doesn't work without modifications. Here is the updated code that works:
#!/usr/bin/env python3
# Uses the Mailgun API to save logs to a JSON file
# Set environment variables MAILGUN_API_KEY and MAILGUN_SERVER
# Optionally set MAILGUN_LOG_DAYS to the number of days to retrieve logs for
# Based on https://stackoverflow.com/a/49825979
# See API guide https://documentation.mailgun.com/en/latest/api-intro.html#introduction

import os
import json
import requests
from datetime import datetime, timedelta
from email import utils

DAYS_TO_GET = int(os.environ.get("MAILGUN_LOG_DAYS", 7))
MAILGUN_API_KEY = os.environ.get("MAILGUN_API_KEY")
MAILGUN_SERVER = os.environ.get("MAILGUN_SERVER")

if not MAILGUN_API_KEY or not MAILGUN_SERVER:
    print("Set environment variables MAILGUN_API_KEY and MAILGUN_SERVER")
    exit(1)

ITEMS_PER_PAGE = 300  # API is limited to 300

def get_logs(start_date, next_url=None):
    if next_url:
        print(f"Getting next batch of {ITEMS_PER_PAGE} from {next_url}...")
        response = requests.get(next_url, auth=("api", MAILGUN_API_KEY))
    else:
        url = 'https://api.mailgun.net/v3/{0}/events'.format(MAILGUN_SERVER)
        start_date_formatted = utils.format_datetime(start_date)  # Mailgun wants it in RFC 2822
        print(f"Getting first batch of {ITEMS_PER_PAGE} from {url} since {start_date_formatted}...")
        response = requests.get(
            url,
            auth=("api", MAILGUN_API_KEY),
            params={"begin": start_date_formatted,
                    "ascending": "yes",
                    "pretty": "yes",
                    "limit": ITEMS_PER_PAGE,
                    "event": "accepted"}
        )
    response.raise_for_status()
    return response.json()

start = datetime.now() - timedelta(DAYS_TO_GET)

log_items = []
current_page = get_logs(start)

while current_page.get('items'):
    items = current_page.get('items')
    log_items.extend(items)
    print(f"Retrieved {len(items)} records for a total of {len(log_items)}")
    next_url = current_page.get('paging').get('next', None)
    current_page = get_logs(start, next_url=next_url)

file_out = f"mailgun-logs-{MAILGUN_SERVER}_{start.strftime('%Y-%m-%d')}_to_{datetime.now().strftime('%Y-%m-%d')}.json"
print(f"Writing out {file_out}")
with open(file_out, 'w') as file_out_handle:
    json.dump(log_items, file_out_handle, indent=4)
print("Done.")
You can have a look at MailgunLogger.
It's an open source project that can easily be deployed via Docker to fetch and store Mailgun events in a database. It features a dead simple, although rudimentary, search and allows you to add multiple accounts/domains.
Run via Docker:
docker run -d -p 5050:5050 \
  -e "ML_DB_USER=username" \
  -e "ML_DB_PASSWORD=password" \
  -e "ML_DB_NAME=mailgun_logger" \
  -e "ML_DB_HOST=my_db_host" \
  --name mailgun_logger jackjoe/mailgun_logger
From there on, the interface guides you to configure everything.
In the OP's case, this project can be used in a more headless fashion, where you only use the database instead of the provided UI.
You can use Skyvia for exporting logs from Mailgun for long-term storage. Skyvia is a cloud tool for automatic Mailgun CSV import/export with powerful transformations. You can also export Mailgun ListMembers, Templates, Tags, etc. to CSV automatically on a schedule.

Uploading video to YouTube and adding it to playlist using YouTube Data API v3 in Python

I wrote a script to upload a video to YouTube using the YouTube Data API v3 in Python, with the help of the example given in Example code.
I wrote another script to add the uploaded video to a playlist, using the same YouTube Data API v3, which can be seen here.
After that I wrote a single script to upload a video and add it to a playlist. In it I took care of authentication and scopes, but I am still getting a permission error. Here is my new script:
#!/usr/bin/python

import httplib
import httplib2
import os
import random
import sys
import time

from apiclient.discovery import build
from apiclient.errors import HttpError
from apiclient.http import MediaFileUpload
from oauth2client.file import Storage
from oauth2client.client import flow_from_clientsecrets
from oauth2client.tools import run

# Explicitly tell the underlying HTTP transport library not to retry, since
# we are handling retry logic ourselves.
httplib2.RETRIES = 1

# Maximum number of times to retry before giving up.
MAX_RETRIES = 10

# Always retry when these exceptions are raised.
RETRIABLE_EXCEPTIONS = (httplib2.HttpLib2Error, IOError, httplib.NotConnected,
                        httplib.IncompleteRead, httplib.ImproperConnectionState,
                        httplib.CannotSendRequest, httplib.CannotSendHeader,
                        httplib.ResponseNotReady, httplib.BadStatusLine)

# Always retry when an apiclient.errors.HttpError with one of these status
# codes is raised.
RETRIABLE_STATUS_CODES = [500, 502, 503, 504]

CLIENT_SECRETS_FILE = "client_secrets.json"

# A limited OAuth 2 access scope that allows for uploading files, but not other
# types of account access.
YOUTUBE_UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# Helpful message to display if the CLIENT_SECRETS_FILE is missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0

To make this sample run you will need to populate the client_secrets.json file
found at:
%s
with information from the APIs Console
https://code.google.com/apis/console#access

For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))


def get_authenticated_service():
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, scope=YOUTUBE_UPLOAD_SCOPE,
                                   message=MISSING_CLIENT_SECRETS_MESSAGE)
    storage = Storage("%s-oauth2.json" % sys.argv[0])
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = run(flow, storage)
    return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                 http=credentials.authorize(httplib2.Http()))


def initialize_upload(title, description, keywords, privacyStatus, file):
    youtube = get_authenticated_service()
    tags = None
    if keywords:
        tags = keywords.split(",")
    insert_request = youtube.videos().insert(
        part="snippet,status",
        body=dict(
            snippet=dict(
                title=title,
                description=description,
                tags=tags,
                categoryId='26'
            ),
            status=dict(
                privacyStatus=privacyStatus
            )
        ),
        # chunksize=-1 means that the entire file will be uploaded in a single
        # HTTP request. (If the upload fails, it will still be retried where it
        # left off.) This is usually a best practice, but if you're using Python
        # older than 2.6 or if you're running on App Engine, you should set the
        # chunksize to something like 1024 * 1024 (1 megabyte).
        media_body=MediaFileUpload(file, chunksize=-1, resumable=True)
    )
    vid = resumable_upload(insert_request)

    # Here I added lines to add the video to a playlist
    # add_video_to_playlist(youtube, vid, "PL2JW1S4IMwYubm06iDKfDsmWVB-J8funQ")
    # youtube = get_authenticated_service()
    add_video_request = youtube.playlistItems().insert(
        part="snippet",
        body={
            'snippet': {
                'playlistId': "PL2JW1S4IMwYubm06iDKfDsmWVB-J8funQ",
                'resourceId': {
                    'kind': 'youtube#video',
                    'videoId': vid
                }
                # 'position': 0
            }
        }
    ).execute()


def resumable_upload(insert_request):
    response = None
    error = None
    retry = 0
    vid = None
    while response is None:
        try:
            print "Uploading file..."
            status, response = insert_request.next_chunk()
            if 'id' in response:
                print "'%s' (video id: %s) was successfully uploaded." % (
                    title, response['id'])
                vid = response['id']
            else:
                exit("The upload failed with an unexpected response: %s" % response)
        except HttpError, e:
            if e.resp.status in RETRIABLE_STATUS_CODES:
                error = "A retriable HTTP error %d occurred:\n%s" % (e.resp.status,
                                                                     e.content)
            else:
                raise
        except RETRIABLE_EXCEPTIONS, e:
            error = "A retriable error occurred: %s" % e
        if error is not None:
            print error
            retry += 1
            if retry > MAX_RETRIES:
                exit("No longer attempting to retry.")
            max_sleep = 2 ** retry
            sleep_seconds = random.random() * max_sleep
            print "Sleeping %f seconds and then retrying..." % sleep_seconds
            time.sleep(sleep_seconds)
    return vid


if __name__ == '__main__':
    title = "sample title"
    description = "sample description"
    keywords = "keyword1,keyword2,keyword3"
    privacyStatus = "public"
    file = "myfile.mp4"
    vid = initialize_upload(title, description, keywords, privacyStatus, file)
    print 'video ID is :', vid
I am not able to figure out what is wrong. I am getting a permission error, even though both scripts work fine independently.
Could anyone help me figure out where I am wrong, or how to upload a video and add it to a playlist?
I actually found the answer: the two independent scripts use different scopes.
The scope for uploading is "https://www.googleapis.com/auth/youtube.upload".
The scope for adding to a playlist is "https://www.googleapis.com/auth/youtube".
As the scopes are different, I had to handle authentication separately.
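A possible alternative, which I believe should also work, is to request both scopes in a single OAuth flow so that one credential can both upload videos and manage playlists; the rest of the script stays unchanged. A sketch under that assumption, using the same oauth2client setup as above (delete the old *-oauth2.json token file so the consent screen is shown again with the new scopes):

# Request both scopes up front so a single credential covers uploads and playlists.
YOUTUBE_SCOPES = [
    "https://www.googleapis.com/auth/youtube.upload",
    "https://www.googleapis.com/auth/youtube",
]

def get_authenticated_service():
    # oauth2client accepts an iterable of scopes as well as a single string
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE,
                                   scope=YOUTUBE_SCOPES,
                                   message=MISSING_CLIENT_SECRETS_MESSAGE)
    storage = Storage("%s-oauth2.json" % sys.argv[0])
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = run(flow, storage)
    return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                 http=credentials.authorize(httplib2.Http()))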