Why is the file uploaded to AWS S3 0 B in size?

I am developing a web application with Flask as the backend and Nuxt JS as the frontend. I receive an image file from the frontend and can save it to my Flask directory structure locally. The file is fine and the image displays correctly if I open it. Now I want to upload this image to AWS S3 instead of saving it to disk. I use the boto3 SDK; here is my code:
Here is my save_picture method, which opens the image file and resizes it. It used to save the file as well, but I commented that out to avoid writing to disk, since I want the file only on S3.
from secrets import token_hex

from flask import current_app
from PIL import Image

def save_picture(object_id, form_picture, path):
    if form_picture is None:
        return None
    random_hex = token_hex(8)
    filename = form_picture.filename
    if '.' not in filename:
        return None
    extension = filename.rsplit('.', 1)[1].lower()
    if not allowed_file(extension, form_picture):
        return None
    picture_fn = f'{object_id}_{random_hex}.{extension}'
    picture_path = current_app.config['UPLOAD_FOLDER'] / path / picture_fn
    # resizing image and saving the small version
    output_size = (1280, 720)
    i = Image.open(form_picture)
    i.thumbnail(output_size)
    # i.save(picture_path)
    return picture_fn
image_name = save_picture(object_id=new_object.id, form_picture=file, path=f'{object_type}_images')

s3 = boto3.client(
    's3',
    aws_access_key_id=current_app.config['AWS_ACCESS_KEY'],
    aws_secret_access_key=current_app.config['AWS_SECRET_ACCESS_KEY']
)
print(file)  # this prints <FileStorage: 'Capture.JPG' ('image/jpeg')>, so the file is ok

try:
    s3.upload_fileobj(
        file,
        current_app.config['AWS_BUCKET_NAME'],
        image_name,
        ExtraArgs={
            'ContentType': file.content_type
        }
    )
except Exception as e:
    print(e)
    return make_response({'msg': 'Something went wrong.'}, 500)
I can see the uploaded file in my S3 bucket, but it shows 0 B in size, and if I download it, it says it cannot be viewed.
I have tried different access policies in S3 as well as many tutorials online; nothing seems to help. Switching the client to signature version v3 when creating it breaks the whole system: the file is not uploaded at all, with an access error.
What could be the reason for this upload failure? Is it the AWS config or something else?
Thank you!

Thanks to @jarmod's hint I tried skipping the image processing, and it worked. I now resize the image, save it to disk, open the saved image (not the initial file object), and send that to S3. I then delete the image on disk, as I don't need it.
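For reference, the likely root cause: Image.open(form_picture) plus thumbnail() consumes the incoming FileStorage stream, so by the time upload_fileobj(file, ...) runs, the read position is at end-of-file and S3 receives zero bytes. Calling file.seek(0) before the upload fixes the original upload; a disk-free variant is to resize in memory and upload the resized bytes. A minimal sketch, assuming Pillow and boto3; the helper name and the JPEG fallback are illustrative:

import io

from PIL import Image

def resize_and_upload(s3, bucket, key, form_picture):
    # Reading the image consumes the incoming stream...
    img = Image.open(form_picture)
    img.thumbnail((1280, 720))
    # ...so upload the resized copy from an in-memory buffer instead.
    buf = io.BytesIO()
    img.save(buf, format=img.format or 'JPEG')
    buf.seek(0)  # rewind: upload_fileobj reads from the current position
    s3.upload_fileobj(buf, bucket, key, ExtraArgs={'ContentType': 'image/jpeg'})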

Related

File truncated on upload to GCS

I am uploading a relatively small (<1 MiB) .jsonl file to Google Cloud Storage using the Python API. The function I used is from the GCP documentation:
from google.cloud import storage

def upload_blob(key_path, bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The path to your file to upload
    # source_file_name = "local/path/to/file"
    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"
    storage_client = storage.Client.from_service_account_json(key_path)
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
The issue I am having is that the .jsonl file is getting truncated at 9,500 lines after the upload; in fact, the 9,500th line is incomplete. I am not sure what the issue is, and I don't think there is any size limit that a file this small would hit. Any help is appreciated.

I had a similar problem some time ago. In my case, the upload to the bucket was called inside a Python with clause, right after the line that wrote the contents to source_file_name, so I just needed to move the upload call outside the with block so that the local file was properly flushed and closed before being uploaded.
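A hypothetical reconstruction of that bug, with stand-in names; calling the upload inside the with block sees a half-flushed file, while calling it after the block sees the complete one:

import json

key_path = "service-account.json"        # stand-ins for the arguments above
bucket_name = "your-bucket-name"
source_file_name = "records.jsonl"
destination_blob_name = "records.jsonl"

with open(source_file_name, "w") as f:
    for i in range(10000):
        f.write(json.dumps({"line": i}) + "\n")
    # upload_blob(key_path, bucket_name, source_file_name, destination_blob_name)
    # ^ wrong: the file is still open and buffered data may not be flushed yet,
    #   so the uploaded object comes out truncated (possibly mid-line)

# right: the with block has closed the file, so everything is on disk
upload_blob(key_path, bucket_name, source_file_name, destination_blob_name)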

Django, Store jpg file received as string in http POST

I am receiving an HTTP request from a desktop application containing a screenshot. I cannot speak with the developer or see the source code, so all I have is the HTTP request I am getting.
The file isn't in request.FILES, it is in request.POST.
@csrf_exempt
def create_contract_event_handler(request, contract_id, event_type):
    keyboard_events_count = request.POST.get('keyboard_events_count')
    mouse_events_count = request.POST.get('mouse_events_count')
    screenshot_file = request.POST.get('screenshot_file')
    barr2 = bytes(screenshot_file.encode(encoding='utf8'))
    with open('.test/output.jpeg', 'wb') as f:
        f.write(barr2)
        f.close()
The file is corrupted.
The binary starts like this, I don't know if that helps:
����JFIFHH��C
%# , #&')*)-0-(0%()(��C
(((((((((((((((((((((((((((((((((((((((((((((((((((�� `"��
Also, if I try to open the image with PIL, I get the following error:

from PIL import Image

im = Image.open('./test/output.jpg')
# OSError: cannot identify image file './test/output.jpg'
Finally, I managed to touch the code on the other end: the 'filename' was missing in the header, and for that reason I was getting the file in POST instead of in the FILES dictionary.
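For completeness: once the multipart part carries a filename, Django parses it into request.FILES and the bytes arrive intact, with no utf-8 round-trip to corrupt them. A minimal sketch under that assumption (the field name is assumed):

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def create_contract_event_handler(request, contract_id, event_type):
    screenshot = request.FILES['screenshot_file']  # assumed field name
    with open('./test/output.jpg', 'wb') as f:
        for chunk in screenshot.chunks():  # raw bytes, no string decoding
            f.write(chunk)
    return JsonResponse({'ok': True})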

Django FileResponse - How to speed up file download

I have a setup that lets users download files stored in the DB as BYTEA data. Everything works OK, except that the download speed is very slow: it seems to download in 33 KB chunks, one chunk per second.
Is there a setting I can specify to speed this up?
views.py

from django.http import FileResponse

def getFileResponse(filedata, filename, filesize, contenttype):
    response = FileResponse(filedata, content_type=contenttype)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    response['Content-Length'] = filesize
    return response
return getFileResponse(
    filedata=myfile.filedata,  # Binary data from DB
    filename=myfile.filename + myfile.fileextension,
    filesize=myfile.filesize,
    contenttype=myfile.filetype
)
Previously, I had the binary data returned as an HttpResponse and it downloaded like a normal file, at normal speeds. This worked fine locally, but when I pushed to Heroku, it wouldn't download the file, instead displaying <Memory at XXX> in the downloaded file.
And another side issue: when I include a text file with non-ASCII data (e.g. á), I get an error as well:
UnicodeEncodeError: 'ascii' codec can't encode characters...: ordinal not in range(128)
How can I handle files with Unicode data?
Update
Does anyone know why the download speed gets so slow when changing from HttpResponse to FileResponse? Or, alternatively, why the HttpResponse that returns a file doesn't work on Heroku?
Update - Google Drive
I reworked my application and hooked it up to a Google Drive back-end for serving files. It employs BytesIO() as suggested by Eric below:
def download_file(self, fileid, mimetype=None):
    # Get binary file data
    request = self.get_file(fileid=fileid, mediaflag=True)
    stream = io.BytesIO()
    downloader = MediaIoBaseDownload(stream, request)
    done = False
    # Retry if we received HTTPError
    for retry in range(0, 5):
        try:
            while done is False:
                status, done = downloader.next_chunk()
                print("Download %d%%." % int(status.progress() * 100))
            return stream.getvalue()
        except HTTPError as error:
            return 'API error: {}. Try # {} failed.'.format(error.response, retry)
I think the difference you observe between HttpResponse and FileResponse is caused by the spec: https://www.python.org/dev/peps/pep-3333/#buffering-and-streaming
In your previous code, an HttpResponse was created with one huge byte string containing your whole file, and the first iteration pass returned the complete response body. With a FileResponse, the file is iterated in chunks (of 4 KB, 8 KB, or another size depending on your WSGI app server), which (I think) are streamed immediately upstream (to the reverse proxy, then the client), and this may add overhead (more communication across process boundaries?).
It would help to know the app server used (uwsgi, gunicorn, waitress, other) and its relevant config. More details about the Heroku error would also help, in case that can be solved!
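If the chunk size is the bottleneck, Django exposes it as the FileResponse.block_size attribute (4 KB by default). A minimal sketch of raising it, wrapping the DB bytes in a file-like object; the 64 KB figure is an arbitrary value to experiment with, not a recommendation:

import io

from django.http import FileResponse

class ChunkyFileResponse(FileResponse):
    block_size = 64 * 1024  # default is 4096; fewer, larger reads per chunk

def getFileResponse(filedata, filename, filesize, contenttype):
    response = ChunkyFileResponse(io.BytesIO(filedata), content_type=contenttype)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    response['Content-Length'] = filesize
    return response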
Why do you store the whole file in the database? It is usually better to store the file on disk and keep only its path in the database, then let your web server serve the file; web servers serve files much better than Django does.
If the files need no access control, store them under media and serve them directly.
If they do need access control, you can still delegate the transfer to the web server with special response headers: with Nginx you must use X-Accel-Redirect, and other web servers have their own equivalents. There is a tutorial at https://wellfire.co/learn/nginx-django-x-accel-redirects/
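A minimal sketch of that pattern, with hypothetical paths: Django performs the access check, then hands the actual transfer to Nginx via the X-Accel-Redirect header.

from django.http import HttpResponse, HttpResponseForbidden

def protected_download(request, filename):
    if not request.user.is_authenticated:
        return HttpResponseForbidden()
    response = HttpResponse()
    # Nginx picks this up and serves the file from a location marked
    # `internal`, e.g.:  location /protected/ { internal; alias /srv/files/; }
    response['X-Accel-Redirect'] = '/protected/%s' % filename
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    return response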

Generate thumbnail for inmemory uploaded video file

The client app uploads a video file, and I need to generate a thumbnail, dump it to AWS S3, and return the link to the thumbnail to the client.
I searched around and found that ffmpeg fits the purpose.
The following is the code I could come up with:
import os
import tempfile
import traceback

from ffmpy import FFmpeg

def generate_thumbnails(file_name):
    output_file = tempfile.NamedTemporaryFile(suffix='.jpg', delete=False, prefix=file_name)
    output_file_path = output_file.name
    try:
        # generate the thumbnail using the first frame of the video
        ff = FFmpeg(inputs={file_name: None},
                    outputs={output_file_path: ['-ss', '00:00:1', '-vframes', '1']})
        ff.run()
        # upload generated thumbnail to s3 logic
        # return uploaded s3 path
    except Exception:
        error = traceback.format_exc()
        write_error_log(error)  # project-specific error logger
    finally:
        os.remove(output_file_path)
    return ''
I was using Django and was greeted with a permission error for the above.
I found out later that ffmpeg requires the file to be on disk and doesn't take the in-memory uploaded file into account (I may be wrong, as I assumed this).
Is there a way for ffmpeg to read an in-memory video file like a normal one, or should I use StringIO and dump it onto a temp file?
I would prefer not to do the latter, as it adds overhead.
Any alternative solution with a better benchmark would also be appreciated.
Thanks.
Update:
To save the in-memory uploaded file to disk: How to copy InMemoryUploadedFile object to disk
One of the ways I got it to work was as follows (a disk-free alternative is sketched after these steps):
Steps:
a) read the in-memory uploaded file onto a temp file, chunk by chunk

import tempfile

temp_file = tempfile.NamedTemporaryFile(suffix='.mp4', delete=False)
temp_file_path = temp_file.name
with open(temp_file_path, 'wb+') as destination:
    for chunk in in_memory_file_content.chunks():
        destination.write(chunk)
b) generate the thumbnail using ffmpeg and subprocess

import subprocess

ffmpeg_command = 'ffmpeg -y -i {} -ss 00:00:01 -vframes 1 {}'.format(video_file_path, thumbnail_file_path)
subprocess.call(ffmpeg_command, shell=True)

where
-y overwrites the destination if it already exists,
-ss 00:00:01 seeks to one second in, and -vframes 1 grabs a single frame.
More info on ffmpeg: https://ffmpeg.org/ffmpeg.html
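As a sketch of the disk-free alternative mentioned above: ffmpeg can read its input from stdin for container formats that can be demuxed without seeking (an .mp4 whose moov atom sits at the end of the file may still fail, so the temp-file route above is the safer default). The helper name is illustrative:

import subprocess

def thumbnail_from_memory(uploaded_file, thumbnail_path):
    # 'pipe:0' tells ffmpeg to read the input from stdin
    command = ['ffmpeg', '-y', '-i', 'pipe:0',
               '-ss', '00:00:01', '-vframes', '1', thumbnail_path]
    subprocess.run(command, input=uploaded_file.read(), check=True)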

Django Tweepy can't access Amazon S3 file

I'm using Tweepy (a Python library for the Twitter API), django-storages, and boto. I have a custom manage.py command that works correctly locally: it gets an image from the file system and tweets that image. If I change the storage to Amazon S3, however, I can't access the file. It gives me this error:
raise TweepError('Unable to access file: %s' % e.strerror)
I tried making the images in the bucket "public", but that didn't work. This is the code (it works without S3):
filename = model_object.image.file.url
media_ids = api.media_upload(filename=filename) # ERROR
params = {'status': tweet_text, 'media_ids': [media_ids.media_id_string]}
api.update_status(**params)
This line:

model_object.image.file.url

gives me the complete URL of the image I want to tweet, something like this:

https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg?Signature=xxx&Expires=1467645897&AWSAccessKeyId=yyy

I also tried constructing the URL manually, since it is a public image stored in my bucket, like this:

filename = "https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg"

But it doesn't work.
Why do I get the 'Unable to access file' error?
The source code from tweepy looks like this:
def media_upload(self, filename, *args, **kwargs):
    """ :reference: https://dev.twitter.com/rest/reference/post/media/upload
        :allowed_param:
    """
    f = kwargs.pop('file', None)
    headers, post_data = API._pack_image(filename, 3072, form_field='media', f=f)  # ERROR
    kwargs.update({'headers': headers, 'post_data': post_data})

def _pack_image(filename, max_size, form_field="image", f=None):
    """Pack image from file into multipart-formdata post body"""
    # image must be less than 700kb in size
    if f is None:
        try:
            if os.path.getsize(filename) > (max_size * 1024):
                raise TweepError('File is too big, must be less than %skb.' % max_size)
        except os.error as e:
            raise TweepError('Unable to access file: %s' % e.strerror)
Looks like Tweepy can't get the image from the Amazon S3 bucket, but how can I make it work? Any advice will help.
The issue occurs when tweepy attempts to get the file size in _pack_image:
if os.path.getsize(filename) > (max_size * 1024):
The function os.path.getsize assumes it is given a file path on disk; however, in your case it is given a URL. Naturally, the file is not found on disk and os.error is raised. For example:
# The following raises OSError on my machine
os.path.getsize('https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg')
What you could do is fetch the file content, temporarily save it locally, and then tweet it:

import tempfile

with tempfile.NamedTemporaryFile(delete=True) as f:
    name = model_object.image.file.name
    f.write(model_object.image.read())
    # pass the file object via the 'file' kwarg that media_upload pops (see its source above)
    media_ids = api.media_upload(filename=name, file=f)
    params = dict(status='test media', media_ids=[media_ids.media_id_string])
    api.update_status(**params)
For your convenience, I published a fully working example here: https://github.com/izzysoftware/so38134984