I'm using Tweepy, a Python library for the Twitter API, together with django-storages and boto. I have a custom manage.py command that works correctly locally: it gets an image from the filesystem and tweets that image. If I change the storage backend to Amazon S3, however, I can't access the file, and I get this error:
raise TweepError('Unable to access file: %s' % e.strerror)
I tried making the images in the bucket "public", but that didn't work. This is the code (it works without S3):
filename = model_object.image.file.url
media_ids = api.media_upload(filename=filename) # ERROR
params = {'status': tweet_text, 'media_ids': [media_ids.media_id_string]}
api.update_status(**params)
This line:
model_object.image.file.url
gives me the complete URL of the image I want to tweet, something like this:
https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg?Signature=xxx&Expires=1467645897&AWSAccessKeyId=yyy
I also tried constructing the url manually, since it is a public image stored in my bucket, like this:
filename = "https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg"
But it doesn't work.
Why do I get the "Unable to access file" error?
The source code from tweepy looks like this:
def media_upload(self, filename, *args, **kwargs):
    """ :reference: https://dev.twitter.com/rest/reference/post/media/upload
        :allowed_param:
    """
    f = kwargs.pop('file', None)
    headers, post_data = API._pack_image(filename, 3072, form_field='media', f=f)  # ERROR
    kwargs.update({'headers': headers, 'post_data': post_data})
def _pack_image(filename, max_size, form_field="image", f=None):
    """Pack image from file into multipart-formdata post body"""
    # image must be less than 700kb in size
    if f is None:
        try:
            if os.path.getsize(filename) > (max_size * 1024):
                raise TweepError('File is too big, must be less than %skb.' % max_size)
        except os.error as e:
            raise TweepError('Unable to access file: %s' % e.strerror)
Looks like Tweepy can't get the image from the Amazon S3 bucket, but how can I make it work? Any advice will help.
The issue occurs when tweepy attempts to get file size in _pack_image:
if os.path.getsize(filename) > (max_size * 1024):
The function os.path.getsize assumes it is given a file path on disk; however, in your case it is given a URL. Naturally, the file is not found on disk and os.error is raised. For example:
# The following raises OSError on my machine
os.path.getsize('https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg')
What you could do is fetch the file content, save it locally in a temporary file, and then tweet it:
import tempfile

with tempfile.NamedTemporaryFile(delete=True) as f:
    name = model_object.image.file.name
    f.write(model_object.image.read())
    f.flush()  # make sure everything is written before tweepy reads the file back
    # pass the open file object via the `file` kwarg (see media_upload above),
    # so _pack_image reads from it instead of calling os.path.getsize() on the name
    media_ids = api.media_upload(filename=name, file=f)
    params = dict(status='test media', media_ids=[media_ids.media_id_string])
    api.update_status(**params)
For your convenience, I published a fully working example here: https://github.com/izzysoftware/so38134984
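If you would rather work from the URL directly (as in the manual attempt above), another option is to download the bytes over HTTP into a temporary file first. This is only a rough sketch, assuming the object is publicly readable and the requests library is available; api and tweet_text are the same objects as in your code:

import tempfile

import requests

url = "https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg"

with tempfile.NamedTemporaryFile(suffix=".jpg") as tmp:
    resp = requests.get(url)
    resp.raise_for_status()
    tmp.write(resp.content)
    tmp.flush()
    # tmp.name is a real path on disk, so os.path.getsize() in _pack_image works
    media = api.media_upload(filename=tmp.name)
    api.update_status(status=tweet_text, media_ids=[media.media_id_string])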
Related
I am developing a web application with Flask as the backend and Nuxt JS as the frontend. I receive an image file from the frontend and can save it to my Flask directory structure locally. The file is fine, and the image is displayed correctly if I open it. Now I want to upload this image to AWS S3 instead of saving it to my disk. I use the boto3 SDK; here is my code:
Here is my save_picture method, which opens the image file and resizes it. I had the save call, but commented it out to avoid saving the file to disk, as I want it only on S3.
def save_picture(object_id, form_picture, path):
    if form_picture is None:
        return None
    random_hex = token_hex(8)
    filename = form_picture.filename
    if '.' not in filename:
        return None
    extension = filename.rsplit('.', 1)[1].lower()
    if not allowed_file(extension, form_picture):
        return None
    picture_fn = f'{object_id}_{random_hex}.{extension}'
    picture_path = current_app.config['UPLOAD_FOLDER'] / path / picture_fn
    # resizing image and saving the small version
    output_size = (1280, 720)
    i = Image.open(form_picture)
    i.thumbnail(output_size)
    # i.save(picture_path)
    return picture_fn
image_name = save_picture(object_id=new_object.id, form_picture=file, path=f'{object_type}_images')
s3 = boto3.client(
    's3',
    aws_access_key_id=current_app.config['AWS_ACCESS_KEY'],
    aws_secret_access_key=current_app.config['AWS_SECRET_ACCESS_KEY']
)
print(file) # this prints <FileStorage: 'Capture.JPG' ('image/jpeg')>, so the file is ok
try:
    s3.upload_fileobj(
        file,
        current_app.config['AWS_BUCKET_NAME'],
        image_name,
        ExtraArgs={
            'ContentType': file.content_type
        }
    )
except Exception as e:
    print(e)
    return make_response({'msg': 'Something went wrong.'}, 500)
I can see the uploaded file in my S3 bucket, but it shows 0 B in size, and if I download it, it cannot be viewed.
I have tried different access policies in S3, as well as many tutorials online; nothing seems to help. Changing the version of S3 to v3 when creating the client breaks the whole system, and the file is not uploaded at all due to an access error.
What could be the reason for this upload failure? Is it the AWS configuration or something else?
Thank you!
Thanks to @jarmod, I tried to avoid the image processing and it worked. I am now resizing the image, saving it to disk, opening the saved image (not the initial file), and sending that to S3. I then delete the image on disk, as I don't need it.
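For reference, a minimal sketch of that flow. The helper name resize_and_upload and the /tmp path are placeholders of mine, and boto3 is assumed to pick up credentials the same way as the client in the question:

import os

import boto3
from PIL import Image


def resize_and_upload(form_picture, bucket, key, tmp_path='/tmp/resized.jpg'):
    # Resize with Pillow and write the result to disk first
    img = Image.open(form_picture)
    img.thumbnail((1280, 720))
    img.save(tmp_path)

    # Upload the file that was just saved, not the original (already
    # consumed) upload stream
    s3 = boto3.client('s3')
    with open(tmp_path, 'rb') as f:
        s3.upload_fileobj(f, bucket, key,
                          ExtraArgs={'ContentType': 'image/jpeg'})

    # Remove the local copy once it is on S3
    os.remove(tmp_path)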
I have a large file, around 6 GB, and I am using an AWS Lambda trigger to unzip the file when it's uploaded to an S3 bucket, using Python and Boto3, but I am getting a MemoryError while unzipping the file into a buffer using BytesIO.
# zip file is in output dir
if '-output' in file.key:
    # get base path other than the zip file name
    save_base_path = file.key.split('//')[0]
    # starting unzip process
    zip_obj = s3_resource.Object(bucket_name=source_bucket, key=file.key)
    buffer = BytesIO(zip_obj.get()["Body"].read())
    z = zipfile.ZipFile(buffer)
    print(f'Unzipping....')
    for filename in z.namelist():
        file_info = z.getinfo(filename)
        try:
            response = s3_resource.meta.client.upload_fileobj(
                z.open(filename),
                Bucket=target_bucket,
                Key=f'{save_base_path}/{filename}'
            )
        except Exception as e:
            print(e)
    print('unzipping process completed')
    # deleting zip file after unzip
    s3_resource.Object(source_bucket, file.key).delete()
    my_bucket.delete_object()
    print("iteration completed")
else:
    print('Zip file invalid position')
    s3_resource.Object(source_bucket, file.key).delete
    print(f'{file.key} deleted...')
Issue 1
When I am reading the bytes, it gives me a MemoryError.
I have set the memory to 10240 MB (10 GB) in the General Configuration of the AWS Lambda function.
Issue 2
I want to delete the object from S3. The code runs properly and doesn't give any error, but it is also not deleting the file.
Is there any solution through which I can solve my unzip issue?
It is possible to wrap the reading of the file in a small wrapper class so that the entire zip file does not need to be downloaded from S3. From there it's straightforward enough to upload completed files back to S3 without keeping the entire contents in RAM:
import zipfile

import boto3


# Download a zip file from S3 and upload its unzipped contents back
# to S3
def s3_zip_to_s3(source_bucket, source_key, dest_bucket, dest_prefix):
    s3 = boto3.client('s3')
    # Use the S3Wrapper class to avoid having to transfer the entire
    # file into RAM
    with zipfile.ZipFile(S3Wrapper(s3, source_bucket, source_key)) as zip:
        for name in zip.namelist():
            print(f"Uploading {name}...")
            # Use upload_fileobj to only stream to S3
            s3.upload_fileobj(zip.open(name, 'r'), dest_bucket, dest_prefix + name)
# Create a file like object with a bare-bones implementation
# for reading only. Other than caching the file size, data is read
# from S3 for each call
class S3Wrapper:
    def __init__(self, s3, bucket, key):
        self.s3 = s3
        self.bucket = bucket
        self.key = key
        self.pos = 0
        self.length = s3.head_object(Bucket=bucket, Key=key)['ContentLength']

    def seekable(self):
        return True

    def seek(self, offset, whence=0):
        if whence == 0:
            self.pos = offset
        elif whence == 1:
            self.pos += offset
        else:
            self.pos = self.length + offset

    def tell(self):
        return self.pos

    def read(self, count=None):
        if count is None:
            resp = self.s3.get_object(Bucket=self.bucket, Key=self.key,
                                      Range=f'bytes={self.pos}-')
        else:
            resp = self.s3.get_object(Bucket=self.bucket, Key=self.key,
                                      Range=f'bytes={self.pos}-{self.pos + count - 1}')
        data = resp['Body'].read()
        self.pos += len(data)
        return data
While this works, and in my testing it uses much less memory than the size of the zip file, it's not fast by any means.
I'd probably recommend some solution like a worker on EC2 or ECS to do the work for you.
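For completeness, here is one possible way s3_zip_to_s3 could be invoked from the S3-triggered Lambda handler; the destination bucket name and key prefix are placeholders:

# Hypothetical Lambda entry point wiring for s3_zip_to_s3; the destination
# bucket name and key prefix below are placeholders.
def lambda_handler(event, context):
    record = event['Records'][0]['s3']
    source_bucket = record['bucket']['name']
    source_key = record['object']['key']

    # Stream-unzip the uploaded archive straight into the target bucket
    s3_zip_to_s3(source_bucket, source_key,
                 dest_bucket='my-target-bucket',
                 dest_prefix='unzipped/')

    return {'status': 'done', 'source': source_key}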
I am receiving an HTTP request from a desktop application with a screenshot. I cannot speak with the developer or see the source code, so all I have is the HTTP request I am getting.
The file isn't in request.FILES; it is in request.POST.
@csrf_exempt
def create_contract_event_handler(request, contract_id, event_type):
    keyboard_events_count = request.POST.get('keyboard_events_count')
    mouse_events_count = request.POST.get('mouse_events_count')
    screenshot_file = request.POST.get('screenshot_file')
    barr2 = bytes(screenshot_file.encode(encoding='utf8'))
    with open('.test/output.jpeg', 'wb') as f:
        f.write(barr2)
        f.close()
The resulting file is corrupted.
The binary starts like this; I don't know if that helps:
����JFIFHH��C
%# , #&')*)-0-(0%()(��C
(((((((((((((((((((((((((((((((((((((((((((((((((((�� `"��
Also, if I try to open the image with PIL, I get the following error:
from PIL import Image
im = Image.open('./test/output.jpg')
#OSError: cannot identify image file './test/output.jpg'
Finally, I managed to get my hands on the client code: the 'filename' was missing in the multipart Content-Disposition header, and for that reason I was getting the file in request.POST instead of in the request.FILES dictionary.
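Once the client sends a proper filename and the upload shows up in request.FILES, the handler can be reduced to something like the following sketch; the field name is reused from the code above, and the response handling is a placeholder:

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def create_contract_event_handler(request, contract_id, event_type):
    # With a filename in the multipart headers, Django parses the part
    # as a file and exposes it here instead of in request.POST.
    screenshot = request.FILES.get('screenshot_file')
    if screenshot is None:
        return HttpResponse(status=400)
    with open('./test/output.jpeg', 'wb') as f:
        for chunk in screenshot.chunks():
            f.write(chunk)
    return HttpResponse(status=201)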
I'm doing a simple operation of downloading gzip files from an S3 bucket to a local directory. I'm extracting them into another local directory and then uploading them back to the S3 bucket, into an archive folder path. While doing this operation I want to make sure I am processing the same set of files that I initially downloaded from the S3 bucket, which is (f_name) in the code below. Now, the code below is not uploading them back to S3; that's where I'm stuck. But I am able to download from S3 and extract into the local directory. Can you please help me understand what is wrong with the _uploadFile function?
from boto.s3.connection import S3Connection
from boto.s3.key import *
import os
import os.path
aws_bucket= "event-logs-dev” ## S3 Bucket name
local_download_directory= "/Users/TargetData/Download/test_queue1/“ ## local directory to download the gzip files from S3.
Target_directory_to_extract = "/Users/TargetData/unzip” ##local directory to gunzip the downloaded files.
Target_s3_path_to_upload= "event-logs-dev/data/clean/xact/logs/archive/“ ## S3 bucket path to upload the files.
def decompressAllFilesFromNetfiler(self,aws_bucket,local_download_directory,Target_d irectory_to_extract,Target_s3_path_to_upload):
zipFiles = [f for f in os.listdir(local_download_directory) if re.match(r'.*\.tar\.gz', f)]
for f_name in zipFiles:
if os.path.exists(Target_directory_to_extract+"/"+f_name[:-len('.tar.gz')]) and os.access(Target_directory_to_extract+"/"+f_name[:-len('.tar.gz')], os.R_OK):
print ('File {} already exists!'.format(f_name))
else:
f_name_with_path = os.path.join(local_download_directory, f_name)
os.system('mkdir -p {} && tar vxzf {} -C {}'.format(Target_directory_to_extract, f_name_with_path, Target_directory_to_extract))
print ('Extracted file {}'.format(f_name))
self._uploadFile(aws_bucket,f_name,Target_s3_path_to_upload,Target_directory_to_extract)
def _uploadFile(self, aws_bucket, f_name,Target_s3_path_to_upload,Target_directory_to_extract):
full_key_name = os.path.expanduser(os.path.join(Target_s3_path_to_upload, f_name))
path = os.path.expanduser(os.path.join(Target_directory_to_extract, f_name))
try:
print "Uploaded extracted file to: %s" % (full_key_name)
key = aws_bucket.new_key(full_key_name)
key.set_contents_from_filename(path)
except:
if full_key_name is None:
print "Error uploading”
Currently, the output prints Uploaded extracted file to: event-logs-dev/data/clean/xact/logs/archive/1442235602129200000.tar.gz, but nothing is uploaded to the S3 bucket. Your help is greatly appreciated! Thank you in advance!
It appears that you have cut and pasted parts of your code, and maybe formatting was lost, as your code above will not work as pasted. I've taken the liberty of making it PEP 8 (mostly); however, there is still some missing code to create the S3 objects. Since you import the modules, I presume that you have that section of code and just didn't paste it.
Here is a cleaned-up version of your code, formatted correctly. I also added an Exception handler to your try: block to print out the error you get. You should update the Exception to be more specific to the exceptions thrown for new_key or set_contents_..., but the general Exception will get you started. If nothing more, this is more readable, but you should include your S3 connection code too, and remove anything that is specific to your domain (e.g. keys, trade secrets, etc.).
#!/usr/bin/env python
"""
do some download
some extract
and some upload
"""
from boto.s3.connection import S3Connection
from boto.s3.key import *
import os
import os.path
import re

aws_bucket = 'event-logs-dev'
local_download_directory = '/Users/TargetData/Download/test_queue1/'
Target_directory_to_extract = '/Users/TargetData/unzip'
Target_s3_path_to_upload = 'event-logs-dev/data/clean/xact/logs/archive/'

'''
MUST BE SOME MAGIC HERE TO GET AN S3 CONNECTION ???
aws_bucket IS NOT A BUCKET OBJECT ...
'''


def decompressAllFilesFromNetfiler(self,
                                   aws_bucket,
                                   local_download_directory,
                                   Target_directory_to_extract,
                                   Target_s3_path_to_upload):
    '''
    decompress stuff
    '''
    zipFiles = [f for f in os.listdir(
        local_download_directory) if re.match(r'.*\.tar\.gz', f)]
    for f_name in zipFiles:
        extracted_path = "{}/{}".format(Target_directory_to_extract,
                                        f_name[:-len('.tar.gz')])
        if os.path.exists(extracted_path) and os.access(extracted_path,
                                                        os.R_OK):
            print ('File {} already exists!'.format(f_name))
        else:
            f_name_with_path = os.path.join(local_download_directory, f_name)
            os.system('mkdir -p {} && tar vxzf {} -C {}'.format(
                Target_directory_to_extract,
                f_name_with_path,
                Target_directory_to_extract))
            print ('Extracted file {}'.format(f_name))
            self._uploadFile(aws_bucket,
                             f_name,
                             Target_s3_path_to_upload,
                             Target_directory_to_extract)


def _uploadFile(self,
                aws_bucket,
                f_name,
                Target_s3_path_to_upload,
                Target_directory_to_extract):
    full_key_name = os.path.expanduser(os.path.join(Target_s3_path_to_upload,
                                                    f_name))
    path = os.path.expanduser(os.path.join(Target_directory_to_extract,
                                           f_name))
    try:
        S3CONN = S3Connection()
        BUCKET = S3CONN.get_bucket(aws_bucket)
        key = BUCKET.new_key(full_key_name)
        key.set_contents_from_filename(path)
        print "Uploaded extracted file to: {}".format(full_key_name)
    except Exception as UploadERR:
        if full_key_name is None:
            print 'Error uploading'
        else:
            print "Error : {}".format(UploadERR)
I have written a custom file field, AudioFileField. For this I created a check whether a file really is a valid audio file. To be able to do that, I use the sox command line tool, so I have to create a file on disk first. As sox depends on the file suffix to do that validation, I needed to write my own TemporaryUploadedAudioFile, using the original suffix (instead of .upload):
class TemporaryUploadedAudioFile(TemporaryUploadedFile):
    """
    A file uploaded to a temporary location (i.e. stream-to-disk).
    """
    def __init__(self, name, content_type, size, charset, suffix='.upload'):
        """
        The init method overrides the name creation to allow passing
        an extension, so that sox is able to test the file
        """
        if settings.FILE_UPLOAD_TEMP_DIR:
            file = tempfile.NamedTemporaryFile(suffix=suffix,
                                               dir=settings.FILE_UPLOAD_TEMP_DIR)
        else:
            file = tempfile.NamedTemporaryFile(suffix=suffix)
        super(TemporaryUploadedFile, self).__init__(file, name, content_type, size, charset)
I use that file to do the audio validation in the AudioFileForm to_python method:
def to_python(self, data):
    """
    checks that the file-upload field data contains a valid audio file.
    """
    f = super(AudioFileForm, self).to_python(data)
    if f is None:
        return None

    # get the file suffix, sox needs this to be able to test the file
    suffix = os.path.splitext(data.name)[1]

    # We need to get a temporary file for sox. Even if we already have a temporary
    # file, we have to create a new one ending with the correct suffix
    file = TemporaryUploadedAudioFile(data.name, data.content_type, 0,
                                      data.charset, suffix=suffix)
    with open(file.temporary_file_path(), 'w') as f:
        f.write(data.read())

    # Do the validation of the audiofile.
    filetype = subprocess.Popen([sox, '--i', '-t', '%s' % file.temporary_file_path()],
                                shell=False, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    filetype = filetype.communicate()[0]
    filetype = filetype.replace('\n', '')
    if filetype not in ['wav', 'aiff', 'flac']:
        raise forms.ValidationError('Not a valid audiofile (valid are: aif, flac & wav | 16 or 24 bit | 44.1 or 48 kHz)')
    return data
Now to the strange thing happening: this works like a charm on the development server, but as soon as I switch to apache2/mod_wsgi it stops working. sox returns an error telling me that the file is missing.
I have already checked permissions; the tmp location on the production server is /tmp, and all rights are granted there (777). What else could be happening here?
mod_wsgi is known to have problems with standard output and this kind of subprocess call from Django. There are already a lot of questions answered about this on stackoverflow.com.
A quick Google search should help you!
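One way to rule out environment differences under mod_wsgi is to call the binary by absolute path and capture both pipes explicitly. This is only a rough sketch: /usr/bin/sox is an assumed location (not taken from the question), and the helper name is mine:

import subprocess

# Rough sketch only: the sox path below is an assumption; adjust for your
# system. Using an absolute path and explicit pipes avoids relying on PATH
# or stdout inheritance, which can differ between runserver and mod_wsgi.
SOX_BIN = '/usr/bin/sox'


def detect_audio_type(path):
    proc = subprocess.Popen([SOX_BIN, '--i', '-t', path],
                            shell=False,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # surface sox's own error message instead of swallowing it
        raise ValueError('sox failed: %s' % err.strip())
    return out.strip()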