I have successfully stored a Word document to S3 using Django-storages.
class Document(TitleSlugDescriptionModel, TimeStampedModel):
    document = models.FileField(upload_to=user_directory_path)
Now, in a Celery task, I need to download this file again for further processing in the worker.
Do I need to read the file from its URL and then create a local copy explicitly, or is there some way to create a local copy using django-storages?
You can read the file directly in Django using the field's read method:
document.read()
This returns the file's contents as bytes, which you can write to a local file:
with open(filename, 'wb') as f:
    f.write(document.read())
You can also generate URLs to the file on S3 as well. More info on both in the docs: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
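For example, here is a rough sketch of a Celery task that copies the stored file to a local temporary path for processing. The task name, the suffix and the local handling are only illustrative, assuming django-storages is configured as the storage backend for the field:

import tempfile

from celery import shared_task

from .models import Document


@shared_task
def process_document(document_id):
    doc = Document.objects.get(pk=document_id)
    # Stream the S3 object into a local temporary file in chunks.
    with tempfile.NamedTemporaryFile(suffix='.docx', delete=False) as tmp:
        doc.document.open('rb')
        for chunk in doc.document.chunks():
            tmp.write(chunk)
        doc.document.close()
        local_path = tmp.name
    # local_path can now be handed to whatever does the actual processing.
    return local_path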
I have uploaded files to an S3 bucket with a UUID as the key for each file name.
I have a requirement to keep the file key as the stored UUID, but when downloading I need the downloaded file to have the actual file name, e.g. Foo.png.
Stored file on AWS S3: 0e8221b9-9bf4-49d6-b0c0-d99e86f91f8e.png
The downloaded file name should be: foo.bar
I have tried setting the Content-Disposition metadata, but the downloaded file still contains the UUID in its name.
Perform the changes below and try again.
Update the object's metadata to Content-Disposition = attachment;filename="abc.csv". Note that the file name is case sensitive, and if you are using a CDN it will take some time after you apply the change. After you update the metadata, download the file using the object URL; downloading it directly did not work for me. When I download the file using the object URL, the downloaded file name is abc.csv instead of test.csv.
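If you prefer to set the metadata from code rather than the console, a hedged sketch with boto3 (bucket name and key below are placeholders taken from the question) is to copy the object onto itself while replacing its metadata:

import boto3

s3 = boto3.client('s3')

# Copy the object onto itself, replacing its metadata so that
# Content-Disposition controls the downloaded file name.
s3.copy_object(
    Bucket='my-bucket',
    Key='0e8221b9-9bf4-49d6-b0c0-d99e86f91f8e.png',
    CopySource={'Bucket': 'my-bucket', 'Key': '0e8221b9-9bf4-49d6-b0c0-d99e86f91f8e.png'},
    MetadataDirective='REPLACE',
    ContentDisposition='attachment; filename="Foo.png"',
)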
I have a form in which files can be uploaded, and the uploaded file has to be stored in Azure Storage. I am using create_blob_from_path to upload a file to Azure Storage, and create_blob_from_path expects a file path as one of its parameters. But how can I get a file path in this case, given that the operation has to happen on the fly (the uploaded file cannot be stored in any local storage)? It should get stored directly in Azure.
if request.method=="POST":
pic=request.FILES['pic']
block_blob_service = BlockBlobService(account_name='samplestorage', account_key='5G+riEzTzLmm3MR832NEVjgYxaBKA4yur6Ob+A6s5Qrw==')
container_name ='quickstart'
block_blob_service.create_container(container_name)
block_blob_service.set_container_acl(container_name, public_access=PublicAccess.Container)
block_blob_service.create_blob_from_path(container_name, pic, full_path_to_file)//full_path_to_file=?
The dynamically uploaded file has to be stored in Azure.
If the uploaded file cannot be stored in any local storage, you can read the file content as a stream or as text (a string), then use the create_blob_from_stream or create_blob_from_text method respectively.
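For example, a rough, untested sketch using create_blob_from_stream with the uploaded file object (the account, container and blob names are placeholders from the question):

from azure.storage.blob import BlockBlobService

if request.method == "POST":
    pic = request.FILES['pic']  # Django's UploadedFile is already a file-like object
    block_blob_service = BlockBlobService(account_name='samplestorage', account_key='<account key>')
    container_name = 'quickstart'
    # Pass the uploaded file directly as the stream; no local path is needed.
    block_blob_service.create_blob_from_stream(container_name, pic.name, pic)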
Now, I have realized the uploading process works like this:
1. Django builds the HTTP request object and populates request.FILES via an upload handler.
2. In views.py, the FieldFile instance (the mirror of the FileField) calls storage.save() to upload the file.
So, as you can see, Django always passes the data through memory or a temporary file on disk; if your file is very large, this costs too much time.
The design I have in mind to fix this is a custom upload handler that calls storage.save() on the raw incoming data, as sketched below. The only question is how I can modify the behaviour of FileField.
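A minimal skeleton of such a handler, assuming the documented django.core.files.uploadhandler API (the class name is hypothetical, and the storage call is only a placeholder for whatever streaming upload the backend offers):

from django.core.files.uploadhandler import FileUploadHandler


class DirectToStorageUploadHandler(FileUploadHandler):
    """Handles chunks as they arrive instead of spooling them to disk."""

    def new_file(self, *args, **kwargs):
        super(DirectToStorageUploadHandler, self).new_file(*args, **kwargs)
        self.chunks = []

    def receive_data_chunk(self, raw_data, start):
        # A real implementation would push raw_data to the remote storage
        # (e.g. a multipart/block upload) instead of buffering it here.
        self.chunks.append(raw_data)
        return None  # returning None stops later handlers from re-processing the chunk

    def file_complete(self, file_size):
        # Placeholder: finish the remote upload and return an UploadedFile-like
        # object (or None) so request.FILES still gets an entry.
        return None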
Thanks for any help.
You can use this package, which adds direct uploads to AWS S3 with a progress bar to file input fields:
https://github.com/bradleyg/django-s3direct
You can use one of the following packages
https://github.com/cloudinary/pycloudinary
http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
I have been given access to a server that provides a directory listing of files which I will download and import into HDFS. What I am currently doing is hitting the server with an HTTP GET, downloading the HTML directory listing, and then using jsoup to parse all the links to the files I need to download. Once I have a complete list, I download each file one by one and then import each into HDFS. I don't believe that Flume is able to read and parse HTML to download files. Is there an easier, cleaner way to do what I am describing?
With Flume I would do the following:
1) Have a process fetch your URLs and store the dumped HTML files in a directory.
2) Configure a SpoolDir source pointing to that directory with a custom deserializer:
deserializer (default: LINE) — Specify the deserializer used to parse the file into events. Defaults to parsing each line as an event. The class specified must implement EventDeserializer.Builder.
That deserializer reads the HTML file and parses it with jsoup. The extracted bits are then converted to multiple events in the desired format and sent to the HDFS sink.
That's basically it.
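For reference, a rough sketch of what the agent configuration might look like (the agent name, paths and deserializer class are all placeholders; the HTML deserializer itself would be your own class implementing EventDeserializer.Builder):

agent.sources = htmlDir
agent.channels = memCh
agent.sinks = hdfsSink

# Spooling directory source watching the dumped HTML listings
agent.sources.htmlDir.type = spooldir
agent.sources.htmlDir.spoolDir = /var/flume/html-dumps
agent.sources.htmlDir.deserializer = com.example.flume.HtmlLinkDeserializer$Builder
agent.sources.htmlDir.channels = memCh

agent.channels.memCh.type = memory

# HDFS sink receiving the events produced by the deserializer
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/html-files
agent.sinks.hdfsSink.channel = memCh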
I have an image URL (for example: http://www.myexample.come/testImage.jpg) and I would like to upload this image to Amazon S3 using Django.
I haven't found a way to copy the resource to Amazon S3 directly by passing the file URL.
So, I think that I have to implement these steps in my project:
Download the file locally from the URL http://www.myexample.come/testImage.jpg. I will have a local file testImage.jpg.
Upload the local file to Amazon S3. I will have an S3 URL.
Delete the local file testImage.jpg.
Is this a good way to build this feature?
Is it possible to improve these steps?
I have to use this feature when I receive a REST request, and I have to return the uploaded S3 file URL in the response... Are these steps reasonable in terms of performance?
The easiest way off the top of my head would be to use requests together with io from the Python standard library -- this is a bit of code I used a while back; I just tested it with Python 2.7.9 and it works:
>>> requests_image('http://docs.python-requests.org/en/latest/_static/requests-sidebar.png')
It works with the latest version of requests (2.6.0), but I should point out that it's just a snippet; I was in full control of the image URLs being handed to the function, so there's nothing in the way of error checking (you could use Pillow to open the image and confirm it's really a JPEG, etc.).
import requests
from io import open as iopen
from urlparse import urlsplit

def requests_image(file_url):
    suffix_list = ['jpg', 'gif', 'png', 'tif', 'svg']
    # Take the last path segment of the URL as the local file name.
    file_name = urlsplit(file_url)[2].split('/')[-1]
    file_suffix = file_name.split('.')[-1]
    i = requests.get(file_url)
    if file_suffix in suffix_list and i.status_code == requests.codes.ok:
        with iopen(file_name, 'wb') as file:
            file.write(i.content)
        return file_name
    else:
        return False
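To cover the S3 part without ever keeping a local copy, one option is a sketch along these lines, assuming django-storages is configured as the default storage backend (the function name is just an example):

import requests
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage


def upload_image_from_url(file_url):
    response = requests.get(file_url)
    response.raise_for_status()
    file_name = file_url.rsplit('/', 1)[-1]
    # default_storage writes straight to S3 when django-storages is the backend,
    # so no intermediate file on local disk is needed.
    saved_name = default_storage.save(file_name, ContentFile(response.content))
    return default_storage.url(saved_name)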