Django upload doc to S3 and generate PDF preview

I'm currently using Django's regular file upload. After a doc is uploaded, a Celery task converts it to PDF using LibreOffice. Both files are publicly accessible under /media.
Now I want to start using S3 to store my files and make them private. Here are my options:
1. Upload the doc to S3. Then have the Celery task connect to S3, download the file, convert it to PDF, upload the result, and delete the files from local storage.
2. Upload the doc to regular (local) storage, then have the Celery task do the conversion, upload both files to S3, and delete the local copies.
Restriction: the S3 bucket is not public. I'm planning on authenticating the user before downloading, and then redirecting to the bucket with a temporary valid access key.
Do I have any other options? I haven't actually tried this yet, but it seems I'm going to need multiple Django storages if I choose option 2: one for uploading (local) and one for downloading (S3). Is this even possible?
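For illustration, a minimal sketch of what option 2's two storages could look like, assuming django-storages and boto3 are installed (model and field names are hypothetical):

from django.core.files.storage import FileSystemStorage
from django.db import models
from storages.backends.s3boto3 import S3Boto3Storage

local_storage = FileSystemStorage()  # receives the initial upload
s3_storage = S3Boto3Storage()        # private bucket for the converted files

class Document(models.Model):
    source = models.FileField(upload_to='docs/', storage=local_storage)
    pdf = models.FileField(upload_to='pdfs/', storage=s3_storage, blank=True)

With AWS_QUERYSTRING_AUTH left at its default (True), accessing pdf.url produces a temporary signed URL, which lines up with the plan of redirecting authenticated users to the bucket with a short-lived key.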

Related

S3 Static site downloads index.html after uploading files

I have a static site that I deployed to S3, called tidbitstatistics.com.
I wrote a script using boto3 to replace the files with new ones, and since then my site doesn't open - instead it downloads the index.html file.
From what I can tell, I didn't change any settings. The site was working fine before I re-uploaded the files. Since then, I deleted all the files and re-uploaded them manually, but I am still running into the same error.
I thought this might have to do with the file types, but they were the correct text/html file types when I re-uploaded manually, and I am adjusting my script to specify content types by calling put_object instead of upload_file with boto3.
Static site hosting is turned on for that bucket and public read permissions are set. I'm just not sure why S3 all of a sudden won't serve my static site.
I followed the answer here, but I don't see a Content-Disposition property.
Any help would be appreciated - web development is not my strong suit!
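For reference, a minimal sketch of the adjustment described above - setting the content type explicitly via put_object (the bucket name is taken from the question; everything else is an assumption):

import boto3

s3 = boto3.client('s3')

# upload_file without ExtraArgs stores objects as binary/octet-stream,
# which makes browsers download pages instead of rendering them;
# put_object lets you pass ContentType explicitly.
with open('index.html', 'rb') as f:
    s3.put_object(
        Bucket='tidbitstatistics.com',  # assumed bucket name
        Key='index.html',
        Body=f,
        ContentType='text/html',
    )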

Migrating from Client-Django-S3 image/file upload to Client-S3 Presigned URL upload, while maintaining FileField/ImageField?

The current state of our app is as follows: a client makes a POST request to our Django + DRF app server with one or more files, the Django server processes the files, then uploads and saves them to S3. This is done using the amazing django-storages & boto3 libraries. We eventually end up with the reference URL to our S3 files in our database.
A very simplified example looks like this:
# models.py
class TestImageModel(models.Model):
    image = models.ImageField(upload_to='<path_in_bucket>', storage=S3BotoStorage())

# serializers.py
class TestImageSerializer(serializers.ModelSerializer):
    image = serializers.ImageField(write_only=True)
The storages library handles uploading to S3 and calling something like:
TestImageModel.objects.first().image.url
will return the reference to the image URL in S3 (technically, in our case it will be a CloudFront URL, since we use it as a CDN and set the custom_domain on the S3BotoStorage class).
This was the initial approach we took, but we are noticing heavy server memory usage because images are first uploaded to our server, then uploaded again to S3. To scale efficiently, we would like to instead upload directly from the client to S3 using presigned URLs. I found documentation on how to do so here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html
The new strategy I will adopt for image upload:
1. Client requests a presigned URL from the Django API.
2. Client uploads to S3 directly.
3. Django updates the reference to the uploaded image in the database with ImageField.
My question is about 3). How can I tell the Django record that an already-uploaded image in S3 should be the location of the ImageField, without having to re-upload the image through the S3BotoStorage class?
An alternative could be to change the ImageField to a URLField and just store the link to the new image, but this would prevent me from using the features of an ImageField (forms in the Django admin, deleting directly from S3 using the storage class's .delete(), etc.).
How can I update the ImageField to point to an existing file in the same storage, or is there a better way to go about moving to a direct Client-S3 upload strategy with Django and ImageField?
So apparently you can just do something like this:
a = TestImageModel.objects.first()
a.image.name = 'path_to_image'  # the key of the already-uploaded S3 object
a.save()
And everything works perfectly :)
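For completeness, a minimal sketch of step 1, the endpoint that hands out the presigned URL (the bucket name and key layout are assumptions):

import uuid
import boto3
from django.http import JsonResponse

def presigned_upload(request):
    s3 = boto3.client('s3')
    key = f'uploads/{uuid.uuid4().hex}.jpg'  # hypothetical key layout
    post = s3.generate_presigned_post(Bucket='my-bucket', Key=key, ExpiresIn=3600)
    # the client POSTs the file to post['url'] with post['fields'], then
    # reports `key` back so the server can set image.name = key and save
    return JsonResponse({'post': post, 'key': key})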

Migrating Django app to use online media storage

I have a Django app running on a server. Currently user uploads are stored on the server filesystem.
I already know how to set up S3 storage. What is the best way to migrate existing uploads to S3 without breaking the API and having existing uploads still available?
These files are served to the front end in two ways:
Directly through a /media/path/to/upload endpoint:
/media/<path> django.views.static.serve
Through a viewset:
/api/v1/users/<user_pk>/items/<item_pk>/attachments/<pk>/ project.app.views.ItemAttachmentsViewSet
Does the following make sense:
1. Change the storage backend.
2. Go through all model objects.
3. Save each model object to get the files uploaded.
4. Have /media/<path> go to a new view that will serve the files similar to how ItemAttachmentsViewSet does it.
Or is there a better way?
The procedure outlined in the question was what I ended up doing, with the exception of step 4 which turned out to be unnecessary.
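For reference, a sketch of steps 2-3 under the assumption that the old files live under MEDIA_ROOT and the model is called ItemAttachment with a FileField named file (both names are hypothetical):

import os
from django.conf import settings
from django.core.files.base import ContentFile

# with the storage backend already switched to S3, FieldFile.save()
# writes through the new backend, so each file is read from the local
# MEDIA_ROOT and re-saved to S3
for attachment in ItemAttachment.objects.all():
    local_path = os.path.join(settings.MEDIA_ROOT, attachment.file.name)
    with open(local_path, 'rb') as f:
        attachment.file.save(attachment.file.name, ContentFile(f.read()))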

Django google app engine No such file or directory

I have a Django 2.x site with Python 3.6 on Google Cloud; the app is on App Engine flex (my first app :).
My app has an upload page where I ask the user to upload a JSON file (which is never kept on the site); I open it and generate another file from it.
I know that Django keeps an uploaded file in memory or on disk depending on its size, but I was never able to use this functionality. So in my local environment I created a folder called temp_reports: I created the files there, uploaded them into a bucket, and then deleted them from temp_reports.
So I was wondering, as the site is already on Google Cloud, can I create these files directly in the bucket? Or do I still need to generate them on the site and then upload them?
Now if I do it from my site, I keep getting the following error:
Exception Value:
[Errno 2] No such file or directory: '/home/vmagent/app/temp_reports/file_516A3E1B80334372ADB440681BB5F030.xlsx'
I have this in my app.yaml:
handlers:
- url: /temp_reports
static_dir: temp_reports
Is there something I am missing in order to use temp_reports? Or how can I create a file directly in my bucket?
You can certainly use the storage bucket without having to upload the file manually. This can be done with the Google Cloud Storage client library (the preferred method), which allows you to store and retrieve data directly from the storage bucket. Alternatively, you can use the Cloud Storage API for the same functionality, but it requires more effort to set up.
You want to use the upload_from_string method from google.cloud.storage.blob.Blob.
upload_from_string(data, content_type='text/plain', client=None, predefined_acl=None)
So to create a text file directly on the bucket you could do this:
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('mybucket')
blob = bucket.blob('mytextfile.txt')
blob.upload_from_string('Text file contents', content_type='text/plain')
For more information you can refer to the following page:
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html#google.cloud.storage.blob.Blob.upload_from_string
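Applied to the question, a sketch of generating the report in memory and uploading it without writing to temp_reports at all (the bucket and object names are assumptions; the .xlsx would be built with a library such as openpyxl):

import io
from google.cloud import storage

buffer = io.BytesIO()
# ... write the generated .xlsx report into `buffer` here ...

storage_client = storage.Client()
bucket = storage_client.get_bucket('mybucket')  # assumed bucket name
blob = bucket.blob('temp_reports/report.xlsx')  # hypothetical object name
blob.upload_from_string(
    buffer.getvalue(),
    content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
)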

django internal file sharing with privacy

I am trying to write a job board / application system, and I need the ability for clients to upload a CV and then share it with employers, but I can't figure out the best way to do this. The CV needs to be kept private except to those it is shared with, and clients need the ability to update the CV after submitting it to an employer.
Is there a Django app that does this already? Or how would I go about setting up the privacy, file sharing, etc., so that the files can be copied and still be private to just those they are shared with?
1. Use Apache's x-sendfile; for an example see: Having Django serve downloadable files.
Store the files in a private folder. Django authorizes the request and lets Apache serve the file using the x-sendfile header.
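A minimal sketch of such a view, assuming mod_xsendfile is enabled in Apache (the CV model and its sharing relation are hypothetical):

from django.http import HttpResponse, HttpResponseForbidden
from django.shortcuts import get_object_or_404

def serve_cv(request, pk):
    cv = get_object_or_404(CV, pk=pk)  # hypothetical model
    if not cv.shared_with.filter(pk=request.user.pk).exists():  # hypothetical check
        return HttpResponseForbidden()
    response = HttpResponse()
    # Apache intercepts this header and streams the file from disk itself
    response['X-Sendfile'] = cv.file.path
    return response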
2. Use S3 and django-storages. Upload the CV to S3, with the file set as private.
3. Create a view which fetches a given CV from the S3 bucket, producing an "expiring URL", or which fetches the raw data from S3 and passes it through to the user.
The file's privacy is completely controlled this way.
You could also do this by storing the uploaded file outside of your project's static directory (which is assumed to be publicly accessible), and doing step 3 for that.
Or, if you want to make a DBA's head explode, store the CV as a BLOB in the database and use a view in the same way.
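A sketch of the expiring-URL variant of step 3, assuming the FileField is backed by django-storages' S3Boto3Storage (model names are again hypothetical):

from django.http import HttpResponseForbidden, HttpResponseRedirect
from django.shortcuts import get_object_or_404

def download_cv(request, pk):
    cv = get_object_or_404(CV, pk=pk)  # hypothetical model
    if not cv.shared_with.filter(pk=request.user.pk).exists():  # hypothetical check
        return HttpResponseForbidden()
    # with a private bucket and AWS_QUERYSTRING_AUTH enabled (the default),
    # .url returns a signed URL that expires after AWS_QUERYSTRING_EXPIRE
    return HttpResponseRedirect(cv.file.url)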