I have an app built in Django and currently deployed on Google App Engine. Everything works fine except when I try to upload files larger than 32 MB. I get an error page that says "413. That’s an error."
I have been doing some research and have come to realize that I have to use Google App Engine's Blobstore API. I have no idea how to implement that in my Django app.
Currently my code looks something like this:
Model:
class FileUploads(models.Model):
    title = models.CharField(max_length=200)
    file_upload = models.FileField(upload_to="uploaded_files/", blank=True, null=True)
Form:
class UploadFileForm(forms.ModelForm):
    class Meta:
        model = FileUploads
        fields = ["title", "file_upload"]
View:
def upload_file(request):
    if request.method == "POST":
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            form.save()
    form = UploadFileForm()
    return render(request, "my_app/templates/template.html", {"form": form})
Everything works fine. I would just like to know how to implement Google App Engine's Blobstore API on my current code structure to enable large file uploads.
From Google Cloud Official Documentation:
There exist limits that apply specifically to the use of the Blobstore API.
The maximum size of Blobstore data that can be read by the application with one API call is 32 megabytes.
The maximum number of files that can be uploaded in a single form POST is 500.
Your code looks fine, and this is an expected error due to Blobstore API quotas and limits. One way would be to split the file into pieces of at most 32 MB each and make multiple API calls when uploading larger files. Another solution would be to upload directly to Google Cloud Storage.
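As a rough illustration of the first suggestion, here is a minimal sketch that reads a file-like object in pieces no larger than 32 MB. The names `iter_chunks` and `MAX_CHUNK_BYTES` are made up for this example, and how each piece is then sent to the server is left open, since that depends on the upload endpoint you use.

```python
import io

MAX_CHUNK_BYTES = 32 * 1024 * 1024  # the 32 MB limit quoted above


def iter_chunks(file_obj, chunk_size=MAX_CHUNK_BYTES):
    """Yield successive pieces of file_obj, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk


if __name__ == "__main__":
    # Tiny demo with a 10-byte "file" and a 4-byte chunk size.
    demo = io.BytesIO(b"x" * 10)
    print([len(c) for c in iter_chunks(demo, chunk_size=4)])
```

The receiving side would have to reassemble the pieces in order, so each request would also need to carry some identifier and a sequence number.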
Hope this clarifies your question.
Related
I have a Django app where users can upload PDF files. The PDF files will be saved on my cloud provider. After successfully submitting the PDF, I want to send an email to the user with the URL to the PDF on my cloud. I've been trying to do it by overriding form_valid() but at that point, the URL is not yet generated. The URL also isn't hardcoded, so I can't just point to a hard coded URL in form_valid()
Any ideas on how to solve this?
Can you please provide the code within form_valid()? You need to insert your logic after the call to super().form_valid(), e.g.:
def form_valid(self, form):
    ret = super().form_valid(form)
    instance = form.instance
    # get the filename and send an email
    ...
    return ret
I can't seem to understand how it is possible that, for GCS, the authenticated URL shows a different image than the public URL.
I'm uploading the images via a Python Django script:
def upload_to_cloud(blob_name, file_obj):
    file_type = imghdr.what(file_obj)
    blob_name = str(blob_name) + '.' + file_type  # concatenate string to create 'file_name.format'
    stats = storage.Blob(bucket=bucket, name=blob_name).exists(client)  # check if logo with the same reg.nr exists
    if stats is True:  # if exists then delete before uploading new logo
        storage.Blob(bucket=bucket, name=blob_name).delete()
    blob = bucket.blob(blob_name)
    blob.upload_from_file(file_obj=file_obj, content_type=f'image/{file_type}')
    path = blob.public_url
    return path
class CompanyProfile(SuccessMessageMixin, UpdateView):  # TODO why company logo differs from the one in ads_list?
    model = Company
    form_class = CompanyProfileCreationForm

    def form_valid(self, form):
        """
        Check if user uploaded a new logo. If yes
        then upload the new logo to google cloud
        """
        if 'logo' in self.request.FILES:
            blob_name = self.request.user.company.reg_nr  # get company registration number
            file_obj = self.request.FILES['logo']  # store uploaded file in variable
            form.instance.logo_url = upload_to_cloud(blob_name, file_obj)  # update company.logo_url with path to uploaded file
            company = Company.objects.get(pk=self.request.user.company.pk)
            company.save()
            return super().form_valid(form)
        else:
            return super().form_valid(form)
Any ideas on what I'm doing wrong and how this is even possible? The file that I actually uploaded is the one under the authenticated URL. The file under the public URL is one that I uploaded for a different blob.
EDIT
I'm adding screenshots of the different images because, after some time, the images become the same, as they should be. Some people are confused by this and comment that the images are the same after all.
Public URL
Authenticated URL
Note that a browser caching issue is ruled out: I sent the public URL to a friend, and he also saw the HTML text image, although the image under the authenticated URL (the correct image) was a light bulb. He also noted that the URL preview in FB Messenger showed the light bulb image, but when he actually opened the URL the HTML text image appeared.
This problem occurs whenever a file is uploaded with the same blob name. It happens regardless of whether the object is overwritten by GCS or whether I first call the blob delete function and then create a new file with the same name as the deleted blob.
In general the same object will be served by storage.googleapis.com and storage.cloud.google.com.
The only exception is if there is some caching (either in your browser, in a proxy, with Cloud CDN or in GCS). If you read the object via storage.cloud.google.com before uploading a new version, then reading after by storage.cloud.google.com may serve the old version while storage.googleapis.com returns the new one. Caching can also be location dependent.
If you can't allow an hour of caching, set Cache-Control to no-cache.
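A minimal sketch of that last suggestion follows. It assumes `bucket` is a `google.cloud.storage.Bucket` as in the question's code, whose blobs expose a `cache_control` property and an `upload_from_file()` method. The function name `upload_without_caching` is made up for this example.

```python
def upload_without_caching(bucket, blob_name, file_obj, content_type):
    """Upload file_obj and ask caches to revalidate on every request.

    `bucket` is assumed to be a google.cloud.storage.Bucket (or any object
    with a compatible .blob() method).
    """
    blob = bucket.blob(blob_name)
    # Set the Cache-Control metadata before uploading so that it is sent
    # along with the object.
    blob.cache_control = "no-cache"
    blob.upload_from_file(file_obj, content_type=content_type)
    return blob.public_url
```

With this in place, a logo that is re-uploaded under the same blob name should no longer be served from a stale cached copy, at the cost of a revalidation on each request.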
There are a few tiny related questions buried in here, but they really point to one big, hairy best practice question. This is kind of a tough feature to implement because it's supposed to do a couple tricky things at once...
drag-and-drop multi-file uploader (via Javascript)
multi-page form (page one: upload and associate files with an existing document model; page two: update and save file/document objects and meta-data to database)
...and I haven't found a pre-existing code sample or implementation anywhere. (Depending on one's approach, it could sweep off the table or automagically answer all the related/embedded/follow-on questions.) Bottom-line, the purpose of this post is to answer this question: What's the most elegant approach which minimizes the intervening questions/problems?
I'm using this implementation of a drag-and-drop JQuery File Uploader in Django to upload files...
https://github.com/miki725/Django-jQuery-File-Uploader-Integration-demo
The solution I link to above saves files on the filesystem, of course, but in batches per upload session, via creating a directory for each batch of files, and then assigning a UUID to each of those directories. Each uniquely named directory on the filesystem contains files uploaded during that particular upload session. That means any sort of database storage method first has to tease apart and iterate over all the files in the filesystem directory created for each upload session by this solution.
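To make the "tease apart and iterate" step concrete, here is a small sketch using only the standard library. It assumes each batch directory is named with the canonical string form of its UUID and sits directly under a single upload root. `iter_batch_files` is a hypothetical helper, not part of the linked project.

```python
import tempfile
import uuid
from pathlib import Path


def iter_batch_files(upload_root, batch_id):
    """Yield the files stored for one upload session, sorted by name."""
    # Parsing the id as a UUID rejects values such as "../other" before
    # they are joined onto the upload root.
    batch_dir = Path(upload_root) / str(uuid.UUID(str(batch_id)))
    if not batch_dir.is_dir():
        return
    for path in sorted(batch_dir.iterdir()):
        if path.is_file():
            yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        batch = uuid.uuid4()
        (Path(root) / str(batch)).mkdir()
        (Path(root) / str(batch) / "a.pdf").write_bytes(b"demo")
        print([p.name for p in iter_batch_files(root, batch)])
```

Each yielded path could then be opened and handed to whatever code creates the database record for that file.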
Note: the JQuery solution linked to above doesn't use a form (in forms.py) inside the app directory. The form is hardcoded into the template, which is already a bit of a bummer...'cause now I also have to find a nice way to bind each of the above files in each batch to a form.
I think the simplest--albeit perhaps least performant solution--is to create two views, for two forms...to save each file to the database in the view on the first page, and then update the database on the second page. Here's the direction I'm presently rolling in:
IN THE TEMPLATE...
...uploader javascripts in header...
<form action="{% url my_upload_handler %}" method="POST" enctype="multipart/form-data">
    <input type="file" name="files[]" multiple>
</form>
IN VIEWS.PY...
def my_upload_handler_0r_form_part_one(request):
    # POST (in the upload handler; request triggered by an upload action)
    if request.method == 'POST':
        if not ("f" in request.GET.keys()):
            ...validators and exception handling...
            ...response_data, which is a dict...
            uid = request.POST[u"uid"]
            file = request.FILES[u'files[]']
            filename = os.path.join(temp_path, str(uuid.uuid4()) + file.name)
            destination = open(filename, "wb+")
            for chunk in file.chunks():
                destination.write(chunk)
            destination.close()
            response_data = simplejson.dumps([response_data])
            response_type = "application/json"
            # return the data to the JQuery uploader plugin...
            return HttpResponse(response_data, mimetype=response_type)
    # GET (in the same upload handler)
    else:
        return render_to_response('my_first_page_template.html',
                                  { <---NO 'form':form HERE
                                    'uid': uuid.uuid4(),
                                  },
                                  context_instance = RequestContext(request))

def form_part_two(request):
    #here I need to retrieve and update stuff uploaded on first page
    return render_to_response('my_second_page_template.html',
                              {},
                              context_instance = RequestContext(request))
This view for the first page leverages the JQuery uploader, which works great for multi-file uploads per session and does what it's supposed to do. However, as hinted above, the view, as an upload handler, is only the first page in what needs to be a two page form. On page two, the end user would subsequently need to retrieve each uploaded file, attach additional data to the files they just uploaded on page one, and re-save to the database.
I've tried to make this work as a two-part form via various solutions, including form wizards and/or generic class based views...following examples mainly enabling data persistence via the session. These solutions get rather thorny very quickly.
In summary, I need to...
upload multiple files in a uniquely identified batch (via drag and drop)
tease apart and iterate over each batch of uploaded files
bind each file in the batch to a form and associate it with an existing document model
submit / save all of these files at once to the database
retrieve each of those files on the following page/template of a potentially new form
update metadata for each file
resubmit / save all of those files at once to the database
So...you can see how all of the above compounds the complexity of a simple file upload, and increases the complexity of providing the feature, by involving related questions like:
forms.py: how best to bind each file to a form
models.py: how to associate each file with a pre-existing document model
views.py: how to save each file in accordance with the pre-existing document model in Postgres on the first page; update and save each document on the second page
...and, again, I'd like to do all of that without a form wizard, and without class-based views. (CBVs, especially, for this use case elude me a bit.) In other words: I'm looking for advice leading toward the most bulletproof and easy to read/understand solution possible. If it causes multiple hits to the database, that's fine by me. (If saving a file to the database seems anti best practice, please see this other post: Storing file content in DB)
Might I be able to just create a separate view for two forms, and subclass a standard upload form, like so...
In forms.py...
class FileUploadForm(forms.Form):
    files = forms.FileField(widget=forms.ClearableFileInput(attrs={'name':'files[]', 'multiple':'multiple'}))
    #how to iterate over files in list or batch of files here...?
    file = forms.FileField()
    file = forms.FileField()

    def clean_file(self):
        data = self.cleaned_data["file"]
        # read, parse, and create `data_dict` from file...
        # subclass pre-existing UploadModelForm
        form = UploadModelForm(data_dict)
        if form.is_valid():
            self.instance = form.save(commit=False)
        else:
            raise forms.ValidationError
        return data
...and then refactor the earlier upload handler above with something like...
In views.py, substituting the following for present upload handler...
def view_for_form_one(request):
    ...
    # the aforementioned upload handler logic, plus...
    ...
    form = FileUploadForm(request.POST, request.FILES)
    if form.is_valid():
        form.save()
    else:
        # display errors
        pass
    ...

def view_for_form_two(request):
    # update and commit all data here
    ...?
In general, with this type of problem, I like to create a single page with one <form> on it, but with multiple sections which the user progresses through with JavaScript.
Breaking a form into a multi-part, wizard-style form series is much easier with JavaScript, especially if the data it produces is dynamic in nature.
If you absolutely must break it out into multiple pages, I would advise you to set up your app to be able to save the data into the database at the end of each step.
You can do that by making the metadata which the user adds at step 2 a nullable field, or even moving the metadata to a separate model.
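The "save at the end of each step" idea can be sketched without Django at all. The example below uses the standard-library `sqlite3` module in place of Django models, purely to show the shape of the data flow. The table and column names are hypothetical. In a Django project the equivalent would be a model field declared with `null=True, blank=True`.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE uploaded_file (
        id INTEGER PRIMARY KEY,
        batch_id TEXT NOT NULL,
        filename TEXT NOT NULL,
        caption TEXT  -- nullable: only filled in at step 2
    )
    """
)


def save_step_one(batch_id, filenames):
    """Step 1: persist every uploaded file right away, without metadata."""
    with conn:
        conn.executemany(
            "INSERT INTO uploaded_file (batch_id, filename) VALUES (?, ?)",
            [(batch_id, name) for name in filenames],
        )


def save_step_two(batch_id, captions):
    """Step 2: attach metadata to the rows that step 1 already saved."""
    with conn:
        conn.executemany(
            "UPDATE uploaded_file SET caption = ? "
            "WHERE batch_id = ? AND filename = ?",
            [(caption, batch_id, name) for name, caption in captions.items()],
        )


def rows_for(batch_id):
    """Return (filename, caption) pairs for one batch, sorted by filename."""
    return conn.execute(
        "SELECT filename, caption FROM uploaded_file "
        "WHERE batch_id = ? ORDER BY filename",
        (batch_id,),
    ).fetchall()
```

Because step 1 commits on its own, a user who abandons the form after uploading still leaves consistent rows behind, and step 2 only needs the batch identifier to find them.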
This is a follow up question for Django on Google App Engine: cannot upload images
I got part of the upload of images to GAE Blobstore working. Here's what I did:
In models.py I created a model PhotoFeature:
class PhotoFeature(models.Model):
    property = models.ForeignKey(
        Property,
        related_name = "photo_features"
    )
    caption = models.CharField(
        max_length = 100
    )
    blob_key = models.CharField(
        max_length = 100
    )
In admin.py I created an admin entry with an override for the rendering of the change_form to allow for insert of the correct action to the Blobstore upload url:
class PhotoFeatureAdmin(admin.ModelAdmin):
    list_display = ("property", "caption")
    form = PhotoFeatureForm

    def render_change_form(self, request, context, *args, **kwargs):
        from google.appengine.ext import blobstore
        if "add" in kwargs:
            context['blobstore_url'] = blobstore.create_upload_url('/admin/add-photo-feature')
        else:
            context['blobstore_url'] = blobstore.create_upload_url('/admin/update-photo-feature')
        # Unpack args/kwargs so they reach the parent method as intended
        return super(PhotoFeatureAdmin, self).render_change_form(request, context, *args, **kwargs)
As I use standard Django, I want to use Django views to process the result once GAE has updated the Blobstore, instead of BlobstoreUploadHandler. I created the following views (as per the render_change_form method) and updated urls.py:
def add_photo_feature(request):
    ...

def update_photo_feature(request):
    ...
This all works nicely but once I get into the view method I'm a bit lost. How do I get the Blob key from the request object so I can store it with PhotoFeature? I use standard Django, not Django non-rel. I found this related question but it appears not to contain a solution. I also inspected the request object which gets passed into the view but could not find anything relating to the blob key.
EDIT:
The Django request object contains a FILES dictionary which will give me an instance of InMemoryUploadedFile. I presume that somehow I should be able to retrieve the blob key from that...
EDIT 2:
Just to be clear: the uploaded photo appears in the Blobstore; that part works. It's just getting the key back from the Blobstore that's missing here.
EDIT 3:
As per Daniel's suggestion I added storage.py from the djangoappengine project, which contains the suggested upload handler, and added it to my settings.py. This results in the following exception when trying to upload:
'BlobstoreFileUploadHandler' object has no attribute 'content_type_extra'
This is really tricky to fix. The best solution I have found is to use the file upload handler from the djangoappengine project (which is associated with django-nonrel, but does not depend on it). That should handle the required logic to put the blob key into request.FILES, as you'd expect in Django.
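For orientation, here is a sketch of the core extraction step such a handler performs. It rests on an assumption about the rewritten request: that Blobstore replaces each uploaded file part with one whose Content-Type looks like `message/external-body; blob-key="..."`. The helper name `extract_blob_key` is made up, and only the standard library is used.

```python
from email.message import Message


def extract_blob_key(content_type_header):
    """Return the blob-key parameter of a Content-Type header, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_param() looks the parameter up on the Content-Type header and
    # strips the surrounding quotes from its value.
    return msg.get_param("blob-key")


if __name__ == "__main__":
    header = 'message/external-body; blob-key="abc123"; access-type="X-AppEngine-BlobKey"'
    print(extract_blob_key(header))
```

The value in the demo header is a placeholder, not a real key.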
Edit
I'd forgotten that django-nonrel uses a patched version of Django, and one of the patches is here to add the content-type-extra field. You can replicate the functionality by subclassing the upload handler as follows:
import cgi

from djangoappengine import storage


class BlobstoreFileUploadHandler(storage.BlobstoreFileUploadHandler):
    """Handler that adds blob key info to the file object."""

    def new_file(self, field_name, *args, **kwargs):
        # We need to re-process the POST data to get the blobkey info.
        meta = self.request.META
        meta['wsgi.input'].seek(0)
        fields = cgi.FieldStorage(meta['wsgi.input'], environ=meta)
        if field_name in fields:
            current_field = fields[field_name]
            self.content_type_extra = current_field.type_options
        super(BlobstoreFileUploadHandler, self).new_file(field_name,
                                                         *args, **kwargs)
and reference this subclass in your settings.py rather than the original.
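A sketch of what that reference might look like, assuming the subclass lives in a module called `myproject/uploadhandlers.py` (a hypothetical path; use wherever you actually put it). `FILE_UPLOAD_HANDLERS` is the standard Django setting for this.

```python
# settings.py (sketch). "myproject.uploadhandlers" is a hypothetical module path.
FILE_UPLOAD_HANDLERS = (
    # The subclass defined above, tried first for every uploaded file.
    "myproject.uploadhandlers.BlobstoreFileUploadHandler",
    # Django's in-memory handler as a fallback for ordinary uploads.
    "django.core.files.uploadhandler.MemoryFileUploadHandler",
)
```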
I have a view to which I am trying to submit multiple ajax uploads via raw post data (e.g. via an octet-stream). These requests are submitted one after the other so that they process in parallel. The problem is that django thinks that only the last request is valid. For example, if I submit 5 files, the first four give:
Upload a valid image. The file you uploaded was either not an image or a corrupted image.
The last one works fine. I'm guessing this occurs because the requests somehow overlap, so the image isn't completely loaded before the form attempts to validate it?
My upload view:
def upload(request):
    form = UploadImageForm(request.POST, request.FILES)
    print form
    if form.is_valid():
        # ..process image..
        pass
And my upload image form:
class UploadImageForm(forms.Form):
upload = forms.ImageField()
To submit the requests I'm using the html5uploader js pretty much right out of the box.
On a different note, have you tried https://github.com/blueimp/jQuery-File-Upload/? It is a pretty good non-Flash file uploader with a progress bar.