File download with nginx, django

I'm currently writing a small personal-use website served with nginx, utilizing uwsgi, django, and bootstrap. All is going smoothly except for one issue that I cannot figure out. I have download links (buttons) on the site which, when clicked, should initiate a file download. Here is the view that is executed when a button is pressed:
@login_required
def download_file(request):
    '''
    downloads a file given a specified UUID
    '''
    if request.method == 'GET':
        file_uuid = request.GET['file_id']
        file_row = Files.objects.get(uuid=file_uuid)
        file_name = file_row.file_name
        response = HttpResponse()
        response['Content-Disposition'] = 'attachment; filename=%s' % file_name
        response['X-Accel-Redirect'] = '/media/files/%s' % file_name
        return response
    else:
        return redirect('/files')
/media/files is served directly by nginx as an internal location:
location /media/files/ {
    internal;
    alias /mnt/files/;
}
Here is how the view is called, via an onclick event assigned to each button:
$('.download_btn').on('click', function(){
    download_file(this.id);
})

function download_file(uuid){
    $('.file_id').val(uuid);
    $('.get_file').submit();
}
I have a form with a single hidden field. This gets set to the id (uuid) of the button that is pressed.
Pretty simple, right? My issue is that when the download button is pressed, the download is not initiated correctly. The user is not prompted with a save dialog, nor does the file begin downloading automatically (Chrome or Safari). Instead, in the dev tools I can see the file downloading into what seems to be local storage or some memory location in the browser (these are large files; > 1 GB). I see the memory ballooning, and eventually the browser crashes. Any clue what I'm doing wrong here? Based on what I've been reading, this should work without issue.
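For reference, a hardened variant of such a view might look like the sketch below. It is only an illustration built on the Files model and the /media/files/ internal location from the question: it 404s on an unknown UUID, quotes the filename, and sets an explicit Content-Type so the browser treats the response as a plain attachment that nginx then serves.

# Illustrative variant of the download view; the Files model and the
# /media/files/ internal location are taken from the question itself.
from django.contrib.auth.decorators import login_required
from django.http import HttpResponse
from django.shortcuts import get_object_or_404, redirect

@login_required
def download_file(request):
    if request.method != 'GET':
        return redirect('/files')
    file_row = get_object_or_404(Files, uuid=request.GET.get('file_id'))
    response = HttpResponse(content_type='application/octet-stream')
    # Quote the filename so names with spaces or commas survive the header.
    response['Content-Disposition'] = 'attachment; filename="%s"' % file_row.file_name
    # nginx resolves this path against the internal /media/files/ location.
    response['X-Accel-Redirect'] = '/media/files/%s' % file_row.file_name
    return response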

Related

How can I avoid a timeout in Django

I am creating a site that downloads videos from other sites and converts them to GIF when requested.
The problem is that it takes too long to download and convert videos.
This causes a 504 timeout error.
How can I avoid timeout?
Currently, I start the download with a Celery task when a request is received. While it downloads, Django redirects right away:
def post(self, request):
    form = URLform(request.POST)
    ctx = {'form': form}
    ....
    t = downloand_video.delay(data)
    return redirect('gif_to_mp4:home')
This prevents the file from being transferred to the user, because the Celery task cannot return a file or a response to the user. How can I send the file to the user?
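One common pattern is sketched below (it is not code from the question): keep the Celery task id, have the task return the path of the finished GIF, and add a second view that the redirected page can poll; once the task is ready, that view streams the file back. The gif_status view, the session key, and the assumption that the task returns the output path are all illustrative.

# Rough sketch of a "start the task, poll, then download" flow. The gif_status
# view, the session key, and the task returning the output path are assumptions.
from celery.result import AsyncResult
from django.http import FileResponse, HttpResponse
from django.shortcuts import redirect

def post(self, request):
    # ... form handling as in the question ...
    task = downloand_video.delay(data)
    request.session['gif_task_id'] = task.id     # remember which task to poll
    return redirect('gif_to_mp4:home')

def gif_status(request):
    result = AsyncResult(request.session.get('gif_task_id'))
    if not result.ready():
        return HttpResponse(status=202)          # still downloading/converting
    gif_path = result.get()                      # assumes the task returns the path
    return FileResponse(open(gif_path, 'rb'),
                        as_attachment=True,
                        filename='converted.gif')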

Google Cloud Storage images differ for authenticated URL and public URL

I can't seem to understand how it is possible that, for GCS, the authenticated URL shows a different image than the public URL.
I'm uploading the images via a Python/Django script:
import imghdr
from google.cloud import storage  # `client` and `bucket` are module-level objects

def upload_to_cloud(blob_name, file_obj):
    file_type = imghdr.what(file_obj)
    blob_name = str(blob_name) + '.' + file_type  # concatenate string to create 'file_name.format'
    stats = storage.Blob(bucket=bucket, name=blob_name).exists(client)  # check if logo with the same reg.nr exists
    if stats is True:  # if exists then delete before uploading new logo
        storage.Blob(bucket=bucket, name=blob_name).delete()
    blob = bucket.blob(blob_name)
    blob.upload_from_file(file_obj=file_obj, content_type=f'image/{file_type}')
    path = blob.public_url
    return path
class CompanyProfile(SuccessMessageMixin, UpdateView):  # TODO why company logo differs from the one in ads_list?
    model = Company
    form_class = CompanyProfileCreationForm

    def form_valid(self, form):
        """
        Check if user uploaded a new logo. If yes
        then upload the new logo to google cloud
        """
        if 'logo' in self.request.FILES:
            blob_name = self.request.user.company.reg_nr  # get company registration number
            file_obj = self.request.FILES['logo']  # store uploaded file in variable
            form.instance.logo_url = upload_to_cloud(blob_name, file_obj)  # update company.logo_url with path to uploaded file
            company = Company.objects.get(pk=self.request.user.company.pk)
            company.save()
            return super().form_valid(form)
        else:
            return super().form_valid(form)
Any ideas on what I'm doing wrong and how it's even possible? The file that I actually uploaded is the one under the authenticated URL. The file that's under the public URL is a file that I uploaded for a different blob.
EDIT
I'm adding screenshots of the different images because, after some time, the images appear to be the same, as they should be. Some people are confused by this and comment that the images are the same after all.
Public URL (screenshot)
Authenticated URL (screenshot)
Note that a browser caching issue is ruled out, since I sent the public URL to my friend and he also saw that the image is the HTML text, although the image in the authenticated URL (the correct image) was a light bulb. He also noted that the URL preview in FB Messenger showed the light bulb image, but when he actually opened the URL the HTML text image appeared.
This problem persists whenever a file is uploaded with the same blob name. It happens regardless of whether the blob is overwritten by GCS or I first execute the delete function and then create a new file with the same name as the deleted blob.
In general, the same object will be served by storage.googleapis.com and storage.cloud.google.com.
The only exception is if there is some caching (either in your browser, in a proxy, with Cloud CDN, or in GCS). If you read the object via storage.cloud.google.com before uploading a new version, then reading it afterwards via storage.cloud.google.com may serve the old version while storage.googleapis.com returns the new one. Caching can also be location dependent.
If you can't allow an hour of caching, set the object's Cache-Control metadata to no-cache.
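As an illustration, the Cache-Control metadata can be set at upload time inside the upload_to_cloud() function from the question, or patched onto an existing object; this sketch assumes the same bucket/blob variables:

# Sketch: disable caching for the uploaded logo so the public URL updates
# immediately; builds on upload_to_cloud() from the question.
blob = bucket.blob(blob_name)
blob.cache_control = 'no-cache'          # set before uploading...
blob.upload_from_file(file_obj=file_obj, content_type=f'image/{file_type}')

# ...or patch an already-uploaded object:
blob.cache_control = 'no-cache'
blob.patch()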

Django: Redirect after HttpResponse

I am generating a report download in a view and starting the download after processing the POST data. That means the user submits a form and the download starts:
views.py
def export(request):
    if request.method == 'POST' and 'export_treat' in request.POST:
        form1 = TransExport(request.POST, instance=obj)
        if form1.is_valid():
            ...
            ...
            response = HttpResponse(ds.xls, content_type="application/xls")
            response['Content-Disposition'] = 'attachment; filename="Report_Behandlungen.xls"'
            return response
What I need is a page refresh after the download (or a redirect).
How can I achieve this?
I would just do it with simple logic in JavaScript: the user clicks the link /download_file_now/ and comes to /file_downloaded/, where the download starts; after 3 seconds, you just redirect the page via JS:
location.replace('/another_url/');
Detecting whether the download is actually finished is not easy.
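A minimal Django sketch of that flow could look like the following. The URLs (/download_file_now/, /another_url/) come from the answer, while the view names and the inline HTML are illustrative assumptions; the report object ds is the one built in export() from the question.

# Sketch of the "download page + delayed redirect" flow described above.
from django.http import HttpResponse

def file_downloaded(request):
    # Small page that starts the download, then redirects after 3 seconds.
    html = """
    <html><body>
      <p>Your report is downloading...</p>
      <script>
        window.location = '/download_file_now/';      // triggers the attachment
        setTimeout(function () {
            location.replace('/another_url/');         // redirect afterwards
        }, 3000);
      </script>
    </body></html>
    """
    return HttpResponse(html)

def download_file_now(request):
    # Build the report exactly as in export() from the question, then return it.
    response = HttpResponse(ds.xls, content_type="application/xls")
    response['Content-Disposition'] = 'attachment; filename="Report_Behandlungen.xls"'
    return response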

Refresh scrapy response after selenium browser click

I'm trying to scrape a website that uses Ajax to load the different pages.
Although my Selenium browser navigates through all the pages, the Scrapy response is still the same, so it ends up scraping the same response (number-of-pages times).
Proposed solution: I read in some answers that by using
hxs = HtmlXPathSelector(self.driver.page_source)
you can change the page source and then scrape. But it is not working, and after adding this the browser also stopped navigating.
Code:
def parse(self, response):
    self.driver.get(response.url)
    pages = int(response.xpath('//p[@class="pageingP"]/a/text()')[-2].extract())
    for i in range(pages):
        next = self.driver.find_element_by_xpath('//a[text()="Next"]')
        print response.xpath('//div[@id="searchResultDiv"]/h3/text()').extract()[0]
        try:
            next.click()
            time.sleep(3)
            #hxs = HtmlXPathSelector(self.driver.page_source)
            for sel in response.xpath("//tr/td/a"):
                item = WarnerbrosItem()
                item['url'] = response.urljoin(sel.xpath('@href').extract()[0])
                request = scrapy.Request(item['url'], callback=self.parse_job_contents, meta={'item': item}, dont_filter=True)
                yield request
        except:
            break
    self.driver.close()
Please Help.
When using Selenium and Scrapy together, after having Selenium perform the click, I've read the page back for Scrapy using
resp = TextResponse(url=self.driver.current_url, body=self.driver.page_source, encoding='utf-8')
That would go where your HtmlXPathSelector selector line went. All the scrapy code from that point to the end of the routine would then need to refer to resp (page rendered after the click) rather than response (page rendered before the click).
The time.sleep(3) may give you issues, as it doesn't guarantee the page has actually loaded; it's just an unconditional wait. It might be better to use something like
WebDriverWait(self.driver, 30).until(test page has changed)
which waits until the page you are waiting for passes a specific test, such as finding the expected page number or manufacturer's part number.
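Put together, the loop might look roughly like the sketch below. The XPaths, item class, and callback come from the question; the wait condition is only an illustration of the idea (waiting for the page source to change instead of sleeping).

# Sketch: re-read the freshly rendered page into Scrapy after each click.
import scrapy
from scrapy.http import TextResponse
from selenium.webdriver.support.ui import WebDriverWait

def parse(self, response):
    self.driver.get(response.url)
    pages = int(response.xpath('//p[@class="pageingP"]/a/text()')[-2].extract())
    for i in range(pages):
        # page currently shown by selenium, wrapped for scrapy selectors
        resp = TextResponse(url=self.driver.current_url,
                            body=self.driver.page_source,
                            encoding='utf-8')
        for sel in resp.xpath("//tr/td/a"):
            item = WarnerbrosItem()
            item['url'] = resp.urljoin(sel.xpath('@href').extract()[0])
            yield scrapy.Request(item['url'], callback=self.parse_job_contents,
                                 meta={'item': item}, dont_filter=True)
        try:
            next_link = self.driver.find_element_by_xpath('//a[text()="Next"]')
            next_link.click()
            # wait until the page source actually changes instead of sleeping
            WebDriverWait(self.driver, 30).until(
                lambda d: d.page_source != resp.text)
        except Exception:
            break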
I'm not sure what the impact of closing the driver at the end of every pass through parse() is. I've used the following snippet in my spider to close the driver when the spider is closed.
# imports assumed: from scrapy import signals
#                  from scrapy.xlib.pydispatch import dispatcher
def __init__(self, filename=None):
    # wire us up to selenium
    self.driver = webdriver.Firefox()
    dispatcher.connect(self.spider_closed, signals.spider_closed)

def spider_closed(self, spider):
    self.driver.close()
Selenium isn't in any way connected with Scrapy, nor with its response object, and in your code I don't see you changing the response object.
You'll have to work with them independently.

Django, Show certain media dirs to logged in users

I have a Django app where users can download files that were assigned to them. How can I be sure that only the user a file is assigned to may download it? Because the files are in the media dir, anyone can browse there, so is there a way of letting only the relevant user download the file?
I did the same thing a year ago.
I wanted only the relevant users to be able to download their photos:
# urls.py
(r'^data/photos/(?P<path>.*)$','views.data_access'),
My view data_access returns the photo, or a 403 page:
# view data_access(request, path)
# [...code...]
if user_can_download:
    response = HttpResponse(mimetype="image/jpeg")
    response['Content-Disposition'] = 'attachment; filename=%s' % unicode(photo.image.name)
    response['X-Accel-Redirect'] = '/protected/' + unicode(path)
    return response
else:
    return HttpResponseForbidden()
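A rough sketch of how that permission check and view could fit together is below. The Photo model, its owner field, and the lookup by path are assumptions for illustration; /protected/ would be an internal nginx location, just like /media/files/ in the first question.

# Illustrative only: Photo, its "owner" field, and the lookup by path are assumed.
from django.http import HttpResponse, HttpResponseForbidden
from django.shortcuts import get_object_or_404

def data_access(request, path):
    photo = get_object_or_404(Photo, image=path)
    if photo.owner == request.user:              # only the assigned user may download
        response = HttpResponse(content_type="image/jpeg")
        response['Content-Disposition'] = 'attachment; filename=%s' % photo.image.name
        response['X-Accel-Redirect'] = '/protected/' + path
        return response
    return HttpResponseForbidden()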