Trying to write a script to upload files to a Django project

I have a Django 3.x project where I can upload multiple files and associated form data through the admin pages to a model called Document. However, I need to upload a large number of files, so I wrote a small Python script to automate that process.
I am having one problem with the script: I can't seem to set the name of the file the way it is set when uploading through the admin page.
Here is the script. I had a few problems getting the CSRF token working correctly, so there may be some redundant code for that.
import requests
# Set up the urls to login to the admin pages and access the correct add page
URL1='http://localhost:8000/admin/'
URL2='http://localhost:8000/admin/login/?next=/admin/'
URL3 = 'http://localhost:8090/admin/memorabilia/document/add/'
USER='admin'
PASSWORD='xxxxxxxxxxxxx'
client = requests.session()
# Retrieve the CSRF token first
client.get(URL1) # sets the cookie
csrftoken = client.cookies['csrftoken']
print("csrftoken1=%s" % csrftoken)
login_data = dict(username=USER, password=PASSWORD, csrfmiddlewaretoken=csrftoken)
r = client.post(URL2, data=login_data, headers={"Referer": "foo"})
r = client.get(URL3)
csrftoken = client.cookies['csrftoken']
print("csrftoken2=%s" % csrftoken)
cookies = dict(csrftoken=csrftoken)
headers = {'X-CSRFToken': csrftoken}
file_path = "/media/mark/ea00fd8e-4330-4d76-81d8-8fe7dde2cb95/2017/Memorable/20047/Still Images/Photos/20047_Phillips_Photo_052_002.jpg"
data = {
    "csrfmiddlewaretoken": csrftoken,
    "documentType_id": '1',
    "rotation": '0',
    "TBD": '350',
    "Title": "A test title",
    "Period": "353",
    "Source Folder": '258',
    "Decade": "168",
    "Location": "352",
    "Photo Type": "354",
}
file_data = None
with open(file_path, 'rb') as fr:
    file_data = fr.read()
# storage_file_name is the name of the FileField in the Document model.
#response_1 = requests.post(url=URL3, data=data, files={'storage_file_name': file_data,}, cookies=cookies)
response_2 = client.post(url=URL3, data=data, files={'storage_file_name': file_data, 'name': "20047_Phillips_Photo_052_002.jpg"}, cookies=cookies,)
When I upload using the admin page, the name of the file is "20047_Phillips_Photo_052_002.jpg", as it should be (i.e. storage_file_name.name = 20047_Phillips_Photo_052_002.jpg).
When I run the script using files={'storage_file_name': file_data,} (see response_1 near the bottom of the script), the file uploads correctly except that its name is "storage_file_name" and not "20047_Phillips_Photo_052_002.jpg" (i.e. storage_file_name.name = "storage_file_name").
When I upload using files={'storage_file_name': file_data, 'name': "20047_Phillips_Photo_052_002.jpg"}, the name of the file is still "storage_file_name" (i.e. storage_file_name.name = "storage_file_name").
I looked at the request.FILES object when uploading a file through the admin page, and the _name field of each object is the name of the file being uploaded. The documentation for the Django File object says it has a field called name.
What am I missing to get my script to upload a file the same way the admin page does, i.e. so that the name of the file is not "storage_file_name"?

When I change the last response= line to
response = client.post(url=URL3, data=metadata, files={'storage_file_name': open(file_path, 'rb')}, cookies=cookies, headers=headers)
the file upload works and the file name is correctly displayed.
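That works because requests can only infer a filename when it is given a file object; handed raw bytes, it falls back to using the form-field key ('storage_file_name') as the filename. If you want to set the name explicitly (for example, when you only have bytes), requests also accepts a (filename, fileobj, content_type) tuple per field. A short sketch, with the JPEG content type being an assumption:
files = {
    'storage_file_name': (
        '20047_Phillips_Photo_052_002.jpg',  # filename sent in the multipart header
        open(file_path, 'rb'),               # the file data
        'image/jpeg',                        # content type (assumed here)
    ),
}
response = client.post(url=URL3, data=data, files=files, cookies=cookies, headers=headers)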

Related

Cannot download html (entire web page)

I am trying to download the entire HTML code from
http://www.ivolatility.com/options/AMZN/NASDAQ/
but the output does not include the data in the tables.
This is the code I am using:
url = 'http://www.ivolatility.com/options/AMZN/NASDAQ/'
r = requests.get(url, allow_redirects=True)
open('C:.../Downloads/amzn.html', 'wb').write(r.content)
I think it might be related to registration issues.
Anything I can do?
Thanks
Your request returns a login form, which means you'll have to login in order to access the data.
The login process is relatively easy - all we have to do is submit the form data to the login page (and use a session object to store the cookies).
Then we can use that authenticated session to retrieve the table contents.
The code:
import requests
url = 'http://www.ivolatility.com/options/AMZN/NASDAQ/'
login_url = 'https://www.ivolatility.com/login.j'
usr = 'my username'
pwd = 'my password'
data = {
    'username': usr, 'password': pwd,
    'ref_url': login_url, 'service_name': 'Home Page',
    'step': 1, 'login__is__sent': 1
}
s = requests.session()
s.post(login_url, data)
r = s.get(url)
with open('my file', 'wb') as f:
    f.write(r.content)
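One caveat: the POST usually returns 200 whether or not the credentials were accepted, so it is worth confirming the login actually worked before trusting the saved page. A minimal check; the marker string is only an assumption about what the site's login form contains:
r = s.get(url)
# if the response still contains the login form, the credentials were rejected (assumed marker)
if 'login__is__sent' in r.text:
    raise RuntimeError('Login failed; check username/password')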

How to fix a connection to https NSIDC/NASA website?

I have been working on a Python script to search and download SMAP satellite data from the NSIDC HTTPS website. My code was working until last week, when it started failing with:
urllib2.HTTPError: HTTP Error 404: Not Found
Any help?
The code is an adaptation of an example the NSIDC website proposes that does exactly what I need. The example below:
"""This script, NSIDC_parse_HTML_BatchDL.py, defines an HTML parser to scrape data files from an earthdata HTTPS URL and bulk downloads all files to your working directory.
This code was adapted from https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+Python
Last edited Jan 26, 2017 G. Deemer"""
import urllib2
import os
from cookielib import CookieJar
from HTMLParser import HTMLParser

# Define a custom HTML parser to scrape the contents of the HTML data table
class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.inLink = False
        self.dataList = []
        self.directory = '/'
        self.indexcol = ';'
        self.Counter = 0

    def handle_starttag(self, tag, attrs):
        self.inLink = False
        if tag == 'table':
            self.Counter += 1
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    if self.directory in value or self.indexcol in value:
                        break
                    else:
                        self.inLink = True
                        self.lasttag = tag

    def handle_endtag(self, tag):
        if tag == 'table':
            self.Counter += 1

    def handle_data(self, data):
        if self.Counter == 1:
            if self.lasttag == 'a' and self.inLink and data.strip():
                self.dataList.append(data)

parser = MyHTMLParser()

# Define function for batch downloading
def BatchJob(Files, cookie_jar):
    for dat in Files:
        print "downloading: ", dat
        JobRequest = urllib2.Request(url + dat)
        JobRequest.add_header('cookie', cookie_jar)  # Pass the saved cookie into the additional HTTP request
        JobRedirect_url = urllib2.urlopen(JobRequest).geturl() + '&app_type=401'
        # Request the resource at the modified redirect url
        Request = urllib2.Request(JobRedirect_url)
        Response = urllib2.urlopen(Request)
        f = open(dat, 'wb')
        f.write(Response.read())
        f.close()
        Response.close()
    print "Files downloaded to: ", os.path.dirname(os.path.realpath(__file__))

#===========================================================================
# The following code block is used for HTTPS authentication
#===========================================================================
# The user credentials that will be used to authenticate access to the data
username = "user"
password = "password"
# The FULL url of the directory which contains the files you would like to bulk download
url = "https://n5eil01u.ecs.nsidc.org/SMAP/SPL4SMGP.003/2017.10.14/"  # Example URL
# Create a password manager to deal with the 401 response that is returned from
# Earthdata Login
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, "https://urs.earthdata.nasa.gov", username, password)
# Create a cookie jar for storing cookies. This is used to store and return
# the session cookie given to us by the data server (otherwise it will just
# keep sending us back to Earthdata Login to authenticate). Ideally, we
# should use a file based cookie jar to preserve cookies between runs. This
# will make it much more efficient.
cookie_jar = CookieJar()
# Install all the handlers.
opener = urllib2.build_opener(
    urllib2.HTTPBasicAuthHandler(password_manager),
    #urllib2.HTTPHandler(debuglevel=1),   # Uncomment these two lines to see
    #urllib2.HTTPSHandler(debuglevel=1),  # details of the requests/responses
    urllib2.HTTPCookieProcessor(cookie_jar))
urllib2.install_opener(opener)
# Create and submit the requests. There are a wide range of exceptions that
# can be thrown here, including HTTPError and URLError. These should be
# caught and handled.
#===========================================================================
# Open a request to grab filenames within a directory. Printing is optional.
#===========================================================================
DirRequest = urllib2.Request(url)
DirResponse = urllib2.urlopen(DirRequest)
# Get the redirect url and append 'app_type=401'
# to do basic http auth
DirRedirect_url = DirResponse.geturl()
DirRedirect_url += '&app_type=401'
# Request the resource at the modified redirect url
DirRequest = urllib2.Request(DirRedirect_url)
DirResponse = urllib2.urlopen(DirRequest)
DirBody = DirResponse.read()
# Use the HTML parser defined above to collect the contents of the directory
parser.feed(DirBody)
Files = parser.dataList
# Display the contents of the python list declared in the HTMLParser class
# print Files  # Uncomment to print a list of the files
#=========================================================================
# Call the function to download all files in url
#=========================================================================
BatchJob(Files, cookie_jar)  # Comment out to prevent downloading to your working directory
I could fix the bug by loading the website directly and selecting the files to download, as in the code below.
"""This script, NSIDC_parse_HTML_BatchDL.py, defines an HTML parser to scrape data files from an earthdata HTTPS URL and bulk downloads all files to your working directory.
This code was adapted from https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+Python Last edited Jan 26, 2017 G. Deemer"""
import urllib2
import os
from cookielib import CookieJar

# Define function for batch downloading
def BatchJob(Files, cookie_jar):
    for dat in Files:
        print "downloading: ", dat
        JobRequest = urllib2.Request(url + dat)
        JobRequest.add_header('cookie', cookie_jar)  # Pass the saved cookie into the additional HTTP request
        JobRedirect_url = urllib2.urlopen(JobRequest).geturl() + '&app_type=401'
        # Request the resource at the modified redirect url
        Request = urllib2.Request(JobRedirect_url)
        Response = urllib2.urlopen(Request)
        f = open(dat, 'wb')
        f.write(Response.read())
        f.close()
        Response.close()
    print "Files downloaded to: ", os.path.dirname(os.path.realpath(__file__))

#==========================================================================
# The following code block is used for HTTPS authentication
#==========================================================================
# The user credentials that will be used to authenticate access to the data
username = "user"
password = "password"
# The FULL url of the directory which contains the files you would like to bulk download
url = "https://n5eil01u.ecs.nsidc.org/SMAP/SPL4SMGP.003/2017.10.14/"  # Example URL
# Create a password manager to deal with the 401 response that is returned from
# Earthdata Login
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None,
                              "https://urs.earthdata.nasa.gov",
                              username, password)
# Create a cookie jar for storing cookies. This is used to store and return
# the session cookie given to us by the data server (otherwise it will just
# keep sending us back to Earthdata Login to authenticate). Ideally, we
# should use a file based cookie jar to preserve cookies between runs. This
# will make it much more efficient.
cookie_jar = CookieJar()
# Install all the handlers.
opener = urllib2.build_opener(
    urllib2.HTTPBasicAuthHandler(password_manager),
    #urllib2.HTTPHandler(debuglevel=1),   # Uncomment these two lines to see
    #urllib2.HTTPSHandler(debuglevel=1),  # details of the requests/responses
    urllib2.HTTPCookieProcessor(cookie_jar))
urllib2.install_opener(opener)
# Create and submit the requests. There are a wide range of exceptions that
# can be thrown here, including HTTPError and URLError. These should be
# caught and handled.
#===========================================================================
# Open a request to grab filenames within a directory
#===========================================================================
DirResponse = urllib2.urlopen(url)
htmlPage = DirResponse.read()
# Pull the names of the .h5 files out of the directory listing
listFiles = [x.split(">")[0].replace('"', "")
             for x in htmlPage.split("><a href=") if x.split(">")[0].endswith('.h5"')]
# print listFiles  # Uncomment to print a list of the files
#=========================================================================
# Call the function to download all files in url
#=========================================================================
BatchJob(listFiles, cookie_jar)  # Comment out to prevent downloading to your working directory
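For reference, the same Earthdata wiki page the script credits also shows this flow with the requests library, which handles the cookies and redirects itself; the one subtlety is keeping the Authorization header across the redirect to Earthdata Login. A minimal sketch along those lines (credentials and the single file name are placeholders; the file list would come from parsing the directory page as above):
import requests

class SessionWithHeaderRedirection(requests.Session):
    """Keep credentials across the redirect to Earthdata Login, drop them elsewhere."""
    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super(SessionWithHeaderRedirection, self).__init__()
        self.auth = (username, password)

    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        if 'Authorization' in headers:
            original = requests.utils.urlparse(response.request.url).hostname
            redirect = requests.utils.urlparse(prepared_request.url).hostname
            # Strip the header only when the redirect leaves both the data host and URS
            if original != redirect and self.AUTH_HOST not in (original, redirect):
                del headers['Authorization']

session = SessionWithHeaderRedirection("user", "password")
url = "https://n5eil01u.ecs.nsidc.org/SMAP/SPL4SMGP.003/2017.10.14/"
for name in ["example.h5"]:  # replace with the parsed file list
    r = session.get(url + name)
    r.raise_for_status()
    with open(name, 'wb') as f:
        f.write(r.content)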

Download image data then upload to Google Cloud Storage

I have a Flask web app that is running on Google AppEngine. The app has a form that my user will use to supply image links. I want to download the image data from the link and then upload it to a Google Cloud Storage bucket.
What I have found so far on Google's documentation tells me to use the 'cloudstorage' client library which I have installed and imported as 'gcs'.
found here: https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/read-write-to-cloud-storage
I think I am not handling the image data correctly through requests. I get a 200 code back from the Cloud Storage upload call but there is no object when I look for it in the console. Here is where I try to retrieve the image and then upload it:
img_resp = requests.get(image_link, stream=True)
objectName = '/myBucket/testObject.jpg'
gcs_file = gcs.open(objectName,
                    'w',
                    content_type='image/jpeg')
gcs_file.write(img_resp)
gcs_file.close()
edit:
Here is my updated code to reflect an answer's suggestion:
image_url = urlopen(url)
content_type = image_url.headers['Content-Type']
img_bytes = image_url.read()
image_url.close()
filename = bucketName + objectName
options = {'x-goog-acl': 'public-read',
           'Cache-Control': 'private, max-age=0, no-transform'}
with gcs.open(filename,
              'w',
              content_type=content_type,
              options=options) as f:
    f.write(img_bytes)
However, I am still getting a 201 response on the POST (create file) call and then a 200 on the PUT call but the object never appears in the console.
Try this:
from google.appengine.api import images
import urllib2
image = urllib2.urlopen(image_url)
img_resp = image.read()
image.close()
objectName = '/myBucket/testObject.jpg'
options = {'x-goog-acl': 'public-read',
           'Cache-Control': 'private, max-age=0, no-transform'}
with gcs.open(objectName,
              'w',
              content_type='image/jpeg',
              options=options) as f:
    f.write(img_resp)
And why restrict users to entering a URL? Why not also allow them to upload a local image:
if isinstance(image_or_url, basestring):  # should be a url
    if not image_or_url.startswith('http'):
        image_or_url = ''.join(['http://', image_or_url])
    image = urllib2.urlopen(image_or_url)
    content_type = image.headers['Content-Type']
    img_resp = image.read()
    image.close()
else:  # an uploaded file object
    img_resp = image_or_url.read()
    content_type = image_or_url.content_type
If you are running on the development server, the file will be uploaded into your local datastore. Check it at:
http://localhost:<your admin port number>/datastore?kind=__GsFileInfo__
and
http://localhost:<your admin port number>/datastore?kind=__BlobInfo__
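As a side note, the original requests-based attempt most likely failed because it wrote the Response object itself into the bucket rather than the body bytes. If you would rather keep using requests, a minimal sketch of that fix (reusing img_resp and objectName from the question):
img_resp = requests.get(image_link)
img_resp.raise_for_status()  # fail loudly instead of silently uploading an error page
with gcs.open(objectName, 'w', content_type='image/jpeg') as f:
    f.write(img_resp.content)  # .content holds the raw bytes of the response body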

How to access file after upload in Django?

I'm working on a web app. A user can upload a file in docx format. After uploading the file and choosing which languages to translate it into, the user should be redirected to another page showing prices for the translations. The price depends on the target language and the number of characters in the docx file.
I can't figure out how to handle the uploaded file. I have a function which takes a path to a file and returns its number of characters. After the file is uploaded and the user clicks submit, I want to call this function so I can render a new page with estimated prices.
I've read that I can call temporary_file_path on request.FILES['file'] but it raises
'InMemoryUploadedFile' object has no attribute 'temporary_file_path'
I want to find out how many characters uploaded file contains and send it in a request to another view - /order-estimation.
VIEW:
def create_order(request):
    LanguageLevelFormSet = formset_factory(LanguageLevelForm, extra=5, max_num=5)
    language_level_formset = LanguageLevelFormSet(request.POST or None)
    job_creation_form = JobCreationForm(request.POST or None, request.FILES or None)
    context = {'job_creation_form': job_creation_form,
               'formset': language_level_formset}
    if request.method == 'POST':
        if job_creation_form.is_valid() and language_level_formset.is_valid():
            cleaned_data_job_creation_form = job_creation_form.cleaned_data
            cleaned_data_language_level_formset = language_level_formset.cleaned_data
            for language_level_form in [d for d in cleaned_data_language_level_formset if d]:
                language = language_level_form['language']
                level = language_level_form['level']
                Job.objects.create(
                    customer=request.user,
                    text_to_translate=cleaned_data_job_creation_form['text_to_translate'],
                    file=cleaned_data_job_creation_form['file'],
                    short_description=cleaned_data_job_creation_form['short_description'],
                    notes=cleaned_data_job_creation_form['notes'],
                    language_from=cleaned_data_job_creation_form['language_from'],
                    language_to=language,
                    level=level,
                )
            path = request.FILES['file'].temporary_file_path()
            utilities.docx_get_characters_number(path)  # THIS DOES NOT WORK
            return HttpResponseRedirect('/order-estimation')
        else:
            return render(request, 'auth/jobs/create-job.html', context=context)
    return render(request, 'auth/jobs/create-job.html', context=context)
The InMemoryUploadedFile does not provide temporary_file_path. The content lives 'in memory', as the class name implies.
By default Django uses InMemoryUploadedFile for files up to 2.5 MB; larger files are handled by TemporaryFileUploadHandler, which is what provides the temporary_file_path method in question (see the Django documentation).
So an easy fix is to change your FILE_UPLOAD_HANDLERS setting to always use TemporaryFileUploadHandler:
FILE_UPLOAD_HANDLERS = [
    'django.core.files.uploadhandler.TemporaryFileUploadHandler',
]
Just keep in mind that this is not the most efficient way when you have a site with a lot of concurrent small upload requests.
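Alternatively, you can avoid depending on a file-system path at all: request.FILES['file'] is a file-like object, and a docx reader such as python-docx can consume it directly. A minimal sketch, assuming the counting helper is rewritten to accept a file object (the helper's name and the python-docx dependency are assumptions here):
from docx import Document  # pip install python-docx

def docx_get_characters_number(file_obj):
    # Document() accepts a path or any file-like object
    document = Document(file_obj)
    return sum(len(paragraph.text) for paragraph in document.paragraphs)

# in the view, no temporary_file_path needed:
characters = docx_get_characters_number(request.FILES['file'])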

error serving pdf in django 1.8

In Django 1.8 I have a couple of functions: one reads PDF files and returns them, the other generates a PDF with ReportLab and returns it.
In some cases the file is served correctly, but sometimes the PDF is opened by the browser as if it were HTML and, even more strangely, the PDF source is displayed inside my Django base template.
In that case, reloading the page after the error serves the PDF correctly.
This is the code of a view:
fpdf = open(path, 'rb')
return HttpResponse(FileWrapper(fpdf), content_type='application/pdf')
and this is the code of the other:
pdf = pisa.CreatePDF(StringIO.StringIO(html.encode("UTF-8")), result)
if not pdf.err:
    response = HttpResponse(result.getvalue(), content_type='application / pdf')
    response['Content-Disposition'] = 'attachment; filename=%s.pdf' % doc.name.replace(" ", "_")
    return response
    #return HttpResponse(result.getvalue(), content_type='application/pdf')
Returning the PDF as an attachment was a test I made to see if it would solve the problem, since the desired behavior is to open the file directly.
Unfortunately, the error still occurs even so.
Change this line
response = HttpResponse(result.getvalue(), content_type='application / pdf')
To this line
response = HttpResponse(result.getvalue(), content_type='application/octet-stream')
This will make the file be treated as binary and downloaded instead of opened in the browser.
If you want to view it inside the browser instead, follow Igor Pomaranskiy's advice and remove the space inside your content_type value:
Change this
content_type = 'application / pdf'
to this
content_type = 'application/pdf'
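For the view that reads a PDF from disk, Django 1.8 also provides django.http.FileResponse, which handles streaming and closes the file when the response finishes. A minimal sketch; the view name serve_pdf and the path variable are placeholders:
from django.http import FileResponse

def serve_pdf(request):
    # FileResponse streams the open file and closes it once the response is done
    return FileResponse(open(path, 'rb'), content_type='application/pdf')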