File Uploads in Django from urllib - django

I have a small Django app where you can upload PDF files.
In the past only human beings used the web application.
In the future a script should be able to upload files.
Up to now we have used ModelBackend for authentication (settings.AUTHENTICATION_BACKENDS).
Goal
A script should be able to authenticate and upload files.
My current strategy
I add a new user remote-system-foo and give it a password.
Somehow log in to the Django web application and then upload PDF files via a script.
I would like to use the requests library for the HTTP client script.
Question
How do I log in to the Django web application?
Is my current strategy the right one, or are there better strategies?

You can use the requests library to log in to any site; you of course need to tailor the POST depending on which parameters your site requires. If things aren't trivial, take a look at the POST data in Chrome's developer tools from when you log in to your site. Here is some code I used to log in to a site; it could easily be extended to do whatever you need it to do.
from bs4 import BeautifulSoup as bs
import requests

session = requests.Session()
data = session.get(page)  # "page" is the URL of the login page
soup = bs(data.text, "lxml")
# Grab the CSRF token from the login form, e.g.:
# token = soup.find("input", {"name": "csrfmiddlewaretoken"})["value"]

# The POST data for authorizing; this may or may not have been a Django
# site, so see what your POST needs
data = {
    'user[login]': 'foo',
    'user[password]': 'foofoo',
}
# Act like a browser, and insert the token here, not with the data!
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/39.0.2171.95 Safari/537.36',
    'X-CSRF-Token': token,
}
session.post('https://www.examplesite.com/users/sign_in', data=data,
             headers=headers)
Now your session is logged in and you should be able to upload your PDF, though I've never tried to upload via requests. Take a look at the relevant requests documentation.
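For what it's worth, a minimal sketch of such an upload with the same session might look like this; the upload URL and the form field name "file" are assumptions you would have to match to your own view:
with open("a.pdf", "rb") as f:
    resp = session.post(
        "https://www.examplesite.com/upload/",  # assumed upload URL
        files={"file": f},                      # assumed form field name
        headers={"X-CSRFToken": token},         # Django's default CSRF header name
    )
print(resp.status_code)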
That being said, this feels like a strange solution. You might consider loading the files as fixtures or via RunSQL, or rather storing their location (e.g. an AWS bucket URL) in the database. But this is new territory for me.
Hope it helps.

We use this library now: https://github.com/hirokiky/django-basicauth
This way we use HTTP basic auth for API views and session/cookie auth for interactive human beings.
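A minimal sketch of how this looks, following the decorator and settings names in the django-basicauth README (double-check against the library's docs):
# settings.py: credentials for the script user (settings key per the README)
BASICAUTH_USERS = {"remote-system-foo": "a-long-random-password"}

# views.py
from basicauth.decorators import basic_auth_required

@basic_auth_required
def upload_pdf(request):
    ...  # handle request.FILES as usual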

Since I found no matching solution, I wrote and published this:
https://pypi.python.org/pypi/tbzuploader/
It is a generic HTTP upload tool.
If the HTTP upload was successful, files get moved to a "done" subdirectory.
The upload is considered successful by tbzuploader if the server replies with HTTP status 201 Created.
Additional features: it handles pairs of files.
For example, you have four files: a.pdf, a.xml, b.pdf, b.xml.
The first upload should take a.pdf and a.xml, and the second upload b.pdf and b.xml; read the docs for --patterns.

Related

Files are downloaded to the PythonAnywhere server as well as the user's laptop/PC. How can I prevent them being written on the PythonAnywhere server?

The problem is that I have hosted this at PythonAnywhere using Django. The video is downloaded to the PythonAnywhere server and to the user/client system too. That's why I used os.remove(path): after the download it removes the file from the server.
Is there any way the files are not written to the PythonAnywhere server, so that I don't have to use os.remove(path)?
How do I restrict writing to the PythonAnywhere server and only download to the user's system?
import os
import re

import requests
import wget
from django.http import FileResponse

def fb_download(request):
    link = request.GET.get('url')
    html = requests.get(link)
    try:
        url = re.search('hd_src:"(.+?)"', html.text)[1]
    except TypeError:  # re.search returned None: no HD source found
        url = re.search('sd_src:"(.+?)"', html.text)[1]
    path = wget.download(url, 'Video.mp4')
    response = FileResponse(open(path, 'rb'), as_attachment=True)
    os.remove(path)
    return response
If I understand correctly, you're trying to get a request from a browser, which contains a URL. You then access the page at that URL and extract a further URL from it, and then you want to present the contents of that second URL -- a video -- to the browser.
The way you are doing that is to download the file to the server, and then to serve that up as a file attachment to the browser.
If you do it that way, then there is no way to avoid writing the file on the server; indeed, the way you are doing it right now might have problems because you are deleting the file before you've returned the response to the browser, so there may (depending on how the file deletion is processed and whether the FileResponse caches the file's contents) be cases where there is no file to send back to the browser.
But an alternative way to do it that might work would be to send a redirect response to the URL -- the one in your variable url -- like this, without downloading it at all:
import re

import requests
from django.shortcuts import redirect

def fb_download(request):
    link = request.GET.get('url')
    html = requests.get(link)
    try:
        url = re.search('hd_src:"(.+?)"', html.text)[1]
    except TypeError:
        url = re.search('sd_src:"(.+?)"', html.text)[1]
    return redirect(url)
By doing that, the download happens on the browser instead of on the server.
I don't understand JavaScript very well, but I think if you download the file to the server, you could then send it on to the user using JS.

Data Crawling From LinkedIn

I'm trying to crawl data from LinkedIn for personal data-crawling practice, but I cannot crawl the data without logging in. So I tried two ways to simulate a login: one is to get the cookies from HttpClient, which tries to simulate a login to obtain the cookies; the other is to just add the cookies directly. Both failed, and I don't know why.
I used the WebMagic framework for the crawling.
Generally, adding the cookies directly should be the easier way, but I don't know whether I added the wrong cookies.
Here's the thing: I want to fetch data from the page https://www.linkedin.com/mynetwork/invite-connect/connections/
and I added all the cookies from that page.
Here are all the cookies:
private Site site = Site.me()
        .setRetryTimes(3)
        .setSleepTime(100)
        .setCharset("utf-8")
        .setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36")
        .addHeader("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
        .addHeader("accept-encoding", "gzip, deflate, br")
        .addHeader("accept-language", "en-US,en;q=0.8")
        .addHeader("connection", "keep-alive")
        .addHeader("referer", "https://www.linkedin.com/")
        .addCookie(".linkedin.com", "lidc", "b=TB91:g=750:u=38:i=1503815541:t=1503895683:s=AQE5xZLW6mVmRdHBY9qNO-YOiyAnKtgk")
        .addCookie(".linkedin.com", "lang", "v=2&lang=en-us")
        .addCookie(".linkedin.com", "_lipt", "CwEAAAFeIo5-jXjgrpSKF4JfxzNbjC6328JPUgtSHQIKtSDyk4Bockuw84uMkCwbKS0TzUOM_w8Al4s9YjFFF-0T43TPtfG_wv-JNVXsPeO8mVxaYwEcTGiyOdyaRZOCIK7qi02EvZUCtjsaTpAos60U4XrFnu1FO-cY1LrzpqDNUmfrqWJPjSoZpOmjeKtTh-nHcdgpruvjf237E78dqMydLLd1A0Uu7Kr7CmNIurXFd9-Z4hwevLRd3SQMEbSRxAwCclgC4tTzEZ5KoFmpI4veKBFGOqF5MCx3hO9iNRdHrJC44hfRx-Bw7p__PYNWF8sc6yYd0deF-C5aJpronFUYp3vXiwt023qm6T9eRqVvtH1BRfLwCZOJmYrGbKzq4plzNKM7DnHKHNV_cjJQtc9aD3JQz8n2GI-cHx2PYubUyIjVWWvntKWC-EUtn4REgL4jmIaWzDUVz3nkEBW7I3Wf6u2TkuAVu9vq_0mW_dTVDCzgASk")
        .addCookie(".linkedin.com", "_ga", "GA1.2.2091383287.1503630105")
        .addCookie(".www.linkedin.com", "li_at", "AQEDAReIjksE2n3-AAABXiKOYVQAAAFeRprlVFYAV8gUt-kMEnL2ktiHZG-AOblSny98srz2r2i18IGs9PqmSRstFVL2ZLdYOcHfPyKnBYLQPJeq5SApwmbQiNtsxO938zQrrcjJZxpOFXa4wCMAuIsN")
        .addCookie(".www.linkedin.com", "JSESSIONID", "ajax:4085733349730512988")
        .addCookie(".linkedin.com", "liap", "true")
        .addCookie(".www.linkedin.com", "sl", "v=1&f68pf")
        .addCookie("www.linkedin.com", "visit", "v=1&M")
        .addCookie(".www.linkedin.com", "bscookie", "v=1&201708250301246c8eaadc-a08f-4e13-8f24-569529ab1ce0AQEk9zZ-nB0gizfSrOSucwXV2Wfc3TBY")
        .addCookie(".linkedin.com", "bcookie", "v=2&d2115cf0-88a6-415a-8a0b-27e56fef9e39");
Did I miss something?
LinkedIn is very difficult to crawl, not just technically: they also sue people who do.
When they detect an IP as a possible bot, they serve it the login page. Most IP addresses they know to be used by bots now get the login page, and new ranges do not last very long.
They're probably just fairly confident you're a bot and are keeping you from logging in.

Login using the Python requests module on an ASPX webpage

I've been trying to log in to this web page, but I fail every time. This is the code I used:
import requests

headers = {'User-Agent': 'Chrome'}
payload = {'_GlobalLoginControl$UserLogin': 'myUser',
           '_GlobalLoginControl$Password': 'myPass'}
s = requests.Session()
r = s.post('https://www.scadalynx.com/GlobalLogin.aspx', headers=headers, data=payload)
r = s.get('https://www.scadalynx.com/Default.aspx')
print(r.url)
The result I get from print(r.url) is this:
https://www.scadalynx.com/GlobalLogin.aspx?Timeout=Y
You can't.
The main problem is that your payload isn't complete. Check Chrome's network tab; there are many more required fields:
ScriptMgr:_GlobalLoginControl$UpdatePanel1|_GlobalLoginControl$LoginBtn
ScriptMgr_HiddenField:;;AjaxControlToolkit, Version=4.1.40412.0, Culture=neutral, PublicKeyToken=28f01b0e84b6d53e:en-US:2d0688b9-5fe7-418f-aeb1-6ecaa4dca45f:475a4ef5:effe2a26:751cdd15:5546a2b:dfad98a5:1d3ed089:497ef277:a43b07eb:3cf12cf1
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE:/wEPDwUKMTQxMjQ3NTE5MA9kFgICAQ8WAh4Ib25zdWJtaXQFkgFpZiAoJGdldCgnX0dsb2JhbExvZ2luQ29udHJvbF9QYXNzd29yZCcpICE9IG51bGwpICRnZXQoJ19HbG9iYWxMb2dpbkNvbnRyb2xfUGFzc3dvcmQnKS52YWx1ZSA9IGVzY2FwZSgkZ2V0KCdfR2xvYmFsTG9naW5Db250cm9sX1Bhc3N3b3JkJykudmFsdWUpOxYEAgEPZBYCZg9kFgICBQ9kFgICAg9kFgICAQ9kFgJmD2QWCgIND2QWAgIJDw9kFgIeB29uY2xpY2sFugFqYXZhc2NyaXB0OmlmKCRnZXQoJ19HbG9iYWxMb2dpbkNvbnRyb2xfX0ZvcmdvdFBhc3N3b3JkRU1haWxUZXh0Qm94JykgIT0gbnVsbCkkZ2V0KCdfR2xvYmFsTG9naW5Db250cm9sX19Gb3Jnb3RQYXNzd29yZEVNYWlsVGV4dEJveCcpLnZhbHVlID0gJGdldCgnX0dsb2JhbExvZ2luQ29udHJvbF9Vc2VyTG9naW4nKS52YWx1ZTtkAg8PZBYCAgUPEGRkFgBkAhEPD2QWAh8BBZUDamF2YXNjcmlwdDppZigkZ2V0KCdfR2xvYmFsTG9naW5Db250cm9sX1VzZXJMb2dpblZhbGlkYXRvcicpICE9IG51bGwpJGdldCgnX0dsb2JhbExvZ2luQ29udHJvbF9Vc2VyTG9naW5WYWxpZGF0b3InKS5lbmFibGVkID0gdHJ1ZTtpZigkZ2V0KCdfR2xvYmFsTG9naW5Db250cm9sX19Vc2VyTG9naW5SZWd1bGFyRXhwcmVzc2lvblZhbGlkYXRvcicpICE9IG51bGwpJGdldCgnX0dsb2JhbExvZ2luQ29udHJvbF9fVXNlckxvZ2luUmVndWxhckV4cHJlc3Npb25WYWxpZGF0b3InKS5lbmFibGVkID0gdHJ1ZTtpZigkZ2V0KCdfR2xvYmFsTG9naW5Db250cm9sX1Bhc3N3b3JkVmFsaWRhdG9yJykgIT0gbnVsbCkkZ2V0KCdfR2xvYmFsTG9naW5Db250cm9sX1Bhc3N3b3JkVmFsaWRhdG9yJykuZW5hYmxlZCA9IHRydWU7ZAITD2QWAgIBDw8WAh4HVmlzaWJsZWhkZAIVD2QWBAIBDw9kFgIfAQUtJGdldCgnX0dsb2JhbExvZ2luQ29udHJvbF9QYXNzd29yZCcpLmZvY3VzKCk7ZAILDw9kFgIfAQVhamF2YXNjcmlwdDokZ2V0KCdfR2xvYmFsTG9naW5Db250cm9sX19Gb3Jnb3RQYXNzd29yZEVNYWlsUmVxdWlyZWRGaWVsZFZhbGlkYXRvcicpLmVuYWJsZWQgPSB0cnVlO2QCAg8PFgIfAmhkFgICAw8WAh4LXyFJdGVtQ291bnRmZBgBBR5fX0NvbnRyb2xzUmVxdWlyZVBvc3RCYWNrS2V5X18WAwUpX0dsb2JhbExvZ2luQ29udHJvbCRSZW1lbWJlckxvZ2luQ2hlY2tCb3gFK19HbG9iYWxMb2dpbkNvbnRyb2wkX0ZvcmdvdFBhc3N3b3JkQ2xvc2VCdG4FEV9FcnJvckltYWdlQnV0dG9uIXu7XOl6z8WoghCWdElD7kNBanI=
__VIEWSTATEGENERATOR:ABDC7715
__SCROLLPOSITIONX:0
__SCROLLPOSITIONY:0
__EVENTVALIDATION:/wEdAA8j+x15hTpBOEjDv1LxVan3AUijrFjxy9PpisoGxfMqnNduSMVw1RChh3aZsdCK82jXRUWkWThaqEhU3Gr5iw98GHoUhEtg6gp73QcFIR1tGEGQHmQGQos+5LR8l78kIyNCGm6wvkKBlG3Z3EngFWzmX3gMRUNTCvY9T8lfFGMsRkvp3s0LtAU9sya5EgaP5MNrqxxx0HTfWwHJy49saUYlPDg6OL5q3VoZ6biOkvIG8l/ujxMESq+8VmX4sGwXcQBJxOm7RbAd1IEojVITrtk4hx8VhfPuqTNrqWHRrUAMgBj1ffXkwiR7kcJxJ3ixy43iLukJszI09WI7xsAFyAKxG82PcA==
_GlobalLoginControl$ScrWidth:1536
_GlobalLoginControl$ScrHeight:864
_GlobalLoginControl$UserLogin:asdsad#asdas.com
_GlobalLoginControl$Password:asdasd
_GlobalLoginControl$PasswordStore:
_GlobalLoginControl$HiddenField1:
_GlobalLoginControl$_HiddenSessionContentID:
_ErrorHiddenField:
__ASYNCPOST:true
_GlobalLoginControl$LoginBtn:Login
You probably can't do this with requests alone; you either have to use Selenium, or GET the page first and scrape the required fields.
Check this topic: How to make HTTP POST on website that uses asp.net?
In our experience the login should be done with PhantomJS/Chrome via Selenium; then you pass the cookies and headers over to requests. Once requests has the required information, you can use it for the further steps.
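If you try the scrape-first route, a rough sketch might look like the following; it assumes the ASP.NET hidden fields can be read from the login page, with the field names taken from the dump above (the AJAX-specific fields such as ScriptMgr and __ASYNCPOST may still be required):
import requests
from bs4 import BeautifulSoup

login_url = 'https://www.scadalynx.com/GlobalLogin.aspx'
s = requests.Session()
soup = BeautifulSoup(s.get(login_url).text, 'lxml')

def hidden(name):
    # Pull an ASP.NET hidden field's value out of the login page
    return soup.find('input', {'name': name})['value']

payload = {
    '__VIEWSTATE': hidden('__VIEWSTATE'),
    '__VIEWSTATEGENERATOR': hidden('__VIEWSTATEGENERATOR'),
    '__EVENTVALIDATION': hidden('__EVENTVALIDATION'),
    '_GlobalLoginControl$UserLogin': 'myUser',
    '_GlobalLoginControl$Password': 'myPass',
    '_GlobalLoginControl$LoginBtn': 'Login',
}
r = s.post(login_url, data=payload)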

Persistence for cookies

I am working with Python mechanize on a login script. I have read that mechanize's Browser() object handles cookies automatically for further requests.
How can I make these cookies persistent, i.e. save them to a file so that I can load them from that file later?
My script currently logs in to the website (using mechanize/HTML forms) with a Browser() object every time it is run.
If you go through the API docs for Mechanize at
http://wwwsearch.sourceforge.net/mechanize/doc.html
there is some information on exactly what you're asking about, in particular the CookieJar and LWPCookieJar material.
From the docs:
There are also some CookieJar subclasses which can store cookies in files and databases. FileCookieJar is the abstract class for CookieJars that can store cookies in disk files. LWPCookieJar saves cookies in a format compatible with the libwww-perl library. This class is convenient if you want to store cookies in a human-readable file:
import mechanize

cj = mechanize.LWPCookieJar()
cj.revert("cookie3.txt")  # load previously saved cookies from disk
opener = mechanize.build_opener(mechanize.HTTPCookieProcessor(cj))
r = opener.open("http://foobar.com/")
cj.save("cookie3.txt")  # write the (possibly updated) cookies back
EDIT: Pseudocode for what was asked for in the comments (a sketch follows below):
Attempt to load your CookieJar from the file.
If successful, set your Browser() cookie jar to the loaded cookie jar and access the page normally.
If unsuccessful, go through the pages until you reach the point where you have all of the cookies.
Save the cookies to the file using LWPCookieJar's save().
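A rough sketch of that pseudocode, assuming mechanize's documented Browser.set_cookiejar() API; the URLs, form field names, and filename are placeholders:
import os
import mechanize

COOKIE_FILE = "cookie3.txt"

cj = mechanize.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)

if os.path.exists(COOKIE_FILE):
    cj.revert(COOKIE_FILE)                   # load the saved cookies
    br.open("http://example.com/members/")   # placeholder protected page
else:
    br.open("http://example.com/login/")     # placeholder login page
    br.select_form(nr=0)
    br["username"] = "foo"
    br["password"] = "bar"
    br.submit()
    cj.save(COOKIE_FILE)                     # persist the cookies for next run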

CSRF handling with Adobe Flash Application using Django backend

I'm building a flash game that uses Django as a backend.
I currently have an api endpoint set up using django-tastypie, which the flash app can call to receive JSON data for populating the application.
I understand that with plain Django views and the templating system, one is able to simply include a csrf_token in a webpage with the aid of the middleware.
My problem now is posting data back to the server without using csrf_exempt; ideally the Flash application can run without inserted param tags, hopefully as a standalone SWF file that works as it is.
How would one get a csrf_token into the flash app so it can post data back to the server without security concerns?
If the csrf_token way is not possible, are there any other ways to post data securely?
I have searched many avenues leading to similar questions, but many are unanswered.
Maybe I'm missing something here as I'm engrossed in my perspective. I hope someone can enlighten me on better ways to do it.
Thanks in advance.
It sounds like you may have two problems:
How do I actually send the CSRF token with my POST requests from Flash?
Django also accepts CSRF tokens via the X-CSRFToken header. See the docs here.
You can append headers to your request like so:
var req:URLRequest = new URLRequest();
req.url = "http://somesite.com";
var header:URLRequestHeader = new URLRequestHeader("X-CSRFToken", "foobar");
req.requestHeaders.push(header);
URLRequests docs are here.
How do I get the CSRF token into my Flash file in the first place?!
(Option A) Because CSRF tokens are generated on a per request basis (e.g., with templating a traditional HTML form, on a GET request) the simplest thing to do is to pass the CSRF token to the Flash file via a templated parameter. Something like: <param name="csrf_token" value="{{ my_csrf_token }}" />
(Option B) It sounds like you don't want to do the parameter thing, so your final option is to build a custom Django view whose sole job is delivering a CSRF token to your Flash file. The flow would be: your Flash file loads, makes a GET request to http://mysite.com/csrf_token/ which simply returns a valid CSRF token, and then you use that token to do your POST. (Note you will need to do a GET request for each POST request.)
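A minimal sketch of such a view, using Django's documented get_token() helper (the URL route and JSON shape here are illustrative):
from django.http import JsonResponse
from django.middleware.csrf import get_token

def csrf_token_view(request):
    # get_token() returns the CSRF token and ensures the CSRF cookie is set
    return JsonResponse({"csrf_token": get_token(request)})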