Collect POST data, process it and POST that data to an external URL - django

I'm trying to implement a data processing module.
The scenario is:
First, a user will POST some data.
The user's POSTed data needs to be processed, and some more info needs to be added to it.
This processed POST data should be sent to an external URL without user intervention.
The external URL will accept only POST requests.
Please suggest a way to send this POST data to the external URL.
Update
As suggested, I started using requests.
In the view where I collect the initial POST data, I build another data object from the user's posted data (after processing), add some more data to it, and make the POST request as below:
req = requests.post(post_url, data=post_obj)
The status_code returned is 200.
But the data (post_obj) doesn't seem to be sent to the post_url. The post_url reports that it did not receive the POST data.
When I checked the req object,
req.request.data seems to have the post_obj information and req.request.url has the post_url.
req.url has the redirect_url, which reports that the post_url didn't receive any data.
My question is,
How to actually POST the data?
What is the object that needs to be returned in the view?
If the way I'm POSTing the data (the requests.post method) is wrong, please suggest the appropriate way.
Note: After POSTing the data to the post_url, it will be redirected to a different page.

Use urllib2, mechanize, or requests (all of which ultimately sit on top of Python's built-in httplib; requests gets there via urllib3 rather than urllib2), or pycurl (which uses libcurl), to POST to the external resource.
requests is the easiest to work with; mechanize is great for filling out forms and programming like a browser; urllib2 is the standard library's own module, so it's also important to know; and pycurl is (imo) a last resort, as it is not particularly well maintained.
You should consider using a queue to handle the server -> third party step and then asynchronously report to the user that the task has completed. Otherwise you risk timing out your connections if the third-party app takes too long to respond.
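The queue idea above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `start_forwarder` is a hypothetical helper name, `send` stands in for whatever function performs the real POST to the third party, and a production Django site would more likely hand the job to a dedicated task queue than to an in-process thread.

```python
import queue
import threading


def start_forwarder(send, jobs):
    """Start a daemon thread that forwards queued payloads via `send`.

    `send` is a placeholder for the function that POSTs to the third party.
    """
    def worker():
        while True:
            payload = jobs.get()
            if payload is None:  # sentinel value: stop the worker
                jobs.task_done()
                break
            try:
                send(payload)
            except Exception as exc:  # a real app would log and retry here
                print("forwarding failed:", exc)
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Usage: the view only enqueues the processed data and returns immediately.
sent = []                      # stands in for the external service
jobs = queue.Queue()
worker = start_forwarder(sent.append, jobs)
jobs.put({"name": "alice", "extra": "added by server"})
jobs.put(None)                 # ask the worker to stop
jobs.join()                    # wait until everything queued was handled
```

The view that receives the user's POST stays fast because it only calls `jobs.put(...)`; the slow external request happens on the worker.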

You can use the standard library urllib2 to do the 2nd POST.
I've also heard good things about the requests library, which should be easier to use than urllib2.

Related

File upload to third party API with HTTP 307 Temporary Redirect in Flask

I have a scenario where I have to upload a file from a Flask app to a third-party API. I have wrapped all API requests in Flask to control API usage. For this, I redirect the traffic from the main route to the API wrapper route with HTTP status 307 to preserve the request body, and in the API wrapper I use requests to POST to the third-party API endpoint.
The problem is that only files smaller than 100 KB get sent through the redirected request; a file larger than 100 KB somehow gets terminated in the sending phase.
Is there any limit on payload size with a 307 redirect?
I tried debugging by watching the network timing stack trace; from there it seems the request is dropped in the sending phase.
Main Blueprint
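from flask import redirect, render_template, request, url_for

@main.route('/upload/', methods=['POST', 'GET'])
def upload():
    # POST comes from the AJAX call; 307 keeps the method and body on redirect
    if request.method == 'POST':
        return redirect(url_for('api.file_push'), code=307)
    else:
        return render_template('file-upload.html')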
API Blueprint
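import urllib.parse
import requests
from flask import request

@api.route('/upload/', methods=['POST'])
def file_push():
    upload_file = request.files['file']
    filename = urllib.parse.quote(upload_file.filename)
    toUpload = upload_file.read()
    result = requests.post(apiInterfaces.FILE_UPLOAD_INTERFACE + '/' + filename, files={'file': toUpload})
    # Flask cannot return a requests.Response directly; pass on its body and status
    return result.content, result.status_code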
Yes, I could send the POST request directly to the API endpoint from the main route, but I don't want to; it would break my system design and architecture.

I assume you're using Python, and possibly requests, so this answer is based on what I learned figuring this out (debugging with a colleague). I filed a bug report with psf/requests. There is a related answer here which confirms my suspicions.
It seems that when you initiate a PUT request using requests (and urllib3), the entire request is sent before the response from the server is looked at, but some servers can send an HTTP 307 during this time. One of two things happens:
The server closes the connection after sending the response, even if the client has not finished sending the entire file. In this case, the client might see a closed connection and you won't have a response you can use for the redirect (this happens with urllib3>1.26.10, roughly), and requests does not handle this situation correctly.
The server sends the response and you re-upload the file to the second location (urllib3==1.26.3 has this behavior when used with requests). Technically, this is a bug in urllib3 and it should have failed, but it silently lets you upload.
However, if you are expecting a redirect, the most obvious solution might be to send a null byte via PUT first, get a valid response back for the new URL (without following redirects), and then use that response to do the PUT of the full file. With requests, it's probably something like
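>>> import requests
>>> from io import BytesIO
>>> data = BytesIO(b'\x00')
>>> # first PUT a single null byte without following the redirect
>>> response = requests.put(url, data=data, allow_redirects=False)
>>> # fp is the open file object holding the real upload
>>> requests.put(response.headers['Location'], data=fp, allow_redirects=False)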
and at that point, you'll be ok (assuming you only expect a single redirect here).

Using Request, HttpNtlmAuth to make a system call with authentication

I've got the following snippet of code that has the audacity to tell me "FAIL to load undefined" (the nerve...). I'm trying to pass my authenticated session to a system call that uses JavaScript.
import requests
from requests_ntlm import HttpNtlmAuth
from subprocess import call
# THIS WORKS - 200 returned
s = requests.Session()
r = s.get("http://example.com", auth=HttpNtlmAuth('domain\\MyUserName', 'password'))
call(["phantomjs", "yslow.js", r.url])
The issue is when "call" gets called; all I get is the following:
FAIL to load undefined.
I'm guessing that just passing the correct authenticated session should work, but the question is how to do it so that I can extract the info I want. Out of all my attempts, this has been the most fruitful. Please help - thanks!

There seem to be a couple of things going on here, so I'll address them one by one.
The subprocess module in python is meant to be used to call out to the system as if you were using the command line. It knows nothing of "authenticated session"s and the command line (or shell) has no knowledge of how to use a python object, like a session, to work with phantomjs.
phantomjs has python bindings since version 1.8 so I would expect this might be made easier by using them. I have not used them, however, so I can not tell you with certainty that they will be helpful.
I looked at yslow's website and there appears to be no way to pass it the content that you are downloading with requests. Even then, the content would not have everything (for example: any externally hosted javascript that would be loaded by selenium/phantomjs or a browser, is not loaded by requests)
yslow seems as though it normally just downloads the URL for you and performs its analysis. When the website is behind NTLM, however, it first sends the client a 401 response which should indicate to the client that it must authenticate. Further, information is sent to the client that tells it how to authenticate and provides it parameters to use when authenticating for NTLM. This is how requests_ntlm works with requests. The first request is made and generates a 401 response, then the authentication handler generates the proper header(s) and re-sends the request which is why you see the 200 response bound to r.
yslow accepts a JSON representation of the headers you want to send so you can try to use the headers found in r.request.headers but I doubt they will work.
In short, this is not a question that the people who normally follow the requests tag can help you with. And looking at the documentation for yslow it seems that it (technically) does not support authentication of any type. yslow developers might argue though that it supports Basic Authentication because it allows you to specify headers.

HTTP status for different conflict scenarios

I'm implementing user registration for a Web Service.
When somebody wants to register an account, my WS sends an activation link to his/her mail. Until this link is clicked, user account is not activated (but the info is persisted in database, so the resource exists).
So my question is, if you try to register the same mail several times, you will get a 409 CONFLICT code. But there are two scenarios right there:
User account pending on confirmation
User already registered and activated
I would like to know what is the right approach. Should I "invent" an HTTP status 4XX to distinguish them, or send 409 with a JSON with info? other solutions?
Thx!
EDIT:
I have found this response -> https://stackoverflow.com/a/3290369/1171280 where Piskvor suggests using the 409 status plus a response header and/or the body to explain why the request failed. Which one? Header? Body? Both?
What do you think?
EDIT 2:
HTTP status + body with detailed error (with machine-parseable codes even) is OK, Twitter does that (https://dev.twitter.com/docs/error-codes-responses) and RESPECT :) . But I still doubt with 403 vs 409... :S

Pending account is a special type of a user account, so I think both accounts (already registered and pending) are same in the context of your question. You should return 409 in both cases. For the REST API both are same cases because that resource already exists in the system.
Regarding your updated question, I would suggest using the body (JSON) to send out the error(s) instead of a custom HTTP header to explain why the call failed. One reason is that in the body you can have multiple error messages (each one as a separate JSON object/array element), whereas in a header you can have only one (though you could split it on some character). The other reason is that you can have one generic error-handling method which looks for an "error" object in the JSON instead of looking for a different custom header for each failure scenario.
HTTP codes:
403 - The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated.
409 - The request could not be completed due to a conflict with the current state of the resource. This code is only allowed in situations where it is expected that the user might be able to resolve the conflict and resubmit the request.
I think it should be 409 because the conflict can be resolved by re-issuing the request with different email address.
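As a rough illustration of the JSON-body approach described in this answer, the two registration scenarios could share the 409 status and differ only in a machine-parseable code. The function name, the code strings, and the field layout below are all made up for the example; they are not from any standard or from the asker's service.

```python
import json


def registration_conflict(email, activated):
    """Build a (status, body) pair for a duplicate registration attempt.

    The error codes and JSON layout here are illustrative assumptions.
    """
    if activated:
        code = "ACCOUNT_ALREADY_ACTIVE"
        message = "An account with this email address already exists."
    else:
        code = "ACCOUNT_PENDING_CONFIRMATION"
        message = "This email address is registered but not yet confirmed."
    body = {
        "errors": [
            {"code": code, "message": message, "field": "email", "value": email}
        ]
    }
    return 409, json.dumps(body)


status, body = registration_conflict("user@example.com", activated=False)
print(status, body)
```

A client can then branch on `errors[0].code` with one generic handler, for instance to offer "resend activation mail" only in the pending case.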

HTTP status codes are not meant to be "invented".
409 CONFLICT sounds OK to me. Including details in the body is OK, too, if your client needs to know.

Don't use 409. Use 403.
[409] is only allowed in situations where it is expected that the user might be able to resolve the conflict and resubmit the request.
It's for a request that should have been OK, but has a problem that can be resolved. If you edit a document and PUT the revised text but someone else did the same thing before you did, you should have a chance to look at the other person's work so you don't accidentally undo all their work. You'd get a 409 which means, if you want to revise it, you should send your revision with an indication that you've seen the latest revision by the other person -- i.e. you know what you're doing.
There's no way to 'correct' a redundant attempt to register. The only way to avoid the conflict is to register with a different username, but that's very incorrect.
I'm imagining a POST request that takes a username and email address and creates a new resource dedicated to that new user (which should now be used for validation), sending that resource's URL in an email. So you're dealing with the refusal of the POST request handler to create a new resource, for a reason specific to the business model of your application (rather than an HTTP-related reason like bad syntax).
There's no status code more specific to what you want than 403. In this case, all you should use HTTP's vocabulary to communicate is 'that's not allowed' -- use the layer on top of HTTP to communicate why, like a polite HTML page or a JSON object for the client to understand and render as a polite HTML page.

409 should be ok; for the details https://datatracker.ietf.org/doc/html/draft-nottingham-http-problem-04 might be of interest.

Connecting a desktop application with a website

I made an application using Qt/C++ that reads some values every 5-7 seconds and sends them to a website.
My approach is very simple. I just read the values I want to send and then make an HTTP POST to the website. I also send the username and password to the website.
The problem is that I cannot find out whether the request was successful. I mean that if I send the request and the server gets it, I will always get an HTTP 200. For example, if the password is not correct, there is no way to know it. It is the way HTTP works.
Now I think I will need some kind of protocol to take care of the communication between the application and the website.
The question is: what protocol should I use?

If the action completes before the response headers are sent, you have the option of adding a custom status header to the response. If your website is built on PHP, you can call header() to add the custom status of the operation.
header('XAppRequest-Status: complete');

If you can modify the server-side script, you could do the following.
On the client side:
You can make the HTTP POST request via AJAX
and evaluate the result of the AJAX request.
On the server side:
When the HTTP request arrives, you do your processing, and if everything goes well you send data back to the AJAX script that called it.
Does that solve your problem?

How to post data over HTTPS with urllib2?

I want to integrate credit card processing into my website using the Paybox.com APIs.
I have to send a POST request (using urllib2) to the Paybox API with the credit card details (number, date, CVV) when a user submits a form.
How can I secure that? Is it enough to put https://www.mywebsite.com/card/processing in my form action?
How can I send POST data over HTTPS using urllib2?
PS: I work on Django.

Well, in terms of security, refer to this Q&A: POST data encryption - Is HTTPS enough?
As for how to do it, here's an explanation of using urllib: http://www.codercaste.com/2009/11/28/how-to-use-the-urllib-python-library-to-fetch-url-data-and-more/
The idea is to use the urlencode function to build the encoded parameters for the request, then create a Request object from the URL and those parameters, and then call urlopen on the Request object to actually send the request.
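The urlencode -> Request -> urlopen steps could look roughly like this. The sketch uses Python 3's urllib.request (where urllib2's Request and urlopen now live) rather than urllib2 itself; the URL and field names are placeholders, not Paybox's real endpoint or parameters, and `build_post` and `send` are hypothetical helper names.

```python
import ssl
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post(url, fields):
    """Encode form fields and wrap them in a POST request object."""
    data = urlencode(fields).encode("ascii")  # request body must be bytes
    return Request(url, data=data, method="POST")


def send(request, timeout=10):
    """Send the request over HTTPS with certificate verification enabled."""
    context = ssl.create_default_context()  # verifies certificate and hostname
    with urlopen(request, timeout=timeout, context=context) as response:
        return response.status, response.read()


# Placeholder URL and fields, for illustration only.
req = build_post(
    "https://payments.example.com/charge",
    {"amount": "1000", "currency": "EUR"},
)
print(req.get_method(), req.full_url, req.data)
# send(req) would perform the actual network call.
```

Because the URL scheme is https, the body is encrypted in transit; the explicit SSL context just makes the certificate check visible rather than implicit.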

Here are solutions using the Python requests library: http://www.python-requests.org/en/latest/user/advanced/
Request using SSL: http://docs.python-requests.org/en/latest/user/advanced/#ssl-cert-verification
Request using POST: http://docs.python-requests.org/en/latest/user/quickstart/#more-complicated-post-requests (this should also accept the verify=True parameter)
By the way, requests is a very powerful and easy way to make HTTP requests.
