Python socket garbage response [closed] - python-2.7

Problem: I am getting corrupted output when I send the GET request for a webpage.
GET http://www.vox.com/a/maps-explain-the-middle-east HTTP/1.1\r\nHost: www.vox.com\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\n\r\n
The output is a garbled file containing data like
���v�������/�:#�J�|d[��Xt��tF(�p3E%������?�Λ�'��\k��E�7�q����"�®}_sϵ�܏ӛv'�,,ƣ'�=���� K{O>K����l�&�A:ϳ���rѯ��U�4X,f��������_k?=�}9����p��%��d�M���g�Y�([��q��\K�B&)��fdz
But when I send
GET http://www.vox.com/a/maps-explain-the-middle-east HTTP/1.1\r\nHost: www.vox.com\r\n\r\n
I get the webpage. The only difference is a few extra headers copied from Firefox.
The response headers in both cases are:
HTTP/1.0 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
Status: 200 OK
X-UA-Compatible: IE=Edge,chrome=1
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: f5e482e1dd57f613df9c1b416a65b9b2
X-Runtime: 0.039694
P3P: CP="CAO DSP COR CURa ADMa DEVa PSAa PSDa CONi OUR IND PHY ONL UNI COM NAV INT CNT STA"
Content-Encoding: gzip
Accept-Ranges: bytes
Date: Wed, 24 Sep 2014 10:39:19 GMT
Age: 0
X-Served-By: cache-iad2129-IAD, cache-lax1430-LAX
X-Cache: MISS, MISS
X-Cache-Hits: 0, 0
X-Timer: S1411555159.330146,VS0,VE108
Vary: Accept-Encoding
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:3128
Via: 1.1 varnish-v4, 1.1 varnish, 1.1 varnish, 1.0 localhost (squid/3.1.19)
Connection: close
I don't understand why this is happening. Is it some kind of compression?
Edit: If it is some kind of compression, how would I recover the original data?

Yeah, I think it is probably due to the encoding scheme used, because if I don't send 'Accept-Encoding: gzip, deflate' I get the correct webpage. But I don't know how to recover the webpage from this encoding.

Related

How to know if an HTTP request is partial and how to fully parse it before generating a response (C++)

I am working on a C++ project where I listen on sockets and generate HTTP responses based on the requests I get from my clients on my fds. In short, I use my browser to send a request, I end up getting the raw request, I parse it, and I generate the corresponding HTTP response.
However, in the case of large POST requests, what usually happens is that I get partial requests: in the first part I will usually only find the first line (version/method/URI) and some headers, but no body. I guess I am supposed to get the rest of the body somehow, but I am unable to figure out two things.
First of all, how do I know from just the first part whether the request I am getting is partial or complete? I am not getting any information relating to range. Here's the first part I get when my client sends me a POST request:
POST / HTTP/1.1
Host: localhost:8081
Connection: keep-alive
Content-Length: 8535833
Cache-Control: max-age=0
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
Origin: http://127.0.0.1:8081
Upgrade-Insecure-Requests: 1
DNT: 1
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryOs6fsdbaegBIumqh
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Referer: http://127.0.0.1:8081/
Accept-Encoding: gzip, deflate, br
Accept-Language: fr,en-US;q=0.9,en;q=0.8
How can I figure out, just from this, whether I am getting a partial request or just a faulty request? (I need to generate a 400 error in the case of a request that claims it has a Content-Length of X but whose body size is different.)
The second question is: supposing I already know whether or not it's partial, how do I proceed with storing the entire request in a buffer before sending it to my parser and generating a response? Here's my reception function (I already know the client's fd, so I just recv on it):
void Client::receive_request(void)
{
    char buffer[2024];
    int ret;
    ret = recv(_fd, buffer, 2024, 0);
    buffer[ret] = 0;
    _received_request += buffer;
    _bytes_request += ret;
    std::cout << "Raw Request:\n" << _received_request << std::endl;
    if (buffer[ret - 1] == '\n')
    {
        _ready_request = true;
        _request.parse(_received_request, _server->get_config());
    }
}
And here's the code that checks whether a client is attempting to send a request, then parses it and generates a response:
int Connections::check_clients() {
    int fd;
    for (std::vector<Client*>::iterator client = clients.begin();
         client != clients.end() && ready_fd != 0; client++)
    {
        fd = (*client)->get_fd();
        if (FD_ISSET(fd, &ready_rset))
        {
            ready_fd--;
            (*client)->receive_request();
            if ((*client)->request_is_ready())
            {
                (*client)->wait_response();
                close(fd);
                FD_CLR(fd, &active_set);
                fd_list.remove(fd);
                max_fd = *std::max_element(fd_list.begin(), fd_list.end());
                free(*client);
                client = clients.erase(client);
            }
        }
    }
    return 0;
}
As you can see, I am coding everything in C++98 and would rather not get answers that just dismiss my questions and refer me to different technologies or libraries, unless it helps me understand what I am doing wrong and how to handle partial requests.
For info, I am only handling HTTP/1.1 (GET/POST/DELETE only), and I usually only get this issue when I am receiving a large chunked file or a file upload that has a very large body. Thank you.
PS: if needed, I can link the GitHub repo of the current project if you want to look further into the code.
How can I figure out, just from this, whether I am getting a partial request or just a faulty request? (I need to generate a 400 error in the case of a request that claims it has a Content-Length of X but whose body size is different.)
The body size is, by definition, the number of bytes given by the Content-Length field. Any bytes that you receive afterwards belong to the next HTTP request (see HTTP pipelining). If you do not receive Content-Length bytes within a reasonable time period, then you can make the server issue a 408 Request Timeout error.
The second question is: supposing I already know whether or not it's partial, how do I proceed with storing the entire request in a buffer before sending it to my parser and generating a response? Here's my reception function (I already know the client's fd, so I just recv on it):
Your posted code has at least the following problems:
You should check the return value of recv to determine whether the function succeeded or failed, and if it failed, you should handle the error appropriately. In your current code, if recv fails with the return value -1, then you will write to the array buffer out of bounds, causing undefined behavior.
It does not seem appropriate to use the line if (buffer[ret-1] == '\n'). The HTTP request header will be over when you encounter a "\r\n\r\n", and the HTTP request body will be over when you have read Content-Length bytes of the body. The ends of the header and body will not necessarily occur at the end of the data read by recv, but can also occur in the middle. If you want to support HTTP pipelining, then the additional data should be handled by the handler for the next HTTP request. If you don't want to support HTTP pipelining, then you can simply discard the additional data and use Connection: close in the HTTP response header.
You seem to be using a null terminating character to mark the end of the data read by recv. However, this will not work if a byte with the value 0 is part of the HTTP request. It is probably safe to assume that such a byte should not be part of the HTTP request header, but it is probably not safe to assume that such a byte won't be part of the HTTP request body (for example when using POST with binary data).
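To make those points concrete, here is a rough sketch of what the reception logic could look like. This is not a drop-in fix: the member names are borrowed from the question, the return type is changed to bool purely for illustration, the Content-Length lookup is not case-insensitive, chunked transfer encoding is not handled, and std::strtoul needs <cstdlib>.
bool Client::receive_request(void)
{
    char buffer[2024];
    int ret = recv(_fd, buffer, sizeof(buffer), 0);

    if (ret <= 0)                       // -1 = error, 0 = peer closed; never index buffer[ret - 1]
        return false;

    _received_request.append(buffer, ret);   // length-based append, safe for '\0' bytes in a binary body
    _bytes_request += ret;

    // The header is complete only once "\r\n\r\n" has been seen.
    std::string::size_type header_end = _received_request.find("\r\n\r\n");
    if (header_end == std::string::npos)
        return false;                   // still a partial header: keep the data and recv again later

    // The body is complete once Content-Length bytes follow the header.
    std::size_t content_length = 0;
    std::string::size_type pos = _received_request.find("Content-Length:");
    if (pos != std::string::npos && pos < header_end)
        content_length = std::strtoul(_received_request.c_str() + pos + 15, NULL, 10);

    if (_received_request.size() - (header_end + 4) < content_length)
        return false;                   // partial body: wait for more data

    // Anything beyond content_length belongs to the next pipelined request,
    // or can be discarded if the response carries "Connection: close".
    _ready_request = true;
    _request.parse(_received_request, _server->get_config());
    return true;
}
With something like this, check_clients() can keep calling receive_request() across several select() iterations and only generate a response once a complete request has accumulated.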

File downloading without extension Socket Programming

I am working on socket programming with C++, and I have to write code for a file download.
My HTTP header for this is:
char header[] = "HTTP/1.1 200 OK\r\nContent-Type:application/vnd.ms-excel;Content-Disposition:attachment;filename:\"abc.xls\";Content-Length:14; \r\n\r\n";
But the file gets downloaded as "download"; it does not have an extension either.
I also tried this with
char header[] = "HTTP/1.1 200 OK\r\nContent-Type:application/octet-stream;Content-Disposition:attachment;filename:\"abc.xls\";Content-Length:14; \r\n\r\n";
but it's not working.
Can anyone help me with this?
The HTTP response header you've tried looks like this:
HTTP/1.1 200 OK
Content-Type:application/vnd.ms-excel;Content-Disposition:attachment;filename:"abc.xls";Content-Length:14;
Instead, it should look like this:
HTTP/1.1 200 OK
Content-Type:application/vnd.ms-excel
Content-Disposition:attachment;filename="abc.xls"
Content-Length:14
I recommend you study the relevant standards before trying to implement a protocol. In this case these are the HTTP standard (RFC 7230 and the following RFCs) and "Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP)" (RFC 6266).
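As a concrete sketch in the question's own style, the corrected response header could be built like this (the Content-Length of 14 is kept from the question and is only right if the body really is 14 bytes):
// Each header field on its own line, terminated by "\r\n", with an empty
// line separating the header from the body.
char header[] =
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/vnd.ms-excel\r\n"
    "Content-Disposition: attachment; filename=\"abc.xls\"\r\n"
    "Content-Length: 14\r\n"
    "\r\n";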

Windows 8 (Store App) in C - How to disable gzip compression?

I have an app that appears to enable gzip encoding by default while sending data to the server.
We tried disabling the gzip compression by explicitly using:
IXMLHttpRequest2::SetRequestHeader(L"Accept-Encoding", L"") (on the HTTP Request Object, of course)
This still doesn't seem to help. Is there any way to disable gzip being enabled in the HTTP request headers from the C++ app?
Thanks!
To ask a server not to use a specific encoding, you should provide a list of Accept-Encoding values. From section 14.11 of RFC 2616 (HTTP/1.1) you can see that the header takes one of these forms (the values are examples):
Accept-Encoding: compress, gzip
Accept-Encoding:
Accept-Encoding: *
Accept-Encoding: compress;q=0.5, gzip;q=1.0
Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
If the content-coding is one of the content-codings listed in an Accept-Encoding field, then it is acceptable, unless it is accompanied by a qvalue of 0. (As defined in section 3.9, a qvalue of 0 means "not acceptable.")
Then, to ask the server not to use gzip compression, you should provide, instead of an empty string, this value for Accept-Encoding:
gzip;q=0
This requires the server not to use it, but you have to allow another encoding. See section 3.5 for the available encodings. Use the quality parameter q to inform the server about your preferences (do not forget that if it can't provide an acceptable encoding for your request it will reply with error 406).
identity;q=1.0, gzip;q=0.5
In this way you ask for the identity encoding and, in case it's not available, you accept gzip encoding too (this prevents the server from replying with an error if, for any reason, it can't use any other encoding for your request). You may also try the performance of other encodings (compress and deflate, for example).
Code
Then, finally, you have to use IXMLHttpRequest2::SetRequestHeader(L"Accept-Encoding", L"identity;q=1.0, gzip;q=0.5"). From SetRequestHeader you can see that it appends to the headers sent by default, so if you specify an empty string the value won't actually be changed (how this is interpreted may depend on the server; I didn't find any proper specification about it, so you may inspect both your HTTP request and response to check what is actually sent/received).
Old value: Accept-Encoding: compress
Call: IXMLHttpRequest2::SetRequestHeader(L"Accept-Encoding", L"")
New value: Accept-Encoding: compress
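Put together, the call might look like this (pXHR here is assumed to be an already-created IXMLHttpRequest2 pointer; as noted above, whether the value replaces or is appended to the default header may depend on the stack):
// Prefer the identity (uncompressed) encoding; accept gzip only as a fallback.
HRESULT hr = pXHR->SetRequestHeader(L"Accept-Encoding",
                                    L"identity;q=1.0, gzip;q=0.5");
if (FAILED(hr))
{
    // Handle or log the failure; the request can still be sent without the header.
}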

Is there any way to look at the raw header from ring?

Is there a way to turn ring's hash-map of parameters into the original response and request headers that the browser and server use to communicate?
So basically, instead of the hash-map structure that ring provides, I want to be able to generate and parse the raw text headers using ring.
Request:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Cookie:__qca=P0-1122510804-1338534864474; usr=t=pvgxSE5uUO9s&s=Ir7otYoeUaMb; __utma=140029553.119380626.1338534864.1340057197.1340064637.52; __utmb=140029553.6.10.1340064637; __utmc=140029553; __utmz=140029553.1340000628.50.23.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
Host:stackoverflow.com
If-Modified-Since:Tue, 19 Jun 2012 00:10:35 GMT
Referer:http://stackoverflow.com/posts/11092804/edit
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.34 Safari/536.11
Response:
Cache-Control:public, max-age=60
Content-Encoding:gzip
Content-Length:33233
Content-Type:text/html; charset=utf-8
Date:Tue, 19 Jun 2012 00:26:48 GMT
Expires:Tue, 19 Jun 2012 00:27:48 GMT
Last-Modified:Tue, 19 Jun 2012 00:26:48 GMT
Set-Cookie:usr=t=Qz5ObGDYskmu&s=Ir7otYoeUaMb; domain=.stackoverflow.com; expires=Wed, 19-Dec-2012 00:26:48 GMT; path=/; HttpOnly
Vary:*
I don't think Ring provides that, because the purpose of Ring is to create abstractions over the HTTP request/response using hash maps and other abstractions, and even if Ring provided low-level access to the header strings, that would be a leaky abstraction. If you really want to access raw headers and generate raw response headers, you can use a web server interface like Jetty's directly rather than going through Ring.

URLOpenPullStream and gzip content download - need uncompressed data

I am using URLOpenPullStream along with IBindStatusCallback and IHttpNegotiate callbacks to handle the negotiation, status, and data messages. The problem I have is when the content is gzip (e.g. Content-Encoding: gzip): the data that I receive via OnDataAvailable is compressed, and I need the uncompressed data. I am using the BINDF_PULLDATA | BINDF_GETNEWESTVERSION | BINDF_NOWRITECACHE binding flags. I have read some posts saying it should support the gzip format.
I initially tried to change the Accept-Encoding request header to specify that I did not want gzip, but was unsuccessful. I can change or add headers in BeginningTransaction, but it fails to change Accept-Encoding: I was able to change the User-Agent and to add a new header, so the process works, but it would not override Accept-Encoding for some reason.
The other option is to un-gzip the data myself. In a quick test using a C++ gzip library, I was able to un-gzip the content, so this may be an option. If this is what I need to do, what is the best method to detect that the content is gzip? I noticed that I got an OnProgress event with BINDSTATUS_MIMETYPEAVAILABLE and the text set to "application/x-gzip-compressed". Is this how I should detect it?
I am looking for any solution to get around this problem! I do want to stay with URLOpenPullStream; this is a product that has already been released, and I wish to keep changes to a minimum.
I will answer my own question after more research. It seems that the website I am having the issue with is returning something incorrect, such that IE, FF, and URLOpenPullStream do not recognize it as valid gzip content. The headers appear to be fine, e.g.
HTTP/1.1 200 OK
Content-Type: text/html; charset=iso-8859-1
Content-Encoding: none
Server: Microsoft-IIS/6.0
MSNSERVER: H: COL102-W41 V: 15.4.317.921 D: 2010-09-21T20:29:43
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 4258
Date: Wed, 27 Oct 2010 20:48:15 GMT
Connection: keep-alive
Set-Cookie: xidseq=4; domain=.live.com; path=/
Set-Cookie: LD=; domain=.live.com; expires=Wed, 27-Oct-2010 19:08:15 GMT; path=/
Cache-Control: no-cache, no-store
Pragma: no-cache
Expires: -1
Expires: -1
but URLOpenPullStream just downloaded it in raw compressed format, IE reports an error if you try to access the site, and FF shows garbage.
After doing a test with a site that does return valid gzip content, e.g. www.webcompression.org, IE, FF, and URLOpenPullStream all worked fine. So it appears that URLOpenPullStream does support gzip content, and in that case it was transparent: in OnDataAvailable I received the uncompressed data, and in OnResponse the headers did not show the Content-Encoding as gzip.
Unfortunately, this still did not solve my problem. I resolved it by checking the response headers in the OnResponse event: if the Content-Encoding was gzip, I set a flag, and when the download was complete I used zlib's gzip routines to uncompress the content. This seemed to work fine, and it should be acceptable for my rare case, since typically I should never see Content-Encoding: gzip in the OnResponse headers given that URLOpenPullStream handles the decompression transparently.
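For anyone taking the same route, a minimal sketch of that final decompression step with zlib could look like this. The function name and buffer handling are illustrative only; it assumes the compressed response has already been collected into a single buffer and reduces error handling to the bare minimum.
#include <cstring>
#include <string>
#include <vector>
#include <zlib.h>

// Decompress a gzip-encoded buffer. windowBits = 16 + MAX_WBITS tells inflate()
// to expect a gzip wrapper rather than a raw zlib stream.
bool gunzip_buffer(const std::vector<unsigned char>& in, std::string& out)
{
    if (in.empty())
        return false;

    z_stream strm;
    std::memset(&strm, 0, sizeof(strm));
    if (inflateInit2(&strm, 16 + MAX_WBITS) != Z_OK)
        return false;

    strm.next_in  = const_cast<Bytef*>(&in[0]);
    strm.avail_in = static_cast<uInt>(in.size());

    char chunk[16384];
    int ret = Z_OK;
    while (ret != Z_STREAM_END)
    {
        strm.next_out  = reinterpret_cast<Bytef*>(chunk);
        strm.avail_out = sizeof(chunk);
        ret = inflate(&strm, Z_NO_FLUSH);
        if (ret != Z_OK && ret != Z_STREAM_END)
        {
            inflateEnd(&strm);          // corrupt or truncated gzip data
            return false;
        }
        out.append(chunk, sizeof(chunk) - strm.avail_out);
    }
    inflateEnd(&strm);
    return true;
}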