I am using URLOpenPullStream along with IBindStatusCallback and IHttpNegotiate callbacks to handle the negotiate, status, and data messages. The problem I have is when the content is gzip (e.g. Content-Encoding: gzip): the data that I am receiving via OnDataAvailable is compressed, but I need the uncompressed data. I am using the BINDF_PULLDATA | BINDF_GETNEWESTVERSION | BINDF_NOWRITECACHE binding flags. I have read some posts that say it should support the gzip format.
I initially tried to change the Accept-Encoding request header to specify that I did not want gzip, but was unsuccessful with this. I can change or add headers in BeginningTransaction, but it fails to change Accept-Encoding. I was able to change the User-Agent and to add a new header, so the process works, but it would not override Accept-Encoding for some reason.
The other option is to un-gzip the data myself. In a quick test using a C++ gzip library, I was able to ungzip the content, so this may be an option. If this is what I need to do, what is the best method to detect that the content is gzip? I noticed that I got an OnProgress event with BINDSTATUS_MIMETYPEAVAILABLE and the text set to "application/x-gzip-compressed". Is this how I should detect it?
I am looking for any solution to get around this problem! I do want to stay with URLOpenPullStream; this is a product that has already been released, and I wish to keep changes to a minimum.
I will answer my own question after more research. It seems that the website I am having the issue with is returning something incorrect, such that IE, FF, and URLOpenPullStream do not recognize it as valid gzip content. The headers appear to be fine, e.g.
HTTP/1.1 200 OK
Content-Type: text/html; charset=iso-8859-1
Content-Encoding: none
Server: Microsoft-IIS/6.0
MSNSERVER: H: COL102-W41 V: 15.4.317.921 D: 2010-09-21T20:29:43
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 4258
Date: Wed, 27 Oct 2010 20:48:15 GMT
Connection: keep-alive
Set-Cookie: xidseq=4; domain=.live.com; path=/
Set-Cookie: LD=; domain=.live.com; expires=Wed, 27-Oct-2010 19:08:15 GMT; path=/
Cache-Control: no-cache, no-store
Pragma: no-cache
Expires: -1
Expires: -1
but URLOpenPullStream just downloaded it in raw compressed format, IE reports an error if you try to access the site, and FF shows garbage.
After doing a test with a site that does return valid gzip content, e.g. www.webcompression.org, IE, FF, and URLOpenPullStream all worked fine. So it appears that URLOpenPullStream does support gzip content. In this case it was transparent: in OnDataAvailable I received the uncompressed data, and in OnResponse the headers did not show the Content-Encoding as gzip.
Unfortunately, this still did not solve my problem. I resolved it by checking the response headers in the OnResponse event: if the Content-Encoding was gzip, I set a flag, and when the download was complete I used the zlib gzip routines to uncompress the content. This seems to work fine, and should be fine for my rare case, since I should normally never see Content-Encoding: gzip in the OnResponse headers, because URLOpenPullStream handles the decompression transparently.
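For the detection step, besides watching for BINDSTATUS_MIMETYPEAVAILABLE or a Content-Encoding: gzip response header, the raw bytes themselves can be checked: every gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952), normally followed by 0x08 for the deflate compression method. A minimal sketch (the helper name is mine, not part of urlmon):

```cpp
#include <cstddef>

// Returns true if the buffer starts with the gzip magic number
// (0x1f 0x8b) followed by the deflate method byte (0x08).
// Complements checking the Content-Encoding response header.
bool looks_like_gzip(const unsigned char* data, std::size_t len)
{
    return len >= 3 && data[0] == 0x1f && data[1] == 0x8b && data[2] == 0x08;
}
```

If this test passes on the first bytes received but the headers were not handled transparently, you are in exactly the broken-server case above, and handing the buffer to zlib yourself is a reasonable fallback.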
I am working on a C++ project where I listen on sockets and generate HTTP responses based on the requests I get from my clients on my fds. In short, I use my browser to send a request, I end up getting the raw request, I parse it, and I generate the corresponding HTTP response.
However, in the case of large POST requests, what usually happens is that I get partial requests: in the first part I will usually only find the first line (version/method/URI) and some headers, but no body. I guess I am supposed to get the rest of the body somehow, but I am unable to figure out two things.
First of all, how do I know whether the request I am getting is partial or complete from just the first part? I am not getting any information relating to range. Here's the first part I get when my client sends me a POST request:
POST / HTTP/1.1
Host: localhost:8081
Connection: keep-alive
Content-Length: 8535833
Cache-Control: max-age=0
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
Origin: http://127.0.0.1:8081
Upgrade-Insecure-Requests: 1
DNT: 1
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryOs6fsdbaegBIumqh
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Referer: http://127.0.0.1:8081/
Accept-Encoding: gzip, deflate, br
Accept-Language: fr,en-US;q=0.9,en;q=0.8
How can I figure out, just from this, whether I am getting a partial request or a faulty request? (I need to generate a 400 error in the case of a request that claims a certain Content-Length but whose body size is different.)
My second question is: supposing I already know whether it's partial, how do I proceed with storing the entire request in a buffer before sending it to my parser and generating a response? Here's my reception function (I already know the client's fd, so I just recv on it):
void Client::receive_request(void)
{
char buffer[2024];
int ret;
ret = recv(_fd, buffer, 2024, 0);
buffer[ret] = 0;
_received_request += buffer;
_bytes_request += ret;
std::cout << "Raw Request:\n" << _received_request << std::endl;
if (buffer[ret-1] == '\n')
{
_ready_request = true;
_request.parse(_received_request, _server->get_config());
}
}
And here's the code that checks whether a client is attempting to send a request, then parses it and generates a response:
int Connections::check_clients() {
int fd;
for (std::vector<Client*>::iterator client = clients.begin();
client != clients.end() && ready_fd != 0 ; client++)
{
fd = (*client)->get_fd();
if (FD_ISSET(fd, &ready_rset))
{
ready_fd--;
(*client)->receive_request();
if ((*client)->request_is_ready())
{
(*client)->wait_response();
close(fd);
FD_CLR(fd, &active_set);
fd_list.remove(fd);
max_fd = *std::max_element(fd_list.begin(), fd_list.end());
free(*client);
client = clients.erase(client);
}
}
}
return 0;
}
As you can see, I am coding everything in C++ (98), and I would rather not get answers that just dismiss my questions and refer me to different technologies or libraries, unless it helps me understand what I am doing wrong and how to handle partial requests.
For info, I am only handling HTTP/1.1 (GET/POST/DELETE only), and I usually only get this issue when I am receiving a large chunked transfer or a file upload with a very large body. Thank you.
PS: if needed, I can link the GitHub repo of the current project if you want to look further into the code.
How can I figure out, just from this, whether I am getting a partial request or a faulty request? (I need to generate a 400 error in the case of a request that claims a certain Content-Length but whose body size is different.)
The body size is, by definition, the value of the Content-Length field. Any bytes that you receive afterwards belong to the next HTTP request (see HTTP pipelining). If you do not receive Content-Length bytes within a reasonable time period, then you can make the server issue a 408 Request Timeout error.
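As a sketch of that rule (assuming the raw bytes received so far are accumulated in a std::string; request_complete is a hypothetical helper, and a real parser should match the header name case-insensitively and only at the start of a line):

```cpp
#include <cstdlib>
#include <string>

// Returns true once `buf` holds a complete HTTP/1.1 request:
// the header block is terminated by "\r\n\r\n" and the body holds
// at least Content-Length bytes (treated as 0 if the header is absent).
bool request_complete(const std::string& buf)
{
    std::string::size_type header_end = buf.find("\r\n\r\n");
    if (header_end == std::string::npos)
        return false;                    // headers not fully received yet

    std::size_t content_length = 0;
    std::string::size_type pos = buf.find("Content-Length:");
    if (pos != std::string::npos && pos < header_end)
        content_length = std::strtoul(buf.c_str() + pos + 15, NULL, 10);

    std::size_t body_bytes = buf.size() - (header_end + 4);
    return body_bytes >= content_length;
}
```

With this, a request whose body is still shorter than the advertised Content-Length is simply "not complete yet" rather than faulty; a 400 is only warranted if the framing itself is malformed.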
My second question is: supposing I already know whether it's partial, how do I proceed with storing the entire request in a buffer before sending it to my parser and generating a response? Here's my reception function (I already know the client's fd, so I just recv on it):
Your posted code has at least the following problems:
You should check the return value of recv to determine whether the function succeeded or failed, and if it failed, you should handle the error appropriately. In your current code, if recv fails with the return value -1, then you will write to the array buffer out of bounds, causing undefined behavior.
It does not seem appropriate to use the line if (buffer[ret-1] == '\n'). The HTTP request header ends when you encounter a "\r\n\r\n", and the HTTP request body ends when you have read Content-Length bytes of the body. The end of the header or body will not necessarily occur at the end of the data read by recv; it can also occur in the middle. If you want to support HTTP pipelining, then the additional data should be handled by the handler for the next HTTP request. If you don't want to support HTTP pipelining, then you can simply discard the additional data and use Connection: close in the HTTP response header.
You seem to be using a null terminating character to mark the end of the data read by recv. However, this will not work if a byte with the value 0 is part of the HTTP request. It is probably safe to assume that such a byte should not be part of the HTTP request header, but it is probably not safe to assume that such a byte won't be part of the HTTP request body (for example when using POST with binary data).
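Putting these points together, here is a hedged sketch of a binary-safe receive step, written as a free function rather than your Client method (the completeness check is deliberately left out, since it belongs in a separate header/Content-Length scan):

```cpp
#include <string>
#include <sys/types.h>
#include <sys/socket.h>

// Reads once from `fd` and appends the received bytes to `request`.
// Returns false on error or orderly shutdown by the peer, true if
// bytes were appended.  Appending with an explicit length keeps
// bodies containing 0x00 bytes intact (no null terminator needed).
bool append_from_socket(int fd, std::string& request)
{
    char buffer[2024];
    ssize_t ret = recv(fd, buffer, sizeof(buffer), 0);
    if (ret <= 0)
        return false;   // 0: peer closed the connection, <0: error (check errno)
    request.append(buffer, static_cast<std::string::size_type>(ret));
    return true;
}
```

After each successful call, the caller would scan the accumulated string for "\r\n\r\n" and compare the body length against Content-Length to decide whether to hand it to the parser.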
I am migrating from Jetty 9.0.x to 9.4.x
org.eclipse.jetty.server.ResourceCache is removed from Jetty 9.4.x
Questions:
1) What is the replacement for this class in 9.4.x ?
2) I found CachedContentFactory is the closest equivalent of this class, but its constructor takes one extra parameter, CompressedContentFormat[] precompressedFormats. If this is the correct replacement, I am not sure what I should pass in for this parameter. Can it be an empty array? Sorry, the javadocs did not help a lot either.
First some history.
Back during the major release Jetty 9.0.0 there were 2 main ways to handle static content:
The DefaultServlet (and the inferior ResourceHandler).
When the major release Jetty 9.4.0 rolled out (this is 4 major version releases of Jetty later than Jetty 9.0.0), an effort was made to have both of those components use a common codebase, so the ResourceService was created to standardize the servicing of static content in a single place. The differences between DefaultServlet and ResourceHandler were now vastly reduced. (Note: DefaultServlet still supports more features of its own and more features of the various HTTP specs.)
Next, Issue #539 was resolved to allow the ResourceHandler (and now DefaultServlet) to have customized directory listings. To accomplish this, the HttpOutput.ContentFactory interface was introduced.
The new HttpOutput.ContentFactory was responsible for returning the HttpContent representing the provided path (with an optional maximum buffer size configuration option).
So that means, at this point we have ...
A DefaultServlet (or ResourceHandler)
Which has a ResourceService
Which gets its content from an HttpOutput.ContentFactory
The returned HttpContent can be a static resource, a directory listing, or a welcome file.
When it comes time to send a piece of static content the steps taken are ...
Ask for HttpContent object from HttpOutput.ContentFactory.getContent(path, maxBufferSize)
Ask for representation of HttpContent that can be used to send the referenced content, one of the following (in this order):
If HttpChannel is configured to use "direct buffers", then ask for HttpContent.getDirectBuffer() representing the entire content. (this could be a memory mapped file using a negligible amount of heap memory)
Ask for HttpContent.getIndirectBuffer() representing the entire content. (this could be a memory mapped file using a negligible amount of heap memory)
Ask for HttpContent.getReadableByteChannel() to send content.
Ask for HttpContent.getInputStream() to send content.
Return error indicating "Unknown Content"
There are 2 major implementations of HttpOutput.ContentFactory present in Jetty 9.4.0+
ResourceContentFactory, which handles transient content (not cached); if the content exceeds maxBufferSize, then raw ByteBuffer versions will not be returned.
CachedContentFactory, which caches the various ByteBuffer values returned from previous HttpOutput usage.
The CachedContentFactory has an isCacheable(Resource) method that is interrogated to determine whether the supplied resource should enter the in-memory cache.
With regards to the CompressedContentFormat[] precompressedFormats parameter in the CachedContentFactory constructor, that refers to the "pre-compressed" formats supported by both the ResourceService and the CachedContentFactory.
Typical, default, setup is ...
CompressedContentFormat[] precompressedFormats = {
CompressedContentFormat.GZIP, // gzip compressed
CompressedContentFormat.BR, // brotli compressed
new CompressedContentFormat("bzip", ".bz") // bzip compressed
};
CachedContentFactory cachedContentFactory = new CachedContentFactory(parentContentFactory,
resourceFactory, mimeTypes, useFileMappedBuffers,
useEtags, precompressedFormats);
resourceService.setContentFactory(cachedContentFactory);
These precompressedFormats refer to static (and immutable) content that has been precompressed before server startup.
This allows a client to send a request for say ...
GET /css/main.css HTTP/1.1
Host: example.hostname.com
Accept-Encoding: gzip, deflate
and if the "Base Resource" directory has a ${resource.basedir}/css/main.css AND a ${resource.basedir}/css/main.css.gz then the response will be served from the main.css.gz (not the main.css), resulting in an HTTP response like ...
HTTP/1.1 200 OK
Date: Wed, 15 May 2019 20:17:22 GMT
Vary: Accept-Encoding
Last-Modified: Wed, 15 May 2019 20:17:22 GMT
Content-Type: text/css
ETag: W/"H/6qTDwA8vsH/6rJoEknqc"
Accept-Ranges: bytes
Content-Length: 11222
Problem: I am getting corrupted output when I send the GET request for a webpage.
GET http://www.vox.com/a/maps-explain-the-middle-east HTTP/1.1\r\nHost: www.vox.com\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\n\r\n
The output is a bad file containing data like
���v�������/�:#�J�|d[��Xt��tF(�p3E%������?�Λ�'��\k��E�7�q����"�®}_sϵ�ӛv'�,,ƣ'�=���� K{O>K����l�&�A:ϳ���rѯ��U�4X,f��������_k?=�}9����p��%��d�M���g�Y�([��q��\K�B&)��fdz
But when I send
GET http://www.vox.com/a/maps-explain-the-middle-east HTTP/1.1\r\nHost: www.vox.com\r\n\r\n
I got the webpage. I actually added only a few extra headers taken from Firefox.
Response headers in both cases are
HTTP/1.0 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
Status: 200 OK
X-UA-Compatible: IE=Edge,chrome=1
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: f5e482e1dd57f613df9c1b416a65b9b2
X-Runtime: 0.039694
P3P: CP="CAO DSP COR CURa ADMa DEVa PSAa PSDa CONi OUR IND PHY ONL UNI COM NAV INT CNT STA"
Content-Encoding: gzip
Accept-Ranges: bytes
Date: Wed, 24 Sep 2014 10:39:19 GMT
Age: 0
X-Served-By: cache-iad2129-IAD, cache-lax1430-LAX
X-Cache: MISS, MISS
X-Cache-Hits: 0, 0
X-Timer: S1411555159.330146,VS0,VE108
Vary: Accept-Encoding
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:3128
Via: 1.1 varnish-v4, 1.1 varnish, 1.1 varnish, 1.0 localhost (squid/3.1.19)
Connection: close
I don't understand why this is happening. Is it some kind of compression?
Edit: But if compression is used, how would I recover the original data?
Yeah, I think it is probably due to the encoding scheme used, because if I don't use 'Accept-Encoding: gzip, deflate' I get the correct webpage. But I don't know how to recover the webpage from this encoding.
I have an app that appears to enable gzip encoding by default while sending data to the server.
We tried disabling the gzip compression by explicitly using:
IXMLHttpRequest2::SetRequestHeader(L"Accept-Encoding", L"") (on the HTTP Request Object, of course)
This still doesn't seem to help. Is there any way to prevent gzip from being enabled in the HTTP request headers from the C++ app?
Thanks!
To ask a server not to use a specific encoding, you should provide a list of Accept-Encoding values. From section 14.11 of RFC 2616 (HTTP/1.1) you can see that it takes one of these forms (the values are examples):
Accept-Encoding: compress, gzip
Accept-Encoding:
Accept-Encoding: *
Accept-Encoding: compress;q=0.5, gzip;q=1.0
Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
If the content-coding is one of the content-codings listed in the Accept-Encoding field, then it is acceptable, unless it is accompanied by a qvalue of 0. (As defined in section 3.9, a qvalue of 0 means "not acceptable.")
Then, to ask the server not to use gzip compression, you should provide, instead of an empty string, this value for Accept-Encoding:
gzip;q=0
This forbids the server from using gzip, but you have to provide another acceptable encoding. See section 3.5 for available encodings. Use the quality parameter q to inform the server about your preferences (and do not forget that if it can't satisfy your request with an acceptable encoding, it will reply with a 406 error). For example:
identity;q=1.0, gzip;q=0.5
In this way you ask for the identity encoding and, in case it's not available, you accept gzip too (this prevents the server from replying with an error if, for any reason, it can't use any other encoding for your request). You may try the performance of other encodings too (compress and deflate, for example).
Code
Then, finally, you have to call IXMLHttpRequest2::SetRequestHeader(L"Accept-Encoding", L"identity;q=1.0, gzip;q=0.5"). Note that SetRequestHeader appends to the headers sent by default, so if you specify an empty string the value won't actually be changed. (How an empty value is interpreted may depend on the server; I didn't find a proper specification about this, so you may want to inspect both your HTTP request and the response to check what is actually sent and received.)
Old value: Accept-Encoding: compress
Call: IXMLHttpRequest2::SetRequestHeader(L"Accept-Encoding", L"")
New value: Accept-Encoding: compress
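To see how a server is expected to interpret such a header, here is a rough illustrative sketch (not part of IXMLHttpRequest2, and simplified: it matches codings by substring and ignores whitespace and wildcard rules) that extracts the qvalue assigned to a given coding:

```cpp
#include <cstdlib>
#include <string>

// Returns the qvalue the Accept-Encoding header assigns to `coding`:
// 1.0 when the coding is listed without an explicit q parameter,
// -1.0 when it is not listed at all.  Per RFC 2616 section 3.9,
// a qvalue of 0 means "not acceptable".
double qvalue_for(const std::string& header, const std::string& coding)
{
    std::string::size_type pos = header.find(coding);
    if (pos == std::string::npos)
        return -1.0;
    std::string::size_type q = header.find("q=", pos);
    std::string::size_type next = header.find(',', pos);
    if (q == std::string::npos || (next != std::string::npos && q > next))
        return 1.0;                       // listed, no explicit qvalue
    return std::strtod(header.c_str() + q + 2, NULL);
}
```

With "identity;q=1.0, gzip;q=0.5" this yields 1.0 for identity and 0.5 for gzip, so a conforming server should prefer identity and only fall back to gzip.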
Is there a way to turn Ring's hash map of parameters back into the original request and response headers that the browser and server use to communicate?
So basically, instead of the hash-map structure that Ring provides, I want to be able to generate and parse the raw text headers using Ring.
Request:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Cookie:__qca=P0-1122510804-1338534864474; usr=t=pvgxSE5uUO9s&s=Ir7otYoeUaMb; __utma=140029553.119380626.1338534864.1340057197.1340064637.52; __utmb=140029553.6.10.1340064637; __utmc=140029553; __utmz=140029553.1340000628.50.23.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
Host:stackoverflow.com
If-Modified-Since:Tue, 19 Jun 2012 00:10:35 GMT
Referer:http://stackoverflow.com/posts/11092804/edit
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.34 Safari/536.11
Response:
Cache-Control:public, max-age=60
Content-Encoding:gzip
Content-Length:33233
Content-Type:text/html; charset=utf-8
Date:Tue, 19 Jun 2012 00:26:48 GMT
Expires:Tue, 19 Jun 2012 00:27:48 GMT
Last-Modified:Tue, 19 Jun 2012 00:26:48 GMT
Set-Cookie:usr=t=Qz5ObGDYskmu&s=Ir7otYoeUaMb; domain=.stackoverflow.com; expires=Wed, 19-Dec-2012 00:26:48 GMT; path=/; HttpOnly
Vary:*
I don't think that Ring provides that, because the purpose of Ring is to create abstractions over the HTTP request/response using hash maps and similar structures; even if Ring gave you low-level access to the raw header strings, that would be a leaky abstraction. If you really want to access raw headers and generate raw response headers, then you can use a web server interface like Jetty's directly rather than going through Ring.