Jetty - large messages filtering

Hi, I want to refuse incoming requests with a too-large body or header in my Jetty server. I suppose I have to set up some filter, but I haven't found any solution. Do you have any suggestions? Thanks.

It is easy enough to build a Servlet Filter or Jetty Handler that pays attention to the request's Content-Length header and rejects the request by responding with an HTTP error status code (such as 413).
As for the header size limit, that is controlled by HttpConfiguration.setRequestHeaderSize(int).
However, there is a class of requests that uses chunked Transfer-Encoding. With these requests there is no Content-Length, so you will have to reject the request while reading from HttpServletRequest.getInputStream(), once it exceeds a certain size.
There is also the complication of Mime multi-part request body content and how you determine the request content is too large.
One other note: unfortunately, due to how HTTP connection handling must be performed, even if a client sends too large a request body, the server still has to read the entire body content and throw it away. This is the half-closed scenario found in the spec; it is up to the client to see the early rejected HTTP response and close/terminate the connection.
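The Content-Length check above is trivial; the harder part is capping a chunked body while reading it. Here is a minimal sketch in plain Java (the class name SizeCappedInputStream is my own invention; in a real servlet Filter you would wrap the stream from HttpServletRequest.getInputStream() with something like this and translate the failure into a 413 response):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Wraps a request body stream and fails once more than maxBytes have been
// read. Works for chunked requests, where no Content-Length is available.
class SizeCappedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long readSoFar;

    SizeCappedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1 && ++readSoFar > maxBytes) {
            throw new IOException("request body exceeds " + maxBytes + " bytes");
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            readSoFar += n;
            if (readSoFar > maxBytes) {
                throw new IOException("request body exceeds " + maxBytes + " bytes");
            }
        }
        return n;
    }
}

public class Main {
    public static void main(String[] args) {
        // A 2048-byte body drained through a 1024-byte cap gets rejected.
        byte[] body = new byte[2048];
        try (InputStream in = new SizeCappedInputStream(
                new ByteArrayInputStream(body), 1024)) {
            byte[] buf = new byte[512];
            while (in.read(buf) != -1) { /* drain */ }
            System.out.println("accepted");
        } catch (IOException tooBig) {
            System.out.println("rejected: " + tooBig.getMessage());
        }
    }
}
```

The per-read counting means the request fails early, as soon as the limit is crossed, rather than after buffering the whole body.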

Related

Discriminating between infrastructure and business logic when using HTTP status codes

We are trying to build a REST interface that allows users to test the existence of a specific resource. Let's assume we're selling domain names: the user needs to determine if the domain is available.
An HTTP GET combined with 200 and 404 response codes seems sensible at first glance.
The problem we have is discriminating between a request successfully served by our lookup service, and a request served under exceptional behaviour from other components. For example:
404 and 200 can be returned by intermediary proxies that actually block the request. This can be due to proxy misconfiguration, or even to external infrastructure such as coffee-shop Wi-Fi using poor forms-based authentication.
Clients could be using broken URLs. This could occur through deprecation or (again) by misconfiguration. We could combat the former through 301, however.
What is the current best practice for discriminating between responses that have been successfully fulfilled against the client's intention for that request, and responses served through exceptional behaviour?
The problem is eliminated by tunnelling responses through the response body, as we can ensure these are unique to our service. However, this doesn't seem very RESTful!
Simply have your application add some content to its HTTP responses that will distinguish them from the responses thrown by intermediaries. Any or all of these would work:
Information about the error in the response content that is recognizable as your application's content (for example, Application error: Domain name not found (404))
A Content-Type header in the response that indicates that the response content should be decoded as an application error (for example, Content-Type: application/vnd.domain-finder.error+json)
A custom header in the response that indicates it is an application error
Once you implement a scheme like this, your API clients will need to be aware of the mechanism you choose if they want to react differently to application errors versus infrastructure errors, so just document it clearly.
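The second and third options can be sketched with nothing but the JDK. The media type and the custom header name below (application/vnd.domain-finder.error+json, X-Application-Error) are illustrative inventions, not a standard:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class Main {
    // Starts a throwaway server whose 404 is recognizably an *application*
    // error, then probes it and classifies the response.
    static String probe() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/domains/", exchange -> {
            byte[] body = "{\"error\": \"Domain name not found\"}"
                    .getBytes(StandardCharsets.UTF_8);
            // Both markers: a vendor content type and a custom header.
            exchange.getResponseHeaders().set("Content-Type",
                    "application/vnd.domain-finder.error+json");
            exchange.getResponseHeaders().set("X-Application-Error", "true");
            exchange.sendResponseHeaders(404, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(
                            "http://localhost:" + port + "/domains/doesnt.com/")).build(),
                    HttpResponse.BodyHandlers.ofString());
            // A 404 carrying our marker header is an application answer
            // ("domain is available"); a bare 404 would be infrastructure.
            boolean appError = resp.headers()
                    .firstValue("X-Application-Error").isPresent();
            return resp.statusCode() + " appError=" + appError;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe());
    }
}
```

The client-side check is one header lookup, which is exactly why the scheme needs to be documented: clients that don't know about the marker will treat both kinds of 404 the same way.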
I tend to follow the "do what's RESTful as long as it makes sense" line of thinking.
Let's say you have an API that looks like this:
/api/v1/domains/<name>/
Hitting /api/v1/domains/exists.com/ could then return a 200 with some whois information.
Hitting /api/v1/domains/doesnt.com/ could return a 404 with links to purchase options.
That would probably work. If the returned content follows a strict format (e.g. a JSON response with a results key) then your API's responses can be differentiated from your proxies' responses.
Alternatively, you could offer
/api/v1/domains/?search=maybe
/api/v1/domains/?lookup=maybe.com
This is now slightly less RESTful but it's still self-describing and (in my opinion) not that bad. Now every response can be a 200 and your content can reveal the results.

Mongoose Web Server HTTP Headers extremely slow

I have a mongoose server, with commands callable with AJAX. I get a CORS error if I call it without sending HTTP headers from mongoose (but visiting the address with the browser works just fine), but when I do send headers, it may take up to a minute before I get a response (but it does work), both with AJAX and the browser. My reply code:
//without headers
mg_printf(conn,reply.c_str());
//with headers
mg_printf(conn,"HTTP/1.1 200 OK\r\n"
"Content-Type: text/plain\n"
"Cache-Control: no-cache\n"
"Access-Control-Allow-Origin: *\n\n"
"%s\n", reply.c_str());
How can I speed this up? Am I sending my headers wrong?
OK, I found a solution: it works if I first check whether the request is an API call or not, and only send the headers when it is.
The reason Mongoose is slow is that it waits for the rest of the content until it times out. And the reason it waits is that you do not set Content-Length, in which case the "end of content" marker is the closing of the connection.
So the correct solution is:
Add a Content-Length header with the correct body length, OR
Alternatively, use the mg_send_header() and mg_printf_data() functions, in which case you don't need to bother with Content-Length because these functions use chunked encoding.
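The first option just means computing the byte length of the body before emitting the headers. A language-neutral illustration of building such a response (the helper name buildResponse is my own; note also that HTTP requires \r\n after every header line, while the original snippet mixed \r\n and \n):

```java
import java.nio.charset.StandardCharsets;

public class Main {
    // Builds a complete HTTP response with a correct Content-Length, so the
    // client knows exactly where the body ends and does not sit waiting for
    // a timeout or for the connection to close.
    static String buildResponse(String body) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        return "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain\r\n"
                + "Cache-Control: no-cache\r\n"
                + "Access-Control-Allow-Origin: *\r\n"
                + "Content-Length: " + bytes.length + "\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        System.out.print(buildResponse("hello"));
    }
}
```

The length must be the byte count of the encoded body, not the character count, which is why the sketch encodes to UTF-8 before measuring.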

Several 100-continue received from the server

I'm using the libcurl (C++) library to make a request to an IIS 7.5 server. The transaction is a common SOAP web service.
Everything is working fine: my requests send an "Expect: 100-continue" header, and the server responds with a 100 Continue and immediately after that a 200 OK code along with the web service response.
But from time to time, the client receives a 100 Continue message and, after that, another 100 code. This makes the client report an error, as it expects a final status code right after the server's 100 code. I read this in the W3C HTTP/1.1 protocol:
An origin server that sends a 100 (Continue) response MUST
ultimately send a final status code, once the request body is
received and processed, unless it terminates the transport
connection prematurely.
The word "ultimately" makes me lose track. Is it possible/common for a server to send several 100 codes before the final status code?
If anyone has faced this issue before, can point me to any explanation on how to handle multiple 100 response codes with libcurl?
Thanks in advance
The current spec says this on 100-continue:
The 100 (Continue) status code indicates that the initial part of a
request has been received and has not yet been rejected by the server. The server intends to send a final response after the request has been fully received and acted upon.
When the request contains an Expect header field that includes a
100- continue expectation, the 100 response indicates that the server wishes to receive the request payload body, as described in
Section 5.1.1. The client ought to continue sending the request and discard the 100 response.
If the request did not contain an Expect header field containing the 100-continue expectation, the client can simply discard this interim response.
The way I read it, there is not supposed to be more than one 100 Continue response, and that's why libcurl works like this. I've never seen multiple 100 responses happen, and I've been doing HTTP for a while (I am the main developer of curl). To change this behavior, I would expect you'd need to patch libcurl slightly to allow for it.
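For illustration, the lenient client behavior (tolerating any number of interim 1xx responses before the final status) amounts to skipping header blocks until the status code is 200 or above. This is not libcurl's code, just a sketch of the parsing idea on a simplified header-only stream:

```java
public class Main {
    // Given a raw stream of HTTP response head(s), skip any number of
    // interim 1xx responses (each ends with a blank line) and return the
    // status line of the final response.
    static String finalStatusLine(String raw) {
        String[] messages = raw.split("\r\n\r\n");
        for (String msg : messages) {
            String statusLine = msg.split("\r\n", 2)[0];
            int code = Integer.parseInt(statusLine.split(" ")[1]);
            if (code >= 200) {
                return statusLine;  // the final response
            }
            // 1xx: interim response, discard and keep reading
        }
        throw new IllegalStateException("no final status received");
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 100 Continue\r\n\r\n"
                + "HTTP/1.1 100 Continue\r\n\r\n"  // the unusual second 100
                + "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";
        System.out.println(finalStatusLine(raw));
    }
}
```

A real client would of course read incrementally from a socket rather than splitting a string, but the loop structure is the same: discard 1xx heads until a >= 200 status arrives.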
It is not related to CURLOPT_FAILONERROR.
I suspect it's because there is an error that is not handled by the client properly. Make sure you set the CURLOPT_FAILONERROR flag.
See this SO post for more information.

Incorrect response from the server after GET request

When I send a request with "GET" in C++ like this:
GET / HTTP/1.1\r\nHost: site.com\r\n\r\n
I receive the proper answer. But when I build the request according to what browsers do (I captured the headers with a packet sniffer), the response from the server is 200 OK but the HTML body is garbage. The Content-Length shown in the header also proves that I didn't get the correct HTML response.
The problem occurs when adding "Accept-Encoding: gzip, deflate". I send exactly what the browser sends, but I receive a different response than the browser does.
Why do you think this happens?
If you accept gzipped content, the server may send gzipped content. (In fact, some buggy servers send gzipped content even if you don't say you accept it!)
Notice that in the returned headers, it will include Content-Encoding: gzip, or maybe deflate instead of gzip. This tells you about the encoding. If it is gzipped, you need to decompress it with a library like zlib.
Another thing you might see in replies to HTTP/1.1 requests is that the connection won't necessarily close when the response is complete, and you might get Transfer-Encoding: chunked, which formats the body differently. A chunked response is a series of chunks, each with a hex length followed by that much content, terminated by an empty chunk. Non-chunked responses, by contrast, are sent with a Content-Length header which tells you how much data to expect. The content length is the length of the data actually sent, which will be smaller if the data is compressed.
Unless you implement decompression, don't send Accept-Encoding. Chunked responses are something you'll probably have to implement, though, since they are common in HTTP/1.1, and if you only do HTTP/1.0 you won't get to use the important Host header.
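Decoding a chunked body follows directly from the description above: read a hex size line, read that many bytes, repeat until a zero-size chunk. A simplified sketch (real bodies are bytes rather than Strings, and this ignores chunk extensions and trailers):

```java
public class Main {
    // Decodes a Transfer-Encoding: chunked body. Each chunk is a hex length
    // on its own CRLF-terminated line followed by that many bytes of data
    // and a trailing CRLF; a "0" chunk terminates the body.
    static String decodeChunked(String body) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int eol = body.indexOf("\r\n", pos);
            int size = Integer.parseInt(body.substring(pos, eol).trim(), 16);
            if (size == 0) {
                break;  // terminating empty chunk
            }
            out.append(body, eol + 2, eol + 2 + size);
            pos = eol + 2 + size + 2;  // skip the data and its trailing CRLF
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The classic example: two chunks of 4 and 5 bytes.
        String chunked = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        System.out.println(decodeChunked(chunked));
    }
}
```

If the server also compressed the body, decompression (e.g. with zlib) happens after de-chunking: the chunk framing wraps the compressed stream, not the other way around.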

REST Web Service: Acceptable HTTP response content-type when responding with status 4XX (Client Error)

I have been unable to find anything in the HTTP specifications governing whether it is acceptable for an HTTP response to include a human-readable error message (e.g. Content-Type: text/plain) when an HTTP client has made an invalid HTTP request and has specified an Accept request header that limits the acceptable response content types.
Imagine a REST web service client issuing an invalid GET request to "http://myhost/validpath?illegalRequestParameter=rubbish", and including a request header "Accept: application/xml" or "Accept: application/vnd.ms-excel".
The server would respond with an HTTP status code in the 4XX series ("400 Bad Request", in this case). But how would the service be able to convey information to the client about the cause of the error?
I see the following options:
1. Create a plaintext error message in the HTTP response content. Set the response header "Content-Type: text/plain" and include a descriptive error message in the response content. This would, however, break the HTTP client's "Accept" restriction.
2. Don't include any HTTP response content. This is clearly valid, but not very helpful to a client that only knows that a "Client Error" occurred and has no way of knowing why (or of reporting the reason in a client log file).
3. Try to coerce an error message into an "Accept"-able MIME type. This is rarely possible. Even if the error message could be constructed as a valid application/xml document, it would likely break a web service contract (e.g. XML Schema conformance).
My question is: Is the above situation governed by existing HTTP specifications/standards?
References:
HTTP Status Code Definitions: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
HTTP Header Field Definitions: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
If the Accept header is application/xml, then you should return your error message as XML. It will break the client, but that doesn't matter; the client isn't getting the information it requested anyway. At least the client is able to parse the error message...
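Concretely, that means picking the error representation based on the Accept header. The error format below is my own invention for illustration, not something mandated by any spec:

```java
public class Main {
    // Chooses an error representation matching what the client said it
    // accepts: an XML document for application/xml, plain text otherwise.
    static String errorBody(String acceptHeader, int status, String message) {
        if (acceptHeader != null && acceptHeader.contains("application/xml")) {
            return "<error><status>" + status + "</status>"
                    + "<message>" + message + "</message></error>";
        }
        return message;  // fall back to plain text
    }

    public static void main(String[] args) {
        System.out.println(errorBody("application/xml", 400,
                "Bad Request: unknown parameter"));
        System.out.println(errorBody("text/plain", 400,
                "Bad Request: unknown parameter"));
    }
}
```

A production version would escape the message for XML and do proper media-type matching (q-values, wildcards) rather than a substring check.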