Jetty and max content size

I use Jetty 9.4.8 and I want to limit the amount of data that can be POSTed to the server. For that I added the following to jetty.xml:
<Call name="setAttribute">
<Arg>org.eclipse.jetty.server.Request.maxFormContentSize</Arg>
<Arg>10000</Arg>
</Call>
I tested Jetty like this (request-xxlarge.xml is a 20 MB text file):
curl -X POST --data @request-xxlarge.xml http://localhost:8080/test -v
As a result I got:
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /test HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 21232818
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 200 OK
< Content-Length: 4
< Connection: close
< Server: Jetty(9.4.8.v20171121)
<
* Closing connection 0
POST
The server processed the request.
Edit:
The server returns 413 when Content-Type is application/x-www-form-urlencoded.
But I expect it to be a web service and I want to process Content-Type: application/soap+xml - then the maxFormContentSize parameter does not work.
How can I limit the size of the body for a web service?

There are 3 limits that Jetty can impose on request body content based on specific Content-Type values.
Content-Type: application/x-www-form-urlencoded
This is the standard HTML <form> submission, where the submitted form is accessed via the HttpServletRequest.getParameter(String key) API.
This kind of request body content can be limited in 2 different ways.
org.eclipse.jetty.server.Request.maxFormContentSize - this is a limit on the overall size (in bytes) of the request body content.
org.eclipse.jetty.server.Request.maxFormKeys - this is a limit on the overall number of form keys of the form being submitted.
The above 2 keys expect integer values and can be set using any of the following techniques:
1) As a Server attribute that applies to all deployed webapps (see the embedded sketch after this list)
server.setAttribute("org.eclipse.jetty.server.Request.maxFormContentSize", 100000);
2) As a System Property that applies to all deployed webapps
$ java -Dorg.eclipse.jetty.server.Request.maxFormContentSize=100000 \
-jar /path/to/jetty-home/start.jar
or
System.setProperty("org.eclipse.jetty.server.Request.maxFormContentSize", "1000000")
3) As a Context parameter on the deployed WebAppContext
In your ${jetty.base}/webapps/<context>.xml you can set these values
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
"http://www.eclipse.org/jetty/configure_9_3.dtd">
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
<Set name="contextPath">/mycontext</Set>
<Set name="war"><Property name="jetty.webapps"/>/mycontext.war</Set>
<Set name="maxFormKeys">100</Set>
<Set name="maxFormContentSize">202020</Set>
</Configure>
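For example, technique 1 in embedded Jetty might look like this minimal sketch (the port and the handler setup are illustrative placeholders, not part of the original answer):

import org.eclipse.jetty.server.Server;

public class MaxFormSizeExample {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);
        // Applies to every webapp/handler deployed on this Server instance
        server.setAttribute("org.eclipse.jetty.server.Request.maxFormContentSize", 100000);
        server.setAttribute("org.eclipse.jetty.server.Request.maxFormKeys", 200);
        // ... add your handlers / webapp contexts here ...
        server.start();
        server.join();
    }
}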
Content-Type: multipart/form-data
This is also a standard form submission as above, but the data can also be accessed via the HttpServletRequest.getPart(String name) APIs.
This kind of data is restricted via the configuration present in the destination Servlet's @MultipartConfig annotation (or the WEB-INF/web.xml entries for the <servlet><multipart-config> elements).
To limit this kind of content type by entire request size, use the maxRequestSize configuration, as in the sketch below.
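For example, a servlet limited this way could be sketched like this (hypothetical names; the limits are illustrative):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical upload servlet: multipart requests over maxRequestSize are rejected
@WebServlet("/upload")
@MultipartConfig(
    maxFileSize = 5_000_000,     // limit for any single part, in bytes
    maxRequestSize = 10_000_000) // limit for the entire multipart request, in bytes
public class UploadServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        req.getPart("file"); // multipart parsing happens here and enforces the limits
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}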
All other request Content-Type entries
For all other Content-Type values, the servlet spec and Jetty are not involved in parsing or limiting the size.
If you are using a library to handle your requests (REST, SOAP, etc.), check with that library to see if it has a mechanism to reject requests based on size, or other reasons (e.g. not well-formed, missing expected/required data, etc.).
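For the asker's SOAP case specifically, one do-it-yourself option is a servlet Filter in front of the endpoint that rejects requests whose declared Content-Length exceeds a limit. This is a rough sketch, not a built-in Jetty feature, and note that it cannot catch chunked requests that omit Content-Length:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: map it to your web service's path in web.xml
public class MaxBodySizeFilter implements Filter {
    private static final long MAX_BODY_BYTES = 10_000;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // getContentLengthLong() returns -1 when the header is absent (e.g. chunked)
        if (request.getContentLengthLong() > MAX_BODY_BYTES) {
            ((HttpServletResponse) response).sendError(
                HttpServletResponse.SC_REQUEST_ENTITY_TOO_LARGE, "Request body too large");
            return;
        }
        chain.doFilter(request, response);
    }

    @Override public void init(FilterConfig filterConfig) { }
    @Override public void destroy() { }
}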
As always, if you want a feature, feel free to request it on the Eclipse Jetty issue tracker at
https://github.com/eclipse/jetty.project/issues

Not sure what your issue is, but I'll give it a shot.
###Theory###
The Expect: 100-continue header is specified in HTTP 1.1 and allows the server to acknowledge or reject a POST/PUT request immediately after the headers are sent but before the client starts sending potentially large amounts of actual data (the body of the request). A conforming client must then wait for an HTTP/1.1 100 Continue response before sending the data. This scheme is advantageous for larger POST/PUT requests: in the case of rejection, the client and server don't waste their time with superfluous network communication.
###Possible Solutions###
Disable Expect Logic
curl -H 'Expect:' -H 'Transfer-Encoding: ...' -H 'Content-Type: ...'
--data-binary '@mediumfile' example.com/path -v
Or honor the mechanism: send the packet with the big file only after the first response packet (100 Continue) arrives, as expected.
More info here : https://www.rfc-editor.org/rfc/rfc7231#section-5.1.1

Related

How to allow empty POST bodies / avoid 411

I've discovered that if you send a request with an empty POST body, meaning no Content-Length header, the GCP Load Balancer (in this case from an Ingress controller through GKE) will reject your request with this error:
$ curl -L -X POST 'http://example.com/fund?amount=0'
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>411 Length Required</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Length Required</h1>
<h2>POST requests require a <code>Content-length</code> header.</h2>
<h2></h2>
</body></html>
Assume I can't change the clients, is there some way to make the LB just accept empty bodies in POST requests?
The workaround available at the moment is to add the header Content-Length: 0 if you are sending HTTP POST requests with an empty body. Per this RFC 2616 documentation:
If no response body is included, the response MUST include a Content-Length field with a field-value of "0"
Or set the client to use HTTP/1.1 by default:
For compatibility with HTTP/1.0 applications, HTTP/1.1 requests containing a message-body MUST include a valid Content-Length header field unless the server is known to be HTTP/1.1 compliant. If a request contains a message-body and a Content-Length is not given, the server SHOULD respond with 400 (bad request) if it cannot determine the length of the message, or with 411 (length required) if it wishes to insist on receiving a valid Content-Length.
Both options require intervention from the client side. Unfortunately, there are no available workarounds/adjustments that can be done from the GCP Load Balancer side at this time.
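For illustration, this is what the first workaround can look like on the client side with Java 11's HttpClient (a minimal sketch; the URL is from the question, and BodyPublishers.noBody() has a known length of zero, so the client sends Content-Length: 0):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EmptyPost {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // An explicit zero-length body yields a Content-Length: 0 header
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://example.com/fund?amount=0"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}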

Google Cloud Run: Webhook POST causes 400 Response

We are catching a BigCommerce webhook event in our Google Cloud Run application. The request looks like:
Headers
host: abc-123-ue.a.run.app
AccountId: ABC
Content-Type: application/json
Password: Goodbye
Platform: BC
User-Agent: akka-http/10.1.10
Username: Hello
Content-Length: 197
Connection: keep-alive
Body
{"created_at":1594914374,"store_id":"1001005173","producer":"stores/gy68868uk5","scope":"store/product/created","hash":"139fab64ded23b3e1b8473ba24ab21bedd3f535b","data":{"type":"product","id":132}}
For some reason, this causes a 400 response from Google Cloud Run. Our application doesn't even seem to be passed the request. All other endpoints work (including other POST requests).
Any ideas?
Edit
In the original post, I had the path in the host header. This was a mistake made in creating this post and not the actual value passed to us. We can only inspect the request via Requestbin (I can't find the request values anywhere in the Google logs), so I'm speculating on the host value and made a mistake writing it out here.
Research so far...
So upon further testing, it seems that BigCommerce Webhooks also fail to send to any Google Cloud Function we set up. As a workaround, I'm having Pipedream catch the webhook and send the payload to our application. No problems there. This endpoint also works with mirror payloads from local and Zapier which seems to eliminate authentication errors.
We are running FastAPI on Google Run and the simplest function on Google Cloud Functions. This seems to be an error with how Google Serverless and BigCommerce Webhook Events communicate with each other. I'm just not sure how...
Here are the headers we managed to capture on one of the only times a BigCommerce Webhook Event came through to our Google Cloud Function:
Content-Length: 197
Content-Type: application/json
Host: us-central1-abc-123.cloudfunctions.net
User-Agent: akka-http/10.1.10
Forwarded: for="0.0.0.0";proto=https
Function-Execution-Id: unes7v34vzyo
X-Appengine-Country: ZZ
X-Appengine-Default-Version-Hostname: f696ddc1d56c3fd66p-tp.appspot.com
X-Appengine-Https: on
X-Appengine-Request-Log-Id: 5f10e15c00ff082ecbb02ee3a70001737e6636393664646331643536633366643636702d7470000165653637393633633164376565323033383131366437343031613365613263303a36000100
X-Appengine-Timeout-Ms: 599999
X-Appengine-User-Ip: 0.0.0.0
X-Cloud-Trace-Context: a62207698d141465d0f38488492d088b/9870406606828581415
X-Forwarded-For: 0.0.0.0
X-Forwarded-Proto: https
Accept-Encoding: gzip
Connection: close
> host: abc-123-ue.a.run.app/bigcommerce/webhooks/
This is most likely the issue. Host headers must contain only the hostname, not the request path.
You can clearly see this will fail:
$ curl -IvH 'Host: pdf-2wvlk7vg3a-uc.a.run.app/foo' https://pdf-2wvlk7vg3a-uc.a.run.app
...
HTTP/2 400
However, if you don't craft the Host header yourself, it will work.

JMeter: "This site does not specify a policy in the P3P header" error

I am trying to hit the URL https://subdomain.example.com in JMeter. I recorded the script using the BlazeMeter Chrome extension and it has all the necessary config elements, but I get an error:
HTTP/1.1 429 Too Many Requests
Content-Type: text/html; charset=utf-8
Content-Length: 1031
Connection: keep-alive
Cache-Control: private, no-cache, no-store, must-revalidate
Date: Tue, 20 Aug 2019 01:21:35 GMT
Expires: 0
p3p: CP="This site does not specify a policy in the P3P header"
I have tried copying the header cookies from the browser's header response, which works for some time but then starts throwing an error.
As per HTTP Status Code 429 Too Many Requests description:
The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time ("rate limiting").
A Retry-After header might be included in this response indicating how long to wait before making a new request.
So there are the following options (a client-side sketch for honoring Retry-After follows this list):
Your server is overloaded, in this case there is nothing you can do here apart from reporting the error as the bottleneck
Your script doesn't have proper correlation implemented, i.e. you're sending recorded hard-coded values instead of getting dynamic parameters
Your server doesn't allow that amount of requests from a single IP address within the given timeframe; you could try implementing IP spoofing so your server would "think" that the requests are coming from different machines.
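For illustration (outside JMeter), a client that honors Retry-After might look like this minimal Java sketch; the URL is a placeholder and only the seconds form of the header is handled:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RetryAfterClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://subdomain.example.com/"))
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 429) {
            // Retry-After can be seconds or an HTTP date; only seconds handled here
            long waitSeconds = response.headers()
                .firstValue("Retry-After")
                .map(Long::parseLong)
                .orElse(5L);
            Thread.sleep(waitSeconds * 1000);
            response = client.send(request, HttpResponse.BodyHandlers.ofString());
        }
        System.out.println(response.statusCode());
    }
}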
Thanks for your reply. In the end I figured out that no limitation on the number of calls was implemented.
Coming to the answer, this is how I managed to make it work:
I opened the page in Chrome and copied all the header elements from the header section into the Header Manager, hard-coded.
The first time it fails and returns p3p: CP="This site does not specify a policy in the P3P header", but it also returns the updated variable value needed for the next request, which I extract and use in the next and subsequent requests. I was able to find out which variable was changing by doing a string comparison of 2 response headers.
This was a difficult one but it worked with a very minor change; I also added the Header Manager to each request to be on the safe side.

URL forbidden 403 when using a tool but fine from browser

I have some images for which I need to do an HttpRequestMethod.HEAD request in order to find out some details of the image.
When I go to the image URL in a browser it loads without a problem.
When I attempt to get the header info via my code or via online tools, it fails.
An example URL is http://www.adorama.com/images/large/CHHB74P.JPG
As mentioned, I have used the online tool Hurl.it to try to perform the HEAD request, but I am getting the same 403 Forbidden message that I am getting in my code.
I have tried adding many various headers to the Head request (User-Agent, Accept, Accept-Encoding, Accept-Language, Cache-Control, Connection, Host, Pragma, Upgrade-Insecure-Requests) but none of this seems to work.
It also fails to do a normal GET request via Hurl.it. Same 403 error.
If it is relevant, my code is a C# web service and is running on the AWS cloud (just in case the Adorama servers have something against AWS that I don't know about). To test this I have also spun up an EC2 (Linux box) and run curl, which also returned the 403 error. Running curl locally on my personal computer returns the binary image, which is presumably just the image data.
And just to remove the obvious thoughts: my code works successfully for many, many other websites; it is just this one where there is an issue.
Any idea what is required for me to download the image headers and not get the 403?
Same problem here.
Locally it works smoothly; doing it from an AWS instance I get the very same problem.
I thought it was a DNS resolution problem (redirecting to a malfunctioning node). I therefore tried to specify the same IP address as resolved by my client, but that didn't fix the problem.
My guess is that Akamai (the service is provided by an Akamai CDN in this case) is blocking AWS. It is somewhat understandable: customers pay for CDN by traffic, so by abusing it people can generate huge bills.
Connecting to www.adorama.com (www.adorama.com)|104.86.164.205|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 301
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 09:34:20 GMT
Connection: close
2016-03-23 09:34:20 ERROR 403: Forbidden.
I tried that URL from Amazon and it didn't work for me. wget did work from other servers that weren't on Amazon EC2 however. Here is the wget output on EC2
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:42:33-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.40.219.79
Connecting to www.adorama.com|23.40.219.79|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 299
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 08:42:33 GMT
Connection: close
2016-03-23 08:42:33 ERROR 403: Forbidden.
But from another Linux host it did work. Here is output
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:43:11-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.45.139.71
Connecting to www.adorama.com|23.45.139.71|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Content-Type: image/jpeg
Last-Modified: Wed, 23 Mar 2016 08:41:57 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
ServerID: C01
Content-Length: 15131
Cache-Control: private, max-age=604800
Date: Wed, 23 Mar 2016 08:43:11 GMT
Connection: keep-alive
Set-Cookie: 1YDT=CT; expires=Wed, 20-Apr-2016 08:43:11 GMT; path=/; domain=.adorama.com
P3P: CP="NON DSP ADM DEV PSD OUR IND STP PHY PRE NAV UNI"
Length: 15131 (15K) [image/jpeg]
Saving to: “CHHB74P.JPG”
100%[=====================================>] 15,131 --.-K/s in 0s
2016-03-23 08:43:11 (460 MB/s) - “CHHB74P.JPG” saved [15131/15131]
I would guess that the image provider is deliberately blocking requests from EC2 address ranges.
The reason the wget outgoing IP address is different in the two examples is due to DNS resolution of the CDN provider that Adorama is using.
Web servers may implement ways to check particular fingerprint attributes to prevent automated bots. Here are a few of them they can check:
GeoIP, IP address
Browser headers
User agents
Plugin info
Browser fonts
You may simulate the browser headers and learn some fingerprinting "attributes" here: https://panopticlick.eff.org
You can try to replicate how a browser behaves and inject similar headers/user-agent. Plain curl/wget are not likely to satisfy those conditions; even tools like PhantomJS occasionally get blocked. There is a reason why some prefer tools like Selenium WebDriver that launch an actual browser.
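For example, a HEAD request that injects browser-like headers could be sketched like this in Java (whether it passes depends entirely on which attributes the CDN actually fingerprints; the User-Agent is the one reported as working in the next answer):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HeadLikeABrowser {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // HEAD with headers a typical browser would send
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://www.adorama.com/images/large/CHHB74P.JPG"))
            .method("HEAD", HttpRequest.BodyPublishers.noBody())
            .header("User-Agent",
                "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0")
            .header("Accept", "image/webp,*/*")
            .header("Accept-Language", "en-US,en;q=0.5")
            .build();
        HttpResponse<Void> response =
            client.send(request, HttpResponse.BodyHandlers.discarding());
        System.out.println(response.statusCode());
        response.headers().map().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}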
I found that another URL, also protected by AkamaiGHost, was blocking due to certain parts of the user agent. In particular, a user agent containing a link with a protocol was blocked:
Using curl -H 'User-Agent: some-user-agent' https://some.website I found the following results for different user agents:
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0 okay
facebookexternalhit/1.1 (+http\://www.facebook.com/externalhit_uatext.php): 403
https ://bar: okay
https://bar: 403
All I could find for now is this (downvoted) answer https://stackoverflow.com/a/48137940/230422 stating that colons (:) are not allowed in header values. That is clearly not the only thing happening here, as the Mozilla example also has a colon, only not in a link.
I guess that at least most web servers don't care and allow Facebook's bot and other bots that have a contact URL in their user agent. But apparently AkamaiGHost does block it.

HTTP sending response to OPTIONS request [C]

Getting a "Response is null" error while receiving the HTTP response.
I am developing a small sample HTTP server in C using raw sockets.
There are actually 2 servers in my application: one is a standard Apache server which I am using for serving HTML pages, and my small server will respond only to XMLHttpRequests sent from the JavaScript within the HTML pages.
I am sending request from JavaScript as follows:
var sendReq = new XMLHttpRequest();
sendReq.open("POST", "http://localhost:10000/", true);
sendReq.setRequestHeader('Content-Type','application/x-www-form-urlencoded');
sendReq.onreadystatechange = handleResult;
var param = "REQUEST_TYPE=2002&userName=" + userName.value;
param += "&password=" + password.value;
sendReq.send(param);
When I send this request I receive following Request in my server code:
OPTIONS / HTTP/1.1
Host: localhost:10000
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Origin: http://localhost:7777
Access-Control-Request-Method: POST
I have replied to this Request as follows using socket write function:
HTTP/1.1 200 OK\n
Access-Control-Allow-Origin: *\n
Server: PSL/1.0 (Unix) (Ubuntu/Linux)\n
Access-Control-Allow-Methods: POST, GET, OPTIONS\n
Accept-Ranges: bytes\n
Content-Length: 438\nConnection: close\n
Content-Type: text/html; charset=UTF-8\n\n
I don't know what the actual HTTP response to the OPTIONS request should be.
After this I get the actual POST request that I sent from JavaScript, and then I respond back with:
HTTP/1.1 200 OK\n\n
And then at the browser end I get the error "Response is null".
So how do I send headers/data as an HTTP response using raw sockets in C, and how do I respond to an OPTIONS request? Can someone explain by giving an example?
It's hard to understand your question, but I believe you are pointing to this as the response giving you trouble:
HTTP/1.1 200 OK\n\n
You should be including other fields, especially the Content-Length and Content-Type. If you're going to build your own HTTP server, then you should review the protocol specifications.
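For the OPTIONS (CORS preflight) request in particular, a minimal response could look like the following sketch, assuming CORS is what you are implementing. Note that each header line must end with \r\n (not just \n) and a blank line terminates the header block; a 204 with no body also spares you the Content-Length bookkeeping:
HTTP/1.1 204 No Content
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Allow-Headers: Content-Type
Connection: close

The later POST response then needs its own Access-Control-Allow-Origin header plus accurate Content-Type and Content-Length fields for its body.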
That said, it's not at all clear why you need to replace the HTTP server instead of using either CGI or another server-side language (PHP, Java, etc.). This is significantly reducing your portability and maintainability.
Finally, you appear to be transmitting the password in the request. Make sure that this is only done over some kind of encrypted (HTTPS) or else physically secured connection.
I'm not sure what you're asking, but you might find the following useful:
HTTP Made Really Easy
HTTP/1.1 rfc2616.txt
MAMA - Opera Developer Community
I found them all quite useful when I was writing an HTTP client.
This problem occurred because, after our server processed the OPTIONS request, any subsequent requests for some reason had to be responded to with "Access-Control-Allow-Origin: *" along with the other normal headers and the response body.
After providing this line in our responses, I always got the desired responseText/responseXML in my JavaScript.