How to control chunks in jetty? - jetty

I'm writing HTTP webservice that may take quite long to produce results. I use embedded Jetty 8.1.5 and JAX-RS (Apache CXF)
I decided to go with some kind of control protocol: when new request comes, I start a long-running job in a separate thread and periodically write to HttpOutputStream lines with the current status ("CONTROL_MESSAGE: 42% done")
The problem is that Jetty uses chunk encoding here, so my status messages are buffered and are useless as they all can be buffered in a single chunk, providing no progress for a client.
I can't use Content-Length property as I don't know end result length. HttpOutputStream.flush() doesn't work as Jetty uses internal buffers.
As I see it, I need a way to tell Jetty "please finish current chunk and flush it", but don't know how.

Actually I'm not sure your problem is chunking. If I get you right you are writing the progress to the same stream where the result will finally go?
this can be done with or without chunking simply by calling flushBuffers on the response.
However, it then depends on your connection and your browser if this makes it into the client.
A transparent proxy may aggregate the content and not flush until the response is complete or its own buffer is full. The browser might not act on any content until the response is complete or its buffer is full.
In some implementations of comet that use this technique, you have to send 512 byes of white space after the pushed message.
You'd be better off using something like server sent events, websocket or long polling to send progress to the client..... better yet - just use cometd.org, which will select the best transport for the client available.

You should be able to disable chunking by disabling the persistent connection itself by adding a Connection: close header to the response, it is that or knowing the Content-Length ahead of time. Or I suppose you could just use HTTP/1.0 as well. I'll open a bug with jetty documentation to get this better documented.

Related

Testing: When writing a HTTP response to the socket, write the headers, then sleep before writing the body

This is surely a weird one ... I'm doing some extreme integration style testing on a custom Java HTTP client for a backend service I'm working with. For reasons which don't matter here, the client has some specific quirks and a custom solution was the only real option.
For automated testing, I've built a "fake" version of the backend service by spinning up a Jetty server locally and having it behave in different ways e.g. return 500, wait e.g. 4 seconds before giving a response to simulate latency etc and firing off a battery of tests against it with the client on build time.
Given the nature of this client, there is an usual and specific scenario which I need to test and I'm trying to find a way to make my Jetty serve behave in the correct fashion. Basically, when returning HTTP response, I need to immediately return the HTTP Headers and the first few bytes of the HTTP body and then sleep. The goal is to trigger a socket timeout in the client specifically when reading the HTTP body.
Anyone know where in Jetty I could plug something in to force this behaviour? Was looking at the Connector interface but not so sure thats the right place.
Thanks for any suggestions.
Write a few bytes to the HttpServletResponse.getOutputStream(), then call HttpServletResponse.flushBuffer() to immediately commit the response.
Bonus tip: use HttpServletResponse.sendError(-1) to terminate the connection abruptly.

How to keep a HTTP long-polling connection open?

I want to implement long polling in a web service. I can set a sufficiently long time-out on the client. Can I give a hint to intermediate networking components to keep the response open? I mean NATs, virus scanners, reverse proxies or surrounding SSH tunnels that may be in between of the client and the server and I have not under my control.
A download may last for hours but an idle connection may be terminated in less than a minute. This is what I want to prevent. Can I inform the intermediate network that an idle connection is what I want here, and not because the server has disconnected?
If so, how? I have been searching around four hours now but I don’t find information on this.
Should I send 200 OK, maybe some headers, and then nothing?
Do I have to respond 102 Processing instead of 200 OK, and everything is fine then?
Should I send 0x16 (synchronous idle) bytes every now and then? If so, before or after the initial HTTP status code, before or after the header? Do they make it into the transferred file, and may break it?
The web service / server is in C++ using Boost and the content file being returned is in Turtle syntax.
You can't force proxies to extend their idle timeouts, at least not without having administrative access to them.
The good news is that you can design your long polling solution in such a way that it can recover from a connection being suddenly closed.
One such design would be as follows:
Since long polling is normally used for event notifications (think the Observer pattern), you associate a serial number with each event.
The client makes a GET request carrying the serial number of the last event it has seen, either as part of the URL or in a cookie.
The server maintains a buffer of recent events. Upon receiving a GET request from the client, it checks if any of the buffered events need to be sent to the client, based on their serial numbers and the serial number provided by the client. If so, all such events are sent in one HTTP response. The response finishes at that point, in case there is a proxy that wants to buffer the whole response before relaying it further.
If the client is up to date, that is it didn't miss any of the buffered events, the server is delaying its response till another event is generated. When that happens, it's sent as one complete HTTP response.
When the client receives a response, it immediately sends a new one. When it detects the connection was closed, it creates a new one and makes a new request.
When using cookies to convey the serial number of the last event seen by the client, the client side implementation becomes really simple. Essentially you just enable cookies on the client side and that's it.

Check if server received data after timeout

I made a program that uses serveral RestAPI's of Bitcoin exchanges, e.g. Bitstamp
There is a function that allows me to do a trade: sell or buy Bitcoin for a specific price. Simplified, you have to call a URL with parameters like this:
https://www.bitstamp.net/api/trade?price=100&amount=1&type=sell
The server then answers in JSON. Example:
{"error":"","message":"Sold 1 BTC # 100$"}
If the trade was successful, my program continues. If it was not, it tries again (depending on the error message).
However, there is one problem. I'm using libcurl for the communication with the server and I set the CURLOPT_TIMEOUT to two seconds. It almost always works, but sometimes I get the following error:
Code #28: Operation timed out after 2000 milliseconds with 0 bytes received
When this happens, my program tries to trade again. But sometimes, despite the timeout, the trade was already made, which means it is done multiple times because my code tries again.
Can I somehow find out if the server atleast received all the data? The thing is if I increase CURLOPT_TIMEOUT to say 10 seconds, and the server does not answer, I have the same problem. So this is not a solution.
I do not know details of Bitstamp, but here is how HTTP works. Client sends a request to a server and receives a response. In the response, details about success or failure are described (by using HTTP error codes). However, if a timeout is received, then client has no information about it's request:
is it sent to the server;
did server receive it;
if server received the request, did it manage to process;
maybe server processed the request, but sending back the response failed due to the network issues.
For that reason, one should not count that the request was successful, and should resend the request. The problem you have described is certainly possible - server received the request, processed it but did not manage to send back the response. For that reason, other more complex protocols should be used, unfortunately HTTP is not one of them because of it's request-response nature.
Perhaps you should check if the given REST API gives some status for the transactions.
You are supposed to wait for the HTTP response to be a little bit more sure wether your request was successfully processed or not.
If you can access to the file descriptor, you can call ioctl() with the SIOCOUTQ (Linux) or FIONWRITE (BSD) -- I don't know the equivalent for Windows --, to check for unacknowledged sent data at socket level, before totally aborting you connection.
The problem is that it wouldn't be totally error-free either. Even though TCP is stateful at transport level, HTTP is stateless at application level. If your application needs transactional behavior (you dealing with currency, after all, aren't you?), it should provide a means for that.
All that said, I think two seconds might be too little. If you need speed because of multiple operations or something like that, consider parallelizing your connections.

Sockets in Linux - how do I know the client has finished?

I am currently trying to implement my own webserver in C++ - not for productive use, but for learning.
I basically open a socket, listen, wait for a connection and open a new socket from which I read the data sent by the client. So far so good. But how do I know the client has finished sending data and not simply temporarily stopped sending more because of some other reason?
My current example: When the client sends a POST-request, it first sends the headers, then two times "\r\n" in a row and then the request body. Sometimes the body does not contain any data. So if the client is temporarily unable to send anything after it sent the headers - how do I know it is not yet finished with its request?
Does this solely depend on the used protocol (HTTP) and it is my task to find this out on the basis of the data I received, or is there something like an EOF for sockets?
If I cannot get the necessary Information from the socket, how do I protect my program from faulty clients? (Which I guess I must do regardless of this, since it might be an attacker and not a faulty client sending wrong data.) Is my only option to keep reading until the request is complete by definition of the protocol or a timeout (defined by me) is reached?
I hope this makes sense.
Btw: Please don't tell me to use some library - I want to learn the basics.
The protocol (HTTP) tells you when the client has stopped sending data. You can't get the info from the socket as the client will leave it open waiting for a response.
As you say, you must guard against errant clients not sending proper requests. Typically in the case of an incomplete request a timeout is applied to the read. If you haven't received anything in 30 seconds, say, then close the socket and ignore it.
For an HTTP post, there should be a header (Content-Length) saying how many bytes to expect after the the end of the headers. If its a POST and there is no Content-Length, then reject it.
"Does this solely depend on the used protocol (HTTP) and it is my task to find this out on the basis of the data I received,"
Correct. You can find the HTTP spec via google;
http://www.w3.org/Protocols/rfc2616/rfc2616.html
"or is there something like an EOF for sockets?"
There is as it behaves just like a file ... but that's not applicable here because the client isn't closing the connection; you're sending the reply ON that connection.
With text based protocols like HTTP you are at the mercy of the client. Most well formatted POST will have a content-length so you know how much data is coming. However the client can just delay sending the data, or it may have had its Ethernet cable removed or just hang, in which case that socket is sitting there indefinitely. If it disconnects nicely then you will get a socket closed event/response from the recv().
Most well designed servers in that case will have a receive timeout, and if the socket is idle for more than say 30 seconds it will close that socket, so resources are not leaked by misbehaving clients.

Is this a good canditate for a web-service?

Ok so coming in from a completely different field of software development, I have a problem that's a little out of my experience. I'll state it as plainly as possible without giving out confidential details:
I want to make a server that "does stuff" when requested by a client on the same network. The client will most likely be a back-end to a content management system.
The request consists of some parameters, an input file and several output files.
The files are quite large, from 10MB - 100MB of data that must be processed (possibly more). The client can specify destination for output files.
The client needs to be able to find out the status of the request - eg position in queue, percent complete. And obviously when and where to pick up output.
So, my questions are - What is a good method for the client and server to communicate? Should the client poll the server, or provide a "callback" somehow for status updates?
At this point the implementation platform is completely open - anything from C to scripting languages like Ruby are available (at either end), my main issue is how the communication should occur.
First thought, set up some webservices between the machines. But webservices aren't going to be too friendly or efficient with the large files.
Simple appoach:
ServerA hits a web method on ServerB "BeginProcess". The response give you back a FTP location username/password, and ticket number.
ServerA delivers the files to FTP location.
ServerA regularly polls a webmethod "GetProcessStatus(ticketNumber)", possible return values: Awaiting files, Percent complete, Finished
Slightly more complicated approach, without the polling.
ServerA hits a web method on ServerB "BeginProcess(postUrl)", and you send along a URL you want status updates POSTed to. Response: FTP location username/password, and ticket number.
ServerA delivers the files to FTP location.
ServerB sends thru updates to the POST location on ServerA every XXX% completed.
For extra resilience you would keep the GetProcessStatus in case something gets lost in the ether...
Files that will be up to 100MB aren't a good choice for a webservice, since you run a risk of the HTTP session timing out before you have completed your processing.
Having a webservice for checking the status of these jobs would be more ideal. Handle the file transfers via FTP or whatever file transfer method you choose and poll a webservice for updates on status. When the process is completed, you might have an output file url returned that can be downloaded.