libcurl: send GET requests after timeout limit is reached - c++

Problem:
OS: Ubuntu 20.04.1 LTS
When a target URL updates its content, recently libcurl has had unexpected polling delays / timeouts anywhere between 2 and 20+ seconds between sending a GET request to the target URL and receiving any response.
I have no idea what has been causing this behaviour, and have detailed all of the strace reports, tshark results, entire libcurl C++ program, attempts to diagnose, and other terminal outputs at the following SO question, but have had no luck in diagnosing this for about four months:
libcurl: abnormal GET response delays
There seems to be something between the client server and remote server that is stopping packets from being returned, but only when the page changes its content. During this polling delay / timeout, no other requests can be sent - therefore any new data uploaded on the remote server cannot be retrieved quickly.
This issue did not exist before mid-July 2021. Given that after four months this problem still hasn't been solved, I want to attempt a workaround that will still send requests to the target when this polling delay presents itself. I won't understand what caused the polling timeouts, but hopefully I will be able to retrieve the data without delays like the program used to do.
Target URL: https://ir.eia.gov/wpsr/table4.csv
Summary questions:
Q1. Is there a timeout option with libcurl that, when exceeded, the program does not exit but instead sends another GET request to the target URL?
Q2. Since this problem only arises when the target URL makes a scheduled content update, could there be a chance the target URL changes its IP address and thus there is some delay caused by a DNS resolution server in between the client and remote side on the return leg? I am going to attempt to use a tool like pingPlotter to see if there is a delay at some specific IP address between the outbound GET request and the response.
Before any scheduled page content changes, the latency between the outbound GET request and the response is <100 ms.

Related

JMeter Test Result - Connect time is 100% of request (HTTP) sample time

I am using JMeter 5.1 to conduct performance testing (HTTP requests), the system tested is web application on Google Cloud Platform. For JMeter, I configured 10 threads(users), and got error messages for failed request: failed: Connection timed out (Connection timed out)
and check result in "View Results in Table" found Connect time = Sample time and there is no latency time.
Under this condition, What should I try to find the root cause. Is there any analyze direction or method?
You can take a look to this link it explain some causes related to your issue.
In addition, if you are using GCP HTTP(S) Load balancer, you should take a look to stackdriver logs. That will provide more clues for your issue. You may need to change the timeout values for your load balancer or your backend.
This is due to TCP connection timeout, if JMeter fails to get the response within the bounds of the TCP Handshake it will fail the associated HTTP Request sampler.
Latency (also known as TTFB) would be 0 because there is no any response from the server.
same for sent bytes, same for header size in bytes.
If for your application response times over 2 minutes is acceptable and expected you can increase Connect timeout, the relevant setting lives under "Advanced" tab of the HTTP Request sampler (or even better HTTP Request Defaults)
For example this is how you can increase the Connect timeout to 5 minutes:
With regards to root cause, the reasons are numerous, i.e.:
wrong protocol/port combination
the port is blocked by OS firewall or GCP firewall
the application or VM is overloaded and cannot respont
incorrect configuration of application server/database/other middleware
etc.

How to keep a HTTP long-polling connection open?

I want to implement long polling in a web service. I can set a sufficiently long time-out on the client. Can I give a hint to intermediate networking components to keep the response open? I mean NATs, virus scanners, reverse proxies or surrounding SSH tunnels that may be in between of the client and the server and I have not under my control.
A download may last for hours but an idle connection may be terminated in less than a minute. This is what I want to prevent. Can I inform the intermediate network that an idle connection is what I want here, and not because the server has disconnected?
If so, how? I have been searching around four hours now but I don’t find information on this.
Should I send 200 OK, maybe some headers, and then nothing?
Do I have to respond 102 Processing instead of 200 OK, and everything is fine then?
Should I send 0x16 (synchronous idle) bytes every now and then? If so, before or after the initial HTTP status code, before or after the header? Do they make it into the transferred file, and may break it?
The web service / server is in C++ using Boost and the content file being returned is in Turtle syntax.
You can't force proxies to extend their idle timeouts, at least not without having administrative access to them.
The good news is that you can design your long polling solution in such a way that it can recover from a connection being suddenly closed.
One such design would be as follows:
Since long polling is normally used for event notifications (think the Observer pattern), you associate a serial number with each event.
The client makes a GET request carrying the serial number of the last event it has seen, either as part of the URL or in a cookie.
The server maintains a buffer of recent events. Upon receiving a GET request from the client, it checks if any of the buffered events need to be sent to the client, based on their serial numbers and the serial number provided by the client. If so, all such events are sent in one HTTP response. The response finishes at that point, in case there is a proxy that wants to buffer the whole response before relaying it further.
If the client is up to date, that is it didn't miss any of the buffered events, the server is delaying its response till another event is generated. When that happens, it's sent as one complete HTTP response.
When the client receives a response, it immediately sends a new one. When it detects the connection was closed, it creates a new one and makes a new request.
When using cookies to convey the serial number of the last event seen by the client, the client side implementation becomes really simple. Essentially you just enable cookies on the client side and that's it.

Check if server received data after timeout

I made a program that uses serveral RestAPI's of Bitcoin exchanges, e.g. Bitstamp
There is a function that allows me to do a trade: sell or buy Bitcoin for a specific price. Simplified, you have to call a URL with parameters like this:
https://www.bitstamp.net/api/trade?price=100&amount=1&type=sell
The server then answers in JSON. Example:
{"error":"","message":"Sold 1 BTC # 100$"}
If the trade was successful, my program continues. If it was not, it tries again (depending on the error message).
However, there is one problem. I'm using libcurl for the communication with the server and I set the CURLOPT_TIMEOUT to two seconds. It almost always works, but sometimes I get the following error:
Code #28: Operation timed out after 2000 milliseconds with 0 bytes received
When this happens, my program tries to trade again. But sometimes, despite the timeout, the trade was already made, which means it is done multiple times because my code tries again.
Can I somehow find out if the server atleast received all the data? The thing is if I increase CURLOPT_TIMEOUT to say 10 seconds, and the server does not answer, I have the same problem. So this is not a solution.
I do not know details of Bitstamp, but here is how HTTP works. Client sends a request to a server and receives a response. In the response, details about success or failure are described (by using HTTP error codes). However, if a timeout is received, then client has no information about it's request:
is it sent to the server;
did server receive it;
if server received the request, did it manage to process;
maybe server processed the request, but sending back the response failed due to the network issues.
For that reason, one should not count that the request was successful, and should resend the request. The problem you have described is certainly possible - server received the request, processed it but did not manage to send back the response. For that reason, other more complex protocols should be used, unfortunately HTTP is not one of them because of it's request-response nature.
Perhaps you should check if the given REST API gives some status for the transactions.
You are supposed to wait for the HTTP response to be a little bit more sure wether your request was successfully processed or not.
If you can access to the file descriptor, you can call ioctl() with the SIOCOUTQ (Linux) or FIONWRITE (BSD) -- I don't know the equivalent for Windows --, to check for unacknowledged sent data at socket level, before totally aborting you connection.
The problem is that it wouldn't be totally error-free either. Even though TCP is stateful at transport level, HTTP is stateless at application level. If your application needs transactional behavior (you dealing with currency, after all, aren't you?), it should provide a means for that.
All that said, I think two seconds might be too little. If you need speed because of multiple operations or something like that, consider parallelizing your connections.

FTP upload with libcurl: getting CURLINFO_DATA_IN, timing out

I am uploading to an FTP server using libcurl. In general things are working properly but I always get a timeout error with a specific server (timeout is set to one minute). The upload of the file itself does happen properly.
I used curl_easy_setopt with CURLOPT_DEBUGFUNCTION to setup a debug function to see what's going on. Once upload starts, I see that curl_infotype is set to CURLINFO_DATA_OUT for many calls, but I also see several calls where curl_infotype is set to CURLINFO_DATA_IN. Then, once upload is done but the server still connected, I keep getting curl_infotype set to CURLINFO_DATA_IN until the timeout is reached.
Some questions:
- why am I getting this CURLINFO_DATA_IN?
- how am I suppose to respond to it?
[Edit - I forgot to mention that FileZilla can upload to that server properly]
The debug callback gives you the actual data that is being sent and received, did you look at that data yet to see what it is? The FTP server does send a reply back to the client after a transfer completes, so that may account for the CURLINFO_DATA_IN notifications you are seeing. It is possible that maybe the server is sending back a reply that libcurl does not recognize correctly so it keeps waiting for more data that will never arrive. It is hard to say for sure without seeing what the actual communication back and forth really looks like.

Asynchronous server stopping getting data from client with no visible reason

I have a problem with client-server application. As I've almost run out of sane ideas for its solving I am asking for help. I've stumbled into described situation about three or four times now. Provided data is from last failure, when I've turned all the possible logging, messages dumping and so on.
System description
1) Client. Works under Windows. I take as an assumption that there is no problem with its work (judging from logs)
2) Server. Works under Linux (RHEL 5). It is server where I has a problem.
3) Two connections are maintained between client and server: one command and one for data sending. Both work asynchronously. Both connections live in one thread and on one boost::asio::io_service.
4) Data to be sent from client to server is messages delimeted by '\0'.
5) Data load is about 50 Mb/hour, 24 hours a day.
6) Data is read on server side using boost::asio::async_read_until with corresponding delimeter
Problem
- For two days system worked as expected
- On third day at 18:55 server read one last message from client and then stopped reading them. No info in logs about new data.
- From 18:55 to 09:00 (14 hours) client reported no errors. So it sent data (about 700 Mb) successfully and no errors arose.
- At 08:30 I started investigation of a problem. Server process was alive, both connections between server and client were alive too.
- At 09:00 I attached to server process with gdb. Server was in sleeping state, waiting for some signal from system. I believe I accidentally hit Ctrl + C and may be there was some message.
- Later in logs I found message with smth like 'system call interrupted'. After that both connections to client were dropped. Client reconnected and server started to worked normally.
- The first message processed by server was timestamped at 18:57 on client side. So after restarting normal work, server didn't drop all the messages up to 09:00, they were stored somewhere and it processed them accordingly after that.
Things I've tried
- Simulated scenario above. As server dumped all incoming messages I've wrote a small script which presented itself as client and sent all the messages back to server again. Server dropped with out of memory error, but, unfortunately, it was because of high data load (about 3 Gb/hour this time), not because of the same error. As it was Friday evening I had no time to correctly repeat the experiment.
- Nevertheless, I've run server through Valgrind to detect possible memory leaks. Nothing serious was found (except the fact that server was dropped because of high load), no huge memory leaks.
Questions
- Where were these 700 Mb of data which client sent and server didn't get? Why they were persistent and weren't lost when server restarted the connection?
- It seems to me that problem is someway connected with server not getting message from boost::asio::io_service. Buffer is get filled with data, but no calls to read handler are made. Could this be problem on OS side? Something wrong with asynchronous calls may be? If it is so, how could this be checked?
- What can I do to detect the source of problem? As i said I've run out of sane ideas and each experiment costs very much in terms of time (it takes about two or three days to get the system to described state), so I need to run as much possible checks for experiment as I could.
Would be grateful for any ideas I can use to get to the error.
Update: Ok, it seems that error was in synchronous write left in the middle of asynchronous client-server interaction. As both connections lived in one thread, this synchronous write was blocking thread for some reason and all interaction both on command and data connection stopped. So, I changed it to async version and now it seems to work.
As i said I've run out of sane ideas and each experiment costs very
much in terms of time (it takes about two or three days to get the
system to described state)
One way to simplify investigation of this problem is to run server inside some Virtual Machine until it reaches this broken state. Then you can make snapshot of whole system and revert to it every time when things go wrong during investigation. At least you will not have to wait 3 days to get this state again.