FTP upload with libcurl: getting CURLINFO_DATA_IN, timing out - c++

I am uploading to an FTP server using libcurl. In general things work properly, but with one specific server I always get a timeout error (the timeout is set to one minute). The upload of the file itself completes successfully.
I used curl_easy_setopt with CURLOPT_DEBUGFUNCTION to set up a debug function to see what's going on. Once the upload starts, I see that curl_infotype is set to CURLINFO_DATA_OUT for many calls, but I also see several calls where curl_infotype is set to CURLINFO_DATA_IN. Then, once the upload is done but the server is still connected, I keep getting curl_infotype set to CURLINFO_DATA_IN until the timeout is reached.
Some questions:
- why am I getting this CURLINFO_DATA_IN?
- how am I supposed to respond to it?
[Edit - I forgot to mention that FileZilla can upload to that server properly]

The debug callback gives you the actual data that is being sent and received. Have you looked at that data yet to see what it is? The FTP server does send a reply back to the client after a transfer completes, so that may account for the CURLINFO_DATA_IN notifications you are seeing. It is possible that the server is sending back a reply that libcurl does not recognize correctly, so it keeps waiting for more data that will never arrive. It is hard to say for sure without seeing what the actual back-and-forth communication looks like.
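If it helps, here is a minimal sketch of such a debug callback (the function name and output formatting are illustrative, not from the original program). With CURLOPT_VERBOSE enabled, the FTP control-channel replies should show up as text/header lines alongside the raw byte counts, which makes it easier to see what the server is sending when CURLINFO_DATA_IN fires:

```
#include <curl/curl.h>
#include <cstdio>

// Illustrative debug callback: prints protocol text and headers, and just
// counts raw data bytes, so the control-channel replies stand out.
static int debug_trace(CURL *handle, curl_infotype type, char *data,
                       size_t size, void *userptr)
{
    (void)handle; (void)userptr;
    switch (type) {
    case CURLINFO_TEXT:       fprintf(stderr, "* %.*s", (int)size, data); break;
    case CURLINFO_HEADER_IN:  fprintf(stderr, "< %.*s", (int)size, data); break;
    case CURLINFO_HEADER_OUT: fprintf(stderr, "> %.*s", (int)size, data); break;
    case CURLINFO_DATA_IN:    fprintf(stderr, "[recv %zu bytes]\n", size); break;
    case CURLINFO_DATA_OUT:   fprintf(stderr, "[sent %zu bytes]\n", size); break;
    default: break;
    }
    return 0;  // the callback must return 0
}

// After creating the easy handle:
//   curl_easy_setopt(curl, CURLOPT_DEBUGFUNCTION, debug_trace);
//   curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);  // the debug callback only fires when verbose is on
```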

Related

libcurl: send GET requests after timeout limit is reached

Problem:
OS: Ubuntu 20.04.1 LTS
Recently, when the target URL updates its content, libcurl has been showing unexpected polling delays / timeouts of anywhere between 2 and 20+ seconds between sending a GET request to the target URL and receiving any response.
I have no idea what has been causing this behaviour. I have detailed all of the strace reports, tshark results, the entire libcurl C++ program, my attempts to diagnose it, and other terminal outputs at the following SO question, but have had no luck in diagnosing this for about four months:
libcurl: abnormal GET response delays
There seems to be something between the client machine and the remote server that is stopping packets from being returned, but only when the page changes its content. During this polling delay / timeout, no other requests can be sent, so any new data uploaded on the remote server cannot be retrieved quickly.
This issue did not exist before mid-July 2021. Given that the problem still hasn't been solved after four months, I want to attempt a workaround that keeps sending requests to the target when this polling delay presents itself. I still won't understand what caused the polling timeouts, but hopefully the program will be able to retrieve the data without delays, as it used to.
Target URL: https://ir.eia.gov/wpsr/table4.csv
Summary questions:
Q1. Is there a timeout option in libcurl that, when exceeded, does not make the program exit but instead sends another GET request to the target URL? (See the retry sketch after this question.)
Q2. Since this problem only arises when the target URL makes a scheduled content update, could there be a chance the target URL changes its IP address and thus there is some delay caused by a DNS resolution server in between the client and remote side on the return leg? I am going to attempt to use a tool like pingPlotter to see if there is a delay at some specific IP address between the outbound GET request and the response.
Before any scheduled page content changes, the latency between the outbound GET request and the response is <100 ms.
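Regarding Q1, libcurl itself simply returns CURLE_OPERATION_TIMEDOUT when CURLOPT_TIMEOUT expires; it is up to the calling code to retry. A minimal sketch of that idea (the timeout values and retry count are placeholders, not recommendations):

```
#include <curl/curl.h>
#include <string>

// Sketch only: fetches the target URL with a hard timeout and retries on
// timeout instead of giving up.
static size_t write_body(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

bool fetch_with_retries(const char *url, std::string &body, int max_attempts)
{
    CURL *curl = curl_easy_init();
    if (!curl) return false;

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_body);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 5L);         // whole-transfer timeout, seconds
    curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 2L);  // connect phase only

    CURLcode rc = CURLE_OK;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        body.clear();
        rc = curl_easy_perform(curl);
        if (rc != CURLE_OPERATION_TIMEDOUT)  // success or a non-timeout error: stop retrying
            break;
    }
    curl_easy_cleanup(curl);
    return rc == CURLE_OK;
}
```

Whether retrying actually helps depends on whatever is dropping the packets in the first place, so this is a workaround rather than a fix.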

how to get notified when HTTP services are down?

We have an iOS app. We use HTTP services for getting and posting JSON data. Push notifications are also enabled. If the backend services are down, is there any way to notify the user that the services are down?
Did you try using a timeout? If the app can't connect to the server for some time, the connection attempt not only terminates, it also raises a timeout exception in most programming languages. Check the timeout options on the object you use to communicate over HTTP; you can probably configure them. If you can't connect to the server to receive the HTTP response, simply tell the user "server unavailable" or something like that.
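The question is about an iOS client, but to illustrate the idea in the same C++/libcurl terms used elsewhere on this page, a rough sketch of such a reachability check might look like this (the health-check URL and the thresholds are made up):

```
#include <curl/curl.h>
#include <string>

// Illustrative only: give the request a short timeout and map connect/timeout
// failures (or server errors) to a "services are down" state in the UI.
bool backend_reachable(const std::string &health_url)  // hypothetical health-check URL
{
    CURL *curl = curl_easy_init();
    if (!curl) return false;

    curl_easy_setopt(curl, CURLOPT_URL, health_url.c_str());
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);           // HEAD-style request, we only need a status
    curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 3L);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 5L);

    CURLcode rc = curl_easy_perform(curl);
    long status = 0;
    if (rc == CURLE_OK)
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
    curl_easy_cleanup(curl);

    // Timeouts, refused connections, or 5xx responses all count as "down".
    return rc == CURLE_OK && status < 500;
}
```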
Ideally, if you know the backend will be down for a while (for updates etc.) but the HTTP server itself is still reachable, you can send an HTTP response containing text such as "server unavailable", or send an empty message and detect that on the front end (that only works if you never send empty messages otherwise, and it is likely to cause issues; an explicit "server unavailable" is better).
If the backend will be periodically unavailable, consider implementing maintenance notifications: when the app starts up, it asks the server when the next scheduled maintenance will occur and saves the answer; on subsequent startups it checks whether such maintenance is happening at the moment.
Besides that, if you really want to use push notifications and the server will be periodically unavailable, send a notification before the server goes down. You just need to use your imagination here.
What you can't do is send a notification when your server goes down unexpectedly, mainly because you have no way of notifying the client (the server you use for communication is the one that is down). However, as stated above, the client can detect on its own that the server is down (if it can't connect, etc.).
If you have a backup server, it can send a notification when the main server goes down. If both the main server and the backup server go down, and only the backup server could have informed the client, the client will most likely not know about it. You could use an external provider as your backup server, so that a local outage (a power failure or something like that) does not affect your notification system.
Hope it helps.

How to keep a HTTP long-polling connection open?

I want to implement long polling in a web service. I can set a sufficiently long time-out on the client. Can I give a hint to intermediate networking components to keep the response open? I mean NATs, virus scanners, reverse proxies or surrounding SSH tunnels that may sit between the client and the server and that are not under my control.
A download may last for hours, but an idle connection may be terminated in less than a minute. This is what I want to prevent. Can I inform the intermediate network that the connection is intentionally idle, and that this does not mean the server has disconnected?
If so, how? I have been searching for around four hours now but can't find any information on this.
Should I send 200 OK, maybe some headers, and then nothing?
Do I have to respond 102 Processing instead of 200 OK, and everything is fine then?
Should I send 0x16 (synchronous idle) bytes every now and then? If so, before or after the initial HTTP status code, before or after the headers? Do they make it into the transferred file, and might they break it?
The web service / server is in C++ using Boost and the content file being returned is in Turtle syntax.
You can't force proxies to extend their idle timeouts, at least not without having administrative access to them.
The good news is that you can design your long polling solution in such a way that it can recover from a connection being suddenly closed.
One such design would be as follows:
Since long polling is normally used for event notifications (think the Observer pattern), you associate a serial number with each event.
The client makes a GET request carrying the serial number of the last event it has seen, either as part of the URL or in a cookie.
The server maintains a buffer of recent events. Upon receiving a GET request from the client, it checks if any of the buffered events need to be sent to the client, based on their serial numbers and the serial number provided by the client. If so, all such events are sent in one HTTP response. The response finishes at that point, in case there is a proxy that wants to buffer the whole response before relaying it further.
If the client is up to date, that is, it didn't miss any of the buffered events, the server delays its response until another event is generated. When that happens, it is sent as one complete HTTP response.
When the client receives a response, it immediately sends a new request. When it detects that the connection was closed, it opens a new connection and makes a new request.
When using cookies to convey the serial number of the last event seen by the client, the client side implementation becomes really simple. Essentially you just enable cookies on the client side and that's it.
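To make the client side of this scheme concrete, here is a rough libcurl-based sketch (the endpoint and the "since" query parameter are invented for illustration; the asker's server is C++/Boost, so any HTTP client would do). The key point is that a dropped or timed-out connection is not an error; the client just reconnects and asks again from the last serial number it saw:

```
#include <curl/curl.h>
#include <string>

static size_t collect(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

// Long-poll loop: ask for events newer than last_seen_serial, process them,
// and reconnect whenever the connection is closed or times out.
void poll_events(long last_seen_serial)
{
    for (;;) {
        CURL *curl = curl_easy_init();
        if (!curl) return;

        std::string url = "https://example.com/events?since=" + std::to_string(last_seen_serial);
        std::string body;
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
        curl_easy_setopt(curl, CURLOPT_TIMEOUT, 60L);  // client-side long-poll timeout

        CURLcode rc = curl_easy_perform(curl);
        curl_easy_cleanup(curl);

        if (rc == CURLE_OK && !body.empty()) {
            // parse the events and update last_seen_serial accordingly (omitted)
        }
        // On a timeout or a dropped connection we simply loop and reconnect.
    }
}
```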

Check if server received data after timeout

I made a program that uses several REST APIs of Bitcoin exchanges, e.g. Bitstamp.
There is a function that allows me to do a trade: sell or buy Bitcoin for a specific price. Simplified, you have to call a URL with parameters like this:
https://www.bitstamp.net/api/trade?price=100&amount=1&type=sell
The server then answers in JSON. Example:
{"error":"","message":"Sold 1 BTC # 100$"}
If the trade was successful, my program continues. If it was not, it tries again (depending on the error message).
However, there is one problem. I'm using libcurl for the communication with the server and I set the CURLOPT_TIMEOUT to two seconds. It almost always works, but sometimes I get the following error:
Code #28: Operation timed out after 2000 milliseconds with 0 bytes received
When this happens, my program tries to trade again. But sometimes, despite the timeout, the trade was already made, which means it ends up being executed multiple times because my code tries again.
Can I somehow find out whether the server at least received all the data? The thing is, if I increase CURLOPT_TIMEOUT to, say, 10 seconds and the server does not answer, I have the same problem. So this is not a solution.
I do not know the details of Bitstamp, but here is how HTTP works. The client sends a request to a server and receives a response. The response describes success or failure (using HTTP status codes). However, if a timeout occurs, the client has no information about its request:
- was it sent to the server;
- did the server receive it;
- if the server received the request, did it manage to process it;
- maybe the server processed the request, but sending back the response failed due to network issues.
For that reason, one should not assume that the request was successful, and has to resend it. The problem you have described is certainly possible: the server received the request and processed it, but did not manage to send back the response. For such cases more complex protocols are needed; unfortunately, HTTP is not one of them because of its request-response nature.
Perhaps you should check whether the given REST API provides some way to query the status of transactions.
You are supposed to wait for the HTTP response to be a little more certain whether your request was successfully processed or not.
If you can access the file descriptor, you can call ioctl() with SIOCOUTQ (Linux) or FIONWRITE (BSD) -- I don't know the Windows equivalent -- to check for unacknowledged sent data at the socket level before completely aborting your connection.
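On Linux with libcurl, that check could look roughly like this (a sketch only; it assumes the easy handle still has a live connection, which libcurl may already have torn down after a timeout):

```
#include <curl/curl.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>  // SIOCOUTQ

// Returns the number of bytes still queued in the socket's send buffer
// (written but not yet acknowledged by the peer), or -1 if it cannot be
// determined. Linux-only sketch.
long unacked_send_bytes(CURL *curl)
{
    curl_socket_t fd = CURL_SOCKET_BAD;
    if (curl_easy_getinfo(curl, CURLINFO_ACTIVESOCKET, &fd) != CURLE_OK ||
        fd == CURL_SOCKET_BAD)
        return -1;

    int pending = 0;
    if (ioctl(fd, SIOCOUTQ, &pending) != 0)
        return -1;
    return pending;  // 0 means everything we wrote has left the send queue
}
```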
The problem is that it wouldn't be totally error-free either. Even though TCP is stateful at the transport level, HTTP is stateless at the application level. If your application needs transactional behaviour (you are dealing with currency, after all, aren't you?), it should provide a means for that.
All that said, I think two seconds might be too little. If you need speed because of multiple operations or something like that, consider parallelizing your connections.
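If parallelizing is the route taken, libcurl's multi interface is the usual tool. A bare-bones sketch (URLs and timeouts are placeholders, and since no write callback is set the response bodies go to stdout):

```
#include <curl/curl.h>
#include <string>
#include <vector>

// Issue several requests in parallel instead of serializing them behind one
// short timeout.
void perform_in_parallel(const std::vector<std::string> &urls)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM *multi = curl_multi_init();
    std::vector<CURL *> handles;

    for (const auto &url : urls) {
        CURL *h = curl_easy_init();
        curl_easy_setopt(h, CURLOPT_URL, url.c_str());
        curl_easy_setopt(h, CURLOPT_TIMEOUT, 10L);
        curl_multi_add_handle(multi, h);
        handles.push_back(h);
    }

    int still_running = 0;
    do {
        curl_multi_perform(multi, &still_running);
        int numfds = 0;
        curl_multi_wait(multi, nullptr, 0, 1000, &numfds);  // wait up to 1 s for activity
    } while (still_running > 0);

    for (CURL *h : handles) {
        curl_multi_remove_handle(multi, h);
        curl_easy_cleanup(h);
    }
    curl_multi_cleanup(multi);
    curl_global_cleanup();
}
```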

Is this a good candidate for a web-service?

Ok so coming in from a completely different field of software development, I have a problem that's a little out of my experience. I'll state it as plainly as possible without giving out confidential details:
I want to make a server that "does stuff" when requested by a client on the same network. The client will most likely be a back-end to a content management system.
The request consists of some parameters, an input file and several output files.
The files are quite large, from 10MB - 100MB of data that must be processed (possibly more). The client can specify destination for output files.
The client needs to be able to find out the status of the request - eg position in queue, percent complete. And obviously when and where to pick up output.
So, my questions are: what is a good method for the client and server to communicate? Should the client poll the server, or somehow provide a "callback" for status updates?
At this point the implementation platform is completely open - anything from C to scripting languages like Ruby are available (at either end), my main issue is how the communication should occur.
First thought: set up some web services between the machines. But web services aren't going to be very friendly or efficient with large files.
Simple approach:
ServerA hits a web method on ServerB, "BeginProcess". The response gives you back an FTP location, username/password, and a ticket number.
ServerA delivers the files to the FTP location.
ServerA regularly polls a web method "GetProcessStatus(ticketNumber)"; possible return values: Awaiting files, Percent complete, Finished.
Slightly more complicated approach, without the polling.
ServerA hits a web method on ServerB, "BeginProcess(postUrl)", and sends along a URL it wants status updates POSTed to. Response: FTP location, username/password, and a ticket number.
ServerA delivers the files to FTP location.
ServerB sends status updates to the POST location on ServerA every XXX% completed.
For extra resilience you would keep the GetProcessStatus in case something gets lost in the ether...
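To make the GetProcessStatus polling concrete, a rough sketch of what ServerA's polling loop might look like, using libcurl as an example client (the URL, query parameter, and status strings are invented; the response is assumed to be a short status string):

```
#include <curl/curl.h>
#include <string>
#include <thread>
#include <chrono>

static size_t append_to_string(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

// Poll ServerB until the job identified by the ticket reports completion.
std::string wait_for_completion(const std::string &ticket)
{
    for (;;) {
        CURL *curl = curl_easy_init();
        if (!curl) return {};

        std::string status;
        std::string url = "http://serverB/api/GetProcessStatus?ticket=" + ticket;
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append_to_string);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &status);
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);

        if (status == "Finished")
            return status;
        std::this_thread::sleep_for(std::chrono::seconds(10));  // poll every 10 s
    }
}
```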
Files of up to 100 MB aren't a good fit for a web service, since you run the risk of the HTTP session timing out before you have completed your processing.
Having a web service for checking the status of these jobs would be more appropriate. Handle the file transfers via FTP or whatever file transfer method you choose, and poll a web service for status updates. When the process is complete, you might have an output file URL returned that can be downloaded.