C++ Curl Multi Perform Blocking Issue

In my Qt app, I've been using curl's easy interface (curl_easy_setopt / curl_easy_perform) and noticed that it is actually synchronous, which was blocking my main GUI. Inside my app I have a timer that calls curl at a set interval. Any time my timer callback ran curl, it would block my app for a few seconds and then continue. So now I'm trying to use curl's multi interface and curl_multi_perform, which is supposed to be asynchronous, but it gives me the same blocking/lagging issue. Can anyone give me advice?
Below is my code, as well as curl's website demo for multi perform.
/********Header Files******/
#include <sys/time.h>
#include <unistd.h>
....
#ifndef WAITMS
/* WAITMS(x) is used in the loop below; if it is not already defined in the
   elided code above, curl's demo code defines it roughly like this
   (use Sleep(x) on Windows): */
#define WAITMS(x)                               \
  do {                                          \
    struct timeval wait = { 0, (x) * 1000 };    \
    (void)select(0, NULL, NULL, NULL, &wait);   \
  } while(0)
#endif
/************My Timer runs the code below every 10 seconds************/
std::string url = searchEngineParam.toStdString();
std::string userAgent = options[5]->userAgentsOptions[0].toStdString();

CURL *http_handle;
CURLM *multi_handle;
int still_running; /* keep number of running handles */
int repeats = 0;

curl_global_init(CURL_GLOBAL_DEFAULT);

http_handle = curl_easy_init();
curl_easy_setopt(http_handle, CURLOPT_URL, url.c_str());
curl_easy_setopt(http_handle, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(http_handle, CURLOPT_USERAGENT, userAgent.c_str());
curl_easy_setopt(http_handle, CURLOPT_SSL_VERIFYPEER, 0L); /* 0L rather than FALSE, which standard C++ does not define */
/* chunk and WriteMemoryCallback are defined elsewhere in the app */
curl_easy_setopt(http_handle, CURLOPT_WRITEDATA, (void *)&chunk);
curl_easy_setopt(http_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);

/* init a multi stack */
multi_handle = curl_multi_init();

/* add the individual transfers */
curl_multi_add_handle(multi_handle, http_handle);

/* we start some action by calling perform right away */
curl_multi_perform(multi_handle, &still_running);

do {
  CURLMcode mc; /* curl_multi_wait() return code */
  int numfds;

  /* wait for activity, timeout or "nothing" */
  mc = curl_multi_wait(multi_handle, NULL, 0, 1000, &numfds);
  if(mc != CURLM_OK) {
    fprintf(stderr, "curl_multi_wait() failed, code %d.\n", mc);
    break;
  }

  /* 'numfds' being zero means either a timeout or no file descriptors to
     wait for. Try the timeout on the first occurrence; after that, assume
     there are no file descriptors to wait for and sleep for 100
     milliseconds. */
  if(!numfds) {
    repeats++; /* count number of repeated zero numfds */
    if(repeats > 1) {
      WAITMS(100); /* sleep 100 milliseconds */
    }
  }
  else
    repeats = 0;

  curl_multi_perform(multi_handle, &still_running);
} while(still_running);

curl_multi_remove_handle(multi_handle, http_handle);
curl_easy_cleanup(http_handle);
curl_multi_cleanup(multi_handle);
curl_global_cleanup();
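For what it's worth: even with the multi interface, the do/while loop above does not return until still_running reaches zero, so calling it from a timer callback still freezes the GUI for the whole transfer. A common way out is to run the transfer on a worker thread and only poll for its result in the timer slot. Below is a minimal sketch, assuming a hypothetical fetch() wrapper around the loop above; Poller, onTimer and currentUrl are illustrative names, and each thread must use its own curl handles:

#include <chrono>
#include <future>
#include <string>

std::string fetch(std::string url); /* hypothetical wrapper around the
                                       curl_multi loop above, defined elsewhere */

struct Poller {                  /* illustrative stand-in for the widget */
  std::future<std::string> pending;
  std::string currentUrl = "https://example.com";

  void onTimer() {               /* called by the 10-second timer */
    if(!pending.valid()) {
      /* start the (blocking) transfer on a worker thread */
      pending = std::async(std::launch::async, fetch, currentUrl);
    }
    else if(pending.wait_for(std::chrono::seconds(0)) ==
            std::future_status::ready) {
      std::string body = pending.get(); /* ready: collect without blocking */
      /* update the GUI with `body` here */
    }
  }
};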

Related

LibCurl C++: slowing down the sending of requests when multiplexing

Goal:
To slightly slow down the sending of requests when multiplexing with libcurl, possibly by introducing small time delays between sending each HTTP/2 request to a server. The multiplexing program needs to watch one webpage for changes for around 3 seconds, at a set time once a day. However, the program finishes execution in under a second, even when setting the variable num_transfers to the thousands (the variable is seen in the code further below).
It would be useful if there were a way to introduce, for example, a 3 millisecond delay between transmissions of each group of multiplexed requests. This would mean the program could still send requests asynchronously (so it won't be blocked / won't have to wait for a response from the server before sending the next request), but at a slightly slower rate.
A definition of multiplexing taken from this resource:
Multiplexing is a method in HTTP/2 by which multiple HTTP requests can be sent and responses can be received asynchronously via a single TCP connection. Multiplexing is the heart of HTTP/2 protocol.
Ideal outcome:
An ideal program for this situation would be one that could send a few non-blocking/ multiplex requests every approx. 3 milliseconds. The program would run for around 3-4 seconds in total.
Current problem:
Currently the program is too fast when multiplexing: a few thousand requests can be sent and received within around 350 milliseconds, which can lead to the sending IP address being blocked for a few minutes.
Please note it is not an option in this scenario to use a synchronous / blocking approach - a requirement of this program is that it must not be forced to wait for a response to be returned before sending another request. The issue lies in the fact that the program is too fast at sending a high number of requests.
Attempts at solving:
In the DO...WHILE loop seen in the code below, an attempt was made to introduce artificial time delays at various locations within the loop using usleep(microseconds) from unistd.h, but this introduced a delay either before or after all of the requests were sent, rather than interleaved delays between sending individual requests.
Current code:
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <chrono>
#include <string>

/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>

/* curl stuff */
#include <curl/curl.h>
#include <curl/mprintf.h>

#ifndef CURLPIPE_MULTIPLEX
#define CURLPIPE_MULTIPLEX 0
#endif

struct CURLMsg *msg;

struct transfer {
  CURL *easy;
  unsigned int num;
  //FILE *out;
  std::string contents;
};

struct MemoryStruct {
  char *memory;
  size_t size;
};

struct MemoryStruct chunk;

#define NUM_HANDLES 1000

static size_t WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp) {
  transfer *t = (transfer *)userp;
  size_t realsize = size * nmemb;
  t->contents.append((const char*)contents, realsize);
  return realsize;
}

static void setup(struct transfer *t, int num)
{
  CURL *hnd;
  hnd = t->easy = curl_easy_init();
  curl_easy_setopt(hnd, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
  curl_easy_setopt(hnd, CURLOPT_WRITEDATA, (void *)t);
  /* set the same URL */
  curl_easy_setopt(hnd, CURLOPT_URL, "https://someurl.xyz");
  /* HTTP/2 please */
  curl_easy_setopt(hnd, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
  /* we use a self-signed test server, skip verification during debugging */
  curl_easy_setopt(hnd, CURLOPT_SSL_VERIFYPEER, 0L);
  curl_easy_setopt(hnd, CURLOPT_SSL_VERIFYHOST, 0L);
#if (CURLPIPE_MULTIPLEX > 0)
  /* wait for pipe connection to confirm */
  curl_easy_setopt(hnd, CURLOPT_PIPEWAIT, 1L);
#endif
}

int main() {
  struct transfer trans[NUM_HANDLES];
  CURLM *multi_handle;
  int i;
  int still_running = 0; /* keep number of running handles */
  int num_transfers = 3;

  chunk.memory = (char*)malloc(1);
  chunk.size = 0;

  /* init a multi stack */
  multi_handle = curl_multi_init();
  for(i = 0; i < num_transfers; i++) {
    setup(&trans[i], i);
    /* add the individual transfer */
    curl_multi_add_handle(multi_handle, trans[i].easy);
  }
  curl_multi_setopt(multi_handle, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
  // curl_multi_setopt(multi_handle, CURLMOPT_MAX_TOTAL_CONNECTIONS, 1L);

  // Main loop
  do {
    CURLMcode mc = curl_multi_perform(multi_handle, &still_running);
    if(still_running) {
      /* wait for activity, timeout or "nothing" */
      mc = curl_multi_poll(multi_handle, NULL, 0, 1000, NULL);
    }
    if(mc) {
      break;
    }

    // Get response
    do {
      int queued;
      msg = curl_multi_info_read(multi_handle, &queued);
      if((msg) && (msg->msg == CURLMSG_DONE) && (msg->data.result == CURLE_OK)) {
        // Get size of payload
        curl_off_t dl_size;
        curl_easy_getinfo(msg->easy_handle, CURLINFO_SIZE_DOWNLOAD_T, &dl_size);
        for(int i = 0; i < num_transfers; i++) {
          std::cout << trans[i].contents;
        }
        std::cout << std::flush;
      }
    } while(msg);
  } while(still_running);

  for(i = 0; i < num_transfers; i++) {
    curl_multi_remove_handle(multi_handle, trans[i].easy);
    curl_easy_cleanup(trans[i].easy);
  }
  free(chunk.memory);
  curl_multi_cleanup(multi_handle);
  return 0;
}
Summary question:
Is there a way to send a group of multiplexed requests to a single URL approximately every 3 milliseconds? Another idea was to wrap the entire functionality contained in main() within a FOR loop and put a time delay at the end of each iteration. (One possible pacing approach is sketched below.)
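One way to pace the transfers (a sketch only, not a tested fix; batch_size and the 3 ms interval are illustrative assumptions) is to keep the prepared easy handles out of the multi handle at first and add a small batch of them every ~3 ms from inside the polling loop, since curl_multi_add_handle may be called while other transfers are running:

#include <chrono>

/* inside main(), after setup() has prepared all trans[i] but with no
   handles added to multi_handle yet */
auto last_batch = std::chrono::steady_clock::now();
const auto batch_interval = std::chrono::milliseconds(3); /* illustrative */
const int batch_size = 4;                                 /* illustrative */
int next_to_add = 0; /* index of the next prepared, not-yet-added transfer */

do {
  /* feed a few more prepared transfers into the multi handle every ~3 ms */
  if(next_to_add < num_transfers &&
     std::chrono::steady_clock::now() - last_batch >= batch_interval) {
    for(int n = 0; n < batch_size && next_to_add < num_transfers; n++)
      curl_multi_add_handle(multi_handle, trans[next_to_add++].easy);
    last_batch = std::chrono::steady_clock::now();
  }

  curl_multi_perform(multi_handle, &still_running);
  /* poll with a short (1 ms) timeout so the loop wakes up for the next batch */
  curl_multi_poll(multi_handle, NULL, 0, 1, NULL);
} while(still_running || next_to_add < num_transfers);

The curl_multi_info_read drain and the cleanup loop from the original program stay as they are; only the way handles enter the multi stack changes.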

Libcurl C++: Non-blocking way to send requests to a single URL

Goal: To send requests to the same URL without having to wait for the request-sending function to finish executing.
Currently, when I send a request to a URL, I have to wait around 10 ms for the server's response before sending another request using the same function. The aim is to detect changes on a webpage slightly faster than the program currently does, i.e., for the WHILE loop to behave in a non-blocking manner.
Question: Using libcurl C++, if I have a WHILE loop that calls a function to send a request to a URL, how can I avoid waiting for the function to finish executing before sending another request to the SAME URL?
Note: I have been researching libcurl's multi interface, but I am struggling to determine whether it is suited only to parallel requests to multiple URLs, or also to repeatedly sending requests to the same URL without waiting for each call to finish. I have tried the following and looked at these resources:
an attempt at multi-threading a C program using libcurl requests
How to do curl_multi_perform() asynchronously in C++?
http://www.godpatterns.com/2011/09/asynchronous-non-blocking-curl-multi.html
https://curl.se/libcurl/c/multi-single.html
https://curl.se/libcurl/c/multi-poll.html
Here is my attempt at sending a request to one URL, but I have to wait for the request() function to finish and return a response code before sending the same request again.
#include <vector>
#include <iostream>
#include <curl/curl.h>

size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
  std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
  response->insert(response->end(), ptr, ptr + size * nmemb);
  return size * nmemb;
}

long request(CURL *curl, const std::string &url) {
  std::vector<char> response;
  long response_code;
  curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
  auto res = curl_easy_perform(curl);
  /* query the response code after the transfer, not before */
  curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
  // ...
  // Print variable "response"
  // ...
  return response_code;
}

int main() {
  curl_global_init(CURL_GLOBAL_ALL);
  CURL *curl = curl_easy_init();
  while (true) {
    // blocking: request() must complete before executing again
    long response_code = request(curl, "https://example.com");
    // ...
    // Some condition breaks loop
  }
  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return 0;
}
I'm at a point where I have tried to understand the multi-interface documentation as best I can, but I still struggle to fully understand it / determine whether it actually suits my particular problem. Apologies if this question appears not to have provided enough of my own research, but there are gaps in my libcurl knowledge that I'm struggling to fill.
I'd appreciate it if anyone could suggest / explain ways in which I can modify my single-handle libcurl example above to behave in a non-blocking manner.
EDIT:
From libcurl's C example called "multi-poll": when I run the program below, the URL's content is printed, but it only prints once despite the WHILE (1) loop, so I'm confused as to whether it is sending repeated non-blocking requests to the URL (which is the aim), or just one request and then waiting on some other change/event?
#include <stdio.h>
#include <string.h>

/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>

/* curl stuff */
#include <curl/curl.h>

int main(void)
{
  CURL *http_handle;
  CURLM *multi_handle;
  int still_running = 1; /* keep number of running handles */

  curl_global_init(CURL_GLOBAL_DEFAULT);

  http_handle = curl_easy_init();
  curl_easy_setopt(http_handle, CURLOPT_URL, "https://example.com");

  multi_handle = curl_multi_init();
  curl_multi_add_handle(multi_handle, http_handle);

  while (1) {
    CURLMcode mc; /* curl_multi_poll() return code */
    int numfds;

    /* we start some action by calling perform right away */
    mc = curl_multi_perform(multi_handle, &still_running);
    if(still_running) {
      /* wait for activity, timeout or "nothing" */
      mc = curl_multi_poll(multi_handle, NULL, 0, 1000, &numfds);
    }
    // if(mc != CURLM_OK) {
    //   fprintf(stderr, "curl_multi_wait() failed, code %d.\n", mc);
    //   break;
    // }
  }

  curl_multi_remove_handle(multi_handle, http_handle);
  curl_easy_cleanup(http_handle);
  curl_multi_cleanup(multi_handle);
  curl_global_cleanup();
  return 0;
}
You need to move curl_multi_add_handle and curl_multi_remove_handle inside the while loop. Below is the relevant extract from the curl documentation (https://curl.se/libcurl/c/libcurl-multi.html):
"When a single transfer is completed, the easy handle is still left added to the multi stack. You need to first remove the easy handle with curl_multi_remove_handle and then close it with curl_easy_cleanup, or possibly set new options to it and add it again with curl_multi_add_handle to start another transfer."
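Applied to the multi-poll example above, that means removing and re-adding the handle once a transfer finishes. A sketch (the completion check via curl_multi_info_read is the standard pattern; the loop structure is otherwise unchanged):

while(1) {
  CURLMcode mc;
  int numfds;

  mc = curl_multi_perform(multi_handle, &still_running);
  if(still_running)
    mc = curl_multi_poll(multi_handle, NULL, 0, 1000, &numfds);
  if(mc != CURLM_OK)
    break;

  if(!still_running) {
    /* drain the completion message(s) for the finished transfer */
    CURLMsg *m;
    int queued;
    while((m = curl_multi_info_read(multi_handle, &queued))) {
      /* inspect m->data.result here if needed */
    }

    /* remove the finished handle, then add it back to start a new transfer */
    curl_multi_remove_handle(multi_handle, http_handle);
    curl_multi_add_handle(multi_handle, http_handle);
  }
}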

Curl - Sending hundreds of requests but only four at a time

How do you proceed to solve this problem? I have hundreds of requests to send with curl, but I can send only four at a time.
Thus, I need to make four requests using curl at the same time and process their responses. However, as soon as one of the curl handles becomes available, I need to send another request.
This is because the server can handle only four requests at a time, but I have hundreds of requests to send to it.
Following is the code I got from the curl site:
int main(void)
{
  const int HANDLECOUNT = 4;
  CURL *handles[HANDLECOUNT];
  CURLM *multi_handle;
  int still_running = 0; /* keep number of running handles */
  int i;
  CURLMsg *msg; /* for picking up messages with the transfer status */
  int msgs_left; /* how many messages are left */

  /* Allocate one CURL handle per transfer */
  for(i = 0; i < HANDLECOUNT; i++)
    handles[i] = curl_easy_init();

  /* set the options (I left out a few, you'll get the point anyway) */
  curl_easy_setopt(handles[0], CURLOPT_URL, "website");
  curl_easy_setopt(handles[0], CURLOPT_POSTFIELDS, XMLRequestToPost.c_str());
  curl_easy_setopt(handles[0], CURLOPT_POSTFIELDSIZE, (long)strlen(XMLRequestToPost.c_str()));
  curl_easy_setopt(handles[1], CURLOPT_URL, "website");
  curl_easy_setopt(handles[2], CURLOPT_URL, "website");
  curl_easy_setopt(handles[3], CURLOPT_URL, "website");
  /* set the request for other 3 handles too */

  /* init a multi stack */
  multi_handle = curl_multi_init();

  /* add the individual transfers */
  for(i = 0; i < HANDLECOUNT; i++)
    curl_multi_add_handle(multi_handle, handles[i]);

  /* we start some action by calling perform right away */
  curl_multi_perform(multi_handle, &still_running);

  while(still_running) {
    /* busy-waits; the waiting/processing part of the example is missing here */
  }
}
Create a thread-safe queue to put your requests into.
Start 4 threads, each one with its own CURL object.
Have each thread run a loop that:
pulls the next request from the queue,
sends it,
processes/dispatches the response as needed,
and repeats
until the queue is empty. (A sketch of this pattern is shown below.)
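A sketch of that pattern (the queue is filled before the workers start, so a plain mutex is enough; the URL and request handling are placeholders):

#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>
#include <curl/curl.h>

std::queue<std::string> requests; /* filled with request bodies up front */
std::mutex m;                     /* protects `requests` */

bool pop_request(std::string &out) {
  std::lock_guard<std::mutex> lock(m);
  if(requests.empty())
    return false;
  out = std::move(requests.front());
  requests.pop();
  return true;
}

void worker() {
  CURL *curl = curl_easy_init(); /* one easy handle per thread */
  std::string body;
  while(pop_request(body)) {
    curl_easy_setopt(curl, CURLOPT_URL, "website");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_perform(curl);     /* blocks this thread only */
    /* process/dispatch the response here */
  }
  curl_easy_cleanup(curl);
}

int main() {
  curl_global_init(CURL_GLOBAL_ALL);
  /* ... push the hundreds of requests into `requests` here ... */
  std::vector<std::thread> pool;
  for(int i = 0; i < 4; i++)
    pool.emplace_back(worker);   /* at most 4 transfers in flight */
  for(auto &t : pool)
    t.join();
  curl_global_cleanup();
  return 0;
}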

Crash on curl_easy_perform() when uploading a file on CURL in C++

I'm having an issue with a crash when uploading a file via the curl library in C++. I'm using the exact demo code from this location: https://curl.haxx.se/libcurl/c/fileupload.html
The only things I change in the code are the upload location (to upload to a local wamp server on Windows) and the file to upload, which I've verified is opening OK.
I'm running through Visual Studio 2014, and I'm building CURL as a DLL.
The output from the program is:
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> PUT /replayupload.php HTTP/1.1
Host: 127.0.0.1
Accept: */*
Content-Length: 43
Expect: 100-continue
< HTTP/1.1 100 Continue
Then I get a crash at line 66 in the program. It seems the line:
res = curl_easy_perform(curl);
is causing the problem, with an invalid-parameter error. I have verified that the curl variable is not null, but I'm finding it very difficult to get any more debug info than that; the call stack just references a memory address within the DLL.
I'm able to run the demo that just uploads post variables and gets a page; it runs fine without a crash. The crash only occurs when uploading a file.
my exact code is:
int main(void)
{
  CURL *curl;
  CURLcode res;
  struct stat file_info;
  double speed_upload, total_time;
  FILE *fd;

  fd = fopen("E:\\testfiles\\test.txt", "rb"); /* open file to upload */
  if(!fd)
    return 1; /* can't continue */

  /* to get the file size */
  if(fstat(_fileno(fd), &file_info) != 0)
    return 1; /* can't continue */

  curl = curl_easy_init();
  if(curl) {
    /* upload to this place */
    curl_easy_setopt(curl, CURLOPT_URL,
                     "http://127.0.0.1/testupload.php");

    /* tell it to "upload" to the URL */
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);

    /* set where to read from (on Windows you need to use READFUNCTION too) */
    curl_easy_setopt(curl, CURLOPT_READDATA, fd);

    /* and give the size of the upload (optional) */
    curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE,
                     (curl_off_t)file_info.st_size);

    /* enable verbose for easier tracing */
    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);

    res = curl_easy_perform(curl);
    /* Check for errors */
    if(res != CURLE_OK) {
      fprintf(stderr, "curl_easy_perform() failed: %s\n",
              curl_easy_strerror(res));
    }
    else {
      /* now extract transfer info */
      curl_easy_getinfo(curl, CURLINFO_SPEED_UPLOAD, &speed_upload);
      curl_easy_getinfo(curl, CURLINFO_TOTAL_TIME, &total_time);
      fprintf(stderr, "Speed: %.3f bytes/sec during %.3f seconds\n",
              speed_upload, total_time);
    }
    /* always cleanup */
    curl_easy_cleanup(curl);
  }
  fclose(fd);
  return 0;
}
Thanks to Tkausl for spotting the line:
/* set where to read from (on Windows you need to use READFUNCTION too) */
I added this line to my code:
curl_easy_setopt(curl, CURLOPT_READFUNCTION, &fread);
and now everything seems to work. (The crash happens because the FILE * is created by the application's C runtime, while a libcurl DLL may be linked against a different runtime; reading from that FILE * inside the DLL then fails with an invalid parameter. Supplying a read callback keeps all the fread calls inside the application's own runtime.)
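For reference, a sketch of the same fix with a callback written against libcurl's documented read-callback prototype, instead of passing fread itself:

/* forwards to fread in the application's runtime; userdata is the FILE *
   passed via CURLOPT_READDATA */
static size_t read_callback(char *buffer, size_t size, size_t nitems, void *userdata)
{
  return fread(buffer, size, nitems, (FILE *)userdata);
}

/* ... */
curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_callback);
curl_easy_setopt(curl, CURLOPT_READDATA, fd);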

No difference between curl_easy and curl_multi

I'm performing HTTP requests from my C++ program to my PHP script with libcurl.
The first, easy_ version below works well; however, it is quite slow (12 requests per second on localhost). Nothing strange - I got similar results using ab -n 1000 -c 1.
On the other hand, ab -n 1000 -c 100 performs much better, with 600 requests per second. The thing is, using libcurl multi doesn't seem to be concurrent. I used just slightly modified example code and the result is also about 12 req/s.
Do I understand curl_multi right? How can I achieve results similar to ab?
PS. I know that the two code samples differ a bit, however almost all of the time is spent on curl's work.
The easy_ way:
CURL *curl;
CURLcode response; // HTTP response

curl = curl_easy_init();
if(curl)
{
  curl_easy_setopt(curl, CURLOPT_URL, "http://localhost/process.php");
  while(true)
  {
    if(!requestsQueue.empty()) /* note: this check should also happen under the mutex */
    {
      mtx.lock();
      string data = requestsQueue.front();
      requestsQueue.pop();
      mtx.unlock();
      const char *post = data.c_str(); //convert string to char used by CURL
      curl_easy_setopt(curl, CURLOPT_POSTFIELDS, post);
      do
      {
        response = curl_easy_perform(curl);
      } while(response != CURLE_OK);
    }
    else
    {
      //there are no requests to perform, so wait for them
      cout << "Sleeping...\n";
      sleep(2);
      continue;
    }
  }
  //curl_easy_cleanup(curl);
}
else
{
  cout << "CURL init failed!\n";
}
The multi way:
CURLM *multi_handle;
int still_running; /* keep number of running handles */

/* init a multi stack */
multi_handle = curl_multi_init();

/* add the individual transfers */
for(int i = 1; i <= 300; i++)
{
  CURL *handle;
  handle = curl_easy_init();
  curl_easy_setopt(handle, CURLOPT_URL, "http://localhost/process.php");
  curl_multi_add_handle(multi_handle, handle);
}

/* we start some action by calling perform right away */
curl_multi_perform(multi_handle, &still_running);

do {
  struct timeval timeout;
  int rc; /* select() return code */

  fd_set fdread;
  fd_set fdwrite;
  fd_set fdexcep;
  int maxfd = -1;
  long curl_timeo = -1;

  FD_ZERO(&fdread);
  FD_ZERO(&fdwrite);
  FD_ZERO(&fdexcep);

  /* set a suitable timeout to play around with */
  timeout.tv_sec = 1;
  timeout.tv_usec = 0;

  curl_multi_timeout(multi_handle, &curl_timeo);
  if(curl_timeo >= 0) {
    timeout.tv_sec = curl_timeo / 1000;
    if(timeout.tv_sec > 1)
      timeout.tv_sec = 1;
    else
      timeout.tv_usec = (curl_timeo % 1000) * 1000;
  }

  /* get file descriptors from the transfers */
  curl_multi_fdset(multi_handle, &fdread, &fdwrite, &fdexcep, &maxfd);

  /* In a real-world program you OF COURSE check the return code of the
     function calls. On success, the value of maxfd is guaranteed to be
     greater than or equal to -1. Especially in the case of (maxfd == -1),
     we call select(0, ...), which is basically equal to sleep. */
  rc = select(maxfd+1, &fdread, &fdwrite, &fdexcep, &timeout);

  switch(rc) {
  case -1:
    /* select error */
    break;
  case 0:
  default:
    /* timeout or readable/writable sockets */
    curl_multi_perform(multi_handle, &still_running);
    break;
  }
} while(still_running);

/* note: the 300 easy handles added above are never removed or cleaned up
   in this snippet */
curl_multi_cleanup(multi_handle);
return 0;
curl_multi does indeed work with any number of transfers in parallel, but it does so using a single thread for all the work. That has the side effect that if something anywhere takes a long time, that action blocks all other transfers.
One example of such a blocking operation, which sometimes causes exactly what you're describing, is the name-resolving phase if the old blocking name resolver is used. Other explanations include a callback implemented by the application taking a long time for some reason.
You can build libcurl to instead use the c-ares or threaded-resolver backends, which both avoid this blocking behavior and allow much better concurrency. The threaded resolver has been the default in libcurl for many years now (as of late 2021).
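A quick way to check which resolver your libcurl build uses is curl_version_info; the CURL_VERSION_ASYNCHDNS feature bit is set for both c-ares and the threaded resolver. A small self-contained check:

#include <cstdio>
#include <curl/curl.h>

int main() {
  curl_version_info_data *info = curl_version_info(CURLVERSION_NOW);
  if(info->features & CURL_VERSION_ASYNCHDNS) {
    /* info->ares is the c-ares version string, or NULL when the
       threaded resolver is in use */
    std::printf("async DNS: yes (%s)\n",
                info->ares ? info->ares : "threaded resolver");
  }
  else {
    std::printf("async DNS: no - name lookups will block the transfer thread\n");
  }
  return 0;
}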