Curl - Sending hundreds of requests but only four at a time - Programming - c++

How do you proceed to solve this problem? I've hundreds of requests to be sent to Curl but I can send only four at a time.
Thus, I need to make four requests using curl at the same time and processes their responses. However, once one of the curl pointer is available, I need to send another request.
This is because, the server can handle only four requests at a time but I've hundreds of requests to be sent to the server.
Following is the code, I got from curl site
int main(void)
{
const int HANDLECOUNT = 4;
CURL *handles[HANDLECOUNT];
CURLM *multi_handle;
int still_running = 0; /* keep number of running handles */
int i;
CURLMsg *msg; /* for picking up messages with the transfer status */
int msgs_left; /* how many messages are left */
/* Allocate one CURL handle per transfer */
for(i = 0; i<HANDLECOUNT; i++)
handles[i] = curl_easy_init();
/* set the options (I left out a few, you'll get the point anyway) */
curl_easy_setopt(handles[0], CURLOPT_URL, "website");
curl_easy_setopt(handles[0], CURLOPT_POSTFIELDS, XMLRequestToPost.c_str());
curl_easy_setopt(handles[0], CURLOPT_POSTFIELDSIZE, (long)strlen(XMLRequestToPost.c_str()));
curl_easy_setopt(handles[1], CURLOPT_URL, "website");
curl_easy_setopt(handles[2], CURLOPT_URL, "website");
curl_easy_setopt(handles[3], CURLOPT_URL, "website");
/* set the request for other 3 handles too */
/* init a multi stack */
multi_handle = curl_multi_init();
/* add the individual transfers */
for(i = 0; i<HANDLECOUNT; i++)
curl_multi_add_handle(multi_handle, handles[i]);
/* we start some action by calling perform right away */
curl_multi_perform(multi_handle, &still_running);
while(still_running) {
}
}

Create a thread-safe queue to put your requests into.
Start 4 threads, each one with its own CURL object.
Have each thread run a loop that:
pulls the next request from the queue,
sends it
processes/dispatches the response as needed,
and repeats
Until the queue is empty.

Related

LibCurl C++: slowing down the sending of requests when multiplexing

Goal:
To slightly slow down the sending of requests when multiplexing with libcurl, possibly by introducing small time delays in between the sending of each of the HTTP/2 request to a server. The multiplexing program needs to listen out for any changes from one webpage for around 3 seconds at a set time once a day. However, the multiplexing program finishes execution in under a second even when setting the variable num_transfers to the thousands (variable seen in the code further below).
It would be useful if there was a way to introduce for example, a 3 millisecond delay in between the transmission of a group of multiplex requests. This would mean the program could still send requests asynchronously (so it be won't blocked / won't have to wait for a response from the server before sending the next request) but at a slightly slower rate.
A definition of multiplexing taken from this resource:
Multiplexing is a method in HTTP/2 by which multiple HTTP requests can be sent and responses can be received asynchronously via a single TCP connection. Multiplexing is the heart of HTTP/2 protocol.
Ideal outcome:
An ideal program for this situation would be one that could send a few non-blocking/ multiplex requests every approx. 3 milliseconds. The program would run for around 3-4 seconds in total.
Current problem:
Currently the program is too fast when multiplexing, meaning a few thousand requests could be sent and received within around 350 milliseconds which can lead to the sending IP address to be blocked for a few minutes..
Please note it is not an option in this scenario to use a synchronous / blocking approach - a requirement of this program is that it must not be forced to wait for a response to be returned before sending another request. The issue lies in the fact that the program is too fast at sending a high number of requests.
Attempts at solving:
In the DO...WHILE loop seen in the code below, an attempt was made to introduce some artificial time delays at various locations within the loop using usleep(microseconds) from unistd.h, but this introduced a time delay either before or after all of the requests were sent, rather than an interleaved time delay between sending requests.
Current code:
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <chrono>
#include <string>
/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>
/* curl stuff */
#include <curl/curl.h>
#include <curl/mprintf.h>
#ifndef CURLPIPE_MULTIPLEX
#define CURLPIPE_MULTIPLEX 0
#endif
struct CURLMsg *msg;
struct transfer {
CURL *easy;
unsigned int num;
//FILE *out;
std::string contents;
};
struct MemoryStruct {
char *memory;
size_t size;
};
struct MemoryStruct chunk;
#define NUM_HANDLES 1000
static size_t WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp) {
transfer *t = (transfer *)userp;
size_t realsize = size * nmemb;
t->contents.append((const char*)contents, realsize);
return realsize;
}
static void setup(struct transfer *t, int num)
{
CURL *hnd;
hnd = t->easy = curl_easy_init();
curl_easy_setopt(hnd, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
curl_easy_setopt(hnd, CURLOPT_WRITEDATA, (void *)t);
/* set the same URL */
curl_easy_setopt(hnd, CURLOPT_URL, "https://someurl.xyz");
/* HTTP/2 please */
curl_easy_setopt(hnd, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
/* we use a self-signed test server, skip verification during debugging */
curl_easy_setopt(hnd, CURLOPT_SSL_VERIFYPEER, 0L);
curl_easy_setopt(hnd, CURLOPT_SSL_VERIFYHOST, 0L);
#if (CURLPIPE_MULTIPLEX > 0)
/* wait for pipe connection to confirm */
curl_easy_setopt(hnd, CURLOPT_PIPEWAIT, 1L);
#endif
}
int main() {
struct transfer trans[NUM_HANDLES];
CURLM *multi_handle;
int i;
int still_running = 0; /* keep number of running handles */
int num_transfers = 3;
chunk.memory = (char*)malloc(1);
chunk.size = 0;
/* init a multi stack */
multi_handle = curl_multi_init();
for(i = 0; i < num_transfers; i++) {
setup(&trans[i], i);
/* add the individual transfer */
curl_multi_add_handle(multi_handle, trans[i].easy);
}
curl_multi_setopt(multi_handle, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
// curl_multi_setopt(multi_handle, CURLMOPT_MAX_TOTAL_CONNECTIONS, 1L);
// Main loop
do {
CURLMcode mc = curl_multi_perform(multi_handle, &still_running);
if(still_running) {
/* wait for activity, timeout or "nothing" */
mc = curl_multi_poll(multi_handle, NULL, 0, 1000, NULL);
}
if(mc) {
break;
}
// Get response
do {
int queued;
msg = curl_multi_info_read(multi_handle, &queued);
if ((msg) && (msg->msg == CURLMSG_DONE) && (msg->data.result == CURLE_OK)) {
// Get size of payload
curl_off_t dl_size;
curl_easy_getinfo(msg->easy_handle, CURLINFO_SIZE_DOWNLOAD_T, &dl_size);
for (int i = 0; i < num_transfers; i++) {
std::cout << trans[i].contents;
}
std::cout << std::flush;
}
} while (msg);
} while (still_running);
for(i = 0; i < num_transfers; i++) {
curl_multi_remove_handle(multi_handle, trans[i].easy);
curl_easy_cleanup(trans[i].easy);
}
free(chunk.memory);
curl_multi_cleanup(multi_handle);
return 0;
}
Summary question:
Is there a way to send a group of multiplexed requests to a single URL approximately every 3 milliseconds? Another idea to attempt to solve this was to wrap the entire functionality contained in main() within a FOR loop, and putting a time delay at the end of each iteration of the FOR loop.

Libcurl C++: Non-blocking way to send requests to a single URL

Goal: To send requests to the same URL without having to wait for the request-sending function to finish executing.
Currently when I send a request to a URL, I have to wait around 10 ms for the server's response before sending another request using the same function. The aim is to detect changes on a webpage slightly faster than the program currently is doing, so for the WHILE loop to behave in a non-blocking manner.
Question: Using libcurl C++, if I have a WHILE loop that calls a function to send a request to a URL, how can I avoid waiting for the function to finish executing before sending another request to the SAME URL?
Note: I have been researching libcurl's multi-interface but I am struggling to determine if this interface is more suited to parallel requests to multiple URLs rather than sending requests to the same URL without having to wait for the function to finish executing each time. I have tried the following and looked at these resources:
an attempt at multi-threading a C program using libcurl requests
How to do curl_multi_perform() asynchronously in C++?
http://www.godpatterns.com/2011/09/asynchronous-non-blocking-curl-multi.html
https://curl.se/libcurl/c/multi-single.html
https://curl.se/libcurl/c/multi-poll.html
Here is my attempt at sending a request to one URL, but I have to wait for the request() function to finish and return a response code before sending the same request again.
#include <vector>
#include <iostream>
#include <curl/curl.h>
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
response->insert(response->end(), ptr, ptr+nmemb);
return nmemb;
}
long request(CURL *curl, const std::string &url) {
std::vector<char> response;
long response_code;
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
auto res = curl_easy_perform(curl);
// ...
// Print variable "response"
// ...
return response_code;
}
int main() {
curl_global_init(CURL_GLOBAL_ALL);
CURL *curl = curl_easy_init();
while (true) {
// blocking: request() must complete before executing again
long response_code = request(curl, "https://example.com");
// ...
// Some condition breaks loop
}
curl_easy_cleanup(curl);
curl_global_cleanup();
return 0;
}
I'm at a point where I have tried to understand the multi-interface documentation as best as possible, but still struggle to fully understand it / determine if it's actually suited to my particular problem. Apologies if this question appears to have not provided enough of my own research, but there are gaps in my libcurl knowledge I'm struggling to fill.
I'd appreciate it if anyone could suggest / explain ways in which I can modify my single libcurl example above to behave in a non-blocking manner.
EDIT:
From libcurl's C implemented example called "multi-poll", when I run the below program the URL's content is printed, but because it only prints once despite the WHILE (1) loop I'm confused as to whether or not it is sending repeated non-blocking requests to the URL (which is the aim), or just one request and is waiting on some other change/event?
#include <stdio.h>
#include <string.h>
/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>
/* curl stuff */
#include <curl/curl.h>
int main(void)
{
CURL *http_handle;
CURLM *multi_handle;
int still_running = 1; /* keep number of running handles */
curl_global_init(CURL_GLOBAL_DEFAULT);
http_handle = curl_easy_init();
curl_easy_setopt(http_handle, CURLOPT_URL, "https://example.com");
multi_handle = curl_multi_init();
curl_multi_add_handle(multi_handle, http_handle);
while (1) {
CURLMcode mc; /* curl_multi_poll() return code */
int numfds;
/* we start some action by calling perform right away */
mc = curl_multi_perform(multi_handle, &still_running);
if(still_running) {
/* wait for activity, timeout or "nothing" */
mc = curl_multi_poll(multi_handle, NULL, 0, 1000, &numfds);
}
// if(mc != CURLM_OK) {
// fprintf(stderr, "curl_multi_wait() failed, code %d.\n", mc);
// break;
// }
}
curl_multi_remove_handle(multi_handle, http_handle);
curl_easy_cleanup(http_handle);
curl_multi_cleanup(multi_handle);
curl_global_cleanup();
return 0;
}
You need to move curl_multi_add_handle and curl_multi_remove_handle inside the
while loop. Below is the extract from curl documentation https://curl.se/libcurl/c/libcurl-multi.html
When a single transfer is completed, the easy handle is still left added to the >multi stack. You need to first remove the easy handle with curl_multi_remove_handle >and then close it with curl_easy_cleanup, or possibly set new options to it and add >it again with curl_multi_add_handle to start another transfer.

C++ libcurl: using non-blocking loop to detect HTTP status code change

Scenario:
Before updating at a scheduled time, a web page has a HTTP status code of 503. When new data is added to the page after the scheduled time, the HTTP status code changes to 200.
Goal:
Using a non-blocking loop, to detect this change in the HTTP status code from 503 to 200 as fast as possible. With the current code seen further below, a WHILE loop successfully listens for the change in HTTP status code and prints out a success statement. Once 200 is detected, a break statement stops the loop.
However, it seems that the program must wait for a response every time a HTTP request is made before moving to the next WHILE loop iteration, behaving in a blocking manner.
Question:
Using libcurl C++, how can the below program be modified to transmit requests (to a single URL) to detect a HTTP status code change without having to wait for the response before sending another request?
Please note: I am aware that excessive requests may be deemed as unfriendly (this is an experiment for my own URL).
Before posting this question, the following SO questions and resources have been consulted:
How to do curl_multi_perform() asynchronously in C++?
Is curl_easy_perform() synchronous or asynchronous?
http://www.godpatterns.com/2011/09/asynchronous-non-blocking-curl-multi.html
https://curl.se/libcurl/c/multi-single.html
https://curl.se/libcurl/c/multi-poll.html
What's been tried so far:
Using multi-threading with a FOR loop in C to repeatedly call function to detect HTTP code change, which had a slight latency advantage. See code here: https://pastebin.com/73dBwkq3
Utilised OpenMP, again when using a FOR loop instead of the original WHILE loop. Latency advantage wasn't substantial.
Using the libcurl documentation C tutorials to try to replicate a program that listens to just one URL for changes, using the asynchronous multi-interface with difficulty.
Current attempt using curl_easy_opt:
#include <iostream>
#include <iomanip>
#include <vector>
#include <string>
#include <curl/curl.h>
// Function for writing callback
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
response->insert(response->end(), ptr, ptr+nmemb);
return nmemb;
}
long request(CURL *curl, const std::string &url) {
std::vector<char> response;
long response_code;
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
auto res = curl_easy_perform(curl);
if (response_code == 200) {
std::cout << "SUCCESS" << std::endl;
}
return response_code;
}
int main() {
curl_global_init(CURL_GLOBAL_ALL);
CURL *curl = curl_easy_init();
while (true) {
long response_code = request(curl, "www.example.com");
if (response_code == 200) {
break; // Page updated
}
}
curl_easy_cleanup(curl);
curl_global_cleanup();
return 0;
}
Summary:
Using C++ and libcurl, does anyone know how a WHILE loop can be used to repeatedly send a request to one URL only, without having to wait for the response in between sending requests? The aim of this is to detect the change as quickly as possible.
I understand that there is ample libcurl documentation, but have had difficulties grasping the multi-interface aspects to help apply them to this issue.
/* get us the resource without a body - use HEAD! */
curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);
If HEAD does not work for you, the server may reject HEAD, another solution:
size_t header_callback(char *buffer, size_t size, size_t nitems, void *userdata) {
long response_code = 0;
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
if (response_code != 200)
return 0; // Aborts the request.
return nitems;
}
curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, header_callback);
The second solution will consume network traffic, the HEAD is much better, once you receive 200, you can request GET.

C++ Curl Multi Perform Blocking Issue

In my QT app, I been using curl's curl_easy_setopt, and notice that its actually synchronous which was blocking my main gui. Inside my app I have a timer that has a set interval that calls curl. Anytime my timer callback would run curl, it would block my app for a few seconds than continue. So now im trying to figure out how to perform curl's multi interface and multi perform which is asynchronous, and its giving me the same blocking/lagging issue. Can anyone give me advice.
Below is my code, as well as curl's website demo for multi perform.
/********Header Files******/
#include <sys/time.h>
#include <unistd.h>
....
/************My Timer runs the code below every 10 seconds************/
std::string url = searchEngineParam.toStdString();
std::string userAgent = options[5]->userAgentsOptions[0].toStdString();
CURL *http_handle;
CURLM *multi_handle;
int still_running; /* keep number of running handles */
int repeats = 0;
curl_global_init(CURL_GLOBAL_DEFAULT);
http_handle = curl_easy_init();
curl_easy_setopt(http_handle, CURLOPT_URL, url.c_str());
curl_easy_setopt(http_handle, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(http_handle,CURLOPT_USERAGENT,userAgent.c_str());
curl_easy_setopt(http_handle, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_easy_setopt(http_handle, CURLOPT_WRITEDATA, (void *)&chunk);
curl_easy_setopt(http_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
/* init a multi stack */
multi_handle = curl_multi_init();
/* add the individual transfers */
curl_multi_add_handle(multi_handle, http_handle);
/* we start some action by calling perform right away */
curl_multi_perform(multi_handle, &still_running);
do {
CURLMcode mc; /* curl_multi_wait() return code */
int numfds;
/* wait for activity, timeout or "nothing" */
mc = curl_multi_wait(multi_handle, NULL, 0, 1000, &numfds);
if(mc != CURLM_OK) {
fprintf(stderr, "curl_multi_wait() failed, code %d.\n", mc);
break;
}
/* 'numfds' being zero means either a timeout or no file descriptors to
wait for. Try timeout on first occurrence, then assume no file
descriptors and no file descriptors to wait for means wait for 100
milliseconds. */
if(!numfds) {
repeats++; /* count number of repeated zero numfds */
if(repeats > 1) {
WAITMS(100); /* sleep 100 milliseconds */
}
}
else
repeats = 0;
curl_multi_perform(multi_handle, &still_running);
} while(still_running);
curl_multi_remove_handle(multi_handle, http_handle);
curl_easy_cleanup(http_handle);
curl_multi_cleanup(multi_handle);
curl_global_cleanup();

No difference between curl_easy and curl_multi

I'm performing HTTP requests from my C++ program to my PHP script with libcurl.
The first easy_ version below works good, however it is quite slow (12 requests per second on localhost). Nothing strange - I got similar results using ab -n 1000 -c 1.
On the other hand ab -n 1000 -c 100 performs much more better with 600 request per second.The thing is, using libcurl multi doesn't seem to be concurrent. I used just slightly modified example code and the result is also about 12 req/s.
Do I understand curl_multi right? How can I achieve results similar to ab?
PS. I know that both codes are a bit different, however almost whole time is spent on curl work.
The easy_ way:
CURL *curl;
CURLcode response; // HTTP response
curl = curl_easy_init();
if(curl)
{
curl_easy_setopt(curl, CURLOPT_URL, "http://localhost/process.php");
while(true)
{
if(!requestsQueue.empty())
{
mtx.lock();
string data = requestsQueue.front();
requestsQueue.pop();
mtx.unlock();
const char *post = data.c_str(); //convert string to char used by CURL
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, post);
do
{
response = curl_easy_perform(curl);
} while(response != CURLE_OK);
}
else
{
//there are no request to perform, so wait for them
cout << "Sleeping...\n";
sleep(2);
continue;
}
}
//curl_easy_cleanup(curl);
}
else
{
cout << "CURL init failed!\n";
}
The multi way:
CURLM *multi_handle;
int still_running; /* keep number of running handles */
/* init a multi stack */
multi_handle = curl_multi_init();
/* add the individual transfers */
for(int i=1;i<=300;i++)
{
CURL *handle;
handle = curl_easy_init();
curl_easy_setopt(handle, CURLOPT_URL, "http://localhost/process.php");
curl_multi_add_handle(multi_handle, handle);
}
/* we start some action by calling perform right away */
curl_multi_perform(multi_handle, &still_running);
do {
struct timeval timeout;
int rc; /* select() return code */
fd_set fdread;
fd_set fdwrite;
fd_set fdexcep;
int maxfd = -1;
long curl_timeo = -1;
FD_ZERO(&fdread);
FD_ZERO(&fdwrite);
FD_ZERO(&fdexcep);
/* set a suitable timeout to play around with */
timeout.tv_sec = 1;
timeout.tv_usec = 0;
curl_multi_timeout(multi_handle, &curl_timeo);
if(curl_timeo >= 0) {
timeout.tv_sec = curl_timeo / 1000;
if(timeout.tv_sec > 1)
timeout.tv_sec = 1;
else
timeout.tv_usec = (curl_timeo % 1000) * 1000;
}
/* get file descriptors from the transfers */
curl_multi_fdset(multi_handle, &fdread, &fdwrite, &fdexcep, &maxfd);
/* In a real-world program you OF COURSE check the return code of the
function calls. On success, the value of maxfd is guaranteed to be
greater or equal than -1. We call select(maxfd + 1, ...), specially in
case of (maxfd == -1), we call select(0, ...), which is basically equal
to sleep. */
rc = select(maxfd+1, &fdread, &fdwrite, &fdexcep, &timeout);
switch(rc) {
case -1:
/* select error */
break;
case 0:
default:
/* timeout or readable/writable sockets */
curl_multi_perform(multi_handle, &still_running);
break;
}
} while(still_running);
curl_multi_cleanup(multi_handle);
curl_easy_cleanup(http_handle);
return 0;
curl_multi does indeed work with any amount of transfers in parallel, but it does so using the same single thread for all the work. It has the side-effect that if something anywhere takes a long time, that action blocks all other transfers.
One example of such a blocking operation that sometimes is what causes something like what you're describing, is the name resolver phase if the old blocking name resolver is used. Other explanations include a callback implemented by the application takes time for some reason.
You can build libcurl to instead use c-ares or the threaded-resolver backends that both avoid this blocking behavior and instead much better allow for concurrency. The threaded resolver is default in libcurl since many years now (late 2021).