Scenario:
Before updating at a scheduled time, a web page has an HTTP status code of 503. When new data is added to the page at the scheduled time, the HTTP status code changes to 200.
Goal:
To detect this change in the HTTP status code from 503 to 200 as quickly as possible, using a non-blocking loop. With the current code seen further below, a WHILE loop successfully listens for the change in HTTP status code and prints a success statement; once 200 is detected, a break statement stops the loop.
However, the program must wait for a response to each HTTP request before moving to the next WHILE loop iteration, i.e. it behaves in a blocking manner.
Question:
Using libcurl in C++, how can the program below be modified to transmit requests (to a single URL) to detect an HTTP status code change without having to wait for the response before sending another request?
Please note: I am aware that excessive requests may be deemed as unfriendly (this is an experiment for my own URL).
Before posting this question, the following SO questions and resources have been consulted:
How to do curl_multi_perform() asynchronously in C++?
Is curl_easy_perform() synchronous or asynchronous?
http://www.godpatterns.com/2011/09/asynchronous-non-blocking-curl-multi.html
https://curl.se/libcurl/c/multi-single.html
https://curl.se/libcurl/c/multi-poll.html
What's been tried so far:
Using multi-threading with a FOR loop in C to repeatedly call a function that detects the HTTP code change, which had a slight latency advantage. See code here: https://pastebin.com/73dBwkq3
Utilised OpenMP, again with a FOR loop instead of the original WHILE loop. The latency advantage wasn't substantial.
Used the C tutorials in the libcurl documentation to try, with difficulty, to replicate a program that listens to just one URL for changes using the asynchronous multi-interface.
Current attempt using the easy interface (curl_easy_perform):
#include <iostream>
#include <iomanip>
#include <vector>
#include <string>
#include <curl/curl.h>
// Function for writing callback
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
    std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
    response->insert(response->end(), ptr, ptr + nmemb);
    return size * nmemb;
}
long request(CURL *curl, const std::string &url) {
    std::vector<char> response;
    long response_code = 0;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    auto res = curl_easy_perform(curl);
    // The response code is only valid after the transfer has completed.
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    if (response_code == 200) {
        std::cout << "SUCCESS" << std::endl;
    }
    return response_code;
}
int main() {
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    while (true) {
        long response_code = request(curl, "www.example.com");
        if (response_code == 200) {
            break; // Page updated
        }
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
Summary:
Using C++ and libcurl, does anyone know how a WHILE loop can be used to repeatedly send a request to one URL, without having to wait for the response between sends? The aim is to detect the change as quickly as possible.
I understand that there is ample libcurl documentation, but I have had difficulties grasping the multi-interface aspects well enough to apply them to this issue.
/* get us the resource without a body - use HEAD! */
curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);
If HEAD does not work for you (the server may reject HEAD requests), another solution is to abort the transfer from a header callback as soon as the status code is known:
size_t header_callback(char *buffer, size_t size, size_t nitems, void *userdata) {
    // The easy handle is passed in via CURLOPT_HEADERDATA below.
    CURL *curl = static_cast<CURL *>(userdata);
    long response_code = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    if (response_code != 200)
        return 0; // Aborts the request.
    return nitems * size;
}

curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, header_callback);
curl_easy_setopt(curl, CURLOPT_HEADERDATA, curl);
The second solution still consumes network traffic, so HEAD is much better: once you receive 200, you can issue a GET to fetch the body.
Related
I have a libcurl program running on Ubuntu 20.04.1 LTS that downloads a webpage's data when it detects that the webpage's HTTP status code changes from 503 to 200. When the code changes, it means the site owners have uploaded new info to the page.
I used to be able to detect the change from 503 to 200 in around 150-200 milliseconds consistently for months.
Since around mid-July 2021 however (and without changing the libcurl program's code whatsoever), when the site changes from 503 to 200 I do not get a response for anywhere between 7 to 18 seconds to tell me the code has changed.
Note: I have spoken to other users who have confirmed that they are still receiving the updated data in under 200 milliseconds, as well as the site owners, who described everything as working normally.
Current output / attempts to solve:
Uninstalled and reinstalled libcurl with sudo apt-get install libcurl4-openssl-dev
I decided to use Wireshark, the output of which can be seen here [use code 173967 if required/ happy to do so]. The scheduled change of the status code is usually at/after 1633012200. The libcurl sending IP is seen as 192.168.0.29 and the destination is 13.224.247.34, but I cannot ascertain if the information is indeed being delivered on time and my application is causing the bottleneck, or if there is a genuine network issue.
Employed debugging print statements. One statement prints "sent" when the response code request is sent, and the other prints "received" with the returned response code and an EPOCH timestamp.
On the countdown to the website's scheduled change from 503 to 200, in the terminal I see the expected output of sent -> received -> EPOCH timestamp. When the website's scheduled change from 503 to 200 occurs, all I see in the terminal is sent, followed by a 7 to 18 second delay before an output of received 200 -> timestamp.
Tested the same code with the same URL on three separate servers with different IP addresses but the same OS; all produced the same delayed result, although some were slightly less slow than others.
An example of print statement output:
/*
listening...
sent
response
503
1632925797111
sent
response
503
1632925797227
sent
response
503
1632925797336
sent
<---- Program hangs here for approx. 7-18 seconds before printing 200 and timestamp*/
Current code:
#include <iomanip>
#include <vector>
#include <iostream>
#include <string>
#include <chrono>
#include <cstdint>
#include <future>
#include <algorithm>
#include <cstring>
#include <curl/curl.h>
// Function for writing callback
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
    std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
    response->insert(response->end(), ptr, ptr + nmemb);
    return size * nmemb;
}

// Handle requests to URL
long request(CURL *curl, const std::string &url) {
    std::vector<char> response;
    long response_code = 0;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_setopt(curl, CURLOPT_COOKIEFILE, "");
    curl_easy_setopt(curl, CURLOPT_COOKIE, "");
    auto res = curl_easy_perform(curl);
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    if (response_code == 200) {
        // do something
    }
    return response_code;
}
int main() {
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    std::string value1;
    std::cout << std::endl << "Press 1 to start listening..." << std::endl;
    std::cin >> value1;
    if (value1 == "1") {
        std::cout << std::endl << "listening..." << std::endl;
        while (true) {
            std::cout << "sent" << std::endl;
            long response_code = request(curl, "https://someurl.xyz");
            std::cout << "response" << std::endl;
            if (response_code == 200) {
                std::int64_t epoch_2 = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
                std::cout << "code 200 detected at : " << epoch_2 << std::endl;
                break; // Page updated
            }
            else {
                std::cout << response_code << std::endl;
                std::int64_t epoch_3 = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
                std::cout << epoch_3 << std::endl;
            }
        }
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
Summary question:
Q1. Given that other users are not experiencing this, and that I have run this same test on different servers with different IP addresses, does anyone know what could be causing the delay in my program detecting the status code change? As previously stated, this program worked as expected with little to no delay before July 2021.
This issue has been ongoing for over three months. If anyone has any ideas, or could possibly point me in a direction to diagnose this, I would be highly appreciative.
Goal: To send requests to the same URL without having to wait for the request-sending function to finish executing.
Currently, when I send a request to a URL, I have to wait around 10 ms for the server's response before sending another request using the same function. The aim is to detect changes on a webpage slightly faster than the program currently does, i.e. for the WHILE loop to behave in a non-blocking manner.
Question: Using libcurl in C++, if I have a WHILE loop that calls a function to send a request to a URL, how can I avoid waiting for the function to finish executing before sending another request to the SAME URL?
Note: I have been researching libcurl's multi-interface, but I am struggling to determine whether this interface is better suited to parallel requests to multiple URLs than to sending repeated requests to the same URL without waiting for the function to finish each time. I have tried the following and looked at these resources:
an attempt at multi-threading a C program using libcurl requests
How to do curl_multi_perform() asynchronously in C++?
http://www.godpatterns.com/2011/09/asynchronous-non-blocking-curl-multi.html
https://curl.se/libcurl/c/multi-single.html
https://curl.se/libcurl/c/multi-poll.html
Here is my attempt at sending a request to one URL, but I have to wait for the request() function to finish and return a response code before sending the same request again.
#include <vector>
#include <string>
#include <iostream>
#include <curl/curl.h>
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
    std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
    response->insert(response->end(), ptr, ptr + nmemb);
    return size * nmemb;
}

long request(CURL *curl, const std::string &url) {
    std::vector<char> response;
    long response_code = 0;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    auto res = curl_easy_perform(curl);
    // The response code is only available after the transfer has finished.
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    // ...
    // Print variable "response"
    // ...
    return response_code;
}
int main() {
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    while (true) {
        // blocking: request() must complete before executing again
        long response_code = request(curl, "https://example.com");
        // ...
        // Some condition breaks loop
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
I'm at a point where I have tried to understand the multi-interface documentation as best as possible, but still struggle to fully understand it / determine if it's actually suited to my particular problem. Apologies if this question appears to have not provided enough of my own research, but there are gaps in my libcurl knowledge I'm struggling to fill.
I'd appreciate it if anyone could suggest / explain ways in which I can modify my single libcurl example above to behave in a non-blocking manner.
EDIT:
From libcurl's C example called "multi-poll": when I run the program below, the URL's content is printed, but because it only prints once despite the while(1) loop, I'm confused as to whether it is sending repeated non-blocking requests to the URL (which is the aim), or just one request and then waiting on some other change/event.
#include <stdio.h>
#include <string.h>
/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>
/* curl stuff */
#include <curl/curl.h>
int main(void)
{
  CURL *http_handle;
  CURLM *multi_handle;
  int still_running = 1; /* keep number of running handles */

  curl_global_init(CURL_GLOBAL_DEFAULT);
  http_handle = curl_easy_init();
  curl_easy_setopt(http_handle, CURLOPT_URL, "https://example.com");
  multi_handle = curl_multi_init();
  curl_multi_add_handle(multi_handle, http_handle);

  while(1) {
    CURLMcode mc; /* curl_multi_poll() return code */
    int numfds;

    /* we start some action by calling perform right away */
    mc = curl_multi_perform(multi_handle, &still_running);

    if(still_running) {
      /* wait for activity, timeout or "nothing" */
      mc = curl_multi_poll(multi_handle, NULL, 0, 1000, &numfds);
    }

    // if(mc != CURLM_OK) {
    //   fprintf(stderr, "curl_multi_wait() failed, code %d.\n", mc);
    //   break;
    // }
  }

  curl_multi_remove_handle(multi_handle, http_handle);
  curl_easy_cleanup(http_handle);
  curl_multi_cleanup(multi_handle);
  curl_global_cleanup();
  return 0;
}
You need to move curl_multi_add_handle and curl_multi_remove_handle inside the while loop. Below is the extract from the curl documentation https://curl.se/libcurl/c/libcurl-multi.html

"When a single transfer is completed, the easy handle is still left added to the multi stack. You need to first remove the easy handle with curl_multi_remove_handle and then close it with curl_easy_cleanup, or possibly set new options to it and add it again with curl_multi_add_handle to start another transfer."
My current curl setup calls a webpage, saves it into a string, and repeats the process after sleeping for a second. This is the code that writes into the string:
#include <curl/curl.h>
#include <string>
#include <iostream>
#include <thread>
#include <chrono>
size_t curl_writefunc(void* ptr, size_t size, size_t nmemb, std::string* data)
{
    data->append((const char*)ptr, size * nmemb);
    return size * nmemb;
}

void curl_handler(std::string& data)
{
    int http_code = 0;
    CURL* curl;
    // Initialize cURL
    curl = curl_easy_init();
    // Set the function to call when there is new data
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_writefunc);
    // Set the parameter to append the new data to
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &data);
    // Set the URL to download; just for this question.
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com/");
    // Download
    curl_easy_perform(curl);
    // Get the HTTP response code
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
    // Clean up
    curl_easy_cleanup(curl);
    curl_global_cleanup();
}

int main()
{
    bool something = true;
    std::string data;
    while (something)
    {
        curl_handler(data);
        std::cout << data << '\n';
        data.clear();
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
However, it runs into a problem about 20 minutes into runtime, and this is the message it confronts me with:
140377776379824:error:02001018:system library:fopen:Too many open files:bss_file.c:173:fopen('/etc/ssl/openssl.cnf','rb')
140377776379824:error:2006D002:BIO routines:BIO_new_file:system lib:bss_file.c:178:
140377776379824:error:0E078002:configuration file routines:DEF_LOAD:system lib:conf_def.c:199:
It seems to stem from an OpenSSL file that is not closed once it has fulfilled its task in a single iteration. Iterated more than once, the open files add up and are bound to run into an error at some point.
I am still much of a beginner programmer and therefore don't want to start messing with OpenSSL, so I came here to ask whether there is a solution for this kind of problem. Could it be solved by declaring the curl object outside of the repeatedly called function?
What has to be done is simply to declare the handle and set its options once, before getting the data; only the actual download and the handling of its response is then repeated in the loop. Re-using a handle as often as needed is encouraged, since its resources (like the connections and files opened in a session) can then be reused instead of reopened. Likewise, curl_global_init and curl_global_cleanup are meant to be called once per program, not once per request.
Messing around with the League of Legends API.
I've had an issue for a couple of days now, so I've simplified what's going on. I'm sending off a URL via cURL which should return a block of JSON. The URL opens fine in my browser and displays the expected data. However, for some strange reason, cURL (or the API?) is sending data to my callback function multiple times.
A few snippets of what returns:
Starts with - {"20278403":[{"name":"Pop...
Ends with - {"name":"Karthus's Overlords","ti
Literally cuts off with "ti. A new callback then begins, continuing on with the old data:
Starts with - er":"PLATINUM","que...
Ends with - "isInactive":false}]}]}
As you may notice, the correct termination for JSON is present in the second callback's output. I know the suggestion will be 'why not just shove it all into one string and parse it after?'; the problem is that I need to send off several requests, as you can only request so many players' data at a time, so it's difficult to tell where one request's JSON begins and another's ends!
Most importantly - does anyone know why this is happening? It seems extremely bizarre to return data across multiple callbacks.
If it helps, here is just a generic cURL call:
curl_easy_setopt(m_pCurl, CURLOPT_URL, "https://euw...");
curl_easy_setopt(m_pCurl, CURLOPT_WRITEFUNCTION, &DataSuccessCB);
curl_easy_perform(m_pCurl);

size_t CAPIReader::DataSuccessCB(char* cBuffer, size_t iSize, size_t nmemb, void* userData)
{
    string sBuffer = string(cBuffer);
    vStrVec.push_back(sBuffer); // vector holding all the returned json strings - intended to have a whole block of json in each one!
    return (iSize * nmemb);
}
Thanks.
This is the normal behaviour of libcurl; you can see it in the getinmemory.c sample. libcurl invokes the callback whenever data is available on the socket, so if the TCP stream arrives fragmented, the callback is called several times.
A possible solution for reassembling the message is to pass in a pointer to the string to fill:
size_t CAPIReader::DataSuccessCB(char* cBuffer, size_t iSize, size_t nmemb, void* userData)
{
    std::string& buffer = *(std::string*)userData;
    buffer.append(cBuffer, iSize * nmemb);
    return (iSize * nmemb);
}

std::string data;
curl_easy_setopt(m_pCurl, CURLOPT_URL, "https://euw...");
curl_easy_setopt(m_pCurl, CURLOPT_WRITEFUNCTION, &DataSuccessCB);
curl_easy_setopt(m_pCurl, CURLOPT_WRITEDATA, (void *)&data);
if (curl_easy_perform(m_pCurl) == CURLE_OK)
{
    // Parse the JSON in the data string
}
I am very new to cURL, so hopefully I can get some help. I am currently in a Windows environment, using Visual Studio.
I am trying to use cURL to access a D-Link IP camera through the mydlink website (https://mydlink.com/login) and grab the camera's video stream to do some processing. But to do this I have to first log in, and I am not sure how to do it.
Below is my code.
int main()
{
    CURL *curl;
    CURLcode result;
    char *url_1 = "https://mydlink.com/login";
    char *postdata = "email=xyz#gmail.com&password=123456";
    char *cookiefile = "tempcookie";

    curl = curl_easy_init();
    if( curl )
    {
        curl_easy_setopt(curl, CURLOPT_COOKIEFILE, cookiefile);
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, dummy);
        curl_easy_setopt(curl, CURLOPT_URL, url_1);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, postdata);
        // Connect to target (login)
        result = curl_easy_perform(curl);
        if( result )
            cout << "Cannot connect to site, check your url!\n";
        else
        {
            //...
        }
    }
    return 0;
}
Could someone please enlighten me, or provide some piece of code for it?
Thank you
1) Your example code is incomplete: you use a dummy function which is not in your listing. It is important that the dummy function returns size*nmemb (see the manual for CURLOPT_WRITEFUNCTION), so it is difficult to say what went wrong.
2) You don't output your error code: please use curl_easy_strerror to decode the error in result; then you would know why it failed.
3) If I supply my own "dummy" callback, I get an HTML page without errors, and the page itself does not complain about a wrong password or anything (which is strange, but it kind of works).
Here is my dummy:
size_t dummy(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    // %.*s expects an int for the precision, so cast the size_t product
    printf("%.*s", (int)(size * nmemb), ptr);
    return size * nmemb;
}
I looked a bit further at what mydlink.com is doing, and it performs acrobatics with the email address (like deciding if it is local, trying to guess a region, etc.), then manipulates cookies. It is all in JavaScript, so I am afraid one has to dig through that JavaScript in order to emulate a proper login POST, or perhaps find proper documentation about the mydlink.com services, sorry.