I have a libcurl program running on Ubuntu 20.04.1 LTS that downloads a webpage's data when it detects that the page's HTTP status code changes from 503 to 200. When the code changes, it means the site owners have uploaded new info to the page.
I used to be able to detect the change from 503 to 200 in around 150-200 milliseconds consistently for months.
Since around mid-July 2021, however (and without any change to the libcurl program's code), when the site changes from 503 to 200 I do not get a response telling me the code has changed for anywhere between 7 and 18 seconds.
Note: I have spoken to other users, who confirmed that they are still receiving the updated data in under 200 milliseconds, and to the site owners, who described everything as working normally.
Current output / attempts to solve:
Uninstalled and reinstalled libcurl with sudo apt-get install libcurl4-openssl-dev
I captured the traffic with Wireshark, the output of which can be seen here [use code 173967 if required/ happy to do so]. The scheduled change of the status code usually occurs at/after epoch 1633012200. The libcurl sending IP appears as 192.168.0.29 and the destination as 13.224.247.34, but I cannot tell whether the data is actually delivered on time and my application is the bottleneck, or whether there is a genuine network issue.
Employed debugging print statements. One statement prints "sent" when the response code request is sent, and the other prints "received" with the returned response code and an EPOCH timestamp.
On the countdown to the website's scheduled change from 503 to 200, in the terminal I see the expected output of sent -> received -> EPOCH timestamp. When the website's scheduled change from 503 to 200 occurs, all I see in the terminal is sent, followed by a 7 to 18 second delay before an output of received 200 -> timestamp.
Tested the same code with the same URL on three separate servers with different IP addresses but the same OS; all produced the same delayed result, though some were slightly faster than others.
An example of print statement output:
/*
listening...
sent
response
503
1632925797111
sent
response
503
1632925797227
sent
response
503
1632925797336
sent
<---- Program hangs here for approx. 7-18 seconds before printing 200 and timestamp*/
Current code:
#include <iomanip>
#include <vector>
#include <iostream>
#include <string>
#include <chrono>
#include <future>
#include <algorithm>
#include <cstring>
#include <curl/curl.h>
// Write callback: append the received bytes to a std::vector<char>
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
    std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
    response->insert(response->end(), ptr, ptr + size * nmemb);
    return size * nmemb; // must return the number of bytes handled
}
// Handle requests to URL
long request(CURL *curl, const std::string &url) {
    std::vector<char> response;
    long response_code = 0; // stays 0 if the transfer fails
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_setopt(curl, CURLOPT_COOKIEFILE, "");
    curl_easy_setopt(curl, CURLOPT_COOKIE, "");
    CURLcode res = curl_easy_perform(curl);
    if (res == CURLE_OK)
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    if (response_code == 200) {
        // do something
    }
    return response_code;
}
int main() {
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    std::string value1;
    std::cout << std::endl << "Press 1 to start listening..." << std::endl;
    std::cin >> value1;
    if (value1 == "1") {
        std::cout << std::endl << "listening..." << std::endl;
        while (true) {
            std::cout << "sent" << std::endl;
            long response_code = request(curl, "https://someurl.xyz");
            std::cout << "response" << std::endl;
            if (response_code == 200) {
                std::int64_t epoch_2 = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
                std::cout << "code 200 detected at : " << epoch_2 << std::endl;
                break; // Page updated
            }
            else {
                std::cout << response_code << std::endl;
                std::int64_t epoch_3 = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
                std::cout << epoch_3 << std::endl;
            }
        }
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
Summary question:
Q1. Given that other users are not experiencing this, and that I have run this same test on different servers with different IP addresses, does anyone know what could be causing the delay in my program's detection of the status code change? As previously stated, this program worked as expected with little to no delay before July 2021.
This issue has been ongoing for over three months. If anyone has any ideas, or could possibly point me in a direction to diagnose this, I would be highly appreciative.
Related
Goal: To send requests to the same URL without having to wait for the request-sending function to finish executing.
Currently, when I send a request to a URL, I have to wait around 10 ms for the server's response before sending another request with the same function. The aim is to detect changes on a webpage slightly faster than the program currently does, i.e. for the WHILE loop to behave in a non-blocking manner.
Question: Using libcurl C++, if I have a WHILE loop that calls a function to send a request to a URL, how can I avoid waiting for the function to finish executing before sending another request to the SAME URL?
Note: I have been researching libcurl's multi-interface but I am struggling to determine if this interface is more suited to parallel requests to multiple URLs rather than sending requests to the same URL without having to wait for the function to finish executing each time. I have tried the following and looked at these resources:
an attempt at multi-threading a C program using libcurl requests
How to do curl_multi_perform() asynchronously in C++?
http://www.godpatterns.com/2011/09/asynchronous-non-blocking-curl-multi.html
https://curl.se/libcurl/c/multi-single.html
https://curl.se/libcurl/c/multi-poll.html
Here is my attempt at sending a request to one URL, but I have to wait for the request() function to finish and return a response code before sending the same request again.
#include <vector>
#include <iostream>
#include <string>
#include <curl/curl.h>
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
    std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
    response->insert(response->end(), ptr, ptr + size * nmemb);
    return size * nmemb;
}
long request(CURL *curl, const std::string &url) {
    std::vector<char> response;
    long response_code = 0;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    auto res = curl_easy_perform(curl);
    // query the response code after the transfer, not before
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    // ...
    // Print variable "response"
    // ...
    return response_code;
}
int main() {
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    while (true) {
        // blocking: request() must complete before executing again
        long response_code = request(curl, "https://example.com");
        // ...
        // Some condition breaks loop
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
I'm at a point where I have tried to understand the multi-interface documentation as best I can, but I still struggle to fully understand it and to determine whether it is actually suited to my particular problem. Apologies if this question appears not to have provided enough of my own research, but there are gaps in my libcurl knowledge that I'm struggling to fill.
I'd appreciate it if anyone could suggest / explain ways in which I can modify my single libcurl example above to behave in a non-blocking manner.
EDIT:
From libcurl's C example called "multi-poll": when I run the program below, the URL's content is printed, but because it prints only once despite the WHILE (1) loop, I'm confused as to whether it is sending repeated non-blocking requests to the URL (which is the aim), or just one request and then waiting on some other change/event?
#include <stdio.h>
#include <string.h>
/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>
/* curl stuff */
#include <curl/curl.h>
int main(void)
{
    CURL *http_handle;
    CURLM *multi_handle;
    int still_running = 1; /* keep number of running handles */
    curl_global_init(CURL_GLOBAL_DEFAULT);
    http_handle = curl_easy_init();
    curl_easy_setopt(http_handle, CURLOPT_URL, "https://example.com");
    multi_handle = curl_multi_init();
    curl_multi_add_handle(multi_handle, http_handle);
    while (1) {
        CURLMcode mc; /* curl_multi_poll() return code */
        int numfds;
        /* we start some action by calling perform right away */
        mc = curl_multi_perform(multi_handle, &still_running);
        if(still_running) {
            /* wait for activity, timeout or "nothing" */
            mc = curl_multi_poll(multi_handle, NULL, 0, 1000, &numfds);
        }
        // if(mc != CURLM_OK) {
        //     fprintf(stderr, "curl_multi_wait() failed, code %d.\n", mc);
        //     break;
        // }
    }
    curl_multi_remove_handle(multi_handle, http_handle);
    curl_easy_cleanup(http_handle);
    curl_multi_cleanup(multi_handle);
    curl_global_cleanup();
    return 0;
}
You need to move curl_multi_add_handle and curl_multi_remove_handle inside the while loop. Below is an extract from the curl documentation, https://curl.se/libcurl/c/libcurl-multi.html:
When a single transfer is completed, the easy handle is still left added to the multi stack. You need to first remove the easy handle with curl_multi_remove_handle and then close it with curl_easy_cleanup, or possibly set new options to it and add it again with curl_multi_add_handle to start another transfer.
Scenario:
Before updating at a scheduled time, a web page has an HTTP status code of 503. When new data is added to the page after the scheduled time, the HTTP status code changes to 200.
Goal:
Using a non-blocking loop, to detect this change in the HTTP status code from 503 to 200 as fast as possible. With the current code seen further below, a WHILE loop successfully listens for the change in HTTP status code and prints out a success statement. Once 200 is detected, a break statement stops the loop.
However, it seems that the program must wait for a response every time an HTTP request is made before moving to the next WHILE loop iteration, i.e. it behaves in a blocking manner.
Question:
Using libcurl C++, how can the below program be modified to transmit requests (to a single URL) to detect an HTTP status code change without having to wait for the response before sending another request?
Please note: I am aware that excessive requests may be deemed unfriendly (this is an experiment on my own URL).
Before posting this question, the following SO questions and resources have been consulted:
How to do curl_multi_perform() asynchronously in C++?
Is curl_easy_perform() synchronous or asynchronous?
http://www.godpatterns.com/2011/09/asynchronous-non-blocking-curl-multi.html
https://curl.se/libcurl/c/multi-single.html
https://curl.se/libcurl/c/multi-poll.html
What's been tried so far:
Using multi-threading with a FOR loop in C to repeatedly call a function that detects the HTTP code change, which had a slight latency advantage. See code here: https://pastebin.com/73dBwkq3
Utilised OpenMP, again when using a FOR loop instead of the original WHILE loop. Latency advantage wasn't substantial.
Using the C tutorials in the libcurl documentation to try to replicate a program that listens to just one URL for changes via the asynchronous multi-interface, with difficulty.
Current attempt using curl_easy_setopt:
#include <iostream>
#include <iomanip>
#include <vector>
#include <string>
#include <curl/curl.h>
// Function for writing callback
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
    std::vector<char> *response = reinterpret_cast<std::vector<char> *>(userdata);
    response->insert(response->end(), ptr, ptr + size * nmemb);
    return size * nmemb;
}
long request(CURL *curl, const std::string &url) {
    std::vector<char> response;
    long response_code = 0;
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    auto res = curl_easy_perform(curl);
    // query the response code after the transfer, not before
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    if (response_code == 200) {
        std::cout << "SUCCESS" << std::endl;
    }
    return response_code;
}
int main() {
    curl_global_init(CURL_GLOBAL_ALL);
    CURL *curl = curl_easy_init();
    while (true) {
        long response_code = request(curl, "www.example.com");
        if (response_code == 200) {
            break; // Page updated
        }
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
Summary:
Using C++ and libcurl, does anyone know how a WHILE loop can be used to repeatedly send a request to one URL only, without having to wait for the response in between sending requests? The aim of this is to detect the change as quickly as possible.
I understand that there is ample libcurl documentation, but have had difficulties grasping the multi-interface aspects to help apply them to this issue.
/* get us the resource without a body - use HEAD! */
curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);
If HEAD does not work for you (the server may reject HEAD), another solution:
size_t header_callback(char *buffer, size_t size, size_t nitems, void *userdata) {
    CURL *curl = static_cast<CURL *>(userdata); // handle passed in via CURLOPT_HEADERDATA
    long response_code = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &response_code);
    if (response_code != 200)
        return 0; // Aborts the request.
    return size * nitems;
}
curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, header_callback);
curl_easy_setopt(curl, CURLOPT_HEADERDATA, curl);
The second solution still consumes network traffic, so HEAD is much better: once you receive a 200, you can issue a GET.
I am trying to use libcurl (linked into a C++ program) for the first time, and need beginner-level help. I'm also largely unfamiliar with HTTP/HTML, etc., so please forgive me if my terminology belies that.
Using the executable curl, if I execute the following...
curl -k -u user:password https://confluence/pages/viewpage.action?pageId=42
...I get what looks like the legit contents of the webpage.
I would like to do the same from my C++ program using libcurl.
I've started with a minimal modification of basic example posted at https://curl.haxx.se/libcurl/c/simple.html:
#include <iostream>
#include <curl/curl.h>
int main(void)
{
    CURL *curl;
    CURLcode res;
    curl = curl_easy_init();
    if(curl)
    {
        curl_easy_setopt(curl, CURLOPT_URL, "https://confluence/pages/viewpage.action?pageId=42");
        res = curl_easy_perform(curl);
        if(CURLE_OK == res) { std::cout << "curl success" << std::endl; }
        else { std::cout << "curl failure" << std::endl; }
        curl_easy_cleanup(curl);
    }
    return 0;
}
This code results in the output:
curl failure
Can anyone guide me on how I can programmatically do what I did earlier with the curl executable? There are some obvious deficiencies with my sample code, i.e. the absence of a username and password, so I'd appreciate any guidance in the right direction. Thank you.
Update
The reason I used the -k option when executing the curl executable was because running the command without -k resulted in no webpage content being returned by curl. I just tried adding -k based on the help text and observed it worked. Sorry for my lack of understanding and ability to explain. I'd be grateful if an answerer can touch on these topics too, to help me understand.
Update and Closure
I'm a little embarrassed that I turned to StackOverflow without a little bit more effort on my part - apologies to the community for this poor question.
The (insecure) solution, from just a bit of elbow-grease is:
#include <iostream>
#include <curl/curl.h>
int main(void)
{
    CURL *curl;
    CURLcode res;
    curl = curl_easy_init();
    if(curl)
    {
        curl_easy_setopt(curl, CURLOPT_URL, "https://confluence/pages/viewpage.action?pageId=42");
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
        res = curl_easy_perform(curl);
        if(CURLE_OK == res) { std::cout << "curl success" << std::endl; }
        else { std::cout << "curl failure" << std::endl; }
        curl_easy_cleanup(curl);
    }
    return 0;
}
You are trying to access a secure website; it will not work unless the TLS handshake on port 443 succeeds. Note that CURLOPT_USE_SSL takes a CURLUSESSL_* level rather than a URL, e.g.
curl_easy_setopt(curl, CURLOPT_USE_SSL, CURLUSESSL_ALL);
(the URL itself is still set with CURLOPT_URL). For https:// URLs TLS is used regardless; this option mainly matters for protocols that upgrade to TLS, so the certificate-verification options above are what actually change the outcome here.
My current curl setup calls a webpage, saves it into a string, and repeats the process after sleeping for a second. This is the code that writes into the string:
#include <curl/curl.h>
#include <string>
#include <iostream>
#include <thread>
#include <chrono>
size_t curl_writefunc(void* ptr, size_t size, size_t nmemb, std::string* data)
{
    data->append((const char*)ptr, size * nmemb);
    return size * nmemb;
}
void curl_handler(std::string& data)
{
    long http_code = 0; // CURLINFO_RESPONSE_CODE expects a long, not an int
    CURL* curl;
    // Initialize cURL
    curl = curl_easy_init();
    // Set the function to call when there is new data
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_writefunc);
    // Set the parameter to append the new data to
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &data);
    // Set the URL to download; just for this question.
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com/");
    // Download
    curl_easy_perform(curl);
    // Get the HTTP response code
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
    // Clean up
    curl_easy_cleanup(curl);
    curl_global_cleanup();
}
int main()
{
    bool something = true;
    std::string data;
    while (something)
    {
        curl_handler(data);
        std::cout << data << '\n';
        data.clear();
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
However it runs into a problem about 20 minutes into runtime and this is the message it confronts me with:
140377776379824:error:02001018:system library:fopen:Too many open files:bss_file.c:173:fopen('/etc/ssl/openssl.cnf','rb')
140377776379824:error:2006D002:BIO routines:BIO_new_file:system lib:bss_file.c:178:
140377776379824:error:0E078002:configuration file routines:DEF_LOAD:system lib:conf_def.c:199:
It seems to stem from an OpenSSL file that is not closed once it has fulfilled its task in a single iteration. Iterated repeatedly, the open files add up and are bound to trigger an error at some point.
I am still much of a beginner programmer and therefore don't want to start messing with OpenSSL, so I came here to ask whether there is a solution to this kind of problem. Could it be solved by declaring the curl object outside of the repeatedly called function?
What has to be done is simply to declare the handle and set its options once, before the loop; only the actual download and the handling of its response are then repeated in the loop. Re-using a handle is encouraged, since libcurl can then re-use its resources (such as open connections and sessions) across transfers. Likewise, curl_global_init and curl_global_cleanup should each be called exactly once per program, not once per request.
I'm working on a program which will download lyrics from sites like AZLyrics, using libcurl.
Here is my code:
lyricsDownloader.cpp
#include "lyricsDownloader.h"
#include <curl/curl.h>
#include <cstring>
#include <iostream>
#define DEBUG 1
/////////////////////////////////////////////////////////////////////////////
size_t lyricsDownloader::write_data_to_var(char *ptr, size_t size, size_t nmemb, void *userdata) // this function is a static member function
{
    ostringstream * stream = (ostringstream*) userdata;
    size_t count = size * nmemb;
    stream->write(ptr, count);
    return count;
}
string AZLyricsDownloader::toProviderCode() const
{ /*this creates an url*/ }
CURLcode AZLyricsDownloader::download()
{
    CURL * handle;
    CURLcode err;
    ostringstream buff;
    handle = curl_easy_init();
    if (! handle) return static_cast<CURLcode>(-1);
    // set verbose if debug on
    curl_easy_setopt( handle, CURLOPT_VERBOSE, DEBUG );
    curl_easy_setopt( handle, CURLOPT_URL, toProviderCode().c_str() ); // set the download url to the generated one
    curl_easy_setopt(handle, CURLOPT_WRITEDATA, &buff);
    curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, &AZLyricsDownloader::write_data_to_var);
    err = curl_easy_perform(handle); // The segfault should be somewhere here - after calling the function but before it ends
    cerr << "cleanup\n";
    curl_easy_cleanup(handle);
    // copy the contents to text variable
    lyrics = buff.str();
    return err;
}
main.cpp
#include <QString>
#include <QTextEdit>
#include <iostream>
#include "lyricsDownloader.h"
int main(int argc, char *argv[])
{
    AZLyricsDownloader dl(argv[1], argv[2]);
    dl.perform();
    QTextEdit qtexted(QString::fromStdString(dl.lyrics));
    cout << qPrintable(qtexted.toPlainText());
    return 0;
}
When running
./maelyrica Anthrax Madhouse
I'm getting this logged from curl
* About to connect() to azlyrics.com port 80 (#0)
* Trying 174.142.163.250... * connected
* Connected to azlyrics.com (174.142.163.250) port 80 (#0)
> GET /lyrics/anthrax/madhouse.html HTTP/1.1
Host: azlyrics.com
Accept: */*
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.0.12
< Date: Thu, 05 Jul 2012 16:59:21 GMT
< Content-Type: text/html
< Content-Length: 185
< Connection: keep-alive
< Location: http://www.azlyrics.com/lyrics/anthrax/madhouse.html
<
Segmentation fault
Strangely, the file is there. The same error is displayed when there is no such page (a redirect to the azlyrics.com main page).
What am I doing wrong?
Thanks in advance
EDIT: I made the function for writing data static, but this changes nothing.
Even wget seems to have problems
$ wget http://www.azlyrics.com/lyrics/anthrax/madhouse.html
--2012-07-06 10:36:05-- http://www.azlyrics.com/lyrics/anthrax/madhouse.html
Resolving www.azlyrics.com... 174.142.163.250
Connecting to www.azlyrics.com|174.142.163.250|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.
Why does opening the page in a browser work and wget/curl not?
EDIT2: After adding this:
curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1);
And making the function static everything's OK.
Your code
curl_easy_setopt(handle,CURLOPT_WRITEFUNCTION,&AZLyricsDownloader::write_data_to_var);
and the following quote from the libcurl documentation:
There's basically only one thing to keep in mind when using C++ instead of C when interfacing libcurl:
The callbacks CANNOT be non-static class member functions
Example C++ code:
class AClass {
    static size_t write_data(void *ptr, size_t size, size_t nmemb, void *ourpointer)
    {
        /* do what you want with the data */
    }
};
could be the source of your problem, as your function is not a static member. Even if it is not the cause, you are breaking this rule.
This may not solve your problem, but given the amount of code in your example it was the first thing that came to mind, and it is worth changing as libcurl recommends. If it does not solve your problem, I would suggest identifying the error in more detail so that you can pose a more specific question next time (with a lot less code displayed).