I have been scraping a website successfully with libcurl (curl.h) in an application that also connects to a couple of TCP/IP sockets to consume some data, which works too. The app runs for hours gathering data from these two sources. The problem is that after running successfully for some time, I get disconnected from the TCP/IP socket because curl connects using the same IP and port that the TCP/IP socket uses. To solve this I have been trying, unsuccessfully, to manually set either the local IP or the local port that curl uses. Here is a sample of what I have tried:
void BaseScraper::ScrapWebsite()
{
std::string URL = "https://some_website.com";
curl = curl_easy_init();
if (curl)
{
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &BaseScraper::CurlWriter);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &curlBuffer);
curl_easy_setopt(curl, CURLOPT_URL, URL.c_str());
//These are the ways I have been trying to manipulate local IP/port unsuccessfully :(
//curl_easy_setopt(curl, CURLOPT_DNS_LOCAL_IP4, "192.168.1.1");
//curl_easy_setopt(curl, CURLOPT_LOCALPORT, 4912L);
//curl_easy_setopt(curl, CURLOPT_LOCALPORTRANGE, 20L);
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, (long)1);
curl_easy_perform(curl);
curl_easy_cleanup(curl);
}
}
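// Must be a static member function: libcurl calls it through a plain function pointer.
// libcurl passes the sizes as size_t and expects a size_t return value.
size_t BaseScraper::CurlWriter(char *pData, size_t pSize, size_t pNmemb, std::string *pBuffer)
{
size_t result = 0;
if (pBuffer != NULL) {
pBuffer->append(pData, pSize * pNmemb);
result = pSize * pNmemb;
}
return result;
}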
Note: I have simplified the code a little, but my code works; the only issue I am having is setting the IP/port with the lines that are commented out.
I'm learning how to use CURL properly, and according to all the examples (the documentation is a pain) my code should work, but for some reason it sometimes connects and other times it doesn't.
I checked whether a firewall or the antivirus was interfering, but both are turned off and the problem persists.
The main idea is to connect to a local server (rpi), and in the future to an external server for backup/updates.
My code is as follows: first the callback function, then the function that does all the work. The different URLs are for example purposes.
static std::size_t callback(const char* in,std::size_t size, std::size_t num, std::string* out){
Silo* silo = new Silo();
const std::size_t totalBytes(size * num);
std::string data = std::to_string(totalBytes);
silo->Log("Total bytes received: " + QString::fromStdString(data));
out->append(in, totalBytes);
return totalBytes;
}
void Server::RPI_Request(){
Silo* silo = new Silo();
//curl_global_init(CURL_GLOBAL_ALL);
CURL *curl = curl_easy_init();
const std::string url_A("http://date.jsontest.com/");
const std::string url_B("https://jsonplaceholder.typicode.com/todos/1");
const std::string url_C("https://www.google.com/");
const std::string url_D("https://stackoverflow.com/");
if (curl){
CURLcode res;
// Set the target URL
curl_easy_setopt(curl, CURLOPT_URL, url_C.c_str() );
// Don't bother trying IPv6, which would increase DNS resolution time.
curl_easy_setopt(curl, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
// Don't wait forever, time out after 10 seconds.
silo->Log("before timeout");
curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L);
// Follow HTTP redirects if necessary.
//curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
// Response information.
long httpCode(0);
std::unique_ptr<std::string> httpData(new std::string());
// Hook up data handling function.
silo->Log("before write function");
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback);
// Hook up data container (will be passed as the last parameter to the
// callback handling function). Can be any pointer type, since it will
// internally be passed as a void pointer.
curl_easy_setopt(curl, CURLOPT_WRITEDATA, httpData.get());
// Run our HTTP GET command, capture the HTTP response code, and clean up.
silo->Log("before easy perform");
res = curl_easy_perform(curl);
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &httpCode);
silo->Log("httpCode response: " + QString::number(httpCode));
if (res != CURLE_OK){
silo->Log("Problem: could not connect to " + QString::fromStdString(url_C) );
} else {
silo->Log("Connection established with " + QString::fromStdString(url_C));
}
}
curl_easy_cleanup(curl);
//curl_global_cleanup();
}
}
I'm trying to send an email and password in a POST request to my Express (Node.js) server using libcurl from C++. When I log the POST data on the server, every '_' has been changed to ' '.
char emailtext[50];
char passwordtext[50];
int emailstrlen = wcstombs(emailtext, email->getText(), 50);
int passwordstrlen = wcstombs(passwordtext, password->getText(), 50);
long totalsize = emailstrlen + passwordstrlen;
strcat(emailtext, ":");
strcat(emailtext, passwordtext);
// "myemail#yahoo.com:mypassword\0"
curl_global_init(CURL_GLOBAL_ALL);
CURLcode res;
CURL* curl = curl_easy_init();
curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
curl_easy_setopt(curl, CURLOPT_URL, "https://localhost:3000/login");
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
struct curl_slist *headers=NULL;
headers = curl_slist_append(headers, "Content-Type:text/plain; charset=utf-8");
cout << emailtext << endl;
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, emailtext);
/* pass our list of custom made headers */
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_perform(curl); /* post away! */
curl_slist_free_all(headers); /* free the header list */
curl_easy_cleanup(curl); /* release the easy handle */
curl_global_cleanup();
return true;
Express (Node.js) server:
app.post('/login', bodyParser.text(), function (req, res) {
console.log("we got the post request for /login");
console.log("logging the body!");
console.log(req.body);
res.header('Content-type', 'text/plain');
return res.end('<h1>Hello, Secure World!</h1>');
});
Say emailtext is "cool_stewj#yahoo.com:lololol".
The output from the log on the server will then be "cool stewj#yahoo.com:lololol".
What's happening? I've tried URL-encoding the data, which turned out to be silly and pointless. How do I get that underscore?
The terminal in Ubuntu apparently doesn't show underscores, so underscores in any log output look like spaces. The data itself is not changed.
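One way to check the bytes independently of how a terminal draws them is to print them as hex on the client before sending. The sketch below is only an illustration; `to_hex` is a hypothetical helper name, not part of libcurl or of the original code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of `data` as two lowercase hex
// digits, separated by single spaces.
std::string to_hex(const std::string& data) {
    std::string out;
    char byte[4];
    for (unsigned char c : data) {
        std::snprintf(byte, sizeof(byte), "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += byte;
    }
    return out;
}
```

For example, `to_hex("a_b")` gives `61 5f 62`. An underscore is byte `5f`, while a real space would be `20`, so the dump shows which one is actually in the buffer.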
By the way, the string code at the beginning is a bad example: appending the password to the 50-byte email buffer with strcat can overflow it. It is possible to use just one buffer. Just pointing it out.
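A minimal sketch of the single-buffer idea, assuming the inputs are NUL-terminated wide strings like the ones returned by getText() in the question. The helper names `narrow` and `make_credentials` are hypothetical, and the conversion uses the current C locale, so this is one possible approach rather than the only one.

```cpp
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide string to a multibyte std::string
// using the current C locale, sizing the result to fit.
std::string narrow(const wchar_t* wide) {
    std::mbstate_t state{};
    const wchar_t* src = wide;
    // With a null destination, wcsrtombs only reports the required byte count.
    std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("wide string is not representable in the current locale");
    std::string out(len, '\0');
    src = wide;
    state = std::mbstate_t{};
    std::wcsrtombs(&out[0], &src, len, &state);
    return out;
}

// Builds "email:password" in one growable buffer, so no fixed-size array
// can overflow.
std::string make_credentials(const wchar_t* email, const wchar_t* password) {
    return narrow(email) + ":" + narrow(password);
}
```

The resulting string can be handed to CURLOPT_POSTFIELDS through `c_str()`, as long as the string stays alive until the transfer has finished.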
When running my code (pertinent sections pasted below), I periodically get the following error:
program(34010,0x70000e58b000) malloc: *** error for object
0x7fc43d93fcf0: pointer being freed was not allocated set a breakpoint
in malloc_error_break to debug Signal: SIGABRT (signal SIGABRT)
I am running multi-threaded C++ code on a MacBook (macOS 10.13) in which different threads use the code in question simultaneously. To my knowledge, libcurl is thread safe as long as I do not use the same curl handle (which I understand to be an instance of CURL, i.e. "CURL *curl = curl_easy_init();") in two different threads at the same time. In my case, since each thread calls the function separately and initializes a new CURL handle, I should be "safe", right? Hopefully there is something obvious I'm missing that is causing me (or libcurl in this case) to attempt to free memory that has already been freed. If there is any more information I should have included below, please don't hesitate to let me know.
The function that seg faults is
string http_lib::make_get_request(string url)
on the line that reads
curl_easy_cleanup(curl);
and sometimes (less often) on the line that reads
res = curl_easy_perform(curl);
Below is what I think would be the pertinent sections of my code:
size_t http_lib::CurlWrite_CallbackFunc_StdString(void *contents, size_t size, size_t nmemb, std::string *s)
{
size_t newLength = size*nmemb;
size_t oldLength = s->size();
try
{
s->resize(oldLength + newLength);
}
catch(std::bad_alloc &e)
{
//handle memory problem
return 0;
}
std::copy((char*)contents,(char*)contents+newLength,s->begin()+oldLength);
return size*nmemb;
}
string http_lib::make_post_request(string url, vector<string> headers, string post_params) {
CURL *curl;
CURLcode res;
curl = curl_easy_init();
string s;
if(curl)
{
struct curl_slist *chunk = NULL;
for(size_t i=0; i<headers.size(); i++){
/* Add a custom header */
chunk = curl_slist_append(chunk, headers[i].c_str());
}
/* set our custom set of headers */
res = curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, post_params.c_str());
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L); //only for https
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L); //only for https
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, CurlWrite_CallbackFunc_StdString);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &s);
if(networking_debug){
curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L); //verbose output
}
/* Perform the request, res will get the return code */
res = curl_easy_perform(curl);
/* Check for errors */
if(res != CURLE_OK)
{
fprintf(stderr, "curl_easy_perform() failed: %s\n",
curl_easy_strerror(res));
}
/* always cleanup */
curl_easy_cleanup(curl);
curl_slist_free_all(chunk); /* free the custom header list */
}
// Debug output
if (networking_debug){
cout<<"Response: " << s <<endl;
}
return s;
}
string http_lib::make_get_request(string url) {
//SslCurlWrapper sslObject;
CURL *curl;
CURLcode res;
curl = curl_easy_init();
string s;
if (curl) {
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
//tell libcurl to follow redirection
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L); //only for https
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L); //only for https
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, CurlWrite_CallbackFunc_StdString);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &s);
if(networking_debug){
curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L); //verbose output
}
/* Perform the request, res will get the return code */
res = curl_easy_perform(curl);
/* Check for errors */
if (res != CURLE_OK)
fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
/* always cleanup */
curl_easy_cleanup(curl);
}
if (networking_debug){
cout << "Response: " << s << endl;
}
return s;
}
In main() I have
int main(int argc, char *argv[]){
// Initialize http_lib (curl)
curl_global_init(CURL_GLOBAL_DEFAULT);
... spin up 10 or so threads that make get/post requests to https site (some requests utilize the make_post_request() function and others utilize the make_get_request() function).
}
CMake didn't seem to want to use anything other than a CURL_ROOT_DIR of "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include" for libcurl (aka curl).
Thus it was using the curl lib that macOS (and/or Xcode) ships with. I haven't figured out which version that is, but I can say that not using it and instead using curl version 7.57 is what fixed my issue.
I used the Homebrew package manager to install it:
brew install curl
Doing so created the /usr/local/Cellar/curl/7.57.0 directory and put all libs/includes in there.
Then I added
-I/usr/local/Cellar/curl/7.57.0/include -L/usr/local/Cellar/curl/7.57.0/lib
to CMAKE_CXX_FLAGS in my CMake configuration.
TL;DR: The solution was to ensure I was using the newest version of the curl lib. Now that I am, no problem.
I used Curl 7.2.9 and checked the connection this way:
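curl = curl_easy_init();
bool result = false;
// Declared outside the if block so it is still in scope for the check below
CURLcode res = CURLE_FAILED_INIT;
if(curl)
{
curl_easy_setopt(curl, CURLOPT_URL, m_checkConnectionUrl);
res = curl_easy_perform(curl);
}
if(res != CURLE_OK)
{
// connection is not available
}
else
{
// connection is available
}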
Now I switched to curl-7.33.0 and got a CURLE_WRITE_ERROR error,
and to make it work I must code it like this:
std::string output;
char* encodedUrl = curl_easy_escape(curl, m_checkConnectionUrl, 0);
curl_easy_setopt(curl, CURLOPT_POST, 0);
curl_easy_setopt(curl, CURLOPT_URL, encodedUrl);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writeMemoryCurlCallbackStub);
CURLcode res = curl_easy_perform(curl);
But I don't need to write anything. Any ideas?
The curl option CURLOPT_WRITEFUNCTION is mainly used to receive the data in chunks (in the callback function), for example to handle a large file download. I don't see any reason to use it for your purpose, regardless of the version.
Remove CURLOPT_POST (it is 0 by default) and CURLOPT_WRITEFUNCTION from the code and it should work. If it doesn't, then you are doing something wrong elsewhere in your code!
Also, if you are checking whether the URL is OK or not, then using CURL is fine. But to check only for connectivity, it is enough to check whether port 80 of the domain is open.
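The port check mentioned above could look roughly like the following sketch. It assumes a POSIX system (it uses `getaddrinfo`, `socket` and `connect`), does a blocking connect with no timeout, and `tcp_port_open` is a hypothetical helper name, so treat it as an illustration rather than a finished implementation.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <string>

// Hypothetical helper: returns true if a TCP connection to host:port succeeds.
// POSIX-only sketch; the connect call blocks and no timeout is applied.
bool tcp_port_open(const std::string& host, const std::string& port) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6 addresses
    hints.ai_socktype = SOCK_STREAM;  // TCP
    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &results) != 0)
        return false;  // name resolution failed

    bool connected = false;
    // Try each resolved address until one accepts the connection.
    for (addrinfo* ai = results; ai != nullptr && !connected; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;
        connected = (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0);
        close(fd);
    }
    freeaddrinfo(results);
    return connected;
}
```

A call such as `tcp_port_open("example.com", "80")` only says that something accepts TCP connections on that port; it does not confirm that an HTTP server answers correctly there.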
You need to provide a write callback as well:
size_t CurlWriteCallback(char* buf, size_t size, size_t nmemb, void* up)
{
// buf is not NUL-terminated, so print exactly size*nmemb bytes
TRACE("CURL - Response received:\n%.*s", (int)(size*nmemb), buf);
TRACE("CURL - Response handled %u bytes\n", (unsigned)(size*nmemb));
// tell curl how many bytes we handled
return size*nmemb;
}
if(curl)
{
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &CurlWriteCallback);
curl_easy_setopt(curl, CURLOPT_URL, m_checkConnectionUrl);
CURLcode res = curl_easy_perform(curl);
}
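If the response body is not needed at all, as in the connection check from the question, the callback can be reduced to reporting every chunk as handled. A minimal sketch, where `discard_body` is a hypothetical name and the parameter list mirrors the callback shown above:

```cpp
#include <cstddef>

// Hypothetical callback with the same parameter list as the write callback
// above. It ignores the received data and the user pointer.
std::size_t discard_body(char* /*data*/, std::size_t size, std::size_t nmemb, void* /*userdata*/) {
    // Reporting the full byte count tells libcurl the chunk was consumed.
    // Reporting fewer bytes is treated as an error and aborts the transfer
    // with CURLE_WRITE_ERROR.
    return size * nmemb;
}
```

It would be registered the same way as the callback above, through CURLOPT_WRITEFUNCTION.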
Old question, but I have just encountered a similar problem. After some more googling this is the solution:
curl_easy_setopt(curl, CURLOPT_CONNECT_ONLY, 1L);
curl_easy_perform(curl);
// OK, now we are connected (if nothing bad happened),
// but it would be nice to communicate with the server:
curl_easy_setopt(curl, CURLOPT_CONNECT_ONLY, 0L);
//now we can do the actual communication
I used this to separate authentication from the actual email sending.
I get this error from curl, yet the site is available, and I see nothing going out in Wireshark.
What might cause this error?
I tried running it against www.google.com and got the same error.
This very code was working a few hours ago. I have no idea what might cause this.
Here is the code:
CURL *curl;
CURLcode res = CURLE_OK;
struct curl_slist *headers=NULL;
headers = curl_slist_append(headers, "Content-Type: text/xml");
curl_global_init(CURL_GLOBAL_ALL);
curl = curl_easy_init();
if(curl) {
struct rcvdstring s;
init_string(&s);
string FullAddress = URL+Method;
curl_easy_setopt(curl, CURLOPT_URL, FullAddress.c_str());
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
res = curl_easy_setopt(curl, CURLOPT_POSTFIELDS, DATA.c_str());
res = curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
char buf[1024];
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writefunc);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &s);
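res = curl_easy_perform(curl);
Respons.assign(s.ptr);
curl_slist_free_all(headers); // free the header list
curl_easy_cleanup(curl); // release the easy handle
return res;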
}
Update: I took the exe to another computer and it works there, so the problem is local to my computer.
OK, moving the code to another computer showed me that the code is fine.
The problem was Norton Internet Security, which after a few days of development decided to block my communication DLL.