Extract specific data from webpage - c++

Basically, this is my code:
int main()
{
CURL *curl;
FILE *fp;
CURLcode res;
std::string readBuffer;
curl = curl_easy_init();
char outfilename[FILENAME_MAX] = "C:\\Users\\admin\\desktop\\test.txt";
if(curl) {
fp = fopen(outfilename,"wb");
curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com");
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "user=123&pass=123");
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
res = curl_easy_perform(curl);
Sleep(1000);
curl_easy_cleanup(curl);
fclose(fp);
}
return EXIT_SUCCESS;
}
The output is successfully saved in the text file.
My question is how to extract specific content from between specific tags.
For example, I want only the content between <bla> ... </bla>.
What's the easiest way? Thank you.

In your example, you are dumping the response from the website to a file. libcurl writes the data returned by the webpage exactly as received; it makes no attempt to restructure it.
You can get the data into memory instead by defining your own write_data function, which must have the following signature:
size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata);
Once you have the data in memory, you can parse it and restructure it as required.
See Example Here for using the write_data function.
For XML parsing you may use This sample code.
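Once the page is in a std::string, pulling out the text between two markers can be sketched with plain substring search. This is only a minimal sketch, not a real HTML/XML parser: the helper name extractBetween is hypothetical, it only finds the first match, and it assumes the tags appear literally (no attributes, no nesting).

```cpp
#include <cassert>
#include <string>

// Copies the text between the first occurrence of `open` and the next
// occurrence of `close` into `out`. Returns false if either marker is missing.
// Hypothetical helper: plain substring search, not an HTML parser.
bool extractBetween(const std::string& page,
                    const std::string& open,
                    const std::string& close,
                    std::string& out)
{
    assert(!open.empty() && !close.empty());

    const std::string::size_type start = page.find(open);
    if (start == std::string::npos)
        return false;

    const std::string::size_type from = start + open.size();
    const std::string::size_type end = page.find(close, from);
    if (end == std::string::npos)
        return false;

    out = page.substr(from, end - from);
    return true;
}
```

For anything beyond simple, well-known markup, a proper XML or HTML parsing library is likely the safer choice.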

Related

Downloading zip file using curl (c++)

I'm trying to download a zip file using curl from a localhost server:
http://localhost:8080/zip/json.zip?assetIds=<comma-separated asset ids>
When I type the URL into my browser, the file starts downloading with no problem.
So I tried to use curl to download it over an already existing zip file:
RestClient::response RestClient::get(const std::string& url)
{
RestClient::response ret = {};
CURL *curl = NULL;
CURLcode res = CURLE_OK;
FILE *fp = NULL;
curl = curl_easy_init();
if (curl)
{
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
char outfilename[FILENAME_MAX] = "/Users/stage/Documents/temp/json.zip";
fp = fopen(outfilename,"wb");
curl_easy_setopt(curl, CURLOPT_CAINFO, "./ca-bundle.crt");
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, false);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, false);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, RestClient::write_data);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
int i=fclose(fp);
if( i==0)
system("unzip -j json.zip");
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, RestClient::write_callback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &ret);
curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, RestClient::header_callback);
curl_easy_setopt(curl, CURLOPT_HEADERDATA, &ret);
res = curl_easy_perform(curl);
if (res != CURLE_OK)
{
ret.body = "Failed to query.";
ret.code = -1;
return ret;
}
long http_code = 0;
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
ret.code = static_cast<int>(http_code);
curl_easy_cleanup(curl);
curl_global_cleanup();
}
return ret;
}
And the function for writing to the file:
size_t RestClient::write_data(void *ptr, size_t size, size_t nmemb, FILE *stream) {
return fwrite(ptr, size, nmemb, stream);
}
When I run the code, I get this message:
Archive: json.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of json.zip or
json.zip.zip, and cannot find json.zip.ZIP, period.
The json.zip file that used to contain an image becomes an EMPTY file not even a zip :/
Does somebody know what went wrong?
I was having the exact same issue with the curl command in my script (not C++). I finally looked at the .zip file that was downloaded in a text editor and found it contained an HTML error message instead of being the actual file I thought I downloaded. I then did some debugging in my script and copied the URL I was creating and pasted it into my browser. I got the same error message when I checked the URL.
Long story short, I had a typo in my URL. I would venture to guess that you're having a very similar issue.
In my case, however, it was as easy as calling the shell command "curl -o"; your C++ code seems a bit more complicated than what I was doing.
I hope this helps in some way.
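The "open it in a text editor" check described above can also be done in code before calling unzip. The sketch below assumes the downloaded bytes are already in a std::string; the function name looksLikeZip is hypothetical. It only tests for the local file header signature that a non-empty ZIP archive normally starts with ('P', 'K', 0x03, 0x04), so an empty or spanned archive would be rejected.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` begins with the ZIP local file header signature.
// Hypothetical helper: a cheap sanity check, not a full archive validation.
bool looksLikeZip(const std::string& bytes)
{
    static const char kLocalHeader[4] = { 'P', 'K', 0x03, 0x04 };
    return bytes.size() >= 4 && bytes.compare(0, 4, kLocalHeader, 4) == 0;
}
```

Separately, the code in the question appears to call fclose(fp) and run unzip before curl_easy_perform() is ever called, and it replaces the write callback afterwards, so nothing can have been written to json.zip at that point; fopen with "wb" alone would leave the file empty.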

CURL finish executing and timeout

I'm performing a server request with curl in C++. The server returns the response in pieces, and the size of those pieces may vary.
Each time a piece arrives, the callback function is called. The problem is that I can't detect when the connection has finished, so that I can invoke another callback on my parent class.
Also, I would like to know whether it is possible to set and detect a timeout for curl.
Here is my code in short:
CURL *curl = curl_easy_init();
curl_global_init(CURL_GLOBAL_ALL);
curl_easy_setopt(curl, CURLOPT_URL, "My URL");
curl_easy_setopt(curl, CURLOPT_POST, 1);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "My Postfields");
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writeCallback);
curl_easy_perform(curl);
curl_easy_cleanup(curl);
curl_global_cleanup();
The default callback:
size_t writeCallback(char* buf, size_t size, size_t nmemb, void* up)
{
//do something
//But how can I detect the last callback, when the connection has finished,
//in order to call another one?
return size*nmemb;
}
The data you want can be saved off during the callback, then used once curl_easy_perform returns. Example:
// Initialize libcurl globally before creating any easy handle.
curl_global_init(CURL_GLOBAL_ALL);
CURL *curl = curl_easy_init();
// NOTE: added to accumulate data.
std::string result;
curl_easy_setopt(curl, CURLOPT_URL, "My URL");
curl_easy_setopt(curl, CURLOPT_POST, 1L);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "My Postfields");
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writeCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &result); // NOTE: added
// curl_easy_perform() blocks until the transfer has finished or failed.
CURLcode res = curl_easy_perform(curl);
// TODO: if res == CURLE_OK, do something with your data stored in result
curl_easy_cleanup(curl);
curl_global_cleanup();
And in your write callback:
size_t writeCallback(char* buf, size_t size, size_t nmemb, void* up)
{
std::string* pstr = static_cast<std::string*>(up);
std::copy(buf, buf+size*nmemb, std::back_inserter(*pstr));
return size*nmemb;
}
Or something along those lines. I leave all the error checking to you (and sorry for any typos; I don't have a compiler immediately available to validate this).
Regarding timeouts, there are a multitude of timeout options available for an easy-mode curl request. Too many to mention here, in fact. See the documentation for curl_easy_setopt, in particular the connection options approximately two-thirds of the way down the page.
Best of luck.
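To make the "use the data once curl_easy_perform returns" idea concrete without needing a network, here is a rough sketch. The names appendChunk, fakePerform and fetchAndNotify are all hypothetical; fakePerform merely stands in for curl_easy_perform() by feeding chunks through a callback with the signature libcurl expects. The point is that the "finished" notification is simply the code that runs after the blocking call returns. For timeouts, CURLOPT_TIMEOUT and CURLOPT_CONNECTTIMEOUT are two of the options referred to above, and a timed-out transfer is reported as CURLE_OPERATION_TIMEDOUT by curl_easy_perform().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Same signature libcurl expects for CURLOPT_WRITEFUNCTION.
size_t appendChunk(char* buf, size_t size, size_t nmemb, void* up)
{
    assert(up != nullptr);
    std::string* out = static_cast<std::string*>(up);
    out->append(buf, size * nmemb);
    return size * nmemb;
}

// Hypothetical stand-in for curl_easy_perform(): delivers every chunk through
// the callback and returns only after the last chunk has been delivered.
bool fakePerform(const std::vector<std::string>& chunks, void* userdata)
{
    for (const std::string& chunk : chunks) {
        std::string copy = chunk;  // the callback takes a non-const char*
        if (appendChunk(&copy[0], 1, copy.size(), userdata) != copy.size())
            return false;
    }
    return true;
}

// Runs the (simulated) transfer, then notifies the caller exactly once.
std::string fetchAndNotify(const std::vector<std::string>& chunks,
                           const std::function<void(const std::string&)>& onFinished)
{
    std::string result;
    if (fakePerform(chunks, &result))
        onFinished(result);  // reached only after the whole transfer is done
    return result;
}
```

With real libcurl, the call to onFinished would sit after a successful curl_easy_perform() in the same way.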

uploading file with libcurl

Take a look at the following code:
static size_t reader(void *ptr, size_t size, size_t nmemb, FILE *stream) {
size_t retcode = fread(ptr, size, nmemb, stream);
cout << "*** We read " << retcode << " bytes from file" << endl;
return retcode;
}
void upload() { //upload() is called from ouside
FILE *pFile;
pFile = fopen("map.txt" , "r");
struct stat file_info;
stat("map.txt", &file_info);
size_t size = (size_t)file_info.st_size;
uploadFile(pFile, size);
}
bool uploadFile(void* data, size_t datasize) {
CURL *curl;
CURLcode res;
curl = curl_easy_init();
if (curl) {
char *post_params = ...;
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, post_params);
curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, (long) strlen(post_params));
curl_easy_setopt(curl, CURLOPT_READFUNCTION, reader);
curl_easy_setopt(curl, CURLOPT_READDATA, data);
curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t) datasize);
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
}
return true;
}
When the code is executed, the following is output:
*** We read 490 bytes from file
*** We read 0 bytes from file
After that, the app does nothing (it does not even exit).
Can someone point out what's wrong here?
I will be grateful for any help!
There is some serious confusion in this code. Let me try to explain:
CURLOPT_UPLOAD - asks libcurl to PUT the file when the protocol of choice is HTTP.
CURLOPT_POSTFIELDS - tells libcurl to POST the data provided in the additional argument (whose size is set with CURLOPT_POSTFIELDSIZE).
CURLOPT_READFUNCTION - gives libcurl an alternative to CURLOPT_POSTFIELDS for getting the data, so that a POST can read its data from a file. When using CURLOPT_UPLOAD, this is the only way to provide data.
So in the end, the questions left for you are:
Do you want PUT or POST?
Do you want to provide the data as a string, or do you want it provided by a callback?
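If you go the callback route, the contract is that the read callback fills the buffer it is given and returns the number of bytes it wrote, with 0 meaning "no more data". Below is a minimal sketch that serves data from memory rather than from a FILE*. UploadSource and readFromMemory are hypothetical names; such a callback would be registered with CURLOPT_READFUNCTION and its state passed with CURLOPT_READDATA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the data to upload plus how much has been sent so far.
struct UploadSource {
    std::string data;
    size_t offset = 0;
};

// Same signature libcurl expects for CURLOPT_READFUNCTION.
size_t readFromMemory(char* buffer, size_t size, size_t nitems, void* userdata)
{
    assert(userdata != nullptr);
    UploadSource* src = static_cast<UploadSource*>(userdata);

    const size_t room = size * nitems;                     // space libcurl offers
    const size_t left = src->data.size() - src->offset;    // bytes not yet sent
    const size_t n = std::min(room, left);

    if (n > 0)
        std::memcpy(buffer, src->data.data() + src->offset, n);
    src->offset += n;
    return n;  // 0 tells libcurl that the upload data is exhausted
}
```

The "We read 0 bytes from file" line in the question's output is this same end-of-data signal.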

Using libcurl in C++ application to send POST data from value in .INI file

So I have a C++ application that takes a value from a key in a settings.INI file, uses libcurl to reach out to a PHP page, posts that data, and should get some different data back.
And it actually works, aside from grabbing the data from the INI file: it works if I explicitly type the value in for libcurl's POSTFIELDS option (i.e. "Serial=454534") instead of using the variable that stores the data retrieved from my file.
Here's some code:
PHP:
<?
include("DBHeader.inc.php");
$Serial= $_POST['Serial'];
$sql = "SELECT * FROM `LockCodes` WHERE `sLockCode`=\"$Serial\"";
$RS=mysql_query($sql, $SQLConn);
$num_rows=mysql_num_rows($RS);
if ($num_rows>0)
{
while ($row = mysql_fetch_array($RS))
{
echo $row['iLockCodeID'];
}
}
else
{
echo "...";
}
?>
Snippet of C++ code:
TCHAR szKeyValue[36];
GetPrivateProfileString(_T("Test"), _T("LockCode"), _T(""), szKeyValue, 36, _T("C:\\Test\\Settings.INI"));
CString sLockCode = szKeyValue;
CURL *curl;
CURLcode res;
CString Serial = _T("Serial=") + sLockCode;
string LCID;
curl_global_init(CURL_GLOBAL_ALL);
curl = curl_easy_init();
if (curl)
{
curl_easy_setopt(curl, CURLOPT_URL, "http://regserver2.nyksys.com/GetLCID.php");
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, Serial);
res = curl_easy_perform(curl);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &writeCallback);
curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
res = curl_easy_perform(curl);
if (data == _T("..."))
{
data = "0";
AfxMessageBox("Invalid Serial Number");
exit(0);
}
My Settings.INI is in the standard format..
[Test]
LockCode=1D4553C7E7228E462DBAAE267977B7CDED8A
What happens is whenever I use the variable "Serial" instead of typing it in, the PHP page returns "..." instead of the desired result.
I feel like I'm missing something obvious. Any help would be TREMENDOUSLY appreciated.
I am guessing that you have _UNICODE defined, which means that TCHAR is wchar_t, CString is CStringT<wchar_t>, and the code:
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, Serial);
actually passes a wide char string to curl_easy_setopt() when the function is expecting a narrow char string. If your machine is little-endian, curl_easy_setopt() sees the bytes 'S', 0x00, 'e', 0x00, 'r', 0x00, 'i', 0x00, ... (on a big-endian machine, 0x00, 'S', 0x00, 'e', ...), and because libcurl uses strlen() when CURLOPT_POSTFIELDSIZE is not set, the entire POST request body on a little-endian machine is just S. See, for example, http://codepad.org/JE2MYZfU
What you need to do is use narrow char strings:
#define ARRAY_LEN(arr_id) ((sizeof (arr_id))/(sizeof ((arr_id)[0])))
// The sample LockCode is 36 characters long, so the buffer also needs room for the terminating null.
char szKeyValue[64];
GetPrivateProfileStringA("Test", "LockCode", "", szKeyValue, ARRAY_LEN(szKeyValue), "C:\\Test\\Settings.INI");
CURL *curl;
CURLcode res;
CStringA Body = CStringA("Serial=") + szKeyValue; // narrow (char) string
string LCID;
curl_global_init(CURL_GLOBAL_ALL);
curl = curl_easy_init();
if (curl)
{
curl_easy_setopt(curl, CURLOPT_URL, "http://regserver2.nyksys.com/GetLCID.php");
// Pass an explicit const char*; curl_easy_setopt() is variadic and performs no conversion.
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, Body.GetString());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &writeCallback);
curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
// Perform the request once, after all options have been set.
res = curl_easy_perform(curl);
//...
Also, your PHP script looks vulnerable to SQL injection.
EDIT: One more thing. Are you setting CURLOPT_POST to 1?
curl_easy_setopt(curl, CURLOPT_POST, 1);
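The truncation described above can be reproduced without libcurl. The sketch below (the function name lengthSeenAsNarrow is hypothetical) shows what strlen() reports when a wide string is handed to code that expects char data: the first zero byte comes right after, or even before, the first character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Length that a char-based API such as strlen() sees when it is handed
// wide-character data by mistake. Hypothetical helper for illustration only.
std::size_t lengthSeenAsNarrow(const wchar_t* wide)
{
    assert(wide != nullptr);
    return std::strlen(reinterpret_cast<const char*>(wide));
}
```

For an ASCII string such as L"Serial=454534" this would be 1 on a little-endian machine and 0 on a big-endian one, whatever the size of wchar_t, whereas the narrow string "Serial=454534" has length 13.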

Downloading multiple files with libcurl in C++

I am currently trying to make an updater for my software project. I need it to be able to download multiple files; I don't mind whether they download simultaneously or one after another, whichever is easier (file size is not an issue). I followed the example from the libcurl webpage and a few other resources and came up with this:
#include <iostream>
#include <stdio.h>
#include <curl/curl.h>
#include <string.h>
size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream) {
size_t written;
written = fwrite(ptr, size, nmemb, stream);
return written;
}
int main(void){
for (int i = 0; i < 2;){ //download 2 files (loop twice)
CURL *curl;
FILE *fp;
CURLcode res;
char *url = "http://sec7.org/1024kb.txt"; //first file URL
char outfilename[FILENAME_MAX] = "C:\\users\\grant\\desktop\\1024kb.txt";
curl = curl_easy_init();
if (curl){
fp = fopen(outfilename,"wb");
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
fclose(fp);
}
url = "http://sec7.org/index.html"; //I want to get a new file this time
outfilename[FILENAME_MAX] = "C:\\users\\grant\\desktop\\index.html";
}
return 0;
}
The first issue is that if I remove the new file assignments (url = "http://...") and just try to loop the download code twice, the program simply stops responding. This happens whenever the download is called more than once in the program. The other issue is that I am unable to change the value of the character array outfilename[FILENAME_MAX]. I feel like this is just some silly error I am making, but no solution comes to mind. Thank you!
Why not put this in a function and call it twice?
Your array syntax is wrong: outfilename[FILENAME_MAX] = "..." assigns to a single element one past the end of the array instead of copying a string. In addition, all the variables inside the loop are local, which means they are destroyed and re-initialized on each iteration.
What Conspicuous Compiler said. That is what's causing your program to freeze: it's stuck in an infinite loop because i is never incremented, so i < 2 is always true.
Put your code into a function like so:
void downloadFile(const char* url, const char* fname) {
CURL *curl;
FILE *fp;
CURLcode res;
curl = curl_easy_init();
if (curl){
fp = fopen(fname, "wb");
curl_easy_setopt(curl, CURLOPT_URL, url);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
fclose(fp);
}
}
And call it twice with the relevant file names and URLs:
downloadFile("http://sec7.org/1024kb.txt", "C:\\users\\grant\\desktop\\1024kb.txt");
downloadFile("http://sec7.org/index.html", "C:\\users\\grant\\desktop\\index.html");
The example function is very bare-bones, though; it is just an example. You should alter it to return error codes or throw exceptions, check that fopen succeeded, and so on.
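As a rough sketch of where that could go, the loop over several files can be separated from the download itself. Everything named here (DownloadJob, Downloader, downloadAll) is hypothetical; the downloader is passed in as a function that returns true on success, which is what a version of downloadFile that reports errors might look like to its caller.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper type: one file to fetch and where to store it.
struct DownloadJob {
    std::string url;
    std::string filename;
};

// Hypothetical signature for a downloader that returns true on success.
using Downloader = std::function<bool(const std::string& url,
                                      const std::string& filename)>;

// Runs every job exactly once and returns the jobs that failed.
std::vector<DownloadJob> downloadAll(const std::vector<DownloadJob>& jobs,
                                     const Downloader& download)
{
    assert(static_cast<bool>(download));
    std::vector<DownloadJob> failed;
    // ++i advances the loop; the loop in the question never changed i.
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        if (!download(jobs[i].url, jobs[i].filename))
            failed.push_back(jobs[i]);
    }
    return failed;
}
```

Keeping the list of jobs in a container means adding a third file is a one-line change, and the caller can decide what to do with the failures.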