Downloading UTF-8 file with libcurl (ANSI works fine) - c++

I am writing a simple file downloader with the help of libcurl. Here's the code for downloading a file from an HTTP server:
static size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp) {
    ((std::string*)userp)->append((char*)contents, size * nmemb);
    return size * nmemb;
}
std::wstring result; //result with polish letters (ą, ę etc.)
CURL *curl;
CURLcode res;
std::string readBuffer;
curl = curl_easy_init();
ERROR_HANDLE(curl, L"CURL could not been inited.", MOD_INTERNET);
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
curl_easy_setopt(curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_easy_setopt(curl, CURLOPT_USERPWD, (login + ":" + password).c_str()); //e.g.: "login:password"
curl_easy_setopt(curl, CURLOPT_POST, true);
//curl_easy_setopt(curl, CURLOPT_ENCODING, "UTF-8"); //does not change anything
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
result = C::toWString(readBuffer);
return res == 0; //0 = OK
It works fine when the file I want to download is encoded as ANSI (according to, e.g., Notepad++). But when I try to download the UTF-8 file (UTF-8 without BOM), some characters (e.g. Polish letters) come out wrong because of an encoding problem.
For example, I ran the code on two files containing the same text ("to jest teść to") and saved the results to std::wstring: result comes from the ANSI file and result2 (the problematic one) from the UTF-8 version.
Both files display the correct text when opened on the server in, e.g., Notepad++.
So, how can I get the UTF-8 file content with libcurl and save it to a std::wstring with the proper encoding (so that the Visual Studio debugger shows it as "to jest teść to")?

This is not a libcurl issue. You are storing the raw data in a std::string and then converting it to a std::wstring after the download has finished. You have to look at the charset reported in the HTTP response and decode the data to std::wstring accordingly. C::toWString() has no concept of charsets, so you should use something else, like iconv or ICU. Or, if you know the data is always UTF-8, do the conversion manually (UTF conversions are easy to code by hand), or use C++11's built-in UTF conversions via the std::wstring_convert class (deprecated since C++17, but still available).
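Since UTF-8 decoding is easy to code by hand, here is a minimal sketch of such a decoder. utf8ToWString is a hypothetical name, meant to replace the C::toWString(readBuffer) call when the response is known to be UTF-8. It assumes wchar_t is either 16 bits (Windows, where the result is UTF-16) or 32 bits (most other platforms), and it is deliberately lenient: malformed bytes become U+FFFD, and overlong forms are not rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical UTF-8 -> std::wstring decoder (sketch, not a full validator).
// Malformed input becomes U+FFFD; overlong forms and encoded surrogates are not rejected.
std::wstring utf8ToWString(const std::string& in)
{
    const wchar_t replacement = static_cast<wchar_t>(0xFFFD);
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; } // ASCII
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { out += replacement; ++i; continue; } // stray continuation or invalid lead byte

        bool ok = i + len <= in.size();                 // sequence truncated at end of input?
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) ok = false;      // not a continuation byte
            else cp = (cp << 6) | (cont & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out += replacement; ++i; continue; }
        i += len;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {      // 16-bit wchar_t: emit a surrogate pair
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);            // BMP character, or 32-bit wchar_t
        }
    }
    return out;
}
```

With this, result = utf8ToWString(readBuffer); should show "to jest teść to" in the debugger for the UTF-8 file. The ANSI file would still need a different (code-page) conversion, which is why checking the reported charset matters.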

libcurl won't convert or translate the contents for you. It delivers to your application exactly the bytes the server sent.
You can use HTTP Accept headers etc. to influence what the server responds with, but you then need to check the received charset and convert the data yourself if you're not satisfied with what you get.
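To check the received charset, you first need to pull it out of the Content-Type response header. A minimal sketch, assuming the header value is already available as a C string; charsetFromContentType is a hypothetical helper, and its parsing is deliberately simplified.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: extract the charset parameter from a Content-Type value such as
// "text/plain; charset=UTF-8". Returns it lower-cased, or "" when there is none.
// Simplified parsing: whitespace around '=' and escaped quotes are not handled.
std::string charsetFromContentType(const char* contentType)
{
    if (!contentType) return "";  // no Content-Type header at all

    std::string lower;
    for (const char* p = contentType; *p; ++p)
        lower += static_cast<char>(std::tolower(static_cast<unsigned char>(*p)));

    std::string::size_type pos = lower.find("charset=");
    if (pos == std::string::npos) return "";
    pos += 8;  // skip "charset="

    const std::string::size_type end = lower.find(';', pos);
    std::string value = lower.substr(pos, end == std::string::npos ? std::string::npos : end - pos);

    // Trim surrounding spaces and optional double quotes.
    while (!value.empty() && (value.back() == ' ' || value.back() == '"')) value.pop_back();
    while (!value.empty() && (value.front() == ' ' || value.front() == '"')) value.erase(0, 1);
    return value;
}
```

After curl_easy_perform() returns, the header value can be fetched with `char* ct = nullptr; curl_easy_getinfo(curl, CURLINFO_CONTENT_TYPE, &ct);`; libcurl leaves ct as NULL if the server sent no Content-Type. Depending on the result ("utf-8", "windows-1250", ...), you then pick the matching conversion. An Accept-Charset request header can be added through CURLOPT_HTTPHEADER, but servers are free to ignore it.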

Related

Issue trying to set up TTS service of IBM Watson using libcurl

I am trying to use the IBM Watson Text to Speech (TTS) service via libcurl. I am sending the text "Hello World", and the synthesized audio should be saved to "D:\log\Output.aac".
Setting up CURLOPT_HTTPHEADER, CURLOPT_POSTFIELDS and CURLOPT_FILE is a bit of an issue for me, as I am new to libcurl. How do I set these options correctly? IBM Cloud also reported an error authenticating to my Watson service due to the use of deprecated legacy credentials. I am lost, please help.
#include <curl/curl.h>
void Curl_Perform_TTS() {
    CURL* curl;
    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (curl)
    {
        curl_easy_setopt(curl, CURLOPT_URL, "<url>/v1/synthesize?text=Hello%20world");
        curl_easy_setopt(curl, CURLOPT_USERNAME, "Text to Speech-ej"); //not sure, I use service name here
        curl_easy_setopt(curl, CURLOPT_PASSWORD, "<API key>");
        //curl_easy_setopt(curl, CURLOPT_RETURNTRANSFER, true); //Don't work
        curl_easy_setopt(curl, CURLOPT_POST, true);
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, "{\"Content-Type\":\"audio/flac\", \"Transfer-Encoding: chunked\"}");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "{\"path\":\"D:\\log\"}");
        curl_easy_setopt(curl, CURLOPT_FILE, "{\"Output.mp3\"}");
        CURLcode result = curl_easy_perform(curl);
        if (result != CURLE_OK)
        {
            fprintf(stderr, "curl_easy_perform() failed: %s\n",
                    curl_easy_strerror(result));
        }
        curl_easy_cleanup(curl);
    }
}
int main()
{
    Curl_Perform_TTS();
    return 0;
}
The main problem, I think, is that you are not forming the header properly.
Looking at the libcurl documentation, you can see that an HTTP header is added with this signature:
CURLcode curl_easy_setopt(CURL *handle, CURLOPT_HTTPHEADER, struct curl_slist *headers);
You must first create a struct curl_slist and pass it as the argument to curl_easy_setopt. For example:
struct curl_slist *headerslist = NULL;
// .....
// and later, when you need to add a header, do it this way:
headerslist = curl_slist_append(headerslist, "Content-Type: audio/flac");
// then hand the whole list to the easy handle
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headerslist);
// When you are done, free the memory used by the linked list.
// This must of course be done after performing the request.
curl_slist_free_all(headerslist);
That could be one of your issues; the other one is related to CURLOPT_POSTFIELDS, which has this interface:
CURLcode curl_easy_setopt(CURL *handle, CURLOPT_POSTFIELDS, char *postdata);
It is documented on the CURLOPT_POSTFIELDS page of the libcurl documentation. An example can be as simple as this:
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "name=daniel&project=curl");
In general, these examples from the official documentation should help you form the request correctly:
custom header
http post
simple post
And the list goes on. If you need more examples, you can find all of them on the libcurl examples page.

How to read compressed files c++

My question was closed, so I have updated it.
1- My goal is to send my compressed file to a Google Cloud Storage URL.
2- To do that, I created a Postman request. I stored my file in Google Cloud Storage using Postman, and the tool generated the following code:
CURL *curl;
CURLcode res;
curl = curl_easy_init();
if(curl) {
    curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "PUT");
    curl_easy_setopt(curl, CURLOPT_URL, "my URL");
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_DEFAULT_PROTOCOL, "https");
    struct curl_slist *headers = NULL;
    headers = curl_slist_append(headers, "Content-Type: application/octet-stream");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl,CURLOPT_POSTFIELDS,"<file contents here>");
    res = curl_easy_perform(curl);
}
curl_easy_cleanup(curl);
3- I then copied the code above into my C++ project to send my compressed file to the URL.
4- To create the CURLOPT_POSTFIELDS content, I implemented the following code:
std::ifstream ifs;
ifs.open("./compressed.gz", std::ios::binary | std::ios::ate);
PRINT("Size ->", ifs.tellg());
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, &ifs);
When I compile and run my code, the request returns a 200 success response.
But when I check the Google Cloud Storage dashboard, the object contains only 6 bytes of data, even though my file is 1090 bytes.
So why doesn't my request upload all the bytes of the compressed file to Cloud Storage? What's wrong with my code?
How to read compressed files c++
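Compressed files are generally in binary formats; they are not null-terminated text. You cannot use strlen to get their length, because it requires null-terminated text as input. The same applies to CURLOPT_POSTFIELDS: unless CURLOPT_POSTFIELDSIZE is also set, libcurl uses strlen() on the pointer it is given. Passing &ifs hands it the address of the stream object, not the file contents, which is why only a few bytes arrive.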
You can use any UnformattedInputFunction to read binary data. Don't forget to open any stream in binary mode.
I need to put the compressed file in a char pointer to put it to the server by using libcurl.
You don't need to read the file in order to do that. You can let libcurl take care of reading the file by passing a FILE* to CURLOPT_READDATA.
This way it won't be necessary to store the entire file in memory. Reading the entire file could be a problem if the file is very large.
If you do want to read binary data into memory, you can do something like this:
std::ifstream file("./compressed.gz", std::ios::binary);
if (!file.is_open())
ERROR();
std::vector<unsigned char> buffer(std::istreambuf_iterator<char>(file), {});
Then you will have the data in the vector buffer. Get its size with buffer.size() and a pointer to the raw bytes with buffer.data(). Note that since buffer is a std::vector, it is destroyed when it goes out of scope, and the data is freed with it. CURLOPT_POSTFIELDS does not copy the data, so buffer must stay alive until curl_easy_perform() returns.

libcurl IMAP not working

Using the code below, I'm trying to get any of the libcurl IMAP commands to work.
Currently, regardless of the command set via CURLOPT_CUSTOMREQUEST, in my callback function the only data that is given is the oldest email (1st) in my inbox. I can even put something like "dfsafdasfasfaf" in the CURLOPT_CUSTOMREQUEST, and no error will be shown, and the oldest email will be printed from the callback.
I've tried using the sample code on libcurl's site to list folders, LSUB, etc., and it's always the same: the only thing returned is the contents of the first email in my inbox.
I'm using curl 7.40 mingw32 on win32 g++ (-lcurldll).
Surely I must be doing something wrong. If you could take a moment to correct my error, I would be most appreciative. Thank you.
EDIT - Even if you don't know the answer, could you please leave a comment if you have successfully gotten libcurl IMAP to work before? If no one has gotten libcurl IMAP to work, I'll stop wasting my time with it and move on to VMime or another option.
EDIT2- My principal question is how can I list folders via libcurl?
size_t writeCallback(char* buf, size_t size, size_t nmemb, void* up)
{
    printf("%s\n", buf);
    return size*nmemb; //tell curl how many bytes we handled
}
int main(void)
{
    CURL *curl;
    CURLcode res = CURLE_OK;
    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_USERNAME, "gmailuser");
        curl_easy_setopt(curl, CURLOPT_PASSWORD, "password");
        curl_easy_setopt(curl, CURLOPT_URL, "imaps://imap.gmail.com/INBOX");
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &writeCallback);
        curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "LIST");
        res = curl_easy_perform(curl);
        if(res != CURLE_OK)
        {
            fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        }
        curl_easy_cleanup(curl);
    }
    _getch ();
    return (int)res;
}
To get the list of folders in a given Gmail account, you should use:
curl_easy_setopt(curl, CURLOPT_URL, "imaps://imap.gmail.com/");
Also, I believe you don't need this line to perform a LIST request:
curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "LIST");
I have tested this on Linux with libcurl 7.35.0. I believe the problems you are encountering are not OS-specific; rather, they are caused by the current state of the library's IMAP implementation. You can find the source code for libcurl 7.35.0 in the curl download archive.
You can also find more examples of current libcurl IMAP support on the examples page (see the links on the right for more detailed examples).
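One more problem in the question's callback is worth fixing while testing this: printf("%s\n", buf) assumes buf is null-terminated, but libcurl only guarantees size*nmemb valid bytes. A minimal sketch of a safer callback that collects exactly those bytes into a std::string (appendToString is a hypothetical name):

```cpp
#include <cstddef>
#include <string>

// Write callback sketch: buf is NOT null-terminated, so append exactly size*nmemb bytes.
size_t appendToString(char* buf, size_t size, size_t nmemb, void* userdata)
{
    const size_t bytes = size * nmemb;
    static_cast<std::string*>(userdata)->append(buf, bytes);
    return bytes;  // returning anything else makes libcurl abort the transfer
}
```

Used together with the URL above: set CURLOPT_URL to "imaps://imap.gmail.com/", CURLOPT_WRITEFUNCTION to appendToString and CURLOPT_WRITEDATA to the address of a std::string, then print that string once curl_easy_perform() has returned; it should then hold the server's LIST response lines.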

How to get an image using curl?

I'm having trouble getting an image from a URL using curl. It works if I pass the URL to the constructor of an ImageMagick Image object. But with curl I'm not having much luck, and I need to use curl.
Right now I'm doing...
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &curlCallback);
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_perform(curl);
And then
size_t curlCallback(char* buf, size_t size, size_t nmemb, void* up)
{
    ofstream out;
    out.open("/home/name/Desktop/img.png");
    out.write(buf, nmemb * size);
    return size * nmemb;
}
It does seem to get the start of a PNG, but not the whole thing. It only returns 251 bytes (header info, maybe?). An image viewer will open it as a PNG and knows its resolution, but the image itself is blank. If I print the buffer to the console, I see ?PNG followed by binary data symbols.
I know it's not a problem with the remote host, because if I use ImageMagick:
Image image = Image(url);
Then I get the image in its entirety and can save it and it's just fine.
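The function set with CURLOPT_WRITEFUNCTION (curlCallback in your case) can be called multiple times during the download (see the docs). Your callback reopens the file on every call, which truncates it, so only the last chunk survives; the file is also not opened in binary mode.
Passing a FILE* opened with "wb" via CURLOPT_WRITEDATA, and keeping libcurl's default write function, might be easier.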
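If you prefer to keep a custom callback and C++ streams, a minimal sketch is to open the file once, in binary mode, and pass the stream through CURLOPT_WRITEDATA so each call appends the next chunk. writeToOfstream is a hypothetical name.

```cpp
#include <cstddef>
#include <fstream>

// Write callback sketch: userdata is a std::ofstream* that the caller opened ONCE,
// in binary mode, before curl_easy_perform(); each call appends the next chunk.
size_t writeToOfstream(char* buf, size_t size, size_t nmemb, void* userdata)
{
    std::ofstream* out = static_cast<std::ofstream*>(userdata);
    const size_t bytes = size * nmemb;
    out->write(buf, static_cast<std::streamsize>(bytes));
    return out->good() ? bytes : 0;  // any value other than `bytes` makes libcurl abort
}
```

Usage: create `std::ofstream out("/home/name/Desktop/img.png", std::ios::binary);`, set CURLOPT_WRITEFUNCTION to writeToOfstream and CURLOPT_WRITEDATA to &out, then call curl_easy_perform(). The pointer passed as CURLOPT_WRITEDATA must really be a std::ofstream*, since the callback casts back to exactly that type.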

Corrupted Binary Files after Transfer libcurl

I am transferring a binary file (.exe) with FTP using libcurl, and saving it to a local file. The problem is that after the file is transferred, it is altered and is no longer a valid Win32 application, and doesn't run. Here's how I'm doing it:
CURL *curl;
curl = curl_easy_init();
FILE* f = fopen("C:\\blah.exe", "w");
if(curl) {
    curl_easy_setopt(curl, CURLOPT_URL, "ftp://ftp.mysite.com");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "blah:blah");
    curl_easy_setopt(curl, CURLOPT_FTP_FILEMETHOD, CURLFTPMETHOD_SINGLECWD);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, NULL);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &f);
} else {
    fclose(f);
    return CURL_EASY_INIT_FAIL;
}
fclose(f);
The file is written but is bigger than it is on the FTP server. Like I said, trying to run it results in the "%1 is not a valid Win32 application" error. Did I forget to set an option or something?
You forgot the binary flag.
This is the correct code:
FILE* f = fopen("C:\\blah.exe", "wb");
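The reason is that "w" opens the file in text mode. On Windows, text mode translates every LF byte (0x0A) written to the file into CR LF, which corrupts the executable and makes it larger than the original. Opening the file with "wb" disables that translation. libcurl itself already transfers FTP files in binary mode unless CURLOPT_TRANSFERTEXT is set.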