libcurl outputting extra trailing bytes when downloading file - c++

I'm having a strange problem with libcurl. While downloading a file from an HTTP server, it writes some garbage bytes at the end of the file. The file should be 1,710,017 bytes, but the library writes 1,712,128, i.e. 2,111 more. I suspect some sort of buffering issue, as the latter number is a multiple of 2^12 (and of 2^13, but in other cases it is only a multiple of 2^12). The extra data is either the same number of bytes copied from another part of the file (it only seems to read from one of 4 addresses each time, all towards the end), or, in one case, the byte CD repeated 2,111 times.
Relevant code:
std::string url; // defined elsewhere
FILE* data; // initialized elsewhere with option "wb"
CURL* query = curl_easy_init();
curl_easy_setopt(query, CURLOPT_WRITEDATA, data);
curl_easy_setopt(query, CURLOPT_URL, url.c_str()); // libcurl expects a C string, not a std::string
curl_easy_setopt(query, CURLOPT_FOLLOWLOCATION, 1L); // numeric options are passed as long
curl_easy_setopt(query, CURLOPT_SSL_VERIFYPEER, 0L);
CURLcode res = curl_easy_perform(query);
Also: the same issue occurs when using a simple write callback, and the issue occurs with any given remote server, not just this particular one.
Edit #1: I can only replicate this on Windows (tested on two machines with the same library files). It works on Debian.
Edit #2: It also occurs when libcurl is built on my laptop. To provide additional context, I am building from Marc Hörsken's ZIP (available from the official curl downloads page) using a VC14 environment on Windows 10.

I'm chalking this up to some weird VC bug or otherwise anomalous behavior. Rewriting the problem code to use an fstream instead of a FILE did the trick.
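For reference, here is a minimal sketch of what the fstream-based version might look like. The callback has the signature libcurl documents for CURLOPT_WRITEFUNCTION; the names write_to_stream and demo_collect are made up for illustration, and the registration with a real handle is shown only as a comment because it assumes the question's query handle and a file opened in binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Signature libcurl documents for CURLOPT_WRITEFUNCTION. userdata is whatever
// pointer was passed via CURLOPT_WRITEDATA; this sketch assumes it points at a
// std::ostream.
std::size_t write_to_stream(char* ptr, std::size_t size,
                            std::size_t nmemb, void* userdata)
{
    assert(userdata != nullptr);
    std::ostream* out = static_cast<std::ostream*>(userdata);
    const std::size_t total = size * nmemb;
    out->write(ptr, static_cast<std::streamsize>(total));
    // Returning anything other than total tells libcurl the write failed.
    return out->good() ? total : 0;
}

// Assumed registration (needs <curl/curl.h> and <fstream>):
//   std::ofstream file("download.bin", std::ios::binary);
//   curl_easy_setopt(query, CURLOPT_WRITEFUNCTION, write_to_stream);
//   curl_easy_setopt(query, CURLOPT_WRITEDATA,
//                    static_cast<void*>(static_cast<std::ostream*>(&file)));

// In-memory check that needs no network: feed two chunks the way libcurl would.
std::string demo_collect()
{
    std::ostringstream sink;
    char first[] = "abc";
    char second[] = "def";
    write_to_stream(first, 1, 3, static_cast<std::ostream*>(&sink));
    write_to_stream(second, 1, 3, static_cast<std::ostream*>(&sink));
    return sink.str();
}
```

With this approach the C++ stream is the only thing that writes to the file, so exactly the bytes libcurl hands to the callback end up on disk. On Windows, the libcurl documentation also says that a write callback must be set whenever CURLOPT_WRITEDATA is used with libcurl built as a DLL.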

Related

libcurl in C++: converting URL to some weird symbols

I have noticed that libcurl for C++ changes the URL provided to some weird symbols. Here is the code:
curl_global_init(CURL_GLOBAL_ALL);
curl_handle = curl_easy_init();
cout << "http://subdomain.mydomain.com/folder/check.php?key=" + key << endl;
curl_easy_setopt(curl_handle, CURLOPT_URL, "http://subdomain.mydomain.com/folder/check.php?key=" + addon_key);
curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, &writeCallback);
curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L);
res = curl_easy_perform(curl_handle);
That's what I get in the console:
http://subdomain.mydomain.com/folder/check.php?key=tasdasm34234k23l423m4234mn23n4jk23bjk4b23nasdasdasdasdsdsd
* Rebuilt URL to: � ��g/
* IDN support not present, can't parse Unicode domains
* getaddrinfo(3) failed for � ��g:80
* Couldn't resolve host '� ��g'
* Closing connection 0
This code works perfectly fine when I build my project in Windows, but when I build it with Linux, this happens. If I just try to access "http://subdomain.mydomain.com/folder/check.php" with this code, it works, but as soon as I add the key, libcurl changes the whole URL.
Thanks in advance.
Like I said in my comment, the CURL library is a library of C functions, and C functions don't know anything about objects or classes from C++.
When you do "http://subdomain.mydomain.com/folder/check.php?key=" + addon_key the result is a (temporary) std::string object. Passing that to a C function will not work well, and I'm surprised that the compiler actually let you pass that argument without complaining. It should have been a compiler error I think, or at the very least should give you a stern warning.
You can solve this by creating another variable to store the string object, and then using the c_str member function to get a C-style string (a pointer to constant char):
std::string url = "http://subdomain.mydomain.com/folder/check.php?key=" + addon_key;
curl_easy_setopt(curl_handle, CURLOPT_URL, url.c_str());
I don't know if cURL copies the string, or if you need to keep the url variable alive until you're all done.
That it apparently works on Windows is nothing more than pure luck. Passing a C++ object to a function that does not expect it is undefined behavior.
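As for why the compiler stayed silent: curl_easy_setopt is declared with a variadic parameter list (the value comes after `...`), so the argument type cannot be checked. Here is a small sketch of one way to guard against this, using a typed wrapper. fake_setopt, g_last_url and set_url are hypothetical names; fake_setopt is only a stand-in for a variadic C function and is not part of libcurl.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Records what the stand-in function received, for inspection only.
static std::string g_last_url;

// Stand-in for a variadic C API (hypothetical, not libcurl): it reads its
// extra argument as a const char*, no matter what the caller passed.
int fake_setopt(int option, ...)
{
    va_list args;
    va_start(args, option);
    const char* value = va_arg(args, const char*);
    va_end(args);
    assert(value != nullptr);
    g_last_url = value;
    return 0;
}

// Typed wrapper: it accepts only a std::string and forwards a C string, so a
// std::string object can never reach the variadic call by accident.
int set_url(const std::string& url)
{
    return fake_setopt(1, url.c_str());
}
```

Calling set_url("http://subdomain.mydomain.com/folder/check.php?key=" + addon_key) is then safe, because the temporary std::string lives until the call returns and only its c_str() pointer crosses into the C function.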

C++ WinINet InternetReadFile function refresh

I am trying to get the content of a file using WinINet in C++. The file is an XML file and is generated by an executable on a server.
The code for init, connect and even read a file on the specified server address is working.
// Connect to internet.
m_hInternet = InternetOpen(L"HTTPRIP", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
// Check if worked.
if( !m_hInternet )
    return;
// Connect to selected URL.
m_hUrl = InternetOpenUrlA(m_hInternet, strUrl.c_str(), NULL, 0, INTERNET_FLAG_PRAGMA_NOCACHE | INTERNET_FLAG_RESYNCHRONIZE, 0);
// Check if worked.
if( !m_hUrl )
    return;
if( InternetReadFile(m_hUrl, buf, BUFFER_SIZE, &bytesread) && bytesread != 0 )
{
    // Put into std::string.
    strData = std::string(buf, buf + bytesread);
}
Now I want to re-read the file (same address). The server updates the file at 50 Hz, and I want my code to call InternetReadFile only if the file has been updated by the server. Can InternetReadFile do that kind of thing? Maybe with a flag, but I didn't find anything on MSDN.
Thanks for your help.
HTTP has no mechanism for a server to notify a client that a resource has changed, so there is no such function in WinINet either. If the file is relatively small, the easiest solution might be to download it and check whether it has changed. If the file is large, let the server that writes the file also write a timestamp, checksum, or counter file next to it.
Then your code would download the checksum file, see if it's changed, and in that case download the original file.
Or another solution would be to put a timestamp or similar data in the beginning of the XML file, and stop downloading the file if the timestamp (or checksum) is not updated. (This comes with its own drawbacks of course, you may have to write your own parser.)
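The "download and see if it's changed" idea can be sketched as follows. This is only an illustration: fnv1a and ChangeDetector are made-up names, and FNV-1a is used as a small non-cryptographic checksum, not something the server or WinINet provides. The payload would be the strData string from the question.

```cpp
#include <cstdint>
#include <string>

// FNV-1a 64-bit hash: a small non-cryptographic checksum, chosen here only
// for illustration.
std::uint64_t fnv1a(const std::string& data)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 1099511628211ULL;                 // FNV prime
    }
    return hash;
}

// Remembers the checksum of the last payload and reports whether a newly
// downloaded payload differs from it.
class ChangeDetector {
public:
    bool has_changed(const std::string& payload)
    {
        const std::uint64_t current = fnv1a(payload);
        const bool changed = !seen_ || current != last_;
        seen_ = true;
        last_ = current;
        return changed;
    }

private:
    bool seen_ = false;       // false until the first payload arrives
    std::uint64_t last_ = 0;  // checksum of the most recent payload
};
```

The first payload always counts as changed. After that, the XML only needs to be parsed when has_changed returns true.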
If the HTTP server has a page with information about this file, such as a timestamp (it does not matter that the file is generated; the page may be generated too), you can examine that page instead.
Since you know that the server updates the file at a (nearly) constant rate, your app could simply use a timer.
P.S. I doubt there is much sense in reading a file 50 times every second.

iOS file size during write using only C/C++ APIs

Purpose: I am monitoring file writes in a particular directory on iOS using BSD kernel queues, and poll for file sizes to determine write ends (when the size stops changing). The basic idea is to refresh a folder only after any number of file copies coming from iTunes sync. I have a completely working Objective-C implementation for this but I have my reasons for needing to implement the same thing in C++ only.
Problem: The one thing stopping me is that I can't find a C or C++ API that will get the correct file size during a write. Presumably, one must exist because Objective-C's [NSFileManager attributesOfItemAtPath:] seems to work and we all know it is just calling a C API underneath.
Failed Solutions:
I have tried using stat() and lstat() to get st_size, and even st_blocks for the allocated block count. They return correct sizes for most files in a directory, but when a file write is in progress, that file's size never changes between poll intervals, and every subsequent file iterated in that directory has a bad size.
I have tried using fseek and ftell but they are also resulting in a very similar issue.
I have also tried modified date instead of size using stat() and st_mtimespec, and the date doesn't appear to change during a write - not that I expected it to.
Going back to NSFileManager's ability to give me the right values, does anyone have an idea what C API call that [NSFileManager attributesOfItemAtPath:] is actually using underneath?
Thanks in advance.
Update:
It appears that this has less to do with in-progress write operations and more to do with specific files. On closer inspection, some files always return a size and other files never return a size when using the C API (but work fine with the Objective-C API). Even when I create a copy of a "good" file, the C API gives no size for the copy but works fine with the original. I have both failures and successes with text (xml) files and binary (zip) files. I am using iTunes to add these files to the iPad app's Documents directory. It is an iPad Mini Retina.
Update 2 - Answer:
Probably any of the above file size methods will work, if your path isn't invisibly trashed, like mine was. See accepted answer on why the path was trashed.
Well this weird behavior turned out to be a problem with the paths, which result in strings that will print normally, but are likely trashed in memory enough that file descriptors sometimes didn't like it (thus only occurring in certain file paths). I was using the dirent API to iterate over the files in a directory and concatenating the dir path and file name erroneously.
Bad Path Concatenation: Obviously (or apparently not-so-obvious at runtime) str-copying over three times is not going to end well.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcpy(fullPath, "/");
strcpy(fullPath, file);
long sizeBytes = getSize(fullPath);
free(fullPath);
Correct Path Concatenation: Use proper str-concatenation.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcat(fullPath, "/");
strcat(fullPath, file);
long sizeBytes = getSize(fullPath);
free(fullPath);
Long story short, it was sloppy work on my part, via two typos.
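Since the goal was a C++ implementation anyway, std::string sidesteps the manual buffer handling entirely. Here is a minimal sketch under that assumption. join_path and file_size are hypothetical helper names; file_size is only a stand-in for the getSize function above, whose implementation isn't shown, and it uses the stat() call mentioned in the question.

```cpp
#include <string>
#include <sys/stat.h>

// Joins a directory and a file name with exactly one '/' between them.
std::string join_path(const std::string& dir, const std::string& file)
{
    if (dir.empty()) return file;
    if (dir.back() == '/') return dir + file;
    return dir + "/" + file;
}

// Returns the size in bytes reported by stat(), or -1 if the call fails
// (for example because the path does not exist).
long long file_size(const std::string& path)
{
    struct stat info;
    if (stat(path.c_str(), &info) != 0) return -1;
    return static_cast<long long>(info.st_size);
}
```

With the dirent loop from the question, this would be used as file_size(join_path(dir, entry->d_name)), where entry is assumed to be the struct dirent* returned by readdir.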

libCurl : curl_easy_setopt in one method and curl_easy_perform in another does not work

I have code where one local function uses curl_easy_setopt to set the proxy URL, and another local function calls curl_easy_perform. But when control moves from one function to the other, the proxy URL that was set from a local variable contains junk characters and the DNS query returns an error. The libcurl help page says that string values passed to setopt are copied by the curl library. But it looks to me as if the library does not copy the string and just refers to the original value whenever it needs it. So if a local variable is used to set the proxy URL, it will contain junk by the time I call curl_easy_perform.
Following is the example code snippet.
void funcSetOpt()
{
    // ProxyUrl is a local array; it is destroyed when this function returns.
    char ProxyUrl[] = "someproxy";
    curl_easy_setopt(curlHandle, CURLOPT_PROXY, ProxyUrl);
}
void funcPerform()
{
    curl_easy_perform(curlHandle);
}
That would imply that you're using a fairly old libcurl version and the following section from the curl_easy_setopt man page might affect you:
Before version 7.17.0, strings were not copied. Instead the user was
forced keep them available until libcurl no longer needed them.
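To find out which behavior applies, the version of the library in use can be compared against 7.17.0. This is a small sketch, assuming the version number is obtained at run time from curl_version_info(CURLVERSION_NOW)->version_num (or from LIBCURL_VERSION_NUM at compile time). The names kFirstCopyingVersion and setopt_copies_strings are made up for illustration.

```cpp
// libcurl encodes its version as 0xXXYYZZ (major, minor, patch),
// so 7.17.0 is 0x071100.
constexpr unsigned int kFirstCopyingVersion = 0x071100;

// True if a libcurl with this version number copies string options passed to
// curl_easy_setopt, per the man page note quoted above.
constexpr bool setopt_copies_strings(unsigned int version_num)
{
    return version_num >= kFirstCopyingVersion;
}
```

If this returns false, the proxy string has to outlive the transfer, for example by storing it next to curlHandle instead of in a local array.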

How to get the length of a file without downloading the file in a cURL binary get request

I want to create a cURL request in some C++ code that will get me the length of a file on a server without downloading the file. For that, I use some cURL options to say that I only want headers in the response, and then I examine the response to get the file length.
I'm setting the following request options:
curl_easy_setopt(_curl_handle, CURLOPT_HEADER, 1);
curl_easy_setopt(_curl_handle, CURLOPT_NOBODY, 1);
Then I process the request, wait for the response, which shows 200 OK, and finally ask for the file length:
curl_easy_getinfo(_curl_handle, CURLINFO_CONTENT_LENGTH_UPLOAD, &dResult);
But I get a file length of -1. According to cURL documentation, that means size is unknown. How can it happen that cURL doesn't get the file length information from the server?
CURLINFO_CONTENT_LENGTH_UPLOAD is the number of bytes uploaded. You need to use CURLINFO_CONTENT_LENGTH_DOWNLOAD instead.
Note that if the server generates the data dynamically, the length may be different when you actually download the file versus just downloading its headers.
Also note that if the server sends the response with chunked transfer encoding (common for compressed or dynamically generated data), the response carries a Transfer-Encoding header instead of a Content-Length header, so no size is available and CURLINFO_CONTENT_LENGTH_DOWNLOAD still returns -1. The only way to know the size in that situation is to download the data in full.
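To see which of the two headers the server actually sent, the raw header lines can be inspected. This sketch is one way to do that, not the only one. on_header follows the signature libcurl documents for CURLOPT_HEADERFUNCTION, while the names parse_content_length and on_header, and the choice of a long long as user data, are assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Parses one raw header line. Returns the Content-Length value, or -1 if the
// line is a different header or the value is not a number.
long long parse_content_length(const std::string& line)
{
    const std::string name = "content-length:";
    if (line.size() < name.size()) return -1;
    for (std::size_t i = 0; i < name.size(); ++i) {
        // Header names are case-insensitive.
        if (std::tolower(static_cast<unsigned char>(line[i])) != name[i]) return -1;
    }
    const char* start = line.c_str() + name.size();
    char* end = nullptr;
    const long long value = std::strtoll(start, &end, 10); // skips leading spaces
    if (end == start || value < 0) return -1;
    return value;
}

// Header callback: userdata is assumed to point at a long long that the
// caller initialized to -1 before the transfer.
std::size_t on_header(char* buffer, std::size_t size,
                      std::size_t nitems, void* userdata)
{
    const std::size_t total = size * nitems;
    const long long length = parse_content_length(std::string(buffer, total));
    if (length >= 0) *static_cast<long long*>(userdata) = length;
    return total; // tell libcurl the whole line was consumed
}
```

It would be registered with CURLOPT_HEADERFUNCTION and CURLOPT_HEADERDATA on _curl_handle. If the value is still -1 after curl_easy_perform, the server sent no Content-Length header for that response.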
Have you tried with CURLINFO_CONTENT_LENGTH_DOWNLOAD instead?
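You need to call curl_easy_perform() before querying the info, and query the download length rather than the upload length:
curl_easy_setopt(_curl_handle, CURLOPT_HEADER, 1L);
curl_easy_setopt(_curl_handle, CURLOPT_NOBODY, 1L);
curl_easy_perform(_curl_handle);
// dResult is a double; it is set to -1 if the length is unknown.
curl_easy_getinfo(_curl_handle, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &dResult);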