I have some legacy code in one of our applications that is used for PDF file downloads (the PDF files are around 350-400 KB), and recently we had complaints (from around 1% of customers) saying the PDF download fails with damaged/corrupted file errors.
Here is a snippet of the code (a C++ application) setting the headers for the download:
String header;
header.append("Content-type: application/force-download\r\n");
header.append("Content-Transfer-Encoding: Binary\r\n");
header.append("Content-length: %d\r\n", filebuf.length());
header.append("Connection: Close\r\n");
header.append("Content-disposition: attachment; filename=%s\r\n\r\n", filename_to_download.chars());
The String class and its append method are just for the example.
I understand the headers above are not the best way to trigger a PDF file download (I've since simplified them to "Content-Type: application/octet-stream" and "Content-Disposition: attachment; filename=example.pdf", and that seems to work for me).
But I am not able to understand why the original code above should fail 1% of the time.
I tried to find a browser/Adobe combination to blame, but there seems to be no pattern here. One thing a few customers did mention is that when they switched to the Chrome browser it worked most of the time.
Any pointers?
After a couple of days of struggle I finally figured out what's happening here.
We set the Content-Length to the size of the buffer (the PDF file size) in our code and send the data to the client, but in between, the Apache module mod_gzip/mod_deflate compresses the data buffer. What reaches the client/browser then says, for example, "Content-Length: 100 bytes" while the actual data is only 60-70 bytes.
Not every browser complains about this mismatch, but certain browsers treat it as a fatal error and show the message "couldn't download file" (we've seen this issue frequently on Win8/IE10 and Win8/IE11; there could be other browser security settings contributing as well).
For the fix, we've removed "Content-Length" from header.
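For reference, a minimal sketch of what the header block might look like after the fix, written in the same illustrative String/append style as the snippet above: Content-Length is simply omitted, and the simplified Content-Type/Content-Disposition pair mentioned earlier is used. (An alternative would be to keep Content-Length and stop mod_deflate from compressing these responses, for example via its no-gzip environment variable, but that is server configuration rather than application code.)
String header;
header.append("Content-Type: application/octet-stream\r\n");
header.append("Content-Disposition: attachment; filename=%s\r\n", filename_to_download.chars());
header.append("Connection: close\r\n\r\n");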
I have a website with an HTML5 audio element whose audio data is to be served via a CGI script.
The markup is rather simple:
<audio controls>
<source type="audio/mpeg" src="audio.cgi?test.mp3">
<em>Me, your browser does not support HTML5 audio</em>
</audio>
The CGI is written in C++ and is pretty simple too. I know it needs optimizing, e.g. reading the whole file into a buffer is really bad, but that's not the point.
This basic version kind of works, meaning the audio is played, but the player does not display the full length of the track and one can only seek through the parts that have already been played.
If the audio file is placed in a location directly accessible via the web server, everything works fine.
The difference between the two methods seems to be that the client issues a partial-content request when the file is served statically, and an ordinary 200 request when I serve the audio data via the CGI all at once.
I wanted to implement partial-content serving in the CGI, but I failed to read out the environment variable carrying the request's Range header, which is needed to serve the requested part of the data.
This leads me to my questions:
Why does the HTML5 player not display the full length of the track if I'm serving the audio data via the cgi script?
Would implementing a partial-content handling solve this issue?
If partial-content handling is the right approach, how would I access the required environment variables under Apache, since I have not found anything about them? Do I need to send a complete HTTP header indicating that partial content is supported, so the client knows it needs to send the required fields?
This is the source of the .cgi:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

void serveAudio(const string &audioFile)    // path of the requested mp3 (e.g. taken from QUERY_STRING)
{
    //tried these, were not the right ones
    //getenv("HTTP_RANGE");
    //getenv("HTTP_CONTENT_RANGE");
    ifstream in(audioFile, ios::binary | ios::ate);
    size_t size = in.tellg();               // stream opened at the end, so tellg() is the file size
    char *buffer = new char[size];
    in.seekg(0, ios::beg);
    in.read(buffer, size);
    cout << "Content-Type: audio/mpeg\n\n"; // blank line ends the CGI headers
    cout.write(buffer, size);
    delete[] buffer;                        // was leaked in the original version
}
Any suggestions and helpful comments are appreciated!
Thanks in advance!
P.S.:
Forgot to mention that this behaviour applies to FF 31 and IE 11.
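For what it's worth, a rough sketch of what byte-range handling in the CGI might look like. This is an untested, assumption-laden sketch: it assumes Apache passes the request's Range header through to the CGI as the HTTP_RANGE environment variable, and that the browser only starts sending Range requests once the initial 200 response advertises "Accept-Ranges: bytes" and a Content-Length (which is also why the player cannot show the track length or seek freely with the original script). The function name is made up for illustration.
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void serveAudioWithRanges(const string &audioFile)
{
    ifstream in(audioFile, ios::binary | ios::ate);
    if (!in) {
        cout << "Status: 404 Not Found\r\n\r\n";
        return;
    }
    const size_t size = in.tellg();

    // Default to the whole file; honour "bytes=start-" or "bytes=start-end" if present.
    size_t start = 0, end = size - 1;
    bool partial = false;
    if (const char *range = getenv("HTTP_RANGE")) {
        unsigned long s = 0, e = 0;
        if (sscanf(range, "bytes=%lu-%lu", &s, &e) == 2 && e < size) {
            start = s; end = e; partial = true;
        } else if (sscanf(range, "bytes=%lu-", &s) == 1 && s < size) {
            start = s; partial = true;
        }
    }

    const size_t length = end - start + 1;
    if (partial)
        cout << "Status: 206 Partial Content\r\n"
             << "Content-Range: bytes " << start << "-" << end << "/" << size << "\r\n";
    cout << "Content-Type: audio/mpeg\r\n"
         << "Accept-Ranges: bytes\r\n"
         << "Content-Length: " << length << "\r\n\r\n";

    // Send only the requested slice of the file.
    vector<char> buffer(length);
    in.seekg(start, ios::beg);
    in.read(buffer.data(), length);
    cout.write(buffer.data(), length);
}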
Purpose: I am monitoring file writes in a particular directory on iOS using BSD kernel queues, and polling file sizes to determine when a write ends (when the size stops changing). The basic idea is to refresh a folder only after any number of file copies coming from an iTunes sync have finished. I have a completely working Objective-C implementation of this, but I have my reasons for needing to implement the same thing in C++ only.
Problem: The one thing stopping me is that I can't find a C or C++ API that returns the correct file size during a write. Presumably one must exist, because Objective-C's [NSFileManager attributesOfItemAtPath:] seems to work, and we all know it is just calling a C API underneath.
Failed Solutions:
I have tried using stat() and lstat() to get st_size, and even st_blocks for the allocated block count (a sketch of such a stat() helper follows after this list). They return correct sizes for most files in a directory, but when a file is being written, that file's size never changes between poll intervals, and every file iterated after it in that directory comes back with a bad size.
I have tried fseek and ftell, but they run into a very similar issue.
I have also tried modified date instead of size using stat() and st_mtimespec, and the date doesn't appear to change during a write - not that I expected it to.
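For reference, a minimal sketch of what the getSize() helper used in the snippets further down might look like when built on stat(); the original helper isn't shown in the question, so this is an assumption.
#include <sys/stat.h>

long getSize(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;               // stat failed (missing file, bad path, ...)
    return (long)st.st_size;     // current size in bytes
}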
Going back to NSFileManager's ability to give me the right values: does anyone have an idea which C API call [NSFileManager attributesOfItemAtPath:] is actually using underneath?
Thanks in advance.
Update:
It appears that this has less to do with in-progress write operations and more to do with specific files. After closer inspection, some files always return a size, while other files never return a size through the C API (but work fine with the Objective-C API). Even when I create a copy of a "good" file, the C API refuses to give a size for the copy while still working fine with the original "good" file. I see both failures and successes with text (XML) files and binary (zip) files. I am using iTunes to add these files to the iPad app's Documents directory. It is an iPad Mini Retina.
Update 2 - Answer:
Probably any of the above file-size methods will work, as long as your path isn't invisibly trashed, like mine was. See the accepted answer for why the path was trashed.
Well, this weird behavior turned out to be a problem with the paths, which resulted in strings that print normally but are trashed enough in memory that the file calls sometimes didn't like them (which is why it only happened for certain file paths). I was using the dirent API to iterate over the files in a directory and concatenating the directory path and file name erroneously.
Bad Path Concatenation: Obviously (or apparently not so obvious at runtime), str-copying over the same buffer three times is not going to end well - each strcpy writes from the start of fullPath again instead of appending.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcpy(fullPath, "/");
strcpy(fullPath, file);
long sizeBytes = getSize(fullPath);
free(fullPath);
Correct Path Concatenation: Use proper str-concatenation.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcat(fullPath, "/");
strcat(fullPath, file);
long sizeBytes = getSize(fullPath);
free(fullPath);
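An alternative sketch that sidesteps the manual strcpy/strcat bookkeeping entirely by building the path with std::string; the helper name here is made up, and getSize() is the same size function used in the snippets above.
#include <string>

long sizeOfEntry(const std::string &dir, const std::string &file)
{
    std::string fullPath = dir + "/" + file;   // no buffer-length arithmetic to get wrong
    return getSize(fullPath.c_str());
}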
Long story short, it was sloppy work on my part, via two typos.
I am trying to write a client/server program in C++ with Visual Studio 2008. So far the project does the following:
Run the web server at the cmd prompt - webserver 8080
Open a web browser - localhost:8080
Open a local html file - localhost:8080/demo.html
But now... let's say the client requests a gif file; then the server should send the gif file.
If the client requests a txt file, the server should send the .txt file, and similarly for .html and .xbm files.
I don't know how to do it. Any help greatly appreciated.
On UNIX systems you'd use the file command: it uses a set of known "magic numbers" to identify different file types, plus a few heuristics to classify the remaining files. Most file formats have some sort of identifier embedded, often in the first couple of bytes. Text files in particular normally don't have a magic number and just use printable characters instead (with UTF-8 and UTF-16 being popular, classifying text files has become a bit harder).
Once the file type is determined, you'd just set the corresponding HTTP header(s).
okay, because we're in the same class, I'll give you a clue :)
In the header part, put some if-else like this:
if (strcmp(type, "html") == 0) {
    (void) sprintf(buff, "Content-Type: text/html\r\n");
    (void) send(conn, buff, strlen(buff), 0);
}
else if (strcmp(type, "gif") == 0) {
    (void) sprintf(buff, "Content-Type: image/gif\r\n");
    (void) send(conn, buff, strlen(buff), 0);
}
Got it? And by the way, you need to get the extension (check the path with an endsWith-style function), compare the extension with the file type, and then send out the right header. Test it with a gif file :) I have it working already :) Going to submit now. Remember to vote up for me :)
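Building on that hint, a sketch of the extension lookup being described: take everything after the last '.' in the requested path and map it to a Content-Type. The helper name is made up, the table only covers the formats mentioned in the question, and buff/conn are assumed to be the same buffer and socket as in the snippet above.
#include <cstring>

const char *contentTypeFor(const char *path)
{
    const char *dot = strrchr(path, '.');
    const char *ext = dot ? dot + 1 : "";
    if (strcmp(ext, "html") == 0) return "text/html";
    if (strcmp(ext, "txt")  == 0) return "text/plain";
    if (strcmp(ext, "gif")  == 0) return "image/gif";
    if (strcmp(ext, "xbm")  == 0) return "image/x-xbitmap";
    return "application/octet-stream";   // fallback for anything else
}

// Usage inside the header-writing code:
//   sprintf(buff, "Content-Type: %s\r\n", contentTypeFor(path));
//   send(conn, buff, strlen(buff), 0);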
I am trying to download a binary file from an HTTP server. I am using the functions InternetOpenUrl() and then InternetReadFile() to download the file. Is it possible to know the size of the file before downloading it?
Thanks
Vinod
You can use HttpQueryInfo() with HTTP_QUERY_CONTENT_LENGTH for this, although the server is not required to send the content length and so you should not rely on this.
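A short sketch of that approach, assuming hUrl is the handle returned by InternetOpenUrl(); as noted above, the server may not send a Content-Length at all, in which case this returns 0 and you should simply read until InternetReadFile() reports no more data.
#include <windows.h>
#include <wininet.h>
#pragma comment(lib, "wininet.lib")

DWORD queryContentLength(HINTERNET hUrl)   // hypothetical helper name
{
    DWORD length = 0;
    DWORD size = sizeof(length);
    if (!HttpQueryInfoA(hUrl, HTTP_QUERY_CONTENT_LENGTH | HTTP_QUERY_FLAG_NUMBER,
                        &length, &size, NULL))
        return 0;                          // header missing or query failed
    return length;                         // size announced by the server, in bytes
}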
You might want to call InternetGetLastResponseInfo after the call to InternetOpenUrl to get the response headers. Those should (most likely) contain the content-length of the file.
I'm writing a program that downloads information from the web, and part of that information is images.
At the moment I'm having a problem because the code that downloads the images is in a different part of the program from the code that displays them (it's structured as MVC). If a 404 is issued, or the image download fails in some other way, the display code pops up a message prompt, which I would like to avoid.
Is there an easy way to check whether an image is valid? I'm only concerned about jpg, gif and png.
Note: I don't care about reading the image data, I just want to check that it is a valid image format.
Do you want to check whether the download would be successful? Or do you want to check that what is downloaded is, in fact, an image?
In the former case, the only way to check is to try to access it and see what kind of HTTP response code you get. You can send an HTTP HEAD request to get the response code without actually downloading the image, but if you're just going to go ahead and download the image anyway (if it's successful) then sending a separate HEAD request seems like a waste of time (and bandwidth).
Alternatively, if you really want to check that what you're downloading is a valid image file, you have to read the whole file to check it for corruption. But if you just want to check that the file extension is accurate, it should be enough to check the first few bytes of the file. All GIF images start with the ASCII text GIF87a or GIF89a, depending on which GIF specification is used. PNG images start with the byte 0x89 followed by the ASCII text PNG, and JPEG images start with the bytes 0xFF 0xD8. (You should do some research and double-check that; Wikipedia links to the specifications.) Keep in mind, though, that to get even the first few bytes of the image you will need to send an HTTP request, which could return a 404 (and in that case you don't have any image to check).
Thanks for the answers guys. I had already downloaded the file, so I went with just checking the magic number, since the front end I use (wxWidgets) already has image libraries and I wanted something very light.
uint8 UTIL_isValidImage(const unsigned char h[5])
{
    // 'G' 'I' 'F' '8' (GIF87a / GIF89a)
    if (h[0] == 71 && h[1] == 73 && h[2] == 70 && h[3] == 56)
        return IMAGE_GIF;
    // 0x89 'P' 'N' 'G'
    if (h[0] == 137 && h[1] == 80 && h[2] == 78 && h[3] == 71)
        return IMAGE_PNG;
    // 0xFF 0xD8 (JPEG SOI marker)
    if (h[0] == 255 && h[1] == 216)
        return IMAGE_JPG;
    return IMAGE_VOID;
}
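Hypothetical usage: read the first few bytes of the downloaded file and run them through the check above. The IMAGE_* constants and the uint8 typedef come from the asker's project and are assumed to be defined elsewhere.
#include <cstdio>

uint8 checkDownloadedFile(const char *path)
{
    unsigned char header[5] = {0};
    FILE *f = std::fopen(path, "rb");
    if (!f)
        return IMAGE_VOID;                 // missing or unreadable file
    size_t n = std::fread(header, 1, sizeof(header), f);
    std::fclose(f);
    return (n >= 4) ? UTIL_isValidImage(header) : IMAGE_VOID;
}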
If you really want to know if an image file is valid, you actually have to decode it (although you don't need to store the bits). This is because the file might be the wrong size or might be corrupted.
If you're using an HTTP library to do the downloads, you should be able to examine the header and know that you're getting a 404 error and not a real payload. Look at the documentation for the library you're using.
If you're getting back a file and you want to see if it's probably an image without fully-decoding it, then you'll need to check at least the headers for validity. libpng and libjpeg offer pretty low-level access to png and jpeg files, respectively. You could also look at higher-level libraries like ImageMagick, Microsoft's MFC, or whatever library is most appropriate for your platform.
When you GET a resource through HTTP, you must use the Content-Type header to determine how to process the content. If you've already downloaded it to a local file, the information that a real web browser relies upon is already lost. In many cases, the URL will match the Content-Type (e.g. http://example.com/image.png is served up as Content-Type: image/png). However, you cannot rely on this.
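Along those lines, a sketch of the Content-Type route using the same constants as the magic-number check earlier: if your HTTP library hands you the response's Content-Type value as a string (how you obtain it depends on the library, so that part is assumed), it can be mapped to an image kind before any bytes are inspected.
#include <cstring>

uint8 imageTypeFromContentType(const char *contentType)   // hypothetical helper
{
    if (!contentType)
        return IMAGE_VOID;
    if (std::strncmp(contentType, "image/gif", 9) == 0)   return IMAGE_GIF;
    if (std::strncmp(contentType, "image/png", 9) == 0)   return IMAGE_PNG;
    if (std::strncmp(contentType, "image/jpeg", 10) == 0) return IMAGE_JPG;
    return IMAGE_VOID;   // anything else, including text/html error pages
}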