Get files (other than text) from .zip with libzip - c++

I am learning C++ and decided to practice by writing a little program that extracts files from a zip archive with the libzip library: text files, images, or even other zip files (though I don't want to extract nested archives automatically; one thing at a time).
So I wrote my program, but now I have a problem.
It extracts text files correctly, but not files like images or zips. It detects them and reports their exact names and sizes, but once extracted they are only a few bytes long (although they end up in the right location).
Here is my code: http://pastie.org/6221955
So if someone could help me extract files that aren't text from a zip, that would be great! Thank you!

You're reading and writing binary data as if it were a text string. The problem is that C strings use a null character (a 0 byte) to mark the end of the string, and binary data can (and definitely does) contain zero bytes all over the place, not just at the end.
You need to use ofstream's .write(buffer, <size in bytes>) to write to disk; by specifying the size in bytes explicitly, you force it to write exactly that many bytes instead of stopping at the first null character.
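A minimal sketch of the difference, assuming the entry's bytes are already in memory (for example, filled by libzip's zip_fread, which is not shown here). The function names are hypothetical. The fixed version also opens the output with std::ios::binary, which matters on Windows, where text mode translates '\n' into "\r\n".

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Broken: operator<< treats buf as a C string and stops at the first 0 byte.
// buf must contain a 0 byte somewhere, or this reads past the end of it.
bool save_entry_broken(const std::string& path, const char* buf)
{
    std::ofstream out(path, std::ios::binary);
    out << buf;
    return static_cast<bool>(out);
}

// Fixed: write() copies exactly `size` bytes, zeros included.
// std::ios::binary also stops Windows from turning '\n' into "\r\n".
bool save_entry(const std::string& path, const char* buf, std::size_t size)
{
    std::ofstream out(path, std::ios::binary);
    out.write(buf, static_cast<std::streamsize>(size));
    return static_cast<bool>(out);
}

// Size of a file in bytes (0 if it cannot be opened), to check the results.
std::size_t file_size_of(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<std::size_t>(in.tellg()) : 0;
}
```

In an extraction loop, the size passed to write() would be the byte count returned by the read call, not strlen(buf).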

The issue is with the << operator. You are outputting a character array (a C string), and strings in C are null-terminated, so the first binary 0 terminates your output.

Related

How to read large files in C++ with mixed text and binary

I need to read a large file that is text, binary, or a combination, such as a JPEG file, encrypt it, and write it to a file. Later I will need to read the encrypted data back and decrypt it.
The end goal is to verify that the decrypted data matches the original data.
My problem is that with files larger than 1 MB, I don't want to read and write character by character. I am targeting this code for a phone, and that much I/O would cause too long a delay for the user.
With a pure text file, using fread() and fwrite() converts the data to binary, and the result differs from the original. With a JPEG image, it appears that there is some textual content mixed in with the binary data.
Is there a way to efficiently read in an arbitrary type of file and write it back in the original format?
Or is character by character the only option?
Or am I still out of luck?
After debugging, it turned out that the decrypt function had the plaintext and ciphertext buffers assigned backwards. After swapping the buffer assignments, the decrypted results matched the original data. I had originally thought that text read as binary and then rewritten as binary would no longer come out as text, but I was wrong.
Reading the entire file as binary works just fine.
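Since reading the whole file as binary turned out to work, here is a minimal sketch of that approach with one bulk read() and one bulk write() instead of per-character I/O. read_all and write_all are hypothetical helper names; the encryption and decryption steps from the question would operate on the returned buffer and are not shown.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read a whole file (text, binary or mixed) into memory with a single read().
std::vector<char> read_all(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at the end
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamsize size = in.tellg();                  // so tellg() is the size
    in.seekg(0, std::ios::beg);
    std::vector<char> data(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(data.data(), size))
        throw std::runtime_error("cannot read " + path);
    return data;
}

// Write the bytes back unchanged with a single write().
void write_all(const std::string& path, const std::vector<char>& data)
{
    std::ofstream out(path, std::ios::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    if (!out) throw std::runtime_error("cannot write " + path);
}
```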

Add/edit string in compiled C program?

I have a strange question: is there a way to add or edit a string (or something else the C program can access internally, i.e. not an external file) after the program has been compiled?
The purpose is to change a URL in a Windows program via PHP on Linux (obviously I cannot just recompile it).
Many POSIX platforms come with the strings program, which reads through a binary file looking for printable strings. It has an option to print the offset of each string. For example:
strings -td myexec
From there you can use a hex editor, but the main problem is that you can't make a string longer than it already is.
A Hex Editor is probably your best bet.
A hex editor will work, but you have to be careful not to alter the size of the executable. If the string happens to be in the .res file, you can use ResEdit.
There are specialized tools for modifying existing executable files. A notable one is Resource Tuner, which can edit all sorts of resources in an executable.
Another option is to use a hex editor, such as Hex Workshop, to edit the characters of strings in an executable. Bear in mind, however, that this method only lets you edit existing strings, and each replacement must be no longer than the original; otherwise you will overwrite whatever follows it in the file, such as other data or executable code.
As others have suggested, you can use a binary file editor (hex editor) to change the string in the executable file. You will want to embed into the string a marker (unique sequence of bytes) so that you can find the string in your file. And you will want to ensure that you are reading/writing the file at correct offsets.
Since the OP plans to use PHP on Linux to rewrite the file, you will need to use fseek to position the file pointer at the start of the URL string, make sure you stay within the size of the string as you replace bytes, and then use fseek/rewind and fwrite to change the file.
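A minimal C++ sketch of the same find, check-length, seek and write steps, using std::fstream in place of PHP's fseek/fwrite. patch_string is a hypothetical helper. It assumes the program reads the URL as a null-terminated C string, so padding a shorter replacement with '\0' bytes is safe, and that old_str is unique enough (for example, prefixed with the marker suggested above) to find the right spot.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Overwrite the first occurrence of old_str in the file with new_str, padding
// with '\0' so the file size and every other offset stay the same.
// Returns false if old_str is not found or new_str would not fit.
bool patch_string(const std::string& path, const std::string& old_str,
                  const std::string& new_str)
{
    if (old_str.empty() || new_str.size() > old_str.size()) return false;

    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    std::vector<char> bytes((std::istreambuf_iterator<char>(f)),
                            std::istreambuf_iterator<char>());

    auto it = std::search(bytes.begin(), bytes.end(), old_str.begin(), old_str.end());
    if (it == bytes.end()) return false;

    std::string padded = new_str;
    padded.resize(old_str.size(), '\0');  // keep the original length

    f.clear();  // defensive: reset any stream state flags before seeking
    f.seekp(static_cast<std::streamoff>(it - bytes.begin()), std::ios::beg);
    f.write(padded.data(), static_cast<std::streamsize>(padded.size()));
    return static_cast<bool>(f);
}
```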
This technique can be used to change a URL embedded in a binary file, and it can also be used to embed a license key into a binary, or to embed an application checksum value into a binary so that one can detect when the binary has changed.
As some posters have suggested, you may need to recompute a checksum or re-sign a binary file. A quick way to check for this behavior would be to compile two versions of your binary with different URL values. Then compare the files and see if there are differences other than in the URL values.
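A sketch of that comparison, assuming both builds have already been read into memory in binary mode. differing_offsets is a hypothetical helper. If every reported offset falls inside the URL, a checksum or signature is probably not involved; differences elsewhere (for example a build timestamp or a checksum field in the header) deserve a closer look.

```cpp
#include <cstddef>
#include <vector>

// List every offset at which two builds differ. If one file is longer, its
// extra bytes are reported as differences too.
std::vector<std::size_t> differing_offsets(const std::vector<unsigned char>& a,
                                           const std::vector<unsigned char>& b)
{
    std::vector<std::size_t> diffs;
    const std::size_t n = a.size() > b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= a.size() || i >= b.size() || a[i] != b[i]) diffs.push_back(i);
    }
    return diffs;
}
```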
To properly edit a string in a compiled program, you need to:
1. Read in the file's bytes.
2. Search the .rdata section for the string and record the address of its first occurrence.
3. Convert that address to a virtual address using data from the file headers.
4. Write a new .rdata section onto the executable, write your new string into it, and record its address and corresponding virtual address.
5. Search the .text section for references to the virtual address of the old string and replace them with references to your new string.
Fortunately, I made a program that does this on Windows (it only works on 32-bit programs): here
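Steps 3 and 5 can be sketched as follows, assuming the section table and ImageBase have already been parsed from the PE headers (not shown). The Section struct mirrors three fields of IMAGE_SECTION_HEADER (VirtualAddress, SizeOfRawData, PointerToRawData); the function names are hypothetical. The reference scan is naive: it looks for the 4-byte little-endian address anywhere in the given bytes, so it can report false positives.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// The three IMAGE_SECTION_HEADER fields needed for the conversion.
struct Section {
    std::uint32_t virtual_address;      // VirtualAddress: RVA when loaded
    std::uint32_t size_of_raw_data;     // SizeOfRawData: bytes stored in the file
    std::uint32_t pointer_to_raw_data;  // PointerToRawData: file offset of the data
};

// Step 3: file offset -> virtual address (ImageBase + RVA), or nothing if the
// offset is not inside any section (e.g. it lies in the headers).
std::optional<std::uint32_t> file_offset_to_va(std::uint32_t offset,
                                               const std::vector<Section>& sections,
                                               std::uint32_t image_base)
{
    for (const Section& s : sections) {
        if (offset >= s.pointer_to_raw_data &&
            offset - s.pointer_to_raw_data < s.size_of_raw_data)
            return image_base + s.virtual_address + (offset - s.pointer_to_raw_data);
    }
    return std::nullopt;
}

// Step 5 (naive): offsets where a 32-bit value appears in little-endian byte
// order, e.g. as the operand of an instruction that references a string.
std::vector<std::size_t> find_le32(const std::vector<unsigned char>& bytes,
                                   std::uint32_t value)
{
    const unsigned char pattern[4] = {
        static_cast<unsigned char>(value),
        static_cast<unsigned char>(value >> 8),
        static_cast<unsigned char>(value >> 16),
        static_cast<unsigned char>(value >> 24)};
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + 4 <= bytes.size(); ++i)
        if (std::equal(pattern, pattern + 4, bytes.data() + i)) hits.push_back(i);
    return hits;
}
```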
Not unless you want to poke around in the generated hex or assembly code.

Converting WAV file audio input into plain ASCII characters

I am working on a project where we need to convert WAV file audio input into plain ASCII characters. The input WAV file will contain a single short alphanumeric code, e.g. asdrty543, and each character is pronounced one by one when you play the WAV file. Our requirement is that when a single character of the code is pronounced, we convert it into its equivalent ASCII code. The implementation will be done in C/C++ as an unmanaged Win32 DLL. We are open to using third-party libraries. I am already googling for directions, but I would really appreciate directions or pointers from an experienced programmer who has worked on a similar requirement. Thank you in advance for your help.
ASCII characters like A-Z, a-z and 0-9 are only a portion of the ASCII table. WAV files, like any other file, are stored and accessed as bytes.
A byte can take 256 different values, but there are only 62 characters in A-Z, a-z and 0-9, so one can't simply map each byte to one of them.
You'll have to find a library that opens WAV files and decodes the wave data for you. Based on the wave's intensity and length, a chain of letters (or letters and digits) can then be produced.
I believe you're trying to convert the wave to a series of notes. That's possible too, using the same approach.
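As a first step toward the wave data mentioned above, here is a sketch that reads the sample format and the location of the samples from a WAV file already loaded into memory. It assumes a standard RIFF/WAVE layout with a "fmt " chunk before the "data" chunk; WavFormat and parse_wav are hypothetical names. It only locates the audio; recognizing the spoken characters is a separate problem that this does not attempt.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

struct WavFormat {
    std::uint16_t audio_format = 0;     // 1 = uncompressed PCM
    std::uint16_t channels = 0;
    std::uint32_t sample_rate = 0;      // samples per second
    std::uint16_t bits_per_sample = 0;
    std::size_t data_offset = 0;        // where the sample bytes start
    std::uint32_t data_size = 0;        // size of the sample data in bytes
};

// WAV fields are little-endian.
static std::uint16_t le16(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t le32(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Walk the RIFF chunks: read "fmt ", then stop at "data".
std::optional<WavFormat> parse_wav(const std::vector<unsigned char>& b)
{
    if (b.size() < 12 || std::memcmp(b.data(), "RIFF", 4) != 0 ||
        std::memcmp(b.data() + 8, "WAVE", 4) != 0)
        return std::nullopt;

    WavFormat f;
    bool have_fmt = false;
    std::size_t pos = 12;
    while (pos + 8 <= b.size()) {
        const unsigned char* id = b.data() + pos;
        const std::uint32_t size = le32(id + 4);
        const std::size_t body = pos + 8;
        if (std::memcmp(id, "fmt ", 4) == 0 && size >= 16 && body + 16 <= b.size()) {
            f.audio_format = le16(b.data() + body);
            f.channels = le16(b.data() + body + 2);
            f.sample_rate = le32(b.data() + body + 4);
            f.bits_per_sample = le16(b.data() + body + 14);
            have_fmt = true;
        } else if (std::memcmp(id, "data", 4) == 0) {
            if (!have_fmt) return std::nullopt;  // this sketch expects fmt first
            f.data_offset = body;
            f.data_size = size;  // caller should still check it fits in b
            return f;
        }
        pos = body + size + (size & 1u);  // chunks are padded to an even size
    }
    return std::nullopt;
}
```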

getline() text with UNIX formatting characters

I am writing a C++ program which reads lines of text from a .txt file. Unfortunately the text file is generated by a twenty-something year old UNIX program and it contains a lot of bizarre formatting characters.
The first few lines of the file are plain, English text and these are read with no problems. However, whenever a line contains one or more of these strange characters mixed in with the text, that entire line is read as characters and the data is lost.
The really confusing part is that if I manually delete the first couple of lines so that the very first character in the file is one of these unusual characters, then everything in the file is read perfectly. The unusual characters just display as little ASCII squiggles (arrows, smiley faces, etc.), which is fine. It seems as though a decision is being made automatically, without my knowledge or consent, based on the first line read.
Based on some googling, I suspected that the issue might be with the locale, but according to the Visual Studio debugger, the locale of the ifstream object is "C" in both scenarios.
The code which reads the data is as follows:
//Function to open file at location specified by inFilePath, load and process data
int OpenFile(const char* inFilePath)
{
    string line;
    ifstream codeFile;
    //open text file
    codeFile.open(inFilePath,ios::in);
    //read file line by line
    while ( codeFile.good() )
    {
        getline(codeFile,line);
        //check non-zero length
        if (line != "")
            ProcessLine(&line[0]);
    }
    //close file
    codeFile.close();
    return 1;
}
If anyone has any suggestions as to what might be going on or how to fix it, they would be very welcome.
From reading about your issues it sounds like you are reading in binary data, which will cause getline() to throw out content or simply skip over the line.
You have a couple of choices:
1. If you simply need lines from the data file, you can first sanitise them by removing all non-printable characters (that is the "official" name for those weird ASCII characters). On UNIX, a tool such as strings would help you with that process.
2. You can of course also do this programmatically in your code by reading in X amount of data, storing it in a string, and then removing the characters that fall outside of the printable ASCII range. This will most likely cause you to lose any Unicode text that may be stored in the file.
3. You can change your program to understand the format, i.e. write a parser that lets you process the document in a more sane way.
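A minimal sketch of the programmatic sanitising described in the second choice, assuming the data is already in a std::string. strip_non_printable is a hypothetical name. It keeps printable ASCII plus tab, carriage return and newline and, as warned above, also drops any non-ASCII (for example UTF-8) bytes.

```cpp
#include <string>

// Keep printable ASCII (0x20-0x7E) plus tab, newline and carriage return;
// drop every other byte, including any non-ASCII text.
std::string strip_non_printable(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        const unsigned char u = static_cast<unsigned char>(c);
        if ((u >= 0x20 && u <= 0x7E) || u == '\t' || u == '\n' || u == '\r')
            out.push_back(c);
    }
    return out;
}
```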
If you can, I would suggest trying solution number 1, simply to see whether the results are sane and can still be used. You mention that this is medical data; do you by any chance know what file format this is? If you are trying to find out and have access to a UNIX/Linux machine, you can use the file utility, and it may give you a clue (in the worst case it will tell you it is simply "data").
If possible, try to get a "clean" file whose hex dump you can post, so that we can give better help than we can now. By "clean" I mean that there is no personally identifying information in the file.
For number 2, open the file in binary mode. You mentioned using Windows, where std::fstream handles files opened in text mode and binary mode differently (for example, text mode translates "\r\n" line endings to "\n"), whereas on UNIX systems there is no difference (on most systems, at least; I'm sure I'll get a comment about the one system that doesn't match this description).
codeFile.open(inFilePath,ios::in);
would become
codeFile.open(inFilePath, ios::in | ios::binary);
Instead of getline() you will want to become intimately familiar with .read() which will allow unformatted operations on the ifstream.
Reading will be like this:
// This code has not been tested!
char input[1024];
// The last read() is usually partial: it fails, but gcount() is still > 0.
while (codeFile.read(input, sizeof input) || codeFile.gcount() > 0)
{
    std::streamsize actual_read = codeFile.gcount();
    // Here you can process input, up to a maximum of actual_read characters.
    //ProcessLine() // We didn't necessarily read a line!
    ProcessData(input, actual_read);
}
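Building on the read() approach, here is a sketch that opens the file in binary mode, reads it in 1024-byte chunks, and splits lines on '\n' itself, so every other byte value (including 0x00 and 0x1A) is kept as ordinary data. read_lines_binary is a hypothetical name, and dropping a trailing '\r' is an assumption that the file may use DOS line endings.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Open in binary mode, read fixed-size chunks, and split lines on '\n'
// ourselves, so bytes such as 0x00 or 0x1A are kept as ordinary data.
// A trailing '\r' is dropped in case the file uses DOS line endings.
std::vector<std::string> read_lines_binary(const char* path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::vector<std::string> lines;
    std::string current;
    char buf[1024];
    // The last read() is usually partial: it fails, but gcount() is still > 0.
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        const std::streamsize got = in.gcount();
        for (std::streamsize i = 0; i < got; ++i) {
            if (buf[i] == '\n') {
                if (!current.empty() && current.back() == '\r') current.pop_back();
                lines.push_back(current);
                current.clear();
            } else {
                current.push_back(buf[i]);
            }
        }
    }
    if (!current.empty()) lines.push_back(current);  // final line without '\n'
    return lines;
}
```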
The other thing, as mentioned, is that you can change the locale for the current stream and change the separator it considers a newline; maybe this will fix your issue without requiring unformatted operations:
imbue the stream with a new locale that only knows about the newline. This may or may not let your getline() work without issues.

Incorporating text files in applications?

Is there any way I can incorporate a pretty large text file (about 700 KB) into the program itself, so I don't have to ship the text file alongside the application in its directory? This is the first time I'm trying to do something like this, and I have no idea where to start.
Help is greatly appreciated (:
Depending on the platform that you are on, you will more than likely be able to embed the file in a resource container of some kind.
If you are programming on the Windows platform, then you might want to look into resource files. You can find a basic intro here:
http://msdn.microsoft.com/en-us/library/y3sk7e6b.aspx
With more detailed information here:
http://msdn.microsoft.com/en-us/library/zabda143.aspx
Have a look at the xxd command and its -include option. You will get a buffer and a length variable in a C formatted file.
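Roughly what xxd -i produces, and how the result can be used; hello.txt and its contents ("Hi!" plus a newline) are hypothetical. xxd derives the variable names from the file name, replacing characters such as '.' with '_'. In a real build you would #include the generated file rather than paste it.

```cpp
#include <string>

// Output of `xxd -i hello.txt` for a file containing "Hi!\n".
unsigned char hello_txt[] = {
  0x48, 0x69, 0x21, 0x0a
};
unsigned int hello_txt_len = 4;

// Wrap the embedded bytes in a std::string. The length variable is needed
// because the generated array is not null-terminated.
std::string embedded_text()
{
    return std::string(reinterpret_cast<const char*>(hello_txt), hello_txt_len);
}
```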
If you can figure out how to use a resource file, that would be the preferred method.
It wouldn't be hard to turn a text file into a file that can be compiled directly by your compiler. This might only work for small files - your compiler might have a limit on the size of a single string. If so, a tiny syntax change would make it an array of smaller strings that would work just fine.
You need to convert your file by adding a line at the top, enclosing each line within quotes, putting a newline character at the end of each line, escaping any quotes or backslashes in the text, and adding a semicolon at the end. You can write a program to do this, or it can easily be done in most editors.
This is my example document:
"Four score and seven years ago,"
can be found in the file c:\quotes\GettysburgAddress.txt
Convert it to:
static const char Text[] =
"This is my example document:\n"
"\"Four score and seven years ago,\"\n"
"can be found in the file c:\\quotes\\GettysburgAddress.txt\n"
;
This produces a variable Text which contains a single string with the entire contents of your file. It works because consecutive strings with nothing but whitespace between get concatenated into a single string.
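The conversion described above can be automated with a few lines of C++. This is a sketch: to_c_source is a hypothetical name, and it only performs the escapes listed in the answer (quotes and backslashes), which is enough for plain text but not for arbitrary control characters.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Turn text into a C/C++ definition: each line is quoted, gets a trailing \n,
// and has quotes and backslashes escaped. The compiler concatenates the
// adjacent string literals into one string.
std::string to_c_source(std::istream& in, const std::string& var_name)
{
    std::ostringstream out;
    out << "static const char " << var_name << "[] =\n";
    std::string line;
    while (std::getline(in, line)) {
        out << '"';
        for (char c : line) {
            if (c == '"' || c == '\\') out << '\\';  // escape quote or backslash
            out << c;
        }
        out << "\\n\"\n";  // close the literal with an escaped newline
    }
    out << ";\n";
    return out.str();
}
```

Applied to the example document, this produces exactly the Text definition shown above.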