Store file in unsigned char array and print it - c++

I've used the code below to read a binary file (in my case a .docx file) and store it in an unsigned char array instead of a plain char array (I took the "Reading and writing binary file" question as a reference).
#include <fstream>
#include <iterator>
#include <vector>
int main()
{
std::ifstream input("C:\\test.docx", std::ios::binary);
std::vector<unsigned char> buffer((std::istreambuf_iterator<unsigned char>(input)),
(std::istreambuf_iterator<unsigned char>()));
}
Now I got two questions.
First thing I wanna know, is this a correct way to read a .docx file in an unsigned char array? Or are there better options available?
Secondly, I need to print the contents of file that are read in the unsigned char array, just to verify if it has correctly read the file or not. How can that be achieved?

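That is an OK way if you're fine with having the whole file in memory, with one correction: std::ifstream is a char stream, so the iterator has to be std::istreambuf_iterator<char>; each byte is converted to unsigned char as it is copied into the vector. If you want to read the file in parts, you should iterate over it instead. A use case for that would be transmitting the file over the network, where you don't need the whole file in memory.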
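Reading the file in parts could look roughly like the sketch below. The helper name forEachChunk, its parameters, and the callback signature are made up for illustration; the idea is just to reuse one fixed-size buffer and hand each filled portion to the caller.

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in blocks of at most `chunkSize` bytes and
// passes each block to `handle`. Returns the total number of bytes read
// (0 if the file could not be opened).
std::size_t forEachChunk(
    const std::string& path, std::size_t chunkSize,
    const std::function<void(const unsigned char*, std::size_t)>& handle)
{
    std::ifstream input(path, std::ios::binary);
    std::vector<unsigned char> chunk(chunkSize);
    std::size_t total = 0;
    while (input) {
        // read() works on char*, so the unsigned char buffer is reinterpreted.
        input.read(reinterpret_cast<char*>(chunk.data()),
                   static_cast<std::streamsize>(chunk.size()));
        // The last block is usually shorter; gcount() says how much arrived.
        const std::size_t got = static_cast<std::size_t>(input.gcount());
        if (got == 0) break;
        handle(chunk.data(), got);
        total += got;
    }
    return total;
}
```

A caller that sends the file over a socket would put the send call inside the callback, so only one chunk is held in memory at a time.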
About printing the file, it's possible to print the bytes read, for example, like this:
#include <fstream>
#include <iterator>
#include <vector>
#include <iostream>
#include <iomanip>
int main()
{
std::ifstream input("C:\\test.docx", std::ios::binary);
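// std::ifstream is a char stream, so the iterator must be istreambuf_iterator<char>;
// each byte is converted to unsigned char when it is stored in the vector.
std::vector<unsigned char> buffer((std::istreambuf_iterator<char>(input)),
(std::istreambuf_iterator<char>()));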
std::cout << std::hex;
for (unsigned char b : buffer)
std::cout << "0x" << std::setfill('0') << std::setw(2) << (int)b << " ";
std::cout << std::dec << std::endl;
}
If you meant printing the contents of the file to see some familiar text, that's not going to work directly. docx files use the Office Open XML format, which, first of all, makes them zip files. Inside the zip archive you will find XML representations of the document's data, which are readable.
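Since a .docx file is a zip archive, one cheap sanity check on the bytes you read is to look for the ZIP local file header signature (the bytes 50 4B 03 04, i.e. "PK" followed by 0x03 0x04) at the start of the buffer. A minimal sketch, with looksLikeZip as a made-up helper name:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if the buffer starts with the ZIP local
// file header signature "PK\x03\x04", which is how a non-empty zip archive
// (and therefore a typical .docx file) begins.
bool looksLikeZip(const std::vector<unsigned char>& buffer)
{
    static const unsigned char signature[4] = {0x50, 0x4B, 0x03, 0x04};
    if (buffer.size() < 4) return false;
    for (std::size_t i = 0; i < 4; ++i) {
        if (buffer[i] != signature[i]) return false;
    }
    return true;
}
```

This only tells you the file starts like a zip archive; it does not validate the rest of the document.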

Related

SIGABRT on std::ifstream close

I am currently working on a project to create my own game in OpenGL. My problem right now is that my file-reading function ends in a SIGABRT because of something inside the std::ifstream destructor (more specifically in "std::basic_ifstream<char, std::char_traits<char> >::~basic_ifstream()"). This function previously worked for me, but suddenly stopped working.
My goal is simple: a reliable implementation for reading a file into a char*. Multithreading is currently not my concern.
Here is my implementation of the file reading function.
It takes in a path, and should write the content of the file at that path into the out parameter.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <cstring>
#include <cassert>
#include "Utility.h"
char * Utility::readFile(const char* path,char*& out){
#ifndef NDEBUG
std::cout<<"Getting file: "<<path<<"\n";
#endif
// Open the file, but freak out if not valid.
std::ifstream file=std::ifstream(path);
assert(file.good());
if(!file.good())
{
throw std::runtime_error((std::string)"Couldn't open file for loading: "+path);
}
// Read the file contents into a char buffer.
std::stringstream buffer;buffer << file.rdbuf();
std::string fileContentsStr = buffer.str();
out = new char[fileContentsStr.size()];
strcpy(out,fileContentsStr.c_str());
return out;
}
My code is located at C0D3-M4513R/OpenGlGame.
I already tried a minimal example, which is working and using the same compile flags (except linker flags). test.txt and test1.txt just contain some rubbish text generated by randomly hacking on my keyboard.
#include <cassert>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <cstring>
//This Function is the same as the one above!!!
char *readFile(const char *path, char *&out) {
#ifndef NDEBUG
std::cout << "Getting file: " << path << "\n";
#endif
// Open the file, but freak out if not valid.
std::ifstream file = std::ifstream(path);
assert(file.good());
if (!file.good()) {
throw std::runtime_error((std::string) "Couldn't open file for loading: " + path);
}
// Read the file contents into a char buffer.
std::stringstream buffer;
buffer << file.rdbuf();
//convert the stringstream to a string
std::string fileContentsStr = buffer.str();
//copy the contents of the string to a char array
out = new char[fileContentsStr.size()];
strcpy(out, fileContentsStr.c_str());
//return char array address (which should be the same as the start?)
return out;
}
int main() {
//The programm started!
std::cout << "Hello, World!" << std::endl;
//Define a space for the contents of the file to live
char *out;
//Read the contents of a file
out = readFile("test.txt", out);
//Print contents of the file
std::cout << out << std::endl;
char *out1;
//Read the contents of a file
out1 = readFile("test1.txt", out1);
//Print contents of the file
std::cout << out1 << std::endl;
return 0;
}
strcpy:
Copies the character string pointed to by src, including the null terminator, to the character array whose first element is pointed to by dest.
The behavior is undefined if the dest array is not large enough. The behavior is undefined if the strings overlap.
c_str:
Returns a pointer to a null-terminated character array with data equivalent to those stored in the string.
out = new char[fileContentsStr.size()];
strcpy(out,fileContentsStr.c_str());
You need to be careful when mixing std::string with C strings, because size() does not count the null terminator. c_str, however, returns a pointer to a null-terminated character array, so that array holds size()+1 characters.
You are asking strcpy to write fileContentsStr.size()+1 (size + null terminator) into a char array with only fileContentsStr.size() elements.
PS: As mentioned in a comment, you should consider returning a std::string instead. You are using a raw owning pointer, which is error-prone and should be avoided. Either use a smart pointer or let a std::string manage the char array (that's what it's made for, actually ;).
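A minimal sketch of that suggestion, assuming the caller can work with a std::string (the function name readFileToString is made up here and is not part of the asker's Utility class):

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads the whole file into a std::string, which owns the memory and knows
// its own size, so no manual new[]/strcpy is needed.
std::string readFileToString(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("Couldn't open file for loading: " + path);
    }
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}
```

If a raw char array really is required, the allocation in the original code would have to be new char[fileContentsStr.size() + 1] so that strcpy has room for the terminator, and the caller would then be responsible for delete[].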

multiple reading from folder

I need to read all files from a folder and write them to a buffer. All files have the same name except for the last portion (file_0000.mdf, file_0001.mdf, ..., file_9999.mdf). How can I read all the files? testFolder contains all the files. If I use a for loop, the counter starts at 0, but my test files start at 0000 and so on. I also need the file size of each individual test file. My logic is wrong, but I do not know how to fix it. Some updated code is given below the first approach.
#include <iostream>
#include <fstream>
#include <string>
int main(){
std::string path="C:\\testFolder\\";
std::string constName="file_";
std::string lastName = ".mdf";
std::fstream InputStream;
std::string fileWithPath;
for (int i=0; i <9999;i++){
fileWithPath=path+constName+std::to_string(static_cast<long long>
(i))+lastName;
InputStream.open(fileWithPath,std::ios::binary);
long InputFileSize= InputStream.tellg();
}
return 0;
}
Short update by using boost/filesystem. Need comments on this approach.
#include <boost/filesystem.hpp>
#include <boost/range/iterator_range.hpp>
std::string SourceFolder="C:\\testFolder\\";
path mDirectory(SourceFolder);
std::cout<<"Directory includes the following files"
if(is_directory(mDirectory)){
for(auto testFile=mDirectory.begin();testFile!=mDirectory.end();testFile++){
std::cout<< testFile->string()<<std::endline;
}
Plain integers don't have leading zeros. To get leading zeros you need to format your file names some other way, for example with std::ostringstream and standard I/O manipulators like std::setw and std::setfill:
std::ostringstream oss;
oss << path << constName << std::setw(4) << std::setfill('0') << i << lastName;
fileWithPath = oss.str();
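Put together, the formatting above plus a way to get each file's size could be sketched as follows. The helper names makeFileName and fileSize are invented for this example; the size is obtained by opening the file positioned at its end (std::ios::ate), since tellg() right after a plain open reports position 0 rather than the size.

```cpp
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>

// Builds a name such as "<path>file_0042.mdf" from index 42.
std::string makeFileName(const std::string& path, int index)
{
    std::ostringstream oss;
    // setw(4) applies only to the next item written, i.e. the index.
    oss << path << "file_" << std::setw(4) << std::setfill('0') << index << ".mdf";
    return oss.str();
}

// Returns the size of the file in bytes, or -1 if it cannot be opened.
long long fileSize(const std::string& fileName)
{
    // std::ios::ate puts the read position at the end of the file on open.
    std::ifstream in(fileName, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long long>(in.tellg());
}
```

A loop such as for (int i = 0; i <= 9999; ++i) can then call makeFileName(path, i) and fileSize(...) for each index. Note the <= so that file_9999.mdf is included, and that a fresh stream per file avoids reusing one that is already open.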
On Windows you can use the FindFirstFile() and FindNextFile() functions to scan the files in the directory with a wildcard, e.g. "C:\Data\file_????.mdf" (one ? per digit).
The returned WIN32_FIND_DATA also contains the file size.
Take a look at the complete example "Listing the Files in a Directory".
Once a file is listed, you can read its content with ifstream as usual.

Calculating the info-hash of a torrent file

I'm using C++ to parse the info hash of a torrent file, and I am having trouble getting a "correct" hash value in comparison to this site:
http://i-tools.org/torrent
I have constructed a very simple toy example just to make sure I have the basics right.
I opened a .torrent file in sublime and stripped off everything except for the info dictionary, so I have a file that looks like this:
d6:lengthi729067520e4:name31:ubuntu-12.04.1-desktop-i386.iso12:piece lengthi524288e6:pieces27820:¡´E¶ˆØËš3í ..............(more unreadable stuff.....)..........
I read this file in and parse it with this code:
#include <string>
#include <sstream>
#include <iomanip>
#include <fstream>
#include <iostream>
#include <openssl/sha.h>
void printHexRep(const unsigned char * test_sha) {
std::cout << "CALLED HEX REP...PREPPING TO PRINT!\n";
std::ostringstream os;
os.fill('0');
os << std::hex;
for (const unsigned char * ptr = test_sha; ptr < test_sha + 20; ptr++) {
os << std::setw(2) << (unsigned int) *ptr;
}
std::cout << os.str() << std::endl << std::endl;
}
int main() {
using namespace std;
ifstream myFile ("INFO_HASH__ubuntu-12.04.1-desktop-i386.torrent", ifstream::binary);
//Get file length
myFile.seekg(0, myFile.end);
int fileLength = myFile.tellg();
myFile.seekg(0, myFile.beg);
char buffer[fileLength];
myFile.read(buffer, fileLength);
cout << "File length == " << fileLength << endl;
cout << buffer << endl << endl;
unsigned char datSha[20];
SHA1((unsigned char *) buffer, fileLength, datSha);
printHexRep(datSha);
myFile.close();
return 0;
}
Compile it like so:
g++ -o hashes info_hasher.cpp -lssl -lcrypto
And I am met with this output:
4d0ca7e1599fbb658d886bddf3436e6543f58a8b
When I am expecting this output:
14FFE5DD23188FD5CB53A1D47F1289DB70ABF31E
Does anybody know what I might be doing wrong here? Could the problem lie with the un-readability of the end of the file? Do I need to parse this as hex first or something?
Make sure you don't have a newline at the end of the file, you may also want to make sure it ends with an 'e'.
The info-hash of a torrent file is the SHA-1 hash of the info section (in bencoded form) from the .torrent file. Essentially you need to decode the file (it's bencoded) and remember the byte offsets where the value associated with the "info" key begins and ends. That's the range of bytes you need to hash.
For example, if this is the torrent file:
d4:infod6:pieces20:....................4:name4:test12:piece lengthi1024ee8:announce27:http://tracker.com/announcee
You want to hash just this section:
d6:pieces20:....................4:name4:test12:piece lengthi1024ee
For more information on bencoding, see BEP3.
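Finding those byte offsets does not require a full decoder; it is enough to be able to skip over one bencoded value at a time. The following is a rough sketch under that assumption, not a hardened parser. The names skipValue and findInfoRange are invented for this example, and the whole .torrent file is assumed to be already loaded into a std::string.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Returns the position just past the bencoded value that starts at `pos`.
std::size_t skipValue(const std::string& data, std::size_t pos)
{
    if (pos >= data.size()) throw std::runtime_error("unexpected end of data");
    const char c = data[pos];
    if (c == 'i') {  // integer: i<digits>e
        const std::size_t end = data.find('e', pos);
        if (end == std::string::npos) throw std::runtime_error("unterminated integer");
        return end + 1;
    }
    if (c == 'l' || c == 'd') {  // list or dictionary: values until the closing 'e'
        ++pos;
        while (pos < data.size() && data[pos] != 'e') {
            pos = skipValue(data, pos);
        }
        if (pos >= data.size()) throw std::runtime_error("unterminated list or dictionary");
        return pos + 1;
    }
    if (std::isdigit(static_cast<unsigned char>(c))) {  // string: <length>:<bytes>
        const std::size_t colon = data.find(':', pos);
        if (colon == std::string::npos) throw std::runtime_error("malformed string");
        const std::size_t length = std::stoul(data.substr(pos, colon - pos));
        const std::size_t end = colon + 1 + length;
        if (end > data.size()) throw std::runtime_error("string runs past end of data");
        return end;
    }
    throw std::runtime_error("unknown bencode type");
}

// Returns [begin, end) of the value stored under the top-level "info" key.
std::pair<std::size_t, std::size_t> findInfoRange(const std::string& torrent)
{
    if (torrent.empty() || torrent[0] != 'd') {
        throw std::runtime_error("not a bencoded dictionary");
    }
    std::size_t pos = 1;
    while (pos < torrent.size() && torrent[pos] != 'e') {
        if (!std::isdigit(static_cast<unsigned char>(torrent[pos]))) {
            throw std::runtime_error("dictionary key is not a string");
        }
        const std::size_t keyEnd = skipValue(torrent, pos);
        const std::size_t colon = torrent.find(':', pos);
        const std::string key = torrent.substr(colon + 1, keyEnd - colon - 1);
        const std::size_t valueEnd = skipValue(torrent, keyEnd);
        if (key == "info") return {keyEnd, valueEnd};
        pos = valueEnd;
    }
    throw std::runtime_error("no info key found");
}
```

With the range in hand, the bytes from torrent.data() + range.first, of length range.second - range.first, are what would be passed to SHA1() in place of the whole buffer.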
SHA1 calculation is just as simple as what you've written, more or less. The error is probably in the data you're feeding it, if you get the wrong answer from the library function.
I can't speak to the torrent file prep work you've done, but I do see a few problems. If you'll revisit the SHA1 docs, notice the SHA1 function never requires its own digest length as a parameter. Next, you'll want to be quite certain the technique you're using to read the file's contents is faithfully sucking up the exact bytes, no translation.
A less critical style suggestion: make use of the third parameter to SHA1. General rule, static storage in the library is best avoided. Always prefer to supply your own buffer. Also, where you have a hard-coded 20 in your print function, that's a marvelous place for that digest length constant you've been flirting with.

Adding text and lines to the beginning of a file

I'd like to be able to add lines to the beginning of a file.
This program I am writing will take information from a user, and prep it to write to a file. That file, then, will be a diff that was already generated, and what is being added to the beginning is descriptors and tags that make it compatible with Debian's DEP3 Patch tagging system.
This needs to be cross-platform, so it needs to work in GNU C++ (Linux) and Microsoft C++ (and whatever Mac comes with)
(Related Threads elsewhere: http://ubuntuforums.org/showthread.php?t=2006605)
See trent.josephsen's answer:
You can't insert data at the start of a file on disk. You need to read the entire file into memory, insert data at the beginning, and write the entire thing back to disk. (This isn't the only way, but given the file isn't too large, it's probably the best.)
You can achieve this by using std::ifstream for the input file and std::ofstream for the output file. Afterwards you can use std::remove and std::rename to replace your old file:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
int main(){
std::ofstream outputFile("outputFileName");
std::ifstream inputFile("inputFileName");
outputFile << "Write your lines...\n";
outputFile << "just as you would do to std::cout ...\n";
outputFile << inputFile.rdbuf();
inputFile.close();
outputFile.close();
std::remove("inputFileName");
std::rename("outputFileName","inputFileName");
return 0;
}
Another approach which doesn't use remove or rename uses a std::stringstream:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
int main(){
const std::string fileName = "outputFileName";
std::fstream processedFile(fileName.c_str());
std::stringstream fileData;
fileData << "First line\n";
fileData << "second line\n";
fileData << processedFile.rdbuf();
processedFile.close();
processedFile.open(fileName.c_str(), std::fstream::out | std::fstream::trunc);
processedFile << fileData.rdbuf();
return 0;
}

text from a file turned into a variable?

If I made a program that stores strings in a text file using the "list" container (#include <list>), and then I want to copy all of the text from that file and call it something (so I can tell the program to type in all of the text I copied somewhere by using that one variable to refer to the text), do I use a string, double, int or what do I declare that chunk of text as?
I'm making the program using c++ in a simple console application.
Easier explanation for PoweRoy:
I have some text in a .txt file. I want to copy everything in it and then call all that I just copied "int text" or "string text" or whatever. But I don't know which one of those "int", "string", "double" etc. to use.
To take some pity on you, this is about the simplest C++ program that reads a file into memory and then does something with it:
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
using namespace std;
int main() {
ifstream input( "foo.txt" );
if ( ! input.is_open() ) {
cerr << "could not open input file" << endl;
return 1;
}
vector <string> lines;
string line;
while( getline( input, line ) ) {
lines.push_back( line );
}
for ( unsigned int i = 0; i < lines.size(); i++ ) {
cout << (i+1) << ": " << lines[i] << "\n";
}
}
Broadly speaking, you are talking about the concept of Serialization - storing variable values in permanent storage like a file so you can reload them later. Have a look at that link to broaden your understanding.
Specifically, it sounds like you have arbitrary text in a file and want to refer to it in your program. In that case a string is the appropriate type, unless the text in the file is intended to represent one single number.
Note that if you have structured data (like a CSV file or XML file), a more complex data structure (e.g. a class, array of classes, etc) might be a better choice.
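As a small illustration of that last point, suppose each line of the file holds a name and a number separated by a comma (for example "alice,42"). That layout, the Record type, and the parseRecords function are all assumptions made up for this sketch; the point is only that each line becomes one object instead of one long string.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for a two-column file such as "alice,42".
struct Record {
    std::string name;
    int value = 0;
};

// Parses one record per line; lines that do not match "name,number" are skipped.
std::vector<Record> parseRecords(std::istream& input)
{
    std::vector<Record> records;
    std::string line;
    while (std::getline(input, line)) {
        std::istringstream fields(line);
        Record r;
        // First field: everything up to the comma. Second field: an integer.
        if (std::getline(fields, r.name, ',') && (fields >> r.value)) {
            records.push_back(r);
        }
    }
    return records;
}
```

The function takes any std::istream, so it can be fed a std::ifstream opened on the file or a std::istringstream for testing.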
#include <iostream>
#include <fstream>
#include <iterator>
#include <string>
int main() {
// Open stream from file
std::ifstream ifs("foo.txt");
// Get file contents
std::string file_contents(
(std::istreambuf_iterator<char>(ifs)),
std::istreambuf_iterator<char>()
);
// Output string to terminal to see that it works
std::cout << file_contents;
}