Not reading all of file - C++

I'm trying to use a simple decryption algorithm to decrypt some files. The code I have so far works for the most part, but it stops reading after a few hundred bytes and writes out only what it has.
For example, I have a .X file that is 14.7 KB; I run it through the program and it comes out as 643 bytes.
The current code is here: http://pastebin.com/aNNjYTzg
Since the code formatting for this site is driving me insane...
I just added the algorithm into existing code, so most of it is not used.
EDIT:
cout << "Enter the name of your file to " << encrypt_decrypt[choice-1] << ": ";
cin >> filename;
in.open(filename);
getline(in,buffer);
void encryptdecrypt(const string buffer,const char map[],int len,string& newbuffer)
{
int i=0;
char t;
char code;
for (i=0;i<buffer.length();i++)
{
t=buffer[i];
(t += 251 - ((i * 14) & 255));
cout << "Buffer length: " << buffer.length() << endl;
cout << "newbuffer length: " << newbuffer.length() << endl;
newbuffer.push_back(t);
}
newbuffer.push_back('\n');
}
out << newbuffer;
EDITx2:
It reads the whole file now, but only the beginning is decrypted correctly:
<?xml version="1.0"?>
<Materi
+"Òû%÷*&$'
ëÐ!ÐÎ&"# ëÐ"!Ý "
Ü"ÐÎÝ컸

So, given that the result of t += 251 - ((i * 14) & 255) can be any value in the character range, you will need to read and write the file as a "binary" file, or the content won't "work".
This means that you need to use stream::read to read a block of data and stream::write to write data to the output file, and when you open the files, you need to supply ifstream::binary and ofstream::binary respectively as the mode.
A text-mode stream (when you don't specify binary in the mode) will interpret certain input bytes as end of file (stopping the input) and other input bytes as newline characters (which getline strips on input). Since in your encrypted form those bytes don't mean those things, you should not use text-based input: the encrypted file isn't a text file.
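For reference, a minimal sketch of that binary approach (the file names here are made up; the transform is the one from the question, applied block by block):

#include <cstddef>
#include <fstream>

int main()
{
    // binary mode on both streams, so no byte is treated as EOF or a newline
    std::ifstream in("input.X", std::ios::binary);
    std::ofstream out("output.X", std::ios::binary);

    char buf[4096];
    std::size_t pos = 0; // absolute position in the file, used by the transform
    while (in)
    {
        in.read(buf, sizeof buf);
        std::streamsize got = in.gcount(); // may be a short count on the last block
        if (got <= 0) break;
        for (std::streamsize k = 0; k < got; ++k, ++pos)
            buf[k] += 251 - (int)((pos * 14) & 255); // the per-byte transform from the question
        out.write(buf, got);
    }
}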

Related

A 'stack overflow' error occurs for any array size I enter above 36603. How can I make a string capable of capturing my entire .txt file?

I need to create a string capable of holding the entire book 'The Hunger Games', which comes out to around 100500 words. My code can capture samples of the txt, but any time I exceed a string size of 36603 (tested), I receive a 'stack overflow' error.
I can successfully capture anything below 36603 elements and can output them perfectly.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    int i;
    char set[100];
    string fullFile[100000]; // this will not execute if set to over 36603
    ifstream myfile("HungerGames.txt");
    if (myfile.is_open())
    {
        // saves 'i limiter' words from the .txt to fullFile
        for (i = 0; i < 100000; i++) {
            // each word is separated by a space
            myfile.getline(set, 100, ' ');
            fullFile[i] = set;
        }
        myfile.close();
    }
    else cout << "Unable to open file";

    // prints 'i limiter' words to window
    for (i = 0; i < 100000; ++i) {
        cout << fullFile[i] << ' ';
    }
    return 0;
}
What is causing the 'stack overflow' and how can I successfully capture the txt? I will later be doing a word counter and word frequency counter, so I need it in "word per element" form.
There's a limit on how much stack a function can use; use std::vector instead.
The default stack size in Visual Studio is 1 MB, and you can change it with /F, but doing so is generally a bad idea.
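A minimal sketch of that std::vector fix, keeping the question's file name (and reading with operator>>, which also splits words on whitespace, instead of getline):

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    vector<string> fullFile; // heap-backed, so no stack overflow
    ifstream myfile("HungerGames.txt");
    if (myfile.is_open())
    {
        string word;
        while (myfile >> word) // stops cleanly at end of file
            fullFile.push_back(word);
        myfile.close();
    }
    else cout << "Unable to open file";

    // print the captured words
    for (const string& w : fullFile)
        cout << w << ' ';
}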
My system is Lubuntu 18.04, with g++ 7.3. The following snippet shows some "implementation details" of my system, and how to report them on yours. It should help you understand what your system provides ...
void foo1()
{
    int i;                                            // Lubuntu
    cout << "\n sizeof(i) " << sizeof(i) << endl;     // 4 bytes

    char c1[100];
    cout << "\n sizeof(c1) " << sizeof(c1) << endl;   // 100 bytes

    string s1; // empty string
    cout << "\n s1.size() " << s1.size()              // 0 bytes
         << " sizeof(s1) " << sizeof(s1) << endl;     // 32 bytes

    s1 = "1234567890"; // now has 10 chars
    cout << "\n s1.size() " << s1.size()              // 10 bytes
         << " sizeof(s1) " << sizeof(s1) << endl;     // 32 bytes

    string fullFile[100000]; // this is an array of 100,000 strings
    cout << "\n sizeof(fullFile) "                    // total is vvvvvvvvv
         << sops.digiComma(sizeof(fullFile)) << endl; // 3,200,000 bytes
    // (sops.digiComma is a custom comma-inserting formatter, not part of std)

    uint64_t totalChars = 0;
    for (auto& ff : fullFile) totalChars += ff.size();
    cout << "\n total chars in all strings " << totalChars << endl;
}
What is causing the 'stack overflow' and how can I successfully
capture the txt?
The fullFile array is an unfortunate choice ... because each std::string, even when empty, consumes 32 bytes of automatic memory (~stack), for a total of 3,200,000 bytes, and this is with no data in the strings! This will overflow the stack on any system whose stack is smaller than that automatic-variable space.
On Lubuntu the default automatic-memory size (lately) is 10 MB, so it is not a problem for me. But you will have to check what your target OS defaults to. I think Windows defaults to somewhere near 1 MB. (Sorry, I don't know how to check the Windows automatic-memory size.)
How can I make a string capable of capturing my entire .txt file.
The answer is: you don't need to make your own (unless you have some unstated requirement).
Also, you really should look at en.cppreference.com/w/cpp/string/basic_string/append.
In my 1st snippet above, you should take notice that the sizeof(string) reports 32 bytes, regardless of how many chars are in it.
Think on that a while ... if you put 1000 chars into a string, where do they go? The object stays at 32 bytes! You might guess or read that the string object handles memory management on your behalf, and puts all characters into dynamic memory (heap).
On my system, heap is about 4 G bytes. That's a lot more than stack.
In summary, every single std::string expands auto-magically, using heap, so if your text input will fit in heap, it will fit into '1 std::string'.
While browsing around in cppreference, check out the std::string::reserve() member function.
Conclusion:
Any std::string you declare can auto-magically 'grow' to support your need, and will thus hold the entire text (if it will fit in memory).
Operationally, you simply get a line of text from the file, then append it to the single string, until the entire file is contained. You only need the one array, which std::string provides.
With this new idea ... I suggest you change fullFile from an array to a string.
string fullFile; // the string will expand on append, up to the limit of available heap
string line;
// open file ... check status
while (getline(myfile, line)) // fetch a line of text up through the line feed
{
    // note that getline does not put the '\n' into 'line'
    // tbd - line += '\n';
    // you may need the line feed in your fullFile string?
    fullFile += line; // append the line
}
// looping on getline() itself avoids the classic 'while (!eof())' bug
// ... other file cleanup.
foo1() output on Lubuntu 18.04, g++ v7.3
sizeof(i) 4
sizeof(c1) 100
s1.size() 0 sizeof(s1) 32
s1.size() 10 sizeof(s1) 32
sizeof(fullFile) 3,200,000
total chars in all strings 0
Example slurp() :
string slurp(ifstream& sIn)
{
    stringstream ss;
    ss << sIn.rdbuf(); // stream the entire file buffer in one shot
    dtbAssert(!sIn.bad()); // dtbAssert is the author's custom assert macro
    if (sIn.bad())
        throw "\n DTB::slurp(sIn) 'ss << sIn.rdbuf()' is bad";
    ss.clear(); // clear flags
    return ss.str();
}
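Usage is then just two lines (with the HungerGames.txt from the question):

ifstream myfile("HungerGames.txt");
string fullFile = slurp(myfile); // the whole book in one heap-backed string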

Need to convert char* to string or other type in order to run hash on it

I need to read in an mp3 file so that I can run the hash(). I do not need to parse the mp3 tag data out of this so I can just read the whole thing all together.
Currently I am using ifstream() to open the file in binary mode. I then get the size of the file, allocate enough space with a char* and read it all at once.
I know that when I run cout on this data I can only see "ID3" and some gibberish. I opened the mp3 file up in a hex editor, and "ID3" plus that gibberish is exactly what is at the beginning of the file. I believe the binary data that follows is being interpreted as an end of line/string and does not print.
This is okay because I don't need to print it. I need to get the data into a format that I can run the hash function on. Any ideas on a type I can convert it to that will not interpret the end of the file as being a couple of bytes in?
Here is code of what I have so far.
bool Sender::openSoundFile() {
    streampos size;
    soundSampleStream.open(soundFilePath.c_str(), ios::in | ios::binary | ios::ate);
    if (!soundSampleStream.is_open()) {
        return false;
    }
    size = soundSampleStream.tellg();
    cout << "Size of MP3: " << size << endl;
    soundFileInMemory = new char[size];
    soundSampleStream.seekg(0, ios::beg);
    soundSampleStream.read(soundFileInMemory, size);
    cout << "Error is: " << strerror(errno) << endl;
    cout << "gcount: " << soundSampleStream.gcount() << endl;
    soundSampleStream.close();
    cout << soundFileInMemory << endl;
    return true;
}
I get no error on reading the file and gcount() comes back with the correct numbers of bytes for the file.
Edit 1:
To add some more on this: the hash() seems to hash the char* itself rather than the data being pointed at, because the hash value changes across program runs. This is why I need to convert it to some other type. I also don't think a vector is supported by the C++11 std::hash.
std::string has a constructor that takes a char * and a size_t. See the fourth item in http://en.cppreference.com/w/cpp/string/basic_string/basic_string.
std::string file_contents(soundFileInMemory, size);
That will convert your char array to a string.
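From there, a minimal sketch of the hashing step (C++11 std::hash, names from the question; the point is that std::hash<std::string> hashes the character data, not the pointer value):

#include <functional>
#include <string>

// ... after the read() in openSoundFile() ...
std::string file_contents(soundFileInMemory, size); // copies all 'size' bytes, embedded zero bytes included
std::size_t h = std::hash<std::string>{}(file_contents); // hashes the bytes, not the pointer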

Trying to output everything inside an exe file

I'm trying to output the plaintext contents of this .exe file. It's got plaintext in it, like "Changing the code in this way will not affect the quality of the resulting optimized code." and all the other stuff Microsoft puts into .exe files. When I run the following code I get the output of M Z E followed by a heart and a diamond. What am I doing wrong?
ifstream file;
char inputCharacter;
file.open("test.exe", ios::binary);
while ((inputCharacter = file.get()) != EOF)
{
    cout << inputCharacter << "\n";
}
file.close();
I would use something like std::isprint to make sure the character is printable and not some weird control code before printing it.
Something like this:
#include <cctype>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream file("test.exe", std::ios::binary);
    char c;
    while (file.get(c)) // don't loop on EOF
    {
        if (std::isprint(static_cast<unsigned char>(c))) // check if printable; the cast keeps bytes >= 0x80 well-defined
            std::cout << c;
    }
}
You have opened the stream in binary mode, which is good for the intended purpose. However, you print every byte as it is, and some of these characters are not printable, giving weird output.
Potential solutions:
If you want to print the content of an exe, you'll get more non-printable chars than printable ones. So one approach could be to print the hex value instead:
while (file.get(inputCharacter))
{
    cout << setw(2) << setfill('0') << hex << (int)(inputCharacter & 0xff) << "\n";
}
Or you could use the debugger approach of displaying the hex value, and then display the char if it's printable or '.' if not:
while (file.get(inputCharacter)) {
    cout << setw(2) << setfill('0') << hex << (int)(inputCharacter & 0xff) << " ";
    if (isprint(inputCharacter & 0xff))
        cout << inputCharacter << "\n";
    else
        cout << ".\n";
}
Well, for the sake of readability, if the exe file contains any real executable code, you'd better opt for displaying several chars on each line ;-)
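A minimal hex-dump sketch along those lines (16 bytes per row, with a printable-characters column on the right; "test.exe" as in the question):

#include <cctype>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>

int main()
{
    std::ifstream file("test.exe", std::ios::binary);
    char c;
    std::string ascii; // printable rendering of the current row
    int col = 0;
    while (file.get(c))
    {
        unsigned char b = static_cast<unsigned char>(c);
        std::cout << std::setw(2) << std::setfill('0') << std::hex << (int)b << ' ';
        ascii += std::isprint(b) ? (char)b : '.';
        if (++col == 16) // 16 bytes per row
        {
            std::cout << ' ' << ascii << '\n';
            ascii.clear();
            col = 0;
        }
    }
    if (col > 0) // flush the final partial row, padding the hex column
        std::cout << std::string(3 * (16 - col), ' ') << ' ' << ascii << '\n';
}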
A binary file is a collection of bytes, and a byte has a range of values 0..255. Printable characters that can be safely "printed" form a much narrower range. Assuming the most basic ASCII encoding:
32..63
64..95
96..126
plus, maybe, some values above 128, if your codepage has them
(see an ASCII table).
Every character that falls outside that range may, at least:
print out as invisible
print out as some weird trash
in fact be a control character that changes the settings of your terminal
Some terminals support an "end of text" character and will simply stop printing any text afterwards. Maybe you hit that.
I'd say, if you are interested only in text, then print only the printable characters and ignore the others. Or, if you want everything, then maybe write it out in hex form instead?
This worked:
ifstream file;
char inputCharacter;
string Result;
file.open("test.exe", ios::binary);
while (file.get(inputCharacter))
{
    if ((inputCharacter > 31) && (inputCharacter < 127))
        Result += inputCharacter;
}
cout << Result << endl;
cout << "These are the ascii characters in the exe file" << endl;
file.close();

Writing/reading large vectors of data to binary file in C++

I have a c++ program that computes populations within a given radius by reading gridded population data from an ascii file into a large 8640x3432-element vector of doubles. Reading the ascii data into the vector takes ~30 seconds (looping over each column and each row), while the rest of the program only takes a few seconds. I was asked to speed up this process by writing the population data to a binary file, which would supposedly read in faster.
The ascii data file has a few header rows that give some data specs like the number of columns and rows, followed by population data for each grid cell, which is formatted as 3432 rows of 8640 numbers, separated by spaces. The population data numbers are mixed formats and can be just 0, a decimal value (0.000685648), or a value in scientific notation (2.687768e-05).
I found a few examples of reading/writing structs containing vectors to binary and tried to implement something similar, but I am running into problems. When I both write and read the vector to/from the binary file in the same program run, it seems to work and gives me all the correct values, but then it ends with either a "Segmentation fault: 11" or a memory allocation error that a "pointer being freed was not allocated". And if I try to just read the data in from the previously written binary file (without re-writing it in the same program run), it gives me the header variables just fine but segfaults before giving me the vector data.
Any advice on what I might have done wrong, or on a better way to do this would be greatly appreciated! I am compiling and running on a mac, and I don't have boost or other non-standard libraries at present. (Note: I am extremely new at coding and am having to learn by jumping in the deep end, so I may be missing a lot of basic concepts and terminology -- sorry!).
Here is the code I came up with:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fstream>
#include <iostream>
#include <vector>
#include <string>
using namespace std;

// Define struct for population file data and initialize one struct variable
// for reading in ascii (A) and one for reading in binary (B)
struct popFileData
{
    int nRows, nCol;
    vector< vector<double> > popCount; // this will end up having 3432x8640 elements
} popDataA, popDataB;

int main()
{
    string gridFname = "sample";
    double dum;
    vector<double> tempVector;

    // open ascii population grid file to stream
    ifstream gridFile;
    gridFile.open(gridFname + ".asc");
    int i = 0, j = 0;
    if (gridFile.is_open())
    {
        // read in header data from file
        string fileLine;
        gridFile >> fileLine >> popDataA.nCol;
        gridFile >> fileLine >> popDataA.nRows;
        popDataA.popCount.clear();
        // read in vector data, point-by-point
        for (i = 0; i < popDataA.nRows; i++)
        {
            tempVector.clear();
            for (j = 0; j < popDataA.nCol; j++)
            {
                gridFile >> dum;
                tempVector.push_back(dum);
            }
            popDataA.popCount.push_back(tempVector);
        }
        // close ascii grid file
        gridFile.close();
    }
    else
    {
        cout << "Population file read failed!" << endl;
    }

    // create/open binary file
    ofstream ofs(gridFname + ".bin", ios::trunc | ios::binary);
    if (ofs.is_open())
    {
        // write struct to binary file then close binary file
        ofs.write((char *)&popDataA, sizeof(popDataA));
        ofs.close();
    }
    else cout << "error writing to binary file" << endl;

    // read data from binary file into popDataB struct
    ifstream ifs(gridFname + ".bin", ios::binary);
    if (ifs.is_open())
    {
        ifs.read((char *)&popDataB, sizeof(popDataB));
        ifs.close();
    }
    else cout << "error reading from binary file" << endl;

    // compare results of reading in from the ascii file and reading in from the binary file
    cout << "File Header Values:\n";
    cout << "Columns (ascii vs binary): " << popDataA.nCol << " vs. " << popDataB.nCol << endl;
    cout << "Rows (ascii vs binary):" << popDataA.nRows << " vs." << popDataB.nRows << endl;
    cout << "Spot Check Vector Values: " << endl;
    cout << "Index 0,0: " << popDataA.popCount[0][0] << " vs. " << popDataB.popCount[0][0] << endl;
    cout << "Index 3431,8639: " << popDataA.popCount[3431][8639] << " vs. " << popDataB.popCount[3431][8639] << endl;
    cout << "Index 1600,4320: " << popDataA.popCount[1600][4320] << " vs. " << popDataB.popCount[1600][4320] << endl;

    return 0;
}
Here is the output when I both write and read the binary file in the same run:
File Header Values:
Columns (ascii vs binary): 8640 vs. 8640
Rows (ascii vs binary):3432 vs.3432
Spot Check Vector Values:
Index 0,0: 0 vs. 0
Index 3431,8639: 0 vs. 0
Index 1600,4320: 25.2184 vs. 25.2184
a.out(11402,0x7fff77c25310) malloc: *** error for object 0x7fde9821c000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
And here is the output I get if I just try to read from the pre-existing binary file:
File Header Values:
Columns (binary): 8640
Rows (binary):3432
Spot Check Vector Values:
Segmentation fault: 11
Thanks in advance for any help!
When you write popDataA to the file, you are writing the binary representation of the vector of vectors. However this really is quite a small object, consisting of a pointer to the actual data (itself a series of vectors, in this case) and some size information.
When it's read back in to popDataB, it kinda works! But only because the raw pointer that was in popDataA is now in popDataB, and it points to the same stuff in memory. Things go crazy at the end, because when the memory for the vectors is freed, the code tries to free the data referenced by popDataA twice (once for popDataA, and once again for popDataB).
The short version is, it's not a reasonable thing to write a vector to a file in this fashion.
So what to do? The best approach is to first decide on your data representation. It will, like the ASCII format, specify what value gets written where, and will include information about the matrix size, so that you know how large a vector you will need to allocate when reading them in.
In semi-pseudo code, writing will look something like:
int nrow = ...;
int ncol = ...;
ofs.write((char *)&nrow, sizeof(nrow));
ofs.write((char *)&ncol, sizeof(ncol));
for (int i = 0; i < nrow; ++i) {
    for (int j = 0; j < ncol; ++j) {
        double val = data[i][j];
        ofs.write((char *)&val, sizeof(val));
    }
}
And reading will be the reverse:
ifs.read((char *)&nrow, sizeof(nrow));
ifs.read((char *)&ncol, sizeof(ncol));
// allocate data-structure of size nrow x ncol
// ...
for (int i = 0; i < nrow; ++i) {
    for (int j = 0; j < ncol; ++j) {
        double val;
        ifs.read((char *)&val, sizeof(val));
        data[i][j] = val;
    }
}
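Filling in the allocation step with the popFileData struct from the question, the read side might look like this (a sketch, assuming C++11 for vector::data(), which points at contiguous storage, so each row can be read in a single call):

ifs.read((char *)&popDataB.nRows, sizeof(popDataB.nRows));
ifs.read((char *)&popDataB.nCol, sizeof(popDataB.nCol));
// allocate nRows x nCol, then bulk-read one row at a time
popDataB.popCount.assign(popDataB.nRows, vector<double>(popDataB.nCol));
for (int i = 0; i < popDataB.nRows; ++i)
    ifs.read((char *)popDataB.popCount[i].data(), popDataB.nCol * sizeof(double));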
All that said though, you should consider not writing things into a binary file like this. These sorts of ad hoc binary formats tend to live on, long past their anticipated utility, and tend to suffer from:
Lack of documentation
Lack of extensibility
Format changes without versioning information
Issues when using saved data across different machines, including endianness problems, different default sizes for integers, etc.
Instead, I would strongly recommend using a third-party library. For scientific data, HDF5 and netcdf4 are good choices which address all of the above issues for you, and come with tools that can inspect the data without knowing anything about your particular program.
Lighter-weight options include the Boost serialization library and Google's protocol buffers, but these address only some of the issues listed above.

Cannot read enough data from a file when the file has enough data - C++

I have this C++ code (it is from after I did some tests to see why I cannot read enough data from the file, so it is not final code; I am trying to find out why I am getting this result).
size_t readSize = 629312;
_rawImageFile.seekg(0, ifstream::end);
size_t s = _rawImageFile.tellg();
char *buffer = (char*) malloc(readSize);
_rawImageFile.seekg(0);
int p = _rawImageFile.tellg();
_rawImageFile.read(buffer, readSize);
size_t extracted = _rawImageFile.gcount();
cout << "s=" << s << endl;
cout << "p=" << p << endl;
cout << "readsize=" << readSize << endl;
cout << "extracted=" << extracted << endl;
cout << "eof =" << _rawImageFile.eofbit << endl;
cout << "fail=" << _rawImageFile.failbit << endl;
The output is as follow:
s=3493940224
p=0
readsize=629312
extracted=2085
eof =1
fail=2
As you can see, the file size is 3493940224 and I am at the start of the file (p=0), and I am trying to read 629312 bytes, but I can only read 2085?
What is the problem with this code? I did open this file with other methods and read some data out of it, but I am using seekg to move the pointer back to the beginning of the file.
The file was opened as binary.
edit 1
To find a solution, I put all code inside a function and here is it:
_config = config;
ifstream t_rawImageFile;
t_rawImageFile.open(rawImageFileName, std::ifstream::in || std::ios::binary);
t_rawImageFile.seekg(0);

size_t readSize = 629312;
t_rawImageFile.seekg(0, ifstream::end);
size_t s = t_rawImageFile.tellg();
char *buffer = (char*) malloc(readSize);
t_rawImageFile.seekg(0);
size_t p = t_rawImageFile.tellg();
t_rawImageFile.read(buffer, readSize);
size_t x = t_rawImageFile.tellg();
size_t extracted = t_rawImageFile.gcount();

cout << "s=" << s << endl;
cout << "p=" << p << endl;
cout << "x=" << x << endl;
cout << "readsize=" << readSize << endl;
cout << "extracted=" << extracted << endl;
cout << "eof =" << t_rawImageFile.eof() << endl;
cout << "fail=" << t_rawImageFile.fail() << endl;
and the result is:
s=3493940224
p=0
x=4294967295
readsize=629312
extracted=2085
eof =1
fail=1
Interestingly, after the read, the file pointer moves to a very big value. Is it possible that the application fails because the file size is so big?
edit 2
Tested the same code with another file. the result is as follow:
s=2993007872
p=0
x=4294967295
readsize=629312
extracted=1859
eof =1
fail=1
What I can read from this test is that:
after the read, the file pointer moves to a big number which is always the same; the amount that it reads depends on the file (!).
edit 3
After changing the size_t to fstream::pos_type the result is as follow:
s=2993007872
p=0
x=-1
readsize=629312
extracted=1859
eof =1
fail=1
Why does the file position go to -1 after a read?
t_rawImageFile.open(rawImageFileName, std::ifstream::in || std::ios::binary );
...does not open the file in binary mode. Since || is the lazy or operator and std::ifstream::in is non zero, the whole expression has the value 1.
t_rawImageFile.open(rawImageFileName, std::ifstream::in | std::ios::binary );
...will surely work better.
You don't show the part where your file is being opened, but I'm pretty sure it is missing ios::binary to make sure the C runtime code doesn't interpret CTRL-Z (or CTRL-D) as end of file.
Change this line:
t_rawImageFile.open(rawImageFileName, std::ifstream::in || std::ios::binary);
into this:
t_rawImageFile.open(rawImageFileName, std::ifstream::in | std::ios::binary);
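Putting it together, a sketch of the corrected open plus the state checks, which also explains the other symptoms: tellg() returns pos_type(-1) once eof/fail is set after the short read, and that -1 wraps to 4294967295 when stored in a 32-bit size_t:

ifstream t_rawImageFile(rawImageFileName, std::ios::in | std::ios::binary);
if (!t_rawImageFile.is_open())
    cout << "open failed" << endl; // report the error and bail out

t_rawImageFile.read(buffer, readSize);
size_t extracted = t_rawImageFile.gcount();    // bytes actually read
ifstream::pos_type x = t_rawImageFile.tellg(); // pos_type(-1) if eof/fail is set;
                                               // call clear() first if you need the
                                               // position after a short read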