Opening files over 5 MB and storing them in an array - C++

I want to put each byte in a char array and rewrite the text file removing the first 100,000 characters.
int fs=0;
ifstream nm,nm1;
nm1.open("C:\\Dev-Cpp\\DCS\\Decom\\a.txt");
if(nm1.is_open())
{
nm1.seekg(0, ios::end );
fs = nm1.tellg();
}
nm1.close();
char ss[500000];
nm.open("C:\\Dev-Cpp\\DCS\\Decom\\a.txt");
nm.read(ss,fs-1);
nm.close();
ofstream om;
om.open("C:\\Dev-Cpp\\DCS\\Decom\\a.txt");
for(int i=100000;i<fs-1;i++){
om >> ss[i];
}
om.close();
The problem is that I can't give the character array a size of 5 million. I also tried using a vector:
vector <char> ss (5000000);
int w=0;
ifstream in2("C:\\Dev-Cpp\\DCS\\Decom\\a.txt", ios::binary);
unsigned char c2;
while( in2.read((char *)&c2, 1) )
{
in2 >> ss[w];
w++;
}
Here w ends up at almost half of fs, and a lot of characters are missing.
How can I do this?

In most implementations, char ss[5000000] tries allocating on the stack, and the size of the stack is limited as compared to the overall memory size. You can often allocate larger arrays on the heap than on the stack, like this:
char *ss = new char [5000000];
// Use ss as usual
delete[] ss; // Do not forget to delete
Note that if the file size fs is larger than 5000000, you will write past the end of the buffer. You should limit the amount of data that you read:
nm.read(ss,min(5000000,fs-1));
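The heap allocation and the bounded read can also be combined with a std::vector, which frees its memory automatically. The following is only a minimal sketch under assumptions: the helper name readAtMost and its parameters are hypothetical and not part of the original code, and the file is opened in binary mode so that no newline translation changes the byte count.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads at most maxBytes from the file into a
// heap-allocated buffer and shrinks the buffer to the bytes actually read.
std::vector<char> readAtMost(const std::string& path, std::size_t maxBytes)
{
    std::vector<char> buffer(maxBytes);            // storage lives on the heap
    std::ifstream in(path, std::ios::binary);      // binary: no newline translation
    if (!in)
        return {};                                 // could not open: empty result
    in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    buffer.resize(static_cast<std::size_t>(in.gcount())); // gcount() = bytes read
    return buffer;
}
```

For example, readAtMost("a.txt", 5000000) would return at most five million bytes, and the size() of the result tells you how many were actually read.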

This part is not correct
while( in2.read((char *)&c2, 1) )
{
in2 >> ss[w];
w++;
}
because you first try to read one character into c2 and, if that succeeds, read another character into ss[w].
I'm not at all surprised if you lose about half the characters here!
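A loop that reads each byte exactly once could look roughly like the sketch below. The function name readByteByByte is hypothetical. It uses get() rather than operator>>, because formatted extraction into a char skips whitespace by default and would lose characters as well.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: appends every byte of the file to a vector,
// performing exactly one read per loop iteration.
std::vector<char> readByteByByte(const std::string& path)
{
    std::vector<char> bytes;
    std::ifstream in(path, std::ios::binary);
    char c;
    while (in.get(c))        // get() does not skip whitespace, unlike operator>>
        bytes.push_back(c);
    return bytes;
}
```

If the file cannot be opened, the loop body never runs and the result is simply empty.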

The best way to solve your problem is to use the facilities of the standard library. That way, you also don't have to care about buffer overflows.
The following code is untested.
std::fstream file("C:\\Dev-Cpp\\DCS\\Decom\\a.txt", std::ios_base::in);
if (!file)
{
std::cerr << "could not open file C:\\Dev-Cpp\\DCS\\Decom\\a.txt for reading\n";
exit(1);
}
std::vector<char> ss; // do *not* give a size here
ss.reserve(5000000); // *expected* size
// if the file is too large, the capacity will automatically be extended
std::copy(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>(),
std::back_inserter(ss));
file.close();
file.open("C:\\Dev-Cpp\\DCS\\Decom\\a.txt", std::ios_base::out | std::ios_base::trunc);
if (!file)
{
std::cerr << "could not open C:\\Dev-Cpp\\DCS\\Decom\\a.txt for writing\n";
exit(1);
}
if (ss.size() > 100000) // only if the file actually contained more than 100000 characters
std::copy(ss.begin()+100000, ss.end(), std::ostreambuf_iterator<char>(file));
file.close();

Related

C++ - Read the bytes of any file into an unsigned char array

I have an assignment where I have to implement the Rijndael Algorithm for AES-128 Encryption. I have the algorithm operational, but I do not have proper file input/output.
The assignment requires us to use parameters passed in from the command line. In this case, the parameter will be the file path to the particular file the user wishes to encrypt.
My problem is, I am lost as to how to read in the bytes of a file and store these bytes inside an array for later encryption.
I have tried using ifstream and ofstream to open, read, write, and close the files and it works fine for plaintext files. However, I need the application to take ANY file as input.
When I tried my method of using fstream with a pdf as input, it would crash my program. So, I now need to learn how to take the bytes of a file, store them inside an unsigned char array for Encryption, and then store them inside another file. This process of encryption and storage of ciphertext needs to occur in 16 byte intervals.
The below implementation is my first attempt to read files in binary mode and then write whatever was read in another file also in binary mode.
The output is readable in a hex reader.
int main(int argc, char* argv[])
{
if (argc < 2)
{
cerr << "Use: " << argv[0] << " SOURCE_FILEPATH" << endl << "Ex. \"C\\Users\\Anthony\\Desktop\\test.txt\"\n";
return 1;
}
// Store the Command Line Parameter inside a string
// In this case, a filepath.
string src_fp = argv[1];
string dst_fp = src_fp.substr(0, src_fp.find('.', 0)) + ".enc";
// Open the filepaths in binary mode
ifstream srcF(src_fp, ios::in | ios::binary);
ofstream dstF(dst_fp, ios::out | ios::binary);
// Buffer to handle the input and output.
unsigned char fBuffer[16];
srcF.seekg(0, ios::beg);
while (!srcF.eof())
{
srcF >> fBuffer;
dstF << fBuffer << endl;
}
dstF.close();
srcF.close();
}
The code implementation does not work as intended.
Any direction on how to solve my dilemma would be greatly appreciated.
Like you, I struggled to find a way to read a binary file into a byte array in C++ that produces the same hex values I see in a hex editor. After much trial and error, this seems to be the fastest way to do so without extra casts.
It would go faster without the counter, but then sometimes you end up with wide chars. I haven't found a better way to truly get one byte at a time.
It loads the entire file into memory, but only prints the first 800 bytes.
// Requires <cstdio>, <cstdint> and <string>
string Filename = "BinaryFile.bin";
FILE* pFile = fopen(Filename.c_str(), "rb");
if (pFile != NULL) // check before using the handle
{
// Determine the file size by seeking to the end
fseek(pFile, 0L, SEEK_END);
size_t size = ftell(pFile);
fseek(pFile, 0L, SEEK_SET);
uint8_t* ByteArray = new uint8_t[size];
// Read one byte at a time; stop at size bytes or at end of file
size_t counter = 0;
int ch;
while (counter < size && (ch = fgetc(pFile)) != EOF) {
ByteArray[counter] = ch;
counter++;
}
fclose(pFile);
// Print at most the first 800 bytes that were read
for (size_t i = 0; i < counter && i < 800; i++) {
printf("%02X ", ByteArray[i]);
}
delete[] ByteArray;
}
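For the 16-byte processing the question asks about, a stream-based loop might look like the sketch below. It is only an illustration under assumptions: copyInBlocks is a hypothetical name, and the cipher step is a placeholder comment, so the function just copies the data. The key point is that gcount() reports how many bytes the last read() delivered, which matters for the final, possibly shorter block.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: copies src to dst in 16-byte blocks and returns the
// number of blocks processed. The encryption step is left as a placeholder.
std::size_t copyInBlocks(const std::string& src, const std::string& dst)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out)
        return 0;

    unsigned char block[16];
    std::size_t blocks = 0;
    while (true) {
        // read() wants char*, so the unsigned buffer is reinterpreted
        in.read(reinterpret_cast<char*>(block), sizeof block);
        const std::streamsize got = in.gcount(); // may be < 16 for the last block
        if (got <= 0)
            break;
        // ... a cipher would transform block[0..got) here ...
        out.write(reinterpret_cast<const char*>(block), got);
        ++blocks;
    }
    return blocks;
}
```

A real block cipher would also need a padding scheme for the last partial block; that is outside the scope of this sketch.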

How can I store a string(from a file with n number of lines) in a dynamic array? C++

Newb here... taking a C++ class on data structures. I am making a program that takes a list of chores from a text file and stores them in a dynamic array.
//In header/ In class:
private:
/* var to keep len of list */
int len = 99; // Not sure what to set this to or if I need to even set it.
/* add appropriate data structure to store list */
string *arr = new string[len];
//In .cpp:
ListOfChores::ListOfChores(string fileName) {
ifstream file(fileName, ifstream::in);
string line;
if (file.is_open()) //Checking if the file can be opened
{
while (!file.eof()) // To get all the lines.
{
getline(file, line); // Gets a single line
arr[len] = line; // Store a line in the array
len++; // Increases the array size by one
}
file.close(); // Closes file
}
else cout << "Unable to open file" << endl; // Gives error if the file can't be opened
}
But I am getting an error for storing a line in the array. It says "Access violation reading location." There is another function executed in the main.cpp for printing the lines.
You overrun your array buffer right away because len is already 99: the array has 99 elements (indices 0 to 98), so arr[99] is out of bounds. You should keep separate notions of capacity and length. Capacity is the maximum you can store without reallocating, and length is the actual number of data lines.
Please avoid this C-style array in C++ code. Use vector, which has been around for at least 20 years (STL) if I'm not mistaken.
(you're not a lost cause, you are already using std::string :))
Check this:
#include <vector>
//In header/ In class:
private:
/* add appropriate data structure to store list */
std::vector<string> arr; // define a vector
//In .cpp:
ListOfChores::ListOfChores(string fileName) {
ifstream file(fileName, ifstream::in);
string line;
if (file.is_open()) //Checking if the file can be opened
{
while (getline(file, line))
{
arr.push_back(line);
}
file.close(); // Closes file
}
else cout << "Unable to open file" << endl; // Gives error if the file can't be opened
}
Now arr.size() holds the number of lines. The list is no longer limited to 99 lines, only by the memory available to the program. You can still access line number 13 with arr[12], or with arr.at(12) for bounds-checked access.
The proper way to iterate through it (C++11), for instance to print all lines:
for (const auto& s : arr) // a reference avoids copying each string
{
std::cout << s << std::endl;
}
Now, if you REALLY have to use an array, you can emulate/mimic what vector does (well, not as performant I'm sure, but does the job):
private:
int len=0;
int capacity=100;
string *arr = new string[capacity];
Now in the code, just before inserting (untested, but the idea is right):
if (len>=capacity)
{
string *narr = new string[capacity+100];
for (int i = 0; i < capacity; i++)
{
narr[i] = arr[i];
}
delete [] arr;
arr = narr;
capacity += 100; // growth
}
(you cannot use realloc or memcpy because you're handling objects in the arrays)
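Put together, the length/capacity idea could be wrapped up roughly as follows. This is a sketch, not the class from the question: the name StringList is hypothetical, the initial capacity of 4 is arbitrary, and it doubles the capacity instead of adding 100, which is a common alternative growth policy rather than what the code above does.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal container illustrating the length/capacity split.
struct StringList {
    std::size_t len = 0;                       // lines actually stored
    std::size_t capacity = 4;                  // slots currently allocated
    std::string* arr = new std::string[capacity];

    StringList() = default;
    StringList(const StringList&) = delete;             // keep the sketch simple:
    StringList& operator=(const StringList&) = delete;  // no copying
    ~StringList() { delete[] arr; }

    void push_back(const std::string& line)
    {
        if (len >= capacity) {                 // full: grow before inserting
            std::string* bigger = new std::string[capacity * 2];
            for (std::size_t i = 0; i < len; ++i)
                bigger[i] = arr[i];            // element-wise copy, not memcpy
            delete[] arr;
            arr = bigger;
            capacity *= 2;
        }
        arr[len++] = line;
    }
};
```

Inside the getline loop, a call such as list.push_back(line) would then replace the direct array assignment.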

ifstream::read keeps returning incorrect value

I am currently working my way through teaching myself how to work with files in c++, and I am having a good bit of difficulty extracting binary information from files.
My code:
std::string targetFile = "simplehashingfile.txt";
const char* filename = targetFile.c_str();
std::ifstream file;
file.open( filename, std::ios::binary | std::ios::in );
file.seekg(0, std::ios::end); // go to end of file
std::streamsize size = file.tellg(); // get size of file
std::vector<char> buffer(size); // create vector of file size bytes
file.read(buffer.data(), size); // read file into buffer vector
int totalread = file.gcount();
// Check that data was read
std::cout<<"total read: " << totalread << std::endl;
// check buffer:
std::cout<<"from buffer vector: "<<std::endl;
for (int i=0; i<size; i++){
std::cout << buffer[i] << std::endl;
}
std::cout<<"\n\n";
The "simplehashingfile.txt" file only contains 50 bytes of normal text. The size is correctly determined to be 50 bytes, but gcount returns 0 chars read, and the buffer output is (understandably from the gcount) a 50 line list of nothing.
For the life of me I cannot figure out where I went wrong! I made this test code earlier:
// Writing binary to file
std::ofstream ofile;
ofile.open("testbinary", std::ios::out | std::ios::binary);
uint32_t bytes4 = 0x7FFFFFFF; // max 32-bit value
uint32_t bytes8 = 0x12345678; // some 32-bit value
ofile.write( (char*)&bytes4 , 4 );
ofile.write( (char*)&bytes8, 4 );
ofile.close();
// Reading from file
std::ifstream ifile;
ifile.open("testbinary", std::ios::out | std::ios::binary);
uint32_t reading; // variable to read data
uint32_t reading2;
ifile.read( (char*)&reading, 4 );
ifile.read( (char*)&reading2, 4 );
std::cout << "The file contains: " << std::hex << reading << std::endl;
std::cout<<"next 4 bytes: "<< std::hex << reading2 << std::endl;
And that test code wrote and read perfectly. Any idea what I am doing wrong? Thank you to anyone who can point me in the right direction!
You never move the read position back to the beginning of the file before you read from it:
file.seekg(0, std::ios::end); // <- moves to the end of the file
std::streamsize size = file.tellg(); // position at the end == file size
std::vector<char> buffer(size); // create vector of file size bytes
file.read(buffer.data(), size); // <- reads from the end of the file, so nothing is read
int totalread = file.gcount();
You need to call seekg() again to reset the file position to the beginning. To do that, use
file.seekg(0, std::ios::beg);
before
file.read(buffer.data(), size);
It would be worth returning to the beginning of the file before trying to read:
file.seekg(0, std::ios::beg);
I think the problem is that you seek to the end to get the file size, but don't seek back to the beginning before trying to read the file.
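As a complete illustration of the measure, rewind, read sequence, here is a small sketch. The function name readWholeFile and the bytesRead output parameter are assumptions made for the example and do not appear in the question.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: measures the file, rewinds, then reads it.
// bytesRead receives what gcount() reports after the read.
std::vector<char> readWholeFile(const std::string& path, std::streamsize& bytesRead)
{
    bytesRead = 0;
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return {};
    file.seekg(0, std::ios::end);                  // jump to the end...
    const std::streamsize size = file.tellg();     // ...so tellg() is the size
    if (size <= 0)
        return {};
    file.seekg(0, std::ios::beg);                  // rewind before reading
    std::vector<char> buffer(static_cast<std::size_t>(size));
    file.read(buffer.data(), size);
    bytesRead = file.gcount();
    return buffer;
}
```

With the 50-byte file from the question, bytesRead should come back as 50 instead of 0 once the rewind is in place.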

Read file, line by line, and store values into a array/string?

I have learned my lesson, so I will be short and to the point.
I need a function in my class that can read a file line by line and store the lines in an array/string so I can use them.
I have the following example (please don't laugh, I am a beginner):
int CMYCLASS::LoadLines(std::string Filename)
{
std::ifstream input(Filename, std::ios::binary | ios::in);
input.seekg(0, ios::end);
char* title[1024];
input.read((char*)title, sizeof(int));
// here what ?? -_-
input.close();
for (int i = 0; i < sizeof(title); i++)
{
printf(" %.2X ", title[i]);
}
printf("\n");
return 0;
}
I'm not sure exactly what you are asking.
However - below is some code that reads a file line-by-line and stores the lines in a vector. The code also prints the lines - both as text lines and the integer value of each character. Hope it helps.
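#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()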
{
std::string Filename = "somefile.bin";
std::ifstream input(Filename, std::ios::binary | ios::in); // Open the file
std::string line; // Temp variable
std::vector<std::string> lines; // Vector for holding all lines in the file
while (std::getline(input, line)) // Read lines until the file is exhausted
{
lines.push_back(line); // Save the line in the vector
}
// Now the vector holds all lines from the file
// and you can do whatever you want with it
// For instance we can print the lines
// Both as a line and as the hexadecimal value of every character
for(auto s : lines) // For each line in vector
{
cout << s; // Print it
for(auto c : s) // For each character in the line
{
cout << hex // switch to hexadecimal
<< std::setw(2) // print it in two
<< std::setfill('0') // leading zero
<< (unsigned int)(unsigned char)c // go through unsigned char so bytes above 0x7F are not sign-extended
<< dec // back to decimal
<< " "; // and a space
}
cout << endl; // new line
}
return 0;
}
I do not laugh at your original code - no way - I was also a beginner once. But your code is C-style code and contains a lot of bugs. So my advice is: please use C++ style instead. For instance, never use C-style strings (i.e. char arrays). They are so error-prone...
As you are a beginner (your own words :) let me explain a few things about your code:
char* title[1024];
This is not a string. It is 1024 pointers to characters, which can also be 1024 pointers to C-style strings. However, you have not reserved any memory for holding the strings.
The correct way would be:
char title[1024][256]; // 1024 lines with a maximum of 256 chars per line
Here you must make sure that the input file has no more than 1024 lines and that each line is shorter than 256 chars.
Code like that is very bad. What to do if the input file has 1025 lines?
This is where C++ helps you. Using std::string, you don't need to worry about the length of the string. The std::string container will just adjust to the size of whatever you put into it.
The std::vector is like an array. But without a fixed size. So you can just keep adding to it and it will automatically adjust the size.
So c++ offers std::string and std::vector to help you to handle the dynamic size of the input file. Use it...
Good luck.

C++ reading text file by blocks

I couldn't find a satisfactory answer on Google, and I/O in C++ is a little bit tricky. I would like to read a text file by blocks into a vector if possible. Alas, I couldn't figure out how. I am not even sure whether my infinite loop terminates in all cases. So, the best I was able to come up with is this:
char buffer[1025]; //let's say read by 1024 char block
buffer[1024] = '\0';
std::fstream fin("index.xml");
if (!fin) {
std::cerr << "Unable to open file";
} else {
while (true) {
fin.read(buffer, 1024);
std::cout << buffer;
if (fin.eof())
break;
}
}
Please note the second line with '\0'. Isn't it odd? Can I do something better? Can I read the data into a vector instead of a char array? Is it appropriate to read into a vector directly?
Thanks for your answers.
PS. Reading by chunks does make sense here. This code is short, but I am storing the data in a cyclic buffer.
You should be fine doing the following
vector<char> buffer (1024,0); // create vector of 1024 chars with value 0
fin.read(&buffer[0], buffer.size());
The elements in a vector are guaranteed to be stored contiguously, so this should work, but you should ensure that the vector is never empty. I asked a similar question here recently; check the answers to it for specific details from the standard: "Can I call functions that take an array/pointer argument using a std::vector instead?"
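A full block-reading loop built on that idea might look like the sketch below. The function name readInBlocks and its blockSize parameter are hypothetical. It relies on gcount() to know how many bytes each read() delivered, so the buffer needs no '\0' terminator and the last, shorter block is handled by the same code path.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the file block by block into a reusable vector
// and appends each block to the result. Returns the concatenated contents.
std::string readInBlocks(const std::string& path, std::size_t blockSize = 1024)
{
    std::string result;
    std::ifstream fin(path, std::ios::binary);
    if (!fin || blockSize == 0)
        return result;
    std::vector<char> buffer(blockSize);           // never empty here
    while (fin) {
        fin.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() is the number of valid bytes, so no '\0' terminator is needed
        result.append(buffer.data(), static_cast<std::size_t>(fin.gcount()));
    }
    return result;
}
```

In a real program the append would be replaced by whatever consumes each block, for example pushing it into the cyclic buffer mentioned in the question.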
std::ifstream fin("index.xml");
std::stringstream buffer;
buffer << fin.rdbuf();
std::string result = buffer.str();
Exactly what you need.
Recently, I encountered the same problem. I used the read and gcount functions to solve it, and it works well. Here is the code.
vector<string> ReadFileByBlocks(const char* filename)
{
vector<string> vecstr;
ifstream fin(filename, ios_base::in);
if (fin.is_open())
{
char* buffer = new char[1024];
while (fin.read(buffer, 1024))
{
// buffer is not null-terminated, so pass the length explicitly
string s(buffer, 1024);
vecstr.push_back(s);
}
// The last block may hold fewer than 1024 bytes;
// fin.gcount() gives the number of bytes actually read.
if (fin.gcount() > 0)
{
string s(buffer, fin.gcount());
vecstr.push_back(s);
}
delete[] buffer;
fin.close();
}
else
{
cerr << "Cannot open file:" << filename << endl;
}
return vecstr;
}