Read binary file and count specific number c++

Hey everyone, I've been looking everywhere for insight on how to do this particular assignment. I saw something similar, but it didn't have a clear explanation. I'm trying to read a .bin file and count the number of times a specific number appears. I saw examples of this using a .txt file, and it seemed very straightforward using getline. I tried to replicate that structure with a binary file.
int main() {
int searching = 3;
int counter = 0;
unsigned char * memblock;
long long int size;
//open bin file
ifstream file;
file.open("threesData.bin", ios:: in | ios::binary | ios::ate);
//read bin file
if (file.is_open()) {
cout << "it opened\n";
size = file.tellg();
memblock = new unsigned char[size];
file.seekg(0, ios::beg);
file.read((char * ) memblock, size);
while (file.read((char * ) memblock, size)) {
for (int i = 0; i < size; i++) {
(int) memblock[i];
if (memblock[i] == searching) {
counter++;
}
}
}
}
file.close();
cout << "The number " << searching << " appears ";
cout << counter << " times!";
return 0;
}
When I run the program it's clear that it opens but it doesn't count the number I'm searching for. What am I doing wrong?

You seem to be thinking this through, but here's how I would go about doing it.
Initialize a buffer with a sensible size.
View the buffer through an int pointer, so you can index whole integers with array[i] instead of doing byte arithmetic.
Open the stream, and read while the stream is valid.
Convert the number of bytes read to the number of ints you expect.
Increment the counter for each integer that matches the value you are looking for.
Code
#include <fstream>
#include <iostream>
bool check_character(int value)
{
return value == 3;
}
int main(void)
{
// choose the size, cast a pointer as an int type, and initialize
// our counter
static constexpr size_t size = 4096;
char* buffer = new char[size];
int* ints = (int*) buffer;
size_t counter = 0;
// create our stream,
std::ifstream stream("file.bin", std::ios_base::binary);
while (stream) {
// keep reading while the stream is valid
stream.read(buffer, size);
auto count = stream.gcount();
// we only want to go to the last valid integer
// if we expect the file to be only integers,
// we could do `assert(count % sizeof(int) == 0);`
// otherwise, we may have trailing characters
// if we have trailing characters, we may want to move them
// to the front of the buffer....
auto chars = count / sizeof(int); // floor division
for (size_t i = 0; i < chars; ++i) {
// false == 0, true == 1, so we can just add
// if the value is 3
counter += check_character(ints[i]);
}
}
std::cout << "Counter is: " << counter << std::endl;
delete[] buffer;
return 0;
}
As NeilButterworth points out, you could also use a vector. I don't really like this, but "meh".
#include <fstream>
#include <iostream>
#include <vector>
/* elided lines */
int main(void)
{
/* elided lines */
static constexpr size_t size = 4096;
std::vector<int> ints;
ints.resize(size / sizeof(int));
char* buffer = (char*) ints.data();
/* elided lines */
/* elided lines */
std::cout << "Counter is: " << counter << std::endl;
// no delete[]
return 0;
}
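If the file is a sequence of single bytes rather than 4-byte ints, as the comparison memblock[i] == searching in the question suggests, a byte-wise count is enough. The sketch below is one way to do that; count_byte and count_byte_in_file are hypothetical helper names, not part of the answer above. It also reads the file only once, whereas the question's code calls read a second time in the while condition, after the first call has already consumed the whole file, so the loop body never runs.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Count how many bytes in `data` are equal to `target`.
std::size_t count_byte(const std::vector<unsigned char>& data, unsigned char target)
{
    return static_cast<std::size_t>(std::count(data.begin(), data.end(), target));
}

// Read the whole file once, then count.
// Returns 0 if the file cannot be opened (an assumption made for brevity).
std::size_t count_byte_in_file(const std::string& path, unsigned char target)
{
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return 0;
    }
    // The extra parentheses keep this from being parsed as a function declaration.
    std::vector<unsigned char> data((std::istreambuf_iterator<char>(file)),
                                    std::istreambuf_iterator<char>());
    return count_byte(data, target);
}
```

With the file from the question, the call would be count_byte_in_file("threesData.bin", 3).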

Related

C++ decoding LZ77-compressed data using std::fstream too slow

I have a function in my code that decodes a file compressed with the LZ77 algorithm. On a 15 MB input file, decompression takes about 3 minutes, which is too slow. What is the reason for the poor performance? On every step of the loop I read two or three bytes and get the length, offset and next character. If the offset is not zero, I also have to move "offset" bytes back in the output stream and read "length" bytes. Then I append them to the end of the same stream before writing the next character there.
void uncompressData(long block_size, unsigned char* data, fstream &file_out)
{
unsigned char* append;
append = new unsigned char[buf_length];
link myLink;
long cur_position = 0;
file_out.seekg(0, ios::beg);
cout << file_out.tellg() << endl;
int i=0;
myLink.length=-1;
while(i<(block_size-1))
{
if(myLink.length!=-1) file_out << myLink.next;
myLink.length = (short)(data[i] >> 4);
//cout << myLink.length << endl;
if(myLink.length!=0)
{
myLink.offset = (short)(data[i] & 0xF);
myLink.offset = myLink.offset << 8;
myLink.offset = myLink.offset | (short)data[i+1];
myLink.next = (unsigned char)data[i+2];
cur_position=file_out.tellg();
file_out.seekg(-myLink.offset,ios_base::cur);
if(myLink.length<=myLink.offset)
{
file_out.read((char*)append, myLink.length);
}
else
{
file_out.read((char*)append, myLink.offset);
int k=myLink.offset,j=0;
while(k<myLink.length)
{
append[k]=append[j];
j++;
if(j==myLink.offset) j=0;
k++;
}
}
file_out.seekg(cur_position);
file_out.write((char*)append, myLink.length);
i++;
}
else {
myLink.offset = 0;
myLink.next = (unsigned char)data[i+1];
}
i=i+2;
}
unsigned char hasOddSymbol = data[block_size-1];
if(hasOddSymbol==0x0) { file_out << myLink.next; }
delete[] append;
}
You could try doing it on a std::stringstream in memory instead:
#include <sstream>
void uncompressData(long block_size, unsigned char* data, fstream& out)
{
std::stringstream file_out; // first line in the function
// the rest of your function goes here
out << file_out.rdbuf(); // last line in the function
}
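Most of the time in the original loop is likely spent in the seekg/read/write calls on the file stream, because jumping between reading and writing at different positions keeps invalidating the stream's buffer. Another in-memory option besides std::stringstream is to keep the decoded output in a std::vector and copy back-references by index. The sketch below shows only that copy step; append_match is a hypothetical helper, and parsing the question's token layout is left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Append `length` bytes that start `offset` bytes before the current end of `out`.
// Copying one byte at a time makes overlapping matches (length > offset) repeat
// the pattern, which is what the wrap-around loop in the question does by hand.
void append_match(std::vector<unsigned char>& out, std::size_t offset, std::size_t length)
{
    if (offset == 0 || offset > out.size()) {
        throw std::out_of_range("back-reference points outside the decoded data");
    }
    const std::size_t from = out.size() - offset;
    out.reserve(out.size() + length);
    for (std::size_t n = 0; n < length; ++n) {
        const unsigned char byte = out[from + n]; // may be a byte appended just before
        out.push_back(byte);
    }
}
```

Once decoding is finished, the whole vector can be written to the output file with a single write call.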

How to find a string in a binary file?

I want to find a specific string "fileSize" in a binary file.
The purpose of finding that string is to get the 4 bytes next to it, because those 4 bytes contain the size of the data that I want to read.
[The original post showed the file contents here, with the same string at three different positions.]
The following is the function that writes the data to a file:
void W_Data(char *readableFile, char *writableFile) {
ifstream RFile(readableFile, ios::binary);
ofstream WFile(writableFile, ios::binary | ios::app);
RFile.seekg(0, ios::end);
unsigned long size = (unsigned long)RFile.tellg();
RFile.seekg(0, ios::beg);
unsigned int bufferSize = 1024;
char *contentsBuffer = new char[bufferSize];
WFile.write("fileSize:", 9);
WFile.write((char*)&size, sizeof(unsigned long));
while (!RFile.eof()) {
RFile.read(contentsBuffer, bufferSize);
WFile.write(contentsBuffer, bufferSize);
}
RFile.close();
WFile.close();
delete contentsBuffer;
contentsBuffer = NULL;
}
Also, the function that searches for the string:
void R_Data(char *readableFile) {
ifstream RFile(readableFile, ios::binary);
const unsigned int bufferSize = 9;
char fileSize[bufferSize];
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "fileSize:") == 0) {
cout << "Exists" << endl;
}
}
RFile.close();
}
How to find a specific string in a binary file?
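I think using std::string::find() is an easy way to search for a pattern. The version below also handles a final partial chunk and matches that span two chunks.
void R_Data(const std::string& filename, const std::string& pattern) {
if (pattern.empty()) return;
std::ifstream file(filename, std::ios::binary);
char buffer[1024];
std::string carry; // tail of the previous chunk
// the last read may be partial: it fails, but gcount() is still > 0
while (file.read(buffer, sizeof buffer) || file.gcount() > 0) {
std::string temp = carry + std::string(buffer, file.gcount());
std::size_t pos = temp.find(pattern);
while (pos != std::string::npos) {
std::cout << "Exists" << std::endl;
pos = temp.find(pattern, pos + 1);
}
// keep pattern.length()-1 bytes so a match split across chunks is still found
std::size_t keep = pattern.length() - 1;
carry = temp.size() > keep ? temp.substr(temp.size() - keep) : temp;
}
}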
How to find a specific string in a binary file?
If you don't know the location of the string in the file, I suggest the following:
Find the size of the file.
Allocate memory for being able to read everything in the file.
Read everything from the file to the memory allocated.
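Iterate over the contents of the file and use std::strncmp/std::memcmp at each position to find the string (std::strcmp will not work here, because the bytes after the string in the file are not a null terminator).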
Deallocate the memory once you are done using it.
There are a couple of problems with using
const unsigned int bufferSize = 9;
char fileSize[bufferSize];
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "fileSize:") == 0) {
cout << "Exists" << endl;
}
}
Problem 1
The strcmp line leads to undefined behavior when fileSize actually contains the string "fileSize:", since the array has room for only 9 characters. It needs one more element to hold the terminating null character. You could use
const unsigned int bufferSize = 9;
char fileSize[bufferSize+1] = {0};
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "fileSize:") == 0) {
cout << "Exists" << endl;
}
}
to take care of that problem.
Problem 2
You are reading the contents of the file in blocks of 9.
First call to RFile.read reads the first block of 9 characters.
Second call to RFile.read reads the second block of 9 characters.
Third call to RFile.read reads the third block of 9 characters. etc.
Hence, unless the string "fileSize:" starts exactly at the beginning of one such block, the test
if (strcmp(fileSize, "fileSize:") == 0)
will never pass.
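The read-everything approach from the list above could look roughly like the sketch below. It uses std::string::find, which compares by length and therefore keeps working when the data contains zero bytes. read_file, find_all and read_size_after are hypothetical names. Copying the value into an unsigned long assumes the file was produced by W_Data on the same platform, since W_Data writes sizeof(unsigned long) bytes, which is 4 on some platforms and 8 on others.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Load an entire file into a string; std::string can hold embedded '\0' bytes.
// An unreadable file simply yields an empty string.
std::string read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}

// Return the offset of every occurrence of `pattern` in `data`.
std::vector<std::size_t> find_all(const std::string& data, const std::string& pattern)
{
    std::vector<std::size_t> hits;
    if (pattern.empty()) {
        return hits;
    }
    for (std::size_t pos = data.find(pattern); pos != std::string::npos;
         pos = data.find(pattern, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}

// Copy the unsigned long stored right after a match that starts at `pos`.
// Returns false if the data ends before the full value.
bool read_size_after(const std::string& data, std::size_t pos,
                     std::size_t pattern_len, unsigned long& value)
{
    const std::size_t start = pos + pattern_len;
    if (start > data.size() || data.size() - start < sizeof value) {
        return false;
    }
    std::memcpy(&value, data.data() + start, sizeof value);
    return true;
}
```

A caller would run find_all(read_file(name), "fileSize:") and then pass each offset, together with the pattern length 9, to read_size_after.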

Writing and reading a file

I'm trying to use ifstream/ofstream to read/write, but for some reason the data gets corrupted along the way. Here are the read/write methods and the test:
void FileWrite(const char* FilePath, std::vector<char> &data) {
std::ofstream os (FilePath);
int len = data.size();
os.write(reinterpret_cast<char*>(&len), 4);
os.write(&(data[0]), len);
os.close();
}
std::vector<char> FileRead(const char* FilePath) {
std::ifstream is(FilePath);
int len;
is.read(reinterpret_cast<char*>(&len), 4);
std::vector<char> ret(len);
is.read(&(ret[0]), len);
is.close();
return ret;
}
void test() {
std::vector<char> sample(1024 * 1024);
for (int i = 0; i < 1024 * 1024; i++) {
sample[i] = rand() % 256;
}
FileWrite("C:\\test\\sample", sample);
auto sample2 = FileRead("C:\\test\\sample");
int err = 0;
for (int i = 0; i < sample.size(); i++) {
if (sample[i] != sample2[i])
err++;
}
std::cout << err << "\n";
int a;
std::cin >> a;
}
It writes the length correctly, reads it correctly and starts reading the data correctly, but at some point (depending on the input, usually at around the 1000th byte) it goes wrong and everything that follows is wrong. Why is that?
For starters, you should open both file streams in binary mode:
std::ofstream os (FilePath,std::ios::binary);
std::ifstream is (FilePath,std::ios::binary);
(edit: assuming char really means "signed char")
Note that a signed char can only hold values up to CHAR_MAX, which is 127. If the random number is bigger, the result wraps around to a negative value. That by itself still round-trips. The corruption comes from text mode: on Windows (at least with the Microsoft runtime), a stream opened without std::ios::binary expands every '\n' byte (0x0A) to "\r\n" when writing and treats a 0x1A byte as end of file when reading, so random data goes wrong as soon as one of those bytes shows up. Binary mode turns these translations off.
Also, you don't need to close the stream yourself here; the destructor does it for you.
Two more simple points:
1) &(data[0]) can be just &data[0]; the parentheses are redundant.
2) Try to keep to one naming convention. You use upper camel case for the FilePath parameter but lower camel case for all the other variables.
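Putting the binary-mode advice into the original functions might look like this. It is a sketch rather than a drop-in replacement: the split into WriteBlob/ReadBlob, the bool return value, the error checks and the std::uint32_t length prefix are assumptions that go beyond the original code.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>   // only needed for the in-memory usage described after the block
#include <vector>

// Write a 4-byte length followed by the raw bytes.
bool WriteBlob(std::ostream& os, const std::vector<char>& data)
{
    const std::uint32_t len = static_cast<std::uint32_t>(data.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof len);
    os.write(data.data(), static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(os);
}

// Read a blob written by WriteBlob; returns an empty vector on failure.
std::vector<char> ReadBlob(std::istream& is)
{
    std::uint32_t len = 0;
    if (!is.read(reinterpret_cast<char*>(&len), sizeof len)) {
        return {};
    }
    std::vector<char> ret(len);
    if (!is.read(ret.data(), static_cast<std::streamsize>(len))) {
        return {};
    }
    return ret;
}

bool FileWrite(const char* FilePath, const std::vector<char>& data)
{
    std::ofstream os(FilePath, std::ios::binary); // no newline translation on write
    return os && WriteBlob(os, data);
}

std::vector<char> FileRead(const char* FilePath)
{
    std::ifstream is(FilePath, std::ios::binary); // no translation or early EOF on read
    return is ? ReadBlob(is) : std::vector<char>();
}
```

Because WriteBlob and ReadBlob take plain streams, the round trip can also be tried on a std::stringstream without touching the disk.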

c++ writing and reading objects to binary files

I'm trying to read an Array object (Array is a class I've made) using read and write functions that read from and write to binary files. So far the write function works, but it won't read from the file properly for some reason. This is the write function:
void writeToBinFile(const char* path) const
{
ofstream ofs(path, ios_base::out | ios_base::app | ios_base::binary);
if (ofs.is_open())
{
ostringstream oss;
for (unsigned int i = 0; i < m_size; i++)
{
oss << ' ';
oss << m_data[i];
}
ofs.write(oss.str().c_str(), oss.str().size());
}
}
This is the read function :
void readFromBinFile(const char* path)
{
ifstream ifs(path, ios_base::in | ios_base::binary || ios_base::ate);
if (ifs.is_open())
{
stringstream ss;
int charCount = 0, spaceCount = 0;
ifs.unget();
while (spaceCount != m_size)
{
charCount++;
if (ifs.peek() == ' ')
{
spaceCount++;
}
ifs.unget();
}
ifs.get();
char* ch = new char[sizeof(char) * charCount];
ifs.read(ch, sizeof(char) * charCount);
ss << ch;
delete[] ch;
for (unsigned int i = 0; i < m_size; i++)
{
ss >> m_data[i];
m_elementCount++;
}
}
}
These are the class fields:
T* m_data;
unsigned int m_size;
unsigned int m_elementCount;
I'm using the following code to write and then read (1 execution for reading another for writing):
Array<int> arr3(5);
//arr3[0] = 38;
//arr3[1] = 22;
//arr3[2] = 55;
//arr3[3] = 7;
//arr3[4] = 94;
//arr3.writeToBinFile("binfile.bin");
arr3.readFromBinFile("binfile.bin");
for (unsigned int i = 0; i < arr3.elementCount(); i++)
{
cout << "arr3[" << i << "] = " << arr3[i] << endl;
}
The problem is now in the readFromBinFile function: it gets stuck in an infinite loop, and peek() returns -1 for some reason that I can't figure out.
Also note that I write a space before each element in the binary file, both to separate the elements of the array from each other and, with the leading space, to separate the array from any data already stored in the file.
The major problem, in my mind, is that you write fixed-size binary data in variable-size textual form. It could be so much simpler if you just stick to pure binary form.
Instead of writing to a string stream and then writing that output to the actual file, just write the binary data directly to the file:
ofs.write(reinterpret_cast<const char*>(m_data), sizeof(m_data[0]) * m_size);
Then do something similar when reading the data.
For this to work, you of course need to save the number of entries in the array/vector first before writing the actual data.
So the actual write function could be as simple as
void writeToBinFile(const char* path) const
{
ofstream ofs(path, ios_base::out | ios_base::binary);
if (ofs)
{
ofs.write(reinterpret_cast<const char*>(&m_size), sizeof(m_size));
ofs.write(reinterpret_cast<const char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
}
}
And the read function
void readFromBinFile(const char* path)
{
ifstream ifs(path, ios_base::in | ios_base::binary);
if (ifs)
{
// Read the size
ifs.read(reinterpret_cast<char*>(&m_size), sizeof(m_size));
// Read all the data
ifs.read(reinterpret_cast<char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
}
}
Since m_data is a raw T*, make sure it points to at least m_size elements before reading the actual data; if the stored size can differ from the current one, reallocate first. Also note that writing raw bytes like this is only valid for trivially copyable element types such as int.
Oh, and if you want to append data at the end of the array (though in the code you show you rewrite the whole array anyway), you would overwrite the size at the beginning, seek to the end, and then write the new data.
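To make the allocation point concrete, here is a small sketch of the same count-then-data layout using std::vector in place of the raw m_data pointer. The free functions write_array and read_array are hypothetical stand-ins for the member functions in the answer above, and the static_assert encodes the trivially copyable requirement.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>   // only needed for the in-memory usage described after the block
#include <type_traits>
#include <vector>

// Write the element count, then the elements as raw bytes.
template <typename T>
void write_array(std::ostream& os, const std::vector<T>& values)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte I/O needs a trivially copyable T");
    const unsigned int count = static_cast<unsigned int>(values.size());
    os.write(reinterpret_cast<const char*>(&count), sizeof count);
    os.write(reinterpret_cast<const char*>(values.data()),
             static_cast<std::streamsize>(sizeof(T) * values.size()));
}

// Read the count, allocate that many elements, then read the raw bytes.
// `values` is left untouched if anything fails.
template <typename T>
bool read_array(std::istream& is, std::vector<T>& values)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte I/O needs a trivially copyable T");
    unsigned int count = 0;
    if (!is.read(reinterpret_cast<char*>(&count), sizeof count)) {
        return false;
    }
    std::vector<T> tmp(count); // storage must exist before the read
    if (!is.read(reinterpret_cast<char*>(tmp.data()),
                 static_cast<std::streamsize>(sizeof(T) * count))) {
        return false;
    }
    values.swap(tmp);
    return true;
}
```

With files, both streams would be opened with ios_base::binary as in the answer; a std::stringstream is enough to try the round trip in memory.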

Issues with flushing cout after getting file data

I am trying to write an implementation of rc4. I am reading in plaintext from a file using an ifstream. I noticed that it wasn't outputting at the end of the file, so I tried the various ways of explicitly clearing the buffers. No matter which way (using an endl, appending \n, calling cout.flush()) I try to flush the buffer, I get a segfault. As a sanity check, I replaced my code with an example from the web, which I also tested separately. It works if I put it in its own file and compile it (e.g., it prints out the contents of the file, doesn't segfault, and doesn't require any calls to flush() or endl to do so), but not in my code.
Here is the offending bit of code (which works fine outside of my code; it's copied pretty much directly from cplusplus.com):
ifstream is;
is.open("plain");
char c;
while (is.good()) // loop while extraction from file is possible
{
c = is.get(); // get character from file
if (is.good())
cout << c;
// cout.flush();
}
is.close(); // close file*/
Here is the full code: (warning, lots of commented out code)
#include <iostream>
#include <fstream>
#include <string.h>
#include <vector>
using namespace std;
static char s[256], k[256];
//static char *i, *j;
void swap(int m, int n, char t[256]){
char tmp = t[m];
t[m] = t[n];
t[n] = tmp;
}
char getByte(){
static char i(0), j(0);
i = (i+1)%256;
j = (j + s[i])%256;
swap(i, j, s);
return s[(s[i]+s[j]) % 256];
}
int main(int argc, char ** argv){
/*string key = argv[1];*/
if(argc < 4){
cout << "Usage: \n rc4 keyfile plaintextfile outputfile" << endl;
return -1;
}
string key;
ifstream keyfile (argv[1]);
keyfile >> k;
cout << "Key = " << k << endl;
keyfile.close();
/*ifstream plaintextf;
plaintextf.open(argv[2]);*/
ofstream ciphertextf (argv[3]);
for(int q = 0; q < 256; q++){
s[q] = q;
}
int i, j;
for(int m = 0; m < 256; m++){
j = (j + s[m] + k[m % sizeof(k)])%256;
swap(m, j, s);
}
// vector<char> bytes(plaintext.begin(), plaintext.end());
// bytes.push_back('\0');
// vector<char>::iterator it = bytes.begin();
/* char pt;
while(plaintextf.good()){
pt = plaintextf.get();
if(plaintextf.good()){
cout << pt;
ciphertextf <<(char) (pt ^ getByte());
}
} */
ifstream is;
is.open("plain");
char c;
while (is.good()) // loop while extraction from file is possible
{
c = is.get(); // get character from file
if (is.good())
cout << c;
// cout.flush();
}
is.close(); // close file*/
/*// plaintextf.close();
ciphertextf.close();
keyfile.close();
*/
return 0;
}
Additionally, I think the second call to is.good() [ as in if(is.good()) ], would prevent the very last character of the file from being copied.
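One plausible source of the crash, independent of the flushing, is the indexing: getByte indexes s with plain char values, which are signed on many platforms and can therefore become negative, and j in main is used before it is initialized. The sketch below keeps the same key-scheduling and output steps but uses unsigned char state and std::size_t indices. The Rc4 struct, its members and rc4_apply are hypothetical names that are not taken from the question.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

struct Rc4 {
    unsigned char s[256];
    std::size_t i = 0;
    std::size_t j = 0;

    // Key scheduling; indices are unsigned, so they can never go negative.
    explicit Rc4(const std::string& key)
    {
        if (key.empty()) {
            throw std::invalid_argument("RC4 key must not be empty");
        }
        for (std::size_t q = 0; q < 256; ++q) {
            s[q] = static_cast<unsigned char>(q);
        }
        std::size_t jj = 0; // explicitly initialized, unlike j in the question's main
        for (std::size_t m = 0; m < 256; ++m) {
            jj = (jj + s[m] + static_cast<unsigned char>(key[m % key.size()])) % 256;
            std::swap(s[m], s[jj]);
        }
    }

    // Produce the next keystream byte.
    unsigned char next()
    {
        i = (i + 1) % 256;
        j = (j + s[i]) % 256;
        std::swap(s[i], s[j]);
        return s[(s[i] + s[j]) % 256];
    }
};

// XOR every byte of `input` with the keystream.
// Applying the function twice with the same key restores the input.
std::string rc4_apply(const std::string& key, const std::string& input)
{
    Rc4 rc4(key);
    std::string output = input;
    for (char& c : output) {
        c = static_cast<char>(static_cast<unsigned char>(c) ^ rc4.next());
    }
    return output;
}
```

Note that the key is cycled over its actual length here, whereas the question uses sizeof(k), which is always 256 regardless of how long the key read from the file is.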