How to find a string in a binary file? - c++

I want to find a specific string "fileSize" in a binary file.
The purpose of finding that string is to get 4 bytes that next to the string because that 4 bytes contains the size of data that I want to read it.
The content of the binary file like the following:
The same string in another position:
Another position:
The following is the function that writes the data to a file:
void W_Data(char *readableFile, char *writableFile) {
ifstream RFile(readableFile, ios::binary);
ofstream WFile(writableFile, ios::binary | ios::app);
RFile.seekg(0, ios::end);
unsigned long size = (unsigned long)RFile.tellg();
RFile.seekg(0, ios::beg);
unsigned int bufferSize = 1024;
char *contentsBuffer = new char[bufferSize];
WFile.write("fileSize:", 9);
WFile.write((char*)&size, sizeof(unsigned long));
while (!RFile.eof()) {
RFile.read(contentsBuffer, bufferSize);
WFile.write(contentsBuffer, bufferSize);
}
RFile.close();
WFile.close();
delete contentsBuffer;
contentsBuffer = NULL;
}
Also, the function that searches for the string:
void R_Data(char *readableFile) {
ifstream RFile(readableFile, ios::binary);
const unsigned int bufferSize = 9;
char fileSize[bufferSize];
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "fileSize:") == 0) {
cout << "Exists" << endl;
}
}
RFile.close();
}
How to find a specific string in a binary file?

I think of using find() is an easy way to search for patterns.
void R_Data(const std::string filename, const std::string pattern) {
std::ifstream(filename, std::ios::binary);
char buffer[1024];
while (file.read(buffer, 1024)) {
std::string temp(buffer, 1024);
std::size_t pos = 0, old = 0;
while (pos != std::string::npos) {
pos = temp.find(pattern, old);
old = pos + pattern.length();
if ( pos != std::string::npos )
std::cout << "Exists" << std::endl;
}
file.seekg(pattern.length()-1, std::ios::cur);
}
}

How to find a specific string in a binary file?
If you don't know the location of the string in the file, I suggest the following:
Find the size of the file.
Allocate memory for being able to read everything in the file.
Read everything from the file to the memory allocated.
Iterate over the contents of the file and use std::strcmp/std::strncmp to find the string.
Deallocate the memory once you are done using it.
There are couple of problems with using
const unsigned int bufferSize = 9;
char fileSize[bufferSize];
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "filesize:") == 0) {
cout << "Exists" << endl;
}
}
Problem 1
The strcmp line will lead to undefined behavior when fileSize actually contains the string "fileSize:" since the variable has enough space only for 9 character. It needs an additional element to hold the terminating null character. You could use
const unsigned int bufferSize = 9;
char fileSize[bufferSize+1] = {0};
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "filesize:") == 0) {
cout << "Exists" << endl;
}
}
to take care of that problem.
Problem 2
You are reading the contents of the file in blocks of 9.
First call to RFile.read reads the first block of 9 characters.
Second call to RFile.read reads the second block of 9 characters.
Third call to RFile.read reads the third block of 9 characters. etc.
Hence, unless the string "fileSize:" is at the boundary of one such blocks, the test
if (strcmp(fileSize, "filesize:") == 0)
will never pass.

Related

Handling extra bytes in huffman compression/decompression

I have a program that produces a Huffman tree based on ASCII character frequency read in a text input file. The Huffman codes are stored in a string array of 256 elements, empty string if the character is not read. This program also then encodes and compresses an output file and currently has some functionality in decompression and decoding.
In summary, my program takes a input file compresses and encodes an output file, closes the output file and opens the encoding as an input file, and takes a new output file that is supposed to have a decoded message identical to the original text input file.
My problem is that in my test run while compressing I notice that I have 3 extra bytes and in turn when I decompress and decode my encoded file, these 3 extra bytes are being decoded to my output file. Depending on the amount of text in the original input file, my other tests output these extra bytes.
My research has let me to a few suggestions such as making the first 8 bytes of your encoded output file the 64 bits of an unsigned long long that give the number of bytes in the file or using a psuedo-EOF but I am stuck on how I would go about handling it and which of the two is a smart way to handle it given the code I have already written or if either is a smart way at all?
Any guidance or solution to this problem is appreciated.
(For encodedOutput function, fileName is the input file parameter, fileName2 is the output file parameter)
(For decodeOutput function, fileName2 is the input file parameter, fileName 3 is output file parameter)
code[256] is a parameter for both of these functions and holds the Huffman code for each unique character read in the original input file, for example, the character 'H' being read in the input file may have a code of "111" stored in the code array for code[72] at the time it is being passed to the functions.
freq[256] holds the frequency of each ascii character read or holds 0 if it is not in original input file.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
ifstream ifile; //to read file
ifile.open(fileName, ios::binary);
if (!ifile)//to check if file is open or not
{
die("Can't read again"); // function that exits program if can't open
}
ofstream ofile;
ofile.open(fileName2, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
int read;
read = ifile.get(); //read one char from file and store it in int
char buffer = 0, bit_count = 0;
while (read != -1) {//run this loop until reached to end of file(-1)
for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
buffer <<= 1;
buffer |= code[read][b] != '0';
bit_count++;
if (bit_count == 8) {
ofile << buffer;
buffer = 0;
bit_count = 0;
}
}
read = ifile.get();
}
if (bit_count != 0)
ofile << (buffer << (8 - bit_count));
ifile.close();
ofile.close();
}
void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
ifstream ifile;
ifile.open(fileName2, ios::binary);
if (!ifile)
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName3, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
priority_queue < node > q;
for (unsigned i = 0; i < 256; i++) {
if (freq[i] == 0) {
code[i] = "";
}
}
for (unsigned i = 0; i < 256; i++)
if (freq[i])
q.push(node(unsigned(i), freq[i]));
if (q.size() < 1) {
die("no data");
}
while (q.size() > 1) {
node *child0 = new node(q.top());
q.pop();
node *child1 = new node(q.top());
q.pop();
q.push(node(child0, child1));
} // created the tree
string answer = "";
const node * temp = &q.top(); // root
for (int c; (c = ifile.get()) != EOF;) {
for (unsigned p = 8; p--;) { //reading 8 bits at a time
if ((c >> p & 1) == '0') { // if bit is a 0
temp = temp->child0; // go left
}
else { // if bit is a 1
temp = temp->child1; // go right
}
if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
{
answer += temp->value;
temp = &q.top();
}
}
}
ofile << ans;
}
Because of integral promotion rules, (buffer << (8 - bit_count)) will be an integer expression, causing 4 bytes to be written. To only write one byte, you need to cast this to a char.
ofile << char(buffer << (8 - bit_count));

Read binary file and count specific number c++

Hey everyone I've been looking everywhere for insight on how to do this particular assignment. I saw something similar but it didn't have a clear explanation. I'm trying to read a bin file and count the number of times a specific number appears. I saw examples of this using a .txt file and it seemed very straight forward using getline. I tried to replicate the similar structure but using a binary file.
int main() {
int searching = 3;
int counter = 0;
unsigned char * memblock;
long long int size;
//open bin file
ifstream file;
file.open("threesData.bin", ios:: in | ios::binary | ios::ate);
//read bin file
if (file.is_open()) {
cout << "it opened\n";
size = file.tellg();
memblock = new unsigned char[size];
file.seekg(0, ios::beg);
file.read((char * ) memblock, size);
while (file.read((char * ) memblock, size)) {
for (int i = 0; i < size; i++) {
(int) memblock[i];
if (memblock[i] == searching) {
counter++;
}
}
}
}
file.close();
cout << "The number " << searching << " appears ";
cout << counter << " times!";
return 0;
}
When I run the program it's clear that it opens but it doesn't count the number I'm searching for. What am I doing wrong?
You seem to be thinking this through but here's how I would go about doing it.
Initialize a buffer with a sensible size.
Cast it to integers, so you can do array[size_t] syntax for simpler arithmetic.
Open the stream, and read while the stream is valid.
Convert the number of read bytes to the number of ints you would expect.
Increment the counter for each character you find that is valid.
Code
#include <fstream>
#include <iostream>
bool check_character(int value)
{
return value == 3;
}
int main(void)
{
// choose the size, cast a pointer as an int type, and initialize
// our counter
static constexpr size_t size = 4096;
char* buffer = new char[size];
int* ints = (int*) buffer;
size_t counter = 0;
// create our stream,
std::ifstream stream("file.bin", std::ios_base::binary);
while (stream) {
// keep reading while the stream is valid
stream.read(buffer, size);
auto count = stream.gcount();
// we only want to go to the last valid integer
// if we expect the file to be only integers,
// we could do `assert(count % sizeof(int) == 0);
// otherwise, we may have trailing characters
// if we have trailing characters, we may want to move them
// to the front of the buffer....
auto chars = count / sizeof(int); // floor division
for (size_t i = 0; i < chars; ++i) {
// false == 0, true == 1, so we can just add
// if the value is 3
counter += check_character(ints[i]);
}
}
std::cout << "Counter is: " << counter << std::endl;
delete[] buffer;
return 0;
}
As NeilButterworth points out, you could also use a vector. I don't really like this, but "meh".
#include <fstream>
#include <iostream>
#include <vector>
/* ellipsed lines */
int main(void)
{
/* ellipsed lines */
static constexpr size_t size = 4096;
std::vector<int> ints;
ints.resize(size / sizeof(int));
char* buffer = (char*) ints.data();
/* ellipsed lines */
/* ellipsed lines */
std::cout << "Counter is: " << counter << std::endl;
// no delete[]
return 0;
}

Writing and reading a file

I'm trying to use ifstream/ofstream to read/write but for some reason, the data gets corrupted along the way. Heres the read/write methods and the test:
void FileWrite(const char* FilePath, std::vector<char> &data) {
std::ofstream os (FilePath);
int len = data.size();
os.write(reinterpret_cast<char*>(&len), 4);
os.write(&(data[0]), len);
os.close();
}
std::vector<char> FileRead(const char* FilePath) {
std::ifstream is(FilePath);
int len;
is.read(reinterpret_cast<char*>(&len), 4);
std::vector<char> ret(len);
is.read(&(ret[0]), len);
is.close();
return ret;
}
void test() {
std::vector<char> sample(1024 * 1024);
for (int i = 0; i < 1024 * 1024; i++) {
sample[i] = rand() % 256;
}
FileWrite("C:\\test\\sample", sample);
auto sample2 = FileRead("C:\\test\\sample");
int err = 0;
for (int i = 0; i < sample.size(); i++) {
if (sample[i] != sample2[i])
err++;
}
std::cout << err << "\n";
int a;
std::cin >> a;
}
It writes the length correctly, reads it correctly and starts reading the data correctly but at some point(depending on input, usually at around the 1000'th byte) it goes wrong and everything to follow is wrong. Why is that?
for starter, you should open the file stream for binary read and write :
std::ofstream os (FilePath,std::ios::binary);
(edit: assuming char really means "signed char")
Do notice that regular char can hold up to CHAR_MAX/2 value, which is 127.
If the random number is bigger - the result will wrap around, resulting negative value. the stream will try to write this character as a text character, which is invalid value to write. binary format should at least fix this problem.
Also, you shouldn't close the stream yourself here, the destructor does it for you.
Two more simple points:
1) &(data[0]) should be just &data[0], the () are redundant
2) try keep the same convention. you write upper-camel-case for FilePath variable, but lower-camel-case for all the other variables.

c++ writing and reading objects to binary files

I'm trying to read an array object (Array is a class I've made using read and write functions to read and write from binary files. So far the write functions works but it won't read from the file properly for some reason. This is the write function :
void writeToBinFile(const char* path) const
{
ofstream ofs(path, ios_base::out | ios_base::app | ios_base::binary);
if (ofs.is_open())
{
ostringstream oss;
for (unsigned int i = 0; i < m_size; i++)
{
oss << ' ';
oss << m_data[i];
}
ofs.write(oss.str().c_str(), oss.str().size());
}
}
This is the read function :
void readFromBinFile(const char* path)
{
ifstream ifs(path, ios_base::in | ios_base::binary || ios_base::ate);
if (ifs.is_open())
{
stringstream ss;
int charCount = 0, spaceCount = 0;
ifs.unget();
while (spaceCount != m_size)
{
charCount++;
if (ifs.peek() == ' ')
{
spaceCount++;
}
ifs.unget();
}
ifs.get();
char* ch = new char[sizeof(char) * charCount];
ifs.read(ch, sizeof(char) * charCount);
ss << ch;
delete[] ch;
for (unsigned int i = 0; i < m_size; i++)
{
ss >> m_data[i];
m_elementCount++;
}
}
}
those are the class fields :
T* m_data;
unsigned int m_size;
unsigned int m_elementCount;
I'm using the following code to write and then read (1 execution for reading another for writing):
Array<int> arr3(5);
//arr3[0] = 38;
//arr3[1] = 22;
//arr3[2] = 55;
//arr3[3] = 7;
//arr3[4] = 94;
//arr3.writeToBinFile("binfile.bin");
arr3.readFromBinFile("binfile.bin");
for (unsigned int i = 0; i < arr3.elementCount(); i++)
{
cout << "arr3[" << i << "] = " << arr3[i] << endl;
}
The problem is now at the readFromBinFile function, it get stuck in an infinite loop and peek() returns -1 for some reason and I can't figure why.
Also note I'm writing to the binary file using spaces to make a barrier between each element so I would know to differentiate between objects in the array and also a space at the start of the writing to make a barrier between previous stored binary data in the file to the array binary data.
The major problem, in my mind, is that you write fixed-size binary data in variable-size textual form. It could be so much simpler if you just stick to pure binary form.
Instead of writing to a string stream and then writing that output to the actual file, just write the binary data directly to the file:
ofs.write(reinterpret_cast<char*>(m_data), sizeof(m_data[0]) * m_size);
Then do something similar when reading the data.
For this to work, you of course need to save the number of entries in the array/vector first before writing the actual data.
So the actual write function could be as simple as
void writeToBinFile(const char* path) const
{
ofstream ofs(path, ios_base::out | ios_base::binary);
if (ofs)
{
ofs.write(reinterpret_cast<const char*>(&m_size), sizeof(m_size));
ofs.write(reinterpret_cast<const char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
}
}
And the read function
void readFromBinFile(const char* path)
{
ifstream ifs(path, ios_base::in | ios_base::binary);
if (ifs)
{
// Read the size
ifs.read(reinterpret_cast<char*>(&m_size), sizeof(m_size));
// Read all the data
ifs.read(reinterpret_cast<char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
}
}
Depending on how you define m_data you might need to allocate memory for it before reading the actual data.
Oh, and if you want to append data at the end of the array (but why would you, in the current code you show, you rewrite the whole array anyway) you write the size at the beginning, seek to the end, and then write the new data.

Concatenate multiple chars

I m reading a file and I would like to extract all of its contents and store them into a single char in C++. I know it can be done with strings however I cannot use strings and need to resort to char instead. I can I concatenate multiple chars to one char variable?
Here is what I've tried so far:
string str = "";
ifstream file("c:/path.....");
while (file.good())
{
str += file.get();
}
const char* content = str.c_str();
printf("%c", *content);
but this just gave me the first letter of the file and that's it.
If also tried:
ifstream file("c:/path.....");
char c = ' ';
char result[100];
while (file.good())
{
c= file.get();
strcat(result,c);
}
but this gave me runtime errors all the time.
For the second try you gave in your question (which I guess from your other hints, is what you finally want), you can try the following as a quick fix:
ifstream file("c:/path.....");
char c[2] = { 0, 0 };
char result[100] = { 0 };
for (int i = 0; file && (i < 99); ++i)
{
c[0] = file.get();
strcat(result,c);
}
Since using strcat() might not be very efficient for this use case, I think a better implementation would directly write to the result buffer:
ifstream file("c:/path.....");
char result[100] = { 0 };
for (int i = 0; file && (i < 99); ++i)
{
result[i] = file.get();
}
In your first code block:
const char* content = str.c_str();
printf("%c", *content);
prints only the first character of the string because *contents dereferences the
pointer to the (first character of) the string, and %c is the printf-format for
a single character. You should replace that by
printf("%s", content);
to print the entire string. Or just use
std::cout << str;
Maybe something like that :
char result[100];
int i = 0;
while (file.good())
{
c= file.get();
if(i<100)
result[i++]=c;
}
result[i]='\0';
There a lot of things to improve in this solution (what would you do if there more then 99 chars in your file, file.good() not the best option for loop condition, and so on...). Also it is much better to use strings.I don't know exactly why you can't use them, but just in case you change your mind you can read your file like that :
std::string line;
while ( getline(stream, line)) {
process(line);
}
You can create std::strings of your characters and concatenate them:
char c1 = 'a';
char c2 = 'b';
std::string concatted = std::string(1, c1) + std::string(1, c2);
This works because of the fill std::string constructor, see the reference.
Obviously, you are new to the c++ or at least you're using some strange terminology.
First of all, I advice you to read some c++ literature for beginners (you can find list of it on stackoverflow). Then you will understand all the conceptions of strings in c++.
Second, use this code to read file content and store it in a char*.
FILE *f = fopen("path to file", "r");
fseek(f, 0, SEEK_END);
long fsize = ftell(f);
fseek(f, 0, SEEK_SET);
char *content = new char[fsize + 1];
fread(content, fsize, 1, f);
fclose(f);
content[fsize] = 0;
cout << "{" << content <<"}";
delete content;