Modifying binary files - C++

I'm trying to write a small program which will search a binary file for a few bytes and replace them with another bunch of bytes. But every time I run this small app, I get a message that the istream_iterator is not dereferenceable.
Maybe someone has a suggestion for how to do this another way (iterators are a fairly new subject for me).
#include <fstream>
#include <iterator>
#include <algorithm>
using namespace std;

int main() {
    typedef istream_iterator<char> input_iter_t;

    const off_t SIZE = 4;
    char before[SIZE] = { 0x12, 0x34, 0x56, 0x78 };
    char after[SIZE]  = { 0x78, 0x12, 0x34, 0x65 };

    fstream filestream("numbers.exe", ios::binary | ios::in | ios::out);
    if (search(input_iter_t(filestream), input_iter_t(),
               before, before + SIZE) != input_iter_t()) {
        filestream.seekp(-SIZE, ios::cur);
        filestream.write(after, SIZE);
    }
    return 0;
}
This is my second attempt, but something is wrong here too. With small files it seems to work OK, but with bigger ones (around 2 MB) it is very slow and never finds the pattern I'm looking for.
#include <iostream>
#include <cstdlib>
#include <string>
#include <fstream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <windows.h>
using namespace std;

int main() {
    const off_t Size = 4;
    unsigned char before[Size] = { 0x12, 0x34, 0x56, 0x78 };
    unsigned char after[Size]  = { 0x90, 0xAB, 0xCD, 0xEF };

    vector<char> bytes;
    {
        ifstream iFilestream("numbers.exe", ios::in | ios::binary);
        istream_iterator<char> begin(iFilestream), end;
        bytes.assign(begin, end);
    }

    vector<char>::iterator found = search(bytes.begin(), bytes.end(), before, before + Size);
    if (found != bytes.end())
    {
        copy(after, after + Size, found);
        {
            ofstream oFilestream("number-modified.exe");
            copy(bytes.begin(), bytes.end(), ostream_iterator<unsigned char>(oFilestream));
        }
    }
    return 0;
}
Cheers,
Thomas

Read a larger part of the file into memory, replace the bytes in memory, and then write that chunk back to disk. Reading one byte at a time is very slow.
I also suggest you read about mmap (or MapViewOfFile in win32).
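For illustration, a rough (untested) sketch of the MapViewOfFile route for this exact task might look like the following; error handling is kept minimal, and the file name and byte values are taken from the question:
#include <windows.h>
#include <algorithm>

int main() {
    const char before[] = { 0x12, 0x34, 0x56, 0x78 };
    const char after[]  = { 0x78, 0x12, 0x34, 0x65 };

    HANDLE file = CreateFileA("numbers.exe", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;
    DWORD size = GetFileSize(file, NULL);

    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE, 0, 0, NULL);
    if (mapping) {
        // The whole file becomes addressable memory; search and patch in place.
        char *data = static_cast<char*>(MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0));
        if (data) {
            char *hit = std::search(data, data + size, before, before + 4);
            if (hit != data + size)
                std::copy(after, after + 4, hit);
            UnmapViewOfFile(data);
        }
        CloseHandle(mapping);
    }
    CloseHandle(file);
    return 0;
}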

search won't work on an istream_iterator because of the nature of the iterator. It's an input iterator, which means it can only move forward - it reads from the stream, and once something has been read from the stream, it can't go back. search requires a forward iterator, which is an input iterator that also lets you stop, make a copy, and advance one copy while keeping the other. An example of a forward iterator is an iterator into a singly-linked list: you can't go backwards, but you can remember where you are and restart from there.
The speed issue is because vector handles data of unknown length poorly. Every time it runs out of room, it copies the whole buffer over to newly allocated memory. Replace it with a deque, which copes better with data arriving one element at a time. You will also likely get better performance by reading from the stream in blocks, as character-by-character access is a pretty bad way to load an entire file into memory.
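To make the block idea concrete, here is a minimal sketch, assuming the file fits in memory: size the container from the stream length up front and read everything in one call (file name and pattern taken from the question):
#include <fstream>
#include <vector>
#include <algorithm>
#include <cstddef>

int main() {
    std::ifstream in("numbers.exe", std::ios::binary);
    in.seekg(0, std::ios::end);
    std::vector<char> bytes(static_cast<std::size_t>(in.tellg()));
    in.seekg(0, std::ios::beg);
    in.read(bytes.data(), static_cast<std::streamsize>(bytes.size())); // one block read

    const char before[] = { 0x12, 0x34, 0x56, 0x78 };
    std::vector<char>::iterator found =
        std::search(bytes.begin(), bytes.end(), before, before + 4);
    // ... patch at found and write the buffer back out, as in the question ...
    return found != bytes.end() ? 0 : 1;
}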

Assuming the file isn't too large, just read the file into memory, then modify the memory buffer as you see fit, then write it back out to a file.
E.g. (untested):
FILE *f_in = fopen("inputfile", "rb");
fseek(f_in, 0, SEEK_END);
long size = ftell(f_in);
rewind(f_in);

char *p_buffer = (char*) malloc(size);
fread(p_buffer, size, 1, f_in);
fclose(f_in);

unsigned char *p = (unsigned char*)p_buffer;
// process.

FILE *f_out = fopen("outputfile", "wb");
fwrite(p_buffer, size, 1, f_out);
fclose(f_out);
free(p_buffer);
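For the original question, the "// process." step could be a search-and-replace over the buffer, e.g. (again untested, plugging into the snippet above; the byte values come from the question):
#include <algorithm>

const char before[] = { 0x12, 0x34, 0x56, 0x78 };
const char after[]  = { 0x78, 0x12, 0x34, 0x65 };

char *hit = std::search(p_buffer, p_buffer + size, before, before + 4);
if (hit != p_buffer + size)
    std::copy(after, after + 4, hit); // patch the first occurrence in place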

Related

Reading multiple bytes from file and storing them for comparison in C++

I want to read a photo in binary in 1460-byte increments and compare consecutive packets to detect corrupted transmission. I have a Python script that I wrote and want to translate to C++, but I'm not sure that what I intend to use is correct.
for i in range(0, fileSize-1):
    buff = f.read(1460)  # buff stores a packet of 1460 bytes where f is the opened file
    secondPacket = ''
    for j in buff:
        secondPacket += "{:02x}".format(j)
    if secondPacket == firstPacket:
        print(f'Packet {i+1} identical with {i}')
    firstPacket = secondPacket
I have found int fseek ( FILE * stream, long int offset, int origin ); but it's unclear whether it reads the first byte located offset away from origin or everything in between.
Thanks for any clarification.
#include <iostream>
#include <fstream>
#include <array>
#include <cstring> // for memcpy

std::array<char, 1460> firstPacket;
std::array<char, 1460> secondPacket;
int i = 0;

int main() {
    std::ifstream file;
    file.open("photo.jpg", std::ios::binary);

    while (file.read(firstPacket.data(), firstPacket.size())) {
        ++i;
        if (firstPacket == secondPacket)
            std::cout << "Packet " << i << " is a copy of packet " << i - 1 << std::endl;
        memcpy(&secondPacket, &firstPacket, firstPacket.size());
    }
    std::cout << i; // tested to check if i iterates correctly
    return 0;
}
This is the code I have so far, which doesn't work.
fseek doesn't read; it just moves the position where the next read operation will begin. If you read the file from start to end, you don't need it.
To read binary data you want the aptly named std::istream::read. You can use it like this with a fixed-size buffer:
// char is one byte, could also be uint8_t, but then you would need a cast later on
std::array<char, 1460> bytes;
while (myInputStream.read(bytes.data(), bytes.size())) {
    // do work with the read data
}
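Applied to the question, a minimal sketch of the whole comparison loop might look like this (untested; it mirrors the Python logic and, like the question's code, ignores a final partial packet):
#include <array>
#include <fstream>
#include <iostream>

int main() {
    std::ifstream file("photo.jpg", std::ios::binary);
    std::array<char, 1460> current{}, previous{};
    int i = 0;
    while (file.read(current.data(), current.size())) {
        ++i;
        if (i > 1 && current == previous)
            std::cout << "Packet " << i << " is a copy of packet " << i - 1 << '\n';
        previous = current; // plain assignment replaces the memcpy
    }
    return 0;
}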

C++ reading large files part by part

I've been having a problem that I have not been able to solve yet. It is related to reading files; I've looked at threads, even on this website, and they do not seem to solve it. The problem is reading files that are larger than a computer's system memory. When I asked about this a while ago, I was referred to the following code.
string data("");
getline(cin, data);

std::ifstream is(data); //, std::ifstream::binary);
if (is)
{
    // get length of file:
    is.seekg(0, is.end);
    int length = is.tellg();
    is.seekg(0, is.beg);

    // allocate memory:
    char *buffer = new char[length];

    // read data as a block:
    is.read(buffer, length);
    is.close();

    // print content:
    std::cout.write(buffer, length);
    delete[] buffer;
}
system("pause");
This code works well, apart from the fact that it eats memory like a fat kid in a candy store.
So after a lot of ghetto and unrefined programming, I was able to figure out a way to sort of fix the problem. However, I more or less traded one problem for another in my quest.
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <stdio.h>
#include <stdlib.h>
#include <iomanip>
#include <windows.h>
#include <cstdlib>
#include <thread>
using namespace std;

/*======================================================*/
string *fileName = new string("tldr");
char data[36];
int filePos(0); // The pos of the file
int tmSize(0);  // The total size of the file
int split(32);
char buff;
int DNum(0);
/*======================================================*/

int getFileSize(std::string filename) // path to file
{
    FILE *p_file = NULL;
    p_file = fopen(filename.c_str(), "rb");
    fseek(p_file, 0, SEEK_END);
    int size = ftell(p_file);
    fclose(p_file);
    return size;
}

void fs()
{
    tmSize = getFileSize(*fileName);
    int AX(0);
    ifstream fileIn;
    fileIn.open(*fileName, ios::in | ios::binary);
    int n1, n2, n3;
    n1 = tmSize / 32;

    // Does the processing
    while (filePos != tmSize)
    {
        fileIn.seekg(filePos, ios_base::beg);
        buff = fileIn.get();

        // To take into account small files
        if (tmSize < 32)
        {
            int Count(0);
            char MT[40];
            if (Count != tmSize)
            {
                MT[Count] = buff;
                cout << MT[Count]; // << endl;
                Count++;
            }
        }
        // Anything larger than 32
        else
        {
            if (AX != split)
            {
                data[AX] = buff;
                AX++;
                if (AX == split)
                {
                    AX = 0;
                }
            }
        }
        filePos++;
    }

    int tz(0);
    filePos = filePos - 12;
    while (tz != 2)
    {
        fileIn.seekg(filePos, ios_base::beg);
        buff = fileIn.get();
        data[tz] = buff;
        tz++;
        filePos++;
    }
    fileIn.close();
}

void main()
{
    fs();
    cout << tmSize << endl;
    system("pause");
}
What I tried to do with this code is work around the memory issue. Rather than allocating memory for a large file that simply does not exist on my system, I tried to use the memory I had, which is about 8 GB, and I only wanted to use maybe a few kilobytes of it if at all possible.
To give you a layout of what I am talking about I am going to write a line of text.
"Hello my name is cake please give me cake"
Basically what I did was read that piece of text letter by letter. Then I put those letters into a box that could store 32 of them; from there I could apply something like XOR and then write them to another file.
The idea in a way works but it is horribly slow and leaves off parts of files.
So basically, how can I make something like this work without it being slow or cutting off parts of files? I would love to see how XOR works with very large files.
So if anyone has a better idea than what I have, I would be very grateful for the help.
To read and process the file piece-by-piece, you can use the following snippet:
// Buffer size 1 megabyte (or any number you like)
size_t buffer_size = 1 << 20;
char *buffer = new char[buffer_size];

// open in binary so no newline translation occurs
std::ifstream fin("input.dat", std::ios::binary);
while (fin)
{
    // Try to read the next chunk of data
    fin.read(buffer, buffer_size);

    // Get the number of bytes actually read
    size_t count = fin.gcount();

    // If nothing has been read, break
    if (!count)
        break;

    // Do whatever you need with the first count bytes in the buffer
    // ...
}
delete[] buffer;
The buffer size of 32 bytes that you are using is definitely too small. You make too many calls to library functions (and the library, in turn, sometimes makes calls to the OS, which are typically slow because they cause context switches). There is also no need for tell/seek.
If you don't need all the file content simultaneously, reduce the working set first - say, a window of about 32 words; and since XOR can be applied sequentially, you can simply use a working set of constant size, such as 4 kilobytes.
Now you have the option of calling is.read() in a loop and processing a small block of data each iteration, or using mmap() to map the file content to a memory pointer on which you can perform both read and write operations.
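As a concrete illustration of the fixed-size working set, here is a sketch that XORs a large file chunk by chunk without ever holding the whole file in memory (the file names and the one-byte key are made up for the example):
#include <fstream>
#include <vector>

int main() {
    std::ifstream in("input.dat", std::ios::binary);
    std::ofstream out("output.dat", std::ios::binary);
    std::vector<char> buf(4096); // 4 KB working set
    const char key = 0x5A;       // hypothetical XOR key

    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize n = in.gcount(); // bytes actually read this iteration
        if (n == 0)
            break;
        for (std::streamsize j = 0; j < n; ++j)
            buf[j] ^= key; // XOR is applied byte by byte, so chunking is safe
        out.write(buf.data(), n);
    }
    return 0;
}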

Buffer size for reading a UTF-8-encoded file using ICU (ICU4C)

I am trying to read a UTF-8-encoded file using ICU4C on Windows with msvc11. I need to determine the size of the buffer to build a UnicodeString. Since there is no fseek-like function in the ICU4C API, I thought I could use the underlying C file:
#include <unicode/ustdio.h>
#include <stdio.h>
/*...*/
UFILE *in = u_fopen("utfICUfseek.txt", "r", NULL, "UTF-8");
FILE* inFile = u_fgetfile(in);
fseek(inFile, 0, SEEK_END); /* Access violation here */
int size = ftell(inFile);
auto uChArr = new UChar[size];
There are two problems with this code:
It "throws" access violation at the fseek() line for some reason (Unhandled exception at 0x000007FC5451AB00 (ntdll.dll) in test.exe: 0xC0000005: Access violation writing location 0x0000000000000024.)
The size returned by the ftell function will not be the size I want because UTF-8 can use up to 4 bytes for a code point (a u8"tю" string will be of length 3).
So the questions are:
How do I determine a buffer size for a UnicodeString if I know that the input file is UTF-8-encoded?
Is there a portable way to use iostream/fstream for both reading and writing ICU's UnicodeStrings?
Edit:
Here is a possible solution (tested on msvc11 and gcc 4.8.1) based on the first answer and the C++11 standard. A few things from ISO/IEC 14882:2011:
"The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form..."
"The basic source character set consists of 96 characters...", - 7 bits needed already
"The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set..."
"Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set."
So, to make this portable to platforms where the implementation-defined size of char is 1 byte = 8 bits (I don't know of one where this isn't true), we can read the UTF-8 code units into chars using an unformatted input operation:
std::ifstream is;
is.open("utfICUfSeek.txt");

is.seekg(0, is.end);
int strSize = is.tellg();
auto inputCStr = new char[strSize + 1];
inputCStr[strSize] = '\0'; // add null-character at the end

is.seekg(0, is.beg);
is.read(inputCStr, strSize);
is.seekg(0, is.beg);

UnicodeString uStr = UnicodeString::fromUTF8(inputCStr);

delete[] inputCStr; // the temporary buffer is no longer needed
is.close();
What troubles me is that I have to create an additional buffer for chars and only then convert them to the required UnicodeString.
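One way to avoid managing the raw buffer yourself - a sketch, assuming a reasonably recent ICU where UnicodeString::fromUTF8 takes a StringPiece that is constructible from std::string - is to read into a std::string first:
#include <fstream>
#include <iterator>
#include <string>
#include <unicode/unistr.h>

icu::UnicodeString readUtf8File(const char *path)
{
    std::ifstream is(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(is)),
                      std::istreambuf_iterator<char>());
    return icu::UnicodeString::fromUTF8(bytes); // converts straight from UTF-8
}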
This is an alternative to using ICU.
Using the standard std::fstream you can read the whole file, or part of it, into a standard std::string and then iterate over it with a Unicode-aware iterator. http://code.google.com/p/utf-iter/
std::string get_file_contents(const char *filename)
{
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    if (in)
    {
        std::string contents;
        in.seekg(0, std::ios::end);
        contents.reserve(in.tellg());
        in.seekg(0, std::ios::beg);
        contents.assign((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
        in.close();
        return contents;
    }
    throw(errno);
}
Then in your code
std::string myString = get_file_contents("foobar");
unicode::iterator<std::string, unicode::utf8 /* or utf16/32 */> iter = myString.begin();
while (iter != myString.end())
{
    ...
    ++iter;
}
Well, either you want to read in the whole file at once for some kind of postprocessing, in which case icu::UnicodeString is not really the best container...
#include <iostream>
#include <fstream>
#include <sstream>

int main()
{
    std::ifstream in("utfICUfSeek.txt");
    std::stringstream buffer;
    buffer << in.rdbuf();
    in.close();
    // ...
    return 0;
}
...or what you really want is to read into icu::UnicodeString just like into any other string object but went the long way around...
#include <iostream>
#include <fstream>
#include <unicode/unistr.h>
#include <unicode/ustream.h>

int main()
{
    std::ifstream in("utfICUfSeek.txt");
    icu::UnicodeString uStr;
    in >> uStr;
    // ...
    in.close();
    return 0;
}
...or I am completely missing what your problem really is about. ;)

Read file to memory, loop through data, then write file [duplicate]

This question already has answers here:
How to read line by line after i read a text into a buffer?
(4 answers)
Closed 10 years ago.
I'm trying to ask a similar question to this post:
C: read binary file to memory, alter buffer, write buffer to file
but the answers didn't help me (I'm new to C++, so I couldn't understand all of them).
How do I have a loop access the data in memory and go through it line by line, so that I can write it to a file in a different format?
This is what I have:
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
using namespace std;
int main()
{
char* buffer;
char linearray[250];
int lineposition;
double filesize;
string linedata;
string a;
//obtain the file
FILE *inputfile;
inputfile = fopen("S050508-v3.txt", "r");
//find the filesize
fseek(inputfile, 0, SEEK_END);
filesize = ftell(inputfile);
rewind(inputfile);
//load the file into memory
buffer = (char*) malloc (sizeof(char)*filesize); //allocate mem
fread (buffer,filesize,1,inputfile); //read the file to the memory
fclose(inputfile);
//Check to see if file is correct in Memory
cout.write(buffer,filesize);
free(buffer);
}
I appreciate any help!
Edit (More info on the data):
My data is different files that vary between 5 and 10 GB. There are about 300 million lines of data. Each line looks like
M359
T359 3520 359
M400
A3592 zng 392
Where the first element is a character, and the remaining items could be numbers or characters. I'm trying to read this into memory since it will be a lot faster to loop through it line by line than to read a line, process it, and then write it. I am compiling on 64-bit Linux. Let me know if I need to clarify further. Again, thank you.
Edit 2
I am using a switch statement to process each line, where the first character of each line determines how to format the rest of it. For example, 'M' means millisecond, and I put the next three numbers into a structure. Each line has a different first character that I need to handle differently.
So pardon the potentially blatantly obvious, but if you want to process this line by line, then...
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    // read lines one at a time
    ifstream inf("S050508-v3.txt");
    string line;
    while (getline(inf, line))
    {
        // ... process line ...
    }
    inf.close();
    return 0;
}
And just fill in the body of the while loop? Maybe I'm not seeing the real problem (a forest for the trees kinda thing).
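Given the "Edit 2" description, the body might dispatch on the first character, roughly like this (a sketch - the 'M' = milliseconds case comes from the question, the rest is made up):
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream inf("S050508-v3.txt");
    std::string line;
    while (std::getline(inf, line))
    {
        if (line.empty())
            continue;
        switch (line[0])
        {
        case 'M': // e.g. "M359": milliseconds
        {
            int ms = std::stoi(line.substr(1));
            // ... store ms in the appropriate structure ...
            break;
        }
        case 'T': // e.g. "T359 3520 359"
            // ... parse the remaining fields ...
            break;
        default:
            break;
        }
    }
    return 0;
}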
EDIT
The OP is on board with using a custom streambuf, which may not necessarily be the most portable thing in the world, but he's more interested in avoiding flipping back and forth between input and output files. With enough RAM, this should do the trick.
#include <iostream>
#include <fstream>
#include <iterator>
#include <memory>
#include <cstdlib>
using namespace std;

struct membuf : public std::streambuf
{
    membuf(size_t len)
        : streambuf()
        , len(len)
        , src(new char[len])
    {
        setg(src.get(), src.get(), src.get() + len);
    }

    // direct buffer access for file load.
    char *get() { return src.get(); }
    size_t size() const { return len; }

private:
    size_t len;
    std::unique_ptr<char[]> src; // array form, so delete[] is used on destruction
};

int main(int argc, char *argv[])
{
    // open file in binary, retrieve length-by-end-seek
    ifstream inf(argv[1], ios::in | ios::binary);
    inf.seekg(0, inf.end);
    size_t len = inf.tellg();
    inf.seekg(0, inf.beg);

    // allocate a stream buffer with an internal block
    // large enough to hold the entire file.
    membuf mb(len + 1);

    // use our membuf buffer for our file read-op.
    inf.read(mb.get(), len);
    mb.get()[len] = 0;

    // use iss for your nefarious purposes
    std::istream iss(&mb);
    std::string s;
    while (iss >> s)
        cout << s << endl;
    return EXIT_SUCCESS;
}
You should look into fgets and sscanf, with which you can pull out matched pieces of data so they are easier to manipulate, assuming that is what you want to do. That might look something like this:
FILE *input = fopen("file.txt", "r");
FILE *output = fopen("out.txt", "w");

const int bufferSize = 64;
char buffer[bufferSize];

while (fgets(buffer, bufferSize, input) != NULL) { // fgets returns NULL at end of file, not EOF
    char data[16];
    sscanf(buffer, "regex", data);
    // manipulate data
    fprintf(output, "%s", data);
}
fclose(output);
fclose(input);
That would be more of the C way to do it; C++ handles things a little more elegantly by using an istream:
http://www.cplusplus.com/reference/istream/istream/
If I had to do this, I'd probably use code something like this:
std::ifstream in("S050508-v3.txt");
std::ostringstream buffer; // must be an output string stream to accept << in.rdbuf()
buffer << in.rdbuf();
std::string data = buffer.str();

if (check_for_good_data(data))
    std::cout << data;
This assumes you really need the entire contents of the input file in memory at once to determine whether it should be copied to output or not. If (for example) you can look at the data one byte at a time, and determine whether that byte should be copied without looking at the others, you could do something more like:
std::ifstream in(...);
std::copy_if(std::istreambuf_iterator<char>(in),
             std::istreambuf_iterator<char>(),
             std::ostream_iterator<char>(std::cout, ""),
             is_good_char);
...where is_good_char is a function that returns a bool saying whether that char should be included in the output or not.
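For instance, is_good_char could be as simple as this made-up predicate:
bool is_good_char(char c)
{
    return c != '\r'; // e.g. drop carriage returns, keep everything else
}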
Edit: the size of files you're dealing with mostly rules out the first possibility I've given above. You're also correct that reading and writing large chunks of data will almost certainly improve speed over working on one line at a time.

Converting double to unsigned char?

I'm trying to convert a matrix of doubles into unsigned chars so I can then write them to a .pmg file... But it isn't working.
void writePNG(vector<double>& matrix)
{
    vector<unsigned char> image;
    ofstream myfile;
    myfile.open("newFile.txt", ios::out); // writing to .txt file for now for testing.
    if (!myfile.is_open())
    {
        cout << "Cannot open file";
    }
    for (int i = 0; (i < 512*512); i++)
    {
        image[i] = (unsigned char) matrix[i];
    }
    myfile.close();
}
It won't convert the data. Any ideas?? Thanks :)
bug: You are creating a vector of size 0, and then writing to its non-existent elements.
bug: You never write the data to a file
style: You close the file needlessly. It will be closed when the fstream object goes out of scope
style: You copy the data in a loop. Using vector::vector displays your intent more clearly.
potential bug: You create an output vector of 512x512, regardless of the size of the input vector.
SSCCE: your test case is incomplete.
Try this:
#include <vector>
#include <iostream>
#include <fstream>

void writePNG(const std::vector<double>& matrix)
{
    std::ofstream myfile("newFile.txt", std::ios::out);
    if (!myfile.is_open())
    {
        std::cout << "Cannot open file";
    }
    std::vector<unsigned char> image(matrix.begin(), matrix.end());
    myfile.write(reinterpret_cast<const char*>(&image[0]), image.size());
}

int main() {
    writePNG({
        72, 101, 108.1, 108.2,
        111, 0x2c, 0x20, 0x77,
        0x6f, 0x72, 0x6c,
        100.3, 10.4});
}
You are creating image using the default constructor of vector, which initializes the vector as empty (containing no elements). The subscript notation (image[i]) does not create an element; it only assigns to an already existing one.
You have (at least) two ways to fix it:
declare image using the ctor of vector that allocates the necessary size: vector<unsigned char> image(512*512) -- this will populate the vector with 512*512 elements of default value (0 for unsigned char)
add the elements one-by-one using the push_back method: image.push_back((unsigned char) matrix[i]);
You also will have to write the contents of image to myfile eventually.
Note: it is a good habit to use static_cast<unsigned char>(...) instead of the C-style (unsigned char) ... as the former can find errors that the latter will not flag; this is not an issue in this particular case, though
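Putting those pieces together, a minimal sketch of the push_back variant (names kept from the question, untested):
#include <fstream>
#include <vector>

void writePNG(const std::vector<double>& matrix)
{
    std::ofstream myfile("newFile.txt", std::ios::binary);
    std::vector<unsigned char> image;
    image.reserve(matrix.size());
    for (double d : matrix)
        image.push_back(static_cast<unsigned char>(d)); // creates each element
    myfile.write(reinterpret_cast<const char*>(image.data()),
                 static_cast<std::streamsize>(image.size()));
}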
Your image vector has zero size - you would have to at least do a push_back to add an element. Also, the size of a double is not the same as the size of a char, so you are going to lose information.