Reading bytes one by one from binary file - c++

This is my question: I want to open a .jpg file and write each byte as a decimal number (0-255), separated by commas, into a .txt file. It should then be possible to rebuild the .jpg file from that .txt file. This is how I tried to do it:
#include<iostream>
#include<fstream>
using namespace std;
int main()
{
long x;
char *s;
ifstream ifs("image.jpg",ios::binary);
ifs.seekg(0,ios::end);
x=ifs.tellg();
ifs.seekg(0,ios::beg);
s=new char[x];
ifs.read(s,x);
ifs.close();
ofstream is("image.txt");
for(int i=0;i<x;i++){
is<<(unsigned int)s[i]<<",";
}
delete[] s;
return 0;
}
Now this program creates image.txt with decimal numbers as follows:
4294967295,4294967256,4294967295,4294967264,0,16,74,70,73,70,0,1,......
Here some numbers seem to be 4 bytes long. s[i] refers to only one byte, so how can (unsigned int)s[i] return a number larger than 255? Can someone please help me with this? Thanks.

It seems that char is signed on your machine. A signed 8-bit char holds values from -128 to 127, while a byte of the file holds 0 to 255, so every byte above 127 is stored as a negative value in the range -128 to -1. Casting a negative value to unsigned int wraps it around to a very large number: with a 32-bit unsigned int, the byte 0xFF is stored as -1 and prints as 4294967295. The big values in your output are exactly these negative chars.
Use unsigned char as:
unsigned char *s;
Or do this:
is << static_cast<unsigned int>(static_cast<unsigned char>(s[i])) << ",";
// inner cast: char to unsigned char, giving a value in 0..255
// outer cast: unsigned char to unsigned int, so the value prints as a number
That is, cast first char to unsigned char, then to unsigned int.
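To make the difference between the two conversions concrete, here is a minimal sketch. The helper names byte_value and sign_extended_value are made up for this illustration, and signed char is used explicitly so the second function behaves the same on every platform, whatever the signedness of plain char.

```cpp
// Hypothetical helper: the value of one raw byte as a number in 0..255.
// Converting to unsigned char first discards the sign, so the byte 0xFF
// becomes 255 instead of being sign-extended.
unsigned int byte_value(char c) {
    return static_cast<unsigned int>(static_cast<unsigned char>(c));
}

// What the code in the question effectively does when char is signed:
// a negative value wraps around to a huge unsigned number.
unsigned int sign_extended_value(signed char c) {
    return static_cast<unsigned int>(c);
}
```

The first two bytes of a JPEG file are 0xFF and 0xD8, which is why the output in the question starts with two huge numbers; byte_value turns them into 255 and 216.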
Well that is all about the issue you're facing. Now some notes on style and idioms. In C++, you should avoid using new as much as possible. In your case, you can use std::vector as:
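//define file stream object, and open the file
std::ifstream file("image.jpg", std::ios::binary);
//stop operator>> from skipping whitespace bytes (9-13 and 32), which
//istream_iterator would otherwise silently drop from the binary data
file.unsetf(std::ios::skipws);
//prepare iterator pairs to iterate the file content!
std::istream_iterator<unsigned char> begin(file), end;
//reading the file content using the iterator!
std::vector<unsigned char> buffer(begin,end);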
The last line reads all the data from the file into buffer. Now you can print them as:
std::copy(buffer.begin(),
buffer.end(),
std::ostream_iterator<unsigned int>(std::cout, ","));
For all these to work, you need to include the following headers in addition to what you have already added in your code:
#include <vector> //for vector
#include <iterator> //for std::istream_iterator and std::ostream_iterator
#include <algorithm> //for std::copy
As you can see, this idiomatic solution doesn't use pointer and new, neither does it use cast!
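The question also asks for the way back, from the text file to the original bytes. A minimal sketch of both directions follows, assuming the text has exactly the format produced above (decimal values 0..255, each followed by a comma). The names bytes_to_csv and csv_to_bytes are hypothetical, chosen only for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn raw bytes into text such as "255,216,255,".
std::string bytes_to_csv(const std::vector<unsigned char>& bytes) {
    std::ostringstream out;
    for (unsigned char b : bytes) {
        out << static_cast<unsigned int>(b) << ',';
    }
    return out.str();
}

// Hypothetical helper: parse that text back into bytes.
// Assumes well-formed input; parsing stops at the first malformed entry.
std::vector<unsigned char> csv_to_bytes(const std::string& text) {
    std::vector<unsigned char> bytes;
    std::istringstream in(text);
    unsigned int value = 0;
    char comma = 0;
    while (in >> value >> comma) {
        bytes.push_back(static_cast<unsigned char>(value));
    }
    return bytes;
}
```

To rebuild the image, the resulting vector can be written with an ofstream opened in ios::binary mode, using write(reinterpret_cast<const char*>(bytes.data()), bytes.size()).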

Related

How to use std::string to store bytes (unsigned chars) in a right way?

I'm coding LZ77 compression algorithm, and I have trouble storing unsigned chars in a string. To compress any file, I use its binary representation and then read it as chars (because 1 char is equal to 1 byte, afaik) to a std::string. Everything works perfectly fine with chars. But after some time googling I learned that char is not always 1 byte, so I decided to swap it for unsigned char. And here things start to get tricky:
When compressing plain .txt, everything works as expected, I get equal files before and after decompression (I assume it should, because we basically work with text before and after byte conversion)
However, when trying to compress .bmp, decompressed file loses 3 bytes compared to input file (I lose these 3 bytes when trying to save unsigned chars to a std::string)
So, my question is: is there a way to properly save unsigned chars to a string?
I tried to use typedef basic_string<unsigned char> ustring and swap all related functions for their basic alternatives to use with unsigned char, but I still lose 3 bytes.
UPDATE: I found out that the 3 bytes (symbols) are lost not because of std::string, but because of std::istream_iterator, which I use instead of std::istreambuf_iterator to create the string of unsigned chars (because std::istreambuf_iterator's argument is char, not unsigned char).
So, are there any solutions to this particular problem?
Example:
std::vector<char> tempbuf(std::istreambuf_iterator<char>(file), {}); // reads 112782 symbols
std::vector<char> tempbuf(std::istream_iterator<char>(file), {}); // reads 112779 symbols
Sample code:
void LZ77::readFileUnpacked(std::string& path)
{
std::ifstream file(path, std::ios::in | std::ios::binary);
if (file.is_open())
{
// Works just fine with char, but loses 3 bytes with unsigned
std::string tempstring = std::string(std::istreambuf_iterator<char>(file), {});
file.close();
}
else
throw std::ios_base::failure("Failed to open the file");
}
char in all of its forms (and std::byte, which is isomorphic with unsigned char) is always the smallest possible type that a system supports. The C++ standard defines that sizeof(char) and its variations shall always be exactly 1.
"One" what? That's implementation-defined. But every type in the system will be some multiple of sizeof(char) in size.
So you shouldn't be too concerned about systems where char is not one byte. If you're working on a system where CHAR_BIT isn't 8, then that system can't handle 8-bit bytes directly at all. So unsigned char won't be any different or better for this purpose.
As to the particulars of your problem, istream_iterator is fundamentally different from istreambuf_iterator. The purpose of the latter is to allow iterator access to the actual stream as a sequence of characters. The purpose of istream_iterator<T> is to allow access to a stream as if by performing a repeated sequence of operator>> calls with a T value.
So if you're using istream_iterator<char>, you're saying that you want to read the stream as if you did stream >> some_char; for each iterator access. That isn't actually equivalent to accessing the stream's characters directly. Specifically, FormattedInputFunctions like operator>> can do things like skip whitespace, depending on how you set up your stream.
istream_iterator reads using operator>>, which by default skips whitespace as part of its job. If you want to disable that behavior, you'll have to do:
#include <ios>
file >> std::noskipws;
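The effect is easy to reproduce without a file. The sketch below counts how many characters each iterator type yields from an in-memory stream; the function names count_formatted and count_unformatted are invented for this example.

```cpp
#include <cstddef>
#include <ios>
#include <iterator>
#include <sstream>
#include <string>

// Counts the characters an istream_iterator<char> yields from `data`.
// With skip_whitespace == true the stream keeps its default skipws flag.
std::size_t count_formatted(const std::string& data, bool skip_whitespace) {
    std::istringstream in(data);
    if (!skip_whitespace) {
        in >> std::noskipws;
    }
    std::istream_iterator<char> first(in), last;
    return static_cast<std::size_t>(std::distance(first, last));
}

// Counts the characters an istreambuf_iterator<char> yields.
// It bypasses formatted input, so the skipws flag does not matter.
std::size_t count_unformatted(const std::string& data) {
    std::istringstream in(data);
    std::istreambuf_iterator<char> first(in), last;
    return static_cast<std::size_t>(std::distance(first, last));
}
```

For the six-character input "a b\nc\t", the default formatted count is 3, while the other two counts are 6. A binary file that happens to contain three whitespace-valued bytes would lose exactly those three.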

Parsing binary files in C++

I'm trying to read a binary format using C++
For some reason I'm able to parse only the first variable.
The header sequence is:
[2 byte integer][1 byte integer][1 byte integer]
#include <iostream>
#include <fstream>
using namespace std;
struct HDR {
unsigned short int signature;
unsigned char version;
unsigned char tricnt;
} header;
int main(){
ifstream infile("1.mdl",ios::in | ios::binary);
if(!infile){
cout<<"Error\n";
return 1;
}
infile.read(reinterpret_cast<char *>(&header),sizeof(HDR));
cout<<"SIG "<<header.signature<<endl;
cout<<"VER "<<header.version<<endl;
cout<<"TRI "<<header.tricnt<<endl;
return 0;
}
For some reason I'm able to parse only the signature, the rest of the structure is empty.
Unless you have specific knowledge of the padding used by your implementation you should read into the members individually.
infile.read(reinterpret_cast<char *>(&header.signature), sizeof header.signature);
infile.read(reinterpret_cast<char *>(&header.version), sizeof header.version);
infile.read(reinterpret_cast<char *>(&header.tricnt), sizeof header.tricnt);
Of course, you are still relying on unsigned short being 2 bytes on your platform and the representation in the file having the same endianness as your machine but at least you aren't making assumptions about structure padding.
Naturally, when you print an unsigned char, the character it represents is printed. If you want to see the numeric value, cast to a non-char integer type. ASCII 1 (start of heading) and 3 (end of text) are control characters and are not usually visible when printed.
cout<<"VER "<< static_cast<int>(header.version) <<endl;
cout<<"TRI "<< static_cast<int>(header.tricnt) <<endl;
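If you also want to stop depending on the machine's endianness, you can read the raw bytes and assemble the fields by hand. This is only a sketch: it assumes the 2-byte signature is stored little-endian in the file, which the question does not state, and the names Header and read_header are made up for the example.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Mirrors the HDR struct from the question, with fixed-width types.
struct Header {
    std::uint16_t signature;
    std::uint8_t version;
    std::uint8_t tricnt;
};

// Reads the 4-byte header into a byte array and builds each field explicitly,
// so neither struct padding nor host byte order matters.
// Assumption: the signature is little-endian (low byte first) in the file.
// Returns false if fewer than 4 bytes could be read.
bool read_header(std::istream& in, Header& out) {
    unsigned char raw[4];
    if (!in.read(reinterpret_cast<char*>(raw), sizeof raw)) {
        return false;
    }
    out.signature = static_cast<std::uint16_t>(raw[0] | (raw[1] << 8));
    out.version = raw[2];
    out.tricnt = raw[3];
    return true;
}
```

Any std::istream works here, so a std::istringstream holding four bytes can stand in for the ifstream when trying it out. For a big-endian file, swap raw[0] and raw[1] in the signature line.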

C++ and binary files - newbie question

I have the following code and I am trying to write some data to a binary file.
The problem is that I don't have any experience with binary files and I can't understand what I am doing.
#include <iostream>
#include <fstream>
#include <string>
#define RPF 5
using namespace std;
int write_header(int h_len, ofstream& f)
{
int h;
for(h=0;h<h_len;h++)
{
int num = 0;
f.write((char*)&num,sizeof(char));
}
return 0;
}
int new_file(const char* name)
{
ofstream n_file(name,ofstream::binary);
write_header(RPF,n_file);
n_file.close();
return 0;
}
int main(int argc, char **argv)
{
ofstream file("file.dat",ofstream::binary);
file.seekp(10);
file.write("this is a message",3);
new_file("file1.dat");
cin.get();
return 0;
}
1. As you can see, I open file.dat and write the word "thi" into it. Then I open the file and I see the ASCII value of it. Why does this happen?
2. Then I make a new file, file1.dat, and try to write the number 0 into it five times. What am I supposed to use? This:
f.write((char*)&num,sizeof(char));
or this:
f.write((char*)&num,sizeof(int));
And why can't I write the value of the number as is, instead of having to cast it to char*?
Is this because write() works like this, or am I only able to write chars to a binary file?
Can anyone help me understand what's happening?
The write() function takes a pointer to your data buffer and the length in bytes of the data to be streamed to the file. So when you say
file.write("this is a message",3);
you tell the write function to write 3 bytes in the file. And that is "thi".
This
f.write((char*)&num,sizeof(char));
tells the write function to put sizeof(char) bytes in the file. That is 1 byte, so only the first byte of num is written. To write the whole variable you probably want
f.write((char*)&num,sizeof(int));
as num is an int variable.
You are writing the ASCII string "thi" to file.dat. If you opened the file in a hex editor, you would see "74 68 69", which are the hexadecimal values of those characters. But if you open file.dat in an editor that understands ASCII, it will most likely translate those values back to their characters to make them easier to view. Opening the ofstream in ios::binary mode means that data is written to the file as is, without the stream applying any translation (such as newline conversion) beforehand.
The function ofstream::write(const char *data, streamsize len) has two parameters. data is a const char* so that write operates on individual bytes; that is why you have to cast the address of num to a char* first. The second parameter, len, indicates how many bytes, starting from data, will be written to the file. My advice would be to use write(reinterpret_cast<const char*>(&num), sizeof(num)), then declare num with a type big enough to store the data required. If you declare int num, then on a platform with 4-byte ints the loop writes 5 x 4 = 20 zero bytes to the file. If you only want 5 zero bytes, declare char num instead.
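The two variants can be compared without touching the disk by writing into a std::ostringstream, which accepts the same write() calls. This is a minimal sketch; write_zero_bytes and write_raw_int are hypothetical helper names.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes `count` zero bytes, one char per call.
void write_zero_bytes(std::ostream& out, std::size_t count) {
    const char zero = 0;
    for (std::size_t i = 0; i < count; ++i) {
        out.write(&zero, sizeof zero);  // sizeof(char) is always 1
    }
}

// Writes the raw object representation of an int: sizeof(int) bytes
// (commonly 4), in the machine's native byte order.
void write_raw_int(std::ostream& out, int value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}
```

Calling write_zero_bytes(out, 5) produces exactly 5 bytes, whereas five calls to write_raw_int(out, 0) produce 5 * sizeof(int) bytes.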

Equivalent of a python generator in C++ for buffered reads

Guido van Rossum demonstrates the simplicity of Python in this article and makes use of this function for buffered reads of a file of unknown length:
def intsfromfile(f):
    while True:
        a = array.array('i')
        # on Python 3.2+ use a.frombytes(...); fromstring() was removed in 3.9
        a.fromstring(f.read(4000))
        if not a:
            break
        for x in a:
            yield x
I need to do the same thing in C++ for speed reasons! I have many files containing sorted lists of unsigned 64 bit integers that I need to merge. I have found this nice piece of code for merging vectors.
I am stuck on how to make an ifstream for a file of unknown length present itself as a vector which can be happily iterated over until the end of the file is reached. Any suggestions? Am I barking up the correct tree with an istreambuf_iterator?
In order to disguise an ifstream (or really, any input stream) in a form that acts like an iterator, you want to use the istream_iterator or the istreambuf_iterator template class. The former is useful for files where the formatting is of concern. For example, a file full of whitespace-delimited integers can be read into the vector's iterator range constructor as follows:
#include <fstream>
#include <vector>
#include <iterator> // needed for istream_iterator
using namespace std;
int main(int argc, char** argv)
{
ifstream infile("my-file.txt");
// It isn't customary to declare these as standalone variables,
// but see below for why it's necessary when working with
// initializing containers.
istream_iterator<int> infile_begin(infile);
istream_iterator<int> infile_end;
vector<int> my_ints(infile_begin, infile_end);
// You can also do stuff with the istream_iterator objects directly:
// Careful! If you run this program as is, this won't work because we
// used up the input stream already with the vector.
int total = 0;
while (infile_begin != infile_end) {
total += *infile_begin;
++infile_begin;
}
return 0;
}
istreambuf_iterator is used to read through files a single character at a time, disregarding the formatting of the input. That is, it will return you all characters, including spaces, newline characters, and so on. Depending on your application, that may be more appropriate.
Note: Scott Meyers explains in Effective STL why the separate variable declarations for istream_iterator are needed above. Normally, you would do something like this:
ifstream infile("my-file.txt");
vector<int> my_ints(istream_iterator<int>(infile), istream_iterator<int>());
However, C++ actually parses the second line in an incredibly bizarre way. It sees it as the declaration of a function named my_ints that takes two parameters and returns a vector<int>. The first parameter is of type istream_iterator<int> and is named infile (the parentheses are ignored). The second parameter is an unnamed function pointer that takes zero arguments (because of the parentheses) and returns an object of type istream_iterator<int>.
Pretty cool, but also pretty aggravating if you're not watching out for it.
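Besides naming the iterators first, there are two other common ways to avoid this parse, sketched here: wrap the first argument in an extra pair of parentheses, or (from C++11 on) construct the temporaries with braces. The function names read_ints and read_ints_braced are invented for this example.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Reads whitespace-separated ints without naming the iterators first.
std::vector<int> read_ints(std::istream& in) {
    // The extra parentheses around the first argument force it to be an
    // expression, so the line cannot be parsed as a function declaration.
    std::vector<int> values((std::istream_iterator<int>(in)),
                            std::istream_iterator<int>());
    return values;
}

// Same idea with C++11 braces: a braced temporary cannot be mistaken
// for a parameter declaration.
std::vector<int> read_ints_braced(std::istream& in) {
    std::vector<int> values(std::istream_iterator<int>{in},
                            std::istream_iterator<int>{});
    return values;
}
```

Both functions take any std::istream, so they work equally with an ifstream or a std::istringstream.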
EDIT
Here's an example using the istreambuf_iterator to read in a file of 64-bit numbers laid out end-to-end:
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>
#include <iterator>
using namespace std;
int main(int argc, char** argv)
{
// Open in binary mode so no newline translation alters the bytes
ifstream input("my-file.txt", ios::binary);
istreambuf_iterator<char> input_begin(input);
istreambuf_iterator<char> input_end;
// Fill a char vector with input file's contents:
vector<char> char_input(input_begin, input_end);
input.close();
// Convert it to an array of 64-bit integers with a cast
// (unsigned long is only 32 bits on some platforms, so use uint64_t):
uint64_t* converted = reinterpret_cast<uint64_t*>(char_input.data());
size_t num_elements = char_input.size() / sizeof(uint64_t);
// Put that information into a vector:
vector<uint64_t> long_input(converted, converted + num_elements);
return 0;
}
Now, I personally rather dislike this solution (using reinterpret_cast, exposing char_input's array), but I'm not familiar enough with istreambuf_iterator to comfortably use one templatized over 64-bit characters, which would make this much easier.
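Closer in spirit to the Python generator is a small reader object that refills a buffer in chunks and hands out one value per call, so the whole file never has to be in memory. The class below is only a sketch: the name U64Reader and its interface are invented here, and it assumes the values are stored in the machine's native byte order. It copies bytes with std::memcpy rather than casting pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pull-style reader over a binary stream of 64-bit integers.
class U64Reader {
public:
    explicit U64Reader(std::istream& in, std::size_t values_per_chunk = 512)
        : in_(in), raw_(values_per_chunk * sizeof(std::uint64_t)) {
        assert(values_per_chunk > 0);
    }

    // Stores the next value in `out`; returns false once the input is exhausted.
    bool next(std::uint64_t& out) {
        if (pos_ + sizeof out > filled_ && !refill()) {
            return false;
        }
        std::memcpy(&out, raw_.data() + pos_, sizeof out);
        pos_ += sizeof out;
        return true;
    }

private:
    // Reads the next chunk; a trailing partial value at end of file is dropped.
    bool refill() {
        in_.read(raw_.data(), static_cast<std::streamsize>(raw_.size()));
        filled_ = static_cast<std::size_t>(in_.gcount());
        pos_ = 0;
        return filled_ >= sizeof(std::uint64_t);
    }

    std::istream& in_;
    std::vector<char> raw_;
    std::size_t filled_ = 0;
    std::size_t pos_ = 0;
};
```

Typical use would be a loop such as while (reader.next(value)) { ... } over an ifstream opened with ios::binary. For merging many sorted files, one reader per file gives each input its own buffer.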

Character by Character Input from a file, in C++

Is there any way to get input from a file one number at a time?
For example, I want to store the following integer in a vector of integers, since it is so long that it can't be held by even a long long int.
12345678901234567900
So how can I read this number from a file so that I can:
vector<int> numbers;
numbers.push_back(/*>>number goes here<<*/);
I know that the above code isn't really complete but I hope that it explains what I am trying to do.
Also, I've tried Google and so far it has proved ineffective, because only tutorials for C are coming up, which aren't really helping me much.
Thanks in advance,
Dan Chevalier
This could be done in a variety of ways, all of them boiling down to converting each char '0'..'9' to the corresponding integer 0..9. Here's how it can be done with a single function call:
#include <string>
#include <iostream>
#include <vector>
#include <iterator>
#include <functional>
#include <algorithm>
int main()
{
std::string s = "12345678901234567900";
std::vector<int> numbers;
// std::bind2nd was removed in C++17; a lambda does the same job
std::transform(s.begin(), s.end(), std::back_inserter(numbers),
[](char c) { return c - '0'; });
// output
copy(numbers.begin(), numbers.end(),
std::ostream_iterator<int>(std::cout, " "));
std::cout << '\n';
}
When reading from a file, you could read the string and transform(), or even transform() directly from istream iterators, if there is nothing else in that file besides your number:
std::ifstream f("test.txt");
std::vector<int> numbers;
std::transform(std::istream_iterator<char>(f),
std::istream_iterator<char>(),
std::back_inserter(numbers),
[](char c) { return c - '0'; });
Off the top of my head, this should fill up a character array which you can then iterate through. I realize it's not exactly what you were after, but it's my preferred method.
void readfile(char *buffer, std::size_t size)
{
std::ifstream NumberFile;
NumberFile.open("./Number"); //For a unix file-system
//getline stores at most size-1 characters plus a terminating null,
//so a long number cannot overflow the buffer
NumberFile.getline(buffer, size);
NumberFile.close();
}
Also, to perform operations on the individual digits you can use:
CharacterArray[ElementNumber] - '0'
and to get the whole number, when it is small enough to fit in a datatype, add up each digit multiplied by 10 to the power of its position counted from the right (the last digit has position 0).
You can read a char at a time with char c; cin.get(c); and convert it to the numeral with c -= '0'. But perhaps you can just read it as a string or use something like BigNum.
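A minimal sketch of the get()-based approach: read_digits stops at the first non-digit character, and digits_to_value folds a short run of digits back into one number. Both names are made up for this example, and digits_to_value assumes the result fits in unsigned long long.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <vector>

// Reads decimal digits one character at a time and stores each as an int 0..9.
// Stops at the first non-digit character or at end of input.
std::vector<int> read_digits(std::istream& in) {
    std::vector<int> digits;
    char c;
    while (in.get(c) && std::isdigit(static_cast<unsigned char>(c))) {
        digits.push_back(c - '0');
    }
    return digits;
}

// Combines digits (most significant first) into one value.
// Assumption: the number is small enough for unsigned long long.
unsigned long long digits_to_value(const std::vector<int>& digits) {
    unsigned long long value = 0;
    for (int d : digits) {
        value = value * 10 + static_cast<unsigned long long>(d);
    }
    return value;
}
```

For the 20-digit number from the question, read_digits yields a vector of 20 elements. digits_to_value is only meant for shorter runs, since that number exceeds the 64-bit range.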