I am trying to make a compression program that writes, for example, only 1 or 2 bits instead of a regular 8-bit char, depending on the character being written. I tried to write with:
// I don't know what the function should return
char getBytes(char c)
{
    return 0xff;
}

ofstream fout;
fout.open("file.bin", ios::binary | ios::out);
fout << getBytes(c);
but so far I have succeeded in writing only whole chars.
So how can I write, for example, '01', or only '1'? What function should I use to write only bits into a file? Thanks.
Streams are sequences of bytes. There is no standard interface to write individual bits. If you want to write individual bits you'll need to create your own abstraction of a bit stream built on top of streams. It would aggregate multiple bits into a byte which is then written to the underlying stream. If you want reasonably efficient writing you'll probably need to aggregate multiple bytes before writing them to the stream.
A naïve implementation could look something like this:
class bitstream {
    std::streambuf* sbuf;
    int count = 0;
    unsigned char byte = 0;
public:
    bitstream(std::ostream& out): sbuf(out.rdbuf()) {}
    ~bitstream() {
        if (this->count) {
            this->sbuf->sputc(this->byte);
        }
    }
    bitstream& operator<< (bool b) {
        this->byte = (this->byte << 1) | b;
        if (++this->count == 8) {
            this->sbuf->sputc(this->byte);
            this->count = 0;
        }
        return *this;
    }
};
Note that this implementation has rather basic handling for the last byte: if a byte was started it will be written as is. Whether that is the intended behavior, or whether it needs to be shifted so the written bits end up in the top bits of the last byte, depends on how things are being used. Also, there is no error handling for the case where the underlying stream couldn't be written. (And the code is untested - I don't have an easy way to compile it on my mobile phone.)
You’d use the class something like this:
int main() {
    std::ofstream file("bits.txt", std::ios_base::binary);
    bitstream out(file);
    out << false << true << false << false
        << false << false << false << true;
}
Unless I messed things up, the above code should write A into the file bits.txt (the ASCII code for A is 65).
Just for context: files are actually organized into blocks of bytes, and the stream abstraction aggregates the individual bytes written into blocks. Although byte-oriented interfaces are provided by all popular operating systems, writing anything but blocks of data tends to be rather inefficient.
Related
I'm working on Huffman coding and I have built the character frequency table with a
std::map<char,int> frequencyTable;
Then I built the Huffman tree, and then I built the code table this way:
std::map<char,std::vector<bool> > codes;
Now I would like to read the input file character by character and encode each one through the code table, but I don't know how to write bits into a binary output file.
Any advice?
UPDATE:
Now I'm trying with these functions:
void Encoder::makeFile()
{
    char c, ch;
    unsigned char ch2;
    while (inFile.get(c))
    {
        ch = c;
        // send the Huffman string to the output file bit by bit
        for (unsigned int i = 0; i < codes[ch].size(); ++i)
        {
            if (codes[ch].at(i) == false) {
                ch2 = 0;
            } else {
                ch2 = 1;
            }
            encode(ch2, outFile);
        }
    }
    ch2 = 2; // send EOF
    encode(ch2, outFile);
    inFile.close();
    outFile.close();
}
and this:
void Encoder::encode(unsigned char i, std::ofstream & outFile)
{
    int bit_pos = 0;  // 0 to 7 (left to right) on the byte block
    unsigned char c;  // byte block to write
    if (i < 2)  // if not EOF
    {
        if (i == 1)
            c |= (i << (7 - bit_pos));  // add a 1 to the byte
        else  // i == 0
            c = c & static_cast<unsigned char>(255 - (1 << (7 - bit_pos)));  // add a 0
        ++bit_pos;
        bit_pos %= 8;
        if (bit_pos == 0)
        {
            outFile.put(c);
            c = '\0';
        }
    }
    else
    {
        outFile.put(c);
    }
}
but, I don't know why, it doesn't work: the loop is never executed and the encode function is never called. Why?
You can't write a single bit directly to a file. The I/O unit of reading/writing is a byte (8-bits). So you need to pack your bools into chunks of 8 bits and then write the bytes. See Writing files in bit form to a file in C or How to write single bits to a file in C for example.
The C++ Standard streams support an access of the smallest unit the underlying CPU supports. That's a byte.
There are implementations of bit stream classes in C++, like the Stanford Bitstream Class.
Another approach could use the std::bitset class.
I have 640*480 numbers that I need to write to a file and read back later. What is the best solution? The numbers are between 0 and 255.
For me the best solution is to write them in binary (8 bits each). I wrote the numbers into a txt file and now it looks like 1011111010111110....., so there is no question where each number starts and ends.
How am I supposed to read them back from the file?
Using c++
It's not a good idea to write bit values like 1 and 0 to a text file: the file will be eight times bigger, since every character in a text file takes at least one byte and 1 byte = 8 bits. Store bytes instead - each value 0-255 is exactly one byte - so your file will be 640*480 bytes instead of 640*480*8. If you need the individual bits, use the bitwise operators of your programming language. Reading bytes back is also much easier. Use a binary file for saving your data.
Presumably you have some sort of data structure representing your image, which somewhere inside holds the actual data:
class pixmap
{
public:
    // stuff...
private:
    std::unique_ptr<std::uint8_t[]> data;
};
So you can add a new constructor which takes a filename and reads bytes from that file:
pixmap(const std::string& filename)
{
    constexpr int SIZE = 640 * 480;
    // Open an input file stream in binary mode and set it to throw exceptions:
    std::ifstream file;
    file.exceptions(std::ios_base::badbit | std::ios_base::failbit);
    file.open(filename.c_str(), std::ios_base::binary);
    // Create a unique ptr to hold the data: this will be cleaned up
    // automatically if file reading throws
    std::unique_ptr<std::uint8_t[]> temp(new std::uint8_t[SIZE]);
    // Read SIZE bytes from the file
    file.read(reinterpret_cast<char*>(temp.get()), SIZE);
    // If we get to here, the read worked, so we move the temp data we've just read
    // into where we'd like it
    data = std::move(temp); // or std::swap(data, temp) if you prefer
}
I realise I've assumed some implementation details here (you might not be using a std::unique_ptr to store the underlying image data, though you probably should be) but hopefully this is enough to get you started.
You can print a number between 0-255 as a char value in the file.
See the code below; in this example I am printing the integer 70 as a char.
So this prints 'F' on the console.
Similarly, you can read it back as a char and then convert the char to an integer.
#include <stdio.h>

int main()
{
    int i = 70;
    char dig = (char)i;
    printf("%c", dig);
    return 0;
}
This way you can restrict the file size.
This question already has answers here:
How to read little endian integers from file in C++?
(5 answers)
Closed 7 years ago.
Say I have a binary file; it contains positive numbers written as big-endian 32-bit integers.
How do I read this file? I have this right now:
int main() {
    FILE *fp;
    char buffer[4];
    int num = 0;
    fp = fopen("file.bin", "rb");
    while (fread(&buffer, 1, 4, fp) != 0) {
        // I think buffer holds the 32-bit integer I read;
        // how can I set num to the 32-bit big-endian integer?
    }
    return 0;
}
Declare your buffer as:
unsigned char buffer[4];
and you may use this to convert endianess:
int num = (int)buffer[0] << 24 | (int)buffer[1] << 16 | (int)buffer[2] << 8 | (int)buffer[3];
BTW
Assembling the value byte by byte like this is portable: it gives the right result regardless of the host's endianness. On a big-endian platform the file's byte order already matches the host's, so there you could also read directly into your int without any conversion.
You need to find out your endianess first:
How can I find Endian-ness of my PC programmatically using C?
Then you need to act accordingly: if your endianness matches the file's you can read the value as-is, and if it differs you need to reorder the bytes:
union Num
{
    char buffer[4];
    int num;
} num;

void swapChars(char* pChar1, char* pChar2)
{
    char temp = *pChar1;
    *pChar1 = *pChar2;
    *pChar2 = temp;
}

int swapOrder(Num num)
{
    swapChars(&num.buffer[0], &num.buffer[3]);
    swapChars(&num.buffer[1], &num.buffer[2]);
    return num.num;
}

while (fread(num.buffer, 1, 4, fp) != 0)
{
    int convertedNum;
    if (1 == amIBigEndian)
    {
        convertedNum = num.num;
    }
    else
    {
        convertedNum = swapOrder(num);
    }
    // Do whatever you want with convertedNum here...
}
It is operating-system and processor-architecture specific.
You might perhaps use routines like htonl(3) or ntohl etc.,
but you really should have serialized in a well-defined format.
On current machines (where I/O is very slow, w.r.t. CPU speed) I am in favor of using textual serialization formats like JSON, YAML, .... But you could also use binary serialization (and libraries) like BSON, XDR, ASN.1 or the s11n library....
If possible, improve the producer code (the one writing your file.bin file), and the consumer code accordingly.
Binary data is inherently brittle, because it is system and architecture specific. At the very least, document extremely well its format, and preferably give some tools to convert it from and to textual formats.
There are several JSON libraries for C++, like jsoncpp and rapidjson and for C like jansson etc etc...
Suppose I send a big buffer to ostream::write, but only the beginning of it is actually written successfully and the rest is not:
int main()
{
    std::vector<char> buf(64 * 1000 * 1000, 'a'); // 64 mbytes of data
    std::ofstream file("out.txt");
    file.write(&buf[0], buf.size()); // try to write 64 mbytes
    if (file.bad()) {
        // but suppose only 10 megabytes were available on disk
        // how many were actually written to the file???
    }
    return 0;
}
what ostream function can tell me how many bytes were actually written?
You can use .tellp() to get the output position in the stream and compute the number of bytes written as:
size_t before = file.tellp(); // current pos
if (!file.write(&buf[0], buf.size())) // enter the if-block if the write fails
{
    file.clear(); // clear the error flags, otherwise tellp() just returns -1
    // compute the difference
    size_t numberOfBytesWritten = file.tellp() - before;
}
Note that there is no guarantee that numberOfBytesWritten is really the number of bytes written to the file, but it should work for most cases, since we don't have any reliable way to get the actual number of bytes written to the file.
I don't see any equivalent to gcount(). Writing directly to the streambuf (with sputn()) would give you an indication, but there is a fundamental problem with your request: writes are buffered, failure detection can be delayed until the actual write happens (on flush or close), and there is no way to get access to what the OS really wrote.
This question already has answers here:
How do I convert between big-endian and little-endian values in C++?
(35 answers)
Closed 9 years ago.
I've been looking around for how to convert big-endian to little-endian, but I didn't find anything that solved my problem; there seem to be many ways to do this conversion. Anyway, the following code works OK on a big-endian system, but how should I write a conversion function so it works on a little-endian system as well?
This is homework, but it's just an extra, since the systems at school run big-endian. I just got curious and wanted to make it work on my home computer too.
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    ifstream file;
    file.open("file.bin", ios::in | ios::binary);
    if (!file)
        cerr << "Not able to read" << endl;
    else
    {
        cout << "Opened" << endl;
        int i_var;
        double d_var;
        while (!file.eof())
        {
            file.read(reinterpret_cast<char*>(&i_var), sizeof(int));
            file.read(reinterpret_cast<char*>(&d_var), sizeof(double));
            cout << i_var << " " << d_var << endl;
        }
    }
    return 0;
}
Solved
So big-endian vs little-endian is just a reversed byte order. The function I wrote seems to serve my purpose anyway; I added it here in case someone else needs it in the future. This is for double only, though; for an integer, either use the function torak suggested or modify this code to swap 4 bytes only.
double swap(double d)
{
    double a;
    unsigned char *dst = (unsigned char *)&a;
    unsigned char *src = (unsigned char *)&d;
    dst[0] = src[7];
    dst[1] = src[6];
    dst[2] = src[5];
    dst[3] = src[4];
    dst[4] = src[3];
    dst[5] = src[2];
    dst[6] = src[1];
    dst[7] = src[0];
    return a;
}
You could use a template for your endian swap that will be generalized for the data types:
#include <algorithm>

template <class T>
void endswap(T *objp)
{
    unsigned char *memp = reinterpret_cast<unsigned char*>(objp);
    std::reverse(memp, memp + sizeof(T));
}
Then your code would end up looking something like:
file.read( reinterpret_cast<char*>(&i_var) , sizeof(int) );
endswap( &i_var );
file.read( reinterpret_cast<char*>(&d_var) , sizeof(double) );
endswap( &d_var );
cout << i_var << " " << d_var << endl;
You might be interested in the ntohl family of functions. These are designed to transform data from network to host byte order. Network byte order is big endian, therefore on big endian systems they don't do anything, while the same code compiled on a little endian system will perform the appropriate byte swaps.
Linux provides endian.h, which has efficient endian swapping routines up to 64-bit. It also automagically accounts for your system's endianness. The 32-bit functions are defined like this:
uint32_t htobe32(uint32_t host_32bits); // host to big-endian encoding
uint32_t htole32(uint32_t host_32bits); // host to lil-endian encoding
uint32_t be32toh(uint32_t big_endian_32bits); // big-endian to host encoding
uint32_t le32toh(uint32_t little_endian_32bits); // lil-endian to host encoding
with similarly-named functions for 16 and 64-bit.
So you just say
x = le32toh(x);
to convert a 32-bit integer in little-endian encoding to the host CPU encoding. This is useful for reading little-endian data.
x = htole32(x);
will convert from the host encoding to 32-bit little-endian. This is useful for writing little-endian data.
Note on BSD systems, the equivalent header file is sys/endian.h
Assuming you're going to keep working with binary data, it's handy to maintain a little library of helper functions; two of those should be endian swaps for 4-byte and 2-byte values. For some solid examples (including code), check out this article.
Once you've got your swap functions, any time you read in a value in the wrong endianness, call the appropriate swap function. A stumbling point for people here is that single-byte values do not need to be endian-swapped, so if you're reading in something like a character stream that represents a string of letters, that is good to go as-is. It's only when you're reading in a value that is multiple bytes (like an integer value) that you have to swap them.
It is worth adding that MS supports this on VS too; check out these functions:
htond
htonf
htonl
htonll
htons