I am writing a compression program and need to write bit data to a binary file using C++. If anyone could advise on the write statement, or point me to a website with advice, I would be very grateful.
Apologies if this is a simple or confusing question; I am struggling to find answers on the web.
Collect the bits into whole bytes, such as an unsigned char or std::bitset (where the bitset size is a multiple of CHAR_BIT), then write whole bytes at a time. Computers "deal with bits", but the available abstraction – especially for IO – is that you, as a programmer, deal with individual bytes. Bitwise manipulation can be used to toggle specific bits, but you're always handling byte-sized objects.
At the end of the output, if you don't have a whole byte, you'll need to decide how that should be stored. Both iostreams and stdio can write unformatted data using ostream::write and fwrite, respectively.
Instead of a single char or bitset<8> (8 being the most common value for CHAR_BIT), you might consider using a larger block size, such as an array of 4 to 32 (or more) chars, or an equivalently sized bitset.
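For illustration, here is a minimal sketch of that idea (the BitWriter name and interface are mine, not part of the question): bits are packed MSB-first into a byte, and whole bytes go out through ostream::write.

#include <climits>
#include <fstream>

// Minimal sketch: accumulate bits into one byte, flush whole bytes to the stream.
class BitWriter {
    std::ofstream& out_;
    unsigned char buffer_ = 0;
    int count_ = 0; // number of bits currently held in buffer_
public:
    explicit BitWriter(std::ofstream& out) : out_(out) {}

    void put_bit(bool bit) {
        buffer_ = static_cast<unsigned char>((buffer_ << 1) | (bit ? 1u : 0u));
        if (++count_ == CHAR_BIT)
            flush_byte();
    }

    // Zero-pad the final partial byte and write it out.
    void finish() {
        if (count_ > 0) {
            buffer_ = static_cast<unsigned char>(buffer_ << (CHAR_BIT - count_));
            flush_byte();
        }
    }

private:
    void flush_byte() {
        out_.write(reinterpret_cast<const char*>(&buffer_), 1);
        buffer_ = 0;
        count_ = 0;
    }
};

How the number of valid bits in the final byte is recorded is left to your file format.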
For writing binary, the trick I have found most helpful is to store all the binary data as a single array in memory and then move it all over to the hard drive. Doing a bit at a time, or a byte at a time, or an unsigned long long at a time is not as fast as having all the data stored in an array and using one call to fwrite() to store it to the hard drive.
size_t fwrite ( const void * ptr, size_t size, size_t count, FILE * stream );
Ref: http://www.cplusplus.com/reference/clibrary/cstdio/fwrite/
In English:
fwrite( [pointer to the stored data], [size in bytes of one array element: 1 for unsigned char, 8 for unsigned long long], [number of elements in the array], [FILE*] )
Always check the return value to validate success!
Additionally, an argument can be made that using the largest element type possible is the fastest way to go ([unsigned long long] rather than [char]). While I am not versed in the implementation of fwrite(), I suspect that converting from the natural type used in your code to [unsigned long long] would take more time, combined with the writing, than letting fwrite() make do with what you have.
Back when I was learning Huffman coding, it took me a few hours to realize that there was a difference between [char] and [unsigned char]. Note that for this method you should always use unsigned variables to store the pure binary data.
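As a sketch of that approach (the buffer contents here are just dummy data): everything is accumulated in memory, then written with a single fwrite, and the return value is checked as advised above.

#include <cstdio>
#include <vector>

int main() {
    // Accumulate the entire output in memory first...
    std::vector<unsigned char> buffer;
    for (int i = 0; i < 1000; ++i)
        buffer.push_back(static_cast<unsigned char>(i & 0xFF));

    // ...then store it with one fwrite call, and check the return value.
    std::FILE* f = std::fopen("out.bin", "wb");
    if (!f)
        return 1;
    std::size_t written = std::fwrite(buffer.data(), 1, buffer.size(), f);
    std::fclose(f);
    return written == buffer.size() ? 0 : 1;
}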
With the class below you can write and read bit by bit:
class bitChar{
public:
    unsigned char* c;
    int shift_count;
    string BITS;

    bitChar()
    {
        shift_count = 0;
        c = (unsigned char*)calloc(1, sizeof(char));
    }

    // Read the whole stream and return its contents as a string of '0'/'1'.
    string readByBits(ifstream& inf)
    {
        string s = "";
        char buffer[1];
        while (inf.read(buffer, 1))
        {
            s += getBits(*buffer);
        }
        return s;
    }

    void setBITS(string X)
    {
        BITS = X;
    }

    // Consume BITS, packing 8 bits per byte; the last byte is zero-padded.
    int insertBits(ofstream& outf)
    {
        int total = 0;
        while (BITS.length())
        {
            if (BITS[0] == '1')
                *c |= 1;
            *c <<= 1;
            ++shift_count;
            ++total;
            BITS.erase(0, 1);

            if (shift_count == 7)
            {
                // The eighth bit is OR'd in without a further shift.
                if (BITS.size() > 0)
                {
                    if (BITS[0] == '1')
                        *c |= 1;
                    ++total;
                    BITS.erase(0, 1);
                }
                writeBits(outf);
                shift_count = 0;
                free(c);
                c = (unsigned char*)calloc(1, sizeof(char));
            }
        }

        // Left-justify and write any remaining partial byte.
        if (shift_count > 0)
        {
            *c <<= (7 - shift_count);
            writeBits(outf);
            free(c);
            c = (unsigned char*)calloc(1, sizeof(char));
        }
        outf.close();
        return total;
    }

    // Render one byte as a string of eight '0'/'1' characters, MSB first.
    string getBits(unsigned char X)
    {
        stringstream itoa;
        for (unsigned s = 7; s > 0; s--)
        {
            itoa << ((X >> s) & 1);
        }
        itoa << (X & 1);
        return itoa.str();
    }

    void writeBits(ofstream& outf)
    {
        outf << *c;
    }

    ~bitChar()
    {
        if (c)
            free(c);
    }
};
For example:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <stdlib.h>
using namespace std;

int main()
{
    ofstream outf("Sample.dat", ios::binary);
    string enCoded = "101000001010101010";

    //write to file
    cout << enCoded << endl; //prints 101000001010101010
    bitChar bchar;
    bchar.setBITS(enCoded);
    bchar.insertBits(outf); //insertBits also closes outf

    //read from file
    ifstream inf("Sample.dat", ios::binary);
    string decoded = bchar.readByBits(inf);
    cout << decoded << endl; //prints 101000001010101010000000 (zero-padded to whole bytes)
    return 0;
}
I am trying to convert some C++ code to C for a compiler that cannot handle C++. I'd like to translate the template below to C. The template converts a decimal integer to hexadecimal and pads the value with leading zeros whenever the hexadecimal string is shorter than sizeof(T)*2. The data type T can be unsigned char, char, short, unsigned short, int, unsigned int, long long, or unsigned long long.
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

template< typename T > std::string hexify(T i)
{
    std::stringbuf buf;
    std::ostream os(&buf);
    os << std::setfill('0') << std::setw(sizeof(T) * 2)
       << std::hex << i;
    std::cout << "sizeof(T) * 2 = " << sizeof(T) * 2
              << " buf.str() = " << buf.str()
              << " buf.str().c_str() = " << buf.str().c_str() << std::endl;
    return buf.str().c_str();
}
Thank you for your help.
Edit 1: I have tried to use the declaration
char * hexify (void data, size_t data_size)
but when I call it with the int value int_value:
char * result = hexify(int_value, sizeof(int))
it doesn't work because of incompatible types (void and int).
So in this case, do I have to use a macro? I haven't tried a macro because it seems complicated.
C does not have templates. One solution is to pass the maximum-width integer supported (uintmax_t, in Value below) and the size of the original integer (in Size). One routine can use the size to determine the number of digits to print. Another complication is that C does not provide C++'s std::string with its automatic memory management. A typical way to handle this in C is for the called function to allocate a buffer and return it to the caller, who is responsible for freeing it when done.
The code below shows a hexify function that does this, and it also shows a Hexify macro that takes a single parameter and passes both its size and its value to the hexify function.
Note that, in C, character constants such as 'A' have type int, not char, so some care is needed in providing the desired size. The code below includes an example for that.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

char *hexify(size_t Size, uintmax_t Value)
{
    // Allocate space for "0x", 2*Size digits, and a null character.
    size_t BufferSize = 2 + 2*Size + 1;
    char *Buffer = malloc(BufferSize);

    // Ensure a buffer was allocated.
    if (!Buffer)
    {
        fprintf(stderr,
            "Error, unable to allocate buffer of %zu bytes in %s.\n",
            BufferSize, __func__);
        exit(EXIT_FAILURE);
    }

    // Format the value as "0x" followed by 2*Size hexadecimal digits.
    snprintf(Buffer, BufferSize, "0x%0*" PRIxMAX, (int) (2*Size), Value);

    return Buffer;
}
/* Provide a macro that passes both the size and the value of its parameter
   to the hexify function.
*/
#define Hexify(x) (hexify(sizeof (x), (x)))

int main(void)
{
    char *Buffer;

    /* Show two examples of using the hexify function with different integer
       types. (The examples assume ASCII.)
    */
    char x = 'A';
    Buffer = hexify(sizeof x, x);
    printf("Character '%c' = %s.\n", x, Buffer); // Prints "0x41".
    free(Buffer);

    int i = 123;
    Buffer = hexify(sizeof i, i);
    printf("Integer %d = %s.\n", i, Buffer); // Prints "0x0000007b".
    free(Buffer);

    /* Show examples of using the Hexify macro, demonstrating that 'A' is an
       int value, not a char value, so it would need to be cast if a char is
       desired.
    */
    Buffer = Hexify('A');
    printf("Character '%c' = %s.\n", 'A', Buffer); // Prints "0x00000041".
    free(Buffer);

    Buffer = Hexify((char) 'A');
    printf("Character '%c' = %s.\n", 'A', Buffer); // Prints "0x41".
    free(Buffer);
}
You don't need templates if you step down to raw bits and bytes.
If performance is important, it is also best to roll your own conversion routine by hand, since the string-handling functions in C and C++ come with a lot of slow overhead. A somewhat optimized version would look something like this:
char* hexify_data (char* restrict dst, const char* restrict src, size_t size)
{
    const char NIBBLE_LOOKUP[0xF+1] = "0123456789ABCDEF";
    char* d = dst;

    for (size_t i = 0; i < size; i++)
    {
        size_t byte = size - i - 1; // assuming little endian

        *d = NIBBLE_LOOKUP[(src[byte] & 0xF0u) >> 4];
        d++;
        *d = NIBBLE_LOOKUP[(src[byte] & 0x0Fu) >> 0];
        d++;
    }

    *d = '\0';
    return dst;
}
This breaks down any passed type byte by byte, using a character type, which is fine when using character types specifically. It also uses caller allocation for maximum performance. (It can also be made endianness-independent with an extra check per loop.)
We can make the call a bit more convenient with a wrapper macro:
#define hexify(buf, var) hexify_data(buf, (char*)&var, sizeof(var))
Full example:
#include <string.h>
#include <stdint.h>
#include <stdio.h>

#define hexify(buf, var) hexify_data(buf, (char*)&var, sizeof(var))

char* hexify_data (char* restrict dst, const char* restrict src, size_t size)
{
    const char NIBBLE_LOOKUP[0xF+1] = "0123456789ABCDEF";
    char* d = dst;

    for (size_t i = 0; i < size; i++)
    {
        size_t byte = size - i - 1; // assuming little endian

        *d = NIBBLE_LOOKUP[(src[byte] & 0xF0u) >> 4];
        d++;
        *d = NIBBLE_LOOKUP[(src[byte] & 0x0Fu) >> 0];
        d++;
    }

    *d = '\0';
    return dst;
}

int main (void)
{
    char buf[50];

    int32_t i32a = 0xABCD;
    puts(hexify(buf, i32a));

    int32_t i32b = 0xAAAABBBB;
    puts(hexify(buf, i32b));

    char c = 5;
    puts(hexify(buf, c));

    uint8_t u8 = 100;
    puts(hexify(buf, u8));
}
Output:
0000ABCD
AAAABBBB
05
64
An optional solution is to use a printf-style format string to select the size.
Note that you can't return a pointer to a local variable, but you can take the buffer as an argument (shown here without bounds checking). The value parameter is taken as an unsigned long long so it can hold, and be formatted at, any of the supported widths:
#include <stdio.h>
#include <string.h>

char* hexify(char* result, const char* format, unsigned long long arg)
{
    int size = 0;

    if (0 == strcmp(format, "%d") || 0 == strcmp(format, "%u"))
    {
        size = 4;
        sprintf(result, "%08x", (unsigned int)arg);
    }
    else if (0 == strcmp(format, "%hd") || 0 == strcmp(format, "%hu"))
    {
        size = 2;
        sprintf(result, "%04x", (unsigned int)(arg & 0xFFFFu));
    }
    else if (0 == strcmp(format, "%hhd") || 0 == strcmp(format, "%hhu"))
    {
        size = 1;
        sprintf(result, "%02x", (unsigned int)(arg & 0xFFu));
    }
    else if (0 == strcmp(format, "%lld") || 0 == strcmp(format, "%llu"))
    {
        size = 8;
        sprintf(result, "%016llx", arg);
    }
    //printf("size=%d", size);
    return result;
}
int main()
{
    char result[256];
    printf("%s", hexify(result, "%hhu", 1));
    return 0;
}
I am trying to read a binary file into memory, and then use it like so:
struct myStruct {
    std::string mystring; // is 40 bytes long
    uint myint1;          // is 4 bytes long
};
typedef unsigned char byte;
byte *filedata = ReadFile(filename); // reads file into memory, closes the file
myStruct aStruct;
aStruct.mystring = filedata.????
I need a way of accessing the binary file with an offset, and getting a certain length at that offset.
This is easy if I store the binary file data in a std::string, but I figured that using a string to store binary data is not a good way of doing things (filedata.substr(offset, len)).
Reasonably extensive (IMO) searching hasn't turned up anything relevant; any ideas? I am willing to change the storage type (e.g. to std::vector) if you think it is necessary.
If you're not going to use a serialization library, then I suggest adding serialization support to each class:
struct My_Struct
{
    std::string my_string;
    unsigned int my_int;

    void Load_From_Buffer(unsigned char const *& p_buffer)
    {
        my_string = std::string(reinterpret_cast<char const *>(p_buffer));
        p_buffer += my_string.length() + 1; // +1 to account for the terminating nul character.
        my_int = *((unsigned int *) p_buffer);
        p_buffer += sizeof(my_int);
    }
};

unsigned char * const buffer = ReadFile(filename);
unsigned char const * p_buffer = buffer;
My_Struct my_variable;
my_variable.Load_From_Buffer(p_buffer);
Some other useful interface methods:
unsigned int Size_On_Stream(void) const; // Returns the size the object would occupy in the stream.
void Store_To_Buffer(unsigned char *& p_buffer); // Stores object to buffer, increments pointer.
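For illustration, a sketch of how those two might be implemented for My_Struct above, assuming they are declared inside the struct (the layout mirrors Load_From_Buffer):

#include <cstring>

// Sketch implementation; assumes both methods are declared inside My_Struct.
unsigned int My_Struct::Size_On_Stream(void) const
{
    // nul-terminated string bytes plus the raw integer
    return (unsigned int)(my_string.length() + 1 + sizeof(my_int));
}

void My_Struct::Store_To_Buffer(unsigned char *& p_buffer)
{
    std::memcpy(p_buffer, my_string.c_str(), my_string.length() + 1);
    p_buffer += my_string.length() + 1;
    std::memcpy(p_buffer, &my_int, sizeof(my_int));
    p_buffer += sizeof(my_int);
}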
With templates you can extend the serialization functionality:

void Load_From_Buffer(std::string& s, unsigned char const *& p_buffer)
{
    s = std::string((char const *)p_buffer);
    p_buffer += s.length() + 1;
}

template<typename T>
void Load_From_Buffer(T& object, unsigned char const *& p_buffer)
{
    object.Load_From_Buffer(p_buffer);
}
Edit 1: Reasons not to write the structure directly
In C and C++, the size of a structure may not be equal to the sum of the sizes of its members.
Compilers are allowed to insert padding, or unused space, between members so that the members are aligned on an address.
For example, a 32-bit processor likes to fetch things on 4-byte boundaries. Having one char in a structure followed by an int would put the int at relative address 1, which is not a multiple of 4. The compiler would pad the structure so that the int lines up at relative address 4. (A small demonstration follows this list.)
Structures may contain pointers or items that contain pointers.
For example, the std::string type may have a size of 40 even though the string contains 3 characters or 300: it holds a pointer to the actual data.
Endianness.
With multibyte integers, some processors like the Most Significant Byte (MSB) first, a.k.a. Big Endian (the way humans read numbers), while others like the Least Significant Byte first, a.k.a. Little Endian. The Little Endian format takes less circuitry to read than Big Endian.
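As a small demonstration of the padding point above (a sketch; the exact sizes are implementation-defined, but 8 is typical on a platform with 4-byte-aligned int):

#include <cstdio>

struct Padded {
    char c; // relative address 0
    int i;  // typically placed at relative address 4, not 1
};

int main() {
    // Commonly prints 8, not 5: the compiler inserts 3 padding bytes after 'c'.
    std::printf("sizeof(Padded) = %zu\n", sizeof(Padded));
}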
Edit 2: Variant records
When outputting things like arrays and containers, you must decide whether you want to output the full container (including unused slots) or output only the items in the container. Outputting only the items in the container uses a variant-record technique.
There are two techniques for outputting variant records: the quantity followed by the items, or the items followed by a sentinel. The latter is how C-style strings are written, with the sentinel being a nul character.
The former technique outputs the quantity of items, followed by the items. So if I had 6 numbers, 0, 1, 2, 3, 4, 5, the output would be:
6 // The number of items
0
1
2
3
4
5
In the corresponding Store_To_Buffer method, I would create a temporary to hold the quantity, write that out, then follow with each item from the container; Load_From_Buffer would read the quantity first, then read that many items.
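As a rough sketch of the quantity-then-items technique for a container (the function names are illustrative, mirroring the buffer-pointer style used above):

#include <cstdint>
#include <cstring>
#include <vector>

// Store: write a 32-bit count, then the raw items.
void Store_Vector(std::vector<int32_t> const& v, unsigned char*& p)
{
    uint32_t count = static_cast<uint32_t>(v.size());
    std::memcpy(p, &count, sizeof count);
    p += sizeof count;
    std::memcpy(p, v.data(), count * sizeof(int32_t));
    p += count * sizeof(int32_t);
}

// Load: read the count first, then exactly that many items.
void Load_Vector(std::vector<int32_t>& v, unsigned char const*& p)
{
    uint32_t count = 0;
    std::memcpy(&count, p, sizeof count);
    p += sizeof count;
    v.resize(count);
    std::memcpy(v.data(), p, count * sizeof(int32_t));
    p += count * sizeof(int32_t);
}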
You could overload the std::ostream output operator and std::istream input operator for your structure, something like this:
#include <fstream>
#include <iostream>
#include <stdint.h>
#include <string>
#include <vector>

struct Record {
    std::string name;
    int value;
};

std::istream& operator>>(std::istream& in, Record& record) {
    char name[40] = { 0 };
    int32_t value(0);
    in.read(name, 40);
    in.read(reinterpret_cast<char*>(&value), 4);
    record.name.assign(name, 40);
    record.value = value;
    return in;
}

std::ostream& operator<<(std::ostream& out, const Record& record) {
    std::string name(record.name);
    name.resize(40, '\0');
    out.write(name.c_str(), 40);
    out.write(reinterpret_cast<const char*>(&record.value), 4);
    return out;
}

int main(int argc, char **argv) {
    const char* filename("records");
    Record r[] = {{"zero", 0 }, {"one", 1 }, {"two", 2}};
    int n(sizeof(r)/sizeof(r[0]));

    std::ofstream out(filename, std::ios::binary);
    for (int i = 0; i < n; ++i) {
        out << r[i];
    }
    out.close();

    std::ifstream in(filename, std::ios::binary);
    std::vector<Record> rIn;
    Record record;
    while (in >> record) {
        rIn.push_back(record);
    }

    for (std::vector<Record>::iterator i = rIn.begin(); i != rIn.end(); ++i) {
        std::cout << "name: " << i->name << ", value: " << i->value
                  << std::endl;
    }
    return 0;
}
Can I use itoa() for converting long long int to a binary string?
I have seen various examples for conversion of int to binary using itoa. Is there a risk of overflow or perhaps loss of precision, if I use long long int?
Edit
Thanks, all of you, for replying. I achieved what I was trying to do. itoa() was not useful enough, as it does not support long long int. Moreover, I can't use itoa() with gcc as it is not a standard library function.
To convert an integer to a string containing only binary digits, you can check each bit in the integer with a one-bit mask and append the result to the string.
Something like this:
#include <string>

std::string convert_to_binary_string(const unsigned long long int value,
                                     bool skip_leading_zeroes = false)
{
    std::string str;
    bool found_first_one = false;
    const int bits = sizeof(unsigned long long) * 8; // Number of bits in the type

    for (int current_bit = bits - 1; current_bit >= 0; current_bit--)
    {
        if ((value & (1ULL << current_bit)) != 0)
        {
            if (!found_first_one)
                found_first_one = true;
            str += '1';
        }
        else
        {
            if (!skip_leading_zeroes || found_first_one)
                str += '0';
        }
    }

    return str;
}
Edit:
A more general way of doing it might be done with templates:
#include <type_traits>
#include <cassert>
#include <string>

template<typename T>
std::string convert_to_binary_string(const T value, bool skip_leading_zeroes = false)
{
    // Make sure the type is an integer
    static_assert(std::is_integral<T>::value, "Not integral type");

    std::string str;
    bool found_first_one = false;
    const int bits = sizeof(T) * 8; // Number of bits in the type

    for (int current_bit = bits - 1; current_bit >= 0; current_bit--)
    {
        if ((value & (1ULL << current_bit)) != 0)
        {
            if (!found_first_one)
                found_first_one = true;
            str += '1';
        }
        else
        {
            if (!skip_leading_zeroes || found_first_one)
                str += '0';
        }
    }

    return str;
}
Note: Both static_assert and std::is_integral are part of C++11, but both are supported in Visual C++ 2010 and in GCC from at least 4.4.5.
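A quick usage sketch, assuming the template above is in scope (the bit count shown is for a typical platform with a 32-bit unsigned int):

#include <iostream>

int main()
{
    std::cout << convert_to_binary_string(37u) << '\n';       // 32 digits ending in ...100101
    std::cout << convert_to_binary_string(37u, true) << '\n'; // prints 100101
}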
Yes, you can. As you showed yourself, itoa can be called with base 2, which means binary.
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int i;
    char str[33];

    i = 37; /* Just some number. */
    itoa(i, str, 2);
    printf("binary: %s\n", str);

    return 0;
}
Also, yes, there will be truncation if you use an integer type larger than int, since itoa() takes only a plain int as its value. long long is probably 64 bits on your compiler while int is probably 32 bits, so the compiler would truncate the 64-bit value to a 32-bit value before the conversion.
Your wording is a bit confusing.
Normally if you state 'decimal' I would take that to mean 'a number represented as a string of decimal digits', while you seem to mean 'integer'.
And by 'binary' I would take that to mean 'a number represented as bytes, as directly usable by the CPU'.
A better way of phrasing your subject would be: converting a 64-bit integer to a string of binary digits.
Some systems have an _i64toa function.
You can use std::bitset for this purpose. Note that the bitset size is measured in bits, so it must be sizeof(T) * CHAR_BIT rather than sizeof(T):

#include <bitset>
#include <climits>
#include <string>

template<typename T>
inline std::string to_binary_string(const T value)
{
    return std::bitset<sizeof(T) * CHAR_BIT>(value).to_string();
}

std::cout << to_binary_string(10240);
std::cout << to_binary_string(123LL);
I want to build a function to easily convert a string containing hex code (e.g. "0ae34e") into a string containing the equivalent ASCII values, and vice versa.
Do I have to cut the hex string into pairs of two values and glue them together again, or is there a convenient way to do that?
Thanks
Based on the binascii_unhexlify() function from Python:
#include <cctype> // is*

int to_int(int c) {
    if (not isxdigit(c)) return -1; // error: non-hexadecimal digit found
    if (isdigit(c)) return c - '0';
    if (isupper(c)) c = tolower(c);
    return c - 'a' + 10;
}

template<class InputIterator, class OutputIterator>
int unhexlify(InputIterator first, InputIterator last, OutputIterator ascii) {
    while (first != last) {
        int top = to_int(*first++);
        int bot = to_int(*first++);
        if (top == -1 or bot == -1)
            return -1; // error
        *ascii++ = (top << 4) + bot;
    }
    return 0;
}
Example
#include <cstddef>
#include <iostream>

int main() {
    char hex[] = "7B5a7D";
    const std::size_t len = sizeof(hex) - 1; // strlen
    char ascii[len/2 + 1];
    ascii[len/2] = '\0';
    if (unhexlify(hex, hex + len, ascii) < 0) return 1; // error
    std::cout << hex << " -> " << ascii << std::endl;
}
Output
7B5a7D -> {Z}
An interesting quote from the comments in the source code:
While I was reading dozens of programs that encode or decode the
formats here (documentation? hihi:-) I have formulated Jansen's
Observation:
Programs that encode binary data in ASCII are written in such a style
that they are as unreadable as possible. Devices used include
unnecessary global variables, burying important tables in unrelated
sourcefiles, putting functions in include files, using
seemingly-descriptive variable names for different purposes, calls to
empty subroutines and a host of others.
I have attempted to break with this tradition, but I guess that that
does make the performance sub-optimal. Oh well, too bad...
Jack Jansen, CWI, July 1995.
If you want a more C++-native way, you can write:

std::string str = "0x00f34"; // for example
std::stringstream ss(str);
ss << std::hex;
int n;
ss >> n;
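For reference, a complete runnable version of that idea (here std::hex is applied on the extraction side, the more common spelling; either way the stream's basefield flag is shared between insertion and extraction):

#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::string str = "0x00f34";
    std::stringstream ss(str);
    int n = 0;
    ss >> std::hex >> n;         // the "0x" prefix is accepted by hex extraction
    std::cout << n << std::endl; // prints 3892
}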
The sprintf and sscanf functions can already do that for you. This code is an example that should give you an idea; please go through the function references and the safer alternatives before you use them.
#include <stdio.h>

int main()
{
    int i;
    char str[80] = {0};
    char input[80] = "0x01F1";
    int output;

    /* read a hex number and print it as decimal into a string */
    printf("Hex number: ");
    scanf("%x", &i);
    sprintf(str, "%d", i);
    printf("%s\n", str);

    /* convert the hex string in input to an integer */
    sscanf(input, "%x", &output);
    printf("%d\n", output);
}
I am trying to store two integer values into a char array in C++.
Here is the code:
char data[20];
*data = static_cast<char>(time_delay); //time_delay is of int type
*(data + sizeof(int)) = static_cast<char>(wakeup_code); //wakeup_code is of int type
Now, at the other end of the program, I want to reverse this operation. That is, from this char array, I need to obtain the values of time_delay and wakeup_code.
How can I do that?
Thanks,
Nick
P.S.: I know this is a stupid way to do this, but trust me, it's a constraint.
I think when you write static_cast<char>, the value is converted to a 1-byte char, so if it didn't fit in a char to begin with, you'll lose data.
What I'd do is use *((int*)data) and *((int*)(data+sizeof(int))) for both reading and writing ints to the array.
*((int*)(data+sizeof(int))) = wakeup_code;
....
wakeup_code = *((int*)(data+sizeof(int)));
Alternatively, you might also write:
reinterpret_cast<int*>(data)[0]=time_delay;
reinterpret_cast<int*>(data)[1]=wakeup_code;
If you are working on a PC x86 architecture then there are no alignment problems (except for speed), and you can cast a char * to an int * to do the conversions:
char data[20];
*((int *)data) = first_int;
*((int *)(data+sizeof(int))) = second_int;
and the same syntax can be used for reading from data by just swapping sides of =.
Note however that this code is not portable because there are architectures where an unaligned operation may be not just slow but actually illegal (crash).
In those cases probably the nicest approach (that also gives you endianness control in case data is part of a communication protocol between different systems) is to build the integers explicitly in code one char at a time:
/* reading: assemble first_uint from four bytes, least significant first */
first_uint = ((unsigned char)data[0] |
              ((unsigned char)data[1] << 8) |
              ((unsigned char)data[2] << 16) |
              ((unsigned)(unsigned char)data[3] << 24)); /* unsigned cast avoids signed overflow on the top shift */

/* writing: split second_uint across the next four bytes */
data[4] = second_uint & 255;
data[5] = (second_uint >> 8) & 255;
data[6] = (second_uint >> 16) & 255;
data[7] = (second_uint >> 24) & 255;
I haven't tried it, but the following should work:
char data[20];
int value;
memcpy(&value,data,sizeof(int));
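For completeness, a round-trip sketch of that memcpy approach for both values (the variable names and offsets match the question):

#include <cstring>

int main() {
    int time_delay = 42, wakeup_code = 7;
    char data[20];

    // Write: copy each int's bytes into the array at its offset.
    std::memcpy(data, &time_delay, sizeof(int));
    std::memcpy(data + sizeof(int), &wakeup_code, sizeof(int));

    // Read: copy the bytes back out.
    int delay_out = 0, code_out = 0;
    std::memcpy(&delay_out, data, sizeof(int));
    std::memcpy(&code_out, data + sizeof(int), sizeof(int));

    return (delay_out == time_delay && code_out == wakeup_code) ? 0 : 1;
}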
Try the following:

union IntsToChars {
    struct {
        int time_delay;
        int wakeup_value;
    } Integers;
    char Chars[20];
};

extern char* somebuffer;

void foo()
{
    IntsToChars n2c;
    n2c.Integers.time_delay = 1;
    n2c.Integers.wakeup_value = 2;
    memcpy(somebuffer, n2c.Chars, sizeof(n2c)); //an example of using the char array containing the integer data
    //...
}
Using such a union should eliminate the alignment problem, unless the data is passed to a machine with a different architecture.
#include <sstream>
#include <string>
#include <string.h> // strcpy
#include <stdlib.h> // atoi

int main(int argc, char **argv) {
    char ch[10];
    int i = 1234;

    std::ostringstream oss;
    oss << i;                      // int -> string
    strcpy(ch, oss.str().c_str()); // copy the digits into the char array

    int j = atoi(ch);              // string -> int
}