How to write individual bytes to a file in C++

Given that I generate a string of random length containing "0" and "1" characters, how can I write the data to a file as bits instead of ASCII text?
Given that my random string has 12 bits, I know I should write 2 bytes (or add 4 more 0 bits to make 16 bits) in order to write the 1st byte and the 2nd byte.
Regardless of the size, given an array of char[8] or int[8] or a string, how can I write each individual group of bits as one byte in the output file?
I've googled a lot everywhere (it's my 3rd day looking for an answer) and still don't understand how to do it.
Thank you.

You don't do I/O with an array of bits.
Instead, you do two separate steps. First, convert your array of bits to a number. Then, do binary file I/O using that number.
For the first step, the types uint8_t and uint16_t found in <cstdint> and the bit-manipulation operators << (shift left) and | (bitwise OR) will be useful.
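For example, here is a minimal sketch of both steps; the function names pack_bits and write_bits are mine, and the MSB-first bit order within each byte is an assumption (reverse the inner loop if you want LSB-first):

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Pack a string of '0'/'1' characters into bytes, most significant
// bit first, padding the last byte with 0 bits on the right.
std::vector<std::uint8_t> pack_bits(const std::string& bits) {
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < bits.size(); i += 8) {
        std::uint8_t b = 0;
        for (std::size_t j = 0; j < 8; ++j) {
            b <<= 1;
            if (i + j < bits.size() && bits[i + j] == '1')
                b |= 1;
        }
        bytes.push_back(b);
    }
    return bytes;
}

// Write the packed bytes using ordinary binary stream I/O.
void write_bits(const std::string& bits, const std::string& path) {
    std::vector<std::uint8_t> bytes = pack_bits(bits);
    std::ofstream f(path, std::ios::binary);
    f.write(reinterpret_cast<const char*>(bytes.data()),
            static_cast<std::streamsize>(bytes.size()));
}
```

With the 12-bit example from the question, pack_bits produces two bytes, the second padded with four zero bits.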

You haven't said what API you're using, so I'm going to assume you're using I/O streams. To write data to the stream just do this:
f.write(buf, len);
You can't write single bits, the best granularity you are going to get is bytes. If you want bits you will have to do some bitwise work to your byte buffer before you write it.
If you want to pack your 8 element array of chars into one byte you can do something like this:
char data[8] = ...;
char byte = 0;
for (unsigned i = 0; i != 8; ++i)
{
    byte |= (data[i] & 1) << i;
}
f.put(byte);
If data contains ASCII '0' or '1' characters rather than actual 0 or 1 bits replace the |= line with this:
byte |= (data[i] == '1') << i;

Make an unsigned char out of the bits in an array:
unsigned char make_byte(char input[8]) {
    unsigned char result = 0;
    for (int i = 0; i < 8; i++)
        if (input[i] != '0')
            result |= (1 << i);
    return result;
}
This assumes input[0] should become the least significant bit in the byte, and input[7] the most significant.
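For instance (the function restated with a const parameter so a string literal can be passed, and so it runs standalone), the string "10001101" sets bits 0, 4, 5 and 7, giving 0xB1:

```cpp
#include <cassert>

// Same packing as above: input[i] becomes bit i of the result,
// so input[0] is the least significant bit.
unsigned char make_byte(const char input[8]) {
    unsigned char result = 0;
    for (int i = 0; i < 8; i++)
        if (input[i] != '0')
            result |= (1 << i);
    return result;
}
```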

Related

Bits to type from buffer

A file contains a buffer value. The first 16 bits contain the type, the next 32 bits give the length of the data, and the remaining bits are the data.
How can I find the type from the 16 bits (find out whether it is int or char, ...)?
I'm super stuck in my thought process here, not able to find a way to convert bits to types.
Say you have the homework assignment:
You are given a file where the first bit encodes the type, the
next 7 bits encode the length, and the rest is the data.
The types are encoded in the following way:
0 is for int
1 is for char
Print the ints or chars separated by newlines.
You just use the given information! Since 1 bit is used to encode the type there are two possible types. So you just read the first bit, then do:
if (bit == 0) {
    int *i = ...
}
else if (bit == 1) {
    char *c = ...
}
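For instance, assuming the type bit is the most significant bit of the header's first byte (the assignment doesn't pin down the bit order, so that layout is an assumption), extracting both fields is one shift and one mask:

```cpp
#include <cassert>
#include <cstdint>

struct Header { int type; int length; };

// Split a header byte into the 1-bit type (0 = int, 1 = char)
// and the 7-bit length described in the assignment.
Header decode_header(std::uint8_t first_byte) {
    Header h;
    h.type   = (first_byte >> 7) & 1;  // most significant bit
    h.length = first_byte & 0x7F;      // low 7 bits
    return h;
}
```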

Convert a 64bit integer to an array of 7bit-characters

Say I have a function vector<unsigned char> byteVector(long long UID), returning a byte representation of the UID, a 64-bit integer, as a vector. This vector is later used to write the data to a file.
Now, because I decided I want to read that file with Python, I have to comply with the UTF-8 standard, which means I can only use the lower 7 bits of each char. If the most significant bit is 1, I can't decode it to a string anymore, because those strings only support ASCII characters. Also, I'll have to pass those strings to other processes via a command-line interface, which likewise only supports the ASCII character set.
Before that problem arose, my approach to splitting the 64-bit integer into 8 separate bytes was the following, which worked great:
vector<unsigned char> outputVector = vector<unsigned char>();
unsigned char * uidBytes = (unsigned char*) &UID_;
for (int i = 0; i < 8; i++){
    outputVector.push_back(uidBytes[i]);
}
Of course that doesn't work anymore, as the constraint "the high bit may not be 1" limits the maximum value of each unsigned char to 127.
My easiest option now would of course be to replace the one push_back call with this:
outputVector.push_back(uidBytes[i] / 128);
outputVector.push_back(uidBytes[i] % 128);
But this seems kind of wasteful, as the first of each unsigned char pair can only be 0 or 1 and I would be wasting some space (6 bytes) I could otherwise use.
As I need to save 64 bits and can use 7 bits per byte, I'll need ⌈64/7⌉ = 10 bytes.
It isn't really much (none of the files I write ever even reached the 1 kB mark), but I was using 8 bytes before and it seems a bit wasteful to use 16 now when ten would suffice. So:
How do I convert a 64-bit integer to a vector of ten 7-bit integers?
This is probably too much optimization, but there could be some very cool solution for this problem (probably using shift operators) and I would be really interested in seeing it.
You can use bit shifts to take 7-bit pieces of the 64-bit integer. However, you need ten 7-bit integers, nine is not enough: 9 * 7 = 63, one bit short.
std::uint64_t uid = 42; // Your 64-bit input here.
std::vector<std::uint8_t> outputVector;
for (int i = 0; i < 10; i++)
{
    outputVector.push_back((uid >> (i * 7)) & 0x7f);
}
In every iteration, we shift the input bits by a multiple of 7, and mask out a 7-bit part. The most significant bit of the 8-bit numbers will be zero. Note that the numbers in the vector are “reversed”: the least significant bits have the lowest index. This is irrelevant though, if you decode the parts in the correct way. Decoding can be done as follows:
std::uint64_t decoded = 0;
for (int i = 0; i < 10; i++)
{
    decoded |= static_cast<std::uint64_t>(outputVector[i]) << (i * 7);
}
Please note that it seems like a bad idea to interpret the resulting vector as UTF-8 encoded text: the sequence can still contain control characters and \0. If you want to encode your 64-bit integer in printable characters, take a look at base64. In that case, you will need one more character (eleven in total) to encode 64 bits.
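Wrapping the two loops above into functions (the names encode7 and decode7 are mine) gives a self-contained round trip that is easy to verify:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split a 64-bit value into ten 7-bit pieces, least significant first.
std::vector<std::uint8_t> encode7(std::uint64_t uid) {
    std::vector<std::uint8_t> out;
    for (int i = 0; i < 10; i++)
        out.push_back((uid >> (i * 7)) & 0x7F);
    return out;
}

// Reassemble the original 64-bit value from the ten pieces.
std::uint64_t decode7(const std::vector<std::uint8_t>& parts) {
    std::uint64_t result = 0;
    for (int i = 0; i < 10; i++)
        result |= static_cast<std::uint64_t>(parts[i]) << (i * 7);
    return result;
}
```

Every element of the vector is below 0x80, so the high bit is always clear as required.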
I suggest using assembly language.
Many assembly languages have instructions for shifting a bit into a "spare" carry bit and shifting the carry bit into a register. The C language has no convenient or efficient method to do this.
The algorithm, repeated for each of the ten output characters:
for i = 0; i < 7; ++i
{
    right-shift the 64-bit word into carry.
    right-shift the carry into the character.
}
You should also look into using std::bitset.

checksum code in C++

Can someone please explain what this code is doing? I have to interpret this code and use it as checksum code, but I am not sure it is absolutely correct. In particular, how are the overflows handled, and what do *cp, const char* cp and sum & 0xFFFF mean? The basic idea was to take a string input from the user and convert it to binary form 16 bits at a time. Then sum all the 16-bit words together (in binary) to get a 16-bit sum. If there is any overflow bit in the addition, add it to the LSB of the final sum. Then take the ones' complement of the result.
How close is this code to doing the above?
unsigned int packet::calculateChecksum()
{
    unsigned int c = 0;
    int i;
    string j;
    int k;
    cout << "enter a message" << message;
    getline(cin, message); // Some string.
    //std::string message =
    std::vector<uint16_t> bitvec;
    const char* cp = message.c_str() + 1;
    while (*cp) {
        uint16_t bits = *(cp-1)>>8 + *(cp);
        bitvec.push_back(bits);
        cp += 2;
    }
    uint32_t sum = 0;
    uint16_t overflow = 0;
    uint32_t finalsum = 0;
    // Compute the sum. Let overflows accumulate in upper 16 bits.
    for (auto j = bitvec.begin(); j != bitvec.end(); ++j)
        sum += *j;
    // Now fold the overflows into the lower 16 bits. Loop until no overflows.
    do {
        sum = (sum & 0xFFFF) + (sum >> 16);
    } while (sum > 0xFFFF);
    // Return the 1s complement sum in finalsum
    finalsum = 0xFFFF & sum;
    //cout << "the finalsum is" << c;
    c = finalsum;
    return c;
}
I see several issues in the code:
cp is a pointer to the zero-terminated char array holding the input message. The while (*cp) is problematic because inside the loop body cp is incremented by 2, so it is easy to step over the terminating \0 of the array (e.g. when the input message has 2 characters) and run into a segmentation fault.
*(cp) and *(cp-1) fetch two neighbouring characters (bytes) of the input message. But why is the two-byte word formed by *(cp-1)>>8 + *(cp)? It would make more sense to form the 16-bit word as (*(cp-1)<<8) + *(cp), i.e. the preceding character sits in the higher byte and the following character in the lower byte of the 16-bit word. (Note also that + binds tighter than the shift operators, so the parentheses around the shift are needed.)
To answer your question: sum & 0xFFFF computes a number whose upper 16 bits are zero and whose lower 16 bits are the same as in sum; 0xFFFF is a bit mask.
The funny thing is that even the above code might not do exactly what you stated as the requirement, but as long as the sending and receiving parties use the same piece of incorrect code, your checksum creation and verification will pass, since both ends are consistent with each other :)
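For reference, here is a sketch of the intended ones'-complement checksum with the fixes above applied: bytes paired high-byte-first, an odd-length tail padded with a zero low byte, overflow folded back, and the final complement actually taken. The function name is mine:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 16-bit ones'-complement checksum over a message, Internet-checksum style.
std::uint16_t checksum16(const std::string& msg) {
    std::uint32_t sum = 0;
    std::size_t i = 0;
    // Pair bytes into 16-bit words, preceding character in the high byte.
    for (; i + 1 < msg.size(); i += 2)
        sum += (static_cast<std::uint8_t>(msg[i]) << 8)
             | static_cast<std::uint8_t>(msg[i + 1]);
    if (i < msg.size())                 // odd length: pad the low byte with 0
        sum += static_cast<std::uint8_t>(msg[i]) << 8;
    while (sum > 0xFFFF)                // fold overflow bits back into the sum
        sum = (sum & 0xFFFF) + (sum >> 16);
    return static_cast<std::uint16_t>(~sum);  // ones' complement
}
```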

Printing integers as a set of 4 bytes arranged in little endian?

I have an array of 256 unsigned integers called frequencies[256] (one integer for each ASCII value). My goal is to read through an input and, for each character, increment the integer in the array that corresponds to it (for example the character 'A' will cause the frequencies[65] integer to increase by one); when the input is over I must output each integer as 4 characters in little-endian form.
So far I have made a loop that goes through the input and increases each corresponding integer in the array. But I am very confused about how to output each integer in little-endian form. I understand that each of the four bytes of each integer should be output as a character (for instance the unsigned integer 1 in little endian is "00000001 00000000 00000000 00000000", which I would want to output as the 4 ASCII characters that correspond to those bytes).
But how do I get at the binary representation of an unsigned integer in my code, and how would I go about chopping it up and rearranging it?
Thanks for the help.
For hardware portability, please use the following solution:
int freqs[256];
for (int i = 0; i < 256; ++i)
    printf("%02x %02x %02x %02x\n", (freqs[i] >> 0 ) & 0xFF,
                                    (freqs[i] >> 8 ) & 0xFF,
                                    (freqs[i] >> 16) & 0xFF,
                                    (freqs[i] >> 24) & 0xFF);
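To emit raw bytes rather than hex text, the same shift-and-mask approach works with a stream. A sketch (the helper name put_le32 is mine):

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>

// Write a 32-bit value as 4 raw bytes, least significant byte first,
// regardless of the host machine's native byte order.
void put_le32(std::ostream& out, std::uint32_t v) {
    for (int i = 0; i < 4; i++)
        out.put(static_cast<char>((v >> (i * 8)) & 0xFF));
}
```

Calling put_le32(file, frequencies[i]) for each of the 256 counters writes the table in the required little-endian form.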
You can use memcpy, which copies a block of memory:
char tab[4];
memcpy(tab, frequencies + i, sizeof(int));
Now tab[0], tab[1], etc. will be your characters. Note, though, that memcpy copies the bytes in the machine's native order, so this only yields little-endian output on a little-endian host.
A program to swap from big to little endian: Little Endian - Big Endian Problem.
To understand if your system is little or big endian: https://stackoverflow.com/a/1024954/2436175.
Transform your chars/integers in a set of printable bits: https://stackoverflow.com/a/7349767/2436175
It's not really clear what you mean by "little endian" here. Integers don't have endianness per se; endianness only comes into play when you cut them up into smaller pieces. So which smaller pieces do you mean: bytes or characters? If characters, just convert in the normal way and reverse the generated string. If bytes (or any other smaller piece), each individual byte can be represented as a function of the int: i & 0xFF calculates the low-order byte, (i >> 8) & 0xFF the next lowest, and so forth. (If the bytes aren't 8 bits, change the shift value and the mask correspondingly.)
And with regards to your second paragraph: a single byte of an int doesn't necessarily correspond to a character, regardless of the encoding. For the four bytes you show, for example, none of them corresponds to a character in any of the usual encodings.
With regards to the last paragraph: to get the binary representation of an unsigned integer, use the same algorithm that you would use for any representation:
std::string
asText( unsigned int value, int base, int minDigits = 1 )
{
    static std::string digits( "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" );
    assert( base >= 2 && base <= static_cast<int>( digits.size() ) );
    std::string results;
    while ( value != 0 || minDigits > 0 ) {
        results += digits[ value % base ];
        value /= base;
        -- minDigits;
    }
    // results is now little endian. For the normal big-endian
    // presentation, reverse it.
    std::reverse( results.begin(), results.end() );
    return results;
}
Called with base equal to 2, this will give you your binary
representation.
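For example (the function restated in compact form so the call can be tried standalone), asText(5, 2, 8) pads the binary digits of 5 out to eight places:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Render an unsigned value in the given base, least significant digit
// generated first, then reversed into the usual reading order.
std::string asText(unsigned int value, int base, int minDigits = 1) {
    static const std::string digits("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    std::string results;
    while (value != 0 || minDigits > 0) {
        results += digits[value % base];
        value /= base;
        --minDigits;
    }
    std::reverse(results.begin(), results.end());
    return results;
}
```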

How can I get a UTF-8 char number from binary in C++?

For example, I have: 11100011 10000010 10100010. It is the binary of ア;
its number in UTF-8 is 12450.
How can I get this number from the binary?
The byte sequence you're showing is the UTF-8 encoded version of the character.
You need to decode the UTF-8 to get to the Unicode code point.
For this exact sequence of bytes, the following bits make up the code point:
11100011 10000010 10100010
    ****   ******   ******
So, concatenating the asterisked bits we get the number 0011000010100010, which equals 0x30a2 or 12450 in decimal.
See the Wikipedia description for details on how to interpret the encoding.
In a nutshell: if bit 7 is set in the first byte, the number of adjacent bits that are also set (call it m; here m = 2) gives the number of bytes that follow for this code point. The number of bits to extract is 8 - 1 - 1 - m from the first byte, and 6 from each subsequent byte. So here we get 8 - 1 - 1 - 2 = 4 bits from the first byte plus 2 × 6 = 12 bits from the continuation bytes, 16 bits in total.
As pointed out in comments, there are plenty of libraries for this, so you might not need to implement it yourself.
Working from the Wikipedia page, I came up with this:
unsigned utf8_to_codepoint(const char* s) {
    const unsigned char* ptr = reinterpret_cast<const unsigned char*>(s);
    if (*ptr < 0x80) return *ptr;
    if (*ptr < 0xC0) throw unicode_error("invalid utf8 lead byte");
    unsigned result = 0;
    int shift = 0;
    if      (*ptr < 0xE0) { result = *ptr & 0x1F; shift = 1; }
    else if (*ptr < 0xF0) { result = *ptr & 0x0F; shift = 2; }
    else if (*ptr < 0xF8) { result = *ptr & 0x07; shift = 3; }
    else throw unicode_error("invalid utf8 lead byte");
    for (; shift > 0; --shift) {
        ++ptr;
        if (*ptr < 0x80 || *ptr >= 0xC0)
            throw unicode_error("invalid utf8 continuation byte");
        result <<= 6;
        result |= *ptr & 0x3F;
    }
    return result;
}
Note that this is still a rough sketch (unicode_error is whatever exception type you have handy), and it accepts invalid sequences that a strict decoder should reject, such as overlong encodings and UTF-16 surrogates. I put this up merely to show that it's a lot harder than you'd think, and that you should use a good Unicode library.