Reading bits from file

Reading bits from file - c++

I can read, for example, 4 bytes from a file using
ifstream r(filename, ios::binary | ios::in);
uint32_t readHere;
r.read( (char*)&readHere, 4 );
But how could I read 4.5 bytes, i.e. 4 bytes and 4 bits?
What came to my mind is
ifstream r(filename, ios::binary | ios::in);
uint64_t readHere = 0;
r.read( (char*)&readHere, 5 ); // reading 5 bytes
uint64_t tmp = readHere & 0xFF; // extract 5th byte
tmp = tmp >> 4; // get first half of the bits
readHere = (( readHere >> 8 ) << 8) | tmp; // remove 5th byte, then add 4 bits
But I'm not sure which half of the byte I should take, the first or the last 4 bits.
Is there a better way to retrieve it?

The smallest unit that you can read or write, whether in a file or in memory, is a char (a byte on common systems (*)). You can walk through a larger object byte by byte, and endianness does matter there.
uint32_t u = 0xaabbccdd;
unsigned char *p = reinterpret_cast<unsigned char *>(&u);
unsigned char c = p[0]; // c is 0xdd on a little endian system and 0xaa on a big endian one
But as soon as you are inside a byte, all you can do is use bitwise ANDs and shifts to extract the low-order or high-order bits. Endianness no longer applies there, unless you decide on a convention of your own.
BTW, if you read from a network interface, or even from a serial line where bits are transferred individually, you get one full byte at a time; there is no way to read only 4 bits in one read and the other 4 in the next one.
(*) older systems (CDC in the 80's) used to have 6 bits per character, but C++ did not exist at that time and I'm unsure whether C compilers existed there
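As a rough illustration of the shifts and masks described above, here is a minimal sketch. The helper names high_nibble, low_nibble and pack_nibbles are made up for this example, and calling the upper four bits the "first" half is only one possible convention.

```cpp
#include <cassert>
#include <cstdint>

// Upper four bits of a byte, shifted down into the range 0..15.
inline std::uint8_t high_nibble(std::uint8_t byte) {
    return static_cast<std::uint8_t>(byte >> 4);
}

// Lower four bits of a byte, range 0..15.
inline std::uint8_t low_nibble(std::uint8_t byte) {
    return static_cast<std::uint8_t>(byte & 0x0F);
}

// Reassembles a byte from two nibbles; each must fit in four bits.
inline std::uint8_t pack_nibbles(std::uint8_t high, std::uint8_t low) {
    assert(high <= 0x0F && low <= 0x0F);
    return static_cast<std::uint8_t>((high << 4) | low);
}
```

For the byte 0xAB, high_nibble gives 0x0A and low_nibble gives 0x0B. Which of the two counts as the "first" half of a 4.5-byte value is decided by the file format, not by the hardware.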

It's still not clear whether this is a file format that you control, or if it's something else. Anyway, let's assume you have some integer data type that can hold a 36-bit unsigned value:
typedef uint64_t u36;
Now, regardless of whether your system uses big-endian or little-endian, you can write the value to a binary stream in a predictable order by doing them one byte at a time. Let's use big-endian, because it's slightly easier to picture the bits assembling together to create a value.
You can just use naive shifting and masking into a small buffer. The only thing to decide is where to truncate the half-byte. But if you follow the pattern of shifting each value by another 8 bits, then the remainder naturally falls in the high-order bits of the last byte.
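ostream & write_u36( ostream & s, u36 val )
{
    // Big-endian: most significant byte first.
    // The casts avoid a narrowing error in the braced initializer.
    char bytes[5] = {
        static_cast<char>((val >> 28) & 0xff),
        static_cast<char>((val >> 20) & 0xff),
        static_cast<char>((val >> 12) & 0xff),
        static_cast<char>((val >> 4 ) & 0xff),
        static_cast<char>((val << 4 ) & 0xf0) // low 4 bits go in the high nibble
    };
    return s.write( bytes, 5 );
}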
But this isn't how you'd actually write a bunch of these numbers. You'd have to hold off the 5th byte until you were finished or you could pack the next value into it. Or you would always write two values at a time:
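ostream & write_u36_pair( ostream & s, u36 a, u36 b )
{
    // The casts avoid a narrowing error in the braced initializer.
    char bytes[9] = {
        static_cast<char>((a >> 28) & 0xff),
        static_cast<char>((a >> 20) & 0xff),
        static_cast<char>((a >> 12) & 0xff),
        static_cast<char>((a >> 4 ) & 0xff),
        // shared byte: low nibble of a, then top nibble of b
        static_cast<char>(((a << 4 ) & 0xf0) | ((b >> 32) & 0x0f)),
        static_cast<char>((b >> 24) & 0xff),
        static_cast<char>((b >> 16) & 0xff),
        static_cast<char>((b >> 8) & 0xff),
        static_cast<char>(b & 0xff)
    };
    return s.write( bytes, 9 );
}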
And so now, you might see how to go about reading values and deserialising them back into integers. The simplest way is to read two at a time.
istream & read_u36_pair( istream & s, u36 & a, u36 & b )
{
    // unsigned char avoids sign extension when a byte >= 0x80 is widened to u36
    unsigned char bytes[9];
    if( s.read( reinterpret_cast<char *>(bytes), 9 ) )
    {
        a = (u36)bytes[0] << 28
          | (u36)bytes[1] << 20
          | (u36)bytes[2] << 12
          | (u36)bytes[3] << 4
          | (u36)bytes[4] >> 4;
        b = ((u36)bytes[4] & 0x0f) << 32
          | (u36)bytes[5] << 24
          | (u36)bytes[6] << 16
          | (u36)bytes[7] << 8
          | (u36)bytes[8];
    }
    return s;
}
If you wanted to read them one at a time, you'd need to keep track of some state so you knew how many bytes to read (either 5 or 4), and which shift operations to apply. Something naive like this:
struct u36deser {
    // unsigned char avoids sign extension when a byte >= 0x80 is widened to u36
    unsigned char bytes[5];
    int which = 0;
};
istream & read_u36( istream & s, u36deser & state, u36 & val )
{
    if( state.which == 0 && s.read( reinterpret_cast<char *>(state.bytes), 5 ) )
    {
        val = (u36)state.bytes[0] << 28
            | (u36)state.bytes[1] << 20
            | (u36)state.bytes[2] << 12
            | (u36)state.bytes[3] << 4
            | (u36)state.bytes[4] >> 4;
        state.which = 1;
    }
    else if( state.which == 1 && s.read( reinterpret_cast<char *>(state.bytes), 4 ) )
    {
        // only bytes[0..3] were overwritten; bytes[4] is left over from the previous call
        val = ((u36)state.bytes[4] & 0x0f) << 32
            | (u36)state.bytes[0] << 24
            | (u36)state.bytes[1] << 16
            | (u36)state.bytes[2] << 8
            | (u36)state.bytes[3];
        state.which = 0;
    }
    return s;
}
All of this is purely hypothetical, which seems to be the point of your question anyway. There are many other ways to serialise bits, and some of them are not at all obvious.
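One of those other ways, sketched here under assumptions, is a small reader that hands out an arbitrary number of bits at a time, most significant bit first. The BitReader class and the decode_u36_pair wrapper are invented for this illustration and are not part of any library; the bit order matches the big-endian packing used above.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: hands out bits MSB-first from any std::istream.
class BitReader {
public:
    explicit BitReader(std::istream& in) : in_(in) {}

    // Reads `count` bits (0..64) into `value`.
    // Returns false, leaving `value` untouched, if the stream runs out.
    bool read(unsigned count, std::uint64_t& value) {
        assert(count <= 64);
        std::uint64_t result = 0;
        for (unsigned i = 0; i < count; ++i) {
            if (bits_left_ == 0) {
                // Fetch the next whole byte; a stream cannot deliver less.
                const int c = in_.get();
                if (c == std::istream::traits_type::eof()) return false;
                current_ = static_cast<unsigned char>(c);
                bits_left_ = 8;
            }
            --bits_left_;
            result = (result << 1) | ((current_ >> bits_left_) & 1u);
        }
        value = result;
        return true;
    }

private:
    std::istream& in_;
    unsigned char current_ = 0;  // byte currently being consumed
    unsigned bits_left_ = 0;     // unread bits remaining in current_
};

// Usage sketch: decode two packed 36-bit values from nine bytes held in memory.
inline bool decode_u36_pair(const std::string& bytes,
                            std::uint64_t& a, std::uint64_t& b) {
    std::istringstream in(bytes);
    BitReader reader(in);
    return reader.read(36, a) && reader.read(36, b);
}
```

Reading bit by bit is slower than the byte-wise shifts above, but the reader keeps the leftover half-byte itself, so the caller no longer has to track whether the next value starts on a byte boundary.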

Related

Split parts of a uint32_t hex value into smaller parts in C++

I have a uint32_t as follows:
uint32_t midiData=0x9FCC00;
I need to separate this uint32_t into smaller parts so that 9 becomes its own entity, F becomes its own entity, and CC becomes its own entity. If you're wondering what I am doing, I am trying to break up the parts of a MIDI message so that they are easier to manage in my program.
I found this solution, but the problem is I don't know how to apply it to the CC section, and that I am not sure that this method works with C++.
Here is what I have so far:
uint32_t midiData=0x9FCC00;
uint32_t status = 0x0FFFFF & midiData; // Retrieve 9
uint32_t channel = (0xF0FFFF & midiData)>>4; //Retrieve F
uint32_t note = (0xFF00FF & midiData) >> 8; //Retrieve CC
Is this correct for C++? I ask because I have never used C++ before, and its use of > and < has always confused me (which is why I tend to avoid it).

You can use bit shift operator >> and bit masking operator & in C++ as well.
There are, however, some issues on how you use it:
The expression v1 & v2 gives a number built from those bits that are set in both v1 and v2, so that, for example, 0x12 & 0xF0 gives 0x10, not 0x02. Further, the bit shift operator takes a number of bits, and a single digit in a hex number (usually called a nibble) consists of 4 bits (0x0..0xF requires 4 bits). So, if you have 0x12 and want to get 0x01, you have to write 0x12 >> 4.
Hence, your shifts need to be adapted, too:
#define BITS_OF_A_NIBBLE 4
unsigned char status = (midiData & 0x00F00000) >> (5*BITS_OF_A_NIBBLE);
unsigned char channel = (midiData & 0x000F0000) >> (4*BITS_OF_A_NIBBLE);
unsigned char note = (midiData & 0x0000FF00) >> (2*BITS_OF_A_NIBBLE);
unsigned char theRest = (midiData & 0x000000FF);

You have it backwards, in a way.
In boolean logic (the & is a bitwise-AND), ANDing something with 0 will exclude it. Knowing that F in hex is 1111 in binary, a line like 0x9FCC00 & 0x0FFFFF will give you all the hex digits EXCEPT the 9, the opposite of what you want.
So, for status:
uint32_t status = 0xF00000 & midiData; // Retrieve 9
Actually, this will give you 0x900000. If you want 0x9 (also 9 in decimal), you need to bitshift the result over.
Now, the right bitshift operator (say, X >> 4) means move X 4 bits to the right, which is the same as dividing by 16. That is 4 bits, not 4 hex digits. 1 hex digit == 4 bits, so to get 9 from 0x900000, you need 0x900000 >> 20.
So, to put them together, to get a status of 9:
uint32_t status = (0xF00000 & midiData) >> 20;
A similar process will get you the remaining values you want.
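Applying the same mask-then-shift process to every field of the question's 0x9FCC00 value could look like the sketch below. The MidiParts struct and the split_midi function are names made up for this example, and the 24-bit layout (one status digit, one channel digit, two note digits, two trailing digits) is taken from the question, not from the MIDI specification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the fields the question asks for.
struct MidiParts {
    std::uint8_t status;   // first hex digit, e.g. 0x9
    std::uint8_t channel;  // second hex digit, e.g. 0xF
    std::uint8_t note;     // third and fourth hex digits, e.g. 0xCC
};

inline MidiParts split_midi(std::uint32_t midiData) {
    assert(midiData <= 0xFFFFFFu);  // the layout above only covers 24 bits
    MidiParts parts;
    // Each F in the mask keeps one hex digit; each shift drops 4 bits per digit.
    parts.status  = static_cast<std::uint8_t>((midiData & 0xF00000u) >> 20);
    parts.channel = static_cast<std::uint8_t>((midiData & 0x0F0000u) >> 16);
    parts.note    = static_cast<std::uint8_t>((midiData & 0x00FF00u) >> 8);
    return parts;
}
```

With midiData = 0x9FCC00 this yields status 0x9, channel 0xF and note 0xCC.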

In general I'd recommend shift first, then mask - it's less error prone:
uint8_t cmd = (midiData >> 16) & 0xff;
uint8_t note = (midiData >> 8) & 0x7f; // MSB can't be set
uint8_t velocity = (midiData >> 0) & 0x7f; // ditto
and then split the cmd variable:
uint8_t status = (cmd & 0xf0); // range 0x00 .. 0xf0
uint8_t channel = (cmd & 0x0f); // range 0 .. 15
I personally wouldn't bother mapping the status value back into the range 0 .. 15 - it's commonly understood that e.g. 0x90 is a "note on", and not the plain value 9.

C++ write nibbles in file

Good evening,
I'm new to C++ and have run into a problem that I wasn't able to solve despite reading numerous pages here. I've got a file with hex values that need to be read, compressed, and then written to a new file. An example sequence looks like this:
C9 CB FF 01 06 (each byte [8 bits] represents a number)
Compression starts with the first number, then writes only the difference to the next number (differences are a nibble [4 bits]). Example from C9 to CB: difference = 2. If the difference is greater than 7, and thus can't be represented by a nibble, we use 0x8 to mark a new start. 0xFF - 0xCB > 7, so the sequence would look like this (entire compressed code):
C9 28 FF 15 (a mixture of entire bytes (0xC9 and 0xFF) representing numbers, and nibbles representing differences to the next number).
Now to my problem. I'm using fstream and put to write bytes to a new file; a nibble is stored until it can be combined with another nibble into a byte, which is then written to the file. However, this only works with bytes smaller than 128, so I can't write values greater than 0x7F to a file. I prepared a file with Notepad++ starting with the value 0xFF. Reading that value works fine, but dest.put(source.get()); doesn't in that specific case. How can I work with (signed) nibbles [for negative differences] and binary representations of numbers in C++? By the way, using negative numbers in file.put() results in strange behavior, as 2 bytes are written rather than one. Here's my code; I hope you understand my problem, and I really appreciate your help.
int lastValue = s.get();
d.put((char)lastValue);
char highNibble = 0;
bool nibbleSet = false;
int diff = 0;
for (int c = s.get(); c != -1; c = s.get()) {
diff = (char)((unsigned char)c - (unsigned char)lastValue);
if (abs(diff) > 7) {
if (nibbleSet) {
d.put(highNibble << 4 | 8);
d.put((char)c);
nibbleSet = false;
}
else {
cout << (8 << 4 | (c & 0xF0) >> 4) << endl;
d.put(8 << 4 | (c & 0xF0) >> 4);
highNibble = c & 0x0F;
nibbleSet = true;
}
}
else {
if (nibbleSet) {
d.put(((char)highNibble << 4) & 0xF0 | ((char)diff) & 0x0F);
nibbleSet = false;
}
else {
highNibble = (char)diff;
nibbleSet = true;
}
}
lastValue = c;
}

putting two bytes together

I am reading bytes from a file. For this example, I read two bytes (shown in hex),
94 and 73. How can I put these two bytes together so that they look like
9470?
I can use 73 >> 4 to make 70 out of 73, but how can I "put" them together?
I tried using (94 << 8) & ( 73 >> 4 ) but it always returns 0.
I have found nothing about working with bytes like this (basically reading one and a half bytes in this example) while reading 2 bytes at once.
Code example:
uint64_t bytes;
output.read( (char *)&bytes, 2 ); // read 2 bytes
uint64_t tmp = ( cutIt << ( 64 - 8) ) >> ( 64 - 8) ;
uint64_t tmp_two = (( cutIt >> 8) & 11110000 ) >> 4;
uint64_t tmp_three = (tmp << 8) & tmp_two ;

((0x94 << 8) + 0x73) & 0xFFF0
will give you the output you want. For this you need to think in binary:
((10010100 << 8) + 01110011) & 1111111111110000
The 4 zeroes at the end zero out your least significant bits thanks to the logical AND, while maintaining your word length.
To answer the question in the comments: you simply choose the number of bits you want to keep by changing the number of zeroes. For your example this would mean the number you use for the logical AND would be FFFC in hex, or in binary
1111111111111100.

unsigned char b1 = 0xAB;
unsigned char b2 = 0xCD;
...
unsigned short s = (b1 << 8) | (b2 & 0xF0);
// s = 0xABC0
Use OR (|) instead of AND (&) to merge the shifted values together; otherwise the result is always 0.
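The two answers above can be combined into one small sketch: OR the first byte into the upper half, then keep only as many leading bits of the second byte as needed. The function name join_with_top_bits and its keep_bits parameter are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: joins `first` with the top `keep_bits` bits of `second`.
inline std::uint16_t join_with_top_bits(std::uint8_t first, std::uint8_t second,
                                        unsigned keep_bits) {
    assert(keep_bits <= 8);
    // 0xF0 for keep_bits == 4, 0xFC for keep_bits == 6, and so on.
    const unsigned mask = (0xFFu << (8 - keep_bits)) & 0xFFu;
    // OR merges the two halves; AND would only keep bits set in both.
    return static_cast<std::uint16_t>((first << 8) | (second & mask));
}
```

With the question's bytes, join_with_top_bits(0x94, 0x73, 4) gives 0x9470.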

Bit shifts and their logical operators

The program below swaps the last (low-order) and the penultimate bytes of the variable i of type int. I'm trying to understand why the programmer wrote this:
i = (i & LEADING_TWO_BYTES_MASK) | ((i & PENULTIMATE_BYTE_MASK) >> 8) | ((i & LAST_BYTE_MASK) << 8);
Can anyone explain to me in plain English what's going on in the program below?
#include <stdio.h>
#include <cstdlib>
#define LAST_BYTE_MASK 255 //11111111
#define PENULTIMATE_BYTE_MASK 65280 //1111111100000000
#define LEADING_TWO_BYTES_MASK 4294901760 //11111111111111110000000000000000
int main(){
unsigned int i = 0;
printf("i = ");
scanf("%d", &i);
i = (i & LEADING_TWO_BYTES_MASK) | ((i & PENULTIMATE_BYTE_MASK) >> 8) | ((i & LAST_BYTE_MASK) << 8);
printf("i = %d", i);
system("pause");
}

Since you asked for plain English: he swaps the first and second bytes of an integer.

The expression is indeed a bit convoluted but in essence the author does this:
// Mask out relevant bytes
unsigned higher_order_bytes = i & LEADING_TWO_BYTES_MASK;
unsigned first_byte = i & LAST_BYTE_MASK;
unsigned second_byte = i & PENULTIMATE_BYTE_MASK;
// Switch positions:
unsigned first_to_second = first_byte << 8;
unsigned second_to_first = second_byte >> 8;
// Concatenate back together:
unsigned result = higher_order_bytes | first_to_second | second_to_first;
Incidentally, defining the masks using hexadecimal notation is more readable than using decimal. Furthermore, using #define here is misguided. Both C and C++ have const:
unsigned const LEADING_TWO_BYTES_MASK = 0xFFFF0000;
unsigned const PENULTIMATE_BYTE_MASK = 0xFF00;
unsigned const LAST_BYTE_MASK = 0xFF;
To understand this code you need to know what &, | and bit shifts are doing on the bit level.

It's more instructive to define your masks in hexadecimal rather than decimal, because then they correspond directly to the binary representations and it's easy to see which bits are on and off:
#define LAST 0xFF // all bits in the least significant byte are 1
#define PEN 0xFF00 // all bits in the second-lowest byte are 1
#define LEAD 0xFFFF0000 // all bits in the two most significant bytes are 1
Then
i = ( (i & LEAD) // leave the two most significant bytes of the 32-bit integer the same
| ((i & PEN) >> 8) // take the second-lowest byte and shift it 8 bits right
| ((i & LAST) << 8) // take the lowest byte and shift it 8 bits left
);
So the expression is swapping the two least significant bytes while leaving the two most significant bytes the same.
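To see the swap on a concrete value, here is a minimal sketch that wraps the same expression in a function. The name swap_low_bytes and the sample value 0x11223344 are chosen for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Swaps the two least significant bytes; the upper two bytes are untouched.
inline std::uint32_t swap_low_bytes(std::uint32_t i) {
    const std::uint32_t leading     = i & 0xFFFF0000u;  // kept in place
    const std::uint32_t penultimate = i & 0x0000FF00u;  // moves down 8 bits
    const std::uint32_t last        = i & 0x000000FFu;  // moves up 8 bits
    return leading | (penultimate >> 8) | (last << 8);
}

// Self-check: one swap exchanges the low bytes, a second swap restores them.
inline void check_swap_low_bytes() {
    assert(swap_low_bytes(0x11223344u) == 0x11224433u);
    assert(swap_low_bytes(swap_low_bytes(0x11223344u)) == 0x11223344u);
}
```

Because the three masks do not overlap, ORing the pieces back together cannot lose or duplicate any bit.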

C Bitwise Operation Question

Can someone help me understand what's going on with this code? It looks like it is making an integer from an array of bits. I'm not sure how it's doing that. Why is there a bitwise & operation with 0xFF? Isn't this just going to produce the same result?
//first take the first 4 bytes read out of the socket into an array and
//make them a 32 bit integer
long ltemp =0;
long ltemp2 = 0;
ltemp = ltemp | (unsigned char)(analog_val_ptr[0] & 0xff);
ltemp = ltemp << 24;
ltemp2 = ltemp2 | (unsigned char)(analog_val_ptr[1] & 0xff);
ltemp2 = ltemp2 << 16;
ltemp = ltemp2 | ltemp;
ltemp2 =0;
ltemp2 = ltemp2 | (unsigned char)(analog_val_ptr[2] & 0xff);
ltemp2 = ltemp2 << 8;
ltemp = ltemp2 | ltemp;
ltemp = ltemp | (unsigned char)(analog_val_ptr[3] & 0xff);
///then convert that integer into a float, passing

That's a very long-winded way of just converting four 8-bit bytes into a 32-bit long.
The anding with 0xff is just ensuring that only the lower 8 bits of each value are used (0xff == binary 11111111).
The bit-shifting (in multiples of 8) is just to get each character into the right position.
The whole thing could be replaced with something like:
unsigned long ltemp = (unsigned char)(analog_val_ptr[0] & 0xff);
ltemp = (ltemp << 8) | (unsigned char)(analog_val_ptr[1] & 0xff);
ltemp = (ltemp << 8) | (unsigned char)(analog_val_ptr[2] & 0xff);
ltemp = (ltemp << 8) | (unsigned char)(analog_val_ptr[3] & 0xff);
Or, alternatively (and assuming they're available), use the correct tools for the job, specifically htonl() and ntohl().

It looks like it's building an integer from an array of bytes. It may be that analog_val_ptr[] is an array of int or short values, and this code is designed to treat each entry as a byte. The masking is to prevent the sign bit from flooding the destination variable.
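The sign-bit point can be shown with a small sketch. It assumes the buffer holds plain char values, as data read from a socket often does; the function name load_be32 is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assembles a big-endian 32-bit value from four raw chars.
inline std::uint32_t load_be32(const char* data, std::size_t size) {
    assert(data != nullptr && size >= 4);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // Where char is signed, a byte such as 0xFF widens to -1 (all bits set);
        // the & 0xff keeps only its low eight bits before they are merged in.
        value = (value << 8) | static_cast<std::uint32_t>(data[i] & 0xff);
    }
    return value;
}
```

Without the mask, a single byte of 0x80 or above could set every higher bit of the result on platforms where char is signed.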

Looks like it is going for an endian-independent conversion.

var = 0x ? ? ? ? ? ? ? ?
& & & & & & & &
0x 0 0 0 0 0 0 f f
------------------
0 0 0 0 0 0 ? ?
After the AND operation, only the lower 8 bits of var remain: that is what var & 0xff gives you. It's a way to cut out only the needed portion, known as masking.
The code above simply pastes the lower bytes of 4 array elements into the variable ltemp as a long int.
