Good evening,
I'm new to C++ and encountered a problem that I haven't been able to solve despite reading numerous pages here. I have a file with hex values that need to be read and compressed, then written to a new file. An example sequence looks like this:
C9 CB FF 01 06 (each byte [8 bit] represents a number)
Compression starts with the first number; after that, only the difference to the next number is written (differences are a nibble [4 bit]). Example from C9 to CB: difference = 2. If the difference is greater than 7, and thus can't be represented by a nibble, a 0x8 marks a new start. 0xFF - 0xCB > 7, so the sequence would look like this (entire compressed code):
C9 28 FF 15 (a mixture of whole bytes (0xC9 and 0xFF) representing numbers and nibbles representing differences to the next number).

Now to my problem. I'm using fstream and put to write bytes to a new file; nibbles are stored until they can be combined with another nibble into a byte, which is then written to the file. However, it only works with bytes smaller than 128, so I can't write values greater than 0x7F into a file. I prepared a file with Notepad++ starting with the value 0xFF; reading that value works fine, but dest.put(source.get()); doesn't in that specific case. How can I work with (signed) nibbles [for negative differences] and binary representations of numbers in C++? By the way, using negative numbers in file.put() results in strange behavior, as 2 bytes are written rather than one. Here's my code; I hope you understand my problem and I really appreciate your help.
int lastValue = s.get();
d.put((char)lastValue);
char highNibble = 0;
bool nibbleSet = false;
int diff = 0;
for (int c = s.get(); c != -1; c = s.get()) {
    diff = (char)((unsigned char)c - (unsigned char)lastValue);
    if (abs(diff) > 7) {
        if (nibbleSet) {
            d.put(highNibble << 4 | 8);
            d.put((char)c);
            nibbleSet = false;
        }
        else {
            cout << (8 << 4 | (c & 0xF0) >> 4) << endl;
            d.put(8 << 4 | (c & 0xF0) >> 4);
            highNibble = c & 0x0F;
            nibbleSet = true;
        }
    }
    else {
        if (nibbleSet) {
            d.put(((char)highNibble << 4) & 0xF0 | ((char)diff) & 0x0F);
            nibbleSet = false;
        }
        else {
            highNibble = (char)diff;
            nibbleSet = true;
        }
    }
    lastValue = c;
}
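For what it's worth, put() itself has no trouble with bytes above 0x7F; the usual culprits are streams opened in text mode or signed-char comparisons. A minimal sketch of a binary-mode round trip (the file name is made up for illustration):

```cpp
#include <fstream>

// Write two raw bytes and read them back, using std::ios::binary so no
// platform translation of byte values can occur.
bool roundtrip_bytes() {
    {
        std::ofstream d("compressed.bin", std::ios::binary);
        d.put(static_cast<char>(0xFF)); // bytes above 0x7F are fine here
        d.put(static_cast<char>(0x28));
    }
    std::ifstream s("compressed.bin", std::ios::binary);
    int a = s.get(); // get() returns int, so EOF (-1) stays distinguishable
    int b = s.get();
    int end = s.get();
    return a == 0xFF && b == 0x28 && end == -1;
}
```

Casting through char on the way in and comparing the int result of get() on the way out keeps a byte like 0xFF from being mistaken for EOF.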
What would be the fastest way possible to reverse the nibbles (i.e. the hex digits) of a hexadecimal number in C++?
Here's an example of what I mean: 0x12345 -> 0x54321
Here's what I already have:
unsigned int rotation (unsigned int hex) {
    unsigned int result = 0;
    while (hex) {
        result = (result << 4) | (hex & 0xF);
        hex >>= 4;
    }
    return result;
}
This problem can be split into two parts:
Reverse the nibbles of an integer: reverse the bytes, and swap the two nibbles within each byte.
Shift the reversed result right by some amount to adjust for the "variable length". There are std::countl_zero(x) & -4 leading zero bits (the number of leading zeroes, rounded down to a multiple of 4) that correspond to leading zeroes in hexadecimal; shifting right by that amount makes them not participate in the reversal.
For example, using some of the new functions from <bit>:
#include <stdint.h>
#include <bit>
uint32_t reverse_nibbles(uint32_t x) {
    // reverse bytes
    uint32_t r = std::byteswap(x);
    // swap adjacent nibbles
    r = ((r & 0x0F0F0F0F) << 4) | ((r >> 4) & 0x0F0F0F0F);
    // adjust for variable length of input
    int len_of_zero_prefix = std::countl_zero(x) & -4;
    return r >> len_of_zero_prefix;
}
That requires C++23 for std::byteswap, which may be a bit optimistic; you can substitute some other byteswap.
Easily adaptable to uint64_t too.
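For pre-C++23 compilers, a hand-rolled substitute is straightforward (the function name is mine; many compilers also provide an intrinsic such as __builtin_bswap32):

```cpp
#include <cstdint>

// Drop-in replacement for std::byteswap(uint32_t): swap the four bytes.
uint32_t byteswap32(uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```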
I would do it without loops, based on the assumption that the input is 32 bits:
result = (hex & 0x0000000f) << 28
| (hex & 0x000000f0) << 20
| (hex & 0x00000f00) << 12
....
Don't know if it's faster, but I find it more readable.
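Spelling the pattern out in full (my completion, so treat it as a sketch). Note that, unlike the loop version above, this reverses a fixed 32-bit width: 0x12345 becomes 0x54321000, not 0x54321.

```cpp
#include <cstdint>

// Reverse all eight nibbles of a 32-bit value with masks and shifts.
uint32_t reverse_nibbles32(uint32_t hex) {
    return (hex & 0x0000000Fu) << 28
         | (hex & 0x000000F0u) << 20
         | (hex & 0x00000F00u) << 12
         | (hex & 0x0000F000u) << 4
         | (hex & 0x000F0000u) >> 4
         | (hex & 0x00F00000u) >> 12
         | (hex & 0x0F000000u) >> 20
         | (hex & 0xF0000000u) >> 28;
}
```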
I can read, for example, 4 bytes from a file using

ifstream r(filename, ios::binary | ios::in);
uint32_t readHere;
r.read( (char*)&readHere, 4 );

But how could I read 4.5 bytes, i.e. 4 bytes and 4 bits?
What came to my mind is

ifstream r(filename, ios::binary | ios::in);
uint64_t readHere;
r.read( (char*)&readHere, 5 );            // read 5 bytes
uint64_t tmp = readHere & 0xFF;           // extract the 5th byte
tmp = tmp >> 4;                           // get the first half of its bits
readHere = ((readHere >> 8) << 8) | tmp;  // remove the 5th byte, then add the 4 bits

But I'm not sure how I should take half of the byte, i.e. whether it's the first or the last 4 bits. Is there a better way to retrieve it?
The smallest unit that you can read or write, be it in a file or in memory, is a char (a byte on common systems (*)). You can browse longer elements byte-wise, and endianness effectively matters here.
uint32_t u = 0xaabbccdd;
char *p = reinterpret_cast<char *>(&u); // static_cast is not allowed between unrelated pointer types
char c = p[0]; // c is 0xdd on a little endian system and 0xaa on a big endian one
But as soon as you are inside a byte, all you can do is use bitwise ANDs and shifts to extract the low-order or high-order bits. There is no endianness here, except if you decide to adopt a convention of your own.
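As a minimal illustration of those shifts and masks, splitting a byte into its two nibbles (the helper names are mine):

```cpp
#include <cstdint>

// High nibble: shift the top four bits down; low nibble: mask the rest off.
inline unsigned high_nibble(uint8_t b) { return b >> 4; }
inline unsigned low_nibble(uint8_t b)  { return b & 0x0F; }

// Reassembling works the same way in reverse.
inline uint8_t join_nibbles(unsigned hi, unsigned lo) {
    return static_cast<uint8_t>((hi << 4) | (lo & 0x0F));
}
```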
BTW, if you read from a network interface, or even from a serial line where bits are transferred individually, you still get one full byte at a time; there is no way to read only 4 bits in one read and the other 4 in the next.
(*) Older systems (CDC, in the 80's) used to have 6 bits per character, but C++ did not exist at that time and I'm unsure whether C compilers existed for them.
It's still not clear whether this is a file format that you control, or if it's something else. Anyway, let's assume you have some integer data type that can hold a 36-bit unsigned value:
typedef uint64_t u36;
Now, regardless of whether your system uses big-endian or little-endian, you can write the value to a binary stream in a predictable order by doing them one byte at a time. Let's use big-endian, because it's slightly easier to picture the bits assembling together to create a value.
You can just use naive shifting and masking into a small buffer. The only thing to decide is where to truncate the half-byte. But if you follow the pattern of shifting each value by another 8 bits, then the remaining half-byte naturally falls in the high-order bits.
ostream & write_u36( ostream & s, u36 val )
{
    // unsigned char avoids narrowing errors in the braced initializer
    unsigned char bytes[5] = {
        (unsigned char)((val >> 28) & 0xff),
        (unsigned char)((val >> 20) & 0xff),
        (unsigned char)((val >> 12) & 0xff),
        (unsigned char)((val >> 4 ) & 0xff),
        (unsigned char)((val << 4 ) & 0xf0)
    };
    return s.write( reinterpret_cast<const char *>(bytes), 5 );
}
But this isn't how you'd actually write a bunch of these numbers. You'd have to hold off the 5th byte until you were finished or you could pack the next value into it. Or you would always write two values at a time:
ostream & write_u36_pair( ostream & s, u36 a, u36 b )
{
    unsigned char bytes[9] = {
        (unsigned char)((a >> 28) & 0xff),
        (unsigned char)((a >> 20) & 0xff),
        (unsigned char)((a >> 12) & 0xff),
        (unsigned char)((a >> 4 ) & 0xff),
        (unsigned char)(((a << 4) & 0xf0) | ((b >> 32) & 0x0f)),
        (unsigned char)((b >> 24) & 0xff),
        (unsigned char)((b >> 16) & 0xff),
        (unsigned char)((b >> 8 ) & 0xff),
        (unsigned char)(b & 0xff)
    };
    return s.write( reinterpret_cast<const char *>(bytes), 9 );
}
And so now, you might see how to go about reading values and deserialising them back into integers. The simplest way is to read two at a time.
istream & read_u36_pair( istream & s, u36 & a, u36 & b )
{
    unsigned char bytes[9]; // unsigned, so the casts below don't sign-extend
    if( s.read( reinterpret_cast<char *>(bytes), 9 ) )
    {
        a = (u36)bytes[0] << 28
          | (u36)bytes[1] << 20
          | (u36)bytes[2] << 12
          | (u36)bytes[3] << 4
          | (u36)bytes[4] >> 4;
        b = ((u36)bytes[4] & 0x0f) << 32
          | (u36)bytes[5] << 24
          | (u36)bytes[6] << 16
          | (u36)bytes[7] << 8
          | (u36)bytes[8];
    }
    return s;
}
If you wanted to read them one at a time, you'd need to keep track of some state so you knew how many bytes to read (either 5 or 4), and which shift operations to apply. Something naive like this:
struct u36deser {
    unsigned char bytes[5]; // unsigned, so the casts below don't sign-extend
    int which = 0;
};

istream & read_u36( istream & s, u36deser & state, u36 & val )
{
    if( state.which == 0 && s.read( reinterpret_cast<char *>(state.bytes), 5 ) )
    {
        val = (u36)state.bytes[0] << 28
            | (u36)state.bytes[1] << 20
            | (u36)state.bytes[2] << 12
            | (u36)state.bytes[3] << 4
            | (u36)state.bytes[4] >> 4;
        state.which = 1;
    }
    else if( state.which == 1 && s.read( reinterpret_cast<char *>(state.bytes), 4 ) )
    {
        val = ((u36)state.bytes[4] & 0x0f) << 32 // nibble left over from previous call
            | (u36)state.bytes[0] << 24
            | (u36)state.bytes[1] << 16
            | (u36)state.bytes[2] << 8
            | (u36)state.bytes[3];
        state.which = 0;
    }
    return s;
}
All of this is purely hypothetical, which seems to be the point of your question anyway. There are many other ways to serialise bits, and some of them are not at all obvious.
I am reading bytes from a file. For this example, I read two bytes (represented in hex), 94 and 73. How can I put these two bytes together so that they look like 9470?

I can use 73 >> 4 to drop the low nibble of 73, but how can I "put" them together? I tried using (94 << 8) & (73 >> 4), but it always returns 0.

I have found nothing about working with bytes like this (basically reading one and a half bytes in this example, while reading 2 bytes at once).
code example
uint64_t bytes;
output.read( (char *)&bytes, 2 );                      // read 2 bytes
uint64_t tmp = ( bytes << (64 - 8) ) >> (64 - 8);      // keep only the low byte
uint64_t tmp_two = (( bytes >> 8 ) & 0xF0 ) >> 4;      // high nibble of the second byte
uint64_t tmp_three = (tmp << 8) | tmp_two;             // combine the two parts
((0x94 << 8) + 0x73) & 0xFFF0
will give you the output you want. For this you need to think in binary:
((10010100 << 8) + 01110011) & (1111111111110000)
The 4 zeroes at the end will zero out your LSBs thanks to the logical AND, and maintain your word length.
To answer the comment question: you simply choose the number of bits you want to keep by changing the amount of zeroes. For your example, the number you use for the logical AND would be FFFC in hex, or in binary
1111111111111100.
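A runnable version of the arithmetic above, with the literals written as hex (the function name is mine):

```cpp
// Merge two bytes into a 16-bit value and zero the low nibble with a mask.
unsigned merge_and_truncate(unsigned b1, unsigned b2) {
    return ((b1 << 8) + b2) & 0xFFF0;
}
```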
byte b1 = 0xAB;
byte b2 = 0xCD;
...
short s = (short)(b1 << 8) | (b2 & 0xF0);
//s = 0xABC0
Use or (|) instead of and (&) to merge the shifted values together; otherwise the result is always 0.
Okay, I have the following problem: I have a set of 8 (unsigned) numbers that are all 17 bits (i.e. none of them is bigger than 131071). Since 17-bit numbers are annoying to work with (keeping them in a 32-bit int wastes space), I would like to turn them into 17 8-bit numbers, like so:
If I have these 8 17-bit integers:
[25409, 23885, 24721, 23159, 25409, 23885, 24721, 23159]
I would turn them into a base-2 representation:
["00110001101000001", "00101110101001101", "00110000010010001", "00101101001110111", "00110001101000001", "00101110101001101", "00110000010010001", "00101101001110111"]
Then join that into one big string:
"0011000110100000100101110101001101001100000100100010010110100111011100110001101000001001011101010011010011000001001000100101101001110111"
Then split that into 17 strings, each with 8 chars:
["00110001", "10100000", "10010111", "01010011", "01001100", "00010010", "00100101", "10100111", "01110011", "00011010", "00001001", "01110101", "00110100", "11000001", "00100010", "01011010", "01110111"]
And, finally, convert the binary representations back into integers
[49, 160, 151, 83, 76, 18, 37, 167, 115, 26, 9, 117, 52, 193, 34, 90, 119]
This method works, but it's not very efficient. I am looking for something more efficient, preferably coded in C++, since that's the language I am working with. I just can't think of a way to do this more efficiently, and 17-bit numbers aren't exactly easy to work with (16-bit numbers would be much nicer).
Thanks in advance, xfbs
Store the lowest 16 bits of each number as-is (i.e. in two bytes). This leaves the most significant bit of each number. Since there are eight such numbers, simply combine the eight bits into one extra byte.
This will require exactly the same amount of memory as your method, but will involve a lot less bit twiddling.
P.S. Regardless of the storage method, you should be using bit-manipulation operators (<<, >>, &, | and so on) to do the job; there should not be any intermediate string-based representations involved.
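A sketch of that layout (the function names and the byte order within each 16-bit pair are my choices): bytes 2*i and 2*i+1 hold the low 16 bits of A[i], and bit i of the final byte holds the 17th bit of A[i].

```cpp
#include <cstdint>

void pack17(const uint32_t A[8], uint8_t B[17]) {
    uint8_t msbs = 0;
    for (int i = 0; i < 8; i++) {
        B[2*i]     = static_cast<uint8_t>(A[i] & 0xFF);         // low byte
        B[2*i + 1] = static_cast<uint8_t>((A[i] >> 8) & 0xFF);  // next byte
        msbs |= static_cast<uint8_t>(((A[i] >> 16) & 1u) << i); // the 17th bit
    }
    B[16] = msbs;
}

void unpack17(const uint8_t B[17], uint32_t A[8]) {
    for (int i = 0; i < 8; i++)
        A[i] = static_cast<uint32_t>(B[2*i])
             | (static_cast<uint32_t>(B[2*i + 1]) << 8)
             | ((static_cast<uint32_t>(B[16] >> i) & 1u) << 16);
}
```

The bit twiddling is confined to the single byte of collected MSBs; everything else is plain byte copies.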
Have a look at std::bitset<N>. Maybe you can stuff them into that?
Efficiently? Then don't use string conversions, bitfields, etc. Do the shifts yourself. (Note that the array must be unsigned so that we don't encounter problems when shifting.)
uint32_t A[8]; // Your input, unsigned int
uint8_t B[17]; // Output, unsigned byte

B[0] = (uint8_t)A[0];
B[1] = (uint8_t)(A[0] >> 8);
B[2] = (uint8_t)A[1];
B[3] = (uint8_t)(A[1] >> 8);
.
:
And for the last one, we do what ajx said: we take the most significant bit of each number (shifting 16 bits to the right leaves the 17th bit) and fill the bits of our output byte by shifting each of those bits into positions 0 to 7:
B[16] = (A[0] >> 16) | ((A[1] >> 16) << 1) | ((A[2] >> 16) << 2) | ((A[3] >> 16) << 3) | ... | ((A[7] >> 16) << 7);
Well, this was the "efficient" way. Other, easier methods exist too.
Though you say they are 17-bit numbers, they must be stored in an array of 32-bit integers, where only the 17 less significant bits are used. You can extract two bytes directly from the first (dst[0] = src[0] >> 9 is the first, dst[1] = (src[0] >> 1) & 0xff the second); then you "push" the remaining bit of src[0] in front of the 17 bits of src[1], so that
dst[2] = (src[0] & 1) << 7 | src[1] >> 10;
dst[3] = (src[1] >> 2) & 0xff;
If you generalize it, you will see that this "formula" may be applied:
dst[2*i] = src[i] >> (9+i) | (src[i-1] & BITS(i)) << (8-i);
dst[2*i + 1] = (src[i] >> (i+1)) & 0xff;
and for the last one: dst[16] = src[7] & 0xff;.
The whole code could look like
dst[0] = src[0] >> 9;
dst[1] = (src[0] >> 1) & 0xff;
for(i = 1; i < 8; i++)
{
dst[2*i] = src[i] >> (9+i) | (src[i-1] & BITS(i)) << (8-i);
dst[2*i + 1] = (src[i] >> (i+1)) & 0xff;
}
dst[16] = src[7] & 0xff;
By analysing the loops more carefully, optimizations could be made so that the boundary cases don't need special treatment. The BITS macro creates a mask with the N least significant bits set to 1. Something like (to be checked for a better way, if any):
#define BITS(I) (~((~0u) << (I)))
Addendum: here I supposed src is e.g. int32_t and dst int8_t or similar.

This is in C; in C++ you can use a vector instead.
#define srcLength 8
#define destLength 17
int src[srcLength] = { 25409, 23885, 24721, 23159, 25409, 23885, 24721, 23159 };
unsigned char dest[destLength] = { 0 };
unsigned int srcElement = 0; /* unsigned, so the shifts are well defined */
int bits = 0;
int i = 0;
int j = 0;
do {
    while( bits >= srcLength ) {
        dest[i++] = srcElement >> (bits - srcLength); /* emit the top 8 buffered bits */
        bits -= srcLength;
        srcElement &= (1u << bits) - 1;               /* drop the emitted bits */
    }
    if( j < srcLength ) {
        srcElement <<= destLength;                    /* make room for the next 17-bit value */
        bits += destLength;
        srcElement |= src[j++];
    }
} while (bits > 0);
Disclaimer: if you literally have just these eight integers (and not, say, 100000 such groups), forget these optimizations as long as your program doesn't run very slowly.
I'd probably go about it this way. I don't want to deal with weird types while doing my processing, but maybe I need to store them in some funky format due to legacy constraints. The hard-coded values should probably be derived from the 17; I just didn't bother.
struct int_block {
    static const uint32_t w = 17;
    static const uint32_t m = 131071;

    int_block() : data(151, 0) {} // w * 8 + (sizeof(uint32_t) - w)

    uint32_t get(size_t i) const {
        uint32_t retval = *reinterpret_cast<const uint32_t *>( &data[i*w] );
        retval &= m;
        return retval;
    }

    void set(size_t i, uint32_t val) {
        uint32_t prev = *reinterpret_cast<const uint32_t *>( &data[i*w] );
        prev &= ~m;
        val |= prev;
        *reinterpret_cast<uint32_t *>( &data[i*w] ) = val;
    }

    std::vector<char> data;
};

TEST(int_block_test) {
    int_block ib;
    for (uint32_t i = 0; i < 8; i++)
        ib.set(i, i+25);
    for (uint32_t i = 0; i < 8; i++)
        CHECK_EQUAL(i+25, ib.get(i));
}
You'd be able to break this by giving it bad values, but I'll leave that as an exercise for the reader. :))
Quite honestly, I think you'd be happier off representing them as 32-bit integers and just writing conversion functions. But I suspect you don't have control over that.
I have run into an interesting problem lately:

Let's say I have an array of bytes (uint8_t, to be exact) of length at least one. Now I need a function that gets a subsequence of bits from this array, starting at bit X (zero-based index, inclusive) and having length L, and returns it as a uint32_t. If L is smaller than 32, the remaining high bits should be zero.

Although this is not very hard to solve, my current idea seems a bit cumbersome to me: a table of all the possible masks for a given byte (start with bit 0-7, take 1-8 bits), then constructing the number one byte at a time using this table.

Can somebody come up with a nicer solution? Note that I cannot use Boost or the STL for this. And no, it is not homework; it's a problem I ran into at work, and we do not use Boost or the STL in the code where this goes. You can assume that 0 < L <= 32 and that the byte array is large enough to hold the subsequence.
One example of correct input/output:
array: 00110011 1010 1010 11110011 01 101100
subsequence: X = 12 (zero based index), L = 14
resulting uint32_t = 00000000 00000000 00 101011 11001101
Only the first and last bytes in the subsequence involve some bit slicing to get the required bits out, while the intermediate bytes can be shifted whole into the result. Here's some sample code; the offsets are counted from the MSB of each byte, so check the edge cases against your data:
uint8_t bytes[];  /* input array */
int X, L;         /* X = start bit index from the MSB of bytes[0], 0 < L <= 32 */
uint32_t result;

int startByte = X / 8,            /* starting byte number */
    startOfs  = X % 8,            /* offset of the first bit within its byte, from MSB */
    endByte   = (X + L - 1) / 8,  /* ending byte number (inclusive) */
    endOfs    = (X + L - 1) % 8;  /* offset of the last bit within its byte, from MSB */

/* Special case where start and end are within the same byte:
   just get the L bits ending at endOfs */
if (startByte == endByte) {
    uint8_t byte = bytes[startByte];
    result = (byte >> (7 - endOfs)) & ((1u << L) - 1);
}
/* All other cases: get ending bits of starting byte,
   all other bytes in between,
   starting bits of ending byte */
else {
    uint8_t byte = bytes[startByte];
    result = byte & ((1u << (8 - startOfs)) - 1);
    for (int i = startByte + 1; i < endByte; i++)
        result = (result << 8) | bytes[i];
    byte = bytes[endByte];
    result = (result << (endOfs + 1)) | (byte >> (7 - endOfs));
}
Take a look at std::bitset and boost::dynamic_bitset.
I would be thinking something like loading the bytes into a uint64_t and then shifting left and right to lose the uninteresting bits.

uint32_t extract_bits(const uint8_t* bytes, int start, int count)
{
    /* Assemble 8 bytes big-endian so that bit 0 is the MSB of bytes[0].
       This avoids the alignment, aliasing and endianness problems of
       casting the pointer to uint64_t*; it does require 8 readable
       bytes from the starting byte onwards. */
    bytes += start / 8;
    start %= 8;
    uint64_t hold = 0;
    for (int i = 0; i < 8; i++)
        hold = (hold << 8) | bytes[i];
    hold <<= start;        /* drop the bits before the field */
    hold >>= 64 - count;   /* keep the top 'count' bits */
    return (uint32_t)hold;
}
For the sake of completeness, I am adding my solution, inspired by the comments and answers here. Thanks to all who took the time to think about the problem.
static const uint8_t firstByteMasks[8] = { 0xFF, 0x7F, 0x3F, 0x1F, 0x0F, 0x07, 0x03, 0x01 };

uint32_t getBits( const uint8_t *buf, const uint32_t bitoff, const uint32_t len, const uint32_t bitcount )
{
    uint64_t result = 0;
    int32_t startByte = bitoff / 8;                  // starting byte number
    int32_t endByte = ((bitoff + bitcount) - 1) / 8; // ending byte number
    int32_t rightShift = 16 - ((bitoff + bitcount) % 8);

    if ( endByte >= len ) return -1;
    if ( rightShift == 16 ) rightShift = 8;

    result = buf[startByte] & firstByteMasks[bitoff % 8];
    result = result << 8;
    for ( int32_t i = startByte + 1; i <= endByte; i++ )
    {
        result |= buf[i];
        result = result << 8;
    }
    result = result >> rightShift;
    return (uint32_t)result;
}
A few notes: I tested the code and it seems to work just fine; however, there may be bugs. If I find any, I will update the code here. Also, there are probably better solutions!
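As a quick sanity check, the example from the earlier bit-subsequence question can be run through this function (repeated here so the snippet stands alone): the 14 bits starting at bit 12 of 00110011 10101010 11110011 01101100 come out as 0x2BCD.

```cpp
#include <cstdint>

static const uint8_t firstByteMasks[8] = { 0xFF, 0x7F, 0x3F, 0x1F, 0x0F, 0x07, 0x03, 0x01 };

uint32_t getBits( const uint8_t *buf, uint32_t bitoff, uint32_t len, uint32_t bitcount )
{
    uint64_t result = 0;
    int32_t startByte = bitoff / 8;
    int32_t endByte = ((bitoff + bitcount) - 1) / 8;
    int32_t rightShift = 16 - ((bitoff + bitcount) % 8);
    if ( endByte >= (int32_t)len ) return -1;
    if ( rightShift == 16 ) rightShift = 8;
    result = buf[startByte] & firstByteMasks[bitoff % 8];
    result = result << 8;
    for ( int32_t i = startByte + 1; i <= endByte; i++ )
    {
        result |= buf[i];
        result = result << 8;
    }
    result = result >> rightShift;
    return (uint32_t)result;
}
```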