How would I put 4 chars into a single int? [closed] - c++

I am working on a replica GM buffer system in C++ to get familiar with bits and such, and it's not bad, but I've run into a problem: how do I push 4 different chars into an int? I'm not the best at bitwise stuff; I've never used it.
In my setup, I have an array of chars of size byteArraySize, and when I call the grab-int function, it takes the bytes from bufferPointer + 4 down to bufferPointer, backwards, to grab the int properly.
I read a bit on bit shifting, and I thought I could shift each char's bits into position, but I've got no clue where to start.
Any help is greatly appreciated.

Pedantically, in pure standard C++14 or C++11, you probably cannot.
AFAIK, nothing forbids a hypothetical C++14 implementation from making char, short, unsigned short, int, unsigned int, long, unsigned long, long long, and unsigned long long all the same size (at least the same internal representation), all 64 bits (or 96, or 128), and all with sizeof equal to 1. The recent C and C++ standards mandate that long long is at least 64 bits wide.
IIRC, some exotic C implementation built on top of a Common Lisp system does something similar.
But of course, there is no such C++14 implementation in practice.
In practice, on most implementations, chars are 8-bit bytes (perhaps signed, perhaps unsigned) and ints are often 32-bit words (e.g. std::int32_t), and you obviously could code
inline int pack4chars(char c1, char c2, char c3, char c4) {
    // Shift in unsigned arithmetic so the high byte can't overflow a signed int.
    return (int)((((unsigned)(unsigned char)c1) << 24)
               | (((unsigned)(unsigned char)c2) << 16)
               | (((unsigned)(unsigned char)c3) << 8)
               |  ((unsigned)(unsigned char)c4));
}
The cast through (unsigned char) is needed because some implementations have signed chars and others unsigned ones; the further cast to unsigned keeps the left shifts out of signed-integer-overflow territory.
Read also about endianness, serialization, and htonl(3).
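Going the other way is symmetric. Here is a minimal sketch of the inverse (the name unpack4chars is mine, not part of the question):

#include <cstdint>

// Hypothetical inverse of pack4chars: extract the bytes, highest first.
inline void unpack4chars(std::uint32_t v,
                         unsigned char &c1, unsigned char &c2,
                         unsigned char &c3, unsigned char &c4)
{
    c1 = (unsigned char)(v >> 24);
    c2 = (unsigned char)(v >> 16);
    c3 = (unsigned char)(v >> 8);
    c4 = (unsigned char)(v);
}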

Yes, you can pack 4 chars (actually sizeof(int) chars) into an int. Here's how you could do it:
#include <climits> // CHAR_BIT
#include <cstddef> // size_t

unsigned int packChars(const unsigned char *c)
{
    unsigned int val = 0;
    for (size_t idx = 0; idx < sizeof(unsigned int); ++idx) {
        // Widen before shifting so the top byte can't overflow an int.
        val |= (unsigned int)c[idx] << (idx * CHAR_BIT);
    }
    return val;
}
I'm using unsigned types, because bit shifting gets tricky when sign bits are involved. Also note that the code above is intentionally generic in the sizes used: sizeof(unsigned int) gives you the number of char units that fit into an unsigned int, and CHAR_BIT gives the number of bits in a char.
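For illustration, a usage sketch (the byte values are made up; the result assumes a 32-bit unsigned int and 8-bit chars):

unsigned char bytes[] = {0x12, 0x34, 0x56, 0x78};
unsigned int v = packChars(bytes); // v == 0x78563412: c[0] lands in the low byte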

First of all, you should be aware that sizeof(int) does not have to be 4 * sizeof(char). The standard only guarantees that sizeof(int) >= sizeof(char), and nothing more.
In fact, int can be the same size as char (or bigger), but you never know unless you check.
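A minimal sketch of such a check, assuming C++11 static_assert is available:

#include <climits>

static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");
static_assert(sizeof(int) == 4 * sizeof(char), "this code assumes a 4-char int");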

One possible solution is to use a union, which has all its members starting at the same offset in memory.
Example:
#include <cstdint>

union Color
{
    std::uint32_t m_rgba;
    struct // anonymous struct: a common compiler extension
    {
        std::uint8_t m_a;
        std::uint8_t m_b;
        std::uint8_t m_g;
        std::uint8_t m_r;
    };
};
Color white = { 0xffffffff };
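A usage sketch, with two caveats: in C++, reading a union member other than the one last written is formally undefined behavior (though compilers commonly support it), and which field holds which byte depends on endianness:

Color c;
c.m_rgba = 0x11223344u;
// Little-endian: c.m_a == 0x44, c.m_b == 0x33, c.m_g == 0x22, c.m_r == 0x11.
// Big-endian: the bytes come out in the opposite order.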

Related

Portable conversion from unsigned to signed in C++

I have std::vector<unsigned short int> vals for which I need to invert the order of the bytes (assume 2) and store them as short int. I was doing it as:
std::vector<short int> result;
for (unsigned short int& x : vals) {
    x = ((x << 8) | (x >> 8));
    result.push_back(static_cast<short int>(x));
}
Reading online, I find that static_cast has implementation-defined behavior. I also found std::bit_cast, which preserves the bits and interprets them in the new type.
Does that mean that using std::bit_cast<short int>(x) above should be preferred over static_cast?
I tried, and both give the same results for me. Is it correct to assume that bit_cast will give the same results to anyone else using my code, while static_cast could give them different results?
If you want to handle bytes, you should not convert them to a non-byte type (especially not short int and friends, because they do not need to be exactly 16 bits).
You should read the bytes as an array of char or std::byte. Then you can swap those values in a safe and portable manner.
Converting those bytes to a numeric type cannot be done in a portable way.
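A minimal sketch of that approach, assuming the data is available as raw bytes and each value occupies exactly two of them (std::byte needs C++17; the function name is mine):

#include <cstddef>
#include <utility>
#include <vector>

// Swap each adjacent pair of bytes in place (the size must be even).
void swapBytePairs(std::vector<std::byte>& buf)
{
    for (std::size_t i = 0; i + 1 < buf.size(); i += 2)
        std::swap(buf[i], buf[i + 1]);
}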

Narrowing conversion in C++

In Beej's Guide to Network Programming, there is a function that was meant to provide a portable way to serialize a 16-bit integer.
/*
** packi16() -- store a 16-bit int into a char buffer (like htons())
*/
void packi16(unsigned char *buf, unsigned int i)
{
    *buf++ = i >> 8;
    *buf++ = i;
}
I don't understand why the statement *buf++ = i; is portable, as the assignment of an unsigned integer (i) to an unsigned character (*buf) would result in a narrowing conversion.
Does the C++ standard guarantee that in such a conversion, the unsigned int is always truncated and its least significant 8 bits are retained in the unsigned char?
If not, is there any preferred way to fix the issue? Is it adequate to change the function body to the following?
*buf++ = (i>>8) & 0xFFFFU; *buf++ = i & 0xFFFFU;
The code assumes an 8-bit byte, and that is not portable.
E.g. some Texas Instruments digital signal processors have 16-bit bytes.
The number of bits per byte is given by CHAR_BIT from <limits.h>.
Also, the code assumes that unsigned is 16 bits, which is not portable.
In summary the code is not portable.
Re “Does the C++ standard guarantee that in such a conversion, the unsigned int is always truncated and its least significant 8 bits are retained in the unsigned char?”
No, since the C++ standard does not guarantee that the number of bits per byte is 8.
The only guarantee is that it's at least 8 bits.
Unsigned arithmetic is guaranteed modular, however.
Re “If not, is there any preferred way to fix the issue?”
Use a simple loop, iterating sizeof(unsigned) times.
The code in question appears to have been distilled from such a loop, since the post-increment in *buf++ = i; is totally meaningless (this is the last use of buf).
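A sketch of such a loop, hedged in that it stores the full width of unsigned rather than exactly 16 bits (the name pack_unsigned is mine):

#include <limits.h>
#include <stddef.h>

// Store the value most significant byte first, one byte per iteration.
void pack_unsigned(unsigned char *buf, unsigned i)
{
    for (size_t n = sizeof(unsigned); n-- > 0; )
        *buf++ = (unsigned char)(i >> (n * CHAR_BIT));
}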
Yes, out-of-range assignments to unsigned types adjust the value modulo one greater than the maximum value representable in the type. In this case, mod UCHAR_MAX+1.
No fix is required. Some people like to write *buf++ = i % 0x100; or equivalent, to make it clear that this was intentional narrowing.

Packing bools with bit field (C++)

I'm trying to interface with Ada code using C++, so I'm defining a struct using bit fields, so that all the data is in the same place in both languages. The following is not precisely what I'm doing, but it outlines the problem; it's a console application in VS2008, though that's not super relevant.
using namespace System;

int main() {
    int array1[2] = {0, 0};
    int *array2 = new int[2]();
    array2[0] = 0;
    array2[1] = 0;
#pragma pack(1)
    struct testStruct {
        // Word 0 (desired)
        unsigned a : 8;
        unsigned b : 1;
        bool c : 1;
        unsigned d : 21;
        bool e : 1;
        // Word 1 (desired)
        int f : 32;
        // Words 2-3 (desired)
        int g[2]; // Cannot be a bit field, but takes 64 bits in my compiler
    };
    testStruct test;
    Console::WriteLine("size of char: {0:D}", sizeof(char) * 8);
    Console::WriteLine("size of short: {0:D}", sizeof(short) * 8);
    Console::WriteLine("size of int: {0:D}", sizeof(int) * 8);
    Console::WriteLine("size of unsigned: {0:D}", sizeof(unsigned) * 8);
    Console::WriteLine("size of long: {0:D}", sizeof(long) * 8);
    Console::WriteLine("size of long long: {0:D}", sizeof(long long) * 8);
    Console::WriteLine("size of bool: {0:D}", sizeof(bool) * 8);
    Console::WriteLine("size of int[2]: {0:D}", sizeof(array1) * 8);
    Console::WriteLine("size of int*: {0:D}", sizeof(array2) * 8);
    Console::WriteLine("size of testStruct: {0:D}", sizeof(testStruct) * 8);
    Console::WriteLine("size of test: {0:D}", sizeof(test) * 8);
    Console::ReadKey(true);
    delete[] array2;
    return 0;
}
(If it wasn't clear, in the real program, the basic idea is that the program gets a void* from something communicating with the Ada code and casts it to a testStruct* to access the data.)
With #pragma pack(1) commented out, the output is:
size of char: 8
size of short: 16
size of int: 32
size of unsigned: 32
size of long: 32
size of long long: 64
size of bool: 8
size of int[2]: 64
size of int*: 32
size of testStruct: 224
size of test: 224
Obviously 4 words (indexed 0-3) should be 4 * 32 = 128 bits, not 224. The other output lines were to help confirm the size of types under the VS2008 compiler.
With #pragma pack(1) uncommented, that number (on the last two lines of output) is reduced to 176, which is still greater than 128. It seems that the bools aren't being packed together with the unsigned ints in "Word 0".
Note: a and b, c, d, e, and f packed into different words would be 5 words, plus 2 for the array, i.e. 7 words at 32 bits each = 224 bits, the number we get with #pragma pack(1) commented out. If c and e (the bools) instead take up 8 bits each, as opposed to 32, we get 176, which is the number we get with #pragma pack(1) uncommented. It seems #pragma pack(1) only allows the bools to be packed into single bytes by themselves, instead of words, but never together with the unsigned ints.
So my question, in one sentence: is there a way to force the compiler to pack a through e into one word? Related is this question: C++ bitfield packing with bools, but it doesn't answer my question; it only points out the behavior I'm trying to force to go away.
If there is literally no way to do this, does anyone have any ideas for workarounds? I'm at a loss, because:
I was asked to avoid changing the struct format that I'm copying (no re-ordering).
I don't want to change the bools to unsigned ints because it may cause problems down the road with constantly having to re-cast it to bool and maybe accidentally using the wrong version of an overloaded function, not to mention making the code more obscure for others who read it later.
I don't want to declare them as private unsigned ints then make public accessors or something because all other members of all other structs in the project are accessed directly without () afterward, so it would seem a bit hacky and obtuse, and one would almost NEED the IntelliSense or trial-and-error to remember which needs () and which doesn't.
I would like to avoid creating another struct type just for the data conversion (and e.g. make a constructor for testStruct that takes in a single testStructImport-type object) because the actual struct is very long with lots of bit-field-specified variables.
I recommend that you create a "normal" structure without any bit packing. Use default POD types for the members.
Create interface functions for loading the "normal" fields from a buffer (uint8_t), and storing to a buffer.
This will allow you to use the data members in a sane manner in your program. The bit packing and unpacking will be handled by the interface functions. The bit twiddling should use bitwise AND and OR operators rather than relying on bit field notation in a structure; this lets you adjust the bit twiddling and stay more portable across compilers.
This is how I designed my protocol classes, and I don't have to worry about bit field positioning, endianness, or things of that sort.
Also, I can use block I/O for reading and writing the buffer.
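A minimal sketch of that design, assuming the desired "word 0" layout with low-to-high bit allocation (a in bits 0-7, e in bit 31); the names TestPlain and packWord0 are mine:

#include <cstdint>

// "Normal" struct: plain POD members, no bit fields.
struct TestPlain {
    std::uint8_t  a;
    bool          b, c, e;
    std::uint32_t d; // only the low 21 bits are meaningful
};

// Interface function: build word 0 with shifts, bitwise AND and OR.
std::uint32_t packWord0(const TestPlain &t)
{
    std::uint32_t w = 0;
    w |= (std::uint32_t)t.a;                   // bits 0-7
    w |= (std::uint32_t)(t.b ? 1u : 0u) << 8;  // bit 8
    w |= (std::uint32_t)(t.c ? 1u : 0u) << 9;  // bit 9
    w |= (t.d & 0x1FFFFFu) << 10;              // bits 10-30
    w |= (std::uint32_t)(t.e ? 1u : 0u) << 31; // bit 31
    return w;
}

An unpackWord0 doing the reverse with shifts and masks would complete the interface.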
Try packing in this way:
#pragma pack(push, 1)
struct testStruct {
    // Word 0 (desired)
    unsigned a : 8;
    unsigned b : 1;
    unsigned c : 1;
    unsigned d : 21;
    unsigned e : 1;
    // Word 1 (desired)
    unsigned f : 32;
    // Words 2-3 (desired)
    unsigned g[2]; // Cannot be a bit field, but takes 64 bits in my compiler
};
#pragma pack(pop)
There is no easy, elegant method without using accessors or an interface layer. Unfortunately, there is nothing like a #pragma thing to fix this. I ended up just converting the bools to unsigned int and renaming variables from e.g. f to f_flag or f_bool to encourage correct usage and make it clear what the variables contained. It's lower-effort than Thomas's solution, but not as robust, obviously, and still gets around some of the main drawbacks with any of the easier methods.
Years after I posted this question, user #WaltK added this comment to the linked, related question:
"If you want to have more control over the layout of bit field
structures in memory, consider using this bit field facility,
implemented as a library header file."

Bitwise operations. Is this code safe and portable?

I need to compute the Hamming distance between bitsets that are represented as char arrays. This is a core operation, so it must be as fast as possible. I have something like this:
const int N = 32; // 32 always

// returns the number of bits that are ones in a char
int countOnes_uchar8(unsigned char v);

// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
    int ret = 0;
    for (int i = 0; i < N; ++i, ++pa, ++pb)
    {
        ret += countOnes_uchar8(*pa ^ *pb);
    }
    return ret;
}
After profiling, I noticed that operating on ints is faster, so I wrote:
const int N = 32; // 32 always

// returns the number of bits that are ones in an int of 32 bits
int countOnes_int32(unsigned int v);

// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
    const unsigned int *qa = reinterpret_cast<const unsigned int*>(pa);
    const unsigned int *qb = reinterpret_cast<const unsigned int*>(pb);
    int ret = 0;
    for (int i = 0; i < N / sizeof(unsigned int); ++i, ++qa, ++qb)
    {
        ret += countOnes_int32(*qa ^ *qb);
    }
    return ret;
}
Questions
1) Is that cast from unsigned char * to unsigned int * safe?
2) I work on a 32-bit machine, but I would like the code to work on a 64-bit machine too. Does sizeof(unsigned int) return 4 on both machines, or is it 8 on a 64-bit one?
3) If sizeof(unsigned int) returned 4 on a 64-bit machine, how would I be able to operate on a 64-bit type, with long long?
Is that cast from unsigned char * to unsigned int * safe?
Formally, it gives undefined behaviour. Practically, it will work on just about any platform if the pointer is suitably aligned for unsigned int. On some platforms, it may fail, or perform poorly, if the alignment is wrong.
Does sizeof(unsigned int) return 4 on both machines, or is it 8 on a 64-bit one?
It depends. Some platforms have 64-bit int, and some have 32-bit. It would probably make sense to use uint64_t regardless of platform; on a 32-bit platform, you'd effectively be unrolling the loop (processing two 32-bit values per iteration), which might give a modest improvement.
how would I be able to operate on a 64-bit type, with long long?
uint64_t, if you have a C++11 or C99 library. long long is at least 64 bits, but might not exist on a pre-2011 implementation.
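For reference, a sketch of the uint64_t variant, using std::memcpy to sidestep the alignment and aliasing problems of the reinterpret_cast (countOnes_int64 is a hypothetical 64-bit analogue of countOnes_int32):

#include <cstdint>
#include <cstring>

int countOnes_int64(std::uint64_t v); // hypothetical 64-bit popcount

int hamming64(const unsigned char *pa, const unsigned char *pb)
{
    int ret = 0;
    for (int i = 0; i < 32; i += 8) // N == 32, eight bytes per step
    {
        std::uint64_t a, b;
        std::memcpy(&a, pa + i, sizeof a);
        std::memcpy(&b, pb + i, sizeof b);
        ret += countOnes_int64(a ^ b);
    }
    return ret;
}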
1) No, it is not safe/portable; it is undefined behavior. There are systems where char is larger than 8 bits, and there is no guarantee that the char pointer is properly aligned for unsigned int.
2) sizeof(int) might in theory be anything on a 64-bit machine. In practice, it will be either 4 or 8.
3) long long is most likely 64 bits, but there are no guarantees there either. If you want guarantees, use uint64_t. However, for your specific algorithm I don't see why the sizeof() of the data chunk would matter.
Consider using the types in stdint.h instead; they are far more suitable for portable code. Instead of char, int, or long long, use uint_fast8_t, uint_fast32_t, and so on. This will let the compiler pick the fastest suitable integer for you, in a portable manner.
As a side note, you should consider implementing countOnes as a lookup table, working at the 4-, 8-, or 32-bit level, depending on what is optimal for your system. This will increase program size but reduce execution time. Maybe try to implement some form of adaptive lookup table which depends on sizeof(uint_fast8_t).
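One possible shape for the 8-bit-level lookup table (a sketch, assuming C++11; a constexpr table would also work):

#include <array>

// 256-entry popcount table: each entry holds the number of set bits.
static const std::array<unsigned char, 256> kOnes = [] {
    std::array<unsigned char, 256> t{};
    for (int i = 1; i < 256; ++i)
        t[i] = t[i / 2] + (i & 1);
    return t;
}();

int countOnes_uchar8(unsigned char v) { return kOnes[v]; }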

Forcing unaligned bitfield packing in MSVC

I've a struct of bitfields that add up to 48 bits. On GCC this correctly results in a 6 byte structure, but in MSVC the structure comes out 8 bytes. I need to find some way to force MSVC to pack the struct properly, both for interoperability and because it's being used in a memory-critical environment.
The struct seen below consists of three 15-bit numbers, one 2-bit number, and a 1-bit sign. 15+15+15+2+1 = 48, so in theory it should fit into six bytes, right?
struct S
{
    unsigned short a : 15;
    unsigned short b : 15;
    unsigned short c : 15;
    unsigned short d : 2;
    unsigned short e : 1;
};
However, compiling this on both GCC and MSVC results in sizeof(S) == 8. Thinking that this might have to do with alignment, I tried using #pragma pack(1) before the struct declaration, telling the compiler to pack to byte, not int, boundaries. On GCC, this worked, resulting in sizeof(S) == 6.
However, on MSVC05, the sizeof still came out to 8, even with pack(1) set! After reading this other SO answer, I tried replacing unsigned short d with unsigned char and unsigned short e with bool. The result is sizeof(S) == 7!
I found that if I split d into two one-bit fields and wedged them in between the other members, the struct finally packed properly.
struct S
{
    unsigned short a   : 15;
    unsigned short dHi : 1;
    unsigned short b   : 15;
    unsigned short dLo : 1;
    unsigned short c   : 15;
    unsigned short e   : 1;
};

printf("%d\n", sizeof(S)); // "6"
But having d split like that is cumbersome and causes trouble for me later on when I have to work on the struct. Is there some way I can force MSVC to pack this struct into 6 bytes, exactly as GCC does?
It is implementation defined how fields will be placed in the structure. Visual Studio will fit consecutive bitfields into an underlying type, if it can, and waste the leftover space. (C++ Bit Fields in VS)
If you use the type "unsigned __int64" to declare all elements of the structure, you'll get an object with sizeof(S)=8, but the last two bytes will be unused and the first six will contain the data in the format you want.
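A sketch of that suggestion (unsigned __int64 is a Microsoft extension, so this is MSVC-specific):

struct S64
{
    unsigned __int64 a : 15;
    unsigned __int64 b : 15;
    unsigned __int64 c : 15;
    unsigned __int64 d : 2;
    unsigned __int64 e : 1;
};
// sizeof(S64) == 8: all 48 bits share one 64-bit allocation unit,
// with the top 16 bits left as unused padding.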
Alternatively, if you can accept some structure reordering, this will work
#pragma pack(push, 1)
struct S3
{
    unsigned int a : 15;
    unsigned int b : 15;
    unsigned int d : 2;
    unsigned short c : 15;
    unsigned short e : 1;
};
#pragma pack(pop)
I don't think so, and I think it's MSVC's behavior that is actually correct, and GCC that deviates from the standard.
AFAIK, the standard does not permit bitfields to cross word boundaries of the underlying type.