C bitfields: strange behaviour with long int in struct - C++

I am observing strange behaviour when I run the following code.
I create a bitfield by using a struct, where I want to use 52 bits, so I use long int.
The size of long int is 64 bits on my system; I check it inside the code.
Somehow when I try to set one bit, it always sets two bits: one of them is the one I wanted to set, and the second one is at the index of the first one plus 32.
Can anybody tell me why that is?
#include <stdio.h>

typedef struct foo {
    long int x:52;
    long int:12;
};

int main(){
    struct foo test;
    int index=0;
    printf("%ld\n",sizeof(test));
    while(index<64){
        if(test.x & (1<<index))
            printf("%i\n",index);
        index++;
    }
    test.x=1;
    index=0;
    while(index<64){
        if(test.x & (1<<index))
            printf("%i\n",index);
        index++;
    }
    return 0;
}
Sorry, I forgot to post the output, so my question was basically not understandable...
The Output it gives me is the following:
8
0
32

The constant 1 in 1<<index has type int, which is probably 32 bits on your system. Shifting a value by an amount greater than or equal to the number of bits in its type has undefined behavior, and that is exactly what happens once index reaches 32.
Change 1<<index to 1L << index, or even 1LL << index; better still, use an unsigned constant such as 1UL or 1ULL, since left-shifting a bit into the sign position of a signed type is also undefined.
As others have pointed out, test is uninitialized. You can initialize it to all zeros like this:
struct foo test = { 0 };
The correct printf format for size_t is %zu, not %ld.
And it wouldn't be a bad idea to modify your code so it doesn't depend on the non-portable assumption that long is 64 bits; it can be as narrow as 32 bits. Consider using the uintN_t types defined in <stdint.h>.
I should also mention that support for bit fields of types other than int, unsigned int, signed int, and _Bool (or bool) is implementation-defined.
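Putting the fixes from this answer together, a corrected sketch of the program might look like the following (assuming, as in the question, that long is 64 bits; the wide unsigned shift constant, the zero-initialization and the %zu format are the key changes):

#include <stdio.h>

struct foo {
    long int x:52;
    long int  :12;    /* unnamed padding bits */
};

int main(void){
    struct foo test = { 0 };           /* zero-initialize so no indeterminate bits are read */
    unsigned index = 0;
    printf("%zu\n", sizeof(test));     /* %zu is the correct format for size_t */
    test.x = 1;
    while (index < 52){                /* only the 52 bits of the field are meaningful */
        if (test.x & (1ULL << index))  /* 1ULL is wide enough for every shift amount used */
            printf("%u\n", index);
        index++;
    }
    return 0;
}

With these changes the second loop reports only bit 0, as intended.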

You have undefined behavior in your code, as you check the bits in test.x without initializing the structure. Because you don't initialize the variable, it contains indeterminate (garbage) values.

Related

Cast char to int by zero extending in C++

I want to write a function
int char_to_int(char c);
that converts given char to int by zero extending the value. So if the char has N bits and int has M bits, M >= N, then the M-N most significant bits of the int value should be zero and the N least significant bits of the int value should match the bits of the char value.
This seems like a simple task, but I'm not sure how to write it relying only on standard behavior. No UB, no implementation-defined behavior. Without relying on char being 8 bit, int being 32 bit, char being unsigned and any other common assumptions I make that are not guaranteed by standard.
The reason I want to know this is that I have done this conversion several times in the past, but recently I became aware of the limited guarantees C++ gives about its data types. So now I'm curious what the correct, standard-compliant approach is.
I don't suppose
return (int) c;
is good enough, is it?
There's no hurt in being extra clear:
return int((unsigned char)c);
That way you tell the compiler exactly what you want: the int that contains the char value, read as unsigned. So a char whose bit pattern represents 255 will become int 255 rather than -1.
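For completeness, here is a minimal sketch of that approach written as the function the question asks for (same idea as the cast above, just spelled with static_cast):

// Zero-extend a char by reading its bits as unsigned char first,
// so the result does not depend on whether plain char is signed.
int char_to_int(char c)
{
    return static_cast<int>(static_cast<unsigned char>(c));
}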

Why doesn't this snippet of code print the number normally?

Why do I get two different results? An unsigned long should be big enough to handle such a number, and it can't be an overflow of some kind, right?
I am deliberately trying to make it show in decimal form, but it just doesn't work.
What could be the reason?
#include <iostream>
using namespace std;

void Print(unsigned long num)
{
    cout << dec << num << endl;
}

int main()
{
    Print(9110865112);
    cout << dec << 9110865112;
    return 0;
}
Edit
It outputs:
520930520
9110865112
unsigned long is not always sufficiently large. With 32 bits it can hold integers from 0 up to and including 2^32 - 1, which is about four billion. 9'110'865'112 is about nine billion and thus does not fit into a 32-bit unsigned long.
Try outputting sizeof(unsigned long) and see what you get.
Also, consider your output: 9110865112 mod 2^32 is 520930520, which basically proves that unsigned long is 32 bits wide on your machine.
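A quick standalone check of that arithmetic (a hypothetical snippet, not taken from the question) reproduces the reduction modulo 2^32 that a 32-bit unsigned long performs:

#include <iostream>

int main()
{
    unsigned long long big = 9110865112ULL;
    unsigned long long wrapped = big % (1ULL << 32);   // reduce modulo 2^32

    std::cout << sizeof(unsigned long) << '\n';        // 4 where unsigned long is 32 bits
    std::cout << wrapped << '\n';                      // prints 520930520, the "wrong" output above
    return 0;
}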
The problem is that the numeric literal that you specify is too large to fit in an unsigned long.
When you use the literal directly, the compiler treats it as long long, and chooses the proper overload for operator <<.
To fix this problem, use unsigned long long in the signature of the Print function:
void Print(unsigned long long num)
{
    cout << dec << num << endl;
}
Demo.
Because 9,110,865,112 needs more than 32 bits, the function only receives the low 32 bits of the value even though you're trying to pass it more.
To fix this, you should use an unsigned long long data type for your num parameter. When you print the constant directly, the code prints fine because the compiler treats the literal as a long long, but when you pass it to a function taking unsigned long, the value is converted to unsigned long and the upper bits are dropped. (I'm surprised your compiler didn't print out a warning.)
As a reference, a 32-bit unsigned long can hold values between 0 and 4,294,967,295 (inclusive). Any value greater than this should be given a larger data type. An unsigned long long can hold values between 0 and 18,446,744,073,709,551,615 (inclusive).
It is worth noting that the fixed-width data types uint32_t and uint64_t are frequently used in place of unsigned long and unsigned long long respectively. The u denotes that the type is unsigned (if the u is left out, the type is signed). The number (32 and 64 in this case) states how many bits the type has, and _t at the end just indicates that it is a data type. So (u)intN_t is a common way to write fixed-width numeric types; N can be 8, 16, 32, or 64 in standard C++ depending on the number of bits you need.
To summarize: you're passing a number that's too large for the parameter. You need to change your function's parameter to support this number:
void Print(uint64_t num){    // uint64_t is declared in <cstdint> / <stdint.h>
    cout << dec << num << endl;
}

arrays + unions + structs containing bit fields C++

I was just playing around with bit fields and came across something that I can't quite figure out how to get around.
(Note about the platform: size of an int = 2 bytes, long = 4 bytes, long long = 8 bytes. I thought it worth mentioning, as I know these can vary. Also, the 'byte' type is defined as an 'unsigned char'.)
I would like to be able to make an array of two 36 bit variables and put them into a union with an array of 9 bytes. This is what I came up with:
typedef union {
    byte bytes[9];
    struct {
        unsigned long long data:36;
    } integers[2];
} Colour;
I was working on the theory that the compiler would realise there were supposed to be two bitfields as part of the anonymous struct and pack them together into the space of 9 bytes. However, it turns out that each array element gets aligned on a byte boundary, so the union occupies 10 bytes, not 9, which makes perfect sense.
The question is then, is there a way to create an array of two bit fields like this? I considered the 'packed' attribute, but the compiler just ignores it.
While this works as expected (sizeof() returns 9):
typedef union {
    byte bytes[9];
    struct {
        unsigned long long data0:36;
        unsigned long long data1:36;
    } integers;
} Colour;
It would be preferable to have it accessible as an array.
Edit:
Thanks to cdhowie for his explanation of why this won't work.
Fortunately I thought of a way to achieve what I want:
typedef union {
    byte bytes[9];
    struct {
        unsigned long long data0:36;
        unsigned long long data1:36;
        unsigned long long data(byte which){
            return (which ? data1 : data0);
        }
        void data(byte which, unsigned long long _data){
            if(which){
                data1 = _data;
            } else {
                data0 = _data;
            }
        }
    } integers;
} Colour;
You can't directly do this using arrays, if you want each bitfield to be exactly 36 bits wide.
Pointers must be aligned to byte boundaries; that's just the way pointers are. Since arrays function like pointers in most cases (with exceptions), this is just not possible with bitfields that contain a number of bits not evenly divisible by 8. (What would you expect &(((Colour *) 0)->integers[1]) to return if the bitfields were packed? What value would make sense?)
In your second example, the bitfields can be tightly-packed because there is no pointer math going on under the hood. For things to be addressable by pointer, they must fall on a byte boundary, since bytes are the units used to "measure" pointers.
You will note that if you try to take the address of (((Colour *) 0)->integers.data0) or data1 in the second example, the compiler will issue an error, for exactly this reason.
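Here is a small sketch of that last point (the struct name is made up for illustration): bit-field members simply have no address, which is why an array of tightly packed 36-bit elements cannot exist.

struct Wide {
    unsigned long long data0 : 36;
    unsigned long long data1 : 36;
};

int main()
{
    Wide w = {};                            // zero-initialize both fields
    w.data0 = 0x123456789ULL;               // named access is fine: no address is needed
    // unsigned long long *p = &w.data0;    // error: cannot take the address of a bit-field
    return static_cast<int>(w.data1);       // data1 is still 0
}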

can anyone explain why size_t type is used with an example?

I was wondering why size_t is used where I could just use, say, int. It's said that size_t is the return type of the sizeof operator. What does that mean? If I use sizeof(int) and store what it returns in an int variable, that also works; it's not necessary to store it in a size_t variable. I just want to understand clearly the basic concept of using size_t, with an understandable example. Thanks.
size_t is guaranteed to be able to represent the largest object size possible; int is not. This means size_t is more portable.
For instance, what if int could only store values up to 32,767 but you could allocate objects of 50,000 bytes? Clearly this wouldn't work; with size_t, however, it will.
The simplest example is pretty dated: on an old 16-bit-int system with 64 k of RAM, the value of an int can be anywhere from -32768 to +32767, but after:
char buf[40960];
the buffer buf occupies 40 kbytes, so sizeof buf is too big to fit in an int, and it needs an unsigned int.
The same thing can happen today if you use 32-bit int but allow programs to access more than 4 GB of RAM at a time, as is the case on what are called "I32LP64" models (32 bit int, 64-bit long and pointer). Here the type size_t will have the same range as unsigned long.
size_t is also sometimes used for casting pointers into unsigned integers of the same size, to perform calculations on pointers, as if they were integers, that would otherwise be rejected at compile time (although uintptr_t from <stdint.h> is the type actually guaranteed to hold a converted pointer). Such code is intended to compile and build correctly across different pointer sizes, e.g. a 32-bit model versus a 64-bit one.
It is implementation-defined, but on 64-bit systems you will find that size_t is often 64-bit while int is still 32-bit (unless it's an ILP64 or SILP64 model).
Depending on what architecture you are on (16-bit, 32-bit or 64-bit), an int could be a different size.
If you want a specific size, use uint16_t or uint32_t. You can check out this thread for more information:
What does the C++ standard state the size of int, long type to be?
size_t is a typedef intended to represent object sizes. It can store the maximum object size supported by the target platform, which makes it portable.
For example:
void * memcpy(void * destination, const void * source, size_t num);
memcpy() copies num bytes from source into destination. The maximum number of bytes that can be copied depends on the platform, so giving num the type size_t makes memcpy portable.
Refer https://stackoverflow.com/a/7706240/2820412 for further details.
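As a small illustration of that signature (the buffer names and sizes here are arbitrary), sizeof naturally produces a size_t, which is exactly what the num parameter expects:

#include <cstring>
#include <cstddef>

int main()
{
    char source[64] = "hello";
    char destination[64];

    std::size_t num = sizeof source;        // sizeof yields a size_t
    std::memcpy(destination, source, num);  // copies num bytes, whatever width size_t has here
    return 0;
}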
size_t is a typedef for one of the fundamental unsigned integer types. It could be unsigned int, unsigned long, or unsigned long long depending on the implementation.
Its special property is that it can represent the size (in bytes) of any object, which includes the largest object possible as well! That is one of the reasons it is widely used in the standard library for array indexing and loop counting (it also solves the portability issue). Let me illustrate this with a simple example.
Consider a vector of length 2*UINT_MAX, where UINT_MAX denotes the maximum value of unsigned int (which is 4294967295 for my implementation considering 4 bytes for unsigned int).
std::vector<int> vec(2ULL * UINT_MAX, 0);
If you wanted to fill the vector using a for-loop such as this one, it would not work, because an unsigned int can only count up to UINT_MAX (beyond which it wraps around to 0):
for(unsigned int i = 0; i < 2ULL * UINT_MAX; ++i) vec[i] = i;
The solution here is to use size_t, since it is guaranteed to be able to represent the size of any object (and therefore of our vector vec too!) in bytes. Note that for my implementation size_t is a typedef for unsigned long, and therefore its max value is ULONG_MAX = 18446744073709551615, given 8 bytes.
for(size_t i = 0; i < 2ULL * UINT_MAX; ++i) vec[i] = i;
References: https://en.cppreference.com/w/cpp/types/size_t
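To see the implementation-defined widths these answers keep referring to, a small sketch like this prints them (the output differs from platform to platform, which is exactly the point):

#include <climits>
#include <cstddef>
#include <iostream>
#include <limits>

int main()
{
    std::cout << "int    : " << sizeof(int) * CHAR_BIT << " bits\n";
    std::cout << "size_t : " << sizeof(std::size_t) * CHAR_BIT << " bits\n";
    std::cout << "max size_t = "
              << std::numeric_limits<std::size_t>::max() << '\n';  // 18446744073709551615 on a typical 64-bit platform
    return 0;
}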

Forcing unaligned bitfield packing in MSVC

I've a struct of bitfields that add up to 48 bits. On GCC this correctly results in a 6 byte structure, but in MSVC the structure comes out 8 bytes. I need to find some way to force MSVC to pack the struct properly, both for interoperability and because it's being used in a memory-critical environment.
The struct seen below consists of three 15-bit numbers, one 2-bit number, and a 1-bit sign. 15+15+15+2+1 = 48, so in theory it should fit into six bytes, right?
struct S
{
    unsigned short a:15;
    unsigned short b:15;
    unsigned short c:15;
    unsigned short d:2;
    unsigned short e:1;
};
However, compiling this on both GCC and MSVC results in sizeof(S) == 8. Thinking that this might have to do with alignment, I tried using #pragma pack(1) before the struct declaration, telling the compiler to pack to byte, not int, boundaries. On GCC, this worked, resulting in sizeof(S) == 6.
However, on MSVC 2005 the sizeof still came out to 8, even with pack(1) set! After reading this other SO answer, I tried replacing unsigned short d with unsigned char and unsigned short e with bool. The result is sizeof(S) == 7!
I found that if I split d into two one-bit fields and wedged them in between the other members, the struct finally packed properly.
struct S
{
    unsigned short a:15;
    unsigned short dHi : 1;
    unsigned short b:15;
    unsigned short dLo : 1;
    unsigned short c:15;
    unsigned short e:1;
};

printf( "%zu\n", sizeof(S) ); // "6"
But having d split like that is cumbersome and causes trouble for me later on when I have to work on the struct. Is there some way I can force MSVC to pack this struct into 6 bytes, exactly as GCC does?
It is implementation defined how fields will be placed in the structure. Visual Studio will fit consecutive bitfields into an underlying type, if it can, and waste the leftover space. (C++ Bit Fields in VS)
If you use the type "unsigned __int64" to declare all elements of the structure, you'll get an object with sizeof(S)=8, but the last two bytes will be unused and the first six will contain the data in the format you want.
Alternatively, if you can accept some structure reordering, this will work
#pragma pack(1)
struct S3
{
    unsigned int a:15;
    unsigned int b:15;
    unsigned int d:2;
    unsigned short c:15;
    unsigned short e:1;
};
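If the reordered layout is acceptable, a compile-time check keeps it honest. This is only a sketch for a C++11-capable compiler; it restates the struct above with pack(push, 1)/pack(pop) so the pragma doesn't leak into other code:

#pragma pack(push, 1)
struct S3
{
    unsigned int   a : 15;
    unsigned int   b : 15;
    unsigned int   d : 2;    // a, b and d fill one 32-bit unit exactly
    unsigned short c : 15;
    unsigned short e : 1;    // c and e fill one 16-bit unit exactly
};
#pragma pack(pop)

static_assert(sizeof(S3) == 6, "expected 4 + 2 = 6 bytes with byte packing");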
I don't think so, and I wouldn't call MSVC's behaviour incorrect either.
Whether a bit field may straddle the boundary of an allocation unit of its underlying type is implementation-defined, so MSVC is free to refuse to split a field across units, while GCC (especially with packing) is free to allow it.