Characters in structures - C++

A quick question: what is the meaning of char c:4 in the structure given below?
struct s
{
char c:4;
};
Thanks in advance.

This is a bit field consisting of a four-bit portion of a char. You can define more bit fields to subdivide a larger type into "nibbles", like this:
struct s
{
char c:4;
char d:2;
char e:2;
};
This struct defines three fields, all "packed" into a single char. The c field can hold sixteen distinct values; fields d and e can hold four values each.
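As a quick, hedged illustration (using unsigned char instead of plain char so the value ranges are unambiguous; whether the three fields really share one byte is implementation-defined):
#include <cstdio>

struct s
{
    unsigned char c : 4;   // 0..15
    unsigned char d : 2;   // 0..3
    unsigned char e : 2;   // 0..3
};

int main()
{
    s packed{};
    packed.c = 9;
    packed.d = 3;
    packed.e = 1;
    std::printf("%zu\n", sizeof(s));                          // commonly 1, but implementation-defined
    std::printf("%d %d %d\n", packed.c, packed.d, packed.e);  // 9 3 1
    return 0;
}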

The char c:4 means that c is a char-sized field of which only 4 bits are used. Put another way, c does not refer to the entire 8-bit char memory space, only four bits of it. This is a mechanism for bit-packing flags and uncommon data sizes into a structure.

Bit-fields are a quirky feature of the C language whose exact specifications and behavior have been somewhat inconsistent over the years; later versions of the C standard have nailed down some aspects of the behavior enough to prevent compilers from doing some useful things, but not enough to make them portable in any useful fashion.
If two or more consecutive members of a structure have the same underlying integer type, and have their field name followed by a colon and a number, the compiler will attempt to pack each one after the first into the same storage element as the one before. If it cannot do so, it will advance to the next storage element.
Thus, on a machine where unsigned short is sixteen bits, a declaration:
struct {
unsigned short a1 : 5;
unsigned short a2 : 5;
unsigned short a3 : 7;
unsigned short a4 : 5;
unsigned short a5 : 4;
unsigned short a6 : 6;
};
will cause the compiler to pack a1 and a2 into a 16-bit integer; since a3 won't fit in the same integer as a1 and a2 (the total would be 17 bits), it will start a new one. Although a4 would fit with a1 and a2, the fact that a3 is in the way means that it will be put with a3. Then a5 will be placed with a3 and a4, filling up the second storage unit. Finally, a6 will be given a 16-bit unit all its own.
Note that if the underlying type of all those structure elements had been a 32-bit unsigned int, everything could have been packed into one such item rather than three 16-bit ones.
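To see the effect, a small sketch (the exact sizes are implementation-defined; the numbers in the comments assume a 16-bit unsigned short and a 32-bit unsigned int):
#include <cstdio>

struct Shorts {
    unsigned short a1 : 5, a2 : 5, a3 : 7, a4 : 5, a5 : 4, a6 : 6;
};

struct Ints {
    unsigned int a1 : 5, a2 : 5, a3 : 7, a4 : 5, a5 : 4, a6 : 6;   // 32 bits total
};

int main()
{
    std::printf("%zu %zu\n", sizeof(Shorts), sizeof(Ints));   // commonly prints "6 4"
    return 0;
}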
Frankly, I really dislike the rules around bitfields. The rules are sufficiently vague that the underlying representation one compiler generates for bitfield data may not be readable by another, but they're sufficiently specific that compilers are forbidden from generating what could otherwise be useful data structures (e.g. it should be possible to pack four six-bit numbers into a 3-byte (24-bit) structure, but unless a compiler happens to have a 24-bit integer type there is no way to request that). Still, the rules are what they are.

Related

Multiple adjacent bit fields in C++

I saw multiple adjacent bit fields while browsing cppreference.
unsigned char b1 : 3, : 2, b2 : 6, b3 : 2;
So,
What is the purpose of it?
When and where should I use it?
Obviously, the purpose is to consume less memory, as you would otherwise achieve by hand with bitwise operations. That may be significant for embedded programming, for example.
You can also use std::vector<bool>, which can (and usually does) have a bit-packed implementation.
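A minimal std::vector<bool> sketch (the packed representation is permitted but not required by the standard):
#include <vector>
#include <cstdio>

int main()
{
    std::vector<bool> flags(1000, false);              // typically stored one bit per element
    flags[42] = true;
    std::printf("%d\n", static_cast<int>(flags[42]));  // 1
    return 0;
}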
Multiple adjacent bit fields are usually packed together (although this behavior is implementation-defined)
If you do not want the compiler to add padding or perform structure alignment while allocating multiple bit fields, you can combine them into a single declaration.
struct x
{
unsigned char b1 : 4; // the compiler may add padding after this, adding to the structure size.
unsigned char b2 : 3; // the compiler may add padding here too! sizeof(x) can be 2.
};
struct y
{
unsigned char b1 : 4, : 3; // padding will be added only once. sizeof(y) is 1
};
Or, if you want a bit field wider than its underlying type allows, you can split it across a named field and an unnamed one:
struct x
{
unsigned char b1 :9; //warning: width of 'x::b1' exceeds its type
};
struct y
{
unsigned char b1 :6, :3; //no warning
};
According to the C++ draft standard:
3 A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having nonzero width. [Note: Various features of the language, such as references and virtual functions, might involve additional memory locations that are not accessible to programs but are managed by the implementation. — end note] Two or more threads of execution ([intro.multithread]) can access separate memory locations without interfering with each other.
One reason is to reduce memory consumption when you don't need full integer type range. It only matters if you have large arrays.
struct Bytes
{
unsigned char b1; // Eight bits, so values 0-255
unsigned char b2;
};
Bytes b[1000000000]; // sizeof(b) == 2000000000
struct Bitfields
{
unsigned char b1 : 4; // Four bits, so only accommodates values 0-15
unsigned char b2 : 4;
};
Bitfields f[1000000000]; // sizeof(f) == 1000000000
The other reason is matching binary layout of certain hardware or file formats. Such code is not portable.

How are the values of fundamental C/C++ types physically stored? [closed]

I am playing with state-space search, which requires very efficient storage of explored states. That means I need to store multiple pieces of information in a variable that is as small as possible.
Let's have a very simple example: Imagine I'd like to store two coordinates. I can create a struct of two integers, but each integer is (correct me if I am wrong) 32 bits. But neither of my coordinates is greater than 15. With zero, that is 16 = 2^4 distinct values, which means I only need 8 bits to store both of them. So with some bitwise operator magic, I am able to store these two values inside a single char:
unsigned int x, y; // initialized!!!!!
char my_code = (x << 4) | y;
Of course, this code will only work if x and y are stored in "straight code" (I am not sure about this name; I mean the simple binary representation of a number, from the biggest bit 2^n down to 2^0).
So my question is: which binary codes are used to store which fundamental C/C++ variables?
EDIT: Premature optimization? No. My current task is small, and it is preparation for a bigger problem where I need to store 4 coordinates from 0 to 7. Those coordinates are positions on an 8x8 board. So I need to keep track of many, many unique combinations, because state-space searching is based on generating new states that haven't already been explored.
Storing multiple ints and using a custom comparator function with a set is not an option: for big problems like this my memory would bleed, and keeping track of what I have already visited wouldn't be pleasant either. A bitset sized to the number of possible combinations is probably the best way. (You might say the problem I described is too big for a bitset that large, but there is a neat trick to handle it, which is unimportant for this question.) So I need some sort of "hash", which can be created in many ways: using modular arithmetic (one type of answer) or bitwise operations. The difference in complexity between these two solutions is too small to matter on today's computers. Because I am curious, I wanted to use the more exotic second way. But for that to work, I needed to know how numbers are stored at the binary level, in case there is some weird encoding that would make my idea completely unusable.
My question wasn't about size of variables either - those are well documented.
Of course, this code will work only, if x and y are stored in "straight-code"
I'm guessing the term you're looking for is endianness. However, regardless of the endianness of your system, (x << 4) | y gives you the same value. Math is endian-agnostic. Math is just math. The only difference is what the memory layout is - and for a single byte, even that doesn't matter.
We can work through an example. Let's say x is 0x0A0B0C0D and y is 0x01020304. If your system is big-endian, that means the memory layout is:
x : 0A 0B 0C 0D
y : 01 02 03 04
x << 4 : A0 B0 C0 D0
(x << 4) | y : A1 B2 C3 D4
to char : D4
If it was little-endian:
x : 0D 0C 0B 0A
y : 04 03 02 01
x << 4 : D0 C0 B0 A0
(x << 4) | y : D4 C3 B2 A1
to char : D4
Either way, 0xD4.
Although, one thing you do have to worry about is the actual conversion to char. From [conv.integral]:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note]
If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
If char is unsigned, that part is well-defined. If it's signed, it's not. So prefer to use unsigned char for my_code.
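Putting the two pieces of advice together, a small sketch (hypothetical values; assumes both coordinates fit in four bits):
#include <cstdio>

int main()
{
    unsigned int x = 11, y = 5;                                       // each assumed to be 0..15
    unsigned char my_code = static_cast<unsigned char>((x << 4) | y);

    unsigned int x_back = (my_code >> 4) & 0x0Fu;                     // recover x
    unsigned int y_back = my_code & 0x0Fu;                            // recover y
    std::printf("%u %u\n", x_back, y_back);                           // 11 5
    return 0;
}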
Don't try to hack things yourself. The compiler is perfectly capable of doing this directly:
struct Packed {
unsigned x : 4;
unsigned y : 4;
};
You'd better have a few million of those, else the saving isn't really worthwhile.
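Usage is then ordinary member access; note that with unsigned as the underlying type the struct still typically occupies sizeof(unsigned) bytes, so use unsigned char bit fields if a one-byte object matters (both outcomes are implementation-defined):
Packed p{};
p.x = 12;   // 0..15
p.y = 3;    // 0..15
// sizeof(Packed) is commonly 4 here; with unsigned char fields it is commonly 1.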
C does not mandate sizes for short, int, or long; it specifies the ranges of values they must be able to represent. An unsigned int must be able to represent at least values in the range 0 to 65535, meaning it must be at least 16 bits wide, although it may be (and often is) wider. A signed int must represent at least the range -32767 to 32767. Note that C does not mandate two's complement representation, which is why the lower value is -32767 and not -32768.
Also, these types may have padding bits in addition to value bits. I've never worked on an implementation that used padding bits in integral types.
Integral values are stored in the value bits using regular binary representation (i.e., N bits, leftmost bit is the most significant bit).
C99 added the stdint.h header, which defines integer types that have exact widths and no padding bits. If your values really do range from 0 to 15, then I'd suggest you use uint8_t (8 bits, unsigned) for each. Since these types fit into a single byte, you won't have any endianness issues.
The C/C++ standards do not enforce a particular representation for integral types. Actually, the standards allow three different representations -- two's complement, ones' complement, and sign-magnitude representation. Luckily, positive numbers look the same in all of the representations.
Anyway, you do not have to care about it -- simply avoid bit manipulation operators as follows. If the range of your coordinates is [0, N), you can pack them into a wider data type as:
code = N * x + y;
or
code = N * N * x + N * y + z;
For N = 16, you can pack two coordinates into a single unsigned char as:
unsigned char code = 16*x + y;
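Decoding is then the mirror image (a small sketch, assuming both coordinates were in the range [0, 16)):
unsigned int x_back = code / 16;   // integer division recovers x
unsigned int y_back = code % 16;   // remainder recovers y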

size of a structure containing bit fields [duplicate]

Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
I was trying to understand the concept of bit fields.
But I am not able to figure out why the size of the following structure in CASE III comes out as 8 bytes.
CASE I:
struct B
{
unsigned char c; // +8 bits
} b;
sizeof(b); // Output: 1 (because unsigned char takes 1 byte on my system)
CASE II:
struct B
{
unsigned b: 1;
} b;
sizeof(b); // Output: 4 (because unsigned takes 4 bytes on my system)
CASE III:
struct B
{
unsigned char c; // +8 bits
unsigned b: 1; // +1 bit
} b;
sizeof(b); // Output: 8
I don't understand why the output for case III comes as 8. I was expecting 1(char) + 4(unsigned) = 5.
You can check the layout of the struct by using offsetof, but it will be something along the lines of:
struct B
{
unsigned char c; // +8 bits
unsigned char pad[3]; //padding
unsigned int bint; //your b:1 will be the first byte of this one
} b;
Now it is obvious that (on a 32-bit architecture) sizeof(b) will be 8, isn't it?
The question is, why 3 bytes of padding, and not more or less?
The answer is that the offset of a field into a struct has the same alignment requirement as the type of the field itself. On your architecture, integers are 4-byte-aligned, so offsetof(b, bint) must be a multiple of 4. It cannot be 0, because c comes before it, so it will be 4. If field bint starts at offset 4 and is 4 bytes long, then the size of the struct is 8.
Another way to look at it is that the alignment requirement of a struct is the biggest of any of its fields, so this B will be 4-byte-aligned (as is your bit field). But the size of a type must be a multiple of its alignment; 4 is not enough, so it will be 8.
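A sketch you can compile to check this on your own platform (offsetof cannot be applied to a bit field itself, so only the char member and the totals are printed; the commented results assume 4-byte int alignment):
#include <cstddef>
#include <cstdio>

struct B
{
    unsigned char c;   // at offset 0
    unsigned b : 1;    // its storage unit typically starts at the next int-aligned offset
};

int main()
{
    std::printf("offsetof(B, c) = %zu\n", offsetof(B, c));   // 0
    std::printf("alignof(B)     = %zu\n", alignof(B));       // commonly 4
    std::printf("sizeof(B)      = %zu\n", sizeof(B));        // commonly 8
    return 0;
}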
I think you're seeing an alignment effect here.
Many architectures require integers to be stored at addresses in memory that are multiples of the word size.
This is why the char in your third struct is being padded with three more bytes, so that the following unsigned integer starts at an address that is a multiple of the word size.
A char is by definition one byte, an int is 4 bytes on a 32-bit system, and the struct is padded out to 8.
See http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86 for some explanation of padding
To keep memory accesses aligned, the compiler adds padding; if you pack the structure, it will not add the padding.
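For example, with GCC/Clang the (non-standard) packed attribute removes that padding, at the cost of potentially slower unaligned access (a sketch; the resulting size is still compiler-specific):
struct PackedB
{
    unsigned char c;
    unsigned b : 1;
} __attribute__((__packed__));   // GCC extension; sizeof(PackedB) is typically smaller than 8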
I took another look at this and here's what I found.
From the C book, "Almost everything about fields is implementation-dependent."
On my machine:
struct B {
unsigned c: 8;
unsigned b: 1;
}b;
printf("%lu\n", sizeof(b));
This prints 4, the size of an unsigned int on my machine.
You were mixing bit fields with regular struct elements.
BTW, a bit field is defined as "a set of adjacent bits within a single implementation-defined storage unit". So I'm not even sure that the ':8' does what you want; it seems not to be in the spirit of bit fields (as it's not a single bit any more).
The alignment and total size of the struct are platform- and compiler-specific. You cannot expect straightforward and predictable answers here; the compiler can always have its own ideas. For example:
struct B
{
unsigned b0: 1; // +1 bit
unsigned char c; // +8 bits
unsigned b1: 1; // +1 bit
};
The compiler may merge fields b0 and b1 into one integer, or it may not; it is up to the compiler. Some compilers have command-line switches that control this, some do not. Another example:
struct B
{
unsigned short c, d, e;
};
It is up to the compiler to pack or not pack the fields of this struct (assuming a 32-bit platform). The layout of the struct can differ between DEBUG and RELEASE builds.
I would recommend using only the following pattern:
struct B
{
unsigned b0: 1;
unsigned b1: 7;
unsigned b2: 2;
};
When you have a sequence of bit fields that share the same type, the compiler will put them into one int. Otherwise, various other factors kick in. Also take into account that in a big project you write a piece of code, and somebody else will write and rewrite the makefile, or move your code from one DLL into another. At that point compiler flags will be set and changed. There is a 99% chance those people will have no idea of the alignment requirements for your struct; they will never even open your file.
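A quick check of that pattern (the packing is still formally implementation-defined, but common compilers put all three fields into a single unsigned):
#include <cstdio>

struct B
{
    unsigned b0 : 1;
    unsigned b1 : 7;
    unsigned b2 : 2;
};

int main()
{
    std::printf("%zu\n", sizeof(B));   // commonly prints 4, i.e. sizeof(unsigned)
    return 0;
}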

How can I get bitfields to arrange my bits in the right order?

To begin with, the application in question is always going to be on the same processor, and the compiler is always gcc, so I'm not concerned about bitfields not being portable.
gcc lays out bitfields such that the first listed field corresponds to least significant bit of a byte. So the following structure, with a=0, b=1, c=1, d=1, you get a byte of value e0.
struct Bits {
unsigned int a:5;
unsigned int b:1;
unsigned int c:1;
unsigned int d:1;
} __attribute__((__packed__));
(Actually, this is C++, so I'm talking about g++.)
Now let's say I'd like a to be a six bit integer.
Now, I can see why this won't work, but I coded the following structure:
struct Bits2 {
unsigned int a:6;
unsigned int b:1;
unsigned int c:1;
unsigned int d:1;
} __attribute__((__packed__));
Setting b, c, and d to 1, and a to 0 results in the following two bytes:
c0 01
This isn't what I wanted. I was hoping to see this:
e0 00
Is there any way to specify a structure that has three bits in the most significant bits of the first byte and six bits spanning the five least significant bits of the first byte and the most significant bit of the second?
Please be aware that I have no control over where these bits are supposed to be laid out: it's a layout of bits that are defined by someone else's interface.
(Note that all of this is gcc-specific commentary - I'm well aware that the layout of bitfields is implementation-defined).
Not on a little-endian machine: The problem is that on a little-endian machine, the most significant bit of the second byte isn't considered "adjacent" to the least significant bits of the first byte.
You can, however, combine the bitfields with the ntohs() function:
union u_Bits2{
struct Bits2 {
uint16_t _padding:7;
uint16_t a:6;
uint16_t b:1;
uint16_t c:1;
uint16_t d:1;
} bits __attribute__((__packed__));
uint16_t word;
};
union u_Bits2 flags;
flags.word = ntohs(flag_bytes_from_network);
However, I strongly recommend you avoid bitfields and instead use shifting and masks.
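A sketch of that mask-and-shift alternative, assuming the two wire bytes form a big-endian 16-bit word with d in the most significant bit, then c, then b, then the six bits of a, and seven unused low bits (the struct and function names here are made up for illustration):
#include <cstdint>

struct Fields { unsigned a, b, c, d; };

Fields decode(std::uint8_t hi, std::uint8_t lo)
{
    const std::uint16_t w = static_cast<std::uint16_t>(hi) << 8 | lo;
    return { (w >> 7) & 0x3Fu, (w >> 13) & 1u, (w >> 14) & 1u, (w >> 15) & 1u };
}

std::uint16_t encode(const Fields& f)
{
    return static_cast<std::uint16_t>((f.d << 15) | (f.c << 14) | (f.b << 13) | ((f.a & 0x3Fu) << 7));
}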
Usually you can't make strong assumptions about how the union will be packed; every compiler implementation may choose to pack it differently (to save space or to align bitfields inside bytes).
I would suggest you just work with masking and bitwise operators.
from this link:
The main use of bitfields is either to allow tight packing of data or to be able to specify the fields within some externally produced data files. C gives no guarantee of the ordering of fields within machine words, so if you do use them for the latter reason, your program will not only be non-portable, it will be compiler-dependent too. The Standard says that fields are packed into ‘storage units’, which are typically machine words. The packing order, and whether or not a bitfield may cross a storage unit boundary, are implementation defined. To force alignment to a storage unit boundary, a zero width field is used before the one that you want to have aligned.
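The zero-width trick mentioned at the end looks like this (a sketch; the effect is that b starts in a new storage unit):
struct Flags
{
    unsigned a : 3;
    unsigned   : 0;   // unnamed zero-width field: forces the next bit field to a new storage unit
    unsigned b : 3;
};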
C/C++ has no means of specifying the bit-by-bit memory layout of structs, so you will need to do manual bit shifting and masking on 8- or 16-bit (unsigned) integers (uint8_t, uint16_t from <stdint.h> or <cstdint>).
Of the good dozen of programming languages I know, only very few allow you to specify bit-by-bit memory layout for bit fields: Ada, Erlang, VHDL (and Verilog).
(Community wiki if you want to add more languages to that list.)

Define smallest possible datatype in C++ that can hold six values

I want to define my own datatype that can hold a single one of six possible values in order to learn more about memory management in C++. In numbers, I want to be able to hold 0 through 5. In binary, three bits would suffice (101 = 5), although some values (6 and 7) won't be used. The datatype should also consume as little memory as possible.
I'm not sure how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values are in hex there, so one "hexbit" should allow me to store 0 through 15. But comparing it to a char (with sizeof) shows that it's 4 times the size of a char, and a char holds 0 through 255 if I'm not mistaken.
#include <iostream>
enum Foo
{
a = 0x0,
b = 0x1,
c = 0x2,
d = 0x3,
e = 0x4,
f = 0x5,
};
int main()
{
Foo myfoo = a;
char mychar = 'a';
std::cout << sizeof(myfoo); // prints 4
std::cout << sizeof(mychar); // prints 1
return 1;
}
I've clearly misunderstood something, but I fail to see what, so I turn to SO. :)
Also, when writing this post I realised that I clearly lack some of the vocabulary. I've made this post a community wiki; please edit it so I can learn the correct words for everything.
A char is the smallest possible type.
If you happen to know that you need several such 3-bit values in a single place, you can use a structure with bitfield syntax:
struct foo {
unsigned int val1:3;
unsigned int val2:3;
};
and hence get 2 of them within one byte. In theory you could pack 10 such fields into a 32-bit "int" value.
C++0x will contain strongly typed enumerations where you can specify the underlying datatype (in your example char), but current C++ does not support this. The standard is not clear about the use of a char here (the examples are with int, short and long), but it mentions the underlying integral type, and that would include char as well.
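For reference, the syntax that eventually shipped in C++11 looks like this (a sketch; requires a C++11 compiler):
enum class Foo : char { a, b, c, d, e, f };
static_assert(sizeof(Foo) == 1, "Foo occupies one byte");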
As of today Neil Butterworth's answer to create a class for your problem seems the most elegant, as you can even extend it to contain a nested enumeration if you want symbolical names for the values.
C++ does not express units of memory smaller than bytes. If you're producing them one at a time, that's the best you can do; your own example works well. If you only need a few, you can use bit-fields as Alnitak suggests. If you're planning on allocating them one at a time, then you're even worse off: most allocators hand out memory in minimum-size chunks, 16 bytes being common.
Another choice might be to wrap std::bitset to do your bidding. If you need many such values, this wastes very little space, only about 1 bit for every 8.
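One possible shape of such a wrapper, purely as a sketch (the class and its interface are made up here, not part of the standard library):
#include <bitset>
#include <cstddef>

template <std::size_t N>
class PackedValues              // stores N values, 3 bits each
{
public:
    void set(std::size_t i, unsigned v)       // v is assumed to be 0..7
    {
        for (std::size_t b = 0; b < 3; ++b)
            bits_[3 * i + b] = (v >> b) & 1u;
    }
    unsigned get(std::size_t i) const
    {
        unsigned v = 0;
        for (std::size_t b = 0; b < 3; ++b)
            v |= static_cast<unsigned>(bits_[3 * i + b]) << b;
        return v;
    }
private:
    std::bitset<3 * N> bits_;
};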
If you think about your problem as a number, expressed in base 6, and convert that number to base two, possibly using an unlimited-precision integer (for example GMP), you won't waste any bits at all.
This assumes, of course, that your values have a uniform, random distribution. If they follow a different distribution, your best bet will be general compression of the first example, with something like gzip.
You can store values smaller than 8 or 32 bits. You just need to pack them into a struct (or class) and use bit fields.
For example:
struct example
{
unsigned int a : 3; //<Three bits, can be 0 through 7.
bool b : 1; //< One bit, stores 0 or 1.
unsigned int c : 10; //<Ten bits, can be 0 through 1023.
unsigned int d : 19; //<19 bits, can be 0 through 524287.
};
In most cases, your compiler will round up the total size of your structure to 32 bits on a 32 bit platform. The other problem is, like you pointed out, that your values may not have a power of two range. This will make for wasted space. If you read the entire struct as one number, you will find values that will be impossible to set, if your input ranges aren't all powers of 2.
Another feature you may find interesting is a union. They work like a struct, but share memory. So if you write to one field it overwrites the others.
Now, if you are really tight for space, and you want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each can be from 0 to 5. Bit fields are wasteful, because if you use 3 bits each, you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, lets do an example:
//Here are three example values, each can be from 0 to 5:
const int one = 3, two = 4, three = 5;
To pack them together most efficiently, we should think in base 6 (since each value is from 0-5). So packed into the smallest possible space is:
//This packs all the values into one int, from 0 - 215.
//pack could be any value from 0 - 215. There are no 'wasted' numbers.
int pack = one + (6 * two) + (6 * 6 * three);
See how it looks like we're encoding in base six? Each number is multiplied by its place value, 6^n, where n is the place (starting at 0).
Then to decode:
const int one = pack % 6;
pack /= 6;
const int two = pack % 6;
pack /= 6;
const int three = pack;
These schemes are extremely handy when you have to encode some fields in a bar code or in an alphanumeric sequence for human typing. Just saving those few partial bits can make a huge difference. Also, the fields don't all have to have the same range. If one field is from 0 through 7, you'd use 8 instead of 6 in the proper place. There is no requirement that all fields have the same range.
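A sketch of that mixed-range case (field ranges and values chosen arbitrarily for illustration):
// one in [0,6), two in [0,8), three in [0,6):
int one = 3, two = 7, three = 5;                 // example values within their ranges
int pack = one + 6 * two + (6 * 8) * three;

// decoding: divide each field out by its own range
int one_back   = pack % 6;  pack /= 6;
int two_back   = pack % 8;  pack /= 8;
int three_back = pack;                            // 3, 7, 5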
The minimal size you can use is 1 byte.
But if you store a group of enum values (writing them to a file or storing them in a container, etc.), you can pack the group at 3 bits per value.
You don't have to enumerate the values of the enum:
enum Foo
{
a,
b,
c,
d,
e,
f,
};
Foo myfoo = a;
Here Foo has int as its underlying type, which on your machine takes 4 bytes.
The smallest type is char, which is defined as the smallest addressable data on the target machine. The CHAR_BIT macro yields the number of bits in a char and is defined in limits.h.
[Edit]
Note that generally speaking you shouldn't ask yourself such questions. Always use [unsigned] int if it's sufficient, except when you allocate quite a lot of memory (e.g. int[100*1024] vs char[100*1024], but consider using std::vector instead).
The size of an enumeration is defined to be the same as that of an int. But depending on your compiler, you may have the option of creating a smaller enum. For example, in GCC, you may declare:
enum Foo {
a, b, c, d, e, f
}
__attribute__((__packed__));
Now, sizeof(Foo) == 1.
The best solution is to create your own type implemented using a char. This should have sizeof(MyType) == 1, though this is not guaranteed.
#include <iostream>
using namespace std;
class MyType {
public:
MyType( int a ) : val( a ) {
if ( val < 0 || val > 5 ) {   // only the six values 0 through 5 are valid
throw( "bad value" );
}
}
int Value() const {
return val;
}
private:
char val;
};
int main() {
MyType v( 2 );
cout << sizeof(v) << endl;
cout << v.Value() << endl;
}
It is likely that packing oddly sized values into bitfields will incur a sizable performance penalty due to the architecture not supporting bit-level operations (thus requiring several processor instructions per operation). Before you implement such a type, ask yourself if it is really necessary to use as little space as possible, or if you are committing the cardinal sin of programming that is premature optimization. At most, I would encapsulate the value in a class whose backing store can be changed transparently if you really do need to squeeze every last byte for some reason.
You can use an unsigned char. Probably typedef it as a BYTE. It will occupy only one byte.
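A minimal sketch of that suggestion (BYTE is just a hypothetical alias, not a standard name):
typedef unsigned char BYTE;   // one byte on any conforming implementation
BYTE state = 5;               // comfortably holds the values 0 through 5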