Simple C++ syntax question about colon - c++

I just saw a code snippet with a piece of syntax that I have never seen before.
What does bool start : 1; mean? I found it inside a class definition in a header file.

struct record {
char *name;
int refcount : 4;
unsigned dirty : 1;
};
Those are bit-fields; the number gives the exact size of the field, in bits. (See any complete book on C for the details.) Bit-fields can be used to save space in structures having several binary flags or other small fields, and they can also be used in an attempt to conform to externally-imposed storage layouts. (Their success at the latter task is mitigated by the fact that bit-fields are assigned left-to-right on some machines and right-to-left on others).
Note that the colon notation for specifying the size of a field in bits is only valid in structures (and in unions); you cannot use this mechanism to specify the size of arbitrary variables.
References: K&R1 Sec. 6.7 pp. 136-8
K&R2 Sec. 6.9 pp. 149-50
ISO Sec. 6.5.2.1
H&S Sec. 5.6.5 pp. 136-8

it's a bitfield. : 1 means one bit is used.
see for example http://msdn.microsoft.com/en-us/library/ewwyfdbe(VS.71).aspx

It means that start is 1 bit wide, as opposed to the normal bool which is 1 byte long. You can pack multiple smaller variables into a larger variable and the compiler will generate all the and/or code necessary to read/write it for you. You will take a (noticeable) performance hit, but, if used right, you'll use a lot less memory.

See the Wikipedia entry about Bit Fields. It tells the compiler how many bits the structure member should occupy.

It makes the member start into a bit-field, with 1 bit of space reserved.
It's only valid for struct/class members, you can't have a bit-field variable.

This is the syntax for bit fields
Essentially, you define a field in a struct to have only a few bits of a full byte or short or int.
Several bit fields may share the same int so this method can be used as a clever way to avoid some bit manipulations in constructing values.

This is the syntax for describing bit fields. This is a way of packing more information into a smaller amount of storage. Whereas normally a bool would take at least a byte (probably more) to represent, by using bit fields, you can combine several bools into one byte with a simple syntax.
Be careful though. As one of lesser-known areas of the language, you may run into corner cases when using them. For example, the data structures thus produced are probably not portable between processor types.

It's a bit-field. But I've never tried making bit-fields on boolean.

Related

C++ UE4 - bool vs. uint8 : 1 vs. uint32 : 1 - pros and cons of each?

So, I'm familiar with the concept of packing a bunch of Boolean values using a single bit inside of an integer (bit masking I think its called), and thus you conserve memory because a Boolean is a byte and you fit more than one Boolean in an byte long integer. Thus, if you have enough Booleans, packing them together can make a big difference, and we see that in the native Unreal source code this particular optimization is used quite heavily. What I'm not clear on however, is what are the downsides of this? There are places where many regular Booleans are used instead. Also, why in some paces are uint32 used and some places unint8 are used? I've read there may be some read write related inefficiencies or something?
The biggest problem is that there is no pointer to "packed bool" - like you have an int32 that packs 32 booleans then you cannot make bool* or bool& that refers to any of them in a proper way. This is due to the fact that byte is a minimal memory unit.
In STL they made std::vector<bool> that saved space and had the same interface semantically as other vectors. To do so they had to make special proxy class that is returned from operator [] so one do stuff like boolVec[5] = true. Unfortunately, this over-complication resulted in many problems in performance and usage of std::vector<bool>.
Even simple instructions on packed booleans tend to be composite and thus heavier than if the booleans represented via bool and took a whole byte. Additionally, modifying values of packed boolean is could be causing data-racing in multi-threaded environment.
Basically, hardware simply doesn't support booleans too well.
Next, image POV of OS designer and you create common interface of shared libraries (aka dll). How to treat booleans now? Byte is a minimal memory unit so to pass a single boolean one would still need to use at least a single byte. So why not simply forget about existence of bool and simply pass it via a single byte? So we don't even need to implement this needless type of bool and it will save lots of time for all compiler writers of all languages.
uint8 vs uint32; Also, note that Windows' COM (component object model - not serial port) uses int16 for boolean. In general, it is inherently unimportant as when passing values to a shared library's function that does complex stuff will not make any noticeable difference in performance as you are already calling a much heavier function. Still why is it so? I imagine they had some reasons a long time ago when they designed it and everybody has already forgotten why and they simply keep it unchanged as changing it will result in complete disaster in terms of backwards compatibility.
In C99 _Bool was introduced for booleans but it is just a different name for an unsigned int. I imagine from this originated usage of uint32 for booleans. In general, int is supposedly the most efficient integer type in terms of performance (which it why its size is not strictly defined) - so the C committee chose the supposedly most efficient type to represent booleans.

C++ BOOL (typedef int) vs bool for performance

I read somewhere that using BOOL (typedef int) is better than using the standard c++ type bool because the size of BOOL is 4 bytes (i.e. a multiple of 4) and it saves alignment operations of variables into registers or something along those lines...
Is there any truth to this? I imagine that the compiler would pad the stack frames in order to keep alignments of multiple of 4s even if you use bool (1 byte)?
I'm by no means an expert on the underlying workings of alignments, registers, etc so I apologize in advance if I've got this completely wrong. I hope to be corrected. :)
Cheers!
First of all, sizeof(bool) is not necessarily 1. It is implementation-defined, giving the compiler writer freedom to choose a size that's suitable for the target platform.
Also, sizeof(int) is not necessarily 4.
There are multiple issues that could affect performance:
alignment;
memory bandwidth;
CPU's ability to efficiently load values that are narrower than the machine word.
What -- if any -- difference that makes to a particular piece of code can only be established by profiling that piece of code.
The only guaranteed size you can get in C++ is with char, unsigned char, and signed char 2), which are always exactly one byte and defined for every platform.0)1)
0) Though a byte does not have a defined size. sizeof(char) is always 1 byte, but might be 40 binary bits in fact
1) Yes, there is uint32_t and friends, but no, their definition is optional for actual C++ implementations. Use them, but you may get compile time errors if they are not available (compile time errors are always good)
2) char, unsigned char, and signed char are distinct types and it is not defined whether char is signed or not. Keep this in mind when overloading functions and writing templates.
There are three commonly accepted performance-driven practices in regards to booleans:
In if-statements order of checking the expressions matters and one needs to be careful about them.
If a check of a boolean expression causes a lot of branch mispredictions, then it should (if possible) be substituted with a bit twiddling hack.
Since boolean is a smallest data type, boolean variables should be declared last in structures and classes, so that padding does not add noticeable holes in the structure memory layout.
I've never heard about any performance gain from substituting a boolean with (unsigned?) integer however.

In C++ what is the proper term for splitting an int into bits

I see in some C++ code things like:
// Header
struct SomeStruct {
uint32_t nibble1:4, bitField1:1, bitField2:1, bitField3:1, bitField4:1,
padding:11, field5Bits:5, byteField:8;
};
What is this called? I typically like to google before asking here, but I have no idea what to even type in. I'm hoping to understand this when it comes to endianness - is bit order something to consider or just byte order? Also, what is the type of each field - bitFieldX should be a bool, while field5Bits should be a uint8_t. At least that's what I would think.
Thanks.
They are called bitfields (MSVC) (GCC)
Endianess usually refers to the order of bytes. However bit order can be important, see the above links.
They behave as an unsigned int (uint32_t) in your case.
In general, the term for selecting several bits out of a larger binary integer representation is masking.
What you posted is a packed structure. The elements within the structure are know as bitfields as others have posted. These are often used to represent communication protocol structures, where the protocol specifies fields that are less than one byte, or not aligned to a byte, half-word or word alignment that would normally take place.
Since there is only one type listed, each member of the structure is the same type, uint_32.
Endianess does matter for anthing that is part of a data type that is larger than 1 byte.

C++ : why bool is 8 bits long?

In C++, I'm wondering why the bool type is 8 bits long (on my system), where only one bit is enough to hold the boolean value ?
I used to believe it was for performance reasons, but then on a 32 bits or 64 bits machine, where registers are 32 or 64 bits wide, what's the performance advantage ?
Or is it just one of these 'historical' reasons ?
Because every C++ data type must be addressable.
How would you create a pointer to a single bit? You can't. But you can create a pointer to a byte. So a boolean in C++ is typically byte-sized. (It may be larger as well. That's up to the implementation. The main thing is that it must be addressable, so no C++ datatype can be smaller than a byte)
Memory is byte addressable. You cannot address a single bit, without shifting or masking the byte read from memory. I would imagine this is a very large reason.
A boolean type normally follows the smallest unit of addressable memory of the target machine (i.e. usually the 8bits byte).
Access to memory is always in "chunks" (multiple of words, this is for efficiency at the hardware level, bus transactions): a boolean bit cannot be addressed "alone" in most CPU systems. Of course, once the data is contained in a register, there are often specialized instructions to manipulate bits independently.
For this reason, it is quite common to use techniques of "bit packing" in order to increase efficiency in using "boolean" base data types. A technique such as enum (in C) with power of 2 coding is a good example. The same sort of trick is found in most languages.
Updated: Thanks to a excellent discussion, it was brought to my attention that sizeof(char)==1 by definition in C++. Hence, addressing of a "boolean" data type is pretty tied to the smallest unit of addressable memory (reinforces my point).
The answers about 8-bits being the smallest amount of memory that is addressable are correct. However, some languages can use 1-bit for booleans, in a way. I seem to remember Pascal implementing sets as bit strings. That is, for the following set:
{1, 2, 5, 7}
You might have this in memory:
01100101
You can, of course, do something similar in C / C++ if you want. (If you're keeping track of a bunch of booleans, it could make sense, but it really depends on the situation.)
I know this is old but I thought I'd throw in my 2 cents.
If you limit your boolean or data type to one bit then your application is at risk for memory curruption. How do you handle error stats in memory that is only one bit long?
I went to a job interview and one of the statements the program lead said to me was, "When we send the signal to launch a missle we just send a simple one bit on off bit via wireless. Sending one bit is extremelly fast and we need that signal to be as fast as possible."
Well, it was a test to see if I understood the concepts and bits, bytes, and error handling. How easy would it for a bad guy to send out a one bit msg. Or what happens if during transmittion the bit gets flipped the other way.
Some embedded compilers have an int1 type that is used to bit-pack boolean flags (e.g. CCS series of C compilers for Microchip MPU's). Setting, clearing, and testing these variables uses single-instruction bit-level instructions, but the compiler will not permit any other operations (e.g. taking the address of the variable), for the reasons noted in other answers.
Note, however, that std::vector<bool> is allowed to use bit-packing, i.e. to store the bits in smaller units than an ordinary bool. But it is not required.

Determining the alignment of C/C++ structures in relation to its members

Can the alignment of a structure type be found if the alignments of the structure members are known?
Eg. for:
struct S
{
a_t a;
b_t b;
c_t c[];
};
is the alignment of S = max(alignment_of(a), alignment_of(b), alignment_of(c))?
Searching the internet I found that "for structured types the largest alignment requirement of any of its elements determines the alignment of the structure" (in What Every Programmer Should Know About Memory) but I couldn't find anything remotely similar in the standard (latest draft more exactly).
Edited:
Many thanks for all the answers, especially to Robert Gamble who provided a really good answer to the original question and the others who contributed.
In short:
To ensure alignment requirements for structure members, the alignment of a structure must be at least as strict as the alignment of its strictest member.
As for determining the alignment of structure a few options were presented and with a bit of research this is what I found:
c++ std::tr1::alignment_of
not standard yet, but close (technical report 1), should be in the C++0x
the following restrictions are present in the latest draft: Precondition:T shall be a complete type, a reference type, or an array of
unknown bound, but shall not be a function type or (possibly
cv-qualified) void.
this means that my presented use case with the C99 flexible array won't work (this is not that surprising since flexible arrays are not standard c++)
in the latest c++ draft it is defined in the terms of a new keyword - alignas (this has the same complete type requirement)
in my opinion, should c++ standard ever support C99 flexible arrays, the requirement could be relaxed (the alignment of the structure with the flexible array should not change based on the number of the array elements)
c++ boost::alignment_of
mostly a tr1 replacement
seems to be specialized for void and returns 0 in that case (this is forbidden in the c++ draft)
Note from developers: strictly speaking you should only rely on the value of ALIGNOF(T) being a multiple of the true alignment of T, although in practice it does compute the correct value in all the cases we know about.
I don't know if this works with flexible arrays, it should (might not work in general, this resolves to compiler intrinsic on my platform so I don't know how it will behave in the general case)
Andrew Top presented a simple template solution for calculating the alignment in the answers
this seems to be very close to what boost is doing (boost will additionally return the object size as the alignment if it is smaller than the calculated alignment as far as I can see) so probably the same notice applies
this works with flexible arrays
use Windbg.exe to find out the alignment of a symbol
not compile time, compiler specific, didn't test it
using offsetof on the anonymous structure containing the type
see the answers, not reliable, not portable with c++ non-POD
compiler intrinsics, eg. MSVC __alignof
works with flexible arrays
alignof keyword is in the latest c++ draft
If we want to use the "standard" solution we're limited to std::tr1::alignment_of, but that won't work if you mix your c++ code with c99's flexible arrays.
As I see it there is only 1 solution - use the old struct hack:
struct S
{
a_t a;
b_t b;
c_t c[1]; // "has" more than 1 member, strictly speaking this is undefined behavior in both c and c++ when used this way
};
The diverging c and c++ standards and their growing differences are unfortunate in this case (and every other case).
Another interesting question is (if we can't find out the alignment of a structure in a portable way) what is the most strictest alignment requirement possible. There are a couple of solutions I could find:
boost (internally) uses a union of variety of types and uses the boost::alignment_of on it
the latest c++ draft contains std::aligned_storage
The value of default-alignment shall be the most stringent alignment requirement for any C++ object type whose size is no greater than Len
so the std::alignment_of< std::aligned_storage<BigEnoughNumber>>::value should give us the maximum alignment
draft only, not standard yet (if ever), tr1::aligned_storage does not have this property
Any thoughts on this would also be appreciated.
I have temporarily unchecked the accepted answer to get more visibility and input on the new sub-questions
There are two closely related concepts to here:
The alignment required by the processor to access a particular object
The alignment that the compiler actually uses to place objects in memory
To ensure alignment requirements for structure members, the alignment of a structure must be at least as strict as the alignment of its strictest member. I don't think this is spelled out explicitly in the standard but it can be inferred from the the following facts (which are spelled out individually in the standard):
Structures are allowed to have padding between their members (and at the end)
Arrays are not allowed to have padding between their elements
You can create an array of any structure type
If the alignment of a structure was not at least as strict as each of its members you would not be able to create an array of structures since some structure members some elements would not be properly aligned.
Now the compiler must ensure a minimum alignment for the structure based on the alignment requirements of its members but it can also align objects in a stricter fashion than required, this is often done for performance reasons. For example, many modern processors will allow access to 32-bit integers in any alignment but accesses may be significantly slower if they are not aligned on a 4-byte boundary.
There is no portable way to determine the alignment enforced by the processor for any given type because this is not exposed by the language, although since the compiler obviously knows the alignment requirements of the target processor it could expose this information as an extension.
There is also no portable way (at least in C) to determine how a compiler will actually align an object although many compilers have options to provide some level of control over the alignment.
I wrote this type trait code to determine the alignment of any type(based on the compiler rules already discussed). You may find it useful:
template <class T>
class Traits
{
public:
struct AlignmentFinder
{
char a;
T b;
};
enum {AlignmentOf = sizeof(AlignmentFinder) - sizeof(T)};
};
So now you can go:
std::cout << "The alignment of structure S is: " << Traits<S>::AlignmentOf << std::endl;
The following macro will return the alignment requirement of any given type (even if it's a struct):
#define TYPE_ALIGNMENT( t ) offsetof( struct { char x; t test; }, test )
Note: I probably borrowed this idea from a Microsoft header at some point way back in my past...
Edit: as Robert Gamble points out in the comments, this macro is not guaranteed to work. In fact, it will certainly not work very well if the compiler is set to pack elements in structures. So if you decide to use it, use it with caution.
Some compilers have an extension that allows you obtain the alignment of a type (for example, starting with VS2002, MSVC has an __alignof() intrinsic). Those should be used when available.
As the others mentioned, its implementation dependant. Visual Studio 2005 uses 8 bytes as the default structure alignment. Internally, items are aligned by their size - a float has 4 byte alignment, a double uses 8, etc.
You can override the behavior with #pragma pack. GCC (and most compilers) have similar compiler options or pragmas.
It is possible to assume a structure alignment if you know more details about the compiler options that are in use. For example, #pragma pack(1) will force alignment on the byte level for some compilers.
Side note: I know the question was about alignment, but a side issue is padding. For embedded programming, binary data, and so forth -- In general, don't assume anything about structure alignment if possible. Rather use explicit padding if necessary in the structures. I've had cases where it was impossible to duplicate the exact alignment used in one compiler to a compiler on a different platform without adding padding elements. It had to do with the alignment of structures inside of structures, so adding padding elements fixed it.
If you want to find this out for a particular case in Windows, open up windbg:
Windbg.exe -z \path\to\somemodule.dll -y \path\to\symbols
Then, run:
dt somemodule!CSomeType
I don't think memory layout is guaranteed in any way in any C standard. This is very much vendor and architect-dependent. There might be ways to do it that work in 90% of cases, but they are not standard.
I would be very glad to be proven wrong, though =)
I agree mostly with Paul Betts, Ryan and Dan. Really, it's up to the developer, you can either keep the default alignment symanic's which Robert noted about (Robert's explanation is just the default behaviour and not by any means enforced or required), or you can setup whatever alignment you want /Zp[##].
What this means is that if you have a typedef with floats', long double's, uchar's etc... various assortments of arrays's included. Then have another type which has some of these oddly shaped members, and a single byte, then another odd member, it will simply be aligned at whatever preference the make/solution file defines.
As noted earlier, using windbg's dt command at runtime you can find out how the compiler laid out the structure in memory.
You can also use any pdb reading tool like dia2dump to extract this info from pdb's statically.
Modified from Peeter Joot's Blog
C structure alignment is based on the biggest size native type in the structure, at least generally (an exception is something like using a 64-bit integer on win32 where only 32-bit alignment is required).
If you have only chars and arrays of chars, once you add an int, that int will end up starting on a 4 byte boundary (with possible hidden padding before the int member). Additionally, if the structure isn’t a multiple of sizeof(int), hidden padding will be added at the end. Same thing for short and 64-bit types.
Example:
struct blah1 {
char x ;
char y[2] ;
};
sizeof(blah1) == 3
struct blah1plusShort {
char x ;
char y[2] ;
// <<< hidden one byte inserted by the compiler here
// <<< z will start on a 2 byte boundary (if beginning of struct is aligned).
short z ;
char w ;
// <<< hidden one byte tail pad inserted by the compiler.
// <<< the total struct size is a multiple of the biggest element.
// <<< This ensures alignment if used in an array.
};
sizeof(blah1plusShort) == 8
I read this answer after 8 years and I feel that the accepted answer from #Robert is generally right, but mathematically wrong.
To ensure alignment requirements for structure members, the alignment of a structure must be at least as strict as the least common multiple of the alignment of its members. Consider an odd example, where the alignment requirements of members are 4 and 10; in which case the alignment of the structure is LCM(4, 10) which is 20, and not 10. Of course, it is odd to see platforms with such alignment requirement which is not a power of 2, and thus for all practical cases, the structure alignment is equal to the maximum alignment of its members.
The reason for this is that, only if the address of the structure starts with the LCM of its member alignments, the alignment of all the members can be satisfied and the padding between the members and the end of the structure is independent of the start address.
Update: As pointed out by #chqrlie in the comment, C standard does not allow the odd values of the alignment. However this answer still proves why structure alignment is the maximum of its member alignments, just because the maximum happens to be the least common multiple, and thus the members are always aligned relative to the common multiple address.