What is the optimal size for a boolean variable - c++

I have come to believe that the optimal size for a boolean variable is the natural width of the data, ie in C/C++ it is int. So for modern processors this is normally 32 bits. At the machine level declaring it as a byte for example requires a 32 bit fetch and then a mask.
However I have seen that a BOOL in iOS is 8 bits. I had assumed that people who used bytes were using left-over ideas from 8 bit processors.
I realise this question depends on the use and for most of the time the language defined boolean is the best bet, but there are times when you need to define your own, such as when you are converting code arriving from an external source or you want to write cross platform code.
It is also significant that if a boolean value is going to be packed into a serial stream, for sending over a serial line such as ethernet or storing it may be optimal to pack the boolean in fewer bits. But I feel that it is likely that it is optimal to pack and unpack from a processor optimal size.
So my question is am I correct in thinking that the optimal size for a boolean on a 32bit processor is 32 bits and if so why does iOS use 8 bits.

Yup you are right it depends. The big advantage of using an 8-bit is that you can pack more into a struct nicely.
Of course you'd be best off using flags in such a case.
The big issue, though, is that with a C/C++ "bool" you don't necessarily know how big it is. This means that you can't make assumptions about a struct (such as binary writing to disk) without the possibility of it breaking on another platform. In such a case using a known sized variable can be very useful and you may as well use as little space as possible if you are going to dump the structure to disk.

The notion of an 8-bit quantity involving a 32-bit fetch followed by hardware masking is mostly obsolete. In reality, a fetch from memory (on a modern processor) will normally be one L2 cache line (typically around 64-128 bytes). That being the case, essentially every size of item you deal with involves fetching a big chunk of data, and then using only some subset of what you fetched (but, assuming your data is more or less contiguous, probably using more of that data subsequently).
C++ attempts (not necessarily successfully) to optimize this a bit for you. An individual bool can be anywhere from one byte on up, though on most typical implementation, it's either one byte or four bytes. The (much reviled) std::vector<bool> uses some tricks to give a (sort of) vector-like interface, but still store each bool in one bit. In the process it loses the ability to be treated as a generic sequence container -- but when you're storing a lot of bools, and can live with the restrictions of using it in an array-like manner, it can actually be a lot more useful than many people believe.
When/if you want to retain normal container semantics and don't mind the extra storage space to keep them their native size, you can use another container (e.g., std::deque<bool>) instead. Especially if you only need to store a small collection of bools, this can often be a superior alternative.

It is architecture dependent, but on many 32 bit architectures 8 bit addressing is no less efficient than 32 bit; the "fetching and masking" as such is performed in hardware logic.
The optimal size in terms of storage space is of course 1 bit. You might for example use bit-fields or bit masking to pack multiple booleans in a single word. Some architectures such as 8051 have bit addressable memory. The more modern ARM Cortex-M architecture employs a technique called bit-banding that allows memory and hardware registers to be bit addressable

Related

Bit fields in C and C++: where are they used?

I am working with C and C++ for some time.
While learning the basics you can bump into such interesting thing as bit fields.
Usage of bit fields in programming practice has somehow controversial character.
In which kind of situations this low-level feature usage provides a real benefit, and are there concrete examples of using bit fields properly?
When working with embedded systems and microcontrollers, individual bits in a register may be associated with a processor setting or input/output. Using bit fields allows these individual bits to be worked with by name instead of doing bitwise operations on the entire register.
It's mostly an aesthetic feature but can increase code readability in some applications.
There are several use cases for bit fields, even on modern machines.
The first would be when you are handling register level logic. This is common when you are setting modes and how certain pieces of hardware work. This is even more common on embedded devices. On Arduino devices, for example, the "PinMode" logic is basically setting individual bits high or low to indicate whether a digital I/O pin is in "input" or "output" mode.
http://arduino.cc/en/Reference/pinMode
Secondly, when writing optimized, in-line assembly code in a C/C++ program. There are times where you want to take advantage of hardware-optimized instructions to speed up your program's execution as much as possible:
http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
A final common example is when writing packet drivers or implementing specific protocols. I recently just posted a question on this, where it turns out I was using a 32-bit variable instead of an 8-bit variable composed of bitfields, which was causing my code to break:
Basic NTP Client in Windows in Visual C++
So, in short: when talking directly to hardware, or in networking code.
There's probably not much use for bit fields on a modern, high
performance machine, but for smaller machines, they can be very
useful to save memory, if you have large arrays of the
structures. Other than saving memory, however, there's no use
for them.
In addition to the other answers, in some scenarios, using bit fields can improve both memory usage and performance.
Save memory by packing together properties which need few bits to express the range of possible values. Why put 8 bool properties as 8 bool members, when a single byte gives to the ability to store 8 boolean values in each bit - instead of 8 bytes you use only 1, 7 bytes saved is quite significant. Naturally, you would typically use a 32, 64 bit or wider bitfields.
I have a similar scenario with a lots of objects with lots of properties which can be expressed in one or few bits, and for cases with high object count (millions) the memory savings are indeed significant.
Increase performance - although bitfields come with a small performance penalty (to access the actual value with shifting and masking) those operations are really fast. Packing more data and wasting less bits can give you better cache efficiency, which can result in performance gain that is greater than the bit fields access penalty.
Not only it may be faster to access individual bits than fetching another line from the cache and more probable to find the data in the cache if it is packed, but this will pollute the cache less, leaving more available space for other processes.

using 64 bits integers in 64 bits compilers and OSes

I have a doubt about when to use 64 bits integers when targeting 64 bits OSes.
Has anyone done conclusive studies focused on the speed of the generated code?
It is better to use 64 bits integers as params for funcs or methods? (Ex: uint64 myFunc(uint64 myVar))
If we use 64 bits integers as params it takes more memory but maybe it will be more efficient.
What about if we know that some value should be always less than, for example, 10. We still continue using 64 bit integers for this param?
It is better to use 64 bits integers as return types?
Is there some penalty for using 32-bit as return value?
It is better to use 64 bits integers for loops? (for(size_t i=0; i<...)) In this case, I suppose it.
Is there some penalty for using 32-bit variables for loops?
It is better to use 64 bits integers as indexes for pointers? (Ex: myMemory[index]) In this case, I suppose it.
Is there some penalty for using 32-bit variables for indexes?
It is better to use 64 bits integers to store data in classes or structs? (that we won't want to save to disk or something like this)
It is better to use 64 bits for a bool type?
What about conversions between 64 bits integers and floats? Will be better to use doubles now?
Until now doubles are slower than floats.
Is there some penalty every time we access a 32-bit variable?
Regards!
I agree with #MarkB but want to provide more detail on some topics.
On x64, there are more registers available (twice as many). The standard calling conventions have therefore been designed to take more parameters in registers by default. So as long as the number of parameters is not excessive (typically 4 or fewer), their types will make no difference. They will be promoted to 64 bit and passed in registers anyway.
Space will be allocated on the stack for those 64 bit registers even though they are passed in registers. This is by design to make their storage locations simple and contiguous with the those of surplus parameters. The surplus parameters will be placed on the stack regardless, so size may matter in those cases.
This issue is particularly important for memory data structures. Using 64 bit where 32 bit is sufficient will waste memory, and more importantly, occupy space in cache lines. The cache impact is not simple though. If your data access pattern is sequential, that's when you will pay for it by essentially making half of your cache unusable. (Assuming you only needed half of each 64 bit quantity.)
If your access pattern is random, there is no impact on cache performance. This is because every access occupies a full cache line anyway.
There can be a small impact in accessing integers that are smaller than word size. However, pipelining and multiple issue of instructions will make it so that the extra instruction (zero or sign extend) will almost always become completely hidden and go unobserved.
The upshot of all this is simple: choose the integer size that matters for your problem. For parameters, the compiler can promote them as needed. For memory structure, smaller is typically better.
You have managed to cram a ton of questions into one question here. It looks to me like all your questions basically concern micro-optimizations. As such I'm going to make a two-part answer:
Don't worry about size from a performance perspective but instead use types that are indicative of the data that they will contain and trust the compiler's optimizer to sort it out.
If performance becomes a concern at some point during development, profile your code. Then you can make algorithmic adjustments as appropriate and if the profiler shows that integer operations are causing a problem you can compare different sizes side-by-side for comparison purposes.
Use int and trust the platform and compiler authors that they have done their job and chose the most efficient representation for it. On most 64-bit platforms it is 32-bits which means that it's no less efficient than 64-bit types.

Factors influencing the size of primitives,their range and running of binaries on different platforms [duplicate]

Would the size of an integer depend upon the compiler, OS and processor?
The answer to this question depends on how far from practical considerations we are willing to get.
Ultimately, in theory, everything in C and C++ depends on the compiler and only on the compiler. Hardware/OS is of no importance at all. The compiler is free to implement a hardware abstraction layer of any thickness and emulate absolutely anything. There's nothing to prevent a C or C++ implementation from implementing the int type of any size and with any representation, as long as it is large enough to meet the minimum requirements specified in the language standard. Practical examples of such level of abstraction are readily available, e.g. programming languages based on "virtual machine" platform, like Java.
However, C and C++ are intended to be highly efficient languages. In order to achieve maximum efficiency a C or C++ implementation has to take into account certain considerations derived from the underlying hardware. For that reason it makes a lot of sense to make sure that each basic type is based on some representation directly (or almost directly) supported by the hardware. In that sense, the size of basic types do depend on the hardware.
In other words, a specific C or C++ implementation for a 64-bit hardware/OS platform is absolutely free to implement int as a 71-bit 1's-complement signed integral type that occupies 128 bits of memory, using the other 57 bits as padding bits that are always required to store the birthdate of the compiler author's girlfriend. This implementation will even have certain practical value: it can be used to perform run-time tests of the portability of C/C++ programs. But that's where the practical usefulness of that implementation would end. Don't expect to see something like that in a "normal" C/C++ compiler.
Yes, it depends on both processors (more specifically, ISA, instruction set architecture, e.g., x86 and x86-64) and compilers including programming model. For example, in 16-bit machines, sizeof (int) was 2 bytes. 32-bit machines have 4 bytes for int. It has been considered int was the native size of a processor, i.e., the size of register. However, 32-bit computers were so popular, and huge number of software has been written for 32-bit programming model. So, it would be very confusing if 64-bit computer would have 8 bytes for int. Both Linux and Windows remain 4 bytes for int. But, they differ in the size of long.
Please take a look at the 64-bit programming model like LP64 for most *nix and LLP64 for Windows:
http://www.unix.org/version2/whatsnew/lp64_wp.html
http://en.wikipedia.org/wiki/64-bit#64-bit_data_models
Such differences are actually quite embarrassing when you write code that should work both on Window and Linux. So, I'm always using int32_t or int64_t, rather than long, via stdint.h.
Yes, it would. Did they mean "which would it depend on: the compiler or the processor"? In that case the answer is basically "both." Normally, int won't be bigger than a processor register (unless that's smaller than 16 bits), but it could be smaller (e.g. a 32-bit compiler running on a 64-bit processor). Generally, however, you'll need a 64-bit processor to run code with a 64-bit int.
Based on some recent research I have done studying up for firmware interviews:
The most significant impact of the processors bit architecture ie, 8bit, 16bit, 32bit, 64bit is how you need to most efficiently store each byte of information in order to best compute variables in the minimum number of cycles.
The bit size of your processor tells you what the natural word length the CPU is capable of handling in one cycle. A 32bit machine needs 2 cycles to handle a 64bit double if it is aligned properly in memory. Most personal computers were and still are 32bit hence the most likely reason for the C compiler typical affinity for 32bit integers with options for larger floating point numbers and long long ints.
Clearly you can compute larger variable sizes so in that sense the CPU's bit architecture determines how it will have to store larger and smaller variables in order to achieve best possible efficiency of processing but it is in no way a limiting factor in the definitions of byte sizes for ints or chars, that is part of compilers and what is dictated by convention or standards.
I found this site very helpful, http://www.geeksforgeeks.org/archives/9705, for explaining how the CPU's natural word length effects how it will chose to store and handle larger and smaller variable types, especially with regards to bit packing into structs. You have to be very cognizant of how you chose to assign variables because larger variables need to be aligned in memory so they take the fewest number of cycles when divided by the CPU's word length. This will add a lot of potentially unnecessary buffer/empty space to things like structs if you poorly order the assignment of your variables.
The simple and correct answer is that it depends on the compiler. It doesn't mean architecture is irrelevant but the compiler deals with that, not your application. You could say more accurately it depends on the (target) architecture of the compiler for example if its 32 bits or 64 bits.
Consider you have windows application that creates a file where it writes an int plus other things and reads it back. What happens if you run this on both 32 bits and 64 bits windows? What happens if you copy the file created on 32 bits system and open it in 64 bits system?
You might think the size of int will be different in each file but no they will be the same and this is the crux of the question. You pick the settings in compiler to target for 32 bits or 64 bits architecture and that dictates everything.
http://www.agner.org/optimize/calling_conventions.pdf
"3 Data representation" contains good overview of what compilers do with integral types.
Data Types Size depends on Processor, because compiler wants to make CPU easier accessible the next byte. for eg: if processor is 32bit, compiler may not choose int size as 2 bytes[which it supposed to choose 4 bytes] because accessing another 2 bytes of that int(4bytes) will take additional CPU cycle which is waste. If compiler chooses int as 4 bytes CPU can access full 4 bytes in one shot which speeds your application.
Thanks
Size of the int is equal to the word-length that depends upon the underlying ISA. Processor is just the hardware implementation of the ISA and the compiler is just the software-side implementation of the ISA. Everything revolves around the underlying ISA. Most popular ISA is Intel's IA-32 these days. it has a word length of 32bits or 4bytes. 4 bytes could be the max size of 'int' (just plain int, not short or long) compilers. based on IA-32, could use.
size of data type basically depends upon the type of compiler and compilers are designed on the basis of architecture of processors so externally data type can be considered to be compiler dependent.for ex size of integer is 2 byte in 16 bit tc compiler but 4 byte in gcc compiler although they are executed in same processor
Yes , I found that size of int in turbo C was 2 bytes where as in MSVC compiler it was 4 bytes.
Basically the size of int is the size of the processor registers.

Why do integers process faster than bytes on NDS?

i've noticed that my nds application works a little faster when I replace all the instances of bytes with integers. all the examples online put u8/u16 instances whenever possible. is there a specific reason as to why this is the case?
The main processor the Nintendo DS utilizes is ARM9, a 32-bit processor.
Reference: http://en.wikipedia.org/wiki/ARM9
Typically, CPU will conduct operations in word sizes, in this case, 32-bits. Depending on your operations, having to convert the bytes up to integers or vice-versa may be causing additional strain on the processor. This conversion and the potential lack of instructions for values other than 32-bit integers may be causing the lack of speed.
Complementary to what Daniel Li said, memory access on ARM platforms must be word aligned, i.e. memory fetches must be multiple of 32 bits. Fetching a byte variable from memory implies in fetching the whole word containing the relevant byte, and performing the needed bit-wise operations to fit it in the least significant bits of the processor register.
Theses extra instructions are automatically emitted by the compiler, given it knows the actual alignment of your variables.

How does more than one byte value is translated?

A character char maybe of size one byte but when it comes to four bytes value e.g int , how does the cpu differ it from an integer instead of four characters per byte?
The CPU executes code which you made.
This code tells the CPU how to treat the few bytes at a certain memory, like "take the four bytes at address 0x87367, treat them as an integer and add one to the value".
See, it's you who decide how to treat the memory.
Are you asking a question about CPU design?
Each CPU machine instruction is encoded so that the CPU knows how many bits to operate on.
The C++ compiler knows to emit 8-bit instructions for char and 32-bit instructions for int.
In general the CPU by itself knows nothing about the interpretation of values stored at certain memory locations, it's the code that is run (generated, in this case, by the compiler) that it's supposed to know it and use the correct CPU instructions to manipulate such values.
To say it in another way, the type of a variable is an abstraction of the language that tells to the compiler what code to generate to manipulate the memory.
Types in some way do exist at machine code level: there are various instructions to work on various types - i.e. the way the raw values stored in memory are interpreted, but it's up to the code executed to use the correct instructions to treat the values stored in memory correctly.
The compiler has table which named "symbols table", so the compiler know which type is every var and how it should regard it.
This depends on the architecture. Most systems use IEEE 754 Floating Point Representation and Two's Compliment for integer values, but it's up to the CPU in question. It knows how to turn those bytes into "values" appropriately.
On the CPU side, this mostly relates to two things: the registers and the instruction set (look at x86 for example).
The register is just a small chunk of memory that is closest to the CPU. Values are put there and used there for doing basic operations.
The instruction set will include a set of fixed names (like EAX, AX, etc.) for addressing memory slots on the register. Depending on the name, they can refer to shorter or longer slots (e.g. 8 bits, 16, 32, 64, etc.). Corresponding to those registers, there are operations (like addition, multiplication, etc.) which act on register values of certain size too. How the CPU actually executes the instructions or even stores the registers is not relevant (it's at the discretion of the manufacturer of the CPU), and it's up to the programmer (or compiler) to use the instruction set correctly.
The CPU itself has no idea what it's doing (it's not "intelligent") it just does the operations as they are requested. The compiler is the one which keeps track of the types of the variables and makes sure that the instructions that are generated and later executed by the program correspond to what you have coded (that's called "compilation"). But once the program is compiled, the CPU doesn't "keep track" of the types or sizes or anything like that (it would be too expensive to do so). Since compilers are pretty much guaranteed to produce instructions that are consistent, this is not an issue. Of course, if you programmed your own code in assembly and used mismatching registers and instructions, the CPU still wouldn't care, it would just make you program behave very weird (likely to crash).
Internally, a CPU may be wired to fetch 32 bits for an integer, which translates into 4 8-bit octets (bytes). The CPU does not regard the fetch as 4 bytes, but rather 32 bits.
The CPU is also internally wired to fetch 8 bits for a character (byte). In many processor architectures, the CPU fetches 32 bits from memory and internally ignores the unused bits (keeping the lowest 8 bits). This simplifies the processor architecture by only requiring fetches of 32 bits.
In efficient platforms, the memory is also accessible in 32-bit quantities. The data flow from the memory to the processor is often called the databus. In this description it would be 32 bits wide.
Other processor architectures can fetch 8 bits for a character. This removes the need for the processor to ignore 3 bytes from a 32-bit fetch.
Some programmers view integers in widths of bytes rather than bits. Thus a 32-bit integer would be thought of as 4 bytes. This can create confusion especially with bit ordering, a.k.a. Endianess. Some processors have the first byte containing the most significant bits (Big Endian), while others have the first byte representing the least significant bits (Little Endian). This leads to problems when transferring binary data between platforms.
Knowing that a processor's integer can hold 4 bytes and that it fetches 4 bytes at a time, many programmers like to pack 4 characters into an integer to improve performance. Thus the processor would require 1 fetch for 4 characters rather than 4 fetches for 4 characters. This performance improvement may be wasted by the execution time required to pack and unpack the characters from the integer.
In summary, I highly suggest you forget about how many bytes make up an integer and any relationship to a quantity of characters or bytes. This concept is only relevant on a few embedded platforms or a few high performance applications. Your objective is to deliver correct and robust code within a given duration. Performance and size concerns are at the end of a project, only tended if somebody complains. You will do fine in your career if you concentrate on the range and limitations of an integer rather than how much memory it occupies.