Characteristics of bit-Fields in C++

Reading https://en.cppreference.com/w/cpp/language/bit_field, are the following conclusions correct?
whether adjacent bit-fields have no padding in between is implementation-defined (this reads differently in https://eel.is/c++draft/class.bit#:bit-field)
the placement of a bit-field within the class-object is implementation-defined
the position of the bits inside a bit-field is implementation-defined (although C++20 requires signed integers to use two's complement).
(for C see: Characteristics of bit-Fields in C)

"The question has three very clear points towards one specific feature in one language. So it would be helpful to get one answer
comprising all three points of the question"
Addressing points one-by-one
The idea that adjacent bit-fields have no padding in between cannot be guaranteed across current implementations of C++.
Yes, the placement of a bit-field within a C++ class object is implementation-defined.
"...there is no guarantee in the standard that bitfields are mapped to adjacent memory regions, although most sensible implementations would do that..."
referenced from...
In short, the conclusion is that there is no guarantee that bit-field layout is consistent from one C++ implementation to another, or even between revisions of the standard. Portability is therefore difficult, if not impossible. For any application that uses bit-fields, the specifications and other documentation supporting the C++ compiler in use must be consulted to be sure of its rules for padding and the other implementation-defined attributes of bit-fields.
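Where a project does depend on one compiler's documented bit-field layout, that assumption can at least be made explicit with a compile-time check. A minimal sketch (the struct and the expected size are purely illustrative, not something the standard promises):

struct PackedFlags {
    unsigned a : 8, b : 2, c : 6; // illustrative bit-field members
};

// Fails to compile on an implementation whose documented layout differs from
// the one the rest of the code was written against.
static_assert(sizeof(PackedFlags) == sizeof(unsigned),
              "unexpected bit-field layout; consult the compiler documentation");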

Related

Issues with C++ bitfields

I have to write a file header with a specific data format. For simplicity, let's just assume it is:
bits [0-7]: index a
bits [8-9]: index b
bits [10-15]: index c
All of them are simple unsigned integers. I thought I might use bit fields to get a nice syntax. I defined
struct Foo {
    unsigned int a : 8, b : 2, c : 6;
};
However, I get sizeof(Foo) == 4. Why is that so? I expected a 2-byte structure here. Is the compiler adding padding between my fields? If I use unsigned char as my member type, I get a size of 2 bytes.
On cppreference, it says:
Multiple adjacent bit fields are usually packed together (although
this behavior is implementation-defined).
Does that mean that I cannot rely on the fields being packed together? Eventually, I will use memcpy to turn this struct into a stream of bytes and write that to a file. Is that not a good use of bit fields? This will only work if these bits are guaranteed to be packed together.
EDIT: The actual header relates to the GIF format. Many indexes are packed into just a few bytes. Some of them are made up of 1, 2, 3 or more bits.
From [class.bit]/1 [extract]:
[...] Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined.
and, from [defns.impl.defined]:
implementation-defined behavior
behavior, for a well-formed program construct and correct data, that
depends on the implementation and that each implementation documents
Thus, for a portable implementation you cannot rely on any specific kind of behaviour for implementation-defined behaviour. If you are developing for a particular platform and compiler, however, you could rely on documented implementation-defined behaviour to a certain extent.
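One portable alternative is not to rely on the bit-field layout at all and to pack the fields by hand. A sketch for the layout stated in the question (a in bits 0-7, b in bits 8-9, c in bits 10-15, written low byte first; the function names are illustrative):

#include <cstdint>

// Pack a (8 bits), b (2 bits) and c (6 bits) into two bytes, low byte first.
inline void pack_header(std::uint8_t out[2], unsigned a, unsigned b, unsigned c) {
    const std::uint16_t v = static_cast<std::uint16_t>(
        (a & 0xFFu) | ((b & 0x03u) << 8) | ((c & 0x3Fu) << 10));
    out[0] = static_cast<std::uint8_t>(v & 0xFFu);
    out[1] = static_cast<std::uint8_t>(v >> 8);
}

// Recover the three fields from the two bytes.
inline void unpack_header(const std::uint8_t in[2], unsigned& a, unsigned& b, unsigned& c) {
    const std::uint16_t v = static_cast<std::uint16_t>(in[0] | (in[1] << 8));
    a = v & 0xFFu;
    b = (v >> 8) & 0x03u;
    c = (v >> 10) & 0x3Fu;
}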

Is Byte Really The Minimum Addressable Unit?

Section 3.6 of C11 standard defines "byte" as "addressable unit of data storage ... to hold ... character".
Section 1.7 of C++11 standard defines "byte" as "the fundamental storage unit in the C++ memory model ... to contain ... character".
Neither definition says that "byte" is the minimum addressable unit. Is this because the standards intentionally want to abstract from a specific machine? Can you provide a real example of a machine where the C/C++ compiler was designed to have a "byte" longer or shorter than the minimum addressable unit?
A byte is the smallest addressable unit in strictly conforming C code. Whether the machine on which the C implementation executes a program supports addressing smaller units is irrelevant to this; the C implementation must present a view in which bytes are the smallest addressable unit in strictly conforming C code.
A C implementation may support addressing smaller units as an extension, such as simply by defining the results of certain pointer operations that are otherwise undefined by the C standard.
One example of a real machine and its compiler where the minimal addressable unit is smaller than a byte is the 8051 family. One compiler I have used for it is Keil C51.
The minimal addressable unit is a bit. You can define a variable of this type, you can read and write it. However, the syntax to define the variable is non-standard. Of course, C51 needs several extensions to support all of this. BTW, pointers to bits are not allowed.
For example:
unsigned char bdata bitsAdressable;
sbit bitAddressed = bitsAdressable^5;
void f(void) {
    bitAddressed = 1;
}
bit singleBit;
void g(bit value) {
    singleBit = value;
}
Neither definition says that "byte" is the minimum addressable unit.
That's because they don't need to. Byte-wise types (char, unsigned char, std::byte, etc) have sufficient restrictions that enforce this requirement.
The size of byte-wise types is explicitly defined to be precisely 1:
sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1.
The alignment of byte-wise types is the smallest alignment possible:
Furthermore, the narrow character types (6.9.1) shall have the weakest alignment requirement
This doesn't have to be an alignment of 1, of course. Except... it does.
See, if the alignment were higher than 1, that would mean that a simple byte array wouldn't work. Array indexing is based on pointer arithmetic, and pointer arithmetic determines the next address based on sizeof(T). But if alignof(T) is greater than sizeof(T), then the second element in any array of T would be misaligned. That's not allowed.
So even though the standard doesn't explicitly say that the alignment of bytewise types is 1, other requirements ensure that it must be.
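The consequences described above can be spelled out as compile-time checks; these hold on every conforming implementation (std::byte needs C++17):

#include <cstddef> // std::byte

static_assert(sizeof(char) == 1 && sizeof(unsigned char) == 1,
              "byte-wise types have size 1 by definition");
static_assert(alignof(char) == 1 && alignof(unsigned char) == 1,
              "their alignment cannot exceed their size of 1");
static_assert(sizeof(std::byte) == 1 && alignof(std::byte) == 1,
              "std::byte behaves the same way");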
Overall, this means that every pointer to an object has an alignment at least as restrictive as a byte-wise type. So no object pointer can be misaligned, relative to the alignment of byte-wise types. All valid, non-NULL pointers (pointers to a live object or to a past-the-end pointer) must therefore be at least aligned enough to point to a char.
Similarly, the difference between two pointers is defined in C++ as the difference between the array indices of the elements pointed to by those pointers (pointer arithmetic in C++ requires that the two pointers point into the same array). Additive pointer arithmetic is, as previously stated, based on the sizeof of the type being pointed to.
Given all of these facts, even if an implementation has pointers whose addresses can address values smaller than char, it is functionally impossible for the C++ abstract model to generate a pointer and still have that pointer count as valid (pointing to an object/function, a past-the-end of an array, or be NULL). You could create such a pointer value with a cast from an integer. But you would be creating an invalid pointer value.
So while technically there could be smaller addresses on the machine, you could never actually use them in a valid, well-formed C++ program.
Obviously compiler extensions could do anything. But as far as conforming programs are concerned, it simply isn't possible to generate valid pointers that are misaligned for byte-wise types.
I programmed both the TMS34010 and its successor TMS34020 graphics chips back in the early 1990's, and they had a flat address space and were bit-addressable, i.e. addresses indexed individual bits. This was very useful for computer graphics of the time, back when memory was a lot more precious.
The embedded C compiler didn't really have a way to access individual bits directly, since from a (standard) C language point of view the byte was still the smallest unit, as pointed out in a previous post.
Thus if you want to read/write a stream of bits in C, you need to read/write (at least) a byte at a time and buffer, for example when writing an arithmetic or Huffman coder.
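A minimal sketch of that byte-at-a-time buffering (the names are illustrative and error handling is omitted):

#include <cstdint>
#include <vector>

class BitWriter {
    std::vector<std::uint8_t> out_;
    std::uint8_t buffer_ = 0;
    int count_ = 0; // number of bits currently held in buffer_
public:
    void put_bit(bool bit) {
        buffer_ = static_cast<std::uint8_t>((buffer_ << 1) | (bit ? 1u : 0u));
        if (++count_ == 8) {        // a full byte has accumulated
            out_.push_back(buffer_);
            buffer_ = 0;
            count_ = 0;
        }
    }
    void flush() {                  // pad the final partial byte with zero bits
        while (count_ != 0) put_bit(false);
    }
    const std::vector<std::uint8_t>& bytes() const { return out_; }
};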
(Thank you everyone who commented and answered, every word helps)
Memory model of a programming language and memory model of the target machine are different things.
Yes, the byte is the minimum addressable unit in the context of the programming language's memory model.
No, the byte is not necessarily the minimum addressable unit in the context of the machine's memory model. For example, there are machines where the language's "byte" is longer or shorter than the machine's minimum addressable unit:
longer: HP Saturn - 4-bit unit vs 8-bit byte in gcc (thanks Nate).
shorter: IBM 360 - 36-bit unit vs 6-bit byte (thanks Antti)
longer: Intel 8051 - 1-bit unit vs 8-bit byte (thanks Busybee)
longer: Ti TMS34010 - 1-bit unit vs 8-bit byte (thanks Wcochran)

How adaptable are the C and C++ standards to a hypothetical ternary hardware architecture?

How easily could you program a ternary computer in C or C++?
Obviously, the standard logical operators (like &, | and ^) only make sense when using binary logic.
For integer values, the C standard refers to value ranges while the C++ standard mentions bit lengths (e.g. long has to be at least 32 bits long). How would that apply to a computer using trits (i.e. ternary digits)?
Would it, in general, be practical to use a slightly modified version of C/C++ for programming on a ternary architecture, or should you design a new programming language from scratch?
Important points to consider would be backward compatibility (could binary-assuming programs be easily compiled for a ternary architecture, or would an emulation of binary data storage be necessary?) and assumptions implicit in the design of the C/C++ standards.
The wording of the C++ standard assumes a binary architecture:
[intro.memory]/1:
The fundamental storage unit in the C++ memory model is the byte. A
byte is at least large enough to contain any member of the basic
execution character set and the eight-bit code units of the
Unicode UTF-8 encoding form and is composed of a contiguous sequence
of bits, the number of which is implementation-defined.
[basic.fundamental]/4:
Unsigned integers shall obey the laws of arithmetic modulo 2 [raised
to the power of] n where n is the number of bits in the value
representation of that particular size of integer.
Furthermore, bit-fields and padding bits are frequently used concepts.
Operators like left-shift and right-shift also refer to bits, and bitwise-and, bitwise-or and bitwise-xor are by definition operations that work at the bit level, assuming that each bit is either true or false.
What if the standard were adapted to a ternary architecture?
We could imagine the standard using another term to designate the smallest piece of information in the architecture, in a similar way to what was done for the byte (the byte, although most often 8 bits, is not defined as such in the standard, so the standard works just as well on machines with, say, 10-bit bytes).
Nevertheless the consequences would be terrible:
left-shift, for example, is assumed in many algorithms to multiply by a power of 2, and suddenly it would multiply by a power of 3; the same goes for right-shift. So a lot of existing code would not work anymore (see the sketch after these two points).
bitwise operations are not defined for trits: they are only defined for binary bits. So the standard would have to redefine them one way or another (for example by emulating the original behaviour with some kind of power-of-2 maths). Again, there is a good chance that some existing code would break, especially code that combines them with shifts.
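To make the first point concrete, here is the kind of identity that a huge amount of existing code silently relies on. It is an identity only because the machine is binary; on a ternary machine a shift by one "digit" would multiply by 3 instead:

static_assert((5u << 1) == 5u * 2u && (5u << 3) == 5u * 8u,
              "on a binary machine, a left shift multiplies by a power of 2");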
Additional remark
Since Norbert Wiener's visionary book "Cybernetics", published in 1948 (!!!), there has been little doubt that alternatives to binary systems are out of the running. In the chapter "Computing machines and the nervous system" he explained very well why digital machines are more accurate and performant than analog ones, and why, among digital machines, binary arithmetic outperformed the others: it was simpler, faster and, in addition, easier and cheaper to implement. So far nobody has managed to demonstrate the contrary, so no ternary computer architecture is in sight any time soon.
Comments
Peter points out in the comments that the implementation just has to offer the specified behavior of the abstract machine defined in the C++ standard. This is true according to [intro.abstract]/5. But my point is that this is only a theoretical truth in view of ternary machines.
The binary model is such a strong assumption in so many places of the standard, and so intertwined with the addressing scheme, that I would claim it is impossible to emulate in an efficient and consistent manner on a ternary machine.
Just to illustrate the issue with the definition of bytes: it takes 6 trits to meet the requirements for a byte, yet 6 trits correspond to roughly 9.5 bits. For a byte to be made up of a whole number of bits, as the standard requires, you would need it to be s trits such that pow(3,s) == pow(2,n); this equation has no solution in positive integers. Alternatively you could say that a byte is 9 bits stored in 6 trits and simply ignore some of the ternary values. But since bytes are used to store pointers, you would also be ignoring some memory ranges, so you would need a mapping function to convert between values stored in bytes and machine addresses. And what about hardware alignment constraints? These might not correspond to alignments that can be expressed in the binary model, and so on. In the end you would need a slow virtual machine that completely emulates a binary architecture in software (probably with about the same level of performance as the many MIPS emulators on the x86 architecture, so fine for educational purposes). I think that could then comply with the standard, but no longer with our performance expectations.
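The arithmetic behind the 6-trits-per-byte remark, as compile-time checks: 3^5 = 243 is too small for the 256 values an 8-bit byte needs, 3^6 = 729 is enough, and since log2(729) is about 9.5 there is no whole number of bits that matches exactly.

constexpr unsigned long pow3(unsigned n) { return n == 0 ? 1 : 3 * pow3(n - 1); }

static_assert(pow3(5) == 243 && pow3(5) < 256,  "5 trits cannot hold an 8-bit byte");
static_assert(pow3(6) == 729 && pow3(6) >= 256, "6 trits can, with 473 values left over");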

Why isn't there an endianness modifier in C++ like there is for signedness?

(I guess this question could apply to many typed languages, but I chose to use C++ as an example.)
Why is there no way to just write:
struct foo {
    little int x;   // little-endian
    big long int y; // big-endian
    short z;        // native endianness
};
to specify the endianness for specific members, variables and parameters?
Comparison to signedness
I understand that the type of a variable not only determines how many bytes are used to store a value but also how those bytes are interpreted when performing computations.
For example, these two declarations each allocate one byte, and for both bytes, every possible 8-bit sequence is a valid value:
signed char s;
unsigned char u;
but the same binary sequence might be interpreted differently, e.g. 11111111 would mean -1 when assigned to s but 255 when assigned to u. When signed and unsigned variables are involved in the same computation, the compiler (mostly) takes care of proper conversions.
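A small sketch of that reinterpretation (the -1 result relies on 8-bit char and two's complement, which C++20 guarantees and which virtually every earlier implementation used anyway):

#include <cstdint>
#include <iostream>

int main() {
    const std::uint8_t bits = 0xFF;                   // the pattern 11111111
    const signed char s = static_cast<signed char>(bits);
    const unsigned char u = static_cast<unsigned char>(bits);
    std::cout << static_cast<int>(s) << '\n';         // prints -1
    std::cout << static_cast<unsigned>(u) << '\n';    // prints 255
}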
In my understanding, endianness is just a variation of the same principle: a different interpretation of a binary pattern based on compile-time information about the memory in which it will be stored.
It seems obvious to have that feature in a typed language that allows low-level programming. However, this is not a part of C, C++ or any other language I know, and I did not find any discussion about this online.
Update
I'll try to summarize some takeaways from the many comments that I got in the first hour after asking:
signedness is strictly binary (either signed or unsigned) and will always be, in contrast to endianness, which has two well-known variants (big and little) but also lesser-known variants such as mixed/middle endian. New variants might be invented in the future.
endianness matters when accessing multiple-byte values byte-wise. There are many aspects beyond just endianness that affect the memory layout of multi-byte structures, so this kind of access is mostly discouraged.
C++ aims to target an abstract machine and minimize the number of assumptions about the implementation. This abstract machine does not have any endianness.
Also, now I realize that signedness and endianness are not a perfect analogy, because:
endianness only defines how something is represented as a binary sequence, but not what can be represented. Both big int and little int would have the exact same value range.
signedness defines how bits and actual values map to each other, but also affects what can be represented, e.g. -3 can't be represented by an unsigned char and (assuming that char has 8 bits) 130 can't be represented by a signed char.
So changing the endianness of some variables would never change the behavior of the program (except for byte-wise access), whereas a change of signedness usually would.
What the standard says
[intro.abstract]/1:
The semantic descriptions in this document define a parameterized nondeterministic abstract machine.
This document places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine.
Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
C++ could not define an endianness qualifier since it has no concept of endianness.
Discussion
About the difference between signedness and endianness, OP wrote
In my understanding, endianness is just a variation of the same principle [(signedness)]: a different interpretation of a binary pattern based on compile-time information about the memory in which it will be stored.
I'd argue signedness has both a semantic and a representational aspect1. What [intro.abstract]/1 implies is that C++ only cares about semantics, and never addresses the way a signed number should be represented in memory2. Actually, "sign bit" only appears once in the C++ specs and refers to an implementation-defined value.
On the other hand, endianness only has a representational aspect: endianness conveys no meaning.
With C++20, std::endian appears. It is still implementation-defined, but it lets us test the endianness of the host without depending on old tricks based on undefined behaviour.
1) Semantic aspect: a signed integer can represent values below zero; representational aspect: one needs to, for example, reserve a bit to convey the positive/negative sign.
2) In the same vein, C++ never describes how a floating-point number should be represented; IEEE-754 is often used, but this is a choice made by the implementation, in no way enforced by the standard: [basic.fundamental]/8 "The value representation of floating-point types is implementation-defined".
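A short example of the C++20 facility mentioned above (it lives in the <bit> header):

#include <bit>      // std::endian (C++20)
#include <iostream>

int main() {
    if constexpr (std::endian::native == std::endian::little)
        std::cout << "little-endian host\n";
    else if constexpr (std::endian::native == std::endian::big)
        std::cout << "big-endian host\n";
    else
        std::cout << "mixed-endian host\n";
}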
In addition to YSC's answer, let's take your sample code, and consider what it might aim to achieve
struct foo {
    little int x;   // little-endian
    big long int y; // big-endian
    short z;        // native endianness
};
You might hope that this would exactly specify layout for architecture-independent data interchange (file, network, whatever)
But this can't possibly work, because several things are still unspecified:
data type size: you'd have to use little int32_t, big int64_t and int16_t respectively, if that's what you want
padding and alignment, which cannot be controlled strictly within the language: use #pragma or __attribute__((packed)) or some other compiler-specific extension
actual format (1s- or 2s-complement signedness, floating-point type layout, trap representations)
Alternatively, you might simply want to reflect the endianness of some specified hardware - but big and little don't cover all the possibilities here (just the two most common).
So, the proposal is incomplete (it doesn't distinguish all reasonable byte-ordering arrangements), ineffective (it doesn't achieve what it sets out to), and has additional drawbacks:
Performance
Changing the endianness of a variable from the native byte ordering should either disable arithmetic, comparisons etc (since the hardware cannot correctly perform them on this type), or must silently inject more code, creating natively-ordered temporaries to work on.
The argument here isn't that manually converting to/from native byte order is faster, it's that controlling it explicitly makes it easier to minimise the number of unnecessary conversions, and much easier to reason about how code will behave, than if the conversions are implicit.
Complexity
Everything overloaded or specialized for integer types now needs twice as many versions, to cope with the rare event that it gets passed a non-native-endianness value. Even if that's just a forwarding wrapper (with a couple of casts to translate to/from native ordering), it's still a lot of code for no discernible benefit.
The final argument against changing the language to support this is that you can easily do it in code. Changing the language syntax is a big deal, and doesn't offer any obvious benefit over something like a type wrapper:
// store T with reversed byte order
template <typename T>
class Reversed {
    T val_;
    static T reverse(T); // platform-specific implementation
public:
    explicit Reversed(T t) : val_(reverse(t)) {}
    Reversed(Reversed const &other) : val_(other.val_) {}
    // assignment, move, arithmetic, comparison etc. etc.
    operator T () const { return reverse(val_); }
};
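One way the platform-specific reverse could be filled in, as a sketch (it assumes T is trivially copyable and simply reverses the object's bytes, which is all that plain integer types need):

#include <algorithm>
#include <cstring>

template <typename T>
T Reversed<T>::reverse(T t) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &t, sizeof(T));       // copy the object representation
    std::reverse(bytes, bytes + sizeof(T));  // reverse the byte order
    std::memcpy(&t, bytes, sizeof(T));       // copy it back
    return t;
}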
Integers (as a mathematical concept) have the concept of positive and negative numbers. This abstract concept of sign has a number of different implementations in hardware.
Endianness is not a mathematical concept. Little-endian is a hardware implementation trick to improve the performance of multi-byte twos-complement integer arithmetic on a microprocessor with 16 or 32 bit registers and an 8-bit memory bus. Its creation required using the term big-endian to describe everything else that had the same byte-order in registers and in memory.
The C abstract machine includes the concept of signed and unsigned integers, without details -- without requiring twos-complement arithmetic, 8-bit bytes or how to store a binary number in memory.
PS: I agree that binary data compatibility on the net or in memory/storage is a PIA.
That's a good question and I have often thought something like this would be useful. However you need to remember that C aims for platform independence and endianness is only important when a structure like this is converted into some underlying memory layout. This conversion can happen when you cast a uint8_t buffer into an int for example. While an endianness modifier looks neat the programmer still needs to consider other platform differences such as int sizes and structure alignment and packing.
For defensive programming, when you want fine-grained control over how some variables or structures are represented in a memory buffer, it is best to write explicit conversion functions and then let the compiler's optimiser generate the most efficient code for each supported platform.
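A sketch of such explicit conversion functions for one width (the names are illustrative); written this way they are independent of the host's byte order, and modern optimisers typically reduce them to a single load/store or byte-swap instruction:

#include <cstdint>

// Write a 32-bit value into a buffer in little-endian order.
inline void store_le32(std::uint8_t* buf, std::uint32_t v) {
    buf[0] = static_cast<std::uint8_t>(v);
    buf[1] = static_cast<std::uint8_t>(v >> 8);
    buf[2] = static_cast<std::uint8_t>(v >> 16);
    buf[3] = static_cast<std::uint8_t>(v >> 24);
}

// Read a 32-bit little-endian value back, regardless of host byte order.
inline std::uint32_t load_le32(const std::uint8_t* buf) {
    return static_cast<std::uint32_t>(buf[0])
         | (static_cast<std::uint32_t>(buf[1]) << 8)
         | (static_cast<std::uint32_t>(buf[2]) << 16)
         | (static_cast<std::uint32_t>(buf[3]) << 24);
}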
Endianness is not inherently a part of a data type but rather of its storage layout.
As such, it would not be really akin to signed/unsigned but rather more like bit field widths in structs. Similar to those, they could be used for defining binary APIs.
So you'd have something like
int ip : big 32;
which would define both storage layout and integer size, leaving it to the compiler to do the best job of matching use of the field to its access. It's not obvious to me what the allowed declarations should be.
Short Answer: if it should not be possible to use objects in arithmetic expressions (with no overloaded operators) involving ints, then these objects should not be integer types. And there is no point in allowing addition and multiplication of big-endian and little-endian ints in the same expression.
Longer Answer:
As someone mentioned, endianness is processor-specific. Which really means that this is how numbers are represented when they are used as numbers in the machine language (as addresses and as operands/results of arithmetic operations).
The same is "sort of" true of signage. But not to the same degree. Conversion from language-semantic signage to processor-accepted signage is something that needs to be done to use numbers as numbers. Conversion from big-endian to little-endian and reverse is something that needs to be done to use numbers as data (send them over the network or represent metadata about data sent over the network such as payload lengths).
Having said that, this decision appears to be mostly driven by use cases. The flip side is that there is a good pragmatic reason to ignore certain use cases. The pragmatism arises out of the fact that endianness conversion is more expensive than most arithmetic operations.
If a language had semantics for keeping numbers as little-endian, it would allow developers to shoot themselves in the foot by forcing little-endianness of numbers in a program which does a lot of arithmetic. If developed on a little-endian machine, this enforcing of endianness would be a no-op. But when ported to a big-endian machine, there would be a lot of unexpected slowdowns. And if the variables in question were used both for arithmetic and as network data, it would make the code completely non-portable.
Not having these endian semantics or forcing them to be explicitly compiler-specific forces the developers to go through the mental step of thinking of the numbers as being "read" or "written" to/from the network format. This would make the code which converts back and forth between network and host byte order, in the middle of arithmetic operations, cumbersome and less likely to be the preferred way of writing by a lazy developer.
And since development is a human endeavor, making bad choices uncomfortable is a Good Thing(TM).
Edit: here's an example of how this can go badly:
Assume that little_endian_int32 and big_endian_int32 types are introduced. Then little_endian_int32(7) % big_endian_int32(5) is a constant expression. What is its result? Do the numbers get implicitly converted to the native format? If not, what is the type of the result? Worse yet, what is the value of the result (which in this case should probably be the same on every machine)?
Again, if multi-byte numbers are used as plain data, then char arrays are just as good. Even if they are "ports" (which are really lookup values into tables or their hashes), they are just sequences of bytes rather than integer types (on which one can do arithmetic).
Now if you limit the allowed arithmetic operations on explicitly-endian numbers to only those operations allowed for pointer types, then you might have a better case for predictability. Then myPort + 5 actually makes sense even if myPort is declared as something like little_endian_int16 on a big endian machine. Same for lastPortInRange - firstPortInRange + 1. If the arithmetic works as it does for pointer types, then this would do what you'd expect, but firstPort * 10000 would be illegal.
Then, of course, you get into the argument of whether the feature bloat is justified by any possible benefit.
From a pragmatic programmer perspective searching Stack Overflow, it's worth noting that the spirit of this question can be answered with a utility library. Boost has such a library:
http://www.boost.org/doc/libs/1_65_1/libs/endian/doc/index.html
The feature of the library most like the language feature under discussion is a set of arithmetic types such as big_int16_t.
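A small sketch assuming Boost.Endian is available (the struct itself is illustrative): the fields are stored big-endian in memory but participate in ordinary arithmetic, with the conversions supplied by the library.

#include <boost/endian/arithmetic.hpp>

struct WireHeader {
    boost::endian::big_uint16_t payload_length; // big-endian in memory
    boost::endian::big_uint32_t sequence;       // usable like a normal integer
};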
Because nobody has proposed adding it to the standard, and/or because compiler implementers have never felt a need for it.
Maybe you could propose it to the committee. I do not think it is difficult to implement in a compiler: compilers already provide fundamental types that are not native types of the target machine.
The development of C++ is an affair of all C++ coders.
@Schimmel: Do not listen to people who justify the status quo! All the arguments cited to justify this absence are more than fragile. A student logician could find their inconsistency without knowing anything about computer science. Just propose it, and don't mind the pathological conservatives. (Advice: propose new types rather than a qualifier, because the unsigned and signed keywords are considered mistakes.)
Endianness is compiler-specific as a result of being machine-specific, not as a support mechanism for platform independence. The standard is an abstraction that has no interest in imposing rules that make things "easy"; its task is to create enough similarity between compilers that the programmer can create "platform independence" for their code, if they choose to do so.
Initially, there was a lot of competition between platforms for market share and also -- compilers were most often written as proprietary tools by microprocessor manufacturers and to support operating systems on specific hardware platforms. Intel was likely not very concerned about writing compilers that supported Motorola microprocessors.
C was -- after all -- invented by Bell Labs to rewrite Unix.

Why is alignment a power of 2?

There is a quote from cppreference:
Every object type has the property called alignment requirement, which
is an integer value (of type std::size_t, always a power of 2)
representing the number of bytes between successive addresses at which
objects of this type can be allocated.
I understand this reference is non-normative. But the standard says nothing about the value of alignof(T), other than that it is no more than alignof(std::max_align_t).
It is not obvious that alignment must be a power of 2. Why can an alignment not be 3, for example?
The standard has the final word for the language, so here is a quote of that section; the power-of-2 requirement is in paragraph 4:
3.11 Alignment [basic.align]
1 Object types have alignment requirements (3.9.1, 3.9.2) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier (7.6.2).
2 A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (18.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject. [ Example:
struct B { long double d; };
struct D : virtual B { char c; };
When D is the type of a complete object, it will have a subobject of type B, so it must be aligned appropriately for a long double. If D appears as a subobject of another object that also has B as a virtual base class, the B subobject might be part of a different subobject, reducing the alignment requirements on the D subobject. —end example ] The result of the alignof operator reflects the alignment requirement of the type in the complete-object case.
3 An extended alignment is represented by an alignment greater than alignof(std::max_align_t). It is implementation-defined whether any extended alignments are supported and the contexts in which they are supported (7.6.2). A type having an extended alignment requirement is an over-aligned type. [ Note: every over-aligned type is or contains a class type to which extended alignment applies (possibly through a non-static data member). —end note ]
4 Alignments are represented as values of the type std::size_t. Valid alignments include only those values returned by an alignof expression for the fundamental types plus an additional implementation-defined set of values, which may be empty. Every alignment value shall be a non-negative integral power of two.
5 Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.
Why do all implementations conform to that requirement (which is part of the reason it could be included in the standard at all)?
Well, because it is natural to multiply / divide / mask powers of 2 in binary, and all systems (excluding some really ancient ones) were, are, and for the foreseeable future will stay fundamentally binary.
Being natural means it is much more efficient than any other multiplications / divisions / modulo arithmetic, sometimes by orders of magnitude.
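For example, the usual alignment check and round-up reduce to a mask and an add when, and only when, the alignment is a power of two. A sketch:

#include <cstddef>
#include <cstdint>

// Valid only for power-of-2 align: align - 1 is then a mask of the low bits.
inline bool is_aligned(const void* p, std::size_t align) {
    return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Round addr up to the next multiple of a power-of-2 align.
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t align) {
    return (addr + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
}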
As @MooingDuck points out, this fundamental binary nature of computing platforms has already pervaded the language and its standard to such an extent that trying to build a non-binary conforming implementation is about on par with untying the Gordian knot without just cutting it. There are really few computer languages where that's not true.
Also, see a table of word sizes on wikipedia for corroboration.
That's how computers are built.
A computer has a natural 'word' size that is handled more easily than other sizes. On 64-bit CPUs, that size is 8 bytes. Operating on 8 bytes is most efficient. The hardware is built in a way that fetching memory aligned to this word size is also more efficient. So alignment is usually based on the CPU's word size.
Word sizes are powers of two because, again, that's how computers are built. Everything comes down to bits - so does the number of bits in a word. It's easier to design the hardware where the number of bits in a word is itself a power of two.