Related
I want to read a file 32 bytes at a time using a C/C++ program, but I want to be sure that the data will be 256 bits. In essence I am worried about leading bits in the "bytes" that I read from the file being off ? Is that even a matter of concern ?
Example : If I have a number say 2 represented in binary as 10 . This would be sufficient for me as a human.
How is that different as far a computer is concerned if it's written as: 00000010 to represent a char value of 1 byte ??? Would the leading zeros affect the bit count ? Does that in turn affect operations like XOR ?
I've trouble understanding its effects ! Does that involve data loss ? I really do not know... !
Every help to clear my misunderstanding will be appreciated !!!
Every routine in the C standard library that reads from a file or stream reads in units of bytes. Each byte read is a fixed number of bits; what is read from a file does not vary due to leading zeros or lack thereof in a byte. Some routines return a single character (which is a byte). Some routines put data read into a buffer and return a count of bytes read. Some routines, such as scanf, return a count of the number of items successfully converted. (You generally would not use these routines to read a fixed number of bytes.)
The number of bits in a byte is set by the C implementation. It is reported in CHAR_BIT, defined in <limits.h>. It is eight in all common C implementations.
Although the number of bits per byte does not vary, the number of bytes read from a stream can vary. If a stream is opened as a text stream, characters may be “added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment” (C 2018 7.21.2 2). To avoid this, you should open a stream as a binary stream.
The CHAR_BIT macro (defined in climits) will tell you how many bits make up a byte in the execution environment. However I am not aware of any recent general-purpose hardware that uses bytes of other than 8 bits. Some specialized processors (such as for digital signal processing) may use other sizes. Also completely outdated equipment used a wide variety of sizes (the typical alternative to 8 bits being 9).
No
C++ allows a char to be any size a platform requires. However, the macro CHAR_BIT always has the number of bits in a char.
So, to find out the number of bits in 32 bytes, you would use the formula 32*CHAR_BIT.
C++17 has introduced the new type std::byte that is not a character type and is always CHAR_BIT bits, as explained in the SO question std::byte on odd platforms
In order to find the number of bytes needed to hold 256 bits, you have a problem, because CHAR_BIT isn't always a divisor of 256. So, you have to decide what you want and use a more complicated formula. For example, 1+(255+CHAR_BIT)/CHAR_BIT will give you the number of bytes needed to hold 256 contiguous bits.
This is with reference to the text from C++ Primer Plus by Stephen Prata-
A byte means a 8-bit unit of memory in the sense of unit of measurement that describes the amount of memory in a computer.
However, C++ defines byte differently. The C++ byte consists of at least enough adjacent bits to accommodate the basic character set for the implementation.
Can you please explain if a C++ compiler have 16-bit byte whereas the system has 8-bit byte then how will the program run on such system?
What the author wants to say about the size of a byte is that, quoting from Wikipedia:
The popularity of major commercial computing architectures has aided in the ubiquitous acceptance of the 8-bit size.
On the other hand, the unit of memory in C++ is given by the built-in type char; under some implementation, a char may not be an 8-bit memory chunk; though, in your C++ program every sizeof(T) will be expressed in multiples of sizeof(char), that is equal to 1 by definition.
The number of bit in a byte for a particular implementation is recorded into the macro CHAR_BIT, defined inside the standard header <climits>. It is guaranteed that char is at least 8-bits.
Finally, this is the definition of byte given by the C++ Standard (§1.7, intro.memory) :
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain
any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8
encoding form and is composed of a contiguous sequence of bits, the number of which is implementationdefined.
The least significant bit is called the low-order bit; the most significant bit is called the high-order
bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every
byte has a unique address.
A byte means a 8-bit unit of memory.
That is incorrect.
However, C++ defines byte differently.
That is also incorrect.
In both C++ terminology and general parlance, a byte is the minimum unit of memory. An 8-bit byte is known as an octet.
Can you please explain if a C++ compiler have 16-bit byte whereas the system has 8-bit byte then how will the program run on such system?
It won't. If you compile a program for an architecture whose bytes are 16-bit, it will not run on a computer with an architecture whose bytes are 8-bit.
You have to compile for the processor you're using.
There used to be machines that had either variable byte size, or a byte size smaller than 8. The spec leaves it open to implementation on the given hardware.
The DEC PDP-10 had a 36 bit word size, and you could specify the size of a byte (usually 5 7 bit bytes to the word...)
http://pdp10.nocrew.org/docs/instruction-set/Byte.html
Why are all data type sizes always a power of 2?
Let's take two examples:
short int 16
char 8
Why are they not the like following?
short int 12
That's an implementation detail, and it isn't always the case. Some exotic architectures have non-power-of-two data types. For example, 36-bit words were common at one stage.
The reason powers of two are almost universal these days is that it typically simplifies internal hardware implementations. As a hypothetical example (I don't do hardware, so I have to confess that this is mostly guesswork), the portion of an opcode that indicates how large one of its arguments is might be stored as the power-of-two index of the number of bytes in the argument, thus two bits is sufficient to express which of 8, 16, 32 or 64 bits the argument is, and the circuitry required to convert that into the appropriate latching signals would be quite simple.
The reason why builtin types are those sizes is simply that this is what CPUs support natively, i.e. it is the fastest and easiest. No other reason.
As for structs, you can have variables in there which have (almost) any number of bits, but you will usually want to stay with integral types unless there is a really urgent reason for doing otherwise.
You will also usually want to group identical-size types together and start a struct with the largest types (usually pointers).That will avoid needless padding and it will make sure you don't have access penalties that some CPUs exhibit with misaligned fields (some CPUs may even trigger an exception on unaligned access, but in this case the compiler would add padding to avoid it, anyway).
The size of char, short, int, long etc differ depending on the platform. 32 bit architectures tend to have char=8, short=16, int=32, long=32. 64 bit architectures tend to have char=8, short=16, int=32, long=64.
Many DSPs don't have power of 2 types. For example, Motorola DSP56k (a bit dated now) has 24 bit words. A compiler for this architecture (from Tasking) has char=8, short=16, int=24, long=48. To make matters confusing, they made the alignment of char=24, short=24, int=24, long=48. This is because it doesn't have byte addressing: the minimum accessible unit is 24 bits. This has the exciting (annoying) property of involving lots of divide/modulo 3 when you really do have to access an 8 bit byte in an array of packed data.
You'll only find non-power-of-2 in special purpose cores, where the size is tailored to fit a special usage pattern, at an advantage to performance and/or power. In the case of 56k, this was because there was a multiply-add unit which could load two 24 bit quantities and add them to a 48 bit result in a single cycle on 3 buses simultaneously. The entire platform was designed around it.
The fundamental reason most general purpose architectures use powers-of-2 is because they standardized on the octet (8 bit bytes) as the minimum size type (aside from flags). There's no reason it couldn't have been 9 bit, and as pointed out elsewhere 24 and 36 bit were common. This would permeate the rest of the design: if x86 was 9 bit bytes, we'd have 36 octet cache lines, 4608 octet pages, and 569KB would be enough for everyone :) We probably wouldn't have 'nibbles' though, as you can't divide a 9 bit byte in half.
This is pretty much impossible to do now, though. It's all very well having a system designed like this from the start, but inter-operating with data generated by 8 bit byte systems would be a nightmare. It's already hard enough to parse 8 bit data in a 24 bit DSP.
Well, they are powers of 2 because they are multiples of 8, and this comes (simplifying a little) from the fact that usually the atomic allocation unit in memory is a byte, which (edit: often, but not always) is made by 8 bits.
Bigger data sizes are made taking multiple bytes at a time.
So you could have 8,16,24,32... data sizes.
Then, for the sake of memory access speed, only powers of 2 are used as a multiplier of the minimum size (8), so you get data sizes along these lines:
8 => 8 * 2^0 bits => char
16 => 8 * 2^1 bits => short int
32 => 8 * 2^2 bits => int
64 => 8 * 2^3 bits => long long int
8 bits is the most common size for a byte (but not the only size, examples of 9 bit bytes and other byte sizes are not hard to find). Larger data types are almost always multiples of the byte size, hence they will typically be 16, 32, 64, 128 bits on systems with 8 bit bytes, but not always powers of 2, e.g. 24 bits is common for DSPs, and there are 80 bit and 96 bit floating point types.
The sizes of standard integral types are defined as multiple of 8 bits, because a byte is 8-bits (with a few extremely rare exceptions) and the data bus of the CPU is normally a multiple of 8-bits wide.
If you really need 12-bit integers then you could use bit fields in structures (or unions) like this:
struct mystruct
{
short int twelveBitInt : 12;
short int threeBitInt : 3;
short int bitFlag : 1;
};
This can be handy in embedded/low-level environments - but bear in mind that the overall size of the structure will still be packed out to the full size.
They aren't necessarily. On some machines and compilers, sizeof(long double) == 12 (96 bits).
It's not necessary that all data types use of power of 2 as number of bits to represent. For example, long double uses 80 bits(though its implementation dependent on how much bits to allocate).
One advantage you gain with using power of 2 is, larger data types can be represented as smaller ones. For example, 4 chars(8 bits each) can make up an int(32 bits). In fact, some compilers used to simulate 64 bit numbers using two 32 bit numbers.
Most of the times your computer tries to keep all data formats in either a whole multiple (2, 3, 4...) or a whole part (1/2, 1/3, 1/4...) of the machine data size. It does this so that each time it loads N data words it loads an integer number of bits of information for you. That way, it doesn't have to recombine parts later on.
You can see this in the x86 for example:
a char is 1/4th of 32-bits
a short is 1/2 of 32-bits
an int / long are a whole 32 bits
a long long is 2x 32 bits
a float is a single 32-bits
a double is two times 32-bits
a long double may either be three or four times 32-bits, depending on your compiler settings. This is because for 32-bit machines it's three native machine words (so no overhead) to load 96 bits. On 64-bit machines it is 1.5 native machine word, so 128 bits would be more efficient (no recombining). The actual data content of a long double on x86 is 80 bits, so both of these are already padded.
A last aside, the computer doesn't always load in its native data size. It first fetches a cache line and then reads from that in native machine words. The cache line is larger, usually around 64 or 128 bytes. It's very useful to have a meaningful bit of data fit into this and not be stuck on the edge as you'd have to load two whole cache lines to read it then. That's why most computer structures are a power of two in size; it will fit in any power of two size storage either half, completely, double or more - you're guaranteed to never end up on a boundary.
There are a few cases where integral types must be an exact power of two. If the exact-width types in <stdint.h> exist, such as int16_t or uint32_t, their widths must be exactly that size, with no padding. Floating-point math that declares itself to follow the IEEE standard forces float and double to be powers of two (although long double often is not). There are additionally types char16_t and char32_t in the standard library now, or built-in to C++, defined as exact-width types. The requirements about support for UTF-8 in effect mean that char and unsigned char have to be exactly 8 bits wide.
In practice, a lot of legacy code would already have broken on any machine that didn’t support types exactly 8, 16, 32 and 64 bits wide. For example, any program that reads or writes ASCII or tries to connect to a network would break.
Some historically-important mainframes and minicomputers had native word sizes that were multiples of 3, not powers of two, particularly the DEC PDP-6, PDP-8 and PDP-10.
This was the main reason that base 8 used to be popular in computing: since each octal digit represented three bits, a 9-, 12-, 18- or 36-bit pattern could be represented more neatly by octal digits than decimal or hex. For example, when using base-64 to pack characters into six bits instead of eight, each packed character took up two octal digits.
The two most visible legacies of those architectures today are that, by default, character escapes such as '\123' are interpreted as octal rather than decimal in C, and that Unix file permissions/masks are represented as three or four octal digits.
Hello I am writing a library for communicating to an external device via rs-232 serial connection.
Often I have to communicate a command that includes an 8 bit = 1 byte character or a 16 bit = 2 byte number. How do I do this in a portable way?
Main problem
From reading other questions it seems that the standard does not guarantee 1byte = 8bits,
(defined in the Standard $1.7/1)
The fundamental storage unit in the C
+ + memory model is the byte. A byte is at least large enough to contain
any member of the basic execution
character set and is composed of a
contiguous sequence of bits, the
number of which is
implementation-defined.
How can I guarantee the number of bits of char? My device expects 8-bits exactly, rather than at least 8 bits.
I realise that almost all implementations have 1byte = 8 bits but I am curious as to how to guarantee it.
Short->2 byte check
I hope you don't mind, I would also like to run my proposed solution for the short -> 2 byte conversion by you. I am new to byte conversions and cross platform portability.
To guarantee the the number of bytes of the short I guess I am going to need to need to
do a sizeof(short). If sizeof(short)=2 Convert to bytes and check the byte ordering (as here)
if sizeof(short)>2 then convert the short to bytes, check the byte ordering (as here), then check the most significant bytes are empty and remove them?
Is this the right thing to do? Is there a better way?
Many thanks
AFAIK, communication with the serial port is somehow platform/OS dependent, so when you write the low level part of it, you'll know very well the platform, its endianness and CHAR_BIT. In this way the question does not have any sense.
Also, don't forget that UART hardware is able to transmit 7 or 8 bit words, so it does not depend on the system architecture.
EDIT: I mentioned that the UART's word is fixed (let's consider mode 3 with 8 bits, as the most standard), the hardware itself won't send more than 8 bits, so by giving it one send command, it will send exactly 8 bits, regardless of the machine's CHAR_BIT. In this way, by using one single send for byte and
unsigned short i;
send(i);
send(i>>8);
you can be sure it will do the right thing.
Also, a good idea would be to see what exactly boost.asio is doing.
This thread seems to suggest you can use CHAR_BIT from <climits>. This page even suggests 8 is the minimum amount of bits in a char... Don't know how the quote from the standard relates to this.
For fixed-size integer types, if using MSVC2010 or GCC, you can rely on C99's <stdint.h> (even in C++) to define (u)int8_t and (u)int16_t which are guaranteed to be exactly 8 and 16 bits wide respectively.
CHAR_BIT from the <climits> header tells you the number of bits in a char. This is at least 8. Also, a short int uses at least 16 bits for its value representation. This is guaranteed by the minimum value ranges:
type can at least represent
---------------------------------------
unsigned char 0...255
signed char -127...127
unsigned short 0...65535
signed short -32767...32767
unsigned int 0...65535
signed int -32767...32767
see here
Regarding portability, whenever I write code that relies on CHAR_BIT==8 I simply write this:
#include <climits>
#if CHAR_BIT != 8
#error "I expect CHAR_BIT==8"
#endif
As you said, this is true for almost all platforms and if it's not in a particular case, it won't compile. That's enough portability for me. :-)
Every now and then, someone on SO points out that char (aka 'byte') isn't necessarily 8 bits.
It seems that 8-bit char is almost universal. I would have thought that for mainstream platforms, it is necessary to have an 8-bit char to ensure its viability in the marketplace.
Both now and historically, what platforms use a char that is not 8 bits, and why would they differ from the "normal" 8 bits?
When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?
In the past I've come across some Analog Devices DSPs for which char is 16 bits. DSPs are a bit of a niche architecture I suppose. (Then again, at the time hand-coded assembler easily beat what the available C compilers could do, so I didn't really get much experience with C on that platform.)
char is also 16 bit on the Texas Instruments C54x DSPs, which turned up for example in OMAP2. There are other DSPs out there with 16 and 32 bit char. I think I even heard about a 24-bit DSP, but I can't remember what, so maybe I imagined it.
Another consideration is that POSIX mandates CHAR_BIT == 8. So if you're using POSIX you can assume it. If someone later needs to port your code to a near-implementation of POSIX, that just so happens to have the functions you use but a different size char, that's their bad luck.
In general, though, I think it's almost always easier to work around the issue than to think about it. Just type CHAR_BIT. If you want an exact 8 bit type, use int8_t. Your code will noisily fail to compile on implementations which don't provide one, instead of silently using a size you didn't expect. At the very least, if I hit a case where I had a good reason to assume it, then I'd assert it.
When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?
It's not so much that it's "worth giving consideration" to something as it is playing by the rules. In C++, for example, the standard says all bytes will have "at least" 8 bits. If your code assumes that bytes have exactly 8 bits, you're violating the standard.
This may seem silly now -- "of course all bytes have 8 bits!", I hear you saying. But lots of very smart people have relied on assumptions that were not guarantees, and then everything broke. History is replete with such examples.
For instance, most early-90s developers assumed that a particular no-op CPU timing delay taking a fixed number of cycles would take a fixed amount of clock time, because most consumer CPUs were roughly equivalent in power. Unfortunately, computers got faster very quickly. This spawned the rise of boxes with "Turbo" buttons -- whose purpose, ironically, was to slow the computer down so that games using the time-delay technique could be played at a reasonable speed.
One commenter asked where in the standard it says that char must have at least 8 bits. It's in section 5.2.4.2.1. This section defines CHAR_BIT, the number of bits in the smallest addressable entity, and has a default value of 8. It also says:
Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
So any number equal to 8 or higher is suitable for substitution by an implementation into CHAR_BIT.
Machines with 36-bit architectures have 9-bit bytes. According to Wikipedia, machines with 36-bit architectures include:
Digital Equipment Corporation PDP-6/10
IBM 701/704/709/7090/7094
UNIVAC 1103/1103A/1105/1100/2200,
A few of which I'm aware:
DEC PDP-10: variable, but most often 7-bit chars packed 5 per 36-bit word, or else 9 bit chars, 4 per word
Control Data mainframes (CDC-6400, 6500, 6600, 7600, Cyber 170, Cyber 176 etc.) 6-bit chars, packed 10 per 60-bit word.
Unisys mainframes: 9 bits/byte
Windows CE: simply doesn't support the `char` type at all -- requires 16-bit wchar_t instead
There is no such thing as a completely portable code. :-)
Yes, there may be various byte/char sizes. Yes, there may be C/C++ implementations for platforms with highly unusual values of CHAR_BIT and UCHAR_MAX. Yes, sometimes it is possible to write code that does not depend on char size.
However, almost any real code is not standalone. E.g. you may be writing a code that sends binary messages to network (protocol is not important). You may define structures that contain necessary fields. Than you have to serialize it. Just binary copying a structure into an output buffer is not portable: generally you don't know neither the byte order for the platform, nor structure members alignment, so the structure just holds the data, but not describes the way the data should be serialized.
Ok. You may perform byte order transformations and move the structure members (e.g. uint32_t or similar) using memcpy into the buffer. Why memcpy? Because there is a lot of platforms where it is not possible to write 32-bit (16-bit, 64-bit -- no difference) when the target address is not aligned properly.
So, you have already done a lot to achieve portability.
And now the final question. We have a buffer. The data from it is sent to TCP/IP network. Such network assumes 8-bit bytes. The question is: of what type the buffer should be? If your chars are 9-bit? If they are 16-bit? 24? Maybe each char corresponds to one 8-bit byte sent to network, and only 8 bits are used? Or maybe multiple network bytes are packed into 24/16/9-bit chars? That's a question, and it is hard to believe there is a single answer that fits all cases. A lot of things depend on socket implementation for the target platform.
So, what I am speaking about. Usually code may be relatively easily made portable to certain extent. It's very important to do so if you expect using the code on different platforms. However, improving portability beyond that measure is a thing that requires a lot of effort and often gives little, as the real code almost always depends on other code (socket implementation in the example above). I am sure that for about 90% of code ability to work on platforms with bytes other than 8-bit is almost useless, for it uses environment that is bound to 8-bit. Just check the byte size and perform compilation time assertion. You almost surely will have to rewrite a lot for a highly unusual platform.
But if your code is highly "standalone" -- why not? You may write it in a way that allows different byte sizes.
It appears that you can still buy an IM6100 (i.e. a PDP-8 on a chip) out of a warehouse. That's a 12-bit architecture.
Many DSP chips have 16- or 32-bit char. TI routinely makes such chips for example.
The C and C++ programming languages, for example, define byte as "addressable unit of data large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C is at least capable of holding 256 different values. Various implementations of C and C++ define a byte as 8, 9, 16, 32, or 36 bits
Quoted from http://en.wikipedia.org/wiki/Byte#History
Not sure about other languages though.
http://en.wikipedia.org/wiki/IBM_7030_Stretch#Data_Formats
Defines a byte on that machine to be variable length
The DEC PDP-8 family had a 12 bit word although you usually used 8 bit ASCII for output (on a Teletype mostly). However, there was also a 6-BIT character code that allowed you to encode 2 chars in a single 12-bit word.
For one, Unicode characters are longer than 8-bit. As someone mentioned earlier, the C spec defines data types by their minimum sizes. Use sizeof and the values in limits.h if you want to interrogate your data types and discover exactly what size they are for your configuration and architecture.
For this reason, I try to stick to data types like uint16_t when I need a data type of a particular bit length.
Edit: Sorry, I initially misread your question.
The C spec says that a char object is "large enough to store any member of the execution character set". limits.h lists a minimum size of 8 bits, but the definition leaves the max size of a char open.
Thus, the a char is at least as long as the largest character from your architecture's execution set (typically rounded up to the nearest 8-bit boundary). If your architecture has longer opcodes, your char size may be longer.
Historically, the x86 platform's opcode was one byte long, so char was initially an 8-bit value. Current x86 platforms support opcodes longer than one byte, but the char is kept at 8 bits in length since that's what programmers (and the large volumes of existing x86 code) are conditioned to.
When thinking about multi-platform support, take advantage of the types defined in stdint.h. If you use (for instance) a uint16_t, then you can be sure that this value is an unsigned 16-bit value on whatever architecture, whether that 16-bit value corresponds to a char, short, int, or something else. Most of the hard work has already been done by the people who wrote your compiler/standard libraries.
If you need to know the exact size of a char because you are doing some low-level hardware manipulation that requires it, I typically use a data type that is large enough to hold a char on all supported platforms (usually 16 bits is enough) and run the value through a convert_to_machine_char routine when I need the exact machine representation. That way, the platform-specific code is confined to the interface function and most of the time I can use a normal uint16_t.
what sort of consideration is it worth giving to platforms with non-8-bit char?
magic numbers occur e.g. when shifting;
most of these can be handled quite simply
by using CHAR_BIT and e.g. UCHAR_MAX instead of 8 and 255 (or similar).
hopefully your implementation defines those :)
those are the "common" issues.....
another indirect issue is say you have:
struct xyz {
uchar baz;
uchar blah;
uchar buzz;
}
this might "only" take (best case) 24 bits on one platform,
but might take e.g. 72 bits elsewhere.....
if each uchar held "bit flags" and each uchar only had 2 "significant" bits or flags that
you were currently using, and you only organized them into 3 uchars for "clarity",
then it might be relatively "more wasteful" e.g. on a platform with 24-bit uchars.....
nothing bitfields can't solve, but they have other things to watch out
for ....
in this case, just a single enum might be a way to get the "smallest"
sized integer you actually need....
perhaps not a real example, but stuff like this "bit" me when porting / playing with some code.....
just the fact that if a uchar is thrice as big as what is "normally" expected,
100 such structures might waste a lot of memory on some platforms.....
where "normally" it is not a big deal.....
so things can still be "broken" or in this case "waste a lot of memory very quickly" due
to an assumption that a uchar is "not very wasteful" on one platform, relative to RAM available, than on another platform.....
the problem might be more prominent e.g. for ints as well, or other types,
e.g. you have some structure that needs 15 bits, so you stick it in an int,
but on some other platform an int is 48 bits or whatever.....
"normally" you might break it into 2 uchars, but e.g. with a 24-bit uchar
you'd only need one.....
so an enum might be a better "generic" solution ....
depends on how you are accessing those bits though :)
so, there might be "design flaws" that rear their head....
even if the code might still work/run fine regardless of the
size of a uchar or uint...
there are things like this to watch out for, even though there
are no "magic numbers" in your code ...
hope this makes sense :)
The weirdest one I saw was the CDC computers. 6 bit characters but with 65 encodings. [There were also more than one character set -- you choose the encoding when you install the OS.]
If a 60 word ended with 12, 18, 24, 30, 36, 40, or 48 bits of zero, that was the end of line character (e.g. '\n').
Since the 00 (octal) character was : in some code sets, that meant BNF that used ::= was awkward if the :: fell in the wrong column. [This long preceded C++ and other common uses of ::.]
ints used to be 16 bits (pdp11, etc.). Going to 32 bit architectures was hard. People are getting better: Hardly anyone assumes a pointer will fit in a long any more (you don't right?). Or file offsets, or timestamps, or ...
8 bit characters are already somewhat of an anachronism. We already need 32 bits to hold all the world's character sets.
The Univac 1100 series had two operational modes: 6-bit FIELDATA and 9-bit 'ASCII' packed 6 or 4 characters respectively into 36-bit words. You chose the mode at program execution time (or compile time.) It's been a lot of years since I actually worked on them.