Unsigned narrow character type number representation - c++

N3797::3.9.1/1 [basic.fundamental] says
For unsigned narrow character types, all possible bit patterns of the
value representation represent numbers.
That's a bit unclear for me. We have the following ranges for narrow character types:
unsigned char := 0 -- 255
signed char := -128 -- 127
For both unsigned char and signed char objects we have a one-to-one mapping from the bits of the object representation to the integral values they can represent. The Standard says N3797::3.9.1/1 [basic.fundamental]
These requirements do not hold for other types.
Why doesn't the requirement I cited hold for, say, the signed char type?

Signed types can use one of three representations: two's complement, one's complement, or sign-magnitude. The last two each have one bit pattern (the negation of zero) which doesn't represent a number.
Two's complement is more or less universal for integer types these days; but the language still allows for the others.
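To make the distinction concrete, here is a small illustrative sketch (the decoder names are made up; no real implementation works this way, it just decodes the same 8-bit pattern under each permitted scheme):
#include <cstdint>
#include <cstdio>

// Hypothetical decoders for an 8-bit signed type under each representation
// the pre-C++20 standard permitted.
static int sign_magnitude(std::uint8_t b)  { return (b & 0x80) ? -(b & 0x7F) : b; }
static int ones_complement(std::uint8_t b) { return (b & 0x80) ? -(b ^ 0xFF) : b; }
static int twos_complement(std::uint8_t b) { return (b & 0x80) ? int(b) - 256 : b; }

int main() {
    std::uint8_t b = 0xFF; // all bits set
    std::printf("2's: %d  1's: %d  sign-mag: %d\n",
                twos_complement(b), ones_complement(b), sign_magnitude(b));
    // prints "2's: -1  1's: 0  sign-mag: -127"; under one's complement the
    // all-ones pattern is "negative zero", the pattern that isn't a distinct number.
}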

A few machines have what are called "trap representations". This means (for example) that an int can contain an extra bit (or more than one) to signify whether it has been initialized or not.
If you try to read an int with that bit saying it hasn't been initialized, it can trigger some sort of trap/exception/fault that (for example) immediately shuts down your program with some sort of error message. Any time you write a value to the int, that trap representation is cleared, so reading from it can/will work.
So basically, when your program starts, it initializes all your ints to such trap representations. If you try to read from an uninitialized variable, the hardware will catch it immediately and give you an error message.
The standard mandates that for unsigned char, no such trap representation is possible--all the bits of an unsigned char must be "visible"--they must form part of the value. That means none of them can be hidden; no pattern of bits you put into an unsigned char can form a trap representation (or anything similar). Any bits you put into unsigned char must simply form some value.
Any other type, however, can have trap representations. If, for example, you take some (more or less) arbitrarily chosen 8 bits out of some other type, and read them as an unsigned char, they'll always form a value you can read, write to a file, etc. If, however, you attempt to read them as any other type (signed char, unsigned int, etc.) it's allowable for it to form a trap representation, and attempting to do anything with it can give undefined behavior.
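This is also why the usual trick for inspecting an object's bytes goes through unsigned char. A minimal sketch (the exact bytes printed depend on your platform's floating-point format and byte order):
#include <cstdio>
#include <cstring>

int main() {
    double d = 3.14;
    unsigned char bytes[sizeof d];
    // Copy out the object representation; every bit pattern of unsigned
    // char is a value, so reading these bytes can never trap.
    std::memcpy(bytes, &d, sizeof d);
    for (unsigned char b : bytes)
        std::printf("%02x ", b);
    std::printf("\n");
}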

Related

How to check whether an int variable contains a legal (not trap representation) value?

Context:
This is mainly a followup to that other question. OP wanted to guess whether a variable contained an int or not, and my first thought was that in C (as in C++) an int variable could only contain an int value. And Eric Postpischil reminded me that trap representations were allowed per standard for the int type...
Of course, I know that most modern systems only use two's-complement representations of integers and no padding bits, meaning that no trap representation can be observed. Nevertheless both standards seem to still allow for 3 representations of signed types: sign and magnitude, one's complement and two's complement. And at least the C18 draft (n2310 6.2.6 Representations of types) explicitly allows padding bits for integer types other than char.
Question
So in the context of possible padding bits, or non two's complement signed representation, int variables could contain trap values for conformant implementations. Is there a reliable way to make sure that an int variable contains a valid value?
In C++'s current working draft (for C++20), an integer cannot have a trap representation. Integers are mandated to be two's complement: ([basic.fundamental]/3)
An unsigned integer type has the same object representation, value representation, and alignment requirements ([basic.align]) as the corresponding signed integer type.
For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation. [Footnote 41]
[ Example: The value −1 of a signed integer type has the same representation as the largest value of the corresponding unsigned type. — end example ]
Where footnote 41 says
This is also known as two's complement representation.
This was changed in p0907.
Additionally, padding bits in integers cannot cause traps ([basic.fundamental]/4):
Each set of values for any padding bits ([basic.types]) in the object representation are alternative representations of the value specified by the value representation.
[ Note: Padding bits have unspecified value, but do not cause traps. See also ISO C 6.2.6.2. — end note ]
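So under C++20 rules, a sketch like the following is well defined: whatever bytes you copy into an int form a valid value (which value you get still depends on byte order):
#include <cstdio>
#include <cstring>

int main() {
    unsigned char raw[sizeof(int)];
    for (unsigned i = 0; i < sizeof raw; ++i)
        raw[i] = static_cast<unsigned char>(0xA0 + i); // arbitrary bytes
    int x;
    std::memcpy(&x, raw, sizeof x); // no trap representation exists for int
    std::printf("%d\n", x);         // some valid value; never undefined behavior
}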

send int over socket: signed vs unsigned

Say I want to send a 4-byte integer over the network. The integer has a fixed size, due to using types from stdint.
My question is: Does it matter if I try to send either a signed or unsigned integer using these 4 bytes? (assuming I use the same method to serialize/deserialize the integer to/from bytes, both on client and server side). Can there be some other problems? (I don't refer to endianness issues here)
This issue seldom gets the attention it deserves.
As Floris observes, only the bytes of the representation get sent. C and C++ define the bitwise representation* of unsigned numbers, but not signed ones, so sending signed numbers as bytes opens a compatibility gap.
It's easy to "fix" the format for transmission. Casting a signed int to its corresponding unsigned type is guaranteed to generate the two's complement representation. But how to convert back? Converting an unsigned integer to its signed counterpart when the value doesn't fit is implementation-defined (and in C may even raise a signal); you can't portably rely on the result.
To be really safe, use a branch:
#include <limits.h> /* INT_MAX */

signed int deserialize_sint( unsigned int nonnegative ) {
    if ( nonnegative <= INT_MAX ) return (int) nonnegative;
    else return - (int) ( - nonnegative ); // Only cast an unsigned number <= INT_MAX
}
With luck, the compiler will see that both cases are the same and eliminate the branch.
The above function is written in C; apologies to the C++ crowd.
If you want to be extra paranoid, you could check - nonnegative <= INT_MAX before performing the cast, because the most negative number on a two's complement machine will still overflow a one's complement machine. The best you can do for the case of nonnegative == - nonnegative is to return a wider type, or if that's impossible, flag a runtime error.
* Endianness becomes ambiguous when the bits are divided into a byte sequence, though.
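Combining the cast-out/branch-back idea with byte order, a full round trip might look like the sketch below (htonl()/ntohl() are POSIX functions, assumed available; the wire format is big-endian two's complement):
#include <arpa/inet.h> // htonl/ntohl, POSIX
#include <cstdint>
#include <cstdio>

// Sketch: 32-bit signed values as big-endian two's complement on the wire.
uint32_t encode_i32(int32_t v) {
    return htonl(static_cast<uint32_t>(v)); // the cast pins down the bit pattern
}

int32_t decode_i32(uint32_t wire) {
    uint32_t u = ntohl(wire);
    if (u <= INT32_MAX) return static_cast<int32_t>(u);
    // Negative case: avoid the implementation-defined unsigned-to-signed cast.
    return -static_cast<int32_t>(UINT32_MAX - u) - 1;
}

int main() {
    std::printf("%d\n", decode_i32(encode_i32(-60))); // prints -60
}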
Because the standard does not mandate a particular representation for signed types:
3.9.1 Fundamental types [basic.fundamental] Paragraph 7 of n3936
Types bool, char, char16_t, char32_t, wchar_t, and the signed and unsigned integer types are collectively called integral types. A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system. [ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ]
Sending signed integer values in a binary representation is not well defined (unless you explicitly specify this as part of your protocol and do some manual work to make sure you know how to read/write that binary representation).
There are a couple of solutions depending on the exact requirements.
If speed is not a primary concern then you could use a textual (e.g. English) representation and serialize integers to/from text. For a lot of problems this is not a bad solution, as the major speed bump is usually not the serialization cost but network latency. Network latency is the major problem in most situations (but not always).
Alternatively, if you need a binary representation (because you timed it and the volume/density of your numbers requires it), then the endianness problem is not hard to solve, thanks to htonl() and family, which cover the unsigned integral types (well, at least the 16/32-bit values).
So all you really need to solve is the representation of signed values. Pick one (use the most common representation for the machines you target, and the translation will then usually be a no-op). If you know the on-the-wire representation (because it is specified in your protocol), then you can translate to/from this representation (usually at a small cost: a conditional addition) on machines that do not natively support it.
When you send a number over a socket, it's just bytes.
Now if you want to send a negative number, and the representation of negative numbers is different at the receiving end, then you might have a problem. Otherwise, it's just bytes.
So if there is a chance that the binary representation of the negative number would be misunderstood at the receiving end, then you need to do some translating (maybe send a sign byte followed by four magnitude bytes, and put it all together at the other end).
That's quite unlikely though.

Under what circumstances would one use a signed char in C++?

In most situations, one would declare a char object to assign one of the character values on the ASCII table, ranging from 0 - 127. Even the extended character sets range from 128 - 255 (still positive). So I'm assuming that when dealing with the printing of characters, one only needs to use an unsigned char.
Now, based on some research on SO, people use a signed char when they need to use really small integers, but for that we can utilize the [u]int8 types. So I'm having trouble coming to terms with why one would need to use a signed char? You can use it if you are dealing with the basic ASCII character table (which unsigned char is already capable of handling) or you can use it to represent small integers (which [u]int8 already takes care of).
Can someone please provide a programming example in which a signed char is preferred over the other types?
The reason is that you don't know, at least portably, whether plain char variables are signed or unsigned. Different implementations take different approaches; a plain char may be signed on one platform and unsigned on another.
If you want to store negative values in a variable of type char, you absolutely must declare it as signed char, because only then can you be sure that every platform will be able to store negative values in there. Yes, you can use the [u]int8 types, but this was not always the case (they were only introduced in C++11), and in fact, int8_t is most likely an alias for signed char.
Moreover, uint8_t and int8_t are defined to be optional types, meaning you can't always rely on their existence (contrary to signed char). In particular, if a machine has a byte unit with more than 8 bits, it is not very likely that uint8_t and int8_t are defined (although they can be; a compiler is always free to provide them and do the appropriate calculations). See this related question: What is int8_t if a machine has > 8 bits per byte?
Is char signed or unsigned?
Actually it is neither; it's implementation-defined whether a variable of type char can hold negative values. So if you are looking for a portable way to store negative values in a narrow character type, explicitly declare it as signed char.
§ 3.9.1 - Fundamental Types - [basic.fundamental]
1 Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values.
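You can ask your implementation which choice it made; a quick check using std::numeric_limits:
#include <cstdio>
#include <limits>

int main() {
    // Implementation-defined: plain char may be signed or unsigned.
    std::printf("plain char is %s\n",
                std::numeric_limits<char>::is_signed ? "signed" : "unsigned");
    signed char sc = -1; // guaranteed to be able to hold negative values
    std::printf("%d\n", static_cast<int>(sc));
}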
I'd like to use the smallest signed integer type available, which one is it?
C++11 introduced several fixed-width integer types, but a common misunderstanding is that these types are guaranteed to be available, which isn't true.
§ 18.4.1 - Header <cstdint> synopsis - [cstdint.syn]
typedef signed integer type int8_t; // optional
To preserve space in this post most of the section has been left out, but the // optional marker applies to all {,u}int{8,16,32,64}_t types. An implementation is not required to implement them.
The standard mandates that int_least8_t is available, but as the name implies this type is only guaranteed to have a width equal or larger than 8 bits.
However, the standard guarantees that even though signed char, char, and unsigned char are three distinct types[1] they must occupy the same amount of storage and have the same alignment requirements.
After inspecting the standard further we will also find that sizeof(char) is guaranteed to be 1[2] , which means that this type is guaranteed to occupy the smallest amount of space that a C++ variable can occupy under the given implementation.
Conclusion
Remember that unsigned char and signed char must occupy the same amount of storage as a char?
The smallest signed integer type that is guaranteed to be available is therefore signed char.
[note 1]
§ 3.9.1 - Fundamental Types - [basic.fundamental]
1 Plain char, signed char, and unsigned char are three distinct types, collectively called narrow character types.
A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation. For narrow character types, all bits of the object representation participate in the value representation.
[note 2]
§ 5.3.3 - Sizeof - [expr.sizeof]
sizeof(char), sizeof(signed char), and sizeof(unsigned char) are 1.
The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined.
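These guarantees can be verified at compile time; a minimal sketch:
#include <climits>

// sizeof(signed char) == 1 by definition, and the standard requires at least
// the range [-127, 127], so signed char is the smallest signed integer type
// that is always available.
static_assert(sizeof(signed char) == 1, "mandated by the standard");
static_assert(SCHAR_MIN <= -127 && SCHAR_MAX >= 127, "minimum required range");

int main() {}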
You can use char for arithmetic operations with small integers. unsigned char will give you greater range, while signed char will give you a smaller absolute range and the ability to work with negative numbers.
There are situations where char's small size is of importance and is preferred for these operations (see here), so when one has negative numbers to deal with, signed char is the way to go.

Why is unsigned integer overflow defined behavior but signed integer overflow isn't?

Unsigned integer overflow is well defined by both the C and C++ standards. For example, the C99 standard (§6.2.5/9) states
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
However, both standards state that signed integer overflow is undefined behavior. Again, from the C99 standard (§3.4.3/1)
An example of undefined behavior is the behavior on integer overflow
Is there an historical or (even better!) a technical reason for this discrepancy?
The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation they used. C implementations usually used the same representation as the CPU, so the overflow behavior followed from the CPU's integer representation.
In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).
Relevant quotes:
C99 6.2.6.1:3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
C99 6.2.6.2:2:
If the sign bit is one, the value shall be modified in one of the following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value −(2^N) (two's complement);
— the sign bit has the value −(2^N − 1) (one's complement).
Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.
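A classic illustration of the kind of optimization this enables (the same flavor of example those posts discuss):
// Because signed overflow is undefined, a compiler may assume it never
// happens and fold this whole function to "return true":
bool signed_version(int x) { return x + 1 > x; }

// The unsigned version wraps, so it cannot be folded the same way:
bool unsigned_version(unsigned x) { return x + 1 > x; } // false when x == UINT_MAX

int main() {}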
Aside from Pascal's good answer (which I'm sure is the main motivation), it is also possible that some processors cause an exception on signed integer overflow, which of course would cause problems if the compiler had to "arrange for another behaviour" (e.g. use extra instructions to check for potential overflow and calculate differently in that case).
It is also worth noting that "undefined behaviour" doesn't mean "doesn't work". It means that the implementation is allowed to do whatever it likes in that situation. This includes doing "the right thing" as well as "calling the police" or "crashing". Most compilers, when possible, will choose "do the right thing", assuming that is relatively easy to define (in this case, it is). However, if you are having overflows in the calculations, it is important to understand what that actually results in, and that the compiler MAY do something other than what you expect (and that this may vary depending on compiler version, optimisation settings, etc).
First of all, please note that C11 3.4.3, like all examples and footnotes, is not normative text and is therefore not relevant to cite!
The relevant text that states that overflow of integers and floats is undefined behavior is this:
C11 6.5/5
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
A clarification regarding the behavior of unsigned integer types specifically can be found here:
C11 6.2.5/9
The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the representation of
the same value in each type is the same. A computation involving
unsigned operands can never overflow, because a result that cannot be
represented by the resulting unsigned integer type is reduced modulo
the number that is one greater than the largest value that can be
represented by the resulting type.
This makes unsigned integer types a special case.
Also note that there is an exception if any type is converted to a signed type and the old value can no longer be represented. The behavior is then merely implementation-defined, although a signal may be raised.
C11 6.3.1.3
6.3.1.3 Signed and unsigned integers
When a value with integer
type is converted to another integer type other than _Bool, if the
value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Otherwise, the new type is signed and the value
cannot be represented in it; either the result is
implementation-defined or an implementation-defined signal is raised.
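A minimal illustration of both directions of that rule (the concrete numbers assume a 32-bit int):
#include <cstdio>

int main() {
    unsigned int u = 4294967286u; // UINT_MAX - 9, assuming 32-bit int
    int i = (int) u;              // doesn't fit: implementation-defined result
    std::printf("%d\n", i);       // -10 on typical two's-complement targets
    unsigned int back = (unsigned int) -10; // well defined: reduced modulo 2^32
    std::printf("%u\n", back);    // 4294967286
}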
In addition to the other issues mentioned, having unsigned math wrap makes the unsigned integer types behave as abstract algebraic groups (meaning that, among other things, for any pair of values X and Y, there will exist some other value Z such that X+Z will, if properly cast, equal Y, and Y-Z will, if properly cast, equal X). If unsigned values were merely storage-location types and not intermediate-expression types (e.g. if there were no unsigned equivalent of the largest integer type, and arithmetic operations on unsigned types behaved as though they were first converted to larger signed types), then there wouldn't be as much need for defined wrapping behavior; but it's difficult to do calculations in a type which doesn't have, e.g., an additive inverse.
This helps in situations where wrap-around behavior is actually useful - for example with TCP sequence numbers or certain algorithms, such as hash calculation. It may also help in situations where it's necessary to detect overflow, since performing calculations and checking whether they overflowed is often easier than checking in advance whether they would overflow, especially if the calculations involve the largest available integer type.
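For example, serial-number comparison in the spirit of TCP sequence numbers (RFC 1982-style) leans directly on the defined unsigned wraparound; a sketch:
#include <cstdint>

// True if sequence number a was issued before b, even across the 2^32 wrap.
// Interpreting the difference as signed assumes a two's-complement target.
bool seq_before(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::int32_t>(a - b) < 0; // unsigned subtraction wraps, by design
}

int main() {
    // 0xFFFFFFF0 comes "before" 5 once the counter wraps:
    return seq_before(0xFFFFFFF0u, 5u) ? 0 : 1;
}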
Perhaps another reason for why unsigned arithmetic is defined is because unsigned numbers form integers modulo 2^n, where n is the width of the unsigned number. Unsigned numbers are simply integers represented using binary digits instead of decimal digits. Performing the standard operations in a modulus system is well understood.
The OP's quote refers to this fact, and also highlights that there is only one unambiguous, logical way to represent unsigned integers in binary. By contrast, signed numbers are most often represented using two's complement, but other choices are possible, as described in the standard (section 6.2.6.2).
Two's complement representation allows certain operations to make more sense in binary format. E.g., incrementing negative numbers works the same as for positive numbers (except under overflow conditions). Some operations at the machine level can be the same for signed and unsigned numbers. However, when interpreting the result of those operations, some cases don't make sense - positive and negative overflow. Furthermore, the overflow results differ depending on the underlying signed representation.
The most technical reason of all is simply that trying to capture overflow in an unsigned integer requires more moving parts from you (exception handling) and the processor (exception throwing).
C and C++ won't make you pay for that unless you ask for it by using a signed integer. This isn't a hard-and-fast rule, as you'll see near the end, but it's just how they proceed for unsigned integers. In my opinion, this makes signed integers the odd one out, not unsigned; but it's fine that they offer this fundamental difference, as the programmer can still perform well-defined signed operations with overflow. But to do so, you must cast for it.
Because:
unsigned integers have well defined overflow and underflow
casts from signed -> unsigned int are well defined: one more than [uint's name]_MAX is conceptually added to negative values, to map them to the extended positive number range
casts from unsigned -> signed int are well defined: one more than [uint's name]_MAX is conceptually deducted from positive values beyond the signed type's max, to map them to negative numbers (strictly guaranteed only on two's-complement implementations before C++20)
You can always perform arithmetic operations with well-defined overflow and underflow behavior, where signed integers are your starting point, albeit in a round-about way, by casting to unsigned integer first then back once finished.
#include <cstdint> // for the fixed-width types

int32_t x = 10;
int32_t y = -50;
// writes -60 into z, this is well defined (on two's-complement targets)
int32_t z = int32_t(uint32_t(y) - uint32_t(x));
Casts between signed and unsigned integer types of the same width are free, if the CPU is using 2's complement (nearly all do). If for some reason the platform you're targeting doesn't use 2's complement for signed integers, you will pay a small conversion price when casting between uint32 and int32.
But be wary when using bit widths smaller than int
Usually, if you are relying on unsigned overflow, you are using a smaller word width, 8-bit or 16-bit. These will promote to signed int at the drop of a hat (C has absolutely insane implicit integer conversion rules; this is one of C's biggest hidden gotchas). Consider:
unsigned char a = 0;
unsigned char b = 1;
printf("%i", a - b); // outputs -1, not 255 as you'd expect
To avoid this, you should always cast to the type you want when you are relying on that type's width, even in the middle of an operation where you think it's unnecessary. This will cast the temporary and get you the signedness AND truncate the value so you get what you expected. It's almost always free to cast, and in fact, your compiler might thank you for doing so as it can then optimize on your intentions more aggressively.
unsigned char a = 0;
unsigned char b = 1;
printf("%i", (unsigned char)(a - b)); // cast turns -1 to 255, outputs 255

What do the C and C++ standards say about bit-level integer representation and manipulation?

I know the C and C++ standards don't dictate a particular representation for numbers (could be two's complement, sign-and-magnitude, etc.). But I don't know the standards well enough (and couldn't find if it's stated) to know if there are any particular restrictions/guarantees/reserved representations made when working with bits. Particularly:
If all the bits in an integer type are zero, does the integer as a whole represent zero?
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Is there a guaranteed way to check if any bit is not set?
Is there a guaranteed way to check if any bit is set? (#3 and #4 kind of depend on #1 and #2, because I know how to set, for example, the 5th bit (see #5) in some variable x, and, if I'd like to check a variable y to see whether its 5th bit is 1, I would like to know whether if (x & y) will work (because, as I understand it, this relies on the value of the representation and not on whether that bit is actually 1 or 0))
Is there a guaranteed way to set the left-most and/or right-most bits? (At least a simpler way than taking a char c with all bits true (set by c = c | ~c) and doing c = c << (CHAR_BIT - 1) for setting the high bit and c = c ^ (c << 1) for the low bit, assuming I'm not making any assumptions I shouldn't be, given these questions)
If the answer to #1 is "no", how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
I guess my overall question is: are there any restrictions/guarantees/reserved representations made by the C and C++ standards regarding bits and integers, despite the fact that an integer's representation is not mandated (and if the C and C++ standards differ in this regard, what's their difference)?
I came up with these questions while doing my homework which required me to do some bit manipulating (note these aren't questions from my homework, these are much more "abstract").
Edit: As to what I refer to as "bits," I mean "value forming" bits and am not including "padding" bits.
(1) If all the bits in an integer type are zero, does the integer as a whole represent zero?
Yes, the bit pattern consisting of all zeroes always represents 0:
The representations of integral types shall define values by use of a pure binary numeration system.49 [§3.9.1/7]
49 A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.
(2) If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. In fact, signed magnitude is specifically allowed:
[ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end
example ] [§3.9.1/7]
(3) Is there a guaranteed way to check if any bit is not set?
I believe the answer to this is "no," if you consider signed types. It is equivalent to equality testing with a bit pattern of all ones, which is only possible if you have a way to produce a signed number with a bit pattern of all ones. For an unsigned number this representation is guaranteed, but casting from unsigned to signed is implementation-defined if the number is unrepresentable:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined. [§4.7/3]
(4) Is there a guaranteed way to check if any bit is set?
I don't think so, because signed magnitude is allowed—0 would compare equal to −0. But it should be possible with unsigned numbers.
(5) Is there a guaranteed way to set the left-most and/or right-most bits?
Again, I believe the answer is "yes" for unsigned numbers, but "no" for signed numbers. Shifts are undefined for negative signed numbers:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. [§5.8/2]
You use the term "all bits" repeatedly, but you do not clarify what "all bits" you are referring to. Object representation of integer types in C/C++ might include value-forming bits and padding bits. The only integer type that is guaranteed not to have padding bits is [signed/unsigned] char.
The language always guaranteed that if all value-forming bits are zero, then the represented integer value is also zero.
As for padding bits, things are/were a bit more complicated. The original specification of the C language (C89/90 as well as the original C99) did not guarantee that setting all object bits to zero produced a valid integer representation. It could've produced an invalid trap representation. I.e. in the original C (and even in C99 at first) using memset(..., 0, ...) on integer types did not guarantee that the objects would receive valid zero values (with the exception of [signed/unsigned] char). This was changed in later specifications, namely in one of the technical corrigenda for C99. Now it is required that an all-zero bit pattern in an integer object (involving all bits, including padding ones) represents a valid zero value.
I.e. in modern C it is legal to use memset(..., 0, ...) to set any integer objects to zero, but it became legal only after C99.
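So in modern C (and in C++) the familiar idiom below is well defined for any integer type, not just the character types:
#include <cstring>

int main() {
    int counters[8];
    // Valid since the C99 corrigendum: the all-zero object representation,
    // padding bits included, is a valid representation of the value 0.
    std::memset(counters, 0, sizeof counters);
    return counters[0]; // 0
}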
You already got some answers about the representation of integer values. There is exactly one way that is guaranteed to give you all the individual bits of any object that is represented in memory: view it as an array of unsigned char. This is the only integral type that has no padding bits and is guaranteed to have no trap representation. So casting a pointer of type T* to your object to unsigned char* will always work, as long as you only access the first sizeof(T) bytes. By that you can inspect and set all bytes (and thus bits) to your liking.
If you are interested in more details, here I have written something up about the anatomy of integer types in C. C++ might differ a bit from that; in particular, type punning through a union as described there doesn't seem to be well defined in C++.
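A sketch of that byte-wise inspection (the helper name is made up; the inner loop assumes CHAR_BIT == 8):
#include <cstdio>

// Print every bit of an object's representation, most significant bit of
// each byte first. Accessing the bytes via unsigned char* is always permitted.
template <typename T>
void dump_bits(const T& obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    for (unsigned i = 0; i < sizeof(T); ++i)
        for (int bit = 7; bit >= 0; --bit)
            std::printf("%d", (p[i] >> bit) & 1);
    std::printf("\n");
}

int main() {
    dump_bits(42); // the bits of an int, in this implementation's byte order
}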
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. The standards for C and C++ don't rule out signed magnitude or one's complement, both of which have +0 and -0. +0 and -0 do have to compare equal, but they do not have to have the same representation.
Good luck finding a machine nowadays that uses signed magnitude or one's complement.
If you want your brain to explode, consider this: if you interpret an int or long or long long as an array of unsigned char (which is the most reasonable thing to do if you want to see all the bits), you know that the order of bytes is not defined, for example "big-endian" vs. "little-endian". We all (hopefully) know that.
But it is worse: each bit of an int could be stored in any of the bits of the array of char. So there are 32! ways the bits of a 32-bit integer could be mapped to an array of four 8-bit unsigned chars by a truly bizarre implementation. Fortunately, I haven't encountered more than two ways myself (and I know of one more ordering in a real computer).
If all the bits in an integer type are zero, does the integer as a whole represent zero?
Edit: since you have now clarified that you are not concerned with the padding bits, the answer to this is actually "yes". But I leave the original:
Not necessarily, it could be a trap representation. See C99 6.2.6.1:
For unsigned integer types other than unsigned char, the bits of the object
representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter)
The presence of padding bits allows for the possibility that all 0 is a trap representation. (As noted by Keith Thompson in the comment below, the more recent C11 makes explicit that such a representation is not a trap representation).
and
The values of any padding bits are unspecified
and
44) Some combinations of padding bits might generate trap representations
If you restrict the question to value and sign bits, the answer is yes, due to 6.2.6.2:
If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation.
and
If the sign bit is zero, it shall not affect the resulting value.
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Not necessarily, and in fact sign-and-magnitude is explicitly supported in 6.2.6.2.
Is there a guaranteed way to check if any bit is not set?
If you do not care about padding and sign bits, you could just compare to 0, but this would not work with a 1's complement representation (which is allowed) seeing as all bits 0 and all bits 1 both represent the value 0.
Otherwise: you can read the value of each byte via an unsigned char *, and compare the result to 0:
Values stored in unsigned bit-fields and objects of type unsigned char
shall be represented using a pure binary notation
If you want to check a specific value bit, you could construct a suitable bitmask using (1u << n), but this will not necessarily let you inspect the sign bit.
Is there a guaranteed way to check if any bit is set?
The answer is essentially the same as to the previous question.
Is there a guaranteed way to set the left-most and/or right-most bits?
Do you mean left-most value bit? You could count the bits in INT_MAX or UINT_MAX or equivalent depending on the type, and use that to construct a value (via 1 << n) with which to OR the original value.
If the answer to #1 is "no", how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
You can do so using a bitmask which you left shift repeatedly, but you can check only the value bits this way and not the sign bit.
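For unsigned types, that shifting-mask loop is fully defined; a sketch:
#include <cstdio>

int main() {
    unsigned int v = 0xA5u;
    // Visit every value bit, least significant first. The loop terminates
    // because shifting the mask past the top bit yields 0 for unsigned types.
    for (unsigned int mask = 1u; mask != 0u; mask <<= 1)
        std::printf("%c", (v & mask) ? '1' : '0');
    std::printf("\n");
}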
For the bit manipulations you could make a struct with eight 1-bit unsigned bit-fields and point a pointer to that struct at your char. That way you can easily access each bit. But the compiler will probably do masking under the hood, so it is only a cleaner way for the programmer, I think. You must check that your compiler doesn't change the order of the fields when doing this.
yourstruct* pChar = (yourstruct*)(&c);
pChar->Bit7 = 1;
Let me caveat this by saying I'm addressing C and C++ in general (e.g. C90 and lower, MS Visual C++, etc.): the "greatest common denominator" (vs. the latest/greatest C11/C++11 standards).
Q: If all the bits in an integer type are zero, does the integer as a whole represent zero?
A: Yes
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
A: Yes. This includes the sign bit, for a signed int.
I'm frankly not familiar with "magnitude"
Q: Is there a guaranteed way to check if any bit is not set?
A: "And'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to check if any bit is set?
A: Again, "and'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to set the left-most and/or right-most bits?
A: I believe you should always have an INT_MAX available for all implementations/all architectures to determine the leftmost bit.
I'm prepared to be flamed ... but I believe the above is accurate. And I hope it helps.
IMHO...