Does sizeof(T) * CHAR_BIT guarantee bit size? - c++

There doesn't appear to be any library function for calculating the size of a type in bits.
Am I right to assume that this can be done in the following way?
#include <climits>
#include <cstddef> // for size_t
template <typename T>
size_t Size_In_Bits() {
    return sizeof(T) * CHAR_BIT;
}
Will this always give back the number of bits that can be used in a type?

This is guaranteed to give you the size (storage) in bits, but not the width (number of value bits). The latter can be smaller if the type has padding bits. For unsigned types you can measure the number of value bits directly by converting -1 to the type (which yields the maximum value representable in that type) and counting the bits. For signed types, std::numeric_limits<T>::max() can be used to get the maximum. Or, if you already know the specific type, you can use the xxx_MAX macros from limits.h or stdint.h.
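For the unsigned case, a minimal sketch of that counting approach (the function name is illustrative, not a standard API):
#include <cstddef>
// Count the value bits of an unsigned type T: converting -1 to T yields
// T's maximum value, which is then shifted right until nothing remains.
template <typename T>
std::size_t value_bits_unsigned() {
    std::size_t n = 0;
    for (T max = static_cast<T>(-1); max != 0; max >>= 1) ++n;
    return n;
}
For example, value_bits_unsigned<unsigned int>() typically returns 32 on mainstream platforms.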

sizeof(T) * CHAR_BIT returns the number of bits the type takes up in memory.
Yet the number of storage bits may be more than the number of bits the integer can actually use arithmetically (consider padding bits).
Detail: integers have value bits, a sign bit (for signed integers) and possibly padding bits. All of these bits contribute to the storage size.
unsigned char will never have padding bits.
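One way to see the storage/width distinction on a concrete implementation is to compare the two counts; std::numeric_limits reports the value bits (and signedness), so whatever is left over is padding. A hedged sketch, with an illustrative function name:
#include <climits>
#include <cstddef>
#include <limits>
// Padding bits of T = storage bits - value bits - sign bit (if any).
template <typename T>
std::size_t padding_bits() {
    const std::size_t storage = sizeof(T) * CHAR_BIT;
    const std::size_t value   = std::numeric_limits<T>::digits;
    const std::size_t sign    = std::numeric_limits<T>::is_signed ? 1 : 0;
    return storage - value - sign;
}
padding_bits<unsigned char>() is always 0; for other integer types it may be non-zero on exotic implementations.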

Why is the result of sizeof implementation defined? [closed]

In C99, §6.5.3.4:
2 The sizeof operator yields the size (in bytes) of its operand,
which may be an expression or the parenthesized name of a type. ...
4 The value of the result is implementation-defined, and its type (an
unsigned integer type) is size_t, defined in <stddef.h> (and other
headers).
In C++14, §5.3.3:
1 The sizeof operator yields the number of bytes in the object
representation of its operand. ... The result of sizeof applied to any
other fundamental type (3.9.1) is implementation-defined.
The only guaranteed values are sizeof(char), sizeof(unsigned char), and sizeof(signed char), each of which is one.
However, "the number of bytes in the object representation" seems pretty iron-clad to me. For example, in C99 §6.2.6.1:
4 Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object
of that type, in bytes. ...
So why is it implementation-defined if it seems pretty defined?
Many of you seem to be misinterpreting my question. I never claimed that:
A) the sizes of types are defined or the same on all systems,
B) implementation-defined means it can return "random values".
What I'm getting at here is that n * CHAR_BIT is a fixed formula. The formula itself can't change between implementations. Yes, an int may be 4 bytes or 8 bytes. I get that. But across all implementations, the value must be n * CHAR_BIT.
The result of sizeof is implementation-defined because the sizes of the various basic types are implementation-defined. The only guarantees we have on the sizes of the types in C++ are that
sizeof(char) = 1 and sizeof(char) <= sizeof(short) <= sizeof(int) <=
sizeof(long) <= sizeof(long long)
And that each type has a minimum range of values it must support, per C11 [Annex E (informative) Implementation limits]/1:
[...]The minimum magnitudes shown shall be replaced by implementation-defined magnitudes with the same sign.[...]
#define CHAR_BIT 8
#define CHAR_MAX UCHAR_MAX or SCHAR_MAX
#define CHAR_MIN 0 or SCHAR_MIN
#define INT_MAX +32767
#define INT_MIN -32767
#define LONG_MAX +2147483647
#define LONG_MIN -2147483647
#define LLONG_MAX +9223372036854775807
#define LLONG_MIN -9223372036854775807
#define MB_LEN_MAX 1
#define SCHAR_MAX +127
#define SCHAR_MIN -127
#define SHRT_MAX +32767
#define SHRT_MIN -32767
#define UCHAR_MAX 255
#define USHRT_MAX 65535
#define UINT_MAX 65535
#define ULONG_MAX 4294967295
#define ULLONG_MAX 18446744073709551615
So per the standard an int has to be able to store a number that could be stored in 16 bits, but it can be bigger, and on most of today's systems it is 32 bits.
What I'm getting at here is that n * CHAR_BIT is a fixed formula. The formula itself can't change between implementations. Yes, an int may be 4 bytes or 8 bytes. I get that. But across all implementations, the value must be n * CHAR_BIT.
You are correct, but n is defined per C99 §6.2.6.1 as
where n is the size of an object of that type
(emphasis mine)
So the formula may be fixed, but n is not fixed, and different implementations on the same system can use a different value of n.
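A small sketch of what those guarantees do and do not pin down on whichever implementation compiles it (assuming C++11 for static_assert):
#include <climits>
#include <cstdio>
// These orderings are guaranteed on every implementation...
static_assert(sizeof(char) == 1, "char is always one byte");
static_assert(sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long) &&
              sizeof(long) <= sizeof(long long), "required size ordering");
int main() {
    // ...but the concrete value of n (and therefore of sizeof) is not:
    // this prints 32 on most current systems, yet 16 or 64 are also legal.
    std::printf("int: %zu bytes, %zu bits\n", sizeof(int), sizeof(int) * CHAR_BIT);
    return 0;
}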
The result of sizeof is not implementation defined. The standard does not say that; it says:
The value of the result is implementation-defined, [...]
That is semantically different. The result of sizeof is well defined:
[...] the size (in bytes) of its operand [...]
Both the bit width of a byte in this context and the number of bytes in non-char types are implementation-defined.
Because the sizes of the basic types are defined in terms of efficiency, not in terms of an exact number of bits. An int must be something that the CPU can manipulate efficiently. For most modern systems, this quantity turns out to be 32 bits (or 64 bits). For older systems, it was quite often 16 bits. However, if a 35-bit CPU were to exist, an int on such a system would be 35 bits. In other words, C++ does not apply a penalty to enforce a bit width that a CPU might not support at all.
Of course, one could argue that notions of exotic bit widths for basic types have been overtaken by history. I cannot think of any modern CPU that does not support the standard set of 8, 16, and 32 bits (feel free to disagree with this statement, but at least be so kind to give an example!), and 64 bits is also pretty common (and not a big deal to support in software if hardware support is unavailable).
Arguably the C++ language has already moved away from having a variable number of bits for char; as far as I know, u8"..." converts to char *, but the Unicode specification demands that UTF-8 be encoded in 8-bit code units.
If a char of 8 bits is size 1, then an int of 32 bits is size 4. If a char of 16 bits is size 1, then an int of 32 bits is only size 2. Both situations are equally valid in C++, if such sizes happen to be good choices for their respective hardware.
Padding bits are "unspecified" not "implementation-defined".
Wrong. Very, very wrong. The values of padding bytes are unspecified. The intention here is that the values of these bits may represent trap values, but not necessarily.
The standard tells you that a type occupies sizeof(T) * CHAR_BIT bits of storage, but it doesn't specify what those sizes are (other than for the exact-width types). The number of bytes a type occupies is implementation-defined, hence sizeof must be as well.
Implementation-defined is described as:
unspecified value where each implementation documents how the choice
is made
When you declare a new variable, for example like this:
size_t a;
you get an unsigned integer type whose exact width is implementation-defined; on many platforms it has the same size as unsigned int or unsigned long, but that is not guaranteed.
On a typical 32-bit computer the size of an int is 4 bytes and the size of a short int is 2 bytes.
In the C programming language, size_t is the type of the result of the sizeof operator. When you apply sizeof, it gives you the size of its operand, which can be an expression or a parenthesized type name. The size of an object cannot be negative, and it is an integer, which is why size_t is an unsigned integer type.
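For example, a trivial sketch:
#include <cstddef>
#include <cstdio>
int main() {
    std::size_t n = sizeof(int);           // sizeof yields a size_t
    std::printf("sizeof(int) = %zu\n", n); // %zu is the printf length modifier for size_t
    return 0;
}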

How is 256 stored in a char variable and an unsigned char?

Up to 255, I can understand how integers are stored in char and unsigned char.
#include <stdio.h>
int main()
{
    unsigned char a = 256;
    printf("%d\n", a);
    return 0;
}
In the code above I have an output of 0 for unsigned char as well as char.
For 256, I think this is the way the integer is stored in the code (this is just a guess):
First, 256 is converted to its binary representation, which is 100000000 (9 bits in total).
Then the leftmost bit (the bit which is set) is removed, because the char datatype only has 8 bits of memory.
So it is stored in memory as 00000000, and that's why it prints 0 as output.
Is the guess correct, or is there another explanation?
Your guess is correct. Conversion to an unsigned type uses modular arithmetic: if the value is out of range (either too large, or negative) then it is reduced modulo 2^N, where N is the number of bits in the target type. So, if (as is often the case) char has 8 bits, the value is reduced modulo 256, so that 256 becomes zero.
Note that there is no such rule for conversion to a signed type - out-of-range values give implementation-defined results. Also note that char is not specified to have exactly 8 bits, and can be larger on less mainstream platforms.
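A short demonstration of that modular reduction, assuming CHAR_BIT == 8 so that the modulus is 256:
#include <cstdio>
int main() {
    unsigned char a = 256;  // 256 mod 256 == 0
    unsigned char b = 300;  // 300 mod 256 == 44
    unsigned char c = -1;   // negative values wrap too: -1 becomes 255
    std::printf("%d %d %d\n", a, b, c);  // prints "0 44 255" on such a platform
    return 0;
}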
On your platform (as well as on any other "normal" platform) unsigned char is 8 bits wide, so it can hold numbers from 0 to 255.
Trying to assign 256 (which is an int literal) to it results in an unsigned integer overflow, which the standard defines to result in "wraparound". The result of u = n, where u is an unsigned integral type and n is an unsigned integer outside its range, is u = n % (max_value_of_u + 1).
This is just a convoluted way of saying what you already said: the standard guarantees that in these cases the assignment is performed keeping only the bits that fit in the target variable. This rule exists because most platforms already implement this behavior at the assembly-language level (unsigned integer overflow typically results in this behavior plus some kind of overflow flag being set to 1).
Notice that none of this holds for signed integers (which plain char often is): signed integer overflow is undefined behavior.
Yes, that's correct. 8 bits can hold 0 to 255 unsigned, or -128 to 127 signed. Beyond that you've hit an overflow situation and bits will be lost.
Does the compiler give you a warning on the above code? You might be able to increase the warning level and see something. It won't warn you if you assign a value that can't be determined statically (before execution), but in this case it's pretty clear you're assigning something too large for the size of the variable.

Reliably determine the size of char

I was wondering how to reliably determine the size of a character in a portable way. AFAIK sizeof(char) cannot be used because it always yields 1, even on systems where a byte has 16 bits or even more or fewer.
For example, when dealing with bits, where you need to know exactly how big a character is, I was wondering whether this code would give the real size of a character, independent of what the compiler thinks of it. IMO the pointer has to be increased by the compiler to the correct size, so we should have the correct value. Am I right about this, or might there be some hidden problem with pointer arithmetic that would yield wrong results on some systems?
int sizeOfChar()
{
    char *p = 0;
    p++;
    int size_of_char = (int)p;
    return size_of_char;
}
There's a CHAR_BIT macro defined in <limits.h> that evaluates to exactly what its name suggests.
IMO the pointer has to be increased by the compiler to the correct size, so we should have the correct value
No, because pointer arithmetic is defined in terms of sizeof(T) (the pointer target type), and the sizeof operator yields the size in bytes. char is always exactly one byte long, so your code will always yield the null pointer plus one (which, converted to an integer, need not be the numerical value 1, since a null pointer is not required to be represented as 0).
I think it's not clear what you consider to be "right" (or "reliable", as in the title).
Do you consider "a byte is 8 bits" to be the right answer? If so, for a platform where CHAR_BIT is 16, then you would of course get your answer by just computing:
const int octets_per_char = CHAR_BIT / 8;
No need to do pointer trickery. Also, the trickery is tricky:
On an architecture with 16 bits as the smallest addressable piece of memory, there would be 16 bits at address 0x0000¹, another 16 bits at address 0x0001, and so on.
So, your example would compute the result 1, since the pointer would likely be incremented from 0x0000 to 0x0001, but that doesn't seem to be what you expect it to compute.
¹ I use a 16-bit address space for brevity; it makes the addresses easier to read.
The size of one char (aka byte) in bits is determined by the macro CHAR_BIT in <limits.h> (or <climits> in C++).
The sizeof operator always returns the size of a type in bytes, not in bits.
So if on some system CHAR_BIT is 16 and sizeof(int) is 4, that means an int has 64 bits on that system.
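A small sketch that puts those two facts together; the printed values are whatever the implementation at hand provides (e.g. 8 and 32 on a typical desktop):
#include <climits>
#include <cstdio>
int main() {
    std::printf("bits per byte: %d\n", CHAR_BIT);
    std::printf("bits in an int: %zu\n", sizeof(int) * CHAR_BIT);
    return 0;
}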

Is it possible to know how many bits are in an integer type?

I want to use some integer type as a bit mask. I want to know for which n it's guaranteed that any number from 0 to 2^n - 1 is available in this type. (Actually I'm going to use uintmax_t.)
I know it's usually 8 * sizeof(uintmax_t) (or rather CHAR_BIT * sizeof(uintmax_t)), but I guess that isn't guaranteed.
So I want to find this n some other way.
How do I achieve this?
There is nothing wrong with using the sizeof operator in combination with CHAR_BIT
const std::size_t nBits = CHAR_BIT * sizeof(some_integer_type);
This would also work for other built-in types, as well as user-defined types.
Use the <cstdint> header.
It provides cross-platform fixed-width typedefs for integer types and macro constants for their limits.
#include <cstdint>
std::int8_t Signed = 0; // guaranteed to be exactly 8 bits wide
std::uint8_t Unsigned = 0; // guaranteed to be exactly 8 bits wide
Signed = INT8_MAX; // store the maximum value for a signed 8-bit integer
Unsigned = UINT8_MAX; // store the maximum value for an unsigned 8-bit integer
Hope it helps.
The answer would be 1+log2((UINTMAX_MAX>>1)+1)
It can also be derived by counting bits with repeated shifting.
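A sketch of the shifting approach, which avoids floating-point log2 entirely (the function name is illustrative):
#include <cstdint>
// Count the value bits of uintmax_t by shifting its maximum value
// (UINTMAX_MAX) right until nothing is left.
unsigned uintmax_bits() {
    unsigned n = 0;
    for (std::uintmax_t v = UINTMAX_MAX; v != 0; v >>= 1) ++n;
    return n;
}
The result is the n for which every value from 0 to 2^n - 1 fits in uintmax_t.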

Truncating an int (or long, or whatever) to a specific size (n bytes), signed and unsigned

Say I have these two variables:
size_t value;
size_t size;
And I want to "cast" value to the size of size. So if size is 4, value is casted to be 4 bytes long. If size is 3, value is presumably truncated to 3 bytes long, preserving sign (assume a signed int may be loaded into value then taken out later to be cast back to signed) and stored in an int/uint depending on sign choice. Preferably with a method that would work to turn, for example, an unsigned long, or whatever other integral type, to any arbitrary size in bytes along with being signed/unsigned.
The cast to long is to preserve the sign, and long is supposed to be at least as big as size_t (though I think that's not actually true in MS compilers). If it's not true, pick a different signed type as big as size_t and replace the three references to long.
size_t casted = size_t((long(value) << (8 * (sizeof(long) - size))) >> (8 * (sizeof(long) - size)));
For an unsigned version use size_t instead of long.
This is untested.
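A hedged alternative sketch that avoids shifting signed values altogether (left-shifting a negative value is undefined, and right-shifting one is implementation-defined before C++20): mask off the low size bytes and sign-extend manually. The function name and the use of std::int64_t as the widest type are assumptions for illustration only.
#include <climits>
#include <cstddef>
#include <cstdint>
// Keep the low `size` bytes of `value` and sign-extend from the highest kept bit.
std::int64_t truncate_to_bytes(std::int64_t value, std::size_t size) {
    if (size == 0) return 0;
    if (size >= sizeof(value)) return value;
    const unsigned bits = static_cast<unsigned>(size) * CHAR_BIT;
    const std::uint64_t mask = (std::uint64_t(1) << bits) - 1;
    std::uint64_t kept = static_cast<std::uint64_t>(value) & mask;
    if (kept & (std::uint64_t(1) << (bits - 1)))  // top kept bit set: negative
        kept |= ~mask;                            // sign-extend the high bits
    return static_cast<std::int64_t>(kept);       // implementation-defined pre-C++20,
                                                  // two's complement in practice
}
For example, truncate_to_bytes(0x1FF, 1) yields -1, and truncate_to_bytes(0x7F, 1) yields 127.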
It depends on what you mean by truncate. If your intent is just to clear the bytes beyond the truncation point to zero, you could probably get away with something like:
size_t mask[] = {0x00000000, 0xff000000, 0xffff0000, 0xffffff00, 0xffffffff};
value &= mask[size];
So, where size is zero, nothing is preserved. Where size is two, only the upper two bytes are preserved.
Obviously, this will depend on the actual widths of your data types so is implementation specific. But that's the case anyway since you're casting between size_t and other data types - those types are not necessarily compatible.
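If hard-coding 32-bit literals is a concern, the same "keep the upper size bytes" mask can be computed from CHAR_BIT instead; a sketch with an illustrative function name:
#include <climits>
#include <cstddef>
// Mask that keeps the upper `size` bytes of a size_t and clears the rest.
std::size_t upper_bytes_mask(std::size_t size) {
    if (size == 0) return 0;
    if (size >= sizeof(std::size_t)) return ~std::size_t(0);  // keep everything
    return ~std::size_t(0) << ((sizeof(std::size_t) - size) * CHAR_BIT);
}
With a 32-bit size_t, upper_bytes_mask(2) is 0xffff0000, matching the table above.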