Why -1 equals 4th byte of integer -5? - c++

Why does this code print true (or 1 without std::boolalpha)?
char* arr = new char[4];
int* i = new (arr) int(-5);
char c = -1;
std::cout << std::boolalpha << (arr[3] == c) << std::endl;

Why does this code print true?
Depending on the system used to run the program, the output could be either true or false, or behaviour of the program could be undefined.
On systems where negative numbers are represented using two's complement (which is very common, and is guaranteed since C++20), where the byte order is little endian (which is fairly common on desktop systems; less so elsewhere), and where the size of int is exactly 4, it just so happens that the byte arr[3] has the value -1. An example of a CPU architecture where all of these conditions hold is x86; a non-matching example is AVR32.
On big endian systems this would not be the case and the output wouldn't be true. On systems where the size of int is less than 4 bytes, the byte could be uninitialised, in which case the output could be either true or false. And in the case where the size of int is greater than 4 bytes, the behaviour of the program would be undefined.

If you inspect the memory at arr in a debugger you will probably see the bytes fb ff ff ff. That is what you get when your computer uses little-endian byte order, in which the least significant byte is stored at the lowest address. For your code to execute as you seem to expect, you should be examining arr[0]. Also, your code as written would probably execute as you expect on a machine that uses big-endian byte order.
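To make the layout concrete, here is a small sketch (not from the question) that prints each byte of the stored int. On a little-endian machine with a 4-byte two's complement int it would typically print fb ff ff ff, which is why arr[3] compares equal to (char)-1:
#include <cstddef>
#include <iostream>
#include <new>

int main()
{
    char* arr = new char[sizeof(int)];
    new (arr) int(-5);                        // store -5 in the char buffer
    for (std::size_t k = 0; k < sizeof(int); ++k)
        std::cout << std::hex
                  << (static_cast<unsigned>(arr[k]) & 0xffu) << ' ';
    std::cout << '\n';
    delete[] arr;                             // fine: int is trivially destructible
}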

Related

Typecasting char to long

Say I have a variable, a
char a = 0x01;
and I want to cast this to a long, as in
long b;
b = (long)a;
Will the upper 3 bytes in b be guaranteed to be 0? With my setup they are 0, but I'm not sure if this is compiler-dependent.
Yes, b is guaranteed to have the value 0x1 after this assignment, even without the cast. The assignment operator in C++ is generally semantic, or value driven: it copies the value or state rather than performing a bitwise copy (even if the two are sometimes equivalent, such as for trivial types).
In some cases, especially because of operator overloading, this may not be the case. Developers are very strongly encouraged to keep to this concept when they design new types, but a careless programmer could overload the assignment operator of a non-fundamental type to do whatever they want.
As a long can represent all values of a char (be it signed or unsigned), the conversion is guaranteed not to change the value.
If you initially have a positive value, either because char is unsigned on your architecture or because the char value is between 0 and 127 (assuming 8-bit characters), the resulting long is guaranteed to be positive and less than 256. So on an architecture where long is 4 bytes large, the 3 high-order bytes are guaranteed to be 0.
If char is signed and the initial value is negative, things will be different! The value will be unchanged and will still be negative. On a common two's complement architecture, the 3 high-order bytes will then be 0xFF.
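A small sketch of my own (assuming an 8-bit char and a two's complement long) illustrating the sign extension described above:
#include <iostream>

int main()
{
    signed char a = -1;     // bit pattern 0xFF on an 8-bit char
    long b = a;             // value-preserving conversion: b == -1

    std::cout << b << '\n';   // prints -1
    // Viewed as raw bits, -1 has every byte set to 0xFF, including the high-order ones:
    // prints ffffffff on a 32-bit long (ffffffffffffffff on a 64-bit long).
    std::cout << std::hex << static_cast<unsigned long>(b) << '\n';
}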
The answer already given is right, but I thought I'd add that for C++, it is recommended to use one of the C++-specific casting notations, to make it abundantly clear what you are doing. Here, you would use:
long b;
b = static_cast<long>(a);
This makes it very clear what you are doing (a conversion to long that is checked and resolved entirely at compile time), and you know that the "right" sort of cast will be performed.
char a = 0x01;
long b;
b = (long)a;
C and C++ are two different (but closely related) languages. Their rules happen to be the same in this case.
The cast (not "typecast") is not necessary. The assignment could, and probably should, be written as:
b = a;
which causes an implicit conversion from char to long. Since the value being converted is within the representable range of type long, the result of the conversion is 1. The result of the conversion is specified in terms of values, not representations.
The representation of the value 1 in type long probably has a 1 in the low-order bit, and 0s in all the other bits. (And the position of the low-order bit can vary; some systems are big-endian, some are little-endian, and there are other possibilities.)
There is no guarantee that type long even has three high-order bytes. Type long is at least 32 bits wide, but a byte can be wider than 8 bits. It's even possible that there are values of type char that exceed LONG_MAX (if plain char is signed and long is 1 byte, which implies CHAR_BIT >= 32).
It's also possible that the representation of type long includes padding bits, bits that do not contribute to the value. It's guaranteed that the sign bit is 0, the low-order value bit is 1, and all other value bits are 0, but if there are padding bits their values are not guaranteed. (Some combinations of padding bits can result in a trap representation that does not represent any value, but that can't happen in this particular case.)
Most of these exotic possibilities are very unlikely to occur in real life. C implementations for some DSPs do have bytes wider than 8 bits, but any system you're using almost certainly has 8-bit bytes.
The point is that the result of the conversion is defined in terms of values, not representations, and 99% of the time that's all you need to care about. If you write:
char a = 1; /* same as 0x01 */
long b = a;
printf("b = %ld\n", b);
it will print b = 1, even if you're using some exotic system where the value 1 is represented strangely.
b will be 1; this is always true, independent of compiler and endianness. Additionally, the following expressions will be true:
b == 1
b == 01
b == 0x1
b == 0x00000001
b == 0x00000000000000000000000000000000000000000000000000001
The right hand side in all cases is an int constant with the value 1; not more, not less. Note that the zeroes do not represent bytes in memory (an int most likely does not have the number of bytes the last expression appears to suggest). The hexadecimal notation is just another way to write down a 1, exactly like 1.
In particular, we don't know where in memory the byte with the value 1 is located, because that is architecture dependent. It may be the one at the address of the int, or it may be the other end, or even in between.
Now comes the sweet thing: C does not care how the memory in an int is laid out. None of the ways to write an integer constant is architecture dependent. That seems self-evident for decimal constants: did we expect the meaning of int i = 1 to be architecture dependent? Certainly not. Nor is int i = 0x00000001;. The same is true for the bit shift operators: << shifts towards more significant bits, >> towards less significant bits. The digits in (decimal or hexadecimal) integer constants are ordered so that the most significant digits are on the left side, aligning with the "direction" indicated by the arrow-like bit shift operators. That may or may not reflect your machine's int representation; on a PC it does not.
Bottom line: If you use the standard C (or C++) means to test the "upper 3 bytes", you are home free, and the following is always true, independent of the implementation or architecture:
char a = 0x01;
long b = a;
(b & 0xff) == 1         // least significant byte is 1
(b & 0x000000ff) == 1   // exactly the same as above
(b & 0xffffff00) == 0   // more significant three bytes are all 0
It's possible that your long has more bits, but that is implementation dependent. However many more there are, they are all zero, save for the least significant one.
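For completeness, a compilable version of those checks (a sketch of mine, not part of the original answer):
#include <cassert>
#include <iostream>

int main()
{
    char a = 0x01;
    long b = a;

    // These hold on any conforming implementation, regardless of endianness.
    assert((b & 0xff) == 1);          // least significant byte is 1
    assert((b & 0xffffff00L) == 0);   // next three bytes are all 0
    assert(b == 1);

    std::cout << "all checks passed\n";
}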

Storing 8 logical true/false values inside 1 byte?

I'm working on a microcontroller with only 2KB of SRAM and desperately need to conserve some memory. Trying to work out how I can put 8 0/1 values into a single byte using a bitfield but can't quite work it out.
#include <cstdint>
#include <iostream>
using namespace std;

struct Bits
{
    int8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

int main(){
    Bits b;
    b.b0 = 0;
    b.b1 = 1;
    cout << (int)b.b0; // outputs 0, correct
    cout << (int)b.b1; // outputs -1, should be outputting 1
}
What gives?
All of your bitfield members are signed 1-bit integers. On a two's complement system, that means they can represent only either 0 or -1. Use uint8_t if you want 0 and 1:
struct Bits
{
uint8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};
As a word of caution - the standard doesn't really enforce an implementation scheme for bitfields. There is no guarantee that Bits will be 1 byte, and hypothetically it is entirely possible for it to be larger.
In practice, however, implementations usually follow the obvious logic and it will "almost always" be 1 byte in size, but again, this is not guaranteed. If you want to be sure, you can pack the bits manually with shifts and masks, as sketched below.
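A minimal sketch of doing it by hand (set_bit and get_bit are hypothetical helpers of mine, not from the question); a plain uint8_t is guaranteed to be exactly one byte:
#include <cstdint>
#include <iostream>

void set_bit(std::uint8_t& byte, unsigned pos, bool value)
{
    if (value)
        byte = static_cast<std::uint8_t>(byte | (1u << pos));   // turn the bit on
    else
        byte = static_cast<std::uint8_t>(byte & ~(1u << pos));  // turn the bit off
}

bool get_bit(std::uint8_t byte, unsigned pos)
{
    return (byte >> pos) & 1u;
}

int main()
{
    std::uint8_t flags = 0;
    set_bit(flags, 0, false);
    set_bit(flags, 1, true);
    std::cout << get_bit(flags, 0) << get_bit(flags, 1) << '\n';   // prints 01
}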
BTW, -1 is still "truthy" (it converts to true in a boolean context), but -1 != true.
As noted, these variables consist of only a sign bit, so the only available values are 0 and -1.
A more appropriate type for these bitfields would be bool. C++14 §9.6/4:
If the value true or false is stored into a bit-field of type bool of any size (including a one bit bit-field), the original bool value and the value of the bit-field shall compare equal.
Yes, std::uint8_t will do the job, but you might as well use the best fit. You won't need things like the cast for std::cout << (int)b.b0;.
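A small sketch of the bool variant (my example; as noted above, the standard still doesn't promise the struct occupies exactly one byte):
#include <iostream>

struct Flags
{
    bool b0 : 1, b1 : 1, b2 : 1, b3 : 1,
         b4 : 1, b5 : 1, b6 : 1, b7 : 1;
};

int main()
{
    Flags f{};                           // all bits start out false
    f.b0 = false;
    f.b1 = true;
    std::cout << f.b0 << f.b1 << '\n';   // prints 01, no cast needed
    std::cout << sizeof(Flags) << '\n';  // commonly 1, but implementation-defined
}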
Signed and unsigned integers are the answer.
Keep in mind that signedness is just an interpretation of the bits: whether -1 or 1 is printed depends on the "variable type" that the compiler reveals to cout's overloaded operator<< functions. The bit itself is the same, and so is its value (on/off), since you have only 1 bit.
The stored data doesn't care about that, but it is good practice to be explicit, so prefer to declare your variables unsigned; that instructs the compiler to pick the proper code when you pass the value to a "serializer" like cout.
cout operator overloading:
cout works through a series of overloaded functions, and the parameter type instructs the compiler which function to call. There are separate overloads for signed and unsigned types, so they can interpret the same data differently, and you can change which one is called by using a cast. See "cout << myclass" and operator overloading.

Reliably determine the size of char

I was wondering how to reliably determine the size of a character in a portable way. AFAIK sizeof(char) cannot be used because it always yields 1, even on systems where a byte has 16 bits or even more or fewer.
For example when dealing with bits, where you need to know exactly how big it is, I was wondering if this code would give the real size of a character, independent of what the compiler thinks of it. IMO the pointer has to be increased by the compiler to the correct size, so we should have the correct value. Am I right about this, or might there be some hidden problem with pointer arithmetic that would also yield wrong results on some systems?
int sizeOfChar()
{
    char *p = 0;
    p++;
    int size_of_char = (int)p;
    return size_of_char;
}
There's a CHAR_BIT macro defined in <limits.h> that evaluates to exactly what its name suggests.
IMO the pointer has to be increased by the compiler to the correct size, so we should have the correct value
No, because pointer arithmetic is defined in terms of sizeof(T) (the pointer target type), and the sizeof operator yields the size in bytes. char is always exactly one byte long, so your code will always yield the null pointer plus one (which, converted to an integer, need not be the numerical value 1, since a null pointer's representation is not required to be all-bits-zero).
I think it's not clear what you consider to be "right" (or "reliable", as in the title).
Do you consider "a byte is 8 bits" to be the right answer? If so, for a platform where CHAR_BIT is 16, then you would of course get your answer by just computing:
const int octets_per_char = CHAR_BIT / 8;
No need to do pointer trickery. Also, the trickery is tricky:
On an architecture with 16 bits as the smallest addressable piece of memory, there would be 16 bits at address 0x0000 [1], another 16 bits at address 0x0001, and so on.
So, your example would compute the result 1, since the pointer would likely be incremented from 0x0000 to 0x0001, but that doesn't seem to be what you expect it to compute.
[1] I use a 16-bit address space for brevity; it makes the addresses easier to read.
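A tiny sketch (mine) of why the pointer trick can never measure the bit width: pointer differences for char* are counted in units of sizeof(char), which is 1 by definition regardless of CHAR_BIT:
#include <iostream>

int main()
{
    char buf[2];
    char* p = buf;
    ++p;                                 // advances by exactly one char
    std::cout << (p - buf) << '\n';      // always prints 1
    std::cout << sizeof(char) << '\n';   // always prints 1 as well
}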
The size of one char (aka byte) in bits is determined by the macro CHAR_BIT in <limits.h> (or <climits> in C++).
The sizeof operator always returns the size of a type in bytes, not in bits.
So if on some system CHAR_BIT is 16 and sizeof(int) is 4, that means an int has 64 bits on that system.
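For example, a quick sketch that prints the bit widths derived from CHAR_BIT (the values will vary with the implementation):
#include <climits>
#include <iostream>

int main()
{
    std::cout << "bits per byte: " << CHAR_BIT << '\n';
    std::cout << "bits per int:  " << CHAR_BIT * sizeof(int)  << '\n';
    std::cout << "bits per long: " << CHAR_BIT * sizeof(long) << '\n';
}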

short pointer to a float

I ran this code in C++:
#include <iostream>
using namespace std;

int main()
{
    float f = 7.0;
    short s = *(short *)&f;
    cout << sizeof(float) << endl
         << sizeof(short) << endl
         << s << endl;
    return 0;
}
I get the following output:
4
2
0
but in a lecture given at Stanford University, Professor Jerry Cain says he is sure the output will not be 0.
The lecture can be found here; he says this at around the 48-minute mark.
Is he wrong, or has the standard changed since then? Or is there a difference between platforms?
I'm using g++ to compile my code.
EDIT: In the next lecture he does mention "big endian" and "little endian" and says that they will affect the result.
#include <cassert>
#include <iostream>
using namespace std;

static void bitPrint(float f)
{
    assert(sizeof(int) == sizeof(float));
    // Reinterpret the float's storage as an int so the raw bits can be examined.
    int *data = reinterpret_cast<int*>(&f);
    for (unsigned i = 0; i < sizeof(int) * 8; ++i)
    {
        // Test bit i, least significant bit first (1u avoids shifting into the sign bit).
        unsigned bit = (1u << i) & static_cast<unsigned>(*data);
        if (bit) bit = 1;
        cout << bit;
    }
    cout << endl;
}

int main()
{
    float f = 7.0;
    bitPrint(f);
    return 0;
}
This program prints 00000000000000000000011100000010 (least significant bit first).
Since sizeof(short) == 2 on your platform, you get the first 2 bytes, which are both zeros.
Note that since the sizes of the types, and possibly the float implementation (not sure about this), are implementation-defined, different output can be seen on different platforms.
Well, let's see. First you write a float into the memory. It occupies 4 bytes, and its value is 7. A float in memory looks something like "sign bit -> exponent bits -> mantissa bits"; for the usual 32-bit IEEE-754 format that is 1 sign bit, 8 exponent bits and 23 mantissa bits.
For the value 7 all of the interesting bits (the sign, the exponent and the top of the mantissa) sit in the two most significant bytes; the two least significant bytes of the mantissa are all zeros.
Your short pointer points to the beginning of the float. On a little-endian machine the least significant bytes are stored first, so those are exactly the two zero bytes the pointer sees.
Now, provided that the size of short is 2, which means we only take two bytes out of the float's 4 bytes, we get our 0. On a big-endian machine the short would instead cover the sign and exponent bytes and the result would be non-zero.
I believe, though, that this result is rather UB and can differ on different platforms, compilers, etc.
Accessing data through a pointer to a different type than it was stored as gives (except in a few special cases) undefined behaviour.
Firstly, it's platform dependent how the data is stored, so different systems may well give different values; and secondly, the compiler might well generate code that doesn't even see the value you'd expect, as it's allowed to do anything it likes when you do this (it's undefined behaviour due to the strict aliasing rules).
Having said that, there are probably reasons why the number you are seeing is valid, but you can't rely on it unless you specifically know your platform will do what you expect; it's not guaranteed by the standard.
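A sketch of mine showing a well-defined way to look at the representation: copy the bytes out with std::memcpy instead of dereferencing a differently-typed pointer (this assumes a 32-bit IEEE-754 float and a 16-bit std::uint16_t, which is typical but not guaranteed):
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    float f = 7.0f;

    // Copy the first two bytes of the float into a 16-bit integer.
    std::uint16_t first_two_bytes;
    std::memcpy(&first_two_bytes, &f, sizeof first_two_bytes);
    std::cout << first_two_bytes << '\n';    // 0 on a little-endian IEEE-754 machine

    // Dump all of the bytes in memory order.
    unsigned char bytes[sizeof f];
    std::memcpy(bytes, &f, sizeof f);
    for (unsigned char b : bytes)
        std::cout << std::hex << static_cast<unsigned>(b) << ' ';   // e.g. 0 0 e0 40
    std::cout << '\n';
}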
He's "pretty" sure it's not zero, he says that explicitly.
However, given that the representation of a short can be big-endian or little-endian, I wouldn't be so certain. In any case, this is a throwaway line at the end of a fifty-minute lecture so we can forgive him a little. It may be he came back in the next lecture with a clarification.
You would need to examine the underlying bits at (at least) a byte-by-byte level to understand what's going on.

C++ vector max_size();

On a 32-bit system:
std::vector<char>::max_size() returns 2^32 - 1 (size of char: 1 byte)
std::vector<int>::max_size() returns 2^30 - 1 (size of int: 4 bytes)
std::vector<double>::max_size() returns 2^29 - 1 (size of double: 8 bytes)
Can anyone tell me what max_size() depends on?
And what will max_size() return if it runs on a 64-bit system?
max_size() is the theoretical maximum number of items that could be put in your vector. On a 32-bit system, you could in theory allocate 4 GB == 2^32 bytes, which is 2^32 char values, 2^30 int values or 2^29 double values. It would appear that your implementation is using that value, but subtracting 1.
Of course, you could never really allocate a vector that big on a 32-bit system; you'll run out of memory long before then.
There is no requirement on what value max_size() returns other than that you cannot allocate a vector bigger than that. On a 64-bit system it might return 2^64-1 for char, or it might return a smaller value because the system only has a limited memory space. 64-bit PCs are often limited to a 48-bit address space anyway.
max_size() returns "the maximum potential size the vector could reach due to system or library implementation limitations",
so I suppose that the maximum value is implementation dependent. On my machine the following code
std::vector<int> v;
cout << v.max_size();
produces output:
4611686018427387903 // built as 64-bit target
1073741823 // built as 32-bit target
so the formula 2^(pointer width in bits) / sizeof(type) - 1 looks correct for that case as well.
Simply get the answer by
std::vector<dataType> v;
std::cout << v.max_size();
Or we can get the answer by (2^nativePointerBitWidth)/sizeof(dataType) - 1. For example, on a 64 bit system, long long is (typically) 8 bytes wide, so we have (2^64)/8 - 1 == 2305843009213693951.
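A short sketch of mine comparing max_size() with that back-of-the-envelope estimate; the two need not match exactly, since max_size() is implementation-defined:
#include <cstddef>
#include <iostream>
#include <vector>

template <typename T>
void report(const char* name)
{
    std::vector<T> v;
    // SIZE_MAX / sizeof(T) approximates (2^nativePointerBitWidth) / sizeof(T) - 1.
    std::cout << name << ": max_size() = " << v.max_size()
              << ", estimate = " << static_cast<std::size_t>(-1) / sizeof(T) << '\n';
}

int main()
{
    report<char>("char");
    report<int>("int");
    report<double>("double");
}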