Functioning of sizeof() operator in C++ - c++

I wrote a simple program in order to understand the functioning of the function of the standard c++ library sizeof().
It follows:
const char* array[] = {
"1234",
"5678"
};
std::cout << sizeof(array) << std::endl;//16
std::cout << sizeof (array[0]) << std::endl;//8
std::cout << printf("%lu\n",sizeof (char) );//1
std::cout << printf("%lu\n",sizeof (int) );//24
std::cout << printf("%lu\n",sizeof (float) );//24
std::cout << printf("%lu",sizeof (double) );//281
The output shows that a char occupies 1 byte on my system, as
expected. But I do not understand why the size of array[0] is 8, as it contains 4 characters and at least another 2 characters for the end sequence "\n" which is contained in a string. Thus, I supposed that the number of bytes occupied by the first element of the array should be 6 and not 8.
Moreover, if I increase/decrease the number of characters contained in the first element of the array, its size does not change.
Clearly, I am wrong. If somebody can explain me this functioning, I would really appreciate.
Thanks,

I wrote a simple program in order to understand the functioning of the function of the standard c++ library sizeof().
Wrong terminology. Please read n3337 (a late draft of the C++11 standard) and the Wikipedia page on sizeof.
sizeof is a compile-time operator, not a function. If v is some variable, sizeof(v) only depends on the type of v and never on its value (in contrast, for most functions f, the value of f(v) depends upon the value of v).
And a good way to understand something about C++ is to refer to documents like standards or good web pages about it.
If somebody can explain me
Yes. Read a good book about C++. This one is written by the main designer of C++. Try to understand more and better the (difficult) semantics of C++. You could also study the source code of existing open source C++ compilers such as GCC or Clang/LLVM (thus effectively using one of your free software freedoms).
BTW, with a lot of pain you might find C++ implementations with sizeof(int) being 1 (e.g. for some DSP processors). On cheap 32 bits ARM processors (those in cheap mobile phones today, for instance; then you would probably use some cross-compiler) or on some Raspberry Pis (or perhaps some mainframes) you could have sizeof(array[0]) or sizeof(void*) being 4 even in 2019.

Let's break down the meaning of the somewhat confusing output values you see!
First, the sizeof(array) and sizeof(array[0]) (where your output method is fine). You have declared/defined array as an array of two const char* values, each of which is a pointer. The size of a pointer on your system is 8 bytes, so the total size of array is: 8 * 2 = 16. For array[0]: this is a single pointer, so its size is simply 8 bytes.
Does all this make sense so far? If so, then let's look at the second part of your code …
The values for sizeof(char), sizeof(int), sizeof(float) and sizeof(double) are, on your system, in order, 1, 4, 4, and 8. These values are actually being output! However, as you are also outputting the return value of printf(), which is the number of characters it has written, you are getting the extra values, "2", "2", "2" and "1" inserted (in a confusing, and possibly undefined, order), for the four calls (the last one has no newline, so it's only one character; all others are one digit + newline = 2 characters).
Change the second part of your code as follows, to get the correct outputs:
printf("%zu\n", sizeof(char)); //1
printf("%zu\n", sizeof(int)); //4
printf("%zu\n", sizeof(float)); //4
printf("%zu\n", sizeof(double)); //8

Related

Why is this pointer 8 bytes?

I am learning C++, and read that when an array is passed into a function it decays into a pointer. I wanted to play around with this and wrote the following function:
void size_print(int a[]){
cout << sizeof(a)/sizeof(a[0]) << endl;
cout << "a ->: " << sizeof(a) << endl;
cout << "a[0] ->" << sizeof(a[0]) << endl;
}
I tried inputting an array with three elements, let's say
int test_array[3] = {1, 2, 3};
With this input, I was expecting this function to print 1, as I thought a would be an integer pointer (4 bytes) and a[0] would also be 4 bytes. However, to my surprise the result is 2 and sizeof(a) = 8.
I cannot figure out why a takes up 8 bytes, but a[0] takes up 4. Shouldn't they be the same?
Shouldn't they be the same?
No. a is (meant to be) an array (but because it's a function argument, has been adjusted to a pointer to the 1st element), and as such, has the size of a pointer. Your machine seems to have 64 bit addresses, and thus, each address (and hence, each pointer) is 64 bits (8 bytes) long.
a[0], on the other hand, is of the type that an element of that array has (an int), and that type has 32 bits (4 bytes) on your machine.
A pointer is just an address of memory where the start of the variable is located. That address is 8 bytes.
a[0] is a variable in the first place of the array. It technically could be anything of whatever size. When you take a pointer to it, the pointer just contains an address of memory (integer) without knowing or caring what this address contains. (This is just to illustrate the concept, in the example in the question, a[] is an integer array but the same logic works with anything).
Note, the size of the pointer is actually different on different architectures. This is where the 32-bit, 64-bit, etc. comes in. It can also depend on the compiler but this is beyond the question.
The size of a pointer depends on the system and implementation. Yours uses 64 bits (8 bytes).
a[0] is an int, and the standard only guarantees the minimum range it has to cover (at least 16 bits), so it can be anything from 2 bytes up. Most modern implementations use 32-bit (4-byte) ints.
sizeof(a)/sizeof(a[0]) will not work on function parameters. An array argument is passed as a pointer to its first element, so this division only tells you how many times larger a pointer is than an int, not the number of elements in the array the pointer refers to.

What does *(int*) mean in C++?

I encountered the following line in an OpenGL tutorial and I wanna know what *(int*) means and what its value is
if ( *(int*)&(header[0x1E])!=0 )
Let's take this a step at a time:
header[0x1E]
header must be an array of some kind, and here we are getting a reference to the 0x1Eth element in the array.
&(header[0x1E])
We take the address of that element.
(int*)&(header[0x1E])
We cast that address to a pointer-to-int.
*(int*)&(header[0x1E])
We dereference that pointer-to-int, yielding an int: the sizeof(int) bytes starting at the address of header[0x1E] are interpreted as an int, and that is the value we get.
if ( *(int*)&(header[0x1E])!=0 )
It compares that resulting value to 0 and if it isn't 0, executes whatever is in the body of the if statement.
Note that this is potentially very dangerous. Consider what would happen if header were declared as:
double header [0xFF];
...or as:
int header [5];
It's truly a terrible piece of code, but what it's doing is:
&(header[0x1E])
takes the address of the (0x1E + 1)th element of array header, let's call it addr:
(int *)addr
C-style cast this address into a pointer to an int, let's call this pointer p:
*p
dereferences this memory location as an int.
Assuming header is an array of bytes, and the original code has been tested only on Intel, it's equivalent to:
header[0x1E] + (header[0x1F] << 8) + (header[0x20] << 16) + (header[0x21] << 24);
However, besides the potential alignment issues the other posters mentioned, it has at least two more portability problems:
on a platform with 64-bit ints, it will make an int out of bytes 0x1E to 0x25 instead of the above; it will also be wrong on a platform with 16-bit ints, but I suppose those are too old to matter
on a big-endian platform the number will be wrong, because the bytes will get reversed and it will end up as:
(header[0x1E] << 24) + (header[0x1F] << 16) + (header[0x20] << 8) + header[0x21];
Also, if it's a bmp file header as rici assumed, the field is probably unsigned and the cast is done to a signed int. In this case it doesn't matter as it's being compared to zero, but in some other case it may.

short pointer to a float

I ran this code in C++:
#include <iostream>
using namespace std;
int main()
{
float f = 7.0;
short s = *(short *)&f;
cout << sizeof(float) << endl
<< sizeof(short) << endl
<< s << endl;
return 0;
}
I get the following output:
4
2
0
But in a lecture given at Stanford University, Professor Jerry Cain says he is sure the output will not be 0.
The lecture can be found here. He says that around the 48-minute mark.
Is he wrong, or has some standard changed since? Or is there a difference between platforms?
I'm using g++ to compile my code.
EDIT: in the next lecture he does mention "big endian" and "little endian" and says that they will affect the result.
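#include <cassert>
#include <cstring>
#include <iostream>
using namespace std;
// Prints the bits of f, least-significant bit first.
static void bitPrint(float f)
{
assert(sizeof(unsigned) == sizeof(float));
unsigned data;
memcpy(&data, &f, sizeof data); // copy the bytes instead of casting the pointer
for (unsigned i = 0; i < sizeof(unsigned) * 8; ++i)
{
unsigned bit = (data >> i) & 1u;
cout << bit;
}
cout << endl;
}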
int main()
{
float f = 7.0;
bitPrint(f);
return 0;
}
This program prints 00000000000000000000011100000010
Since sizeof(short) == 2 on your platform, you get the first 2 bytes, which are both zeros.
Note that since the sizes of types and possibly the float representation (not sure about this) are implementation-defined, different output can be seen on different platforms.
Well, let's see. First you write a float into memory. It occupies 4 bytes, and its value is 7. A float in memory is laid out as "sign bit -> exponent bits -> mantissa bits"; in the common IEEE 754 single-precision format that is 1, 8 and 23 bits.
For 7.0 only the exponent and the top two mantissa bits are set (the bit pattern is 0x40E00000), so the two low-order bytes are all zeros.
Your short pointer points to the beginning of the float, which means to its lowest-addressed byte. On a little-endian machine (x86, for example) the low-order bytes are stored first, so the short is read from those two zero bytes.
Now, provided that the size of short is 2, which means we only take two of the float's 4 bytes, we get our 0. On a big-endian machine the same read would pick up the two high-order bytes (0x40E0) instead.
I believe, though, that this is UB and the result can differ between platforms, compilers, etc.
Accessing data through a pointer to a different type than it was stored as gives (except in a few special cases) undefined behaviour.
Firstly, how the data is stored is platform dependent, so different systems may well give different values; secondly, the compiler might well generate code that doesn't even see the value you'd expect, as it's allowed to do anything it likes when you do this (it's undefined behaviour due to the strict aliasing rules).
Having said that, there are probably reasons why the number you are seeing is valid, but you can't rely on it unless you specifically know your platform will do what you expect; it's not guaranteed by the standard.
He's "pretty" sure it's not zero, he says that explicitly.
However, given that the representation of a short can be big-endian or little-endian, I wouldn't be so certain. In any case, this is a throwaway line at the end of a fifty-minute lecture so we can forgive him a little. It may be he came back in the next lecture with a clarification.
You would need to examine the underlying bits at (at least) a byte-by-byte level to understand what's going on.

Creating integer variable of a defined size

I want to define an integer variable in C/C++ such that my integer can store 10 bytes of data, or maybe x bytes of data as defined by me in the program.
for now..!
I tried the
int *ptr;
ptr = (int *)malloc(10);
code. Now if I take the sizeof ptr, it shows 4 and not 10. Why?
C and C++ compilers implement several sizes of integer (typically 1, 2, 4, and 8 bytes, i.e. 8, 16, 32, and 64 bits), but without some helper code to perform arithmetic operations you can't really make arbitrarily sized integers.
The declarations you did:
int *ptr;
ptr = (int *)malloc(10);
made what is probably a broken array of integers. Broken because unless you are on a system where (10 % sizeof(int)) == 0, you have extra bytes at the end which can't be used to store an entire integer.
There are several big-number class libraries you should be able to locate for C++ which implement many of the operations you may want to perform on your 10-byte (80-bit) integers. With C you would have to do the operations as function calls because it lacks operator overloading.
Your sizeof(ptr) evaluated to 4 because you are using a machine that uses 4-byte pointers (a 32-bit system). sizeof tells you nothing about the size of the data that a pointer points to. The only place where this should get tricky is when you use sizeof on an array's name, which is different from using it on a pointer. I mention this because array names and pointers share so many similarities.
Because on your machine, the size of a pointer is 4 bytes. Please note that the type of the variable ptr is int *. You cannot get the allocated size with the sizeof operator if you malloc or new the memory, because sizeof is a compile-time operator, meaning that its value is evaluated at compile time.
It is showing 4 bytes because a pointer on your platform is 4 bytes. The block of memory the pointer addresses may be of any arbitrary size, in your case it is 10 bytes. You need to create a data structure if you need to track that:
struct VariableInteger
{
int *ptr;
size_t size;
};
Also, using an int type for your ptr variable doesn't mean the language will allow you to do arithmetic operations on anything of a size different than the size of int on your platform.
Because the size of the pointer is 4. Try something like:
typedef struct
{
int a[10];
} big_int_t;
big_int_t x;
printf("%zu\n", sizeof(x));
Note also that an int is typically not 1 byte in size, so this will probably print 20 or 40, depending on your platform.
Integers in C++ are of a fixed size. Do you mean an array of integers? As for sizeof, the way you are using it, it tells you that your pointer is four bytes in size. It doesn't tell you the size of a dynamically allocated block.
Few or no compilers support 10-byte integer arithmetic. If you want to use integers bigger than the values specified in <limits.h>, you'll need to either find a library with support for big integers or make your own class which defines the mathematical operators.
I believe what you're looking for is known as "Arbitrary-precision arithmetic". It allows you to have numbers of any size and any number of decimals. Instead of using fixed-size assembly level math functions, these libraries are coded to do math how one would do them on paper.
Here's a link to a list of arbitrary-precision arithmetic libraries in a few different languages, courtesy of Wikipedia: link.

What's C++ Really Doing When I Accidentally Use a Variable to Declare Array Length?

I was helping a friend with some C++ homework. I warned said friend that the kind of programming I do (PHP, Perl, Python) is pretty different from C++, and there were no guarantees I wouldn't tell horrible lies.
I was able to answer his questions, but not without stumbling over my own dynamic background. While I was reacquainting myself with C++ array semantics, I did something stupid like this (simplified example to make my question clearer)
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char easy_as_one_two_three[] = {'A','B','C'};
int an_int = 1;
//I want an array that has a length of the value
//that's currently in an_int (1)
//This clearly (to a c++ programmer) doesn't do that.
//but what is it doing?
char breaking_things[an_int];
cout << easy_as_one_two_three << endl;
return 1;
}
When I compile and run this program, it produces the following output
ABC????
However, if I comment out my bogus array declaration
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char easy_as_one_two_three[] = {'A','B','C'};
int an_int = 1;
//I want an array that has a length of the value
//that's currently in an_int (1)
//This clearly (to a c programmer) doesn't do that.
//but what is it doing?
//char breaking_things[an_int];
cout << easy_as_one_two_three << endl;
return 1;
}
I get the output I expect:
ABC
So, what exactly is happening here? I understand (vaguely) that when you create an array, you're pointing to a specific memory address, and when you give an array a length, you're telling the computer "reserve the next X blocks for me".
What I don't understand is, when I use a variable in an array declaration, what am I telling the computer to do, and why does it have an effect on a completely separate array?
Compiler is g++, version string is
science:c++ alanstorm$ g++ -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5493)
Update:
Neil pointed out in his comment to the question that you will get an error if you compile this with the -Wall and -pedantic flags in g++.
error: ISO C++ forbids variable-size array
You are getting ABC???? because it prints the contents of the array (ABC) and continues to print until it encounters a \0.
Had the array been {'A','B','C', '\0'};, the output would have been just ABC as expected.
Variable-length arrays were introduced in C99 - this doesn't seem to apply to C++ though.
It is undefined behavior. Even if you comment out the bogus declaration, the printed output is not always what you expect (ABC). Try giving ASCII values of some printable character (something between 32 and 126) to an_int instead of 1 and you will see the difference.
an_int              output
--------------------------
40                  ABC(
65                  ABCA
66                  ABCB
67                  ABCC
296                 ABC(
552                 ABC(
1064                ABC(
1024*1024 + 40      ABC(
See the pattern here? Apparently it interprets the last byte (LSB) of the an_int as a char, prints it, somehow finds a null char afterwards and stops printing. I think the "somehow" has to do something with the MSB portion of an_int being filled with zeros, but I'm not sure (and couldn't get any results to support this argument either).
UPDATE: It is about the MSB being filled zeros. I got the following results.
ABC( for 40 - (3 zero bytes and a 40),
ABC(( for 10280 (which is (40 << 8) + 40) - (2 zero bytes and two 40s),
ABC((( for 2631720 (which is (10280 << 8) + 40) - (1 zero byte and three 40s),
ABC((((°¿® for 673720360 (which is (2631720 << 8) + 40) - no zero bytes and hence prints random chars until a zero byte is found.
ABCDCBA0á´¿á´¿® for (((((65 << 8) + 66) << 8) + 67) << 8) + 68;
These results were obtained on a little endian processor with 8-bit atomic element size and 1-byte address increment, where 32 bit integer 40 (0x28 in hex) is represented as 0x28-0x00-0x00-0x00 (LSB at the lowest address). Results might vary from compiler to compiler and platform to platform.
Now if you try uncommenting the bogus declaration, you will find that all the outputs are of the form ABC-randomchars-char_corresponding_to_an_int. This again is the result of undefined behavior.
That will not "reacquaint" you "with c++ array semantics" since in C++ it is simply illegal. In C++ arrays can only be declared with sizes defined by Integral Constant Expressions (ICE). In your example the size is not an ICE. It only compiles because of a GCC-specific extension.
From the C point of view, this is actually perfectly legal in the C99 version of the language. And it does produce a so-called Variable Length Array of length 1. So your "clearly" comment is incorrect.
It isn't invalid syntax. It's syntactically just fine.
It's semantically invalid C++, and rejected by my compiler (VC++). g++ seems to have an extension that allows the use of C99 VLAs in C++.
The reason for the question marks is that your array of three characters is not null terminated; it's printing until it finds a null on the stack. The layout of the stack is influenced by the variables declared on the stack. With the array, the layout is such that there's garbage prior to the first null; without the array there isn't. That is all.
You get the output that you expect or don't expect by dumb luck. Because you didn't null terminate the characters in your array, when you go to print it out to cout it'll print the A, the B, and the C, and whatever else it finds until it hits a null character. With the array declaration, there's probably something that the compiler is pushing onto the stack to make the array sized at runtime that's leaving you with garbage characters after the A, B, and C, whereas when you don't there just happens to be a 0 after the C on the stack.
Again, it's just dumb luck. To always get what you expect you should do: char easy_as_one_two_three[] = { 'A','B','C','\0'}; or, probably more usefully char easy_as_one_two_three[] = "ABC";, which will properly null terminate the string.
char breaking_things[an_int] allocates a char array of size an_int (in your case 1). It's called a variable-length array; it was introduced in C99 and is not part of standard C++ (g++ accepts it as an extension).
In a case like this it's more common to dynamically allocate memory using new:
char* breaking_things = new char[an_int]; // C++ way, C programmer would use malloc
It's probably not breaking_things that broke things. The first array is not a NUL (\0) terminated string, which explains the output - cout will print whatever comes after ABC up until the first NUL it encounters.
As for the size of breaking_things, I would suspect it differs between compilers. I believe at least earlier versions of gcc used whatever value the variable happened to have at compile time, which can be tricky to determine.
The output looks like this because cout prints the contents of the char array until it finds a null character.
Make sure the char array is a null-terminated string, and size the array as total chars + 1 (for the null char).