Difference between Array initializations - c++

Please see the following statements:
char a[5]="jgkl"; // let's call this Statement A
char *b="jhdfjnfnsfnnkjdf"; // let's call this Statement B , and yes i know this is not an Array
char c[5]={'j','g','k','l','\0'}; // let's call this Statement C
Now, is there any difference between Statements A and C?
I mean both should be on Stack dont they? Only b will be at Static location.
So wouldn't that make "jgkl" exist at the static location for the entire life of the program? Since it is supposed to be read-only/constant?
Please clarify.

No, because the characters "jgkl" from Statement A are used to initialize a, it does not create storage in the executable for a character string (other than the storage you created by declaring a). This declaration creates an array of characters in read-write memory which contain the bytes {'j','g','k','l','\0'}, but the string which was used to initialize it is otherwise not present in the executable result.
In Statement B, the string literal's address is used as an initializer. The variable char *b is a pointer stored in read-write memory. It points to the character string "jhdfjnfnsfnnkjdf". This string is present in your executable image in a segment often called ".sdata", meaning "static data." The string is usually stored in read-only memory, as allowed by the C standard.
That is one key difference between declaring an array of characters and a string constant: Even if you have a pointer to the string constant, you should not modify the contents.
Attempting to modify the string constant is "undefined behavior" according to ANSI C standard section 6.5.7 on initialization.

If a[] is static then so is c[] - the two are equivalent, and neither is a string literal. The two could equally well be declared so that they were on the stack - it depends where and how they are declared, not the syntax used to specify their contents.

the value "jgkl" may never be loaded into working memory. Before main is ever called, a function (often called cinit) is run. One of the things this function does is initialize static and file-scope variables. On the DSP compiler I use, the initial values are stored in a table which is part of the program image. The table's format is unrelated to the format of the variables that are being initialized. The initializer table remains part of the program image and is never copied to RAM. Simply put, there is nowhere in memory that I can reliably access "jgkl".
Small strings like a might not even be stored in that table at all. The optimizer may reduce that to the equivalent (pseudoinstruction) store reg const(152<<24|167<<16|153<<8|154)
I suspect most compilers are similar.

A and C are exactly equivalent. The syntax used in A is an abbreviation for the syntax in C.
Each of the objects named a and c is an array of bytes of length 5, stored at a certain location in memory which is fixed during execution. The program can change the element bytes at any time. It is the compiler's responsibility to decide how to initialize the objects. The compiler might generate something similar to a[0] = 'j'; a[1] = 'g'; ..., or something similar to memcpy(a, static_read_only_initialization_data[1729], 5), or whatever it chooses. The array is on the (conceptual) stack if the declaration occurs in a function, or in global writable memory if the declaration occurs at file scope.
The object named b is a pointer to a byte. Its initial value is a pointer to string literal memory, which is read-only on many implementations that have read-only memory, but not all. The value of b could change (for example to point to a different string, or to NULL). The program is not allowed to change the contents of jhdfjnfnsfnnkjdf", though as usual in C the implementation may not enforce this.

C-Literals always are read-only.
a) allocates 5 bytes memory and store the
CONTENT from literal incl. '\0' in it
b) allocates sizeof(size_t) bytes
memory and store the literal-address in it
c) allocates 5 bytes memory and
store the 5 character-values in it

Related

Storage of literal constants in c++

I would like to know where literal constants are actually stored in the memory?
example:
int i = 5;
char* data = char* &("abcdefgh");
the storage sections of i and data depends on where they are declared.
But does the compiler store 5 and "abcdefgh" before actually copying it to the variables?
And here I can get the address of "abcdefgh" where it is stored, but why can't I get the address of 5?
Integer literals like 5 can be part of machine instructions. For example:
LD A, 5
would load the value 5 into processor register A for some imaginary architecture, and as the 5 is actually part of the instruction, it has no address. Few (if any) architectures have the ability to create string literals inline in the machine instructions, so these have to actually be stored elsewhere in memory and accessed via pointers. Exactly where "elsewhere" is is not specified by the C++ Standard.
On the language level, string literals and numeric literals are different beasts.
The C and C++ standard essentially specify that string literals are treated "as if" you defined a constant array of characters with the appropriate size and content, and then you used its name in place of the literal. IOW, when you write
const char *foo = "hello";
it's as if you wrote
// in global scope
const char hello_literal[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
...
const char *foo = hello_literal;
(there are some backwards-compatibility exceptions that allow you to even write char *foo = "hello";, without the const, but that's deprecated and it's undefined behavior anyway to try to write through such a pointer)
So, given this equivalence it's normal that you can have the address of the string literal. Integral literals, OTOH, are rvalues, for which the standard specifies that you cannot take any address - you can roughly think of them as values that the standard expect not to have a backing memory location in the conventional sense.
Now, this distinction actually descends from the fact that on the machine level they are usually implemented differently.
A string literal generally is stored as data somewhere in memory, typically in a read-only data section that gets mapped in memory straight from the executable. When the compiler needs its address it's easy to oblige, since it is data stuff that is already in memory, and thus it does have an address.
Instead, when you do something like
int a = 5;
the 5 does not really have a separate memory location like the "hello world" array above, but it's usually embedded into the machine code as an immediate value.
It's quite complicated to have a pointer to it, since it would be a pointer pointing halfway into an instruction, and in general pointing to data in a different format than what be expected for a regular int variable to which you can point - think x86 where for small numbers you use more compact encodings, or PowerPC/ARM and other RISC architectures where some values are built from an immediate manipulated by the implicit barrel shifter and you cannot even have immediates for some values - you have to compose them out of several instructions, or Harvard architectures where data and code live in different address spaces.
For this reason, you cannot take the address of numeric literals (as well as of numeric expressions evaluation results and much other temporary stuff); if you want to have the address of a number you have to first assign it to a variable (which can provide an in-memory storage), and then ask for its address.
Although the C and C++ standards don't dictate where the literals are stored, common practice stores them in one of two places: in the code (see #NeilButterworth answer), or in a "constants" segment.
Common executable files have a code section and a data section. The data segment may be split up into read-only, uninitialized read/write and initialized read-write. Often, the literals are placed into the read-only section of the executable.
Some tools may also place the literals into a separate data file. This data file may be used to program the data into read-only memory devices (ROM, PROM, Flash, etc.).
In summary, the placement of literals is implementation dependent. The C and C++ standards state that writing to the location of literals is undefined behavior. Preferred practice with character literals is to declare the variable as const so compiler can generate warnings or errors when a write to a literal occurs.

Which is more efficient way for storing strings, an array or a pointer in C/C++? [duplicate]

This question already has answers here:
What is the difference between char s[] and char *s?
(14 answers)
Closed 6 years ago.
We can store string using 2 methods.
Method 1: using array
char a[]="str";
Method 2:
char *b="str";
In method 1 the memory is used only in storing the string "str" so the memory used is 4 bytes.
In method 2 the memory is used in storing the string "str" on 'Read-Only-Memory' and then in storing the pointer to the 1st character of the string.
So the memory used must be 4 bytes for storing string in ROM and then 8 bytes for storing pointer (in 64-bit machine) to the first character.
In total the 1st method uses 4 bytes and the method 2 uses 12 bytes. So is the method 1 always better than method 2 for storing strings in C/C++.
Except if you use a highly resource limited system, you should not care too much for the memory used by a pointer. Anyway, optimizing compilers could lead to same code in both cases.
You should care more about Undefined Behaviour in second case!
char a[] = "str";
correctly declares a non const character array which is initialized to "str". That means that a[0] = 'S'; is perfectly allowed and will change a to "Str".
But with
char *b = "str";
you declare a non const pointer to a litteral char array which is implicitely const. That means that b[0] = 'S'; tries to modify a litteral string and is Undefined Behaviour => it can work, segfault or anything in between including not changing the string.
All of the numbers that you cite, and the type of the memory where the string literal is stored are platform specific.
Which is more efficient way for storing strings, an array or a pointer
Some pedantry about terminology: A pointer can not store a string; it stores an address. The string is always stored in an array, and a pointer can point to it. String literals in particular are stored in an array of static storage duration.
Method 1: using array char a[]="str";
This makes a copy of the content of the string literal, into a local array of automatic storage duration.
Method 2: char *b="str";
You may not bind a non const pointer to a string literal in standard C++. This is ill-formed in that language (since C++11; prior to that the conversion was merely deprecated). Even in C (and extensions of C++) where this conversion is allowed, this is quite dangerous, since you might accidentally pass the pointer to a function that might try to modify the pointed string. Const correctness replaces accidental UB with a compile time error.
Ignoring that, this doesn't make a copy of the literal, but points to it instead.
So is the method 1 always better than method 2 for storing strings in C/C++.
Memory use is not the only metric that matters. Method 1 requires copying of the string from the literal into the automatic array. Not making copies is usually faster than making copies. This becomes more and more important with increasingly longer strings.
The major difference between method 1 and 2, are that you may modify the local array of method 1 but you may not modify string literals. If you need a modifiable buffer, then method 2 doesn't give you that - regardless of its efficiency.
Additional considerations:
Suppose your system is not a RAM-based PC computer but rather a computer with true non-volatile memory (NVM), such as a microcontroller. The string literal "str" would then in both cases get stored in NVM.
In the array case, the string literal has to be copied down from NVM in run-time, whereas in the pointer case you don't have to make a copy, you can just point straight at the string literal.
This also means that on such systems, assuming 32 bit, the array version will occupy 4 bytes of RAM for the array, while the pointer version will occupy 4 bytes of RAM for the pointer. Both cases will have to occupy 4 bytes of NVM for the string literal.

Initializing an Array using Size of Operator?

I have an array that I would like to initialize
char arr[sizeof(int)];
Would this expression evaluate to a compile time constant or result in a function call?
char arr[sizeof(int)];
As far as the language is concerned, it is fine, though the array is only declared (and defined), it is NOT initialized if it is a local variable. If it is declared at namespace level, then it is statically zero-initialized.
Note that sizeof(int) is a constant expression of size_t type; its value is known at the compile time.
This is an initialization:
char arr[sizeof(int)] = { 'A', 'B', '0', 'F' };
This of course assumes that sizeof(int) is (at least) 4, or it will fail to compile.
And to answer the actual (new) question:
sizeof() is a compile time operator. In C++ [according to the standard, some compilers do allow C style variable length arrays], it will not result in anything other than a compile time constant. In C, with variable length arrays, it can become a simple calculation (number of elements * size of each element - where number of elements is the variable part).
There is no initialization here. There's nothing wrong with declaring or defining an array with sizeof(int) elements, except that it might look a little odd to readers of the code. But if that's what you need, that's what you should write.
It really depends on how you intend to use the array.
sizeof(int) may vary with different implementations so you just need to be careful how you access the elements in the array. Don't go assuming that an element that is accessible on your machine is accessible on another, unless it's within the minimum sizes specified in the C++ standard.
sizeof is evaluated at compile time, the only time sizeof would would be run time evaluated would be in the case of variable length arrays in C99 code or in gcc or other c++ compilers that support VLA as an extension. So this code is valid:
char arr[sizeof(int)];
Although if it is local variable it won't be initialized.

C++: Changing value of Pointer Strings

There is something that has been bugging me for a while and I need an anwer for it,
char *p = "hello world";
p="wazzup";
p="Hey";
Here I declare an pointer to point to a string (or in other words I made a string by using a pointer)
I have had some strange results with this that i normally wouldnt have gotten if I used an char array string
cout <<p<< endl; //"Hey" Gets printer
cout <<p+8<< endl; // I kept adding numbers till "wazzup" got printed
cout <<p+29<< endl; // No matter how much I increment, I cant print "Hello World"
So my question is:
When I change the value that a char pointer is pointing to. Does it
overwrite the original Data like it would do with char array;
or it creates a new string right before it in the memory and points to it;
or does it add the new string at the begining of the old one(including null);
or does it create a new string in a new place in the memory and I was able to print "wazzup" only by chance
It does none of the options above. Changing the value of a pointer merely changes to the address in memory it points to. In each case of the assignments to p, it is set to point to the first character of a (different) string literal - which is stored in memory.
The behaviour of using a pointer that points beyond the end a string literal such as
cout <<p+8<< endl
is undefined. This is why using pointers is fraught with danger.
The behaviour you are seeing is implementation dependant: The compiler stores the strings literals adjacent in memory, so running off the end of one runs into another. Your program might equally have crashed when compiled with a different compiler.
By adding to the pointer, you are increasing the address value... so while printing it will print the value stored at that memory location and nothing else...
If you could print
"hello world"
"wazzup"
that will be a fluke :)
All three of these are string literal constants. They appear directly in your executable's binary, and each time you assign p, you point to these locations in memory. They are completely separate memory; reassigning p to another one does not modify any string data.
You are only making the pointer point to something else each time you assign to it. So there is no scope for data being overwritten anywhere. What happens under the hood is implementation dependent, so what you see is just by pure chance. When you do this:
cout <<p+8<< endl;
you are going beyond the bounds of the string literal, invoking undefined behaviour.
When I change the value that a char pointer is pointing to. Does it
-overwrite the original Data like it would do with char array.
No, the data part of your address space contains all the three strings "Hello World", "wazzup", and "Hey". When you change the pointer, you are just changing the value in p to the starting address of either of above strings. Changing the address a pointer is pointing to and changing the value a pointer is pointing to are two different things.
-or it creates a new string right before it in the memory and points to it.
The compiler creates the strings(character bytes) at the compile time and NOT at run-time.
-or does it create a new string in a new place in the memory and I was able to print "wazzup" only by chance
I think above answer covers this question.
-or does it add the new string at the begining of the old one.(including null)
It depends on the compiler specification.
The three strings are not located in same memory position. This three memory may be sequential or different location.If compiler allocation three memory then you find it using +/- some value. It totally depends on compiler. In c you can ready any memory location so you never get any error while +/- then pointer p.
Since they 3 are not same strings, they are located at different parts of memory. I think they could be separated by 64 byte aligning. try p+64 :)
Only same strings sit in the same location of memory and only if compiler supports it.
There is a probability "wazzup" at p+64 and "hey" at p+128(if you use VC++ 2010 express and you have pentium-m cpu and you use windows xp sp-3)
cout <<*(p+64)<< endl;
cout <<*(p+128)<< endl;

String class e char in C++

When I have
char anything[20];
cout << sizeof anything;
it prints 20.
However
string anymore;
cout << sizeof anymore; // it prints 4
getline(cin, anymore); // let's suppose I type more than one hundred characters
cout << sizeof anymore; // it still prints 4 !
I would like to understand how c++ manages this. Thanks
sizeof is a compile-time construct. It has nothing to do with runtime, but rather gives a fixed result based on the type passed to it (or the type of the value passed to it). So char[20] is 20 bytes, but a string might be 4 or 8 bytes or whatever depending on the implementation. The sizeof isn't telling you how much storage the string allocated dynamically to hold its contents.
sizeof is a compile-time operator. It tells you the size of the type.
It's because of anything - is array with 20 characters. Sizeof of each character 1 byte - so, totally 20 bytes.
And string class contain pointer for the begin of char-array and size_t(unsigned int for example) - it's 4 bytes. sizeof doesn't know how many memory you allocated for the string, it know just that you have pointer for something, because it's compile-time function.
sizeof isn't what you have decided that it should be. It doesn't magically perceive the semantics of whatever type you throw at it. All it knows is how many bytes are used up, directly for storing the instance of a type.
For an array of five characters, that's 5. For a pointer (to anything, including an array), that's usually 4 or 8. For an std::string, it's however many bytes your C++ Standard Library implementation happens to need to do its work. This work usually involves dynamic allocation, so the four bytes you're looking likely represent just enough storage for a pointer.
It is not to be confused with specific "size" semantics. For std::string, that's anymore.length(), which uses whatever internal magic is required to calculate the length of the buffer of characters that it's stored somewhere, possibly (and usually) indirectly.
For what it's worth, I'm very surprised that a std::string could take up only four bytes. I'd expect it'd store at least "length" and a pointer, which is usually going to take more than four bytes.
The 'string' type is a class template. String instances are accessed by references, AKA pointers. 4 bytes on 32-bit systems.