C++ strings: [] vs. * - c++

Been thinking, what's the difference between declaring a variable with [] or * ? The way I see it:
char *str = new char[100];
char str2[] = "Hi world!";
.. should be the main difference, though Im unsure if you can do something like
char *str = "Hi all";
.. since the pointer should the reference to a static member, which I don't know if it can?
Anyways, what's really bugging me is knowing the difference between:
void upperCaseString(char *_str) {};
void upperCaseString(char _str[]) {};
So, would be much appreciated if anyone could tell me the difference? I have a hunch that both might be compiled down the same, except in some special cases?
Ty

Let's look into it (for the following, note char const and const char are the same in C++):
String literals and char *
"hello" is an array of 6 const characters: char const[6]. As every array, it can convert implicitly to a pointer to its first element: char const * s = "hello"; For compatibility with C code, C++ allows one other conversion, which would be otherwise ill-formed: char * s = "hello"; it removes the const!. This is an exception, to allow that C-ish code to compile, but it is deprecated to make a char * point to a string literal. So what do we have for char * s = "foo"; ?
"foo" -> array-to-pointer -> char const* -> qualification-conversion -> char *. A string literal is read-only, and won't be allocated on the stack. You can freely make a pointer point to them, and return that one from a function, without crashing :).
Initialization of an array using a String literal
Now, what is char s[] = "hello"; ? It's a whole other thing. That will create an array of characters, and fill it with the String "hello". The literal isn't pointed to. Instead it is copied to the character-array. And the array is created on the stack. You cannot validly return a pointer to it from a function.
Array Parameter types.
How can you make your function accept an array as parameter? You just declare your parameter to be an array:
void accept_array(char foo[]);
but you omit the size. Actually, any size would do it, as it is just ignored: The Standard says that parameters declared in that way will be transformed to be the same as
void accept_array(char * foo);
Excursion: Multi Dimensional Arrays
Substitute char by any type, including arrays itself:
void accept_array(char foo[][10]);
accepts a two-dimensional array, whose last dimension has size 10. The first element of a multi-dimensional array is its first sub-array of the next dimension! Now, let's transform it. It will be a pointer to its first element again. So, actually it will accept a pointer to an array of 10 chars: (remove the [] in head, and then just make a pointer to the type you see in your head then):
void accept_array(char (*foo)[10]);
As arrays implicitly convert to a pointer to their first element, you can just pass an two-dimensional array in it (whose last dimension size is 10), and it will work. Indeed, that's the case for any n-dimensional array, including the special-case of n = 1;
Conclusion
void upperCaseString(char *_str) {};
and
void upperCaseString(char _str[]) {};
are the same, as the first is just a pointer to char. But note if you want to pass a String-literal to that (say it doesn't change its argument), then you should change the parameter to char const* _str so you don't do deprecated things.

The three different declarations let the pointer point to different memory segments:
char* str = new char[100];
lets str point to the heap.
char str2[] = "Hi world!";
puts the string on the stack.
char* str3 = "Hi world!";
points to the data segment.
The two declarations
void upperCaseString(char *_str) {};
void upperCaseString(char _str[]) {};
are equal, the compiler complains about the function already having a body when you try to declare them in the same scope.

Okay, I had left two negative comments. That's not really useful; I've removed them.
The following code initializes a char pointer, pointing to the start of a dynamically allocated memory portion (in the heap.)
char *str = new char[100];
This block can be freed using delete [].
The following code creates a char array in the stack, initialized to the value specified by a string literal.
char [] str2 = "Hi world!";
This array can be modified without problems, which is nice. So
str2[0] = 'N';
cout << str2;
should print Ni world! to the standard output, making certain knights feel very uncomfortable.
The following code creates a char pointer in the stack, pointing to a string literal... The pointer can be reassigned without problems, but the pointed block cannot be modified (this is undefined behavior; it segfaults under Linux, for example.)
char *str = "Hi all";
str[0] = 'N'; // ERROR!
The following two declarations
void upperCaseString(char *_str) {};
void upperCaseString(char [] _str) {};
look the same to me, and in your case (you want to uppercase a string in place) it really doesn't matters.
However, all this begs the question: why are you using char * to express strings in C++?

As a supplement to the answers already given, you should read through the C FAQ regarding arrays vs. pointers. Yes it's a C FAQ and not a C++ FAQ, but there's little substantial difference between the two languages in this area.
Also, as a side note, avoid naming your variables with a leading underscore. That's reserved for symbols defined by the compiler and standard library.

Please also take a look at the http://c-faq.com/aryptr/aryptr2.html The C-FAQ might prove to be an interesting read in itself.

The first option dynamically allocates 100 bytes.
The second option statically allocates 10 bytes (9 for the string + nul character).
Your third example shouldn't work - you're trying to statically-fill a dynamic item.
As to the upperCaseString() question, once the C-string has been allocated and defined, you can iterate through it either by array indexing or by pointer notation, because an array is really just a convenient way to wrap pointer arithmetic in C.
(That's the simple answer - I expect someone else will have the authoritative, complicated answer out of the spec :))

Related

how character pointer could be used to point a string in c++?

First of all I am beginner in C++. I was trying to learn about type casting in C++ with strings and character pointer. Is it possible to point a string with a character pointer?
int main() {
string data="LetsTry";
cout<<(&data)<<"\n";
cout<<data<<"\n"<<"size "<<sizeof(data)<<"\n";
//char *ptr = static_cast<char*>(data);
//char *ptr=(char*)data;
char *ptr = reinterpret_cast<char*>(&data);
cout<<(ptr)<<"\n";
cout<<*ptr;
}
The above code yields outcome as below:
0x7ffea4a06150
LetsTry
size 32
`a���
`
I understand as ptr should output the address 0x7ffea4a06150
Historically, in C language strings were just a memory areas filled with characters. Consequently, when a string was passed to a function, it was passed as a pointer to its very first character, of type char *, for mutable strings, or char const *, if the function had no intent to modify string's contents. Such strings were delimited with a zero-character ((char)0 a.k.a. '\0') at the end, so for a string of length 3 you had to allocate at least four bytes of memory (three characters of the string itself plus the zero terminator); and if you only had a pointer to a string's start, to know the size of the string you'd have to iterate it to find how far is the zero-char (the standard function strlen did it). Some standard functions accepted en extra parameter for a string size if you knew it in advance (those starting with strn or, more primitive and effective, those starting with mem), others did not. To concatenate two strings you first had to allocate a sufficient buffer to contain the result etc.
The standard functions that process char pointers can still be found in STL, under the <cstring> header: https://en.cppreference.com/w/cpp/header/cstring, and std::string has synonymous methods c_str() and data() that return char pointers to its contents, should you need it.
When you write a program in C++, its main function has the header of int main(int argc, char *argv[]), where argv is the array of char pointers that contains any command-line arguments your program was run with.
Ineffective as it is, this scheme could still be regarded as an advantage over strings of limited capacity or plain fixed-size character arrays, for instance in mid-nineties, when Borland introduced the PChar type in Turbo Pascal and added a unit that exported Pascal implementations of functions from C's string.h.
std::string and const char* are different types, reinterpret_cast<char*>(&data) means reinterpret the bits located at &data as const char*, which is not we want in this case.
so assuming we have type A and type B:
A a;
B b;
the following are conversion:
a = (A)b; //c sytle
// and
a = A(b);
// and
a = static_cast<A>(b); //c++ style
the following are bit reinterpretation:
a = *(A*)&b; //c style
// and
a = *reinterpret_cast<A*>(&b); //c++ style
finally, this should works:
int main() {
string data = "LetsTry";
const char *ptr = data.c_str();
cout<< ptr << "\n";
}
bit reinterpretation is sometimes used, like when doing bit manipulation of a floating point number, but there are some rules to follow like this one What is the strict aliasing rule?
also note that cout << ptr << "\n"; is a specially case because feeds a pointer to std::cout usually output the address that pointer points to, but std::cout treats char* specially so that it output the content of that char array instead
In C++, string is class and what you doing is creating a string object. So, to use are char * you need to convert it using c_str()
You can refer below code:
std::string data = "LetsTry";
// declaring character array
char * cstr = new char [data.length()+1];
// copying the contents of the
// string to char array
std::strcpy (cstr, data.c_str());
Now, you can get use char * to point your data.

Is it possible for separately initialized string variables to overlap?

If I initialize several string(character array) variables in the following ways:
const char* myString1 = "string content 1";
const char* myString2 = "string content 2";
Since const char* is simply a pointer a specific char object, it does not contain any size or range information of the character array it is pointing to.
So, is it possible for two string literals to overlap each other? (The newly allocated overlap the old one)
By overlap, I mean the following behaviour;
// Continue from the code block above
std::cout << myString1 << std::endl;
std::cout << myString2 << std::endl;
It outputs
string costring content 2
string content 2
So the start of myString2 is somewhere in the middle of myString1. Because const char* does not "protect"("possess") a range of memory locations but only that one it points to, I do not see how C++ can prevent other string literals from "landing" on the memory locations of the older ones.
How does C++/compiler avoid such problem?
If I change const char* to const char[], is it still the same?
Yes, string literals are allowed to overlap in general. From lex.string#9
... Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
So it's up to the compiler to make a decision as to whether any string literals overlap in memory. You can write a program to check whether the string literals overlap, but since it's unspecified whether this happens, you may get different results every time you run the program.
A string is required to end with a null character having a value of 0, and can't have such a character in the middle. So the only case where this is even possible is when two strings are equal from the start of one to the end of both. That is not the case in the example you gave, so those two particular strings would never overlap.
Edit: sorry, I didn't mean to mislead anybody. It's actually easy to put a null character in the middle of a string with \0. But most string handling functions, particularly those in the standard library, will treat that as the end of a string - so your strings will get truncated. Not very practical. Because of that the compiler won't try to construct such a string unless you explicitly ask it to.
The compiler knows the size of each string, because it can "see" it in your code.
Additionally, they are not allocated the same way, that you would allocate them at run-time. Instead, if the strings are constant and defined globally, they are most likely located in the .text section of the object file, not on the heap.
And since the compiler knows the size of a constant string at compile-time, it can simply put its value in the free space of the .text section. The specifics depend on the compiler you use, but be assured the people who wrote are smart enough to avoid this issue.
If you define these strings inside some function instead, the compiler can choose between the first option and allocating space on the stack.
As for the const char[], most compilers will treat it the same way as const char*.
Two string literals will not likely overlap unless they are the same. In that case though the pointers will be pointing to the same thing. (This isn't guaranteed by the standard though, but I believe any modern compiler should make this happen.)
const char *a = "Hello there."
const char *b = "Hello there."
cout << (a == b);
// prints "1" which means they point to the same thing
The const char * can share a string though.
const char *a = "Hello there.";
const char *b = a + 6;
cout << a;
// prints "Hello there."
cout << b;
// prints "there."
I think to answer your second question an explanation of c-style strings is useful.
A const char * is just a pointer to a string of characters. The const means that the characters themselves are immutable. (They are stored as part of the executable itself and you wouldn't want your program to change itself like this. You can use the strings command on unix to see all the strings in an executable easily i.e. strings a.out. You will see many more strings than what you coded as many exist as part of the standard library other required things for an executable.)
So how does it know to just print the string and then stop at the end? Well a c-style string is required to end with a null byte (\0). The complier implicitly puts it there when you declare a string. So "string content 1" is actually "string content 1\0".
const char *a = "Hello\0 there.";
cout << a;
// prints "Hello"
For the most part const char *a and const char a[] are the same.
// These are valid and equivalent
const char *a = "Hello";
const char b[] = "there."
// This is valid
const char *c = b + 3; // *c = "re."
// This, however, is not valid
const char d[] = b + 3;

Initialization of pointers in c++

I need to clarify my concepts regarding the basics of pointer initialization in C++. As per my understanding, a pointer must be assigned an address before putting some value using the pointer.
int *p;
*p=10; //inappropriate
cout << *p <<"\n";
This would probably show the correct output (10) but this may cause issue in larger programs since p initially had garbage address which can be anything & may later be used somewhere else in the program as well.So , I believe this is incorrrect, the correct way is:
int *p;
int x=10;
p=&x; //appropriate
cout << *p <<"\n";
My question is, if the above understanding is correct, then does the same apply on char* as well?:
const char *str="hello"; // inappropriate
cout << str << "\n";
//OR
const string str1= "hello";
const char str2[6] ="world";
const char *str=str1; //appropriate
const char *st=str2; //appropriate
cout << str << st << "\n";
Please advice
Your understanding of strings is incorrect.
Lets take for example the very first line:
const char *str="hello";
This is actually correct. A string literal like "hello" is turned into a constant array by the compiler, and like all arrays it can decay to a pointer to its first element. So what you are doing is making str point to the first character of the array.
Then lets continue with
const string str1= "hello";
const char *str=str1;
This is actually wrong. A std::string object have no casting operator defined to cast to a const char *. The compiler will give you an error for this. You need to use the c_str function go get a pointer to the contained string.
Lastly:
const char str2[6] ="world";
const char *st=str2; //appropriate
This is really no different than the first line when you declare and initialize str. This is, as you say, "appropriate".
About that first example with the "inappropriate" pointer:
int *p;
*p=10; //inappropriate
cout << *p <<"\n";
This is not only "inappropriate", this leads to undefined behavior and may actually crash your program. Also, the correct term is that the value of p is indeterminate.
When I declare a pointer
int *p;
I get an object p whose values are addresses. No ints are created anywhere. The thing you need to do is think of p as being an address rather than being an int.
At this point, this isn't particularly useful since you have no addresses you could assign to it other than nullptr. Well, technically that's not true: p itself has an address which you can get with &p and store it in an int**, or even do something horrible like p = reinterpret_cast<int*>(&p);, but let's ignore that.
To do something with ints, you need to create one. e.g. if you go on to declare
int x;
you now have an int object whose values are integers, and we could then assign its address to p with p = &x;, and then recover the object from p via *p.
Now, C style strings have weird semantics — the weirdest aspect being that C doesn't actually have strings at all: it's always working with arrays of char.
String literals, like "Hello!", are guaranteed to (act1 like they) exist as an array of const char located at some address, and by C's odd conversion rules, this array automatically converts to a pointer to its first element. Thus,
const char *str = "hello";
stores the address of the h character in that character array. The declaration
const char str2[6] ="world";
works differently; this (acts1 like it) creates a brand new array, and copies the contents of the string literal "world" into the new array.
As an aside, there is an obsolete and deprecated feature here for compatibility with legacy programs, but for some misguided reason people still use it in new programs these days so you should be aware of it and that it's 'wrong': you're allowed to break the type system and actually write
char *str = "hello";
This shouldn't work because "hello" is an array of const char, but the standard permits this specific usage. You're still not actually allowed to modify the contents of the array, however.
1: By the "as if" rule, the program only has to behave as if things happen as I describe, but if you peeked at the assembly code, the actual way things happen can be very different.

Initialization of Chars [duplicate]

This question already has answers here:
How to initialize all members of an array to the same value?
(26 answers)
Closed 8 years ago.
I have been wondering, why can I not write my code like so:
char myChar[50];
myChar = "This is a really cool char!";
Or at least like this:
char myChar[50];
myChar[0] = "This is a really cool char!";
The second way makes more sense that it should work, to me, seeing that I would
start the array at the point I want it to start moving the letters into each spot in the
array.
Does anyone know why C++ does not do this? And can you show me the reasoning behind the
right way to do it?
Thank you all in advance!
The first line:
char myChar[50];
...allocates an array of 50 characters on the stack. The second line:
myChar = "This is a really cool char!";
Is attempting to assign a const static string (which exists in read-only memory in the text segment of your code) to the address of the beginning of the array. This is an incompatible LVALUE/RVALUE matcing/assignment. This approach:
const char* myChar = "This is a really cool char";
Will work, as the assignment of a pointer to address a string literal must be done at initialization time. There are potential exceptions, as in assigning a const char* pointer to a string literal like so:
/*******************************************************************************
* Preprocessor Directives
******************************************************************************/
#include <stdio.h>
/*******************************************************************************
* Function Prototypes
******************************************************************************/
const char* returnErrorString(int iError);
/*******************************************************************************
* Function Definitions
******************************************************************************/
int main(void) {
int i;
for (i=(-1); i<3; i++) {
printf("i=%d - Error String:%s\n", returnErrorString(i));
}
return 0;
}
const char* returnErrorString(int iError) {
const char* ret = NULL;
switch (iError) {
case 0:
ret = "No error";
break;
case (-1):
ret = "Invalid input";
break;
default:
ret = "Unknown error";
break;
}
return ret;
}
You might benefit from reading the post in my references below. It will give you some info on how code, variables, constants, etc, are broken into different segments of the final binary, and why some approaches don't even make sense. Also, it would be beneficial to read up a bit on terminology like integer literals, string literals, l-values, r-values, etc.
Good luck!
References
Difference between declared string and allocated string, Accessed 2014-05-01, <https://stackoverflow.com/questions/16021454/difference-between-declared-string-and-allocated-string>
You must initialise the array of chars inside the declaration of the array. There is actually no reason for not doing so, as if not, your array will contain garbage values until you initialise it. I advise you to look at this link:
Char array declaration and initialization in C
Also, you are allocating a char array of size 50 but only using 28 elements of it, this would appear to me to be a waste...
Try the following for simple string initialisations:
char mychar[11] = "hello world";
Or...
char *mychar = "hello world";
I hope this helps...
If you want to think of this in holistic terms, the reason is because myChar isn't a string -- it's just an array of char. Hence "FooGHBar" and char [50] are completely different types. Related in a sense, but really not.
Now some might say, "but "FooBar" is a string, and char [50] is really just a string too." But that is going on the assumption that myChar is the same as "FooBar", and it's not. It's also assuming that the compiler understands that both char[50] and char* are pointers to strings. The compiler doesn't understand that. There could be any manner of thing stored in those places that have nothing to do with strings.
"But myChar is just a pointer?"
That is the reason why people think that the assignment should be a natural thing -- but the fundamental premise is wrong. myChar is not a pointer. It is an array. A name which refers to an array will decay into a pointer at the drop of a hat, but an array is not a pointer.

C++ Swap string

I am trying to create a non-recursive method to swap a c-style string. It throws an exception in the Swap method. Could not figure out the problem.
void Swap(char *a, char* b)
{
char temp;
temp = *a;
*a = *b;
*b = temp;
}
void Reverse_String(char * str, int length)
{
for(int i=0 ; i <= length/2; i++) //do till the middle
{
Swap(str+i, str+length - i);
}
}
EDIT: I know there are fancier ways to do this. But since I'm learning, would like to know the problem with the code.
It throws an exception in the Swap method. Could not figure out the problem.
No it doesn't. Creating a temporary character and assigning characters can not possibly throw an exception. You might have an access violation, though, if your pointers don't point to blocks of memory you own.
The Reverse_String() function looks OK, assuming str points to at least length bytes of writable memory. There's not enough context in your question to extrapolate past that. I suspect you are passing invalid parameters. You'll need to show how you call Reverse_String() for us to determine if the call is valid or not.
If you are writing something like this:
char * str = "Foo";
Reverse_String(str, 3);
printf("Reversed: '%s'.\n", str);
Then you will definitely get an access violation, because str points to read-only memory. Try the following syntax instead:
char str[] = "Foo";
Reverse_String(str, 3);
printf("Reversed: '%s'.\n", str);
This will actually make a copy of the "Foo" string into a local buffer you can overwrite.
This answer refers to the comment by #user963018 made under #André Caron's answer (it's too long to be a comment).
char *str = "Foo";
The above declares a pointer to the first element of an array of char. The array is 4 characters long, 3 for F, o & o and 1 for a terminating NULL character. The array itself is stored in memory marked as read-only; which is why you were getting the access violation. In fact, in C++, your declaration is deprecated (it is allowed for backward compatibility to C) and your compiler should be warning you as such. If it isn't, try turning up the warning level. You should be using the following declaration:
const char *str = "Foo";
Now, the declaration indicates that str should not be used to modify whatever it is pointing to, and the compiler will complain if you attempt to do so.
char str[] = "Foo";
This declaration states that str is a array of 4 characters (including the NULL character). The difference here is that str is of type char[N] (where N == 4), not char *. However, str can decay to a pointer type if the context demands it, so you can pass it to the Swap function which expects a char *. Also, the memory containing Foo is no longer marked read-only, so you can modify it.
std::string str( "Foo" );
This declares an object of type std::string that contains the string "Foo". The memory that contains the string is dynamically allocated by the string object as required (some implementations may contain a small private buffer for small string optimization, but forget that for now). If you have string whose size may vary, or whose size you do not know at compile time, it is best to use std::string.