Assigning strings of any size to a pointer to char - c++

First of all, I must state that I'm a beginner with C++ and programming overall.
I'll get straight to the point. I'm wondering if it's possible to assign a string of characters of any size to a pointer to a character (not arrays, just a char * pointer). Would that violate any Memory Addresses?
The book I'm learning from doesn't seem to say anything about that. I can't seem to find anything on Google either.

You have your character pointer and want to dynamically create C strings
char *str;
say. This pointer will be used to point to the first character of the string. The string is a series of sequential characters (bytes) in memory. This is what we want to achieve in memory:
str -> +---+---+---+---+---+----+
| H | E | L | L | O | \0 |
+---+---+---+---+---+----+
Note the final byte. It has the value 0 and is called the null character. It marks the end of the string, which makes it easy to know when we have reached the end.
To give str a value, we allocate this memory. In C++ this is done with the new operator, like this:
str = new char[6];
Note that new has two versions, new[] and new: one allocates an array of objects, the other allocates a single object. ALWAYS use delete[] for memory you allocated with new[], and delete for memory you allocated with new. DO NOT MIX new[] with delete, or new with delete[].
This allocates an array of 6 characters to hold the string. To place the characters into the array we could do this:
str[0] = 'H';
str[1] = 'E';
...
str[5] = 0;
But this would be tedious. Instead we can use strcpy to do this for us:
strcpy(str, "hello");
strcpy knows all about the null character and copies it too. There is a whole range of functions that operate on these kinds of strings; they are declared in the <cstring> header.
These are C strings. Once upon a time somebody invented a new language called C++. It uses a different idea, called objects, that makes this stuff a lot easier. You need to look at the standard template library (or STL), and in particular at std::string. There are lots of goodies in the STL.
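To tie the steps above together, here is a minimal sketch. The helper names make_c_string and make_cpp_string are made up for illustration. The first allocates with new[], copies with strcpy, and leaves the caller responsible for delete[]. The second shows the same idea with std::string for comparison.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: returns a heap-allocated copy of `text`.
// The caller owns the result and must release it with delete[].
char* make_c_string(const char* text)
{
    assert(text != nullptr);
    std::size_t length = std::strlen(text);   // characters before the '\0'
    char* str = new char[length + 1];         // +1 for the null character
    std::strcpy(str, text);                   // copies the '\0' as well
    return str;
}

// The same idea with std::string: no manual new[]/delete[] is needed.
std::string make_cpp_string(const char* text)
{
    return std::string(text);
}
```

A call such as char* str = make_c_string("HELLO"); must eventually be paired with delete[] str;, whereas the std::string version cleans up after itself.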
Hope this helps

A char pointer can point to a string of any length, because the length of the string is determined by when you run into a NUL (0) byte in the string. When you store strings this way, it becomes a C-string. For instance:
const char* str = NULL; // at this point, str doesn't point
// to anything (not even an empty string)
str = ""; // valid
str = "a"; // valid
str = "hello"; // valid
str = "farewell, cruel world"; // valid
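To make the "length is determined by the NUL byte" point concrete, here is a small sketch of a function that finds the length by scanning for the terminator, which is essentially what strlen does. The name count_until_nul is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the characters before the NUL byte.
// The pointer itself carries no length; the terminator decides it.
std::size_t count_until_nul(const char* str)
{
    assert(str != nullptr);
    std::size_t n = 0;
    while (str[n] != '\0')
        ++n;
    return n;
}
```

The same const char* parameter accepts "", "a", or "farewell, cruel world" without any change to the pointer type.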

Related

How to code a strcat function that works with two dynamic arrays

As we know, the strcat function concatenates one c-string onto another to make one big c-string containing the two others.
My question is how to make a strcat function that works with two dynamically allocated arrays.
The desired strcat function should be able to work for any sized myStr1 and myStr2
//dynamic c-string array 1
char* myStr1 = new char [26];
strcpy(myStr1, "The dog on the farm goes ");
//dynamic c-string array 2
char* myStr2 = new char [6];
strcpy(myStr2, "bark.");
//desired function
strcat(myStr1,myStr2);
cout<<myStr1; //would output 'The dog on the farm goes bark.'
This is as far as I was able to get on my own:
//*& indicates that the dynamic c-string str1 is passed by reference
void strcat(char*& str1, char* str2)
{
int size1 = strlen(str1);
int size2 = strlen(str2);
//unknown code
//str1 = new char [size1+size2]; //Would wipe out str1's original contents
}
Thanks!
You need first to understand better how pointers work. Your code for example:
char* myStr1 = new char [25];
myStr1 = "The dog on the farm goes ";
first allocates 25 characters, then ignores the pointer to that allocated area (the technical term is "leaks it") and sets myStr1 to point to a string literal.
That code should have used strcpy instead to copy from the string literal into the allocated area. Except that the string is 25 characters so you will need to allocate space for at least 26 as one is needed for the ASCII NUL terminator (0x00).
Correct code for that part should have been:
char* myStr1 = new char [26]; // One more than the actual string length
strcpy(myStr1, "The dog on the farm goes ");
To do the concatenation of C strings the algorithm could be:
measure the lengths n1 and n2 of the two strings (with strlen)
allocate n1+n2+1 characters for the destination buffer (+1 is needed for the C string terminator)
strcpy the first string at the start of the buffer
strcat the second string to the buffer (*)
delete[] the memory for the original string buffers if they are not needed (if this is the right thing to do or not depends on who is the "owner" of the strings... this part is tricky as the C string interface doesn't specify that).
(*) This is not the most efficient way. strcat will go through all the characters of the buffer to find where the string ends, but you already know that the first string's length is n1, so the concatenation could be done with strcpy too, by choosing buffer+n1 as the destination. Even better, since you know the counts, you could use memcpy everywhere instead of strcpy, which has to check each character for the NUL terminator. Before getting into this kind of optimization, however, you should understand clearly how things work. Only once the string concatenation code is correct, and totally obvious to you, should you even start thinking about optimization.
PS: Once you get all this correct and working and efficient you will appreciate how much of a simplification is to use std::string objects instead, where all this convoluted code becomes just s1+s2.
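The algorithm above could be sketched roughly like this, using the memcpy variant from the footnote. The name concat_new is made up, and the ownership rule (inputs are left alone, the caller frees the result) is an assumption chosen for the sketch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns a new buffer holding s1 followed by s2.
// Neither input is freed; the caller owns the result and must delete[] it.
char* concat_new(const char* s1, const char* s2)
{
    assert(s1 != nullptr && s2 != nullptr);
    std::size_t n1 = std::strlen(s1);
    std::size_t n2 = std::strlen(s2);
    char* buffer = new char[n1 + n2 + 1];     // +1 for the terminator
    std::memcpy(buffer, s1, n1);              // first string at the start
    std::memcpy(buffer + n1, s2, n2 + 1);     // second string plus its '\0'
    return buffer;
}
```

Because the lengths are measured once, nothing has to rescan the buffer for the terminator.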
You allocate memory and make your pointers point to that memory. Then you overwrite the pointers, making them point somewhere else. The assignment of e.g. myStr1 causes the variable to point to the string literal instead of the memory you allocated. You need to copy the strings into the memory you have allocated.
Of course, that copying will lead to another problem, as you seem to forget that C-strings need an extra character for the terminator. So a C-string with 5 characters needs space for six characters.
As for your concatenation function, you need to do copying here too. Allocate enough space for both strings plus a single terminator character. Then copy the first string into the beginning of the new memory, and copy the second string into the end.
Also you need a temporary pointer variable for the memory you allocate, as you otherwise "would wipe out str1's original contents" (not strictly true, you just make str1 point somewhere else, losing the original pointer).
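Here is one way the temporary-pointer idea might look for the by-reference signature from the question. It is only a sketch: the name append is hypothetical, and it assumes str1 always points to memory obtained from new char[...].

```cpp
#include <cassert>
#include <cstring>

// Sketch: replaces str1 with a larger buffer holding str1 followed by str2.
// Assumes str1 was allocated with new char[...]; str2 is only read.
void append(char*& str1, const char* str2)
{
    assert(str1 != nullptr && str2 != nullptr);
    std::size_t size1 = std::strlen(str1);
    std::size_t size2 = std::strlen(str2);
    char* temp = new char[size1 + size2 + 1]; // both strings plus terminator
    std::strcpy(temp, str1);                  // old contents first
    std::strcpy(temp + size1, str2);          // then the second string
    delete[] str1;                            // release the old buffer
    str1 = temp;                              // caller now sees the new one
}
```

The old buffer is deleted only after its contents have been copied, so nothing is lost when str1 is re-pointed.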

char*/string concatenation without copying?

I would like to concatenate 2 strings in C or C++ without new memory allocation and copying. Is it possible?
Possible C code:
char* str1 = (char*)malloc(100);
char* str2 = (char*)malloc(50);
char* str3 = /* some code that concatenates these 2 strings
without copying to occupy a continuous memory region */
Then, when I don't need them any more, I just do:
free(str1);
free(str2);
Or if possible, I would like to achieve the same in C++, using std::string or maybe char*, but using new and delete (possibly void operator delete ( void* ptr, std::size_t sz ) operator (C++14) on the str3).
There are a lot of questions about strings concatenation, but I haven't found one that asks the same.
No, it is not possible
In C, malloc operations return blocks of memory that have no relationship to each other. But in C, strings must be a contiguous array of bytes. So there is no way to extend str1 without copying, let alone concatenate.
For C++, perhaps ropes may be of interest: See this answer.
Ropes are allocated in chunks that do not have to be contiguous. This supports O(1) concatenation. However, the accessors make it appear as a single string of bytes. I'm certain that to convert ropes back to std::string or C style strings will take a copy however, but this is probably the closest to what you want.
Also, it is probably a premature optimization to worry about the costs of copying a few strings around. Unless you are moving lots of data, it won't matter
Text concatenation is possible by writing your own string data structure. Easier in C++ than C.
struct My_String
{
std::vector<char *> text_fragments;
};
You would have to implement all the text manipulation and searching algorithms based on this data structure. Nothing in the C library could be applied to the My_String structure. The std::string in C++ would not be compatible.
One of the issues is how to handle text modification. If one of the text fragments is a constant literal (that can't be modified), it would need to be copied before it could be modified. But copying is against the requirements. :-(
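As a rough illustration of the fragment idea, here is a sketch with just enough operations to show that "concatenation" becomes a pointer push and that every accessor has to be written by hand. The type name FragmentString and its members are invented for this example, and the fragments are assumed to outlive the object.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Sketch of a string made of non-owning fragments.
struct FragmentString
{
    std::vector<const char*> text_fragments;

    // "Concatenation": only the pointer is stored, no characters are copied.
    void append(const char* fragment)
    {
        assert(fragment != nullptr);
        text_fragments.push_back(fragment);
    }

    // Total number of characters across all fragments.
    std::size_t length() const
    {
        std::size_t total = 0;
        for (const char* fragment : text_fragments)
            total += std::strlen(fragment);
        return total;
    }

    // Character at a logical index, or '\0' if the index is out of range.
    char char_at(std::size_t index) const
    {
        for (const char* fragment : text_fragments) {
            std::size_t n = std::strlen(fragment);
            if (index < n)
                return fragment[index];
            index -= n;
        }
        return '\0';
    }
};
```

Note that nothing here yields a single char* covering the whole text, which is exactly the limitation described above.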
A "string" in C is an array of chars with a null char at the end. And an array is "a data structure that lets you store one or more elements consecutively in memory". GNU C reference
You cannot concatenate two arrays that are not in consecutive memory blocks without copying one of them. You can do it however without allocating new memory. E.g.
char* str1 = malloc(100); // size 100 bytes, uninitialised
str1[0] = '\0'; // string length 0, size of str1 100
strcat(str1, "a"); // string length 1, size of str1 still 100
strcat(str1, "b"); // string length 2, size of str1 still 100
You could if you want retrieve chars of 2 strings as if they were one without copying or reallocating. Here is an example function to do that (simple example, don't use in production code)
char* str1 = (char*)malloc(100);
char* str2 = (char*)malloc(50);
char get_char(int i) {
if (i >= 0 && i < 100) {
return str1[i];
}
if (i >= 100 && i < 150) {
return str2[i-100];
}
return 0;
}
But in such a case you couldn't have a char* str3 to perform pointer arithmetic with and access all 150 chars.
Tags C and C++ are contradictory. In C, I'd recommend exploring realloc. You can code something along the following lines:
char* str = malloc(50);
str = realloc(str, 55);
If you are lucky, the realloc call will not allocate new memory and will just extend the already allocated segment, but there is no guarantee of this. This way you at least have a shot at avoiding a reallocation of the string. You will still have to copy the contents of the second string into the newly allocated memory.
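A slightly fuller sketch of the realloc approach, written so it also compiles as C++ (hence the casts and the std:: prefixes). The function name grow_and_append is hypothetical, and it assumes the string was obtained from malloc.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Sketch: appends `suffix` to a malloc'd string, growing it with realloc.
// Returns the (possibly moved) string, or nullptr if realloc fails, in
// which case the original block is still valid and still owned by the caller.
char* grow_and_append(char* str, const char* suffix)
{
    assert(str != nullptr && suffix != nullptr);
    std::size_t old_len = std::strlen(str);
    std::size_t add_len = std::strlen(suffix);
    char* grown = static_cast<char*>(std::realloc(str, old_len + add_len + 1));
    if (grown == nullptr)
        return nullptr;
    std::memcpy(grown + old_len, suffix, add_len + 1); // includes the '\0'
    return grown;
}
```

Keeping the realloc result in a separate variable avoids losing the original pointer when the call fails. The second string is still copied; only the copy of the first one may be avoided.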

confusion about char pointer in c++

I'm new in c++ language and I am trying to understand the pointers concept.
I have a basic question regarding the char pointer,
What I know is that the pointer is a variable that stores an address value,
so when I write sth like this:
char * ptr = "hello";
From my basic knowledge, I think that after = there should be an address to be assigned to the pointer, but here we assign "hello" which is set of chars.
So what does that mean ?
Is the pointer ptr points to an address that stores "hello"? or does it store the hello itself?
Im so confused, hope you guys can help me..
Thanks in advance.
ptr holds the address to where the literal "hello" is stored at. In this case, it points to a string literal. It's an immutable array of characters located in static (most commonly read-only) memory.
You can make ptr point to something else by re-assigning it, but modifying the contents of the literal it points to is illegal. (The literal is actually an array of const char; the conversion to char* exists only for C compatibility, was deprecated in C++03, and is ill-formed since C++11.)
Because of this guarantee, the compiler is free to optimize for space, so
char * ptr = "hello";
char * ptr1 = "hello";
might yield two equal pointers. (i.e. ptr == ptr1)
The pointer is pointing to the address where "hello" is stored. More precisely it is pointing the 'h' in the "hello".
"hello" is a string literal: a static array of characters. Like all arrays, it can be converted to a pointer to its first element, if it's used in a context that requires a pointer.
However, the array is constant, so assigning it to char* (rather than const char*) is a very bad idea. You'll get undefined behaviour (typically an access violation) if you try to use that pointer to modify the string.
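If you do need to change the text, the usual approach is to copy it into memory you own and modify the copy. A minimal sketch, with a made-up helper name copy_and_capitalize:

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical helper: copies `text` into the caller's writable `buffer`
// and upper-cases the first character of the copy. The original text,
// which may be a string literal, is never written to.
// Returns false (leaving the buffer untouched) if the buffer is too small.
bool copy_and_capitalize(const char* text, char* buffer, std::size_t buffer_size)
{
    assert(text != nullptr && buffer != nullptr);
    std::size_t length = std::strlen(text);
    if (length + 1 > buffer_size)
        return false;
    std::memcpy(buffer, text, length + 1);    // copy including the '\0'
    buffer[0] = static_cast<char>(
        std::toupper(static_cast<unsigned char>(buffer[0])));
    return true;
}
```

Declaring the source as const char* lets the compiler reject accidental writes to the literal.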
The compiler will "find somewhere" that it can put the string "hello", and the ptr will have the address of that "somewhere".
When you create a new char* by assigning it a string literal, what happens is that the char* gets assigned the address of the literal. So the actual value of the char* might be 0x87F2F1A6 (some hex address value). The char* points to the start (in this case the first char) of the string. In C and C++, all C-strings are terminated with a \0; this is how the system knows it has reached the end of the string.
char* text = "Hello!" can be thought of as the following:
At program start, you create an array of chars, 7 in length:
{'H','e','l','l','o','!','\0'}. The last one is the null character and shows that there aren't any more characters after it. [It's more efficient than keeping a count associated with the string... A count would take up perhaps 4 bytes for a 32-bit integer, while the null character is just a single byte, or two bytes if you're using Unicode strings. Plus it's less confusing to have a single array ending in the null character than to have to manage an array of characters and a counting variable at the same time.]
The difference between creating an array and making a string constant is that an array is editable and a string constant (or 'string literal') is not. Trying to set a value in a string literal causes problems: they are read-only.
Then, whenever you call the statement char* text = "Hello!", you take the address of that initial array and stick it into the variable text. Note that if you have something like this...
char* text1 = "Hello!";
char* text2 = "Hello!";
char* text3 = "Hello!";
...then it's quite possible that you're creating three separate arrays of {'H','e','l','l','o','!','\0'}, so it would be more efficient to do this...
char* _text = "Hello!";
char* text1 = _text;
char* text2 = _text;
char* text3 = _text;
Most compilers are smart enough to only initialize one string constant automatically, but some will only do that if you manually turn on certain optimization features.
Another note: do not use delete [] on a pointer to a string literal. The literal was not allocated with new[], so deleting it is undefined behavior, even if it appears to cause no issues on a particular platform.

What is a char*?

Why do we need the *?
char* test = "testing";
From what I understood, we only apply * onto addresses.
This is a char:
char c = 't';
It can only hold one character!
This is a C-string:
char s[] = "test";
It can hold multiple characters. Another way to write the above is:
char s[] = {'t', 'e', 's', 't', 0};
The 0 at the end is called the NUL terminator. It denotes the end of a C-string.
A char* stores the starting memory location of a C-string.[1] For example, we can use it to refer to the same array s that we defined above. We do this by setting our char* to the memory location of the first element of s:
char* p = &(s[0]);
The & operator gives us the memory location of s[0].
Here is a shorter way to write the above:
char* p = s;
Notice:
*(p + 0) == 't'
*(p + 1) == 'e'
*(p + 2) == 's'
*(p + 3) == 't'
*(p + 4) == 0 // NUL
Or, alternatively:
p[0] == 't'
p[1] == 'e'
p[2] == 's'
p[3] == 't'
p[4] == 0 // NUL
Another common usage of char* is to refer to the memory location of a string literal:
const char* myStringLiteral = "test";
Warning: This string literal should not be changed at runtime. We use const to warn the programmer (and compiler) not to modify myStringLiteral in the following illegal manner:
myStringLiteral[0] = 'b'; // Illegal! Do not do this for const char*!
This is different from the array s above, which we are allowed to modify. This is because the string literal "test" is automatically copied into the array at initialization phase. But with myStringLiteral, no such copying occurs. (Where would we copy to, anyways? There's no array to hold our data... just a lonely char*!)
[1] Technical note: char* merely stores a memory location to things of type char. It can certainly refer to just a single char. However, it is much more common to use char* to refer to C-strings, which are NUL-terminated character sequences, as shown above.
The char type can only represent a single character. When you have a sequence of characters, they are piled next to each other in memory, and the location of the first character in that sequence is returned (assigned to test). test is nothing more than a pointer to the memory location of the first character in "testing", saying that the type it points to is a char.
You can do one of two things:
char *test = "testing";
or:
char test[] = "testing";
Or, a few variations on those themes like:
char const *test = "testing";
I mention this primarily because it's the one you usually really want.
The bottom line, however, is that char x; will only define a single character. If you want a string of characters, you have to define an array of char or a pointer to char (which you'll initialize with a string literal, as above, more often than not).
There are real differences between the first two options though. char *test=... defines a pointer named test, which is initialized to point to a string literal. The string literal itself is allocated statically (typically right along with the code for your program), and you're not supposed to (attempt to) modify it -- thus the preference for char const *.
The char test[] = .. allocates an array. If it's a global, it's pretty similar to the previous except that it does not allocate a separate space for a pointer -- rather, test becomes the name of the array that holds the characters themselves.
If you do this as a local variable, test is still the array itself - but since it's a local variable, it gets "auto" storage (typically on the stack), which is initialized (usually by copying from a normal, statically allocated string literal) on every entry to the block/scope where it's defined.
The latter versions (with an array of char) can act deceptively similar to a pointer, because the name of an array will decay to the address of the beginning of the array anytime you pass it to a function. There are differences though. You can modify the array, but modifying a string literal gives undefined behavior. Conversely, you can change the pointer to point at some other chars, so something like:
char *test = "testing";
if (whatever)
test = "not testing any more";
...is perfectly fine, but trying to do the same with an array won't work (arrays aren't assignable).
The main thing people forgot to mention is that "testing" is an array of chars in memory, there's no such thing as primitive string type in c++. Therefore as with any other array, you can't reference it as if it is an element.
char* represents the address of the beginning of a contiguous block of memory holding chars. You need it because you are not using a single char variable; you are addressing a whole array of chars.
When accessing this, functions take the address of the first char and step through the memory. This is possible because arrays use contiguous memory (i.e. all of the elements sit next to each other).
Hope this clears things up! :)
Using a * says that this variable points to a location in memory. In this case, it points to the location of the first character of the string "testing". With a char pointer you are not limited to a single character, because the pointer can refer to the start of a whole sequence of characters.
In C, an array is commonly accessed through a pointer to its first element.

Difference between using character pointers and character arrays

Basic question.
char new_str[]="";
char * newstr;
If I have to concatenate some data into it or use string functions like strcat/substr/strcpy, what's the difference between the two?
I understand I have to allocate memory to the char * approach (Line #2). I'm not really sure how though.
And const char * and string literals are the same?
I need to know more on this. Can someone point to some nice exhaustive content/material?
An excellent source to clear up the confusion is Peter van der Linden's Expert C Programming: Deep C Secrets. Arrays and pointers are not the same, and the difference lies in how they are addressed in memory.
With an array, char new_str[];, the compiler gives new_str a fixed memory address, e.g. 0x1234, so indexing new_str with [] is simple. For example, for new_str[4] the code takes the address where new_str resides, 0x1234, adds the index 4 to it, and retrieves the value at 0x1234 + 0x4.
With a pointer, the compiler also gives the symbol char *newstr an address, e.g. 0x9876, but access goes through one extra level of indirection. Suppose newstr was set with newstr = malloc(10);. The address of newstr itself (0x9876) is known to the compiler, but what newstr points to can vary. At runtime, the code first fetches the contents of 0x9876 (i.e. newstr), which is another memory address, e.g. 0x8765 (the one malloc returned), and then fetches the data from that address.
char new_str[] and char *newstr can often be used interchangeably, since an array name decays into a pointer to its zeroth element in most expressions. That explains why you can write newstr[5] or *(newstr + 5), and why pointer expressions also work on the array, e.g. *(new_str + 1) = *newstr; or *(new_str + 1) = newstr[1];
In summary, the real difference between the two is how they are accessed in memory.
Get the book and read it and live it and breathe it. Its a brilliant book! :)
Please go through this article below:
Also, in the case of an array of char like yours, char new_str[], new_str always refers to the base of the array. It cannot itself be incremented. You can use subscripts to access the next char in the array, e.g. new_str[3];
But in the case of a pointer to char, the pointer can be incremented (newstr++) to step to the next character in the array.
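A short sketch of stepping through a string by incrementing a pointer. The function name count_char is hypothetical; it works the same whether it is handed an array (which decays to a pointer) or a char* directly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts how often `target` occurs in a C-string
// by advancing a pointer until it reaches the NUL terminator.
std::size_t count_char(const char* str, char target)
{
    assert(str != nullptr);
    std::size_t count = 0;
    for (const char* p = str; *p != '\0'; ++p) {  // ++p moves to the next char
        if (*p == target)
            ++count;
    }
    return count;
}
```

The loop variable p is a separate pointer, so the caller's array or pointer is left where it was.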
Also I would suggest this article for more clarity.
This is a character array:
char buf [1000];
So, for example, this makes no sense:
buf = &some_other_buf;
This is because buf, though it has characteristics of type pointer, it is already pointing to the only place that makes sense for it.
char *ptr;
On the other hand, ptr is only a pointer, and may point somewhere. Most often, it's something like this:
ptr = buf; // #1: point to the beginning of buf, same as &buf[0]
or maybe this:
ptr = malloc (1000); // #2: allocate heap and point to it
or:
ptr = "abcdefghijklmn"; // #3: string constant
For all of these, *ptr can be written to, except in the third case, where some compilation environments make string constants unwritable (and writing to a string constant is undefined behavior in any case).
*ptr++ = 'h'; // writes into #1: buf[0], #2: first byte of heap, or
// #3 overwrites "a"
strcpy (ptr, "ello"); // finishes writing hello and adds a NUL
The difference is that one is a pointer and the other is an array. You can, for instance, apply sizeof() to the array to get its total size in bytes.
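The sizeof difference can be seen in a tiny sketch like the following. The function names are made up, and the pointer size depends on the platform, so no specific value is claimed for it.

```cpp
#include <cstddef>

// sizeof applied to the array itself gives the size of the whole array.
std::size_t array_bytes()
{
    char buf[1000];
    return sizeof(buf);          // 1000, since sizeof(char) is 1
}

// sizeof applied to a pointer gives the size of the pointer only,
// no matter how large the array it points into is.
std::size_t pointer_bytes()
{
    char buf[1000];
    char* ptr = buf;             // the array decays to a pointer here
    return sizeof(ptr);          // same as sizeof(char*)
}
```

This is also why a function that receives a char* cannot discover the buffer size from the pointer alone.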
If you're using C++ as your tags indicate, you really should be using the C++ strings, not the C char arrays.
The string type makes manipulating strings a lot easier.
If you're stuck with char arrays for some reason, the line:
char new_str[] = "";
allocates 1 byte of space and puts a null terminator character into it. It's subtly different from:
char *new_str = "";
since that may give you a reference to non-writable memory. The statement:
char *new_str;
on its own gives you a pointer but nothing that it points to. It can also have a random value if it's local to a function.
What people tend to do (in C rather than C++) is to do something like:
char *new_str = malloc (100); // (remember that this has to be freed) or
char new_str[100];
to get enough space.
If you use the str... functions, you're basically responsible for ensuring that you have enough space in the char array, lest you get all sorts of weird and wonderful practice at debugging code. If you use real C++ strings, a lot of the grunt work is done for you.
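As a sketch of what "ensuring that you have enough space" can look like with char arrays, here is a guarded append. The name append_if_fits and its return convention are assumptions made for this example, not a standard function.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: appends `src` to the C-string in `dest` only if the
// result, including the terminator, fits in `dest_size` bytes.
// Returns false and leaves `dest` unchanged when it would not fit.
bool append_if_fits(char* dest, std::size_t dest_size, const char* src)
{
    assert(dest != nullptr && src != nullptr);
    std::size_t used = std::strlen(dest);
    std::size_t extra = std::strlen(src);
    if (used + extra + 1 > dest_size)
        return false;
    std::memcpy(dest + used, src, extra + 1);  // includes the '\0'
    return true;
}
```

With an array such as char new_str[100] = ""; the size argument can be sizeof new_str. With a malloc'd pointer you have to remember the size yourself.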
The type of the first is char[1], the second is char *. Different types.
Allocate memory for the latter with malloc in C, or new in C++.
char foo[] = "Bar"; // Allocates 4 bytes and fills them with
// 'B', 'a', 'r', '\0'.
The size here is implied from the initializer string.
The contents of foo are mutable. You can change foo[i] for example where i = 0..3.
OTOH if you do:
char *foo = "Bar";
The compiler now places the static string "Bar" in read-only memory, and it cannot be modified.
foo[i] = 'X'; // is now undefined.
char new_str[]="abcd";
This specifies an array of characters (a string) of size 5 bytes (one byte for each character plus one for the null terminator). So it stores the string 'abcd' in memory and we can access this string using the variable new_str.
char *new_str="abcd";
This specifies a string 'abcd' is stored somewhere in the memory and the pointer new_str points to the first character of that string.
To differentiate them in the memory allocation side:
// With char array, "hello" is allocated on stack
char s[] = "hello";
// With char pointer, "hello" is stored in the read-only data segment in C++'s memory layout.
char *s = "hello";
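// To allocate a string on the heap, malloc 6 bytes (5 characters plus the NUL byte at the end)
// and copy the characters in; assigning s = "hello" would only re-point s and leak the block
char *s = (char *)malloc(6);
strcpy(s, "hello");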
If you're in c++ why not use std::string for all your string needs? Especially anything dealing with concatenation. This will save you from a lot of problems.
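For comparison, a minimal sketch of concatenation with std::string. The function name farm_sentence is made up, and the text is borrowed from the strcat question above.

```cpp
#include <string>

// Sketch: std::string grows its own storage, so there is no buffer size
// to compute and nothing to free by hand.
std::string farm_sentence(const std::string& animal_sound)
{
    std::string sentence = "The dog on the farm goes ";
    sentence += animal_sound;    // appends, reallocating internally if needed
    return sentence;
}
```

If a C API needs a const char*, the result's c_str() member provides a NUL-terminated view of the same characters.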