I'm trying to concatenate two strings into a new one (finalString) like this:
finalString = string1 + '&' + string2
First I allocate the memory for finalString, then I use strcat().
finalString = new char[strlen(string1 ) + strlen(string2) + 2];
cout << finalString << endl;
finalString = strcat(finalString , string1 );
finalString = strcat(finalString , "&");
finalString = strcat(finalString , string2);
cout << finalString << endl;
I'll suppose that string1 is "Mixt" and string2 is "Supermarket".
The output looks like this:
═════════════════řřřř //(which has 21 characters)
═════════════════řřřřMixt&Supermarket
I know that if I use round brackets in "new char" the string will be initialized to 0 and I'll get the desired result, but my question is: why does the first output have 21 characters, given that I allocated only 17? And even so, why does the final string's length exceed the initial allocation size (21 > 17)?
Thanks in advance!
Two words for you: "buffer overrun".
The reason you have 21 characters initially is that the first '\0' (the null character) happens to sit 21 bytes past the address finalString points to, i.e. it is the 22nd byte. This may or may not be consistent, depending on what is in your memory.
As for why the final string is longer than what you allocated: again, you wrote outside the initial buffer into random memory. You did not crash because you did not write over something important.
strcat takes the destination address given, scans forward to the first '\0' it finds, and from that place on copies the data from the second pointer you provide, up to and including the first '\0' it finds there.
What you are doing is VERY DANGEROUS. If you do not hit a '\0' before you hit something vital, you will cause a crash or, at minimum, bad behavior.
Understand that in C/C++ a char* is just a pointer to the memory location of the first element. THERE ARE NO SAFEGUARDS! You alone must be careful with that.
If you set the first character, finalString[0] = 0, before calling strcat, the logic will work as intended.
As a different answer, why not use std::string:
std::string a, b, c;
a = "part1";
b = "part2";
c = a + " & " + b;
std::cout << c << '\n';
part1 & part2
Live example: http://ideone.com/pjqz9T
It will make your life easier! You should always look to use standard library types in C++.
If you really do need a char * then at the end you can do c.c_str().
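A minimal sketch of the c_str() route. Both function names here are made up for illustration, and lengthViaCApi just stands in for any C API that wants a const char*. Keep in mind that the pointer returned by c_str() is only valid while the std::string is alive and unmodified.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds "left & right" as a std::string.
std::string joinWithAmpersand(const std::string& left, const std::string& right) {
    return left + " & " + right;
}

// Hypothetical stand-in for a C API that expects a null-terminated const char*.
std::size_t lengthViaCApi(const char* text) {
    return std::strlen(text);
}
```

You would call it as lengthViaCApi(joinWithAmpersand("part1", "part2").c_str()); no manual allocation or terminator handling is involved.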
Your string is not initialized, which leads to undefined behavior. strcat appends the new string starting at the first null character it finds in the destination.
So, as others already mentioned, either you can do
finalString[0] = 0;
or in place of your first strcat use strcpy. This will copy the first string and put a null character at the end.
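Here is a rough sketch of the strcpy variant, using the sizes from the question. The function name joinWithAmp is an assumption for illustration, and the caller is assumed to delete[] the result.

```cpp
#include <cstring>

// Hypothetical helper: returns a new[]-allocated "first&second".
// The caller owns the result and must delete[] it.
char* joinWithAmp(const char* first, const char* second) {
    // +2: one byte for '&', one for the terminating '\0'.
    char* result = new char[std::strlen(first) + std::strlen(second) + 2];
    // strcpy writes first plus its '\0', so the buffer is a valid string from here on.
    std::strcpy(result, first);
    std::strcat(result, "&");
    std::strcat(result, second);
    return result;
}
```

The other fix is equivalent: set result[0] = '\0' right after the allocation and keep all three strcat calls.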
why 21 characters?
This is due to undefined behavior. It will keep on printing until it finds a null character, or else it will crash as soon as it tries to access illegal memory.
I'm stuck on an assignment which converts contents of an array (input from the user) to a pre-declared shorthand.
I want it to be as simple as strcpy(" and ", "+");
to change the word 'and' within a string, to a '+' sign.
Unfortunately, no matter how I structure the function; I get a deprecated conversion warning (variant loops, and direct applications, attempted).
Side note; this is assignment based, so my string shortcuts are severely limited, and no pointers (I've seen several versions of clearing the fault using them).
I'm not looking for someone to do my homework; just guidance on how strcpy can be applied without creating the dep. warning. Perhaps I shouldn't be using strcpy at all?
strcpy copies the contents of the second string into the memory of the first string. Since you're copying a string literal into a string literal it can't do it (you can't write to a string literal) and so it complains.
Instead you need to build your own search and replace system. You can use strstr() to search for a substring within a string, and it returns the pointer in memory to the start of that found string (if it's found).
Let's take the sample string Jack and Jill went up the hill.
char *andloc = strstr(buffer, " and ");
That would return the address of the start of the string (say 0x100) plus the offset of the word " and " (including spaces) within it (0x100 + 4) which would be 0x104.
Then, if found, you can replace it with the & symbol. However you can't use strcpy for that as it'll terminate the string. Instead you can set the bytes manually, or use memcpy:
if (andloc != NULL) { // it's been found
andloc[1] = '&';
andloc[2] = ' ';
}
or:
if (andloc != NULL) { // it's been found
memcpy(andloc, " & ", 3);
}
That would result in Jack & d Jill went up the hill. We're not quite there yet. Next you have to shuffle everything down to cover the "d " from the old " and ". For that you'd think you could now use strcpy or memcpy, however that's not possible - the strings (source and destination) overlap, and the manual pages for both specifically state that the strings must not overlap and to use memmove instead.
So you can move the contents of the string after the "d " to after the "& " instead:
memmove(andloc + 3, andloc + 5, strlen(andloc + 5) + 1);
Adding a number to a string like that adds to the address of the pointer. So we're looking at copying the data from 5 characters further on in the string than the old "and" location into a space starting 3 characters on from the start of the old "and" location. The amount to copy is the length of the string from 5 characters on from the start of the "and" location, plus one so that it also copies the null character at the end of the string.
Another manual way of doing it would be to iterate through each character until you find the end of the string:
char *to = andloc + 3;
char *from = andloc + 5;
while (*from) { // Until the end of the string
*to = *from; // Copy one character
to++; // Move to the ...
from++; // ... next character pair
}
*to = 0; // Add the end of string marker.
So now either way the string memory contains:
Jack & Jill went up the hill\0l\0
The \0 is the end of string marker, so the actual string "content" is only up as far as the first \0 and the l\0 is now ignored.
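Putting the steps above together, a minimal sketch might look like this. The function name replaceAndWithAmp is made up, it only handles the first match, and it assumes the buffer is a writable array rather than a string literal.

```cpp
#include <cstring>

// Hypothetical helper: replaces the first " and " in buffer with " & ", in place.
// Returns true if a replacement was made. buffer must be a writable,
// null-terminated array (not a string literal).
bool replaceAndWithAmp(char* buffer) {
    char* andloc = std::strstr(buffer, " and ");
    if (andloc == nullptr) {
        return false;  // nothing to replace
    }
    // Overwrite " an" with " & "; memcpy does not append a '\0'.
    std::memcpy(andloc, " & ", 3);
    // Close the 2-byte gap left by "d "; +1 also moves the terminating '\0'.
    std::memmove(andloc + 3, andloc + 5, std::strlen(andloc + 5) + 1);
    return true;
}
```

Called on a char array holding the sample sentence, it leaves "Jack & Jill went up the hill" in the buffer.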
Note that this only works if you are replacing a part with something that is smaller. If you are replacing it with something bigger, so the string grows in size, you will be forced to use memmove, which behaves as if it first copied the content to a scratchpad, and to ensure that your buffer has enough room in it to store the finished string (this kind of thing is often a big source of "buffer overruns", which are a security headache and one of the biggest causes of systems being hacked). You also have to do the whole thing backwards: move the latter part of the string first to make room, then fill in the gap between the two halves.
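For the growing case, a sketch under the same assumptions could go the other way and turn " & " back into " and ". The name replaceAmpWithAnd and the capacity parameter are my own additions; capacity is meant to be the full size of the array behind the pointer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: replaces the first " & " in buffer with " and ", in place.
// capacity is the total size of the array behind buffer, including the '\0'.
// Returns false if there is no match or the grown string would not fit.
bool replaceAmpWithAnd(char* buffer, std::size_t capacity) {
    char* amploc = std::strstr(buffer, " & ");
    if (amploc == nullptr) {
        return false;
    }
    // The string grows by 2 characters; check the room before touching anything.
    if (std::strlen(buffer) + 2 + 1 > capacity) {
        return false;
    }
    // Step 1: move the tail (and its '\0') up by 2 to open a gap.
    std::memmove(amploc + 5, amploc + 3, std::strlen(amploc + 3) + 1);
    // Step 2: fill the gap with the longer replacement.
    std::memcpy(amploc, " and ", 5);
    return true;
}
```

The size check comes first on purpose: once the tail has been moved past the end of the array, the damage is already done.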
First, sorry about my bad English.
I want to ask about something that I find surprising. I'm not sure it is surprising for everyone, but it is for me :)
Let me give example code
char Text[9] = "Sandrine";
for(char *Ptr = Text; *Ptr != '\0'; ++Ptr)
cout << Ptr << endl;
This code prints
Sandrine
andrine
ndrine
drine
rine
ine
ne
e
I know this is a complicated part of C++. Why does printing Ptr print the whole rest of the array? However, if Text is a dynamic array, Ptr prints only the first element of the dynamic array (Text). Why does this happen? Please explain how arrays and pointers into them work together in C++.
Thanks for helping.
There is nothing particularly special about arrays here. Instead, the special behavior is for char const*: in C, pointers to a sequence of characters with a terminating null character are used to represent strings. C++ inherited this notion of strings in the form of string literals. To support output of these strings, the output operator for char const* interprets a pointer to a char as a pointer to the start of a string and prints the sequence up to the first null character.
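To see the two behaviors side by side, here is a small sketch. The helper names are made up, and the exact text produced for an address is implementation-defined, so only the string form is predictable.

```cpp
#include <sstream>
#include <string>

// Streams a char pointer as text: operator<< reads up to the first '\0'.
std::string streamAsText(const char* ptr) {
    std::ostringstream out;
    out << ptr;
    return out.str();
}

// Streams the same pointer as an address: casting to const void* selects
// the generic pointer overload, so no characters are read.
std::string streamAsAddress(const char* ptr) {
    std::ostringstream out;
    out << static_cast<const void*>(ptr);
    return out.str();
}
```

With char Text[9] = "Sandrine", streamAsText(Text + 1) gives "andrine", while streamAsAddress(Text + 1) gives whatever your platform prints for that address.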
When you write
char Text[9] = "Sandrine";
Text names an array, and when you use it as above it gives the starting address of your string in memory. The first location holds 'S', followed by the rest of the characters. A string in C is delimited by a \0, i.e. "S a n d r i n e \0".
When you write
for(char *Ptr = Text; *Ptr != '\0'; ++Ptr)
cout << Ptr << endl;
The first time the for loop runs it prints the whole string, because Ptr points to the start of the string (char* Ptr = Text). When you increment Ptr
it points to the next character, Text + 1, i.e. 'a', and so on. Once Ptr reaches \0 the for loop quits.
I have a fixed length character array I want to assign to a string. The problem comes if the character array is full, the assign fails. I thought of using the assign where you can supply n however that ignores \0s. For example:
std::string str;
char test1[4] = {'T', 'e', 's', 't'};
str.assign(test1); // BAD "Test2" (or some random extra characters)
str.assign(test1, 4); // GOOD "Test"
size_t len = strlen(test1); // BAD 5
char test2[4] = {'T', 'e', '\0', 't'};
str.assign(test2); // GOOD "Te"
str.assign(test2, 4); // BAD "Tet"
size_t len = strlen(test2); // GOOD 2
How can I assign a fixed length character array to a string correctly for both cases?
Use the "pair of iterators" form of assign.
str.assign(test1, std::find(test1, test1 + 4, '\0'));
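Wrapped up as a function, the same idea might look like the sketch below. The name fromFixedField is my own; the body is just the std::find call from above with the size passed in.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: copies at most size chars from a fixed-length field,
// stopping early at the first '\0' if there is one.
std::string fromFixedField(const char* field, std::size_t size) {
    // std::find returns field + size when no '\0' is present, so a full
    // field is copied in its entirety without reading past the end.
    const char* end = std::find(field, field + size, '\0');
    return std::string(field, end);
}
```

For the arrays in the question, fromFixedField(test1, 4) gives "Test" and fromFixedField(test2, 4) gives "Te".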
Character buffers in C++ are either-or: either they are null terminated or they are not (and fixed-length). Mixing them in the way you do is thus not recommended. If you absolutely need this, there seems to be no alternative to manual copying until either the maximum length or a null terminator is reached.
for (char const* i = test1; i != test1 + length and *i != '\0'; ++i)
str += *i;
You want both NULL termination and fixed length? This is highly unusual and not recommended. You'll have to write your own function and push_back each individual character.
For the first case, when you do str.assign(test1) and str.assign(test2), you have to have a \0 in your array, otherwise this is not a "char*" string and you can't assign it to std::string like this.
Saw your serialization comment: use std::vector<char>, std::array<char,4>, or just a 4-char array or container.
Your second 'bad' example - the one which prints out "Tet" - actually does work, but you have to be careful about how you check it:
str.assign(test2, 4); // BAD "Tet"
cout << "\"" << str << "\"" << endl;
does copy exactly four characters. If you run it through octal dump(od) on Linux say, using my.exe | od -c you'd get:
0000000 " T e \0 t " \n
0000007
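If you don't have od handy, a quick sketch like this shows the same thing from inside the program. Both helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a std::string from exactly count chars, embedded '\0' included.
std::string assignExact(const char* data, std::size_t count) {
    std::string str;
    str.assign(data, count);
    return str;
}

// Length as a C function sees it: counting stops at the first '\0'.
std::size_t cLength(const std::string& str) {
    return std::strlen(str.c_str());
}
```

For test2, assignExact(test2, 4).size() is 4, while cLength of the same string is 2, because strlen stops at the embedded null.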
I am trying to iterate through a string and copy chunks of information based on an initial key value and a key value that identifies the end of the chunk of info. However, when I try to subtract my initial and final values to find the length of the chunk I'm looking for, I receive a seemingly arbitrary value.
So the start and end indices are found by:
currentstringlocation = mystring.find("value_im_looking_to_start_at", 0);
endlocation = mystring.find("value_im_looking_to_stop_at", currentstringlocation);
I'm then trying to do something like:
mystring.copy(newstring, (endlocation - currentstringlocation), currentstringlocation);
This however isn't giving me the results I want. Here's an excerpt from my code and the output it yields.
stringlocation2=topoinfo.find("\n",stringlocation+11);
topoinfo.copy(address,(stringlocation2-stringlocation+11),stringlocation+11);
cout << (stringlocation2-stringlocation+11) << "\n";
cout << stringlocation2 << "\t" << stringlocation+11 << "\n";
output:
25
59 56
So clearly the chunk of info I'm trying to capture spans 3 characters, however when I subtract the two I get 25. Can someone explain to me why this happens and how I can work around it?
You are calculating the length wrong, try instead something like:
topoinfo.copy(address, stringlocation2 - (stringlocation + 11),
stringlocation + 11);
After this, address will contain the copied string. Remember though: If address is a character array or a character pointer, then you should add the terminating '\0' character yourself!
A better solution to get a substring is to actually use the std::string::substr function:
std::string address = topoinfo.substr(stringlocation + 11,
stringlocation2 - (stringlocation + 11));
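As a fuller sketch of the substr approach: the function name extractBetween and the npos checks are my additions, not something from the question, and it uses the key's own length where the question hard-codes 11.

```cpp
#include <string>

// Hypothetical helper: returns the text between the end of startKey and the
// next occurrence of endKey, or an empty string if either key is missing.
std::string extractBetween(const std::string& text,
                           const std::string& startKey,
                           const std::string& endKey) {
    const std::string::size_type keyPos = text.find(startKey);
    if (keyPos == std::string::npos) {
        return "";
    }
    // Skip past the key itself.
    const std::string::size_type begin = keyPos + startKey.size();
    const std::string::size_type end = text.find(endKey, begin);
    if (end == std::string::npos) {
        return "";
    }
    // The length is end - begin, computed from the already-offset start.
    return text.substr(begin, end - begin);
}
```

Computing begin once and reusing it avoids the bracket mistake, since the offset can't be added on one side and forgotten on the other.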
Should be
topoinfo.copy(address,stringlocation2-(stringlocation+11),stringlocation+11);
cout << stringlocation2-(stringlocation+11) << "\n";
You got your brackets wrong.
I am a student learning C++, and I am trying to understand how null-terminated character arrays work. Suppose I define a char array like so:
char* str1 = "hello world";
As expected, strlen(str1) is equal to 11, and it is null-terminated.
Where does C++ put the null terminator, if all 11 elements of the above char array are filled with the characters "hello world"? Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'? CPlusPlus.com seems to suggest that one of the 11 would need to be '\0', unless it is indeed allocating 12.
Suppose I do the following:
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
// Copy the first one to the second one
strncpy( str2, str1, strlen(str1) );
// Output the second one
cout << "Str2: " << str2 << endl;
This outputs Str2: hello worldatcomY╗°g♠↕, which I assume is C++ reading the memory at the location pointed to by the pointer char* str2 until it encounters what it interprets to be a null character.
However, if I then do this:
// Null-terminate the second one
str2[strlen(str1)] = '\0';
// Output the second one again
cout << "Terminated Str2: " << str2 << endl;
It outputs Terminated Str2: hello world as expected.
But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?
Running this code does not seem to cause any compiler warnings or run-time errors. Is this safe to do in practice? Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?
In the case of a string literal the compiler is actually reserving an extra char element for the \0 element.
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
This is a common mistake new C programmers make. When allocating the storage for a char* you need to allocate the number of characters + 1 more to store the \0. Not allocating the extra storage here means this line is also illegal
// Null-terminate the second one
str2[strlen(str1)] = '\0';
Here you're actually writing past the end of the memory you allocated. When allocating X elements the last legal byte you can access is the memory address offset by X - 1. Writing to the X element causes undefined behavior. It will often work but is a ticking time bomb.
The proper way to write this is as follows
size_t size = strlen(str1) + sizeof(char);
char* str2 = (char*) malloc(size);
strncpy( str2, str1, size);
// Output the second one
cout << "Str2: " << str2 << endl;
In this example an explicit str2[size - 1] = '\0' isn't actually needed. The strncpy function pads any space left over after the source string with null characters. Here str1 has only size - 1 characters before its terminator, so the final element of str2 is filled with \0.
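The padding behavior is easy to check for yourself. This is a small sketch, with a made-up function name and an arbitrary 64-byte scratch buffer, that reports whether strncpy left a terminator within the first count bytes.

```cpp
#include <cstddef>
#include <cstring>

// Returns true if strncpy(dest, src, count) left a '\0' within the first
// count bytes of dest. dest is pre-filled with 'X' so a missing terminator
// is detectable. Assumes count <= 64 (the scratch size chosen for this sketch).
bool strncpyTerminates(const char* src, std::size_t count) {
    char dest[64];
    std::memset(dest, 'X', sizeof dest);
    std::strncpy(dest, src, count);
    return std::memchr(dest, '\0', count) != nullptr;
}
```

With "hello world" as the source, a count of 11 leaves no terminator, while 12 or more does, which is why the + 1 in the size matters.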
Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'?
Yes.
But doesn't writing to str2[11] imply that we are writing outside of the allocated memory space of str2, since str2[11] is the 12th byte, but we only allocated 11 bytes?
Yes.
Would it be better to use malloc( strlen(str1) + 1 ) instead of malloc( strlen(str1) )?
Yes, because the second form is not long enough to copy the string into.
Running this code does not seem to cause any compiler warnings or run-time errors.
Detecting this in all but the simplest cases is a very difficult problem. So the compiler authors simply don't bother.
This sort of complexity is exactly why you should be using std::string rather than raw C-style strings if you are writing C++. It's as simple as this:
std::string str1 = "hello world";
std::string str2 = str1;
The literal "hello world" is a char array that looks like:
{ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' }
So, yes, the literal is 12 chars in size.
Also, malloc( strlen(str1) ) is allocating memory for 1 less byte than is needed, since strlen returns the length of the string, not including the NUL terminator. Writing to str[strlen(str1)] is writing 1 byte past the amount of memory that you've allocated.
Your compiler won't tell you that, but if you run your program through valgrind or a similar program available on your system it'll tell you if you're accessing memory you shouldn't be.
I think you are confused by the return value of strlen. It returns the length of the string, and it should not be confused with the size of the array that holds the string. Consider this example :
const char* str = "Hello\0 world";
I added a null character in the middle of the string, which is perfectly valid. Here the array will have a length of 13 (12 characters + the final null character), but strlen(str) will return 5, because there are 5 characters before the first null character. strlen just counts the characters until a null character is found.
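A quick sketch to confirm the two numbers. It declares the text as an array (kText is a name I picked) so that sizeof reports the array size rather than the size of a pointer.

```cpp
#include <cstddef>
#include <cstring>

// The array holds every character of the literal plus the final '\0'.
const char kText[] = "Hello\0 world";

// Size of the backing array in bytes.
std::size_t arraySize() { return sizeof kText; }

// Length as strlen sees it: characters before the first '\0'.
std::size_t stringLength() { return std::strlen(kText); }
```

Note that with a char* instead of an array, sizeof would give the pointer size and tell you nothing about the string.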
So if I use your code :
const char* str1 = "Hello\0 world";
char* str2 = (char*) malloc(strlen(str1)); // strlen(str1) will return 5
strncpy(str2, str1, strlen(str1));
cout << "Str2: " << str2 << endl;
The str2 array will have a length of 5, and won't be terminated by a null character (because strlen doesn't count it). Is this what you expected?
For a standard C string, the array that stores the string must be at least one character longer than the length of the string in characters. So your "hello world" string has a string length of 11 but requires a backing array with 12 entries.
The reason for this is simply the way those string are read. The functions handling those strings basically read the characters of the string one by one until they find the termination character '\0' and stop at this point. If this character is missing those functions just keep reading the memory until they either hit a protected memory area that causes the host operating system to kill your application or until they find the termination character.
Also, if you create a character array of length 11 and write the string "hello world" into it, you will get massive problems, because the array needs to hold at least 12 characters. The byte that follows the array in memory gets overwritten, resulting in unpredictable side effects.
Also, since you are working with C++, you might want to look into std::string. This class provides better handling of strings and might be worth looking into.
I think what you need to know is that char array indices start at 0 and the characters run up to string length - 1, with the terminator ('\0') at the position equal to the string length.
In your case:
str1[0] == 'h';
str1[10] == 'd';
str1[11] == '\0';
This is why str2[strlen(str1)] = '\0'; uses the correct index for the terminator (provided str2 was allocated with room for that 12th byte).
The problem with the output after the strncpy is that it copies only 11 elements (0..10), so you need to add the terminator manually (str2[11] = '\0').