std::string resize is ruining compare operator (==) - c++

std::string resize causes strings that appear to be equal to no longer be equal. It can appear to be misleading when I hover over the variable in my debugger and they appear to hold the same value.
I think it comes down to the fact that I expected the == operator to stop at the first null character but it keeps going till the end of the size. I'm sure this is working as intended but I was stuck on an issue caused by this for a while so I wanted to see why you would keep comparing characters even after the first null character. thanks!
int main(void)
{
std::string test1;
test1.resize(10);
test1[0] = 'a';
std::string test2 = "a";
//they are not equal
bool same = (test1 == test2);
return 0;
}

test1 is the string "a\0\0\0\0\0\0\0\0\0". test2 is the string "a". They are not equal.
std::string can contain null characters. Its length is not the distance to the first null character. It does also guarantee that the memory buffer containing the characters of the string ends with an additional null character 1 beyond its length.
If you don't intend for the string to be longer but just want the memory, use std::string::reserve. Note that you cannot access elements beyond the end with [] legally, but pushing back or whatever won't cause any new memory allocations until you pass the reserve limit.

This is the intended behavior of std::string. Unlike a c-string a std::string can have as many null characters as you want. For instance "this\0 is\0 a\0 legal\0 std::string\0" would be legal to have as the contents for a std::string. You have to build it like
std::string nulls_inside("this\0 is\0 a\0 legal\0 std::string\0", sizeof("this\0 is\0 a\0 legal\0 std::string\0");
but you can also insert null characters into an existing std::string. In your case you're comparing
"a\0\0\0\0\0\0\0\0\0\0"
against
"a\0"
so it fails.

Related

Make C++ std::string end at specified character?

Let's say I am traversing a string of length n. I want it to end at a specific character that fulfils some conditions. I know that C style strings can be terminated at the i'th position by simply assigning the character '\0' at position i in the character array.
Is there any way to achieve the same result in an std::string (C++ style string)? I can think of substr, erase, etc. but all of them are linear in their complexity, which I cannot afford to use.
TL;DR, is there any "end" character for an std::string? Can I make the end iterator point to the current character somehow?
You can use resize:
std::string s = /* ... */;
if (auto n = s.find(c); n != s.npos) {
s.resize(n);
}
The logical answer here is basic_string::resize. What the standard says about this function is:
Effects: Alters the length of the string designated by *this as follows:
If n <= size(), the function replaces the string designated by *this with a string of length n whose elements are a copy of the initial elements of the original string designated by *this.
If n > size(), the function replaces the string designated by *this with a string of length n whose first size() elements are a copy of the original string designated by *this, and whose remaining elements are all initialized to c.
Now, that looks very much like linear time. However, the standard does not specifically state that things will happen this way. They only state that it will be "as if" things happen this way. Therefore, an implementation is completely free to implement the shrinking version of resize by shifting one pointer and writing a NUL character. Nothing in the standard would forbid such an implementation.
So the real question is... are standard library implementations written by complete morons? It's certainly possible that they are. But it's probably wise not to assume so.
Personally, I'd just use resize on the assumption that the library implementers know what they're doing. After all, if they can't write an optimization as simple as that, then who knows what other things they're doing wrong? If you can't trust your standard library implementation not to do stupid things, then you shouldn't be using it in performance-critical code.
is there any "end" character for an std::string?
No. It is possible to define a std::string that is not null terminated. You won't be able to do a few things for such strings, such as treat the return value of std::string:data() as a null terminated C string 1, but a std::string can be constructed that way.
Can I make the end iterator point to the current character somehow?
To get a std::string::iterator point to a certain character, you'll have to traverse the string.
E.g.
std::string str = "This is a string";
auto iter = str.begin();
auto end = iter;
while ( end != str.end() && *end != 'r' )
++end;
After that, the range defined by iter and end contains the string "This is a st".
If that is not acceptable, you'll have to adapt your code to check the value of the character for every step.
std::string str = "This is a string";
auto iter = str.begin();
// Break when 'r' is encountered or end of string is reached.
while ( iter != str.end() && *iter != 'r' )
{
// Use *iter
...
}
1 Thanks are due to #Cubbi for pointing out an error in what I stated. std::string::data() can return a char const* that is not null terminated if using a version of C++ earlier than C++11. If using C++11 or later, std::string::data() is required to return a null terminated char const*.
std::string does not have an "end character" like c style strings. You can have many null terminators inside a single std::string. If you want to the string to end after a certain character then you need to erase the rest of the characters in the string after that last character.
In your case that would give you something like
string_variable.erase(pos_of_last_character + 1)
TL;DR, is there any "end" character for an std::string? Can I make the end iterator point to the current character somehow?
Not really. std::string uses the std::string::size() function to keep track of the number of characters stored and maintained independently of any sentinel characters like '\0'.
Though these are considered when a std::string is initialized from a const char*.

Unexpected output of char arrays in c++

I'm pretty inexperienced in c++, and I wrote the following code to see how characters and strings work.
#include "stdio.h"
#include <iostream>
#include <string>
using namespace std;
int main()
{
char asdf[] = "hello";
char test[5] = {'h','e','l','l','o'};
cout << test;
}
I was expected it to output "hello", but instead I got "hellohello", which is really puzzling to me. I did some experimenting:
If I change the asdf to another string of a different length, it outputs "hello" normally.
If I change the amount of characters in test it outputs "hello" normally.
I thought this only happened when the two were the same length, but when I change them both to "hell" it seems to output "hell" normally.
To make things more confusing, when I asked a friend to run this code on their computer, it outputted "hello" and then a random character.
I'm running a fresh install of code blocks on Ubuntu. Anyone have any idea what is going on here?
This is undefined behaviour.
Raw char* or char[] strings in C and C++ must be NULL-terminated. That is, the string needs to end with a '\0' character. Your test[5] does not do that, so the function printing the output continues after the last o, because it is still looking for the NULL-termination.
Due to how the strings are stored on the stack (the stack usually grows towards lower addresses), the next bytes it encounters are those of asdf[], to which you assigned "hello". This is how the memory layout actually looks like, the arrow indicates the direction in which memory addresses (think pointers) increase:
---->
+-------------------
|hellohello\0 ...
+-------------------
\_ asdf
\_ test
Now in C++ and C, string literals like "hello" are NULL-terminated implicitly, so the compiler writes a hidden '\0' behind the end of the string. The output function continues to print the contents of asdf char-by-char until it reaches that hidden '\0' and then it stops.
If you were to remove the asdf, you would likely see a bit of garbage after the first hello and then a segmentation fault. But this is undefined behaviour, because you are reading out of the bounds of the test array. This also explains why it behaves differently on different systems: for example, some compilers may decide to lay out the variables in a different order on the stack, so that on your friends system, test is actually lower on the stack (remember, lower on the stack means at a higher address):
---->
+-------------------
|hello\0hello ...
+-------------------
\_ test
\_ asdf
Now when you print the contents of test, it will print hello char-by-char, then continue reading the memory until a \0 is found. The contents of ... are highly specific to architecture and runtime used, possibly even phase of the moon and time of day (not entirely serious), so that on your friends machine it prints a "random" character and stops then.
You can fix this by adding a '\0' or 0 to your test array (you will need to change the size to 6). However, using const char test[] = "hello"; is the sanest way to solve this.
You have to terminate your test array with an ascii 0 char. What happens now is that in memory it is adjacent to your asdf string, so since test isn't terminated, the << will just continue until it meets the ascii 0 at the end of asdf.
In case you wonder: When filling asdf, this ascii 0 is added automatically.
The reason for this is that C style strings need the null character to mark the end of the string.
As you have not put this into the array test it will just keep printing characters until it finds one. In you case the array asdf happens to follow test in memory - but this cannot be guaranteed.
Instead change the code to this:
char test[] = {'h','e','l','l','o', 0};
cout is printing all characters starting from the beginning of the given address (test here, or &test[0] in equivalent notation) up to the point where it finds a null terminator. As you haven't put a null terminator into the test array it will continue to print until it accidently finds one in memory. Up from this point it's pretty much undefined behavior what happens.
Last character should be '\0' to indicate end of string.
char test[6] = {'h','e','l','l','o','\0'};
Unless there is an overload of operator<< for a reference to an array of 5 chars, the array will "decay" to a pointer to char and treated as a C style string by the operator. C style strings are by convention terminated with a 0 char, which your array is lacking. Therefore the operator continues outputting the bytes in memory, interpreting them as printable chars. It just so happens that on the stack, the two arrays were adjacent so that the operator ran into asdf's memory area, outputting those chars and finally encountering the implicit 0 char which is at the end of "hello". If you omit the other declaration it's likely that your program will crash, namely if the next 0 byte comes later than the memory boundary of your program.
It is undefined behavior to access memory outside an object (here: test) through a pointer to that object.
Character sequences need a null terminator (\0).
char asdf[] = "hello"; // OK: String literals have '\0' appended at the end
char test[5] = {'h','e','l','l','o'}; // Oops, not null terminated. UB
Corrected:
char test[6] = {'h','e','l','l','o','\0'}; // OK
// ^ ^^^^

Char pointer giving me some really strange characters

When I run the example code, the wordLength is 7 (hence the output 7). But my char array gets some really weird characters in the end of it.
wordLength = word.length();
cout << wordLength;
char * wordchar = new char[wordLength]; //new char[7]; ??
for (int i = 0; i < word.length(); i++) //0-6 = 7
{
wordchar[i] = 'a';
}
cout << wordchar;
The output: 7 aaaaaaa²²²²¦¦¦¦¦ÂD╩2¦♀
Desired output is: aaaaaaa... What is the garbage behind it?? And how did it end up there?
You should add \0 at the end of wordchar.
char * wordchar = new char[wordLength +1];
//add chars as you have done
wordchar[wordLength] = `\0`
The reason is that C-strings are null terminated.
C strings are terminated with a '\0' character that marks their end (in contrast, C++ std::string just stores the length separately).
In copying the characters to wordchar you didn't terminate the string, thus, when operator<< outputs wordchar, it goes on until it finds the first \0 character that happens to be after the memory location pointed to by wordchar, and in the process it prints all the garbage values that happen to be in memory in between.
To fix the problem, you should:
make the allocated string 1 char longer;
add the \0 character at the end.
Still, in C++ you'll normally just want to use std::string.
Use: -
char * wordchar = new char[wordLength+1]; // 1 extra for null character
before for loop and
wordchar[i] ='\0'
after for loop , C strings are null terminated.
Without this it keeps on printing, till it finds the first null character,printing all the garbage values.
You avoid the trailing zero, that's the cause.
In C and C++ the way the whole eco-system treats string length is that it assumes a trailing zero ('\0' or simply 0 numerically). This is different then for example pascal strings, where the memory representation starts with the number which tells how many of the next characters comprise the particular string.
So if you have a certain string content what you want to store, you have to allocate one additional byte for the trailing zero. If you manipulate memory content, you'll always have to keep in mind the trailing zero and preserve it. Otherwise strstr and other string manipulation functions can mutate memory content when running off the track and keep on working on the following memory section. Without trailing zero strlen will also give a false result, it also counts until it encounters the first zero.
You are not the only one making this mistake, it often gets important roles in security vulnerabilities and their exploits. The exploit takes advantage of the side effect that function go off trail and manipulate other things then what was originally intended. This is a very important and dangerous part of C.
In C++ (as you tagged your question) you better use STL's std::string, and STL methods instead of C style manipulations.

How to make sure nothing else is in array

I'm doing an assignment for school and I seem to have a problem with other characters being in the array. (Amongst other problems...)
In one function, I declared the array..
const int MAXSIZE = 100;
char inFix[MAXSIZE];
And this code is used to put chars into the array
//loop to store user input until enter is pressed
while((inputChar = static_cast<char>(cin.get()))!= '\n')
{
//if function to decide if the input should be stored or not
if(isOperator(inputChar) || isdigit(static_cast<int>(inputChar)) || inputChar == '(' || inputChar == ')')
{
inFix[a] = inputChar; //stores input
a++;
}
}
At the end of this, I append the null character to the array though I wasn't sure if I should do this:
inFix[MAXSIZE] = '\0';
Or if I should've used strcat.. either way... in my next function, I use strcat to append a parenthesis to the end.
But I've been having problems with the code, so I ran a for loop to print what is inside the infix array at the beginning of my next function, just to see...
And I get this annoying beeping sound, and a string of weird characters like hearts, and music signs... and.. a whole list of odd characters. What could be the problem? Thanks.
EDIT: by the way, I input 9*4, and I run the for loop after i append the parenthesis, so at the beginning of the output, I get:
9*4) and then the string of odd characters...
so I ran a for loop to print what is inside the infix array at the beginning of my next function, just to see...
And I get this annoying beeping sound, and a string of weird characters like hearts, and music signs... and.. a whole list of odd characters. What could be the problem?
The problem is that you're printing out array elements which you never initialized. The answer you have currently accepted advises you to initialize all those elements. Although this will not cause errors it is a mistake to let this answer prevent you from fully understanding the problem you encountered.
Reconsider your code where you inserted a null character at the end of the array:
inFix[MAXSIZE] = '\0';
You apparently know that a null character has something to do with marking the end of the string, but you've mistaken how to do that correctly. Everything from the beginning of the array until a null character will be treated as part of your string. If you copy three characters from the input 9*4 into your array then you should only want those three characters to be seen as part of your string. What you do not want is for everything in the array past those three characters, up to MAXSIZE to also be treated as part of your string. So you need to put the end-of-string marker, the '\0', right after the characters you care about.
(BTW, inFix[MAXSIZE] = '\0'; not only puts the end-of-string marker at the end of the array, it puts it outside the array, which you are not allowed to do. The program will behave unpredictably.)
inFix[0] = '9';
inFix[1] = '*';
inFix[2] = '4';
inFix[3] = '\0'; // <-- this is where you need to put the end-of-string marker, because this is the end of the characters you care about.
Putting the end-of-string marker at the end of the array effectively does this:
inFix[0] = '9';
inFix[1] = '*';
inFix[2] = '4';
inFix[3] = ???
inFix[4] = ???
.
.
.
inFix[98] = ???
inFix[99] = ??? // annoying bell noise? musical note symbol?
inFix[100] = '\0'; // Yes, Please!
The reason initializing the array to all zeros (which can also be done like this char inFix[MAXSIZE] = {};, empty braces instead of a 0) worked for you is because that means that no matter where you stop writing characters you care about, the next character will be a '\0'. That position, right after the characters you care about, is the only place it matters.
Since the loop that's copying the characters knows exactly where it stops it also knows exactly where to insert the end-of-string marker. It would be easy to just insert a single '\0' in the correct place.
Try appending the '\0' to position a in the array instead - that is, exactly after the last char you've read. Otherwise you're putting your characters in the array, then a random sequence of what was in the array before, and then after that is the \0 (or, in this case, it's one after the end of the array, which is even worse).
You say you null terminate string using inFix[MAXSIZE] = '\0' as you currently understand this should converted into inFix[MAXSIZE - 1] = '\0' but in this case you say that you have an string that ended in last possible character!! so how you should add something( for example a parenthesis ) to it?? because it is full and it can't accept any more characters.
So may be this is better to use inFix[a] = '\0' because you know that a point to end of string, and now if a < MAXSIZE then you have enough space to add more items to the string otherwise you may increase MAXSIZE (using strcat for example) or show some error to the user.
You declared the array with:
char inFix[MAXSIZE];
At this point in time, the content of the array is not initialized. Printing out these uninitialized characters will lead to the behavior you see (strange characters, sounds, etc.).
However, the contents of that array are not initialized. If you want to initialize this array, change this line to:
char inFix[MAXSIZE] = {0};
This initialization syntax causes the elements in the array to be initialized. The first elements are initialized to the values provided in the brackets, while any values not in the brackets will be initialized to 0. In the case above, all of the array elements will be initialized to 0 (which happens to correspond to the NULL character \0).
Since all the characters are initialized to NULL in this case, you don't need to append the null character after your input.
That said, a better solution would be not to use a C-Style array at all, but a std::vector<char>, or simply a std::string, as pointed out above.

String going crazy if I don't give it a little extra room. Can anyone explain what is happening here?

First, I'd like to say that I'm new to C / C++, I'm originally a PHP developer so I am bred to abuse variables any way I like 'em.
C is a strict country, compilers don't like me here very much, I am used to breaking the rules to get things done.
Anyway, this is my simple piece of code:
char IP[15] = "192.168.2.1";
char separator[2] = "||";
puts( separator );
Output:
||192.168.2.1
But if I change the definition of separator to:
char separator[3] = "||";
I get the desired output:
||
So why did I need to give the man extra space, so he doesn't sleep with the man before him?
That's because you get a not null-terminated string when separator length is forced to 2.
Always remember to allocate an extra character for the null terminator. For a string of length N you need N+1 characters.
Once you violate this requirement any code that expects null-terminated strings (puts() function included) will run into undefined behavior.
Your best bet is to not force any specific length:
char separator[] = "||";
will allocate an array of exactly the right size.
Strings in C are NUL-terminated. This means that a string of two characters requires three bytes (two for the characters and the third for the zero byte that denotes the end of the string).
In your example it is possible to omit the size of the array and the compiler will allocate the correct amount of storage:
char IP[] = "192.168.2.1";
char separator[] = "||";
Lastly, if you are coding in C++ rather than C, you're better off using std::string.
If you're using C++ anyway, I'd recommend using the std::string class instead of C strings - much easier and less error-prone IMHO, especially for people with a scripting language background.
There is a hidden nul character '\0' at the end of each string. You have to leave space for that.
If you do
char seperator[] = "||";
you will get a string of size 3, not size 2.
Because in C strings are nul terminated (their end is marked with a 0 byte). If you declare separator to be an array of two characters, and give them both non-zero values, then there is no terminator! Therefore when you puts the array pretty much anything could be tacked on the end (whatever happens to sit in memory past the end of the array - in this case, it appears that it's the IP array).
Edit: this following is incorrect. See comments below.
When you make the array length 3, the extra byte happens to have 0 in it, which terminates the string. However, you probably can't rely on that behavior - if the value is uninitialized it could really contain anything.
In C strings are ended with a special '\0' character, so your separator literal "||" is actually one character longer. puts function just prints every character until it encounters '\0' - in your case one after the IP string.
In C, strings include a (invisible) null byte at the end. You need to account for that null byte.
char ip[15] = "1.2.3.4";
in the code above, ip has enough space for 15 characters. 14 "regular characters" and the null byte. It's too short: should be char ip[16] = "1.2.3.4";
ip[0] == '1';
ip[1] == '.';
/* ... */
ip[6] == '4';
ip[7] == '\0';
Since no one pointed it out so far: If you declare your variable like this, the strings will be automagically null-terminated, and you don't have to mess around with the array sizes:
const char* IP = "192.168.2.1";
const char* seperator = "||";
Note however, that I assume you don't intend to change these strings.
But as already mentioned, the safe way in C++ would be using the std::string class.
A C "String" always ends in NULL, but you just do not give it to the string if you write
char separator[2] = "||". And puts expects this \0 at the ned in the first case it writes till it finds a \0 and here you can see where it is found at the end of the IP address. Interesting enoiugh you can even see how the local variables are layed out on the stack.
The line: char seperator[2] = "||"; should get you undefined behaviour since the length of that character array (which includes the null at the end) will be 3.
Also, what compiler have you compiled the above code with? I compiled with g++ and it flagged the above line as an error.
String in C\C++ are null terminated, i.e. have a hidden zero at the end.
So your separator string would be:
{'|', '|', '\0'} = "||"