Getting extra characters at the end when creating std::string from char* - c++

I have just started learning C++. Now I am learning about arrays, so I am trying out different examples. One such example is given below:
#include <iostream>
#include <string>
int main()
{
const char *ptr1 = "Anya";
char arr[] = {'A','n','y','a'};
std::string name1(ptr1); //this works
std::cout << name1 << std::endl;
std::string name2(arr);
std::cout << name2 << std::endl; //this prints extra characters at the end?
return 0;
}
In the above example, at the last cout statement I am getting some extra characters at the end. My question is: how can I prevent this from happening in the above code, and what is wrong with the code, so that I don't make the same mistake in the future?

char arr[] = {'A','n','y','a'}; is not null-terminated, so the std::string constructor reads past the end of the array while looking for a '\0'. That out-of-bounds read gives your program undefined behavior (so it could do anything).
Either make it null terminated:
char arr[] = {'A','n','y','a','\0'};
Or, create the string from iterators:
#include <iostream>
#include <iterator>
#include <string>
int main() {
char arr[] = {'A', 'n', 'y', 'a'};
std::string name2(std::begin(arr), std::end(arr));
std::cout << name2 << '\n'; // now prints "Anya"
}
Or create it with the constructor that takes the length as an argument:
std::string name2(arr, sizeof arr); // here `sizeof arr` is 4

The problem is that you're constructing a std::string using a non null terminated array as explained below.
When you wrote:
char arr[] = {'A','n','y','a'}; //not null terminated
The above statement creates an array that is not null terminated.
Next when you wrote:
std::string name2(arr); //undefined behavior
There are 2 important things to note about the above statement:
arr decays to a char* (array-to-pointer conversion).
This char* is passed as an argument to the std::string constructor that has a parameter of type const char*. Essentially, the above statement creates a std::string object from a non-null-terminated array.
But note that whenever we create a std::string using a const char*, the array to which the pointer points must be null terminated. Otherwise the result is undefined behavior.
Undefined behavior means anything¹ can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior.
For example, the same program may give the expected output with one compiler or on one run, and not with another. So, as I said, don't rely on the output of a program that has UB.
Solution
You can solve this by making your array null-terminated as shown below.
char arr[] = {'A','n','y','a','\0'}; //arr is null terminated
// char arr[] = "Anya"; //this is also null terminated
¹ For a more technically accurate definition: undefined behavior means that there are no restrictions on the behavior of the program.

Related

Why does a runtime error occur when we try to get the length of the NULL string?

Why does the following code give an error?
// This is a C++ program
#include <bits/stdc++.h>
using namespace std;
// Driver code
main()
{
string s=NULL;
s.length();
}
I know that a runtime error occurs because I am trying to get the length of the null string, but I want to know why it happens.
You invoke the following overload of the std::string constructor (overload 5):
basic_string( const CharT* s, const Allocator& alloc = Allocator());
And this is the explanation belonging to the constructor (emphasis mine):
Constructs the string with the contents initialized with a copy of the null-terminated character string pointed to by s. The length of the string is determined by the first null character. The behavior is undefined if [s, s + Traits::length(s)) is not a valid range (for example, if s is a null pointer).
Thus, you have undefined behavior at work. Referring back to your question, that rules out any further reasoning about "why it is happening", because UB can result in anything. You could wonder why it's specified as UB in the first place: this is because std::string is designed to work with C-style, zero-terminated strings (char*), and NULL is not one. The empty zero-terminated C-style string is, e.g., "".
Why does the following code give an error?
main must be declared to return int.
Also, to declare an empty string, make it string s; or string s="";
This would compile:
#include <iostream>
#include <string>
int main()
{
std::string s;
std::cout << s.length() << '\n'; // prints 0
}
On a sidenote: Please read Why should I not #include <bits/stdc++.h>?
There is no such thing as the null string unless by "null" you mean empty, which you don't.

Why does this C++ code not print the address of each character in the array?

I am trying to get the addresses of each character in the array as follows:
#include <bits/stdc++.h>
using namespace std;
int main() {
char arr[] = {'a', 'b', 'c'};
cout<<&arr[0]<<endl;
cout<<&arr[1]<<endl;
cout<<&arr[2]<<endl;
return 0;
}
But the output I am getting is as follows:
abc0╒#
bc0╒#
c0╒#
Press any key to continue . . .
The output does not look like an address with hexadecimal digits, but just some random characters. Am I missing some concepts here? I want to get the address of each character in the array arr.
The type of &arr[i] is a char*.
The class of which cout is an instance has an overloaded << operator for a const char*. It treats the pointer as the start of a NUL-terminated string, and outputs the data as text.
You are observing the effects of undefined behaviour as a NUL-terminator is not reached. If you had written
char arr[] = {'a', 'b', 'c', 0};
then the program behaviour would be defined.
If you want to output addresses then use cout << (const void*)&arr[0] << endl; &c.
If you want to print out the addresses, you could first cast them (from char*) to void*.
cout<<static_cast<void*>(&arr[0])<<endl;
cout<<static_cast<void*>(&arr[1])<<endl;
cout<<static_cast<void*>(&arr[2])<<endl;
Otherwise, they will be treated as C-style strings, and cout will try to print the characters they point to. Since arr doesn't have the null terminator '\0' at the end, the behavior is undefined here.

strcat on an std::string cast to a char*

I found this (simplified) piece of code in our code base and it makes me uneasy. It either works, doesn't work, or is never called anyway. I would expect a buffer overflow, but when I try it in an online compiler it certainly doesn't work, yet it doesn't overflow either. Looking at the definition of strcat, it writes the source to the destination starting at the destination's null terminator, but I assume that in this scenario the destination buffer (which was created as a std::string) should be too small.
#include <iostream>
#include "string.h"
using namespace std;
void addtostring(char* str){
char str2[12] = "goodbye";
strcat(str, str2);
}
int main()
{
std::string my_string = "hello";
addtostring((char*)my_string.c_str());
cout << my_string << endl;
return 0;
}
What would be the actual behaviour of this operation?
What would be the actual behaviour of this operation?
The behavior is undefined. First, writing to any character through the pointer returned by c_str is undefined behavior. Second, had you used data() instead to get a char* (the non-const overload exists since C++17), overwriting the null terminator would also be undefined behavior. Lastly, both c_str and data only give you a pointer p whose valid range of elements is [p, p + size()]. Writing to any element outside that range is also undefined behavior.
If you want to modify the string you need to use the string's member/free functions to do so. Your function could be rewritten to
void addtostring(std::string& str){
str += "goodbye";
}
and that will have well defined behavior.

Using remove_if with C null-terminated string

I have a situation where I want to efficiently remove a character from a null-terminated char*. I can assume the incoming string is large (i.e., it wouldn't be efficient to copy), but I can also assume that I don't need to deallocate the unused memory.
I thought I could use std::remove_if for this task (replacing the character at the returned iterator with a null terminator), and set up the following test program to make sure I got the syntax correct:
#include <algorithm>
#include <iostream>
bool is_bad (const char &c) {
return c == 'a';
}
int main (int argc, char *argv[]) {
char * test1 = "123a45";
int len = 6;
std::cout << test1 << std::endl;
char * new_end = std::remove_if(&test1[0], &test1[len], is_bad);
*new_end = '\0';
std::cout << test1 << std::endl;
return 0;
}
This program compiles, however, I'm getting a Segmentation Fault somewhere in remove_if - here's the output from gdb:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400914 in std::remove_copy_if<char*, char*, bool (*)(char const&)> (__first=0x400c2c "45", __last=0x400c2e "", __result=0x400c2b "a45",
__pred=0x4007d8 <is_bad(char const&)>) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_algo.h:1218
1218 *__result = *__first;
This is with gcc 4.1.2 on RedHat 4.1.2-52.
My understanding was that raw pointers can be used as ForwardIterators, but perhaps not? Any suggestions?
The program has undefined behaviour as it is attempting to modify a string literal:
char * test1 = "123a45";
Change to:
char test1[] = "123a45"; // 'test1' is a copy of the string literal.
char * new_end = std::remove_if(test1, test1 + sizeof(test1), is_bad);
See http://ideone.com/yzeo4k.
Your program has undefined behavior, since you are trying to modify an array of const characters (string literals are arrays of const characters). Per paragraph 7.1.6.1/4 of the C++11 Standard:
Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const
object during its lifetime (3.8) results in undefined behavior.
Note that since C++11 the conversion from a string literal to char* is ill-formed, and in C++03 it was deprecated (GCC 4.7.2 gives me a warning for that).
To fix your program with a minimal change, declare test1 as an array of characters and initialize it from the string literal:
char test1[] = "123a45";

atof and non-null terminated character array

#include <cstdlib>
#include <iostream>
using namespace std;
int main(int argc, char *argv[]) {
char c[] = {'0','.','5'};
//char c[] = "0.5";
float f = atof(c);
cout << f*10;
if(c[3] != '\0')
{
cout << "YES";
}
}
OUTPUT: 5YES
Does atof work with non-null terminated character arrays too? If so, how does it know where to stop?
Does atof work with non-null terminated character arrays too?
No, it doesn't. std::atof requires a null-terminated string in input. Failing to satisfy this precondition is Undefined Behavior.
Undefined Behavior means that anything could happen, including the program seeming to work fine. What is happening here is that by chance you have a byte in memory right after the last element of your array which cannot be interpreted as part of the representation of a floating-point number, which is why your implementation of std::atof stops. But that's something that cannot be relied upon.
You should fix your program this way:
char c[] = {'0', '.', '5', '\0'};
// ^^^^
No, atof does not work with non-null-terminated arrays: it keeps reading past the end of the array you pass in until it finds a terminating zero (or some other character that cannot be part of a number). Passing an array without termination is undefined behavior, because it leads the function to read past the end of the array. In your example, the function has likely accessed bytes that you have allocated to f (although there is no certainty there, because f does not need to follow c[] in memory).
char c[] = {'0','.','5'};
char d[] = {'6','7','8'};
float f = atof(c); // << Undefined behavior!!!
float g = atof(d); // << Undefined behavior!!!
cout << f*10;
The above prints 5.678, which shows that atof(c) read past the end of c (here, into d).
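No... atof() requires a null-terminated string.
If you have a string that is not null-terminated and you need to convert it, you could try copying it into a target buffer, character by character, as long as each char is a valid digit, sign or decimal point. Something to the effect of...
// needs <cctype> and <cstdlib>; input_string points to len characters (assumed names),
// no terminator required
char buff[64] = { 0 };
for( std::size_t i = 0; i < len && i < sizeof( buff ) - 1; i++ )
{
char input = input_string[i];
// cast to unsigned char: passing a negative char to isdigit is undefined
if( std::isdigit( static_cast<unsigned char>( input ) ) || input == '-' || input == '.' )
buff[i] = input;
else
break;
}
double result = std::atof( buff ); // buff is always null-terminated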
From the description of the atof() function on MSDN (probably applies to other compilers) :
The function stops reading the input string at the first character that it cannot recognize as part of a number. This character may be the null character ('\0' or L'\0') terminating the string.
It must either be 0 terminated or the text must contain characters that do not belong to the number.
std::string already null-terminates its contents: c_str() always returns a null-terminated string!
So why not:
std::string number = "7.6";
double temp = ::atof(number.c_str());
You can also do it with std::stringstream or boost::lexical_cast:
http://www.boost.org/doc/libs/1_53_0/doc/html/boost_lexical_cast.html
http://www.cplusplus.com/reference/sstream/stringstream/
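Since C++11, we have std::stof. Replacing atof with std::stof makes errors easier to handle: std::stof throws std::invalid_argument if no conversion can be performed and std::out_of_range if the value is out of range, whereas atof gives no way to tell a failed conversion apart from an input of 0.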
I made a handy wrapper if you always pass a known size of char array.
#include <fmt/core.h>
#include <cstddef>
#include <iostream>
#include <string>
#include <type_traits>
// Fallback overload: a pointer argument ends up here and the static_assert
// turns it into a readable compile-time error
template<typename T>
float charArrayToFloat(const T arr){
static_assert(false == std::is_pointer<T>::value, "`charArrayToFloat()` doesn't allow conversion from pointer!");
return -1;
}
// Valid for both null-terminated and non-null-terminated char arrays
template<std::size_t sz>
float charArrayToFloat(const char(&arr)[sz]){
// The length comes from the array type, so a terminator is not required
std::string str(arr, sz);
return std::stof(str);
}
int main() {
char number[4] = {'0','.','4','2'};
float ret = charArrayToFloat(number);
fmt::print("The answer is {}. ", ret);
return 0;
}
Output: The answer is 0.42.
Does atof work with non-null terminated character arrays too?
No, this function expects a pointer to a null-terminated string. Passing anything else, for example a pointer to a non-null-terminated string (or a non-null-terminated character array), is undefined behavior.
Undefined behavior means anything¹ can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior.
So the output that you're seeing (or may be seeing) is a result of undefined behavior. And, as I said, don't rely on the output of a program that has UB. The program may just crash.
So the first step to make the program correct would be to remove UB. Then and only then you can start reasoning about the output of the program.
¹ For a more technically accurate definition: undefined behavior means that there are no restrictions on the behavior of the program.