I came across a subtle bug a couple of days ago where the code looked something like this:
ostringstream ss;
int anInt( 7 );
ss << anInt << "HABITS";
ss << ends;
string theWholeLot = ss.str();
The problem was that the ends was sticking a '\0' into the ostringstream so theWholeLot actually looked like "7HABITS\0" (i.e. a null at the end)
Now this hadn't shown up because theWholeLot was then being used to take the const char * portion using string::c_str() That meant that the null was masked as it became just a delimiter. However, when this changed to use strings throughout, the null suddenly meant something and comparisons such as:
if ( theWholeLot == "7HABITS" )
would fail. This got me thinking: Presumably the reason for ends is a throwback to the days of ostrstream when the stream was not normally terminated with a null and had to be so that str() (which then cast out not a string but a char *) would work correctly.
However, now that it's not possible to cast out a char * from a ostringstream, using ends is not only superfluous, but potentially dangerous and I'm considering removing them all from my clients code.
Can anyone see an obvious reason to use ends in a std::string only environment?
You've essentially answered your own question is as much detail that's needed. I certainly can't think of any reason to use std::ends when std::string and std::stringstream handle all that for you.
So, to answer your question explicitly, no, there is no reason to use std::ends in a std::string only environment.
There are some APIs that expect a "string array" with multiple zero terminated strings, a double zero to mark the end. Raymond Chang just recently blogged about it, most of all to demonstrate how often that this gets fumbled.
Related
If I want to construct a std::string with a line like:
std::string my_string("a\0b");
Where i want to have three characters in the resulting string (a, null, b), I only get one. What is the proper syntax?
Since C++14
we have been able to create literal std::string
#include <iostream>
#include <string>
int main()
{
using namespace std::string_literals;
std::string s = "pl-\0-op"s; // <- Notice the "s" at the end
// This is a std::string literal not
// a C-String literal.
std::cout << s << "\n";
}
Before C++14
The problem is the std::string constructor that takes a const char* assumes the input is a C-string. C-strings are \0 terminated and thus parsing stops when it reaches the \0 character.
To compensate for this, you need to use the constructor that builds the string from a char array (not a C-String). This takes two parameters - a pointer to the array and a length:
std::string x("pq\0rs"); // Two characters because input assumed to be C-String
std::string x("pq\0rs",5); // 5 Characters as the input is now a char array with 5 characters.
Note: C++ std::string is NOT \0-terminated (as suggested in other posts). However, you can extract a pointer to an internal buffer that contains a C-String with the method c_str().
Also check out Doug T's answer below about using a vector<char>.
Also check out RiaD for a C++14 solution.
If you are doing manipulation like you would with a c-style string (array of chars) consider using
std::vector<char>
You have more freedom to treat it like an array in the same manner you would treat a c-string. You can use copy() to copy into a string:
std::vector<char> vec(100)
strncpy(&vec[0], "blah blah blah", 100);
std::string vecAsStr( vec.begin(), vec.end());
and you can use it in many of the same places you can use c-strings
printf("%s" &vec[0])
vec[10] = '\0';
vec[11] = 'b';
Naturally, however, you suffer from the same problems as c-strings. You may forget your null terminal or write past the allocated space.
I have no idea why you'd want to do such a thing, but try this:
std::string my_string("a\0b", 3);
What new capabilities do user-defined literals add to C++? presents an elegant answer: Define
std::string operator "" _s(const char* str, size_t n)
{
return std::string(str, n);
}
then you can create your string this way:
std::string my_string("a\0b"_s);
or even so:
auto my_string = "a\0b"_s;
There's an "old style" way:
#define S(s) s, sizeof s - 1 // trailing NUL does not belong to the string
then you can define
std::string my_string(S("a\0b"));
The following will work...
std::string s;
s.push_back('a');
s.push_back('\0');
s.push_back('b');
You'll have to be careful with this. If you replace 'b' with any numeric character, you will silently create the wrong string using most methods. See: Rules for C++ string literals escape character.
For example, I dropped this innocent looking snippet in the middle of a program
// Create '\0' followed by '0' 40 times ;)
std::string str("\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", 80);
std::cerr << "Entering loop.\n";
for (char & c : str) {
std::cerr << c;
// 'Q' is way cooler than '\0' or '0'
c = 'Q';
}
std::cerr << "\n";
for (char & c : str) {
std::cerr << c;
}
std::cerr << "\n";
Here is what this program output for me:
Entering loop.
Entering loop.
vector::_M_emplace_ba
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
That was my first print statement twice, several non-printing characters, followed by a newline, followed by something in internal memory, which I just overwrote (and then printed, showing that it has been overwritten). Worst of all, even compiling this with thorough and verbose gcc warnings gave me no indication of something being wrong, and running the program through valgrind didn't complain about any improper memory access patterns. In other words, it's completely undetectable by modern tools.
You can get this same problem with the much simpler std::string("0", 100);, but the example above is a little trickier, and thus harder to see what's wrong.
Fortunately, C++11 gives us a good solution to the problem using initializer list syntax. This saves you from having to specify the number of characters (which, as I showed above, you can do incorrectly), and avoids combining escaped numbers. std::string str({'a', '\0', 'b'}) is safe for any string content, unlike versions that take an array of char and a size.
In C++14 you now may use literals
using namespace std::literals::string_literals;
std::string s = "a\0b"s;
std::cout << s.size(); // 3
Better to use std::vector<char> if this question isn't just for educational purposes.
anonym's answer is excellent, but there's a non-macro solution in C++98 as well:
template <size_t N>
std::string RawString(const char (&ch)[N])
{
return std::string(ch, N-1); // Again, exclude trailing `null`
}
With this function, RawString(/* literal */) will produce the same string as S(/* literal */):
std::string my_string_t(RawString("a\0b"));
std::string my_string_m(S("a\0b"));
std::cout << "Using template: " << my_string_t << std::endl;
std::cout << "Using macro: " << my_string_m << std::endl;
Additionally, there's an issue with the macro: the expression is not actually a std::string as written, and therefore can't be used e.g. for simple assignment-initialization:
std::string s = S("a\0b"); // ERROR!
...so it might be preferable to use:
#define std::string(s, sizeof s - 1)
Obviously you should only use one or the other solution in your project and call it whatever you think is appropriate.
I know it is a long time this question has been asked. But for anyone who is having a similar problem might be interested in the following code.
CComBSTR(20,"mystring1\0mystring2\0")
Almost all implementations of std::strings are null-terminated, so you probably shouldn't do this. Note that "a\0b" is actually four characters long because of the automatic null terminator (a, null, b, null). If you really want to do this and break std::string's contract, you can do:
std::string s("aab");
s.at(1) = '\0';
but if you do, all your friends will laugh at you, you will never find true happiness.
Is it possible to use an std::string for read() ?
Example :
std::string data;
read(fd, data, 42);
Normaly, we have to use char* but is it possible to directly use a std::string ? (I prefer don't create a char* for store the result)
Thank's
Well, you'll need to create a char* somehow, since that's what the
function requires. (BTW: you are talking about the Posix function
read, aren't you, and not std::istream::read?) The problem isn't
the char*, it's what the char* points to (which I suspect is what
you actually meant).
The simplest and usual solution here would be to use a local array:
char buffer[43];
int len = read(fd, buffer, 42);
if ( len < 0 ) {
// read error...
} else if ( len == 0 ) {
// eof...
} else {
std::string data(buffer, len);
}
If you want to capture directly into an std::string, however, this is
possible (although not necessarily a good idea):
std::string data;
data.resize( 42 );
int len = read( fd, &data[0], data.size() );
// error handling as above...
data.resize( len ); // If no error...
This avoids the copy, but quite frankly... The copy is insignificant
compared to the time necessary for the actual read and for the
allocation of the memory in the string. This also has the (probably
negligible) disadvantage of the resulting string having an actual buffer
of 42 bytes (rounded up to whatever), rather than just the minimum
necessary for the characters actually read.
(And since people sometimes raise the issue, with regards to the
contiguity of the memory in std:;string: this was an issue ten or more
years ago. The original specifications for std::string were designed
expressedly to allow non-contiguous implementations, along the lines of
the then popular rope class. In practice, no implementor found this
to be useful, and people did start assuming contiguity. At which point,
the standards committee decided to align the standard with existing
practice, and require contiguity. So... no implementation has ever not
been contiguous, and no future implementation will forego contiguity,
given the requirements in C++11.)
No, you cannot and you should not. Usually, std::string implementations internally store other information such as the size of the allocated memory and the length of the actual string. C++ documentation explicitly states that modifying values returned by c_str() or data() results in undefined behaviour.
If the read function requires a char *, then no. You could use the address of the first element of a std::vector of char as long as it's been resized first. I don't think old (pre C++11) strings are guarenteed to have contiguous memory otherwise you could do something similar with the string.
No, but
std::string data;
cin >> data;
works just fine. If you really want the behaviour of read(2), then you need to allocate and manage your own buffer of chars.
Because read() is intended for raw data input, std::string is actually a bad choice, because std::string handles text. std::vector seems like the right choice to handle raw data.
Using std::getline from the strings library - see cplusplus.com - can read from an stream and write directly into a string object. Example (again ripped from cplusplus.com - 1st hit on google for getline):
int main () {
string str;
cout << "Please enter full name: ";
getline (cin,str);
cout << "Thank you, " << str << ".\n";
}
So will work when reading from stdin (cin) and from a file (ifstream).
I hope this question is appropriate for stackoverflow... What is the difference between storing raw data bytes (8 bits) in a std::string rather than storing them in std::vector<char>. I'm reading binary data from a file and storing those raw bytes in a std::string. This works well, there are no problems or issues with doing this. My program works as expected. However, other programmers prefer the std::vector<char> approach and suggest I stop using std::string as it's unsafe for raw bytes. So I'm wondering why might it be unsafe to use std::string to hold raw data bytes? I know std::string is most often used to store ASCII text, but a byte is a byte, so I don't understand the preference of the std::vector<char>.
Thanks for any advice!
The problem is not really whether it works or it doesn't. The problem is that it is utterly confusing for the next guy reading your code. std::string is meant for displaying text. Anybody reading your code will expect that. You'll declare your intent much better with a std::vector<char>.
It increases your WTF/min in code reviews.
In C++03, using std::string to store an array of byte data was not a good idea. By the standard, std::string did not have to store data contiguously. C++11 fixed that so that it's data does have to be contiguous.
So it would not be functional to do this in C++03. Not unless you have personally vetted your C++ standard library implementation of std::string to ensure that it is contiguous.
Either way, I would suggest vector<char>. Generally, when you see string, you expect it to be a... string. You know, a sequence of characters in some form of encoding. A vector<char> makes it obvious that it isn't a string, but an array of bytes.
Besides contiguous storage and code-clarity issues, I ran into some fairly insidious errors trying to use std::string to hold raw bytes.
Most of them centered around trying to convert a char array of bytes to std::string when interfacing with C libraries. For example:
std::string password = "pass\0word";
std::cout << password.length() << std::endl; // prints 4, not 9
Maybe you can fix that by specifying the length:
std::string password("pass\0word", 0, 9);
std::cout << password.length() << std::endl; // nope! still 4!
This is probably because the constructor expects to receive a C-string, not a byte array. There might be a better way, but I ended up with this:
std::string password("pass0word", 0, 9);
password[4] = '\0';
std::cout << password.length() << std::endl; // hurray! 9!
A little clunky. Thankfully I found this in unit testing, but I would have missed it if my test vectors didn't have null bytes. What makes this insidious is that the second approach above will work fine until the array contains a null byte.
So far std::vector<uint8_t> looks like a good option (thanks J.N. and Hurkyl):
char p[] = "pass\0word";
std::vector<uint8_t> password(p, p, p+9); // :)
Note: I haven't tried the iterator constructor with std::string, but this error is easy enough to make that it might be worth avoiding even the possibility.
Lessons learned:
Test byte-handling methods witih null byte-containing test vectors.
Be careful when (and I would say avoid) using std::string to hold raw bytes.
Ps: This is more of a conceptual question.
I know this makes things more complicated for no good reason, but here is what I'm wondering. If I'm not mistaken, a const char* "like this" in c++ is pointing to l and will be automatically zero terminated on compile time. I believe it is creating a temporary variable const char* to hold it, unless it is keeping track of the offset using a byte variable (I didn't check the disassembly). My question is, how would you if even possible, add characters to this string without having to call functions or instantiating strings?
Example (This is wrong, just so you can visualize what I meant):
"Like thi" + 's';
The closest thing I came up with was to store it to a const char* with enough spaces and change the other characters.
Example:
char str[9];
strcpy(str, "Like thi")
str[8] = 's';
Clarification:
Down vote: This question does not show any research effort; it is unclear or not useful
Ok, so the question has been highly down voted. There wasn't much reasoning on which of these my question was lacking on, so I'll try to improve all of those qualities.
My question was more so I could have a better understanding of what goes on when you simply create a string "like this" without storing the address of that string in a const char* I also wanted to know if it was possible to concatenate/change the content of that string without using functions like strcat() and without using the overloaded operator + from the class string. I'm aware this is not exactly useful for dealing with strings in C++, but I was curious whether or not there was a way besides the standard ways for doing so.
string example = "Like thi" + "s"; //I'm aware of the string class and its member functions
const char* example2 = "Like this"; //I'm also aware of C-type Strings (CString as well)
It is also possible that not having English as my native language made things even worst, I apologize for the confusion.
Instead of using a plain char string, you should use the string library provided by the C++ library:
#include <string>
#include <iostream>
using namespace std;
int main()
{
string str = "Like thi";
cout << str << endl;
str = str + "s";
cout << str << endl;
return 0;
}
Normally, it's not possible to simply concatenate plain char * strings in C or C++, because they are merely pointers to arrays of characters. There's almost no reason you should be using a bare character array in C++ if you intend on doing any string manipulations within your own code.
Even if you need access to the C representation (e.g. for an external library) you can use string::c_str().
First, there is nothing null terminated, but the zero terminated. All char* strings in C end with '\0'.
When you in code do something like this:
char *name="Daniel";
compiler will generate a string that has a contents:
Daniel\0
and will initialize name pointer to point at it at a certain time during program execution depending on the variable context (member, static, ...).
Appending ANYTHING to the name won't work as you expect, since memory pointed to by name isn't changeable, and you'll probably get either access violation error or will overwrite something else.
Having
const char* copyOfTheName = name;
won't create a copy of the string in question, it will only have copyOfTheName point to the original string, so having
copyOfTheName[6]='A';
will be exactly as
name[6]='A';
and will only cause problems to you.
Use std::strcat instead. And please, do some investigating how the basic string operations work in C.
Consider the following snippet that uses strtok to split the string madddy.
char* str = (char*) malloc(sizeof("Madddy"));
strcpy(str,"Madddy");
char* tmp = strtok(str,"d");
std::cout<<tmp;
do
{
std::cout<<tmp;
tmp=strtok(NULL, "dddy");
}while(tmp!=NULL);
It works fine, the output is Ma. But by modifying the strtok to the following,
tmp=strtok(NULL, "ay");
The output becomes Madd. So how does strtok exactly work? I have this question because I expected strtok to take each and every character that is in the delimiter string to be taken as a delimiter. But in certain cases it is doing that way but in few cases, it is giving unexpected results. Could anyone help me understand this?
"Trying to understand strtok" Good luck!
Anyway, we're in 2011. Tokenise properly:
std::string str("abc:def");
char split_char = ':';
std::istringstream split(str);
std::vector<std::string> token;
for (std::string each; std::getline(split, each, split_char); token.push_back(each));
:D
Fred Flintstone probably used strtok(). It predates multi threaded environments and beats up (modifies) the source string.
When called with NULL for the first parameter, it continues parsing the last string. This feature was convenient, but a bit unusual even in its day.
It seems you forget that you have call strtok the first time (out of loop) by delimiter "d".
The strtok is working fine. You should have a reference here.
For the second example(strtok("ay")):
First, you call strtok(str, "d"). It will look for the first "d", and seperate your string. Specifically, it sets tmp = "Ma", and str = "ddy" (dropping the first "d").
Then, you call strtok(str, "ay"). It will look for an "a" in str, but since your string now is only "ddy", no matching occurs. Then it will look for an "y". So str = "dd" and tmp = "".
It prints "Madd" as you saw.
Actually your code is wrong, no wonder you get unexpected results:
char* str = (char*) malloc(sizeof("Madddy"));
should be
char* str = (char*) malloc(strlen("Madddy") + 1);
I asked a question inspired from another question about functions causing security problems/bad practise functions and the c standard library.
To quote the answer given to me from there:
A common pitfall with the strtok()
function is to assume that the parsed
string is left unchanged, while it
actually replaces the separator
character with '\0'.
Also, strtok() is used by making
subsequent calls to it, until the
entire string is tokenized. Some
library implementations store
strtok()'s internal status in a
global variable, which may induce some
nasty suprises, if strtok() is
called from multiple threads at the
same time.
As you've tagged your question C++, use something else! If you want to use C, I'd suggest implementing your own tokenizer that works in a safe fashion.
Since you changed your tag to be C and not C++, I rewrote your function to use printf so that you can see what is happening. Hoang is correct. You seeing correct output, but I think that you are printing everything on the same line, so you got confused by the output. Look at Hoang's answer as he explains what is happening correctly. Also, as others have noted, strtok destroys the input string, so you have to be careful about that - and it's not thread safe. But if you need a quick an dirty tokenizer, it works. Also, I changed the code to correctly use strlen, and not sizeof as correctly pointed out by Anders.
Here is your code modified to be more C-like:
char* str = (char*) malloc(strlen("Madddy") + 1);
strcpy(str,"Madddy");
char* tmp = strtok(str,"d");
printf ("first token: %s\n", tmp);
do
{
tmp=strtok(NULL, "ay");
if (tmp != NULL ) {
printf ("next token: %s\n", tmp);
}
} while(tmp != NULL);