Let's say I am traversing a string of length n. I want it to end at a specific character that fulfils some conditions. I know that C style strings can be terminated at the i'th position by simply assigning the character '\0' at position i in the character array.
Is there any way to achieve the same result in an std::string (C++ style string)? I can think of substr, erase, etc. but all of them are linear in their complexity, which I cannot afford to use.
TL;DR, is there any "end" character for an std::string? Can I make the end iterator point to the current character somehow?
You can use resize:
std::string s = /* ... */;
if (auto n = s.find(c); n != s.npos) {
s.resize(n);
}
The logical answer here is basic_string::resize. What the standard says about this function is:
Effects: Alters the length of the string designated by *this as follows:
If n <= size(), the function replaces the string designated by *this with a string of length n whose elements are a copy of the initial elements of the original string designated by *this.
If n > size(), the function replaces the string designated by *this with a string of length n whose first size() elements are a copy of the original string designated by *this, and whose remaining elements are all initialized to c.
Now, that looks very much like linear time. However, the standard does not specifically state that things will happen this way. They only state that it will be "as if" things happen this way. Therefore, an implementation is completely free to implement the shrinking version of resize by shifting one pointer and writing a NUL character. Nothing in the standard would forbid such an implementation.
So the real question is... are standard library implementations written by complete morons? It's certainly possible that they are. But it's probably wise not to assume so.
Personally, I'd just use resize on the assumption that the library implementers know what they're doing. After all, if they can't write an optimization as simple as that, then who knows what other things they're doing wrong? If you can't trust your standard library implementation not to do stupid things, then you shouldn't be using it in performance-critical code.
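As a rough illustration of the resize approach with an arbitrary condition instead of a fixed character, the sketch below truncates a string at the first character that matches a predicate. The helper name truncate_at_first is made up for this example. That the shrinking resize is cheap is an expectation about common implementations, not something the standard spells out.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: cut `s` off at the first character for which
// `pred` returns true. The search is linear in the part that is scanned;
// the shrinking resize itself is expected (not guaranteed) to be cheap.
template <typename Pred>
void truncate_at_first(std::string& s, Pred pred) {
    auto it = std::find_if(s.begin(), s.end(), pred);
    s.resize(static_cast<std::string::size_type>(it - s.begin()));
}
```

If no character matches, find_if returns end() and the resize leaves the string unchanged.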
is there any "end" character for an std::string?
No. It is possible to define a std::string that is not null terminated. You won't be able to do a few things with such strings, such as treating the return value of std::string::data() as a null-terminated C string 1, but a std::string can be constructed that way.
Can I make the end iterator point to the current character somehow?
To get a std::string::iterator to point to a certain character, you'll have to traverse the string.
E.g.
std::string str = "This is a string";
auto iter = str.begin();
auto end = iter;
while ( end != str.end() && *end != 'r' )
++end;
After that, the range defined by iter and end contains the string "This is a st".
If that is not acceptable, you'll have to adapt your code to check the value of the character for every step.
std::string str = "This is a string";
auto iter = str.begin();
// Break when 'r' is encountered or end of string is reached.
while ( iter != str.end() && *iter != 'r' )
{
// Use *iter
...
++iter; // advance, otherwise the loop never terminates
}
1 Thanks are due to @Cubbi for pointing out an error in what I stated. std::string::data() can return a char const* that is not null terminated if using a version of C++ earlier than C++11. If using C++11 or later, std::string::data() is required to return a null terminated char const*.
std::string does not have an "end character" like C-style strings. You can have many null terminators inside a single std::string. If you want the string to end after a certain character, you need to erase the characters that follow it.
In your case that would give you something like
string_variable.erase(pos_of_last_character + 1);
TL;DR, is there any "end" character for an std::string? Can I make the end iterator point to the current character somehow?
Not really. std::string tracks the number of characters it stores as its size, reported by std::string::size(), independently of any sentinel character like '\0'.
A sentinel is only taken into account when a std::string is initialized from a const char*, where the first '\0' marks the end of the input.
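To make the difference concrete, here is a small sketch with two hypothetical helpers (from_c_string and from_buffer are names invented for this example). One builds a string from a bare pointer, the other from a pointer plus an explicit length.

```cpp
#include <cstddef>
#include <string>

// Builds a string from a pointer only: construction stops at the first '\0'.
inline std::string from_c_string(const char* p) {
    return std::string(p);
}

// Builds a string from a pointer and an explicit length: every character,
// including any embedded '\0', becomes part of the string.
inline std::string from_buffer(const char* p, std::size_t n) {
    return std::string(p, n);
}
```

Given the five characters a, b, '\0', c, d, the first helper yields a string of size 2 and the second a string of size 5.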
std::string resize causes strings that appear to be equal to no longer be equal. It can appear to be misleading when I hover over the variable in my debugger and they appear to hold the same value.
I think it comes down to the fact that I expected the == operator to stop at the first null character but it keeps going till the end of the size. I'm sure this is working as intended but I was stuck on an issue caused by this for a while so I wanted to see why you would keep comparing characters even after the first null character. thanks!
#include <string>
int main(void)
{
std::string test1;
test1.resize(10);
test1[0] = 'a';
std::string test2 = "a";
//they are not equal
bool same = (test1 == test2);
return 0;
}
test1 is the string "a\0\0\0\0\0\0\0\0\0". test2 is the string "a". They are not equal.
std::string can contain null characters. Its length is not the distance to the first null character. It does also guarantee that the memory buffer containing the characters of the string ends with an additional null character 1 beyond its length.
If you don't intend for the string to be longer but just want the memory, use std::string::reserve. Note that you cannot access elements beyond the end with [] legally, but pushing back or whatever won't cause any new memory allocations until you pass the reserve limit.
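A minimal sketch of the difference between the two calls, using two hypothetical helper functions (the names are invented for this example):

```cpp
#include <string>

// resize changes the length: the string becomes 'a' followed by nine '\0'.
inline std::string make_resized() {
    std::string s;
    s.resize(10);
    s[0] = 'a';
    return s;
}

// reserve only requests capacity: the length is whatever was appended.
inline std::string make_reserved() {
    std::string s;
    s.reserve(10);
    s.push_back('a');
    return s;
}
```

The reserved string has size 1 and compares equal to "a"; the resized one has size 10 and does not.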
This is the intended behavior of std::string. Unlike a c-string a std::string can have as many null characters as you want. For instance "this\0 is\0 a\0 legal\0 std::string\0" would be legal to have as the contents for a std::string. You have to build it like
std::string nulls_inside("this\0 is\0 a\0 legal\0 std::string\0", sizeof("this\0 is\0 a\0 legal\0 std::string\0") - 1); // - 1 drops the literal's implicit terminator
but you can also insert null characters into an existing std::string. In your case you're comparing
"a\0\0\0\0\0\0\0\0\0\0"
against
"a\0"
so it fails.
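If the intent is for such a string to compare the way a C string would, one option is to cut it at the first null character before comparing. This is only a sketch, and cut_at_first_null is a made-up name:

```cpp
#include <string>

// Hypothetical helper: drop everything from the first '\0' onwards so the
// string's length matches what a C-style reader would see.
inline void cut_at_first_null(std::string& s) {
    const std::string::size_type pos = s.find('\0');
    if (pos != std::string::npos) {
        s.resize(pos);
    }
}
```

Applied to test1 from the question, this leaves a string of size 1 that compares equal to "a".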
Is a std::string without a null-character in the end valid and can it be acquired like this?:
std::string str = "Hello World";
str.resize(str.size() - 1);
For those who are curious:
I have a 3rd party function taking a string and iterating over the chars (using iterators). Unfortunately the function is buggy (as it's a dev version) and cannot deal with null characters. I don't have another signature to choose from, I can't modify the function (as I said, 3rd party, and we don't want to fork), and at the same time I don't want to reinvent the wheel. As far as I can tell, the function should work as desired without the null character, so I want to at least give it a try.
The iteration takes place like this:
bool nextChar(CharIntType& c)
{
if (_it == _end) return false;
c = *_it;
++_it;
return true;
}
where _it is initialized to std::string::begin() and _end to std::string::end()
Until C++11, std::string was not required to include a trailing nul until you called c_str().
http://en.cppreference.com/w/cpp/string/basic_string/data
std::string::data()
Returns pointer to the underlying array serving as character storage. The pointer is such that the range [data(); data() + size()) is valid and the values in it correspond to the values stored in the string.
The returned array is not required to be null-terminated.
If empty() returns true, the pointer is a non-null pointer that should not be dereferenced. (until c++11)
The returned array is null-terminated, that is, data() and c_str() perform the same function.
If empty() returns true, the pointer points to a single null character. (since c++11)
From this we can confirm that std::string::size does not include any nul terminator, and that std::string::begin() and std::string::end() describe the ranges you are actually looking for.
We can also determine this by the simple fact that std::string::back() doesn't return a nul character.
#include <iostream>
#include <string>
int main() {
std::string s("hello, world");
std::cout << "s.front = " << s.front() << " s.back = " << s.back() << '\n';
return 0;
}
http://ideone.com/nUX0AB
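The same point can be checked with a short sketch. The helper names are invented for this example, and the terminator check assumes C++11 or later.

```cpp
#include <cstddef>
#include <string>

// Counts the characters visited by a begin()/end() loop, the same shape of
// loop as the nextChar function in the question.
inline std::size_t count_visited(const std::string& s) {
    std::size_t n = 0;
    for (auto it = s.begin(); it != s.end(); ++it) {
        ++n;
    }
    return n;
}

// True if the character just past the last element is '\0'.
// Reading c_str()[size()] is valid and yields the terminator.
inline bool has_hidden_terminator(const std::string& s) {
    return s.c_str()[s.size()] == '\0';
}
```

The loop visits exactly size() characters, so the terminator is never seen through the iterators even though it is present in the buffer.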
While it is possible to have non-null-terminated strings, I would not recommend it: strings are null terminated for a good reason. In this instance I would recommend that you either write the function properly yourself or get in touch with the third party and have them fix it.
To answer your question: yes, a std::string is valid if it is not null terminated. To achieve this you can use the string copy overload that takes a maximum length. Once again, I do not recommend this.
See this page for more information:
http://c2.com/cgi/wiki?NonNullTerminatedString
This is a very late answer, but I am posting it so that anyone who comes along later can use it for reference. If you write a null-terminated string into the string.data() array, the null character terminates the string and will not let you continue concatenating to it if you need to. The way to solve this is already given in the question:
str.resize(str.size() - 1);
This solves the problem; I have tested it in my code.
Just had an interesting argument in the comment to one of my questions. My opponent claims that the statement "" does not contain "" is wrong.
My reasoning is that if "" contained another "", that one would also contain "" and so on.
Who is wrong?
P.S.
I am talking about a std::string
P.S. P.S
I was not talking about substrings, but even if I add to my question " as a substring", it still makes no sense. An empty substring is nonsense. If you allow empty substrings to be contained in strings, that means you have an infinity of empty substrings. What is the point of that?
Edit:
Am I the only one that thinks there's something wrong with the function std::string::find?
C++ reference clearly says
Return Value: The position of the first character of the first match.
Ok, let's assume it makes sense for a minute and run this code:
string empty1 = "";
string empty2 = "";
int position = empty1.find(empty2);
cout << "found \"\" at index " << position << endl;
The output is: found "" at index 0
Nonsense part: how can there be index 0 in a string of length 0? It is nonsense.
To be able to even have a 0th position, the string must be at least 1 character long.
And C++ throws an exception in this case, which proves my point:
cout << empty2.at( empty1.find(empty2) ) << endl;
If it really contained an empty string, it would have no problem printing it out.
It depends on what you mean by "contains".
The empty string is a substring of the empty string, and so is contained in that sense.
On the other hand, if you consider a string as a collection of characters, the empty string can't contain the empty string, because its elements are characters, not strings.
Relating to sets, the set
{2}
is a subset of the set
A = {1, 2, 3}
but {2} is not a member of A - all A's members are numbers, not sets.
In the same way, {} is a subset of {}, but {} is not an element in {} (it can't be because it's empty).
So you're both right.
C++ agrees with your "opponent":
#include <iostream>
#include <string>
using namespace std;
int main()
{
bool contains = string("").find(string("")) != string::npos;
cout << "\"\" contains \"\": "
<< boolalpha << contains;
}
Output: "" contains "": true
It's easy. String A contains sub-string B if there is an argument offset such that A.substr(offset, B.size()) == B. No special cases for empty strings needed.
So, let's see. std::string("").substr(0,0) turns out to be std::string(""). And we can even check your "counter-example". std::string("").substr(0,0).substr(0,0) is also well-defined and empty. Turtles all the way down.
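The definition above can be written down almost literally. The following is a sketch only (contains_by_substr is a made-up name, and std::string::find is the practical way to do this):

```cpp
#include <string>

// Hypothetical helper implementing the definition literally: a contains b
// if some offset in [0, a.size()] satisfies a.substr(offset, b.size()) == b.
inline bool contains_by_substr(const std::string& a, const std::string& b) {
    for (std::string::size_type offset = 0; offset <= a.size(); ++offset) {
        if (a.substr(offset, b.size()) == b) {
            return true;
        }
    }
    return false;
}
```

With both arguments empty, the loop runs once with offset 0, substr(0, 0) is empty, and the function returns true without any special case.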
The first thing that is unclear is whether you are talking about std::string or null-terminated C strings; the second is why it should matter. I will assume std::string.
The requirements on std::string determine how the component must behave, not what its internal representation must be (although some of the requirements affect the internal representation). As long as the requirements for the component are met, whether it holds something internally is an implementation detail that you might not even be able to test.
In the particular case of an empty string, there is nothing that mandates that it holds anything. It could just hold a size member set to 0 and a pointer (for the dynamically allocated memory if/when not empty) also set to 0. The requirement in operator[] requires that it returns a reference to a character with value 0, but since that character cannot be modified without causing undefined behavior, and since strict aliasing rules allow reading from an lvalue of char type, the implementation could just return a reference to one of the bytes in the size member (all set to 0) in the case of an empty string.
Some implementations of std::string use small object optimizations, in those implementations there will be memory reserved for small strings, including an empty string. While the std::string will obviously not contain a std::string internally, it might contain the sequence of characters that compose an empty string (i.e. a terminating null character)
empty string doesn't contain anything - it's EMPTY. :)
Of course an empty string does not contain an empty string. It'll be turtles all the way down if it did.
Take String empty = ""; that is declaring a string literal that is empty, if you want a string literal to represent a string literal that is empty you would need String representsEMpty = """"; but of course, you need to escape it, giving you string actuallyRepresentsEmpty = "\"\"";
ps, I am taking a pragmatic approach to this. Leave the maths nonsense at the door.
Thinking about your amendment, it is possible that what your 'opponent' meant was that an 'empty' std::string still has internal storage for characters, which is itself empty of characters. That would be an implementation detail, I am sure; it could perhaps just keep an array of a certain size (say 10) 'just in case', so it would technically not be empty.
Of course, there is the trick question answer that 'nothing' fits into anything infinite times, a sort of 'divide by zero' situation.
Today I had the same question since I'm currently bound to a lousy STL implementation (dating back to the pre-C++98 era) that differs from C++98 and all following standards:
TEST_ASSERT(std::string().find(std::string()) == std::string::npos); // WRONG!!! (non-standard)
This is especially bad if you try to write portable code because it's so hard to prove that no feature depends on that behaviour. Sadly in my case that's actually true: it does string processing to shorten phone numbers input depending on a subscriber line spec.
On Cppreference, I see in std::basic_string::find an explicit description about empty strings that I think matches exactly the case in question:
an empty substring is found at pos if and only if pos <= size()
The referred pos defines the position where to start the search, it defaults to 0 (the beginning).
A standard-compliant C++ Standard Library will pass the following tests:
TEST_ASSERT(std::string().find(std::string()) == 0);
TEST_ASSERT(std::string().substr(0, 0).empty());
TEST_ASSERT(std::string().substr().empty());
This interpretation of "contain" answers the question with yes.
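The rule quoted from Cppreference can also be probed at other start positions. A small sketch, with find_empty being a name invented for this example:

```cpp
#include <string>

// Position at which an empty needle is reported when searching `haystack`
// starting from `pos`; std::string::npos if `pos` is past the end.
inline std::string::size_type find_empty(const std::string& haystack,
                                         std::string::size_type pos) {
    return haystack.find(std::string(), pos);
}
```

On a conforming library, searching "abc" from position 3 reports 3 (pos equals size()), while searching from position 4 reports npos.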
I have a character range with pointers (pBegin and pEnd). I think of it as a string, but it is not \0 terminated. How can I print it to std::cout effectively?
Without creating a copy, like with std::string
Without a loop that prints each character
Do we have good solution? If not, what is the smoothest workaround?
You can use ostream::write, which takes pointer and length arguments:
std::cout.write(pBegin, pEnd - pBegin);
Since C++17 you can use std::string_view, which was created for sharing part of std::string without copying
std::cout << std::string_view(pBegin, pEnd - pBegin);
pEnd must point to one past the last character to print, the way end iterators work in C++, rather than to the last character itself.
What is string_view?
In C++11 what is the most performant way to return a reference/pointer to a position in a std::string?
In older C++ standards boost::string_ref is an alternative. Newer boost versions also have boost::string_view with the same semantics as std::string_view. See Differences between boost::string_ref and boost::string_view
If you use Qt then there's also QStringView and QStringRef although unfortunately they're used for viewing QString which stores data in UTF-16 instead of UTF-8 or a byte-oriented encoding
However if you need to process the string by some functions that require null-terminated string without any external libraries then there's a simple solution
char tmpEnd = *pEnd; // backup the after-end character
*pEnd = '\0';
std::cout << pBegin; // use it as normal C-style string, like dosomething(pBegin);
*pEnd = tmpEnd; // restore the char
In this case make sure that pEnd still points to an element inside the original array and not one past the end of it
In my code, I have char array and here it is: char pIPAddress[20];
And I'm setting this array from a string with this code:strcpy(pIPAddress,pString.c_str());
After this copy, pIPAddress holds, for example, "192.168.1.123 ". But I don't want the spaces; I need to delete them. For this I did pIPAddress[13]=0;.
But if the IP length changes, this won't work. How can I find where the spaces start in an efficient way? Or are there other ways?
Thanks
The simplest approach that you can do is to use the std::remove_copy algorithm:
std::string ip = read_ip_address();
char ipchr[20];
*std::remove_copy( ip.begin(), ip.end(), ipchr, ' ' ) = 0; // [1]
The next question would be why would you want to do this, because it might be better not to copy it into an array but rather remove the spaces from the string and then use c_str() to retrieve a pointer...
EDIT As per James' suggestion, if you want to remove all whitespace and not just the ' ' character, you can use std::remove_copy_if with a functor. I have tested passing std::isspace directly (the int(int) version declared in <cctype>, which the cast below selects over the locale-aware template in <locale>) and it seems to work, but I am not sure that this will not be problematic with non-ASCII characters (which might be negative):
#include <cctype>
#include <locale>
#include <algorithm>
#include <string>
int main() {
std::string s = get_ip_address();
char ip[20];
*std::remove_copy_if( s.begin(), s.end(), ip, (int (*)(int))std::isspace ) = 0; // [1]
}
The horrible cast in the last argument is required to select a particular overload of isspace.
[1] The *... = 0; needs to be added to ensure NUL termination of the string. The remove_copy and remove_copy_if algorithms return an end iterator in the output sequence (i.e. one beyond the last element edited), and the *...=0 dereferences that iterator to write the NUL. Alternatively the array can be initialized before calling the algorithm char ip[20] = {}; but that will write \0 to all 20 characters in the array, rather than only to the end of the string.
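One way to sidestep both the cast and the negative-character concern is a small lambda that takes unsigned char. This is a sketch, with copy_without_spaces being a name invented for this example. It assumes the destination buffer is large enough.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Copies `src` into `dst` without whitespace and NUL-terminates the result.
// Assumes `dst` has room for src.size() + 1 characters.
inline void copy_without_spaces(const std::string& src, char* dst) {
    char* out = std::remove_copy_if(
        src.begin(), src.end(), dst,
        // Converting to unsigned char first avoids passing a negative value
        // to std::isspace, which would be undefined behaviour.
        [](unsigned char ch) { return std::isspace(ch) != 0; });
    *out = '\0';
}
```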
If spaces are only at the end (or beginning) of your string, you'd best use boost::trim
#include <boost/algorithm/string/trim.hpp>
std::string pString = ...
boost::trim(pString);
strcpy(pIPAddress,pString.c_str());
If you want to handcode, <cctype> has the function isspace, which also has a locale specific version.
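For reference, a hand-coded trim without Boost might look like the sketch below. The name trim_blanks is made up, and unlike boost::trim it only strips spaces and tabs rather than consulting a locale.

```cpp
#include <string>

// Hypothetical stand-in for boost::trim restricted to a fixed set of
// characters: removes leading and trailing spaces and tabs in place.
inline void trim_blanks(std::string& s) {
    const char* const blanks = " \t";
    const std::string::size_type first = s.find_first_not_of(blanks);
    if (first == std::string::npos) {
        s.clear();  // nothing but blanks, or already empty
        return;
    }
    const std::string::size_type last = s.find_last_not_of(blanks);
    s = s.substr(first, last - first + 1);
}
```

Spaces in the middle of the string are left alone, which matches the "only at the end (or beginning)" case described above.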
I see you have a std::string. You can use the erase() method :
std::string tmp = pString;
for(std::string::iterator iter = tmp.begin(); iter != tmp.end(); )
{
if(*iter == ' ') iter = tmp.erase(iter); // erase returns the next valid iterator
else ++iter; // only advance when nothing was erased
}
Then you can copy the contents of tmp into your char array.
Note that raw char arrays are generally discouraged in modern C++, and you shouldn't use them unless you absolutely have to. Either way, you should do all your string manipulation using std::string.
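An alternative to a hand-written erase loop is the erase-remove idiom, which removes all matching characters in a single pass. A minimal sketch (remove_spaces is a name chosen for this example):

```cpp
#include <algorithm>
#include <string>

// Removes every ' ' from `s` using the erase-remove idiom: std::remove
// shifts the kept characters forward, erase drops the leftover tail.
inline void remove_spaces(std::string& s) {
    s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
}
```

Calling erase once per space can shift the tail of the string repeatedly, so this form also scales better for long inputs.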
To make the solution work in all cases, I suggest you iterate through your string and deal with each space as you find it.
A more high-level solution may be for you to use the string methods that allow you to do that automatically. (see: http://www.cplusplus.com/reference/string/string/)
I think if you are using
strcpy(pIPAddress,pString.c_str())
then nothing more needs to be done, as c_str() returns a const char* to a null-terminated string. So after the above operation your char array 'pIPAddress' is itself null terminated, and there is no need to adjust the length as you described.