I am wondering whether this way of reversing a string is safe:
void ReverseString( std::string & stringToReverse )
{
stringToReverse.assign( stringToReverse.rbegin(), stringToReverse.rend() );
}
According to §21.4.6.3/20, assign(first,last) (with iterators first and last) is equivalent to
assign(string(first,last))
Hence it first creates a new string object and then assigns it. There is no risk that the string you copy from (in reverse) is being modified while you still copy (if that is what you were afraid of).
However, using std::reverse(begin(str),end(str)) as suggested by the others is better and potentially more efficient.
I don't know whether this is a request for a code review or whether you don't know about the other options, but you should just use std::reverse from <algorithm>:
std::string str = "Hello world!";
std::reverse(str.begin(), str.end());
This reverses the string in place. If you want to create a new string instead, you can do essentially what your code does with assign(), but using the std::string constructor:
std::string reversed(str.rbegin(), str.rend());
As others have said, what you did does, in fact, reverse the char sequence.
Whether this actually reverses the string depends on what "reverse", "string" and "char" are meant to be.
A std::string is a sequence of char, each 8 bits long (at least on most platforms).
A Japanese string (but even a French, Italian or German one) can contain code points outside the 0..127 range, which have to be encoded somehow to fit into 8-bit chars, so one "character" may occupy more than one char. Putting the chars in reverse order then doesn't reverse the text; it just messes it up completely.
Assuming 1 character <=> 1 char is true only for pure ASCII text.
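To illustrate the point about multi-byte characters, here is a minimal sketch that reverses a string by code point instead of by char. It assumes the input is valid UTF-8; the function name reverse_utf8_code_points is made up for this example. It still does not handle combining marks or other sequences of several code points that form one visible character, which would need a Unicode library.

```cpp
#include <algorithm>
#include <string>

// Reverses a UTF-8 string code point by code point (assumes valid UTF-8).
std::string reverse_utf8_code_points(const std::string& text) {
    // Step 1: reverse all bytes. Every multi-byte sequence is now backwards:
    // its continuation bytes (10xxxxxx) come first, then its lead byte.
    std::string result(text.rbegin(), text.rend());

    // Step 2: walk the result and flip each sequence back into byte order.
    auto it = result.begin();
    while (it != result.end()) {
        auto start = it;
        // Skip the continuation bytes of the (reversed) sequence.
        while (it != result.end() &&
               (static_cast<unsigned char>(*it) & 0xC0) == 0x80) {
            ++it;
        }
        // Include the lead byte (or the single ASCII byte).
        if (it != result.end()) {
            ++it;
        }
        std::reverse(start, it);
    }
    return result;
}
```

For pure ASCII input this gives the same result as std::reverse; the difference only shows up once a character occupies more than one char.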
Many topics have discussed the difference between string and char[]. However, it is still not clear to me why we needed to bring string into C++. Any insight is welcome, thanks!
char[] is C style. It is not object oriented, it forces you as the programmer to deal with implementation details (such as '\0' terminator) and rewrite standard code for handling strings every time over and over.
char[] is just an array of bytes, which can be used to store a string, but it is not a string in any meaningful way.
std::string is a class that properly represents a string and handles all string operations.
It lets you create objects and keep your code fully OOP (if that is what you want).
More importantly, it takes care of memory management for you.
Consider this simple piece of code:
// extract to string
#include <iostream>
#include <string>
int main()
{
std::string name;
std::cout << "Please, enter your name: ";
std::cin >> name;
std::cout << "Hello, " << name << "!\n";
return 0;
}
How would you write the same thing using char[]?
Assume you can not know in advance how long the name would be!
Same goes for string concatenation and other operations.
With a real string represented as std::string, you combine two strings with a simple += operator. One line.
If you are using char[] however, you need to do the following:
Calculate the size of the combined string + terminator character.
Allocate memory for the new combined string.
Use strncpy to copy first string to new array.
Use strncat to append second string to first string in new array.
Plus, you need to remember not to use the unsafe strcpy and strcat and to free the memory once you are done with the new string.
std::string saves you all that hassle and the many bugs you can introduce while writing it.
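The steps listed above can be sketched side by side with the std::string version. This is only an illustration: the function names concat_c_style and concat_cpp are invented here, and the C-style variant assumes the caller releases the result with std::free.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// C-style: the caller owns the returned buffer and must std::free it.
char* concat_c_style(const char* first, const char* second) {
    // 1. Size of the combined string plus the terminator character.
    const std::size_t size = std::strlen(first) + std::strlen(second) + 1;
    // 2. Allocate memory for the new combined string.
    char* combined = static_cast<char*>(std::malloc(size));
    if (combined == nullptr) {
        return nullptr;
    }
    // 3. Copy the first string; strncpy zero-fills the rest of the buffer.
    std::strncpy(combined, first, size);
    // 4. Append the second string without writing past the buffer.
    std::strncat(combined, second, size - std::strlen(combined) - 1);
    return combined;
}

// std::string: allocation, copying and cleanup all happen automatically.
std::string concat_cpp(std::string first, const std::string& second) {
    first += second;
    return first;
}
```

Every numbered step in the C-style function is a place where an off-by-one or a forgotten free can creep in; the std::string version has none of them.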
As noted by MSalters in a comment, strings can grow. This is, in my opinion, the strongest reason to have them in C++.
For example, the following code has a bug which may cause it to crash, or worse, to appear to work correctly:
char message[] = "Hello";
strcat(message, "World");
The same idea with std::string behaves correctly:
std::string message{"Hello"};
message += "World";
Additional benefits of std::string:
You can send it to functions by value, while char[] can only be sent by reference; this point looks rather insignificant, but it enables powerful code like std::vector<std::string> (a list of strings which you can add to)
std::string stores its length, so any operation which needs the length is more efficient
std::string works similarly to all other C++ containers (vector, etc) so if you are already familiar with containers, std::string is easy to use
std::string has overloaded comparison operators, so it's easy to use with std::map, std::sort, etc.
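A short sketch of the benefits listed above: passing strings by value inside a std::vector, sorting them with std::sort, and using them as std::map keys. The helper names sorted_copy and count_words are made up for this example.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Taking the vector by value copies every string; no manual memory handling.
std::vector<std::string> sorted_copy(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());  // uses std::string's operator<
    return words;
}

// std::string works directly as a map key thanks to its comparison operators.
std::map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& word : words) {
        ++counts[word];
    }
    return counts;
}
```

Doing the same with char[] would require a custom comparator and explicit ownership rules for every stored string.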
The string class is no more than an improvement on the char[] variable.
With strings you can achieve the same goals as with a char[] variable, but you won't have to worry about the pitfalls of char[], such as pointers and segmentation faults.
It is a more convenient way to build strings, but you don't really see the "underground" of the language, such as how concatenation or length functions are implemented.
Here is the documentation of the std::string class in C++: C++ string documentation
I have a problem with a std::string comparison that I think is caused by the encoding. I have to compare a string that I receive, whose encoding I don't know, with a Spanish string containing unusual characters. I can't change s_area.m_s_area_text, so I need to set the string s2 to an identical value, and I don't know how to do that in a generic way for other cases.
std::string s2= "Versión de sistema";
std::cout << s_area.m_s_area_text << std::endl;
for (const char* p = s2.c_str(); *p; ++p)
{
printf("%02x", *p);
}
printf("\n");
for (const char* p = s_area.m_s_area_text.c_str(); *p; ++p)
{
printf("%02x", *p);
}
printf("\n");
And the result of the execution is:
Versi├│n de sistema
5665727369fffffff36e2064652073697374656d61
5665727369ffffffc3ffffffb36e2064652073697374656d61
Obviously, as the two strings do not have the same byte values, all the comparison methods fail: strncmp, std::string operator==, std::string::compare, etc.
Any idea how to do that without touching the s_area.m_s_area_text string?
In general it is impossible to guess the encoding of a string by inspecting its raw bytes. The exception to this rule is when a byte order mark (BOM) is present at the start of the byte stream. The BOM tells you which Unicode encoding the bytes use and their endianness.
As an aside, at some point in the future you may decide that you need a canonical string encoding (some have pointed out in the comments that this would be a good idea). There are strong arguments in favour of UTF-8 as the best choice for C++. See UTF-8 Everywhere for further information on this.
First of all, to compare two strings correctly you at least need to know their encodings. In your example, s_area.m_s_area_text happens to be encoded in UTF-8, while s2 uses ISO/IEC 8859-1 (Latin-1).
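If the source string really is Latin-1 (an assumption that has to be known from elsewhere, since it cannot be detected reliably), converting it to UTF-8 is mechanical, because every Latin-1 byte value equals its Unicode code point. A minimal sketch, with the hypothetical name latin1_to_utf8:

```cpp
#include <string>

// Converts ISO/IEC 8859-1 (Latin-1) bytes to UTF-8.
// Assumes the input really is Latin-1; this cannot be verified from the bytes.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // ASCII range: identical in both encodings.
            utf8 += static_cast<char>(byte);
        } else {
            // 0x80..0xFF become a two-byte sequence: 110000xx 10xxxxxx.
            utf8 += static_cast<char>(0xC0 | (byte >> 6));
            utf8 += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return utf8;
}
```

Applied to the Latin-1 byte f3 from the question, this yields c3 b3, which is exactly the pair of bytes seen in s_area.m_s_area_text.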
If you are sure that s_area.m_s_area_text will always be encoded in UTF-8, you can try to make s2 use the same encoding and then just compare them. One way of defining a UTF-8 encoded string is to escape every character that is not in the basic character set with \u (note that since C++20 a u8 literal has type const char8_t[] and no longer initializes a std::string directly).
std::string s2 = u8"Versi\u00F3n de sistema";
...
if (s_area.m_s_area_text == s2)
...
It should also be possible to do it without escaping the characters by setting an appropriate encoding for the source file and specifying the encoding to the compiler.
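A sketch of the same comparison that depends neither on \u nor on the source file's encoding: the two UTF-8 bytes of ó (c3 b3, as seen in the question's hex dump) are written as explicit escapes. The names kExpected, matches_expected and hex_dump are invented for this example. hex_dump also avoids the ffffff prefixes from the question's output, which appear because a negative char is sign-extended when passed to printf.

```cpp
#include <string>

// "Versión de sistema" in UTF-8. The literal is split after the escapes so
// that the hex escape cannot swallow any following characters.
const std::string kExpected = "Versi\xC3\xB3" "n de sistema";

bool matches_expected(const std::string& text) {
    return text == kExpected;
}

// Two lowercase hex digits per byte; going through unsigned char prevents
// the sign extension seen with printf("%02x", *p) on a signed char.
std::string hex_dump(const std::string& text) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char byte : text) {
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}
```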
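As @nwp mentioned, you may also want to normalise the strings before comparing. Otherwise, two strings that look the same may have different Unicode representations, and that will cause your comparison to yield a false negative result.
For example, "Versión de sistema" written with the precomposed character ó (U+00F3) will not compare equal to "Versión de sistema" written with o followed by a combining acute accent (U+0301), even though both render identically.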
Just had an interesting argument in the comment to one of my questions. My opponent claims that the statement "" does not contain "" is wrong.
My reasoning is that if "" contained another "", that one would also contain "" and so on.
Who is wrong?
P.S.
I am talking about a std::string
P.S. P.S
I was not talking about substrings, but even if I add to my question " as a substring", it still makes no sense. An empty substring is nonsense. If you allow empty substrings to be contained in strings, that means you have an infinity of empty substrings. What is the point of that?
Edit:
Am I the only one that thinks there's something wrong with the function std::string::find?
C++ reference clearly says
Return Value: The position of the first character of the first match.
Ok, let's assume it makes sense for a minute and run this code:
string empty1 = "";
string empty2 = "";
int position = empty1.find(empty2);
cout << "found \"\" at index " << position << endl;
The output is: found "" at index 0
Nonsense part: how can there be index 0 in a string of length 0? It is nonsense.
To be able to even have a 0th position, the string must be at least 1 character long.
And C++ throws an exception in this case, which proves my point:
cout << empty2.at( empty1.find(empty2) ) << endl;
If it really contained an empty string, it would have no problem printing it out.
It depends on what you mean by "contains".
The empty string is a substring of the empty string, and so is contained in that sense.
On the other hand, if you consider a string as a collection of characters, the empty string can't contain the empty string, because its elements are characters, not strings.
Relating to sets, the set
{2}
is a subset of the set
A = {1, 2, 3}
but {2} is not a member of A - all A's members are numbers, not sets.
In the same way, {} is a subset of {}, but {} is not an element in {} (it can't be because it's empty).
So you're both right.
C++ agrees with your "opponent":
#include <iostream>
#include <string>
using namespace std;
int main()
{
bool contains = string("").find(string("")) != string::npos;
cout << "\"\" contains \"\": "
<< boolalpha << contains;
}
Output: "" contains "": true
It's easy. String A contains sub-string B if there is an argument offset such that A.substr(offset, B.size()) == B. No special cases for empty strings needed.
So, let's see. std::string("").substr(0,0) turns out to be std::string(""). And we can even check your "counter-example". std::string("").substr(0,0).substr(0,0) is also well-defined and empty. Turtles all the way down.
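The definition above translates almost word for word into code. This is a deliberately naive sketch (the name contains_substring is made up, and std::string::find is what you would use in practice); it only shows that no special case for empty strings is needed.

```cpp
#include <cstddef>
#include <string>

// A contains B if some offset exists with A.substr(offset, B.size()) == B.
bool contains_substring(const std::string& a, const std::string& b) {
    // offset == a.size() is a valid argument to substr and yields "".
    for (std::size_t offset = 0; offset <= a.size(); ++offset) {
        if (a.substr(offset, b.size()) == b) {
            return true;
        }
    }
    return false;
}
```

With a and b both empty, the loop runs once with offset 0, substr(0, 0) returns an empty string, and the comparison succeeds.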
The first thing that is unclear is whether you are talking about std::string or null-terminated C strings; the second is why it should matter. I will assume std::string.
The requirements on std::string determine how the component must behave, not what its internal representation must be (although some of the requirements affect the internal representation). As long as the requirements for the component are met, whether it holds something internally is an implementation detail that you might not even be able to test.
In the particular case of an empty string, there is nothing that mandates that it holds anything. It could just hold a size member set to 0 and a pointer (for the dynamically allocated memory if/when not empty) also set to 0. The requirement in operator[] requires that it returns a reference to a character with value 0, but since that character cannot be modified without causing undefined behavior, and since strict aliasing rules allow reading from an lvalue of char type, the implementation could just return a reference to one of the bytes in the size member (all set to 0) in the case of an empty string.
Some implementations of std::string use the small-object (small-string) optimization; in those implementations there is memory reserved for small strings, including an empty string. While the std::string will obviously not contain a std::string internally, it might contain the sequence of characters that compose an empty string (i.e. a terminating null character).
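The operator[] requirement mentioned above can be observed directly, and it also explains the exception in the question: at() performs a range check and rejects an index equal to size(), while operator[] on a const string is specified to return a character with value 0 there. A small sketch, with helper names invented for the example:

```cpp
#include <stdexcept>
#include <string>

// Reading s[s.size()] through a const reference is well-defined and yields '\0'.
char subscript_at_size(const std::string& s) {
    return s[s.size()];
}

// at() checks the index against size() and throws std::out_of_range.
bool at_throws_at_size(const std::string& s) {
    try {
        (void)s.at(s.size());
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

So the exception says nothing about whether an empty substring is found at index 0; it only says that there is no character at that index.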
empty string doesn't contain anything - it's EMPTY. :)
Of course an empty string does not contain an empty string. It'll be turtles all the way down if it did.
Take string empty = ""; that declares a string initialized from an empty string literal. If you want a string whose content represents an empty string literal, you would need string representsEmpty = """"; but of course you need to escape the quotes, giving you string actuallyRepresentsEmpty = "\"\"";
ps, I am taking a pragmatic approach to this. Leave the maths nonsense at the door.
Thinking about your amendment, it is possible that what your 'opponent' meant was that an 'empty' std::string still has internal storage for characters, which is itself empty of characters. That would be an implementation detail, I am sure; it could perhaps just keep an array of characters of a certain size (say 10) 'just in case', so it would technically not be empty.
Of course, there is the trick question answer that 'nothing' fits into anything infinite times, a sort of 'divide by zero' situation.
Today I had the same question since I'm currently bound to a lousy STL implementation (dating back to the pre-C++98 era) that differs from C++98 and all following standards:
TEST_ASSERT(std::string().find(std::string()) == std::string::npos); // WRONG!!! (non-standard)
This is especially bad if you try to write portable code because it's so hard to prove that no feature depends on that behaviour. Sadly in my case that's actually true: it does string processing to shorten phone numbers input depending on a subscriber line spec.
On Cppreference, I see in std::basic_string::find an explicit description about empty strings that I think matches exactly the case in question:
an empty substring is found at pos if and only if pos <= size()
The referred pos defines the position where to start the search, it defaults to 0 (the beginning).
A standard-compliant C++ Standard Library will pass the following tests:
TEST_ASSERT(std::string().find(std::string()) == 0);
TEST_ASSERT(std::string().substr(0, 0).empty());
TEST_ASSERT(std::string().substr().empty());
This interpretation of "contain" answers the question with yes.
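The cppreference rule quoted above (an empty substring is found at pos if and only if pos <= size()) can be checked with a small helper. The name empty_found_at is hypothetical; the sketch assumes a standard-compliant library.

```cpp
#include <cstddef>
#include <string>

// True if searching for the empty string from `pos` reports a match.
bool empty_found_at(const std::string& text, std::size_t pos) {
    return text.find(std::string(), pos) != std::string::npos;
}
```

On a conforming implementation this returns true for every pos from 0 up to and including text.size(), and false beyond that.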
I have a character range with pointers (pBegin and pEnd). I think of it as a string, but it is not \0 terminated. How can I print it to std::cout effectively?
Without creating a copy, like with std::string
Without a loop that prints each character
Do we have good solution? If not, what is the smoothest workaround?
You can use ostream::write, which takes pointer and length arguments:
std::cout.write(pBegin, pEnd - pBegin);
Since C++17 you can use std::string_view, which was created for sharing part of a std::string without copying:
std::cout << std::string_view(pBegin, pEnd - pBegin);
pEnd must point one past the last character to print, the way iterators work in C++, rather than at the last character itself.
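Both approaches above can be wrapped so that they work with any std::ostream, which also makes them easy to check against a std::ostringstream. This is a sketch assuming C++17; the function names write_range and stream_range are made up.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string_view>

// Unformatted output: writes exactly pEnd - pBegin chars, no copy, no loop.
void write_range(std::ostream& out, const char* pBegin, const char* pEnd) {
    out.write(pBegin, pEnd - pBegin);
}

// Formatted output through a non-owning view (C++17); respects width and fill.
void stream_range(std::ostream& out, const char* pBegin, const char* pEnd) {
    out << std::string_view(pBegin, static_cast<std::size_t>(pEnd - pBegin));
}
```

The practical difference is that write ignores stream formatting such as std::setw, while operator<< on a string_view honours it.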
What is string_view?
In C++11 what is the most performant way to return a reference/pointer to a position in a std::string?
In older C++ standards boost::string_ref is an alternative. Newer boost versions also have boost::string_view with the same semantics as std::string_view. See Differences between boost::string_ref and boost::string_view
If you use Qt, there are also QStringView and QStringRef, although unfortunately they are meant for viewing a QString, which stores its data in UTF-16 rather than UTF-8 or another byte-oriented encoding.
However, if you need to pass the string to functions that require a null-terminated string, without any external libraries, there is a simple solution:
char tmpEnd = *pEnd; // backup the after-end character
*pEnd = '\0';
std::cout << pBegin; // use it as normal C-style string, like dosomething(pBegin);
*pEnd = tmpEnd; // restore the char
In this case, make sure that the buffer is writable and that pEnd still points to an element inside the original array, not one past the end of it.
What are various ways in C/C++ to define a string with no null terminating char(\0) at the end?
EDIT: I am interested in character arrays only and not in STL string.
Typically as another poster wrote:
char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};
or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)
char s[6] = {115, 116, 114, 105, 110, 103};
There is also a largely ignored way that works only in C (not C++)
char s[6] = "string";
If the array size is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied, but it's still valid C (but invalid C++).
Obviously you can also do it at run time:
char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';
or (same remark on ASCII charset as above)
char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;
Or using memcpy (or memmove, or bcopy, but in this case there is no benefit in doing that):
memcpy(s, "string", 6);
or strncpy:
strncpy(s, "string", 6);
What should be understood is that there is no such thing as a string in C (in C++ there are string objects, but that's another story entirely). So-called strings are just char arrays. Even the name char is misleading: it is not a character but just a kind of numerical type. We could probably have called it byte instead, but in the old days there was strange hardware around using 9-bit registers and the like, and byte implies 8 bits.
As char will very often be used to store a character code, the C designers came up with a simpler way than storing a number in a char: you can put a letter between single quotes and the compiler understands that it must store that character's code in the char.
What I mean is (for example) that you don't have to do
char c = '\0';
To store a code 0 in a char, just do:
char c = 0;
As we very often have to work with a bunch of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. By the way, there is a name for this kind of string representation, "zero-terminated string", and if you see the two letters sz at the beginning of a variable name, it usually means that its content is a zero-terminated string.
"C sz strings" are not a type at all, just arrays of char as ordinary as, say, an array of int, but the string manipulation functions (strcmp, strcpy, strcat, printf, and many others) understand and use the 0-ending convention. That also means that if you have a char array that is not zero-terminated, you shouldn't call any of these functions, as they will likely do something wrong (or you must be extra careful and use functions with an n in their name, like strncpy).
The biggest problem with this convention is that there are many cases where it is inefficient. One typical example: you want to append something to a zero-terminated string. If you had kept the size, you could jump straight to the end of the string; with the sz convention, you have to scan it char by char. Other kinds of problems occur when dealing with encoded Unicode and the like. But at the time C was created, this convention was very simple and did the job perfectly.
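As noted above, a char array that is not zero-terminated has to be handled with an explicit length everywhere. A minimal sketch of what that looks like with standard functions; the helper names are invented, and the 64-byte buffer is an arbitrary choice for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// "%.*s" makes printf-style functions read at most `length` chars, so the
// data does not need a terminator. Output longer than the buffer is truncated.
std::string format_unterminated(const char* data, std::size_t length) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "[%.*s]",
                  static_cast<int>(length), data);
    return buffer;
}

// memcmp compares exactly `length` bytes and never looks for a terminator.
bool equal_unterminated(const char* a, const char* b, std::size_t length) {
    return std::memcmp(a, b, length) == 0;
}
```

Carrying the length alongside the pointer is also what removes the char-by-char scan when appending.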
Nowadays, in C++, text between double quotes like "string" is not a plain modifiable char array but an array of const char, which decays to const char *. (In C the type is still char[], but modifying a literal is undefined behavior.) That means that what the pointer points to is a constant that should not be modified (if you want to modify it, you must first copy it), and that is a good thing because it helps to detect many programming errors at compile time.
The terminating null is there to terminate the string. Without it, you need some other method to determine its length.
You can use a predefined length:
char s[6] = {'s','t','r','i','n','g'};
You can emulate pascal-style strings:
unsigned char s[7] = {6, 's','t','r','i','n','g'};
You can use std::string in C++ (not applicable here, since you're not interested in std::string).
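The Pascal-style layout mentioned above stores the length in the first byte, so reading it back needs no terminator. A small sketch, assuming that layout; the function name from_pascal is made up, and the single length byte limits such a string to 255 characters.

```cpp
#include <cstddef>
#include <string>

// Reads a length-prefixed ("Pascal-style") string: byte 0 holds the length,
// the characters follow, and there is no terminating '\0'.
std::string from_pascal(const unsigned char* pascal) {
    const std::size_t length = pascal[0];
    return std::string(reinterpret_cast<const char*>(pascal + 1), length);
}
```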
Preferably you would use some pre-existing technology that handles unicode, or at least understands string encoding (i.e., wchar.h).
And a comment: If you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string". This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.
typedef struct {
char characters[10];
} ThisIsNotACString;
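Before C++11, a std::string was not required to store a terminating NUL internally; since C++11, c_str() and data() both return a NUL-terminated array, but the terminator is not counted in size().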
P.S.: NULL is a macro [1]. NUL is \0. Don't mix them up.
[1] C.2.2.3 Macro NULL
The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>,
<ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International
Standard (18.1).
In C++ you can use the string class and not deal with the null char at all.
Just for the sake of completeness, and to nail this down completely:
std::vector<char>
Use std::string.
There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).
In C there generally won't be an easier solution. You could possibly do what Pascal did and put the length of the string in the first character, but this is a bit of a pain and limits your string length to the largest integer that fits in the first char.
In C++ I'd definitely use the std::string class that can be accessed by
#include <string>
Being a commonly used library this will almost certainly be more reliable than rolling your own string class.
The reason for the NUL termination is so that the handler of the string can determine its length. If you don't use NUL termination, you need to pass the string's length, either through a separate parameter/variable or as part of the string. Otherwise, you could use another delimiter, so long as it isn't used within the string itself.
To be honest, I don't quite understand your question, or if it actually is a question.
Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.
I can't personally think of any realistic scenario for why you'd want to do this, since the null character is what signals the end of the string. If you're storing the length of the string too, then I guess you've saved one byte at the cost of whatever the size of your variable is (likely 4 bytes), and gained faster access to the length of said string.
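The trade-off described above (a stored length in addition to a terminator) is what std::string does since C++11, and it can be observed with a few lines. This is only a sketch with invented helper names; it also shows that, because the length is stored, a std::string may hold NUL characters in the middle, which a C-style view of the same data stops at.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Since C++11, c_str() always has a '\0' right after the last character.
bool is_nul_terminated(const std::string& s) {
    return s.c_str()[s.size()] == '\0';
}

// Length as a C function sees it: it stops at the first '\0'.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Five characters, the third of which is '\0'.
std::string with_embedded_nul() {
    return std::string("ab\0cd", 5);
}
```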