Is there a way to add a character to a string using a raw binary value? I know I can do something like this:
std::string output3 = std::string("\x01\x00\x01...", ...);
There it's done by the character's hex value. Is it possible to specify the character by its binary value? Something like this:
std::string output1 = std::string("\b11100101\b01000000", 7);
Note: I know \b has its meaning, it was just an example.
Short answer - C++ does not provide a means for escaping characters using binary values.
Likely explanation: it's never been considered useful enough for any compiler to implement as an extension (AFAIK), and certainly never useful enough to propose for standardisation.
If it's something you really need, I recommend you write (or modify) a preprocessor to do that for you (but you shouldn't use \b as introducer, as that already represents the backspace character).
You can use append() to add individual characters, e.g.:
std::string s = "abc";
s.append(1, 'd');
s.append(1, 0x65); // 0x65 == 'e'
std::cout << s << std::endl;
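If you can use C++14, binary integer literals (0b...) get close to what was asked. They are integer literals, not escape sequences, so they can't appear inside a string literal, but they can be passed to push_back() or append(). A minimal sketch; the function name make_from_binary is just for illustration:

```cpp
#include <string>

// Builds the two characters from the question (binary 11100101 and 01000000)
// using C++14 binary integer literals. These are ints, not escape sequences,
// so each value is converted to char explicitly before being appended.
std::string make_from_binary()
{
    std::string s;
    s.push_back(static_cast<char>(0b11100101)); // 0xE5
    s.push_back(static_cast<char>(0b01000000)); // 0x40 == '@'
    return s;
}
```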
Related
If I want to construct a std::string with a line like:
std::string my_string("a\0b");
where I want three characters in the resulting string (a, null, b), I get only one. What is the proper syntax?
Since C++14
we have been able to create std::string literals:
#include <iostream>
#include <string>
int main()
{
using namespace std::string_literals;
std::string s = "pl-\0-op"s; // <- Notice the "s" at the end
// This is a std::string literal not
// a C-String literal.
std::cout << s << "\n";
}
Before C++14
The problem is that the std::string constructor taking a const char* assumes the input is a C-string. C-strings are \0-terminated, so parsing stops when it reaches the first \0 character.
To compensate for this, you need to use the constructor that builds the string from a char array (not a C-String). This takes two parameters - a pointer to the array and a length:
std::string x("pq\0rs");    // 2 characters, because the input is assumed to be a C-string
std::string y("pq\0rs", 5); // 5 characters, because the input is now a char array of length 5
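Note: a C++ std::string is NOT delimited by \0 (as suggested in other posts): it stores its length separately and may contain embedded \0 characters. However, you can extract a pointer to an internal buffer that contains a C-string with the method c_str() (since C++11 that buffer is guaranteed to end with a \0).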
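A small sketch contrasting the two constructors (the function names are made up for this illustration). Note that C functions reading through c_str() still stop at the first embedded \0, even though size() counts it:

```cpp
#include <cstring>
#include <string>

// const char* constructor: stops at the first '\0', so only "pq" is copied.
std::string from_c_string() { return std::string("pq\0rs"); }

// (pointer, length) constructor: copies exactly 5 chars, embedded '\0' included.
std::string from_buffer() { return std::string("pq\0rs", 5); }

// strlen() sees only the characters before the first '\0' in c_str().
std::size_t c_length(const std::string& s) { return std::strlen(s.c_str()); }
```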
Also check out Doug T's answer below about using a vector<char>.
Also check out RiaD's answer for a C++14 solution.
If you are doing manipulation like you would with a C-style string (array of chars), consider using
std::vector<char>
You have more freedom to treat it like an array, in the same manner you would treat a C-string. You can use the iterator-range constructor to copy it into a string:
std::vector<char> vec(100);
strncpy(&vec[0], "blah blah blah", 100);
std::string vecAsStr(vec.begin(), vec.end()); // all 100 chars, including the '\0' padding strncpy adds
and you can use it in many of the same places you can use C-strings:
printf("%s", &vec[0]);
vec[10] = '\0';
vec[11] = 'b';
Naturally, however, you suffer from the same problems as C-strings: you may forget your null terminator or write past the allocated space.
I have no idea why you'd want to do such a thing, but try this:
std::string my_string("a\0b", 3);
The question "What new capabilities do user-defined literals add to C++?" presents an elegant answer. Define
std::string operator""_s(const char* str, std::size_t n)
{
return std::string(str, n);
}
then you can create your string this way:
std::string my_string("a\0b"_s);
or even so:
auto my_string = "a\0b"_s;
There's an "old style" way:
#define S(s) s, sizeof s - 1 // trailing NUL does not belong to the string
then you can define
std::string my_string(S("a\0b"));
The following will work...
std::string s;
s.push_back('a');
s.push_back('\0');
s.push_back('b');
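You'll have to be careful with this. If you replace 'b' with an octal digit (0 to 7), the digit becomes part of the \0 escape sequence, and you will silently create the wrong string using most methods (with a \x escape, any hex digit has the same effect). See: Rules for C++ string literals escape character.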
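To make the pitfall concrete, here is a minimal sketch (function names are illustrative only). An octal escape swallows up to three octal digits; splitting the literal into two adjacent literals is one way to end the escape early, because escapes are resolved before adjacent literals are concatenated:

```cpp
#include <string>

// "\01" is ONE octal escape (value 1), so this is {'a', '\x01'}.
// sizeof counts the literal's trailing '\0', hence the "- 1".
std::string absorbed() { return std::string("a\01", sizeof "a\01" - 1); }

// Two adjacent literals: the escape ends at the first literal's closing
// quote, so this is {'a', '\0', '1'}.
std::string split() { return std::string("a\0" "1", sizeof("a\0" "1") - 1); }
```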
For example, I dropped this innocent-looking snippet in the middle of a program:
// Create '\0' followed by '0' 40 times ;)
std::string str("\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", 80);
std::cerr << "Entering loop.\n";
for (char & c : str) {
std::cerr << c;
// 'Q' is way cooler than '\0' or '0'
c = 'Q';
}
std::cerr << "\n";
for (char & c : str) {
std::cerr << c;
}
std::cerr << "\n";
Here is what this program output for me:
Entering loop.
Entering loop.
vector::_M_emplace_ba
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
That was my first print statement twice, several non-printing characters, followed by a newline, followed by something in internal memory, which I just overwrote (and then printed, showing that it has been overwritten). The cause: each \00 is a single octal escape producing one '\0', so the literal holds only 40 characters (plus its terminator), while the constructor was told to read 80. That reads past the end of the array, which is undefined behavior. Worst of all, even compiling this with thorough and verbose gcc warnings gave me no indication of something being wrong, and running the program through valgrind didn't complain about any improper memory access patterns. In other words, none of the tools I tried detected it.
You can get this same problem with the much simpler std::string("0", 100);, but the example above is a little trickier, and thus harder to see what's wrong.
Fortunately, C++11 gives us a good solution to the problem using initializer list syntax. This saves you from having to specify the number of characters (which, as I showed above, you can do incorrectly), and avoids combining escaped numbers. std::string str({'a', '\0', 'b'}) is safe for any string content, unlike versions that take an array of char and a size.
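A minimal sketch of the initializer-list form, using the troublesome case of a digit right after the NUL (the function name is illustrative):

```cpp
#include <string>

// Each element is a separate char literal, so '0' cannot merge into the
// '\0' escape, and the size comes from the element count (3).
std::string with_digit_after_nul() { return std::string{'a', '\0', '0'}; }
```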
In C++14 you can now use std::string literals:
using namespace std::literals::string_literals;
std::string s = "a\0b"s;
std::cout << s.size(); // 3
Better to use std::vector<char> if this question isn't just for educational purposes.
anonym's answer is excellent, but there's a non-macro solution in C++98 as well:
template <size_t N>
std::string RawString(const char (&ch)[N])
{
return std::string(ch, N-1); // Again, exclude trailing `null`
}
With this function, RawString(/* literal */) will produce the same string as S(/* literal */):
std::string my_string_t(RawString("a\0b"));
std::string my_string_m(S("a\0b"));
std::cout << "Using template: " << my_string_t << std::endl;
std::cout << "Using macro: " << my_string_m << std::endl;
Additionally, there's an issue with the macro: the expression is not actually a std::string as written, and therefore can't be used e.g. for simple assignment-initialization:
std::string s = S("a\0b"); // ERROR!
...so it might be preferable to use:
#define S(s) std::string(s, sizeof s - 1)
Obviously you should only use one or the other solution in your project and call it whatever you think is appropriate.
I know this question was asked a long time ago, but anyone who has a similar problem might be interested in the following code.
CComBSTR(20,"mystring1\0mystring2\0")
Almost all implementations of std::strings are null-terminated, so you probably shouldn't do this. Note that "a\0b" is actually four characters long because of the automatic null terminator (a, null, b, null). If you really want to do this and break std::string's contract, you can do:
std::string s("aab");
s.at(1) = '\0';
but if you do, all your friends will laugh at you and you will never find true happiness.
char str[] = "C:\Windows\system32";
auto raw_string = convert_to_raw(str);
std::cout << raw_string;
Desired output:
C:\Windows\system32
Is it possible? I am not a big fan of cluttering my path strings with extra backslashes. Nor do I like the explicit R"()" notation.
Is there any other workaround for reading a backslash in a string literally?
That's not possible, \ has special meaning inside a non-raw string literal, and raw string literals exist precisely to give you a chance to avoid having to escape stuff. Give up, what you need is R"(...)".
Indeed, when you write something like
char const * str{"a\nb"};
you can verify yourself that strlen(str) is 3, not 4, which means that once you compile that line, the binary/object file contains only a single character, the newline character, corresponding to \n; there's no \ or n anywhere in it, so there's no way you can retrieve them.
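You can check this yourself with a tiny sketch (kStr and literal_length are illustrative names):

```cpp
#include <cstring>

// The escape \n is replaced by one newline character at compile time;
// neither the backslash nor the 'n' exists in the compiled program.
const char* const kStr{"a\nb"};

std::size_t literal_length() { return std::strlen(kStr); }
```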
As a matter of personal taste, I find raw string literals great! You can even put real line breaks in them, often for the price of just 3 characters, namely R, ( and ), in addition to those you would write anyway. Without them, you would have to write more characters to escape anything that needs escaping.
Look at
std::string s{R"(Hello
world!
This
is
Me!)"};
That's 29 keystrokes from R to the last " inclusive (counting each Enter as one), and you can see at a glance that it's 5 lines.
The equivalent non-raw string
std::string s{"Hello\nworld!\nThis\nis\nMe!"};
is 30 keystrokes from the first " to the last " inclusive, and you have to parse it carefully to count the lines.
A pretty short string, and you already see the advantage.
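If you want to convince yourself that the two spellings really produce the same string, a quick sketch (the variable names are just for illustration; the raw literal's continuation lines must start at column 0, or the indentation becomes part of the string):

```cpp
#include <string>

// Raw literal with real line breaks.
const std::string raw_text{R"(Hello
world!
This
is
Me!)"};

// Ordinary literal spelling the line breaks as \n.
const std::string escaped_text{"Hello\nworld!\nThis\nis\nMe!"};
```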
To answer the question, as asked, no it is not possible.
As an example of the impossibility, assume we have a path specified as char str[] = "C:\a\b";
Now, str is actually represented in memory (in your program when running) as an array of five characters with values {'C', ':', '\007', '\010', '\000'}, where '\xyz' is an OCTAL representation (so '\010' is a char numerically equal to 8 in decimal).
The problem is that there is more than one way to produce that array of five characters using a string literal.
char str[] = "C:\a\b";
char str1[] = "C:\007\010";
char str2[] = "C:\a\010";
char str3[] = "C:\007\b";
char str4[] = "C:\x07\x08"; // \xmn uses hex coding
In the above, str1, str2, str3, and str4 are all initialised using equivalent arrays of five char.
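A quick way to check that claim is to compare the arrays byte by byte; a minimal sketch (all_spellings_equal is an illustrative name):

```cpp
#include <cstring>

// Five spellings of the same five-char array {'C', ':', 7, 8, '\0'}.
const char str[]  = "C:\a\b";
const char str1[] = "C:\007\010";
const char str2[] = "C:\a\010";
const char str3[] = "C:\007\b";
const char str4[] = "C:\x07\x08";

// True when every spelling produced a byte-for-byte identical array.
bool all_spellings_equal()
{
    const char* const others[] = {str1, str2, str3, str4};
    for (const char* other : others)
        if (std::memcmp(str, other, sizeof str) != 0)
            return false;
    return sizeof str == 5;
}
```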
That means convert_to_raw("C:\a\b") could quite legitimately assume it is passed ANY of the strings above AND
std::cout << convert_to_raw("C:\a\b") << '\n';
could quite legitimately produce output of
C:\007\010
(or any one of a number of other strings).
The practical problem with this, if you are working with Windows paths, is that c:\a\b, C:\007\010, C:\a\010, C:\007\b, and C:\x07\x08 are all valid file names under Windows, and (unless they are hard links or junctions) they name DIFFERENT files.
In the end, if you want string literals in your code representing file names or paths, use \\ or a raw string literal when you need a single backslash. Alternatively, write your paths as string literals using all forward slashes (e.g. "C:/a/b"), since Windows API functions accept those too.
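For reference, a sketch of the spellings that do keep a literal backslash, plus the forward-slash alternative (variable names are illustrative):

```cpp
#include <string>

const std::string escaped_path = "C:\\a\\b";  // doubled backslashes
const std::string raw_path     = R"(C:\a\b)"; // raw string literal
// A different string, but the Windows API accepts it as the same path.
const std::string slash_path   = "C:/a/b";
```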
I see there is a function in <iomanip> (std::quoted) for quoting a std::string with a single character on output and unquoting it on input. I'm looking for something similar that works with multiple characters. My current use case is to wrap a string in an STX / ETX pair on output and unwrap it on input.
Using std::quoted is easy:
std::string example{ "Hallao" };
std::cout << std::quoted(example, 'a', 'x') << std::endl;
So my wish to see is something like:
std::cout << std::quoted(example, {0x02,0x03}, ...) << std::endl;
Is that already done somewhere, or is this too special to be part of the STL?
Unfortunately, so far there isn't. The delim and escape chars are restricted to one character each. You can't use multiple chars or a string under the current interface.
This unnecessary limitation applies to a lot of components too :( (for example the isalpha family of functions)
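Until then you'd have to write it yourself. Below is a minimal sketch of a hypothetical quote_with/unquote_with pair (not a standard or library API). Unlike std::quoted, which uses the same delimiter at both ends, it takes separate opening and closing delimiters such as STX (0x02) and ETX (0x03), and prefixes any of the three special chars with an escape char. Using DLE (0x10) as the escape char is just a choice for this example:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps s in open/close and prefixes every occurrence
// of open, close or esc inside s with esc.
std::string quote_with(const std::string& s, char open, char close, char esc)
{
    std::string out(1, open);
    for (char c : s) {
        if (c == open || c == close || c == esc)
            out += esc;
        out += c;
    }
    out += close;
    return out;
}

// Hypothetical inverse of quote_with. Returns false if the framing or the
// escaping is malformed (out may then hold partial data).
bool unquote_with(const std::string& q, char open, char close, char esc,
                  std::string& out)
{
    out.clear();
    if (q.size() < 2 || q.front() != open || q.back() != close)
        return false;
    for (std::size_t i = 1; i + 1 < q.size(); ++i) {
        char c = q[i];
        if (c == esc) {
            if (i + 2 >= q.size())   // escape with no payload char after it
                return false;
            c = q[++i];
        } else if (c == open || c == close) {
            return false;            // unescaped delimiter inside the payload
        }
        out += c;
    }
    return true;
}
```

Usage would be along the lines of quote_with(text, '\x02', '\x03', '\x10') when writing and unquote_with(...) when reading back.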
I have a problem comparing std::string values, and I think it's related to the encoding. I have to compare a string that I receive, whose encoding I don't know, with a Spanish string containing unusual characters. I can't change s_area.m_s_area_text, so I need to set s2 to an identical value, and I don't know how to do it in a generic way for other cases.
std::string s2= "Versión de sistema";
std::cout << s_area.m_s_area_text << std::endl;
for (const char* p = s2.c_str(); *p; ++p)
{
printf("%02x", *p);
}
printf("\n");
for (const char* p = s_area.m_s_area_text.c_str(); *p; ++p)
{
printf("%02x", *p);
}
printf("\n");
And the result of the execution is:
Versi├│n de sistema
5665727369fffffff36e2064652073697374656d61
5665727369ffffffc3ffffffb36e2064652073697374656d61
Obviously, since the two strings don't have the same byte values, all the comparison methods fail: strncmp, std::string ==, std::string::compare, etc.
Any idea how to do that without touching the s_area.m_s_area_text string?
In general it is impossible to guess the encoding of a string by inspecting its raw bytes. The exception to this rule is when a byte order mark (BOM) is present at the start of the byte stream. The BOM tells you which Unicode encoding the bytes use, and their endianness.
As an aside, if at some point in the future you decide you need a canonical string encoding (as some have pointed out in the comments, that would be a good idea), there are strong arguments in favour of UTF-8 as the best choice for C++. See UTF-8 everywhere for further information on this.
First of all, to compare two strings correctly you at least need to know their encodings. In your example, s_area.m_s_area_text happens to be encoded in UTF-8, while s2 uses ISO/IEC 8859-1 (Latin-1).
If you are sure that s_area.m_s_area_text will always be encoded in UTF-8, you can try to make s2 use the same encoding and then just compare them. One way of defining a UTF-8 encoded string is to escape every character that is not in the basic character set with \u.
std::string s2 = u8"Versi\u00F3n de sistema"; // up to C++17; in C++20 u8 literals are char8_t, so use "Versi\xC3\xB3n de sistema" instead
...
if (s_area.m_s_area_text == s2)
...
It should also be possible to do it without escaping the characters by setting an appropriate encoding for the source file and specifying the encoding to the compiler.
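If, as the hex dump suggests (f3 for ó), s2 really is Latin-1, another option is to convert it at run time instead of rewriting the literal. Latin-1 byte values map one-to-one onto the Unicode code points U+0000 to U+00FF, so the conversion is short. A minimal sketch, assuming s2 is Latin-1 (latin1_to_utf8 is a hypothetical helper, not a library function):

```cpp
#include <string>

// Hypothetical helper: re-encodes an ISO-8859-1 (Latin-1) string as UTF-8.
// Bytes below 0x80 are copied; bytes 0x80-0xFF become two UTF-8 bytes.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}
```

With that, latin1_to_utf8(s2) == s_area.m_s_area_text should hold for the strings in the question.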
As @nwp mentioned, you may also want to normalise the strings before comparing. Otherwise, two strings that look the same may have different Unicode representations, and that will cause your comparison to yield a false negative result.
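For example, "Versión de sistema" spelled with the precomposed ó (U+00F3) will not be equal to the same text spelled with a plain o followed by the combining acute accent (U+0301).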
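The difference is visible only at the byte level. A minimal sketch with the two UTF-8 forms written as hex escapes (variable names are illustrative); actually normalising them requires a Unicode library such as ICU, which isn't shown here:

```cpp
#include <string>

// Precomposed U+00F3 is C3 B3 in UTF-8; 'o' + U+0301 is 6F CC 81.
// They render identically but differ byte for byte, so == returns false.
const std::string precomposed = "Versi\xC3\xB3n";
const std::string decomposed  = "Versio\xCC\x81n";
```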
Given a std::string containing text encoded in an arbitrary but known character set, what is the easiest way in C++ to count the characters? It should be able to handle things like combining characters and Unicode code points.
It would be nice to have something like:
std::string test = "éäöü";
std::cout << test.size("utf-8") << std::endl;
Unfortunately, life isn't always easy with C++. :)
For Unicode, I have seen that one can use the ICU library: Cross-platform iteration of Unicode string (counting Graphemes using ICU)
But is there a more general solution?
I'm afraid it depends on the particular encoding. If you use UTF-8 (and I really don't see why you should not), you could use UTF8-CPP.
It would appear they have a function to do just this:
::std::string test = "éäöü";
auto length = ::utf8::distance(test.begin(), test.end());
::std::cout << length << "\n"; // should print 4.
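If you only need code points and want to avoid a dependency, counting the bytes that are not UTF-8 continuation bytes (10xxxxxx) gives the same number for valid UTF-8. A minimal sketch (count_code_points is an illustrative name); like utf8::distance, it counts code points, not grapheme clusters, so a combining character is counted separately from its base letter:

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes. Assumes the input is valid UTF-8; no validation is done.
std::size_t count_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (char ch : s)
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80)
            ++n;
    return n;
}
```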