This question already has answers here:
Getting the actual length of a UTF-8 encoded std::string?
(11 answers)
Closed 4 years ago.
Consider
#include <string>
#include <iostream>
int main()
{
std::string test="αλφα";
std::cout << "size() of '" << test << "' = " << test.size() << std::endl;
}
which produces
size() of 'αλφα' = 8
Using only the C++ standard library, how can I find the width of the output that will be produced by writing a string (i.e. 4 in the example above)?
The problem here is the encoding of the string.
This looks like UTF-8 to me (the first character is not the lowercase Latin 'a'). In that encoding, each of the characters you show takes two bytes, which accounts for the result of 8.
UTF-8 is broadly supported by the C++11 standard (rather elegantly, UTF-8 text contains no zero bytes other than the NUL character itself, cf. Windows Unicode). You can store it in a std::string, although size() and length() count bytes and so will, in general, overstate the number of characters. Care must also be taken when typing string literals of that kind directly in your editor, because the bytes that end up in the program depend on the encoding of the source file.
More reading from here: How to use Unicode (UTF-8) in C++
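A minimal sketch of counting characters rather than bytes, assuming the string holds well-formed UTF-8. The helper name utf8_code_points is made up for this example. It counts code points, which matches the display width for text like the one in the question but not for combining marks or double-width characters.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: `s` is well-formed UTF-8; no validation is attempted.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {   // lead byte or plain ASCII byte
            ++count;
        }
    }
    return count;
}
```

The literal in the question can be passed directly if the source file is saved as UTF-8; writing the bytes as escapes ("\xCE\xB1...") avoids depending on the editor's encoding.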
This question already has answers here:
How to read and write a STL C++ string?
(3 answers)
Closed 4 years ago.
I've recently learnt about std::string::substr() by searching on Google. I saw some code like this:
std::string s = "This is an example string";
std::string s1 = s.substr(11, 7);
std::cout << s1 << std::endl;
Now if I try to read the input with scanf() (instead of std::cin), the program crashes at runtime. Doesn't std::string work with scanf()?
scanf() belongs to a family of C functions that, being part of the C library rather than of C++, offer no direct support for std::string and work instead with null-terminated character strings. Passing a std::string object where scanf() expects a char* is undefined behavior, which is why the program crashes.
If you are using C++, you should generally prefer std::string over null-terminated character strings, and the stream input/output library over the printf()/scanf() family.
This question already has answers here:
How to construct a std::string with embedded values, i.e. "string interpolation"?
(8 answers)
Closed 2 years ago.
I am currently learning C++ and I cannot find out how to build a string from a format with multiple parameters, as sprintf does, when the length of the result is not known in advance.
What I want to do is something like
string myString = string.format("Hello, here is %s %s and another %s", par1, par2, par3);
I know that in C there is the asprintf function, which works out how long the string will be and mallocs the memory, which you then need to free when finished. However, this doesn't seem to be available for C++, although sprintf is. Everything I've seen on Google about asprintf seems to focus mostly on Linux, whereas I need something cross-platform.
Everything I've seen about string formatting in C++ says you need to define a char array of fixed length and use sprintf, but I am not going to know the length of the string, so I can't do this.
In addition to the existing excellent answer, you may consider the Boost Format library.
Example:
std::string myString = str(boost::format("Hello, here is %s %s and another %s") % par1 % par2 % par3);
Get the book The Standard C++ Library by Josuttis. It will give you the complete string interface and much, much more. You appear to be thinking C, not C++. You could of course use a C interface like sprintf() to load a char[] and then copy that to a C++ string. That is usually a bad idea.
Two ways to do what you ask:
string myString("Hello, here is ");
myString += par1;
myString += " ";
myString += par2;
myString += " and another ";
myString += par3;
stringstream foo;
foo << "Hello, here is " << par1 << " " << par2 << " and another " << par3;
string myString(foo.str());
There are lots of answers.
When C++ strings may get very long, you want to use std::stringstream to build them. This allows you to write to a string as though it were a file in memory, and it is written to handle very large strings efficiently. The C function snprintf() returns the number of characters it would have written (not counting the terminating null), and it may be called with a null pointer and a size of zero. So you need to call it twice: once to get the size, then, after allocating a buffer of that size plus one, again to format. It's a good solution for strings which are expected to be quite short and with a defined format, but which might get arbitrarily long, like a string containing someone's name.
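The two-call snprintf() pattern described above can be sketched as follows. The function name greet and its fixed three C-string parameters are assumptions made for the example; the format string is the one from the question.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats into a std::string of exactly the right size using two
// snprintf() calls: the first measures, the second writes.
std::string greet(const char* par1, const char* par2, const char* par3)
{
    const char* fmt = "Hello, here is %s %s and another %s";

    // Pass 1: a null buffer with size 0 only reports the required length.
    int needed = std::snprintf(nullptr, 0, fmt, par1, par2, par3);
    if (needed < 0) {
        return std::string();   // encoding error reported by snprintf
    }

    // Pass 2: allocate length + 1 for the terminating null, then format.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::snprintf(buffer.data(), buffer.size(), fmt, par1, par2, par3);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

std::snprintf is part of the standard library from C++11 on, so this does not rely on the Linux-specific asprintf.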
Note that printf() formats are convenient and easy to use for basic output of integers, strings, and reals, but they don't scale up to user-defined objects, because there's no accepted way of writing a toString() method and destroying the string after the call. They also can't handle arrays of objects. In C++ there is instead an accepted convention: overloading << writes a text representation of an object to a stream.
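A small illustration of the << convention, using a made-up Point type. Once the operator exists, the object can be written to std::cout, a file stream, or a std::ostringstream in the same way as a built-in type.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type for the example.
struct Point {
    int x;
    int y;
};

// Conventional stream insertion operator: writes a text form of the
// object and returns the stream so that calls can be chained.
std::ostream& operator<<(std::ostream& os, const Point& p)
{
    return os << '(' << p.x << ", " << p.y << ')';
}

// Builds a string of unknown length from mixed pieces.
std::string describe(const Point& p)
{
    std::ostringstream os;
    os << "Hello, here is " << p;
    return os.str();
}
```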
I have spent some time looking for a way to convert a UTF-8 string to a hexadecimal string, and back again.
I found some examples and possible solutions, but they all work correctly only when there are no special characters.
I have the following:
string in="áéíóúñü";
The result should be:
"c3a1c3a9c3adc3b3c3bac3b1c3bc"
I tried the following posts, among others:
C++ convert string to hexadecimal and vice versa
How to convert a string in hexadecimal string?
http://www.cplusplus.com/forum/beginner/161703/
I will try to explain better, but my English is not very good. Sorry.
I have to send some data over a socket. For that I have to convert names to hexadecimal using UTF-8, but some of them contain special characters, for example á, é, í...
Converting a normal letter gives a string of length 2 per letter.
a-> "61"
e-> "65"
But special characters are encoded (in UTF-8) with length 4
á-> "c3a1" this is the correct conversion
é-> "c3a9" this is the correct conversion
I have attempted the conversion in every way I have found, including the one suggested below. But every time I convert a special character I get a 2-digit answer, which is not correct.
á-> "e1" this isn't correct
é-> "e9" this isn't correct
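Loop over each "character" in the std::string object and output its two-digit hexadecimal equivalent as an int.
For looping, I recommend you look into range-based for loops.
To set the number of digits to print, read about setting the stream field width and fill character (precision has no effect on integers).
To print a number as hexadecimal, read about the base I/O manipulators.
To convert to an int read about static_cast.
Oh, and I recommend using an unsigned char for the single "characters". Note that the conversion only reports the bytes that are actually stored: if á comes out as "e1", the string holds Latin-1 (ISO-8859-1) data rather than UTF-8.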
Simple solution based on the above:
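#include <iomanip>
#include <sstream>
#include <string>

// Returns the bytes of `in` as lowercase hex, two digits per byte.
std::string stoh(std::string const& in)
{
    std::ostringstream os;
    os << std::hex << std::setfill('0');   // sticky, so set once
    for (unsigned char c : in)             // unsigned: bytes >= 0x80 stay positive
    {
        os << std::setw(2) << static_cast<int>(c);   // setw applies to the next output only
    }
    return os.str();
}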
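The question also asks for the reverse direction. A possible sketch follows; the names htos and hex_value are made up here, and the choice to throw on malformed input is an assumption of this example, not something the thread specifies.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Value of a single hex digit; throws on anything else.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Converts a hex string such as "c3a1" back to the bytes it encodes.
std::string htos(const std::string& hex)
{
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd number of hex digits");
    }
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int byte = hex_value(hex[i]) * 16 + hex_value(hex[i + 1]);
        out.push_back(static_cast<char>(byte));
    }
    return out;
}
```

The result is the original byte sequence; whether it displays as á depends on the terminal interpreting it as UTF-8.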
This question already has answers here:
What is the type of a string literal in C++? [duplicate]
(2 answers)
Closed 7 years ago.
When I use :
std::cout << "Hello world ";
What is the type of "Hello world "?
Where is it stored, so that I can get at it and work with it?
For various reasons, I don't want to use something like:
std::string str = "Hello world";
std::cout << str;
Please help me. I have searched for an hour but still found no answer.
The type of a string literal is "array of const char", with as many elements as there are characters in the literal, plus one for a final null character. The array has static storage duration, so it exists for the whole run of the program. Other versions of string literals (wide, Unicode) are arrays of other character types (wchar_t, char16_t etc.) (e.g. see here).
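A compile-time check of that statement, as a small sketch. The function literal_length is a made-up helper that works with the literal through a plain pointer, without any std::string.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the
// array: 12 visible characters plus the terminating null gives 13.
static_assert(std::is_same<decltype("Hello world "), const char (&)[13]>::value,
              "string literal is an array of 13 const char");

// The array decays to a pointer to its first element, which can be
// read (but not modified) like any other null-terminated string.
std::size_t literal_length()
{
    const char* p = "Hello world ";
    std::size_t n = 0;
    while (p[n] != '\0') {
        ++n;
    }
    return n;
}
```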
I ran the same code, which determines the number of characters in a wide-character string, on two platforms. The test string contains ASCII letters, digits and Korean characters.
#include <iostream>
#include <string>
using namespace std;
template <class T,class trait>
void DumpCharacters(T& a)
{
size_t length = a.size();
for(size_t i=0;i<length;i++)
{
trait n = a[i];
cout<<i<<" => "<<n<<endl;
}
cout<<endl;
}
int main(int argc, char* argv[])
{
wstring u = L"123abc가1나1다";
wcout<<u<<endl;
DumpCharacters<wstring,wchar_t>(u);
string s = "123abc가1나1다";
cout<<s<<endl;
DumpCharacters<string,char>(s);
return 0;
}
What I observe is that wstring.size() in Visual C++ 2010 returns the number of letters (11 characters), regardless of whether they are ASCII or international characters. However, it returns the byte count of the string data (17 bytes) in Xcode 4.2 on Mac OS X.
Please tell me how to get the character length of a wide-character string, not the byte count, in Xcode.
--- added on 12 Feb --
I found that wcslen() also returns 17 in Xcode. It returns 11 in VC++.
Here's the tested code:
const wchar_t *p = L"123abc가1나1다";
size_t plen = wcslen(p);
--- added on 18 Feb --
I found that llvm 3.0 causes the wrong length. This problem is fixed after changing compiler frontend from llvm3.0 to 4.2
The details are in: wcslen() works differently in Xcode and VC++.
It is an error if the std::wstring version uses 17 characters: it should only use 11 characters. Using recent SVN heads of gcc and clang, it uses 11 characters for the std::wstring and 17 characters for the std::string. I think this is what is expected.
Please note that the standard C++ library internally has a different idea of what a "character" is than what might be expected when multi-word encodings (e.g. UTF-8 for words of type char and UTF-16 for words with 16 bits) are used. Here is the first paragraph of the chapter describing strings (21.1 [strings.general]):
This Clause describes components for manipulating sequences of any non-array POD (3.9) type. In this Clause such types are called char-like types, and objects of char-like types are called char-like objects or simply characters.
This basically means that when using Unicode the various functions won't pay attention to what constitutes a code point but rather process the strings as a sequence of words. This has a severe impact on what happens e.g. when producing substrings, because these may easily split multi-byte characters apart. Currently, the standard C++ library doesn't have any support for processing multi-byte encodings internally, because it is assumed that the translation from an encoding to characters is done when reading data (and correspondingly the other way when writing data). If you are processing multi-byte encoded strings internally, you need to be aware of this, as there is no support at all.
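To make the substring hazard concrete, here is a sketch of a helper that cuts a UTF-8 string on a code point boundary. The name utf8_prefix is hypothetical, and the code assumes well-formed UTF-8 input. A plain s.substr(0, 1) on "αλ" would return a lone lead byte, which is not valid UTF-8 by itself.

```cpp
#include <cstddef>
#include <string>

// Returns the first `n` code points of `s` without splitting a
// multi-byte sequence. Assumption: `s` is well-formed UTF-8.
std::string utf8_prefix(const std::string& s, std::size_t n)
{
    std::size_t i = 0;      // byte index
    std::size_t seen = 0;   // code points accepted so far
    while (i < s.size()) {
        // Continuation bytes look like 10xxxxxx; anything else starts a code point.
        bool starts_code_point = (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (starts_code_point) {
            if (seen == n) {
                break;      // the next code point would exceed the limit
            }
            ++seen;
        }
        ++i;
    }
    return s.substr(0, i);
}
```

This still treats each code point as a unit, so combining characters can be separated from their base character.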
It is recognized that this state of affairs is actually a problem. For C++2011 the character type char32_t was added, which should support Unicode characters better than wchar_t (because Unicode code points need up to 21 bits, while wchar_t was allowed to support only 16 bits, a choice made on some platforms at a time when Unicode was promising to use at most 16 bits). However, this would still not deal with combining characters. It is recognized by the C++ committee that this is a problem and that proper character processing in the standard C++ library would be something nice to have, but so far nobody has come forward with a comprehensive proposal to address it (if you feel you want to propose something like this but you don't know how, please feel free to contact me and I will help you with how to submit a proposal).
Xcode 4.2 apparently used UTF-8 (or something very similar) as the narrow multibyte encoding to represent your string literal "123abc가1나1다" in the program's source code when initializing string s. The UTF-8 representation of that string happens to be 17 bytes long.
The wide character representation (stored in u) is 11 wide characters. There are many ways to convert from narrow to wide encoding. Try this:
#include <iostream>
#include <clocale>
#include <cstdlib>
#include <string>
int main()
{
    std::wstring u = L"123abc가1나1다";
    std::cout << "Wide string contains " << u.size() << " characters\n";
    std::string s = "123abc가1나1다";
    std::cout << "Narrow string contains " << s.size() << " bytes\n";
    // Use the user's locale so mbstowcs knows the narrow encoding.
    std::setlocale(LC_ALL, "");
    // With a null destination, mbstowcs only counts the wide characters
    // (it returns (size_t)-1 if the bytes are invalid in this locale).
    std::cout << "Which can be converted to "
              << std::mbstowcs(nullptr, s.c_str(), 0)
              << " wide characters in the current locale.\n";
}
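Note that .length() and .size() are synonyms on std::basic_string: both return the number of elements, so switching from one to the other does not change the result.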
std::string and std::wstring are typedefs of std::basic_string templated on char and wchar_t. The size() member function returns the number of elements in the string, that is, the number of chars or wchar_ts. "" and L"" don't deal with encodings.
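A sketch that shows size() counting elements for two element types. The function names are made up for the example. The test string from the question is written with escapes so that the result does not depend on how the source file is encoded, and std::u32string is used instead of std::wstring because its element width is the same on every platform.

```cpp
#include <cstddef>
#include <string>

// "123abc가1나1다" as UTF-8 bytes in a std::string: size() counts chars.
// The literal is split so that each hex escape ends before the digit '1'.
std::size_t narrow_units()
{
    std::string s = "123abc\xEA\xB0\x80" "1\xEB\x82\x98" "1\xEB\x8B\xA4";
    return s.size();
}

// The same text as UTF-32 in a std::u32string: size() counts char32_t
// elements, one per code point.
std::size_t utf32_units()
{
    std::u32string u = U"123abc\uAC00" U"1\uB098" U"1\uB2E4";
    return u.size();
}
```

Under these assumptions narrow_units() yields 17 and utf32_units() yields 11, the two numbers reported in the question.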