I am studying C++ on my own and I have a problem that I haven't been able to solve for more than a week. I hope you can help me.
I need to get a SHA1 digest of a Unicode string (like Привет), but I don't know how to do that.
I tried to do it like this, but it returns the wrong digest!
For wstring(L"Ы")
It returns - A469A61DF29A7568A6CC63318EA8741FA1CF2A7
I need - 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373
Regards and sorry for my English :).
Crypto++ 5.6.2
MSVC++ 2013
#include <iostream>
#include "cryptopp562\cryptlib.h"
#include "cryptopp562\sha.h"
#include "cryptopp562\hex.h"
int main() {
    std::wstring string(L"Ы");

    int bs_size = (int)string.length() * sizeof(wchar_t);
    byte* bytes_string = new byte[bs_size];

    int n = 0; // real bytes count
    for (size_t i = 0; i < string.length(); i++) {
        wchar_t wcharacter = string[i];

        int high_byte = wcharacter & 0xFF00;
        high_byte = high_byte >> 8;

        int low_byte = wcharacter & 0xFF;

        if (high_byte != 0) {
            bytes_string[n++] = (byte)high_byte;
        }

        bytes_string[n++] = (byte)low_byte;
    }

    CryptoPP::SHA1 sha1;
    std::string hash;

    CryptoPP::StringSource ss(bytes_string, n, true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );

    delete[] bytes_string;

    std::cout << hash << std::endl;
    return 0;
}
I need to get a SHA1 digest of a Unicode string (like Привет), but I don't know how to do that.
The trick here is that you need to know how to encode the Unicode string. On Windows, a wchar_t is 2 octets, while on Linux a wchar_t is 4 octets. There's a Crypto++ wiki page on it at Character Set Considerations, but it's not that good.
To interoperate most effectively, always use UTF-8. That means you convert UTF-16 or UTF-32 to UTF-8. Because you are on Windows, you will want to call the WideCharToMultiByte function to convert it using CP_UTF8. If you were on Linux, then you would use libiconv.
Crypto++ has a built-in function called StringNarrow that uses C++. It's in the file misc.h. Be sure to call setlocale before using it.
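For illustration, here's a minimal sketch of how that might look; this assumes your Crypto++ version ships StringNarrow (check misc.h, since availability and the exact signature may vary by version):

#include "cryptopp562\misc.h"
#include <clocale>
#include <iostream>
#include <string>

int main() {
    // StringNarrow relies on the C locale machinery, so set the locale first
    std::setlocale(LC_ALL, "");

    std::wstring w = L"Привет";
    // Assumed signature from misc.h:
    //   std::string StringNarrow(const wchar_t* str, bool throwOnError = true);
    std::string narrow = CryptoPP::StringNarrow(w.c_str());

    std::cout << narrow.size() << " bytes after narrowing" << std::endl;
    return 0;
}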
Stack Overflow has a few questions on using the Windows function. See, for example, How do you properly use WideCharToMultiByte.
I need - 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373
What is the hash (SHA-1, SHA-256, ...)? Is it an HMAC (keyed hash)? Is the information salted (like a password in storage)? How is it encoded? I have to ask because I cannot reproduce your desired results:
SHA-1: 2805AE8E7E12F182135F92FB90843BB1080D3BE8
SHA-224: 891CFB544EB6F3C212190705F7229D91DB6CECD4718EA65E0FA1B112
SHA-256: DD679C0B9FD408A04148AA7D30C9DF393F67B7227F65693FFFE0ED6D0F0ADE59
SHA-384: 0D83489095F455E4EF5186F2B071AB28E0D06132ABC9050B683DA28A463697AD
1195FF77F050F20AFBD3D5101DF18C0D
SHA-512: 0F9F88EE4FA40D2135F98B839F601F227B4710F00C8BC48FDE78FF3333BD17E4
1D80AF9FE6FD68515A5F5F91E83E87DE3C33F899661066B638DB505C9CC0153D
Here's the program I used. Be sure to specify the length of the wide string. If you don't (and use -1 for the length), then WideCharToMultiByte will include the terminating ASCII-Z in its calculations. Since we are using a std::string, we don't need the function to include the ASCII-Z terminator.
#include <windows.h>
#include <iostream>
#include <stdexcept>
#include <string>
#include "cryptopp562\sha.h"
#include "cryptopp562\hex.h"
#include "cryptopp562\filters.h"
#include "cryptopp562\channels.h"
using namespace CryptoPP;
using namespace std;

int main(int argc, char* argv[])
{
    wstring m1 = L"Привет"; string m2;

    // Convert UTF-16 to UTF-8; pass the explicit length so the
    // terminating NUL is not included in the conversion
    int req = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), NULL, 0, NULL, NULL);
    if(req <= 0)
        throw runtime_error("Failed to convert string");

    m2.resize((size_t)req);
    int cch = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), &m2[0], (int)m2.length(), NULL, NULL);
    if(cch <= 0)
        throw runtime_error("Failed to convert string");

    // Should not be required
    m2.resize((size_t)cch);

    string s1, s2, s3, s4, s5;
    SHA1 sha1; SHA224 sha224; SHA256 sha256; SHA384 sha384; SHA512 sha512;

    HashFilter f1(sha1, new HexEncoder(new StringSink(s1)));
    HashFilter f2(sha224, new HexEncoder(new StringSink(s2)));
    HashFilter f3(sha256, new HexEncoder(new StringSink(s3)));
    HashFilter f4(sha384, new HexEncoder(new StringSink(s4)));
    HashFilter f5(sha512, new HexEncoder(new StringSink(s5)));

    // Fan the same input out to all five hash filters
    ChannelSwitch cs;
    cs.AddDefaultRoute(f1);
    cs.AddDefaultRoute(f2);
    cs.AddDefaultRoute(f3);
    cs.AddDefaultRoute(f4);
    cs.AddDefaultRoute(f5);

    StringSource ss(m2, true /*pumpAll*/, new Redirector(cs));

    cout << "SHA-1: " << s1 << endl;
    cout << "SHA-224: " << s2 << endl;
    cout << "SHA-256: " << s3 << endl;
    cout << "SHA-384: " << s4 << endl;
    cout << "SHA-512: " << s5 << endl;

    return 0;
}
You say ‘but it returns wrong digest’ – what are you comparing it with?
Key point: digests such as SHA-1 don't work with sequences of characters, but with sequences of bytes.
What you're doing in this snippet of code is generating an ad-hoc encoding of the Unicode characters in the string "Ы". This encoding will (as it turns out) match the UTF-16 encoding if the characters in the string are all in the BMP ('basic multilingual plane', which is true in this case) and if the numbers that end up in wcharacter are integers representing Unicode codepoints (which is sort-of probably correct, but not, I think, guaranteed).
If the digest you're comparing it with turns an input string into a sequence of bytes using the UTF-8 encoding (which is quite likely), then that will produce a different byte sequence from yours, so that the SHA-1 digest of that sequence will be different from the digest you calculate here.
So:
Check what encoding your test string is using.
You'd be best off using some library functions to specifically generate a UTF-16 or UTF-8 (as appropriate) encoding of the string you want to process, to ensure that the byte sequence you're working with is what you think it is; a sketch follows below.
There's an excellent introduction to Unicode and encodings in the aptly-named document The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
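For example, here is a minimal sketch using the standard library's wstring_convert (deprecated in C++17 but widely available; it assumes BMP-only content when wchar_t is 2 bytes):

#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8 bytes; those bytes
// are then what you feed to the digest
std::string to_utf8(const std::wstring& ws)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> conv;
    return conv.to_bytes(ws);
}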
This seems to work fine for me.
Rather than fiddling about trying to extract the pieces, I simply cast the wide character buffer to a const byte* and pass that (and the adjusted size) to the hash function.
#include <iostream>
#include <crypto++/sha.h>
#include <crypto++/hex.h>
#include <crypto++/filters.h>

int main() {
    std::wstring string(L"Привет");

    CryptoPP::SHA1 sha1;
    std::string hash;

    CryptoPP::StringSource ss(
        reinterpret_cast<const byte*>(string.c_str()),    // cast to const byte*
        string.size() * sizeof(std::wstring::value_type), // adjust for size
        true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );

    std::cout << hash << std::endl;
    return 0;
}
Output:
C6F8291E68E478DD5BD1BC2EC2A7B7FC0CEE1420
EDIT: To add.
The result is going to be encoding dependent. For example, I ran this on Linux where wchar_t is 4 bytes. On Windows I believe wchar_t may be only 2 bytes.
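A quick way to check the width on your own platform:

#include <iostream>

int main() {
    // Typically prints 2 on Windows and 4 on Linux
    std::cout << sizeof(wchar_t) << std::endl;
}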
For consistency it may be better to use UTF-8 and store the text in a normal std::string. This also makes calling the API simpler:
#include <iostream>
#include <crypto++/sha.h>
#include <crypto++/hex.h>
#include <crypto++/filters.h>

int main() {
    std::string string("Привет"); // UTF-8 encoded (the source file must be saved as UTF-8)

    CryptoPP::SHA1 sha1;
    std::string hash;

    CryptoPP::StringSource ss(
        string,
        true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );

    std::cout << hash << std::endl;
    return 0;
}
Output:
2805AE8E7E12F182135F92FB90843BB1080D3BE8
Related
I use the Crypto++ library. I have a Base64 string saved as a CString. I want to convert my string to an Integer. Actually, this Base64 string was built from an Integer, and now I want to convert it back to an Integer again, but the two Integers are not equal. In other words, the second Integer does not equal the original Integer.
Base64Decoder bd;

CT2CA s(c);
std::string strStd(s);

bd.Put((byte*)strStd.data(), strStd.size());
bd.MessageEnd();

word64 size = bd.MaxRetrievable();
vector<byte> cypherVector(size);

string decoded;
if (size && size <= SIZE_MAX)
{
    decoded.resize(size);
    bd.Get((byte*)decoded.data(), decoded.size());
}

Integer cipherMessage((byte*)decoded.data(), decoded.size());
string decoded;
if (size && size <= SIZE_MAX)
{
    decoded.resize(size);
    bd.Get((byte*)decoded.data(), decoded.size());
}
You have a string called decoded, but you never actually decode the data by running it through a Base64Decoder.
Use something like the following. I don't have a MFC project handy to test, so I'm going to assume you converted the CString to a std::string.
// Converted from Unicode CString
std::string str;

StringSource source(str, true, new Base64Decoder);
Integer value(source, source.MaxRetrievable());

std::cout << std::hex << value << std::endl;
The StringSource is a BufferedTransformation. The Integer constructor you are using is:
Integer (BufferedTransformation &bt, size_t byteCount, Signedness sign=UNSIGNED, ByteOrder order=BIG_ENDIAN_ORDER)
In between the StringSource and the Integer is the Base64Decoder. It's a filter that decodes the string on the fly. So data flows from the source (the StringSource) to the sink (the Integer constructor).
Also see Pipelines on the Crypto++ wiki.
Here is my solution to achieve this. It uses some Qt classes but it should be simple to replace them:
#include <QByteArray>
#include <QScopedArrayPointer>
#include <crypto++/base64.h>
#include <crypto++/rsa.h>
using namespace CryptoPP;
Integer convertBase64ToCryptoPpInt(const QByteArray &base64)
{
    Base64Decoder decoder;
    decoder.Put(reinterpret_cast<const byte*>(base64.data()), base64.size());
    decoder.MessageEnd();

    const word64 size = decoder.MaxRetrievable();
    QScopedArrayPointer<byte> decoded{new byte[size]};
    decoder.Get(decoded.data(), size);
    return {decoded.data(), size};
}

QByteArray convertCryptoPpIntToBase64(const Integer &i)
{
    // Copy content of i into byte array
    const unsigned iLen = i.ByteCount();
    QScopedArrayPointer<byte> idata{new byte[iLen]};
    i.Encode(idata.data(), iLen);

    // Encode data
    Base64Encoder encoder;
    encoder.Put(idata.data(), iLen);
    encoder.MessageEnd();

    const int encodedSize = encoder.MaxRetrievable();
    QScopedArrayPointer<byte> encoded{new byte[encodedSize]};
    encoder.Get(encoded.data(), encodedSize);
    return {reinterpret_cast<char*>(encoded.data()), encodedSize};
}
It may be much more compact using Crypto++'s pipelining, but I didn't find out how to stream from and to a CryptoPP::Integer.
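For what it's worth, here is a sketch of how the pipelined version might look, based on the Integer(BufferedTransformation&, size_t) constructor quoted in the other answer and Integer::Encode(BufferedTransformation&, size_t); treat it as untested:

#include <crypto++/base64.h>
#include <crypto++/integer.h>
#include <crypto++/filters.h>
#include <string>

using namespace CryptoPP;

// Base64 string -> Integer, letting the decoder act as the buffer
Integer base64ToInteger(const std::string& b64)
{
    StringSource source(b64, true, new Base64Decoder);
    return Integer(source, source.MaxRetrievable());
}

// Integer -> Base64 string, streaming the encoded bytes into the encoder
std::string integerToBase64(const Integer& i)
{
    std::string b64;
    Base64Encoder encoder(new StringSink(b64));
    i.Encode(encoder, i.MinEncodedSize());
    encoder.MessageEnd();
    return b64;
}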
I had a problem with opening files whose paths contain UTF-8 characters (like Cyrillic or Latin ones). I found a way to solve that with _wfopen, but the way I solved it was by encoding the characters by hand (\uxxxx).
Is there a function, macro or anything that, when I supply the string (path), will return the Unicode?
Something like this:
https://www.branah.com/unicode-converter
I tried with MultiByteToWideChar, but it returns some hex numbers that are not relevant.
Tried:
std::wstring s2ws(const std::string& s)
{
    int len;
    int slength = (int)s.length() + 1;
    len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0);
    wchar_t* buf = new wchar_t[len];
    MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf, len);
    std::wstring r(buf);
    delete[] buf;
    return r;
}
std::wstring stemp = s2ws(x);
LPCWSTR result = stemp.c_str();
The result I get: 0055F7E8
Thank you in advance
Update:
I installed Boost, and now I am trying to do it with Boost. Can someone maybe help me out with Boost?
So I have a path:
wchar_t path[100] = _T("čaćšžđ\\test.txt");
I need it converted to:
wchar_t s[100] = _T("\u010d\u0061\u0107\u0161\u017e\u0111\\test.txt");
Here's a way to convert between UTF-8 and UTF-16 on Windows, as well as showing the real values of the stored code units for both input and output:
#include <codecvt>
#include <iostream>
#include <iomanip>
#include <string>
int main() {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    std::string s = "test";

    std::cout << std::hex << std::setfill('0');
    std::cout << "Input `char` data: ";
    for (char c : s) {
        std::cout << std::setw(2) << static_cast<unsigned>(static_cast<unsigned char>(c)) << ' ';
    }
    std::cout << '\n';

    std::wstring ws = convert.from_bytes(s);
    std::cout << "Output `wchar_t` data: ";
    for (wchar_t wc : ws) {
        std::cout << std::setw(4) << static_cast<unsigned>(wc) << ' ';
    }
    std::cout << '\n';
}
Understanding the real values of the input and output is important because otherwise you may not correctly understand the transformation that you really need. For example it looks to me like there may be some confusion as to how VC++ deals with encodings, and what \Uxxxxxxxx and \uxxxx actually do in C++ source code (e.g., they don't necessarily produce UTF-8 data).
Try using code like that shown above to see what your input data really is.
To emphasize what I've written above; there are strong indications that you may not correctly understand the processing that's being done on your input, and you need to thoroughly check it.
The above program does correctly transform the UTF-8 representation of ć (U+0107) into the single 16-bit code unit 0x0107, if you replace the test string with the following:
std::string s = "\xC4\x87"; // UTF-8 representation of U+0107
The output of the program, on Windows using Visual Studio, is then:
Input char data: c4 87
Output wchar_t data: 0107
This is in contrast to using test strings such as:
std::string s = "ć";
Or
std::string s = "\u0107";
Which may result in the following output:
Input char data: 3f
Output wchar_t data: 003f
The problem here is that Visual Studio does not use UTF-8 as the encoding for strings without some trickery, so your request to convert from UTF-8 probably isn't what you actually need; or you do need conversion from UTF-8, but you're testing potential conversion routines using input that differs from your real input.
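As an aside: if your compiler is recent enough, Visual C++ 2015 Update 2 and later accept a /utf-8 switch that treats both the source and execution character sets as UTF-8, which sidesteps much of this trickery (mentioned as a pointer to check, not something verified against your setup):

cl /utf-8 /EHsc test.cpp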
So I have a path: wchar_t path[100] = _T("čaćšžđ\\test.txt");
I need it converted to:
wchar_t s[100] = _T("\u010d\u0061\u0107\u0161\u017e\u0111\\test.txt");
Okay, so if I understand correctly, your actual problem is that the following fails:
wchar_t path[100] = _T("čaćšžđ\\test.txt");
FILE *f = _wfopen(path, L"w");
But if you instead write the string like:
wchar_t path[100] = _T("\u010d\u0061\u0107\u0161\u017e\u0111\\test.txt");
Then the _wfopen call succeeds and opens the file you want.
First of all, this has absolutely nothing to do with UTF-8. I assume you found some workaround using a char string and converting that to wchar_t, and somehow interpreted this as involving UTF-8, or something.
What encoding are you saving the source code with? Is the string L"čaćšžđ\\test.txt" actually being saved properly? Try closing the source file and reopening it. If some characters show up replaced by ?, then part of your problem is the source file encoding. In particular this is true of the default encoding used by Windows in most of North America and Western Europe: "Western European (Windows) - Codepage 1252".
You can also check the output of the following program:
#include <iomanip>
#include <iostream>
int main() {
    wchar_t path[16] = L"čaćšžđ\\test.txt";
    std::cout << std::hex << std::setfill('0');
    for (wchar_t wc : path) {
        std::cout << std::setw(4) << static_cast<unsigned>(wc) << ' ';
    }
    std::cout << '\n';

    wchar_t s[16] = L"\u010d\u0061\u0107\u0161\u017e\u0111\\test.txt";
    for (wchar_t wc : s) {
        std::cout << std::setw(4) << static_cast<unsigned>(wc) << ' ';
    }
    std::cout << '\n';
}
Another thing you need to understand is that the \uxxxx form of writing characters, called Universal Character Names or UCNs, is not a form that you can convert strings to and from at runtime in C++. By the time you've compiled the program and it's running (i.e., by the time any code you write could be attempting to produce strings containing \uxxxx), the point at which UCNs are interpreted by the compiler as different characters is long past. The only UCNs that will work are ones that are written directly in the source file.
Also, you're using _T() incorrectly. IMO you shouldn't be using TCHAR and the related macros at all, but if you do use them then you ought to use them consistently: don't mix TCHAR APIs with explicit use of the *W APIs or wchar_t. The whole point of TCHAR is to allow code to switch between wchar_t and Microsoft's "ANSI" APIs, so using TCHAR and then hard-coding an assumption that TCHAR is wchar_t defeats the entire purpose.
You should just write:
wchar_t path[100] = L"čaćšžđ\\test.txt";
Your code is Windows-specific, and you're using Visual C++. So, just use wide literals. Visual C++ supports wide strings for file stream constructors.
It's as simple as that, when you don't require portability.
#include <fstream>
#include <iostream>
#include <stdlib.h>
using namespace std;
auto main() -> int
{
    wchar_t const path[] = L"cacšžd/test.txt";
    ifstream f( path );

    int ch;
    while( (ch = f.get()) != EOF )
    {
        cout.put( ch );
    }
}
Note, however, that this code is Visual C++ specific. That's reasonable for Windows-specific code. Possibly with C++17 we will have Boost file system library adopted into the standard library, and then for conformance g++ will ideally offer the constructor used here.
The problem was that I was saving the CPP file as ANSI... I had to convert it to UTF-8. I tried this before posting, but VS 2015 kept turning it back into ANSI; I had to change it inside VS to get it working.
I tried opening the cpp file with Notepad++ and changing the encoding, but when I reopened it in VS it automatically reverted. Then I looked at the Save As option, but there is no encoding option there. Finally I found it, in Visual Studio 2015:
File -> Advanced Save Options, then in the Encoding dropdown change it to Unicode.
One thing that is still strange to me: how did VS display the characters normally, while when I opened the file in N++ there were ?s (as expected, because of ANSI)?
I need to compress a large byte array. I'm already using the Crypto++ library in the application, so having the compression/decompression part in the same library would be great.
this little test works as expected:
#include <iostream>
#include <string>
#include <cstdlib>
#include "cryptopp562\gzip.h"
#include "cryptopp562\filters.h"
using namespace std;

string test = "bleachbleachtestingbiatchbleach123123bleachbleachtestingb.....more";

string compress(string input)
{
    string result ("");
    CryptoPP::StringSource(input, true, new CryptoPP::Gzip(new CryptoPP::StringSink(result), 1));
    return result;
}

string decompress(string _input)
{
    string _result ("");
    CryptoPP::StringSource(_input, true, new CryptoPP::Gunzip(new CryptoPP::StringSink(_result), 1));
    return _result;
}

int main()
{
    string compressed = compress(test);
    string decompressed = decompress(compressed);

    cout << "original size :" << test.length() << endl;
    cout << "compressed size :" << compressed.length() << endl;
    cout << "decompressed size :" << decompressed.length() << endl;

    system("PAUSE");
}
I need to compress something like this:
unsigned char long_array[194506]
{
0x00,0x00,0x02,0x00,0x00,0x04,0x00,0x00,0x00,
0x01,0x00,0x02,0x00,0x00,0x04,0x02,0x00,0x04,
0x04,0x00,0x02,0x00,0x01,0x04,0x02,0x00,0x04,
0x01,0x00,0x02,0x02,0x00,0x04,0x02,0x00,0x00,
0x03,0x00,0x02,0x00,0x00,0x04,0x01,0x00,0x04,
....
};
I tried to use long_array as a const char* and as byte, then fed it to the compress function. It seems to be compressed, but the decompressed result has a size of 4, and it's clearly incomplete. Maybe it's too long.
How could I rewrite those compress/decompress functions to work with that byte array?
Thank you all. :)
I tried to use the array as a const char* and as byte, then fed it to the compress function. It seems to be compressed, but the decompressed result has a size of 4, and it's clearly incomplete.
Use the alternate StringSource constructor that takes a pointer and a length. It will be immune to embedded NULs.
CryptoPP::StringSource ss(long_array, sizeof(long_array), true,
    new CryptoPP::Gzip(
        new CryptoPP::StringSink(result), 1)
);
Or, you can use:
Gzip zipper(new StringSink(result), 1);
zipper.Put(long_array, sizeof(long_array));
zipper.MessageEnd();
Crypto++ added an ArraySource at 5.6. You can use it too (but it's really a typedef for a StringSource):
CryptoPP::ArraySource as(long_array, sizeof(long_array), true,
    new CryptoPP::Gzip(
        new CryptoPP::StringSink(result), 1)
);
The 1 that is used as an argument to Gzip is the deflate level. 1 is one of the lowest compression levels. You might consider using 9 or Gzip::MAX_DEFLATE_LEVEL (which is 9). The default log2 window size is the max size, so there's no need to turn any knobs on it.
Gzip zipper(new StringSink(result), Gzip::MAX_DEFLATE_LEVEL);
You should also name your declarations. I've seen GCC generate bad code when using anonymous declarations.
Finally, use long_array (or a similar name), because array collides with std::array in C++11 when using namespace std.
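Putting it together, a minimal round-trip sketch under the same assumptions (headers in a cryptopp562 directory; a short stand-in array instead of the full 194,506-byte one):

#include "cryptopp562\gzip.h"
#include "cryptopp562\filters.h"
#include <iostream>
#include <string>

int main()
{
    // Stand-in for the real long_array
    unsigned char long_array[] = { 0x00, 0x00, 0x02, 0x00, 0x00, 0x04 };

    // Compress via the pointer+length constructor, immune to embedded NULs
    std::string compressed;
    CryptoPP::StringSource ss1(long_array, sizeof(long_array), true,
        new CryptoPP::Gzip(
            new CryptoPP::StringSink(compressed), CryptoPP::Gzip::MAX_DEFLATE_LEVEL)
    );

    // Decompress back; std::string stores embedded NULs just fine
    std::string decompressed;
    CryptoPP::StringSource ss2(compressed, true,
        new CryptoPP::Gunzip(
            new CryptoPP::StringSink(decompressed))
    );

    std::cout << "original: " << sizeof(long_array)
              << ", decompressed: " << decompressed.size() << std::endl;
    return 0;
}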
How can I compare a wstring, such as L"Hello", to a string? If I need to have the same type, how can I convert them into the same type?
Since you asked, here are my standard conversion functions from string to wide string, implemented using the C++ std::string and std::wstring classes.
First off, make sure to start your program with setlocale:
#include <clocale>

int main()
{
    std::setlocale(LC_CTYPE, ""); // before any string operations
}
Now for the functions. First off, getting a wide string from a narrow string:
#include <iostream>
#include <string>
#include <vector>
#include <cassert>
#include <cstdlib>
#include <cwchar>
#include <cerrno>
// Dummy overload
std::wstring get_wstring(const std::wstring & s)
{
    return s;
}

// Real worker
std::wstring get_wstring(const std::string & s)
{
    const char * cs = s.c_str();
    const size_t wn = std::mbsrtowcs(NULL, &cs, 0, NULL);

    if (wn == size_t(-1))
    {
        std::cout << "Error in mbsrtowcs(): " << errno << std::endl;
        return L"";
    }

    std::vector<wchar_t> buf(wn + 1);
    const size_t wn_again = std::mbsrtowcs(buf.data(), &cs, wn + 1, NULL);

    if (wn_again == size_t(-1))
    {
        std::cout << "Error in mbsrtowcs(): " << errno << std::endl;
        return L"";
    }

    assert(cs == NULL); // successful conversion
    return std::wstring(buf.data(), wn);
}
And going back, making a narrow string from a wide string. I call the narrow string "locale string", because it is in a platform-dependent encoding depending on the current locale:
// Dummy
std::string get_locale_string(const std::string & s)
{
    return s;
}

// Real worker
std::string get_locale_string(const std::wstring & s)
{
    const wchar_t * cs = s.c_str();
    const size_t wn = std::wcsrtombs(NULL, &cs, 0, NULL);

    if (wn == size_t(-1))
    {
        std::cout << "Error in wcsrtombs(): " << errno << std::endl;
        return "";
    }

    std::vector<char> buf(wn + 1);
    const size_t wn_again = std::wcsrtombs(buf.data(), &cs, wn + 1, NULL);

    if (wn_again == size_t(-1))
    {
        std::cout << "Error in wcsrtombs(): " << errno << std::endl;
        return "";
    }

    assert(cs == NULL); // successful conversion
    return std::string(buf.data(), wn);
}
Some notes:
If you don't have std::vector::data(), you can say &buf[0] instead.
I've found that the r-style conversion functions mbsrtowcs and wcsrtombs don't work properly on Windows. There, you can use mbstowcs and wcstombs instead: mbstowcs(buf.data(), cs, wn + 1); and wcstombs(buf.data(), cs, wn + 1);
In response to your question: if you want to compare two strings, you can convert both of them to wide strings and then compare those. If you are reading a file from disk which has a known encoding, you should use iconv() to convert the file from your known encoding to WCHAR and then compare with the wide string.
Beware, though, that complex Unicode text may have multiple different representations as code point sequences which you may want to consider equal. If that is a possibility, you need to use a higher-level Unicode processing library (such as ICU) and normalize your strings to some common, comparable form.
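If you do reach for ICU, a normalize-then-compare sketch might look like this (API names from ICU's Normalizer2, available since ICU 4.4; treat the details as an assumption to verify against your ICU version):

#include <unicode/normalizer2.h>
#include <unicode/unistr.h>

// Compare two strings as NFC-normalized Unicode, so that e.g. a
// precomposed character equals its decomposed form
bool unicodeEqual(const icu::UnicodeString& a, const icu::UnicodeString& b)
{
    UErrorCode status = U_ZERO_ERROR;
    const icu::Normalizer2* nfc = icu::Normalizer2::getNFCInstance(status);
    if (U_FAILURE(status)) return false;

    icu::UnicodeString na = nfc->normalize(a, status);
    icu::UnicodeString nb = nfc->normalize(b, status);
    return U_SUCCESS(status) && na == nb;
}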
You should convert the char string to a wchar_t string using mbstowcs, and then compare the resulting strings. Notice that mbstowcs works on char* / wchar_t*, so you'll probably need to do something like this:
std::wstring StringToWstring(const std::string & source)
{
    std::wstring target(source.size()+1, L' ');
    std::size_t newLength = std::mbstowcs(&target[0], source.c_str(), target.size());
    if (newLength == (std::size_t)-1)
        return L""; // invalid multibyte sequence
    target.resize(newLength);
    return target;
}
I'm not entirely sure that that usage of &target[0] is entirely standard-conforming; if someone has a good answer to that, please tell me in the comments. Also, there's an implicit assumption that the converted string won't be longer (in number of wchar_ts) than the number of chars of the original string, a logical assumption that I'm still not sure is covered by the standard.
On the other hand, you can ask mbstowcs for the size of the needed buffer by passing NULL as the destination (as shown in another answer here), so either you go this one-pass way for brevity, or go with (better done and better defined) code from Unicode libraries (be it Windows APIs or libraries like iconv).
Still, keep in mind that comparing Unicode strings without using special functions is slippery ground; two equivalent strings may compare as different when compared bitwise.
Long story short: this should work, and I think it's the most you can do with just the standard library, but how Unicode is handled is heavily implementation-dependent, and I wouldn't trust it much. In general, it's better to stick with one encoding inside your application and avoid this kind of conversion unless absolutely necessary, and, if you are working with definite encodings, use APIs that are less implementation-dependent.
Think twice before doing this — you might not want to compare them in the first place. If you are sure you do and you are using Windows, then convert string to wstring with MultiByteToWideChar, then compare with CompareStringEx.
If you are not using Windows, then the analogous functions are mbstowcs and wcscmp. The standard wide character C++ functions are often not portable under Windows; for instance mbstowcs is deprecated.
The cross-platform way to work with Unicode is to use the ICU library.
Take care to use special functions for Unicode string comparison, don't do it manually. Two Unicode strings could have different characters, yet still be the same.
wstring ConvertToUnicode(const string & str)
{
    UINT codePage = CP_ACP;
    DWORD flags = 0;

    int resultSize = MultiByteToWideChar
        ( codePage     // CodePage
        , flags        // dwFlags
        , str.c_str()  // lpMultiByteStr
        , str.length() // cbMultiByte
        , NULL         // lpWideCharStr
        , 0            // cchWideChar
        );

    vector<wchar_t> result(resultSize + 1);

    MultiByteToWideChar
        ( codePage     // CodePage
        , flags        // dwFlags
        , str.c_str()  // lpMultiByteStr
        , str.length() // cbMultiByte
        , &result[0]   // lpWideCharStr
        , resultSize   // cchWideChar
        );

    return &result[0];
}
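And for the comparison step mentioned above, a minimal CompareStringEx sketch (AreEqual is a made-up name; the locale and the 0 flags are choices you may want to adjust, e.g. NORM_IGNORECASE):

#include <windows.h>
#include <string>

// Locale-aware comparison of two wide strings on Windows
bool AreEqual(const std::wstring& a, const std::wstring& b)
{
    int r = CompareStringEx(LOCALE_NAME_USER_DEFAULT, 0,
                            a.c_str(), (int)a.length(),
                            b.c_str(), (int)b.length(),
                            NULL, NULL, 0);
    return r == CSTR_EQUAL;
}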
I am trying to convert a char string to a wchar_t string.
In more detail: I am trying to convert a char[] to a wchar_t[] first, then append " 1" to that string, and then print it.
char src[256] = "c:\\user";
wchar_t temp_src[256];
mbtowc(temp_src, src, 256);
wchar_t path[256];
StringCbPrintf(path, 256, _T("%s 1"), temp_src);
wcout << path;
But it prints just c
Is this the right way to convert from char to wchar_t? I have since come to know of another way, but I'd like to know why the above code behaves the way it does.
mbtowc converts only a single character. Did you mean to use mbstowcs?
Typically you call this function twice; the first to obtain the required buffer size, and the second to actually convert it:
#include <cstdlib> // for mbstowcs

const char* mbs = "c:\\user";
size_t requiredSize = ::mbstowcs(NULL, mbs, 0);

wchar_t* wcs = new wchar_t[requiredSize + 1];
if(::mbstowcs(wcs, mbs, requiredSize + 1) != (size_t)(-1))
{
    // Do what's needed with the wcs string
}
delete[] wcs;
If you would rather use mbstowcs_s (because of deprecation warnings), then do this:
#include <cstdlib> // also for mbstowcs_s

const char* mbs = "c:\\user";
size_t requiredSize = 0;
::mbstowcs_s(&requiredSize, NULL, 0, mbs, 0);

wchar_t* wcs = new wchar_t[requiredSize + 1];
::mbstowcs_s(&requiredSize, wcs, requiredSize + 1, mbs, requiredSize);

if(requiredSize != 0)
{
    // Do what's needed with the wcs string
}
delete[] wcs;
Make sure you take care of locale issues via setlocale() or by using the versions of mbstowcs() (such as mbstowcs_l() or mbstowcs_s_l()) that take a locale argument.
Why are you using C code, and why not write it in a more portable way? For example, what I would do here is use the STL:
std::string src = std::string("C:\\user") +
std::string(" 1");
std::wstring dne = std::wstring(src.begin(), src.end());
wcout << dne;
it's so simple it's easy :D
L"Hello World"
The prefix L in front of the string makes it a wide-char string.
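Tying it back to the original question: convert the narrow string with one of the routines above (here the StringToWstring helper from an earlier answer) and then compare wide to wide:

std::string narrow = "Hello";
std::wstring wide = L"Hello";

if (StringToWstring(narrow) == wide) {
    // strings matched after conversion
}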