Convert String Encoding in C Language

Basically, I need to convert a UTF-8 string to Windows-1256, and I do it using the following code:
#include <windows.h>
#include <stdio.h>
char* convert(char* pszStringToConvert)
{
int len = strlen(pszStringToConvert); // get string length
wchar_t* pwsz = new wchar_t[len+1]; // allocate storage for temporary UNICODE string
MultiByteToWideChar(65001,0, pszStringToConvert, len, pwsz, len);
WideCharToMultiByte(1256,0, pwsz, len, pszStringToConvert, len, NULL, FALSE);
return pszStringToConvert;
}
int main()
{
char* arabic ="السلام";
char* win = convert(arabic);
printf("%s\n%",win);
return 0;
}
My source string is
arabic ="السلام"
But unfortunately my result string becomes
win = Ø§Ù„Ø³Ù„Ø§Ù…
What am I doing wrong here?
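
One detail that stands out in the code above: the same len is used both as the UTF-8 byte count and as the wide-character count, although strlen counts bytes. The six Arabic letters occupy twelve bytes in UTF-8, and the garbled output is what those twelve bytes look like when displayed in a Western single-byte code page. A minimal, portable sketch of the byte/character difference follows; the helper name utf8_code_points is made up for illustration, and it assumes well-formed input:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is well-formed, null-terminated UTF-8.
std::size_t utf8_code_points(const char* text)
{
    assert(text != nullptr);
    std::size_t count = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80) {
            ++count;  // ASCII byte or lead byte: starts a new code point
        }
    }
    return count;
}
```

For the string in the question this returns 6 while strlen returns 12. On Windows, the required buffer sizes can instead be queried by calling MultiByteToWideChar and WideCharToMultiByte with an output size of 0.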

Related

Converting SQLWCHAR * to char*

I am working with Unicode and wide characters for the first time, and
I am trying to convert WCHAR* to char*.
(WCHAR is typedefed to SQLWCHAR, and eventually to unsigned short.)
I need to support all platforms (Windows, Mac, Linux).
What I have as input is a WCHAR* and its length.
I figured that if I convert the input to a wstring, I can then strcpy/strdup it to the output variable.
But it looks like I am not constructing my wstring correctly, because wprintf doesn't print its value.
Any hints as to what I am missing?
#include <iostream>
#include <codecvt>
#include <locale>
SQLRETURN SQL_API SQLExecDirectW(SQLHSTMT phstmt, SQLWCHAR* pwCmd, SQLINTEGER len)
{
char *output;
wchar_to_utf8(pwCmd, len, output);
// further processing
SQLRETURN rc;
return rc;
}
int wchar_to_utf8(WCHAR *wStr, int size, char *output)
{
std::wstring ws((const wchar_t*)wStr, size - 1);
wprintf(L"wstring:%s\n", ws);
std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
std::string t = conv.to_bytes(ws);
/*allocate output or use strdup*/
strncpy(output, t.c_str(), size); // todo:take care of the last null char
return strlen(output);
}

"%s" in a wprintf format string requires a wchar_t* argument, so
wprintf(L"wstring:%s\n", ws);
should be
wprintf(L"wstring:%ls\n", ws.c_str()); // "%ls" is the standard conversion for wchar_t*; Microsoft's wprintf also accepts "%s" here
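
Since the question asks for Windows, Mac and Linux: treating "%s" as a wchar_t* in wprintf is Microsoft-specific behaviour, while standard C uses "%ls" for wide strings. A small sketch of the portable form, using std::swprintf so the result can be inspected (the helper name format_label is hypothetical):

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: formats a wide string with the portable "%ls" conversion.
std::wstring format_label(const std::wstring& ws)
{
    // Room for the 8-character prefix, the text, and the terminating null.
    std::vector<wchar_t> buf(ws.size() + 9);
    const int written = std::swprintf(buf.data(), buf.size(), L"wstring:%ls", ws.c_str());
    assert(written >= 0);  // would fail only if the buffer were too small
    return std::wstring(buf.data(), static_cast<std::size_t>(written));
}
```

Calling format_label(L"abc") yields L"wstring:abc"; the same format string can be passed to wprintf directly.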

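To convert wchar_t* to char* (requires C++17 for std::filesystem)
const std::wstring wStr = /*your WCHAR* */;
const std::string str = std::filesystem::path(wStr).string(); // keep the std::string alive; c_str() on the temporary would dangle
const char* cstr = str.c_str();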
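
Because SQLWCHAR is described above as an unsigned short, casting it to wchar_t* only lines up where wchar_t is also 16 bits (as on Windows); on Linux and macOS wchar_t is typically 32 bits. One way around that is to convert the UTF-16 code units by hand. This is a sketch under the assumption that the input really is UTF-16; the function name utf16_to_utf8 is invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts `len` UTF-16 code units to a UTF-8 std::string.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const unsigned short* src, std::size_t len)
{
    assert(src != nullptr || len == 0);
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        unsigned long cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                ++i;                                   // consume the low surrogate
            } else {
                cp = 0xFFFD;                           // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                               // stray low surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The returned std::string owns its storage, so the caller can copy it into a char* buffer of the right size (result.size() + 1) instead of guessing the size up front.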

How do I convert a string to a wstring using the value of the string?

I'm new to C++ and I have this issue. I have a string called DATA_DIR that I need to format into a wstring.
string str = DATA_DIR;
std::wstring temp(L"%s",str);
Visual Studio tells me that there is no instance of constructor that matches with the argument list. Clearly, I'm doing something wrong.
I found this example online
std::wstring someText( L"hello world!" );
which apparently works (no compile errors). My question is, how do I get the string value stored in DATA_DIR into the wstring constructor as opposed to something arbitrary like "hello world"?

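Here is an implementation using wcstombs (Updated):
#include <cstdlib>
#include <string>
#include <vector>
std::string wstring_from_bytes(std::wstring const& wstr)
{
    // Ask wcstombs for the number of bytes needed (excluding the terminator);
    // sizeof(wstr.c_str()) would only give the size of a pointer.
    std::size_t size = std::wcstombs(nullptr, wstr.c_str(), 0);
    if (size == static_cast<std::size_t>(-1))
        return std::string(); // a character has no representation in the current locale
    std::vector<char> buf(size + 1);
    std::wcstombs(buf.data(), wstr.c_str(), buf.size());
    return std::string(buf.data(), size);
}
int main()
{
    std::wstring wstr = L"abcd";
    std::string str = wstring_from_bytes(wstr);
}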
Here is a demo.

This is in reference to the most up-voted answer but I don't have enough "reputation" to just comment directly on the answer.
The name of the function in the solution "wstring_from_bytes" implies it is doing what the original poster wants, which is to get a wstring given a string, but the function is actually doing the opposite of what the original poster asked for and would more accurately be named "bytes_from_wstring".
To convert from string to wstring, the wstring_from_bytes function should use mbstowcs not wcstombs
#define _CRT_SECURE_NO_WARNINGS
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>
std::wstring wstring_from_bytes(std::string const& str)
{
    std::wstring answer;
    // Call the conversion function without an output buffer to get the
    // required length (not counting the null terminator)
    size_t requiredSize = mbstowcs(NULL, str.c_str(), 0);
    if (requiredSize == (size_t) (-1))
    {
        printf("Couldn't convert string\n");
        return answer;
    }
    // Allocate the output buffer (add one to leave room for the null terminator);
    // a std::vector frees the memory automatically
    std::vector<wchar_t> buffer(requiredSize + 1);
    // Call the conversion function with the output buffer
    size_t size = mbstowcs(buffer.data(), str.c_str(), buffer.size());
    if (size == (size_t) (-1))
    {
        printf("Couldn't convert string\n");
    }
    else
    {
        answer.assign(buffer.data(), size);
    }
    return answer;
}
int main()
{
std::string str = "abcd";
std::wstring wstr = wstring_from_bytes(str);
}
Regardless, this is much more easily done in newer versions of the standard library (C++ 11 and newer)
#include <locale>
#include <codecvt>
#include <string>
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::wstring wide = converter.from_bytes(narrow_utf8_source_string);

printf-style format specifiers are not part of the C++ library and cannot be used to construct a string.
If the string may only contain single-byte characters, then the range constructor is sufficient.
std::string narrower( "hello" );
std::wstring wider( narrower.begin(), narrower.end() );
The problem is that we usually use wstring when wide characters are applicable (hence the w), and those are represented in std::string by multibyte sequences. Using the range constructor on such a string turns each byte of a multibyte sequence into its own, incorrect wide character.
Moreover, converting a multibyte sequence requires knowing its encoding. This information is not encapsulated by std::string or std::wstring. C++11 allows you to specify an encoding and translate using std::wstring_convert, but I'm not sure how widely supported it is yet. See 0x....'s excellent answer.
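
To make the single-byte restriction explicit, the range constructor can be guarded by a check. This is only a sketch: it takes "single-byte" to mean 7-bit ASCII, which is an assumption, and the names is_ascii and widen_ascii are invented here:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when every byte is 7-bit ASCII.
bool is_ascii(const std::string& s)
{
    for (unsigned char byte : s) {
        if (byte >= 0x80) {
            return false;  // part of a multibyte sequence or an extended character
        }
    }
    return true;
}

// Hypothetical helper: widens with the range constructor, which is only
// correct when each byte is a complete character.
std::wstring widen_ascii(const std::string& s)
{
    assert(is_ascii(s));
    return std::wstring(s.begin(), s.end());
}
```

A path such as "data/dir" passes the check and widens correctly; a UTF-8 string containing "á" (bytes 0xC3 0xA1) fails it and needs a real decoding step instead.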

The converter mentioned for C++11 and above was deprecated in C++17; the Visual C++ diagnostic suggests using the MultiByteToWideChar function instead.
The compiler diagnostic (C4996) also mentions defining _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING to suppress it.

wstring temp = L"";
for (auto c : DATA_DIR)
temp.push_back(c);

I found this function. Could not find any predefined method to do this.
std::wstring s2ws(const std::string& s)
{
int len;
int slength = (int)s.length() + 1;
len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0);
wchar_t* buf = new wchar_t[len];
MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf, len);
std::wstring r(buf);
delete[] buf;
return r;
}
std::wstring stemp = s2ws(myString);

How to convert from UTF-8 to ANSI using standard c++

I have some strings read from the database, stored in a char* and in UTF-8 format (you know, "á" is encoded as 0xC3 0xA1). But, in order to write them to a file, I first need to convert them to ANSI (can't make the file in UTF-8 format... it's only read as ANSI), so that my "á" doesn't become "Ã¡". Yes, I know some data will be lost (chinese characters, and in general anything not in the ANSI code page) but that's exactly what I need.
But the thing is, I need the code to compile on various platforms, so it has to be standard C++ (i.e. no Winapi, only stdlib, stl, crt or any custom library with available source).
Does anyone have any suggestions?

A few days ago, somebody answered that if I had a C++11 compiler, I could try this:
#include <string>
#include <vector>
#include <codecvt>
#include <locale>
using namespace std;
string utf8_to_string(const char *utf8str, const locale& loc)
{
    // UTF-8 to wstring
    wstring_convert<codecvt_utf8<wchar_t>> wconv;
    wstring wstr = wconv.from_bytes(utf8str);
    // wstring to string
    vector<char> buf(wstr.size());
    use_facet<ctype<wchar_t>>(loc).narrow(wstr.data(), wstr.data() + wstr.size(), '?', buf.data());
    return string(buf.data(), buf.size());
}
int main(int argc, char* argv[])
{
    string ansi;
    // UTF-8 encoding of "á" (a braced list of 0xc3, 0xa1 is a narrowing error where char is signed)
    const char utf8txt[] = "\xc3\xa1";
    // I guess you want to use Windows-1252 encoding...
    ansi = utf8_to_string(utf8txt, locale(".1252")); // ".1252" is a Windows-style locale name
    // Now do something with the string
    return 0;
}
Don't know what happened to the response, apparently someone deleted it. But, turns out that it is the perfect solution. To whoever posted, thanks a lot, and you deserve the AC and upvote!!
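
For readers who cannot rely on a ".1252" locale being available (that name is Windows-specific), a hand-written conversion is another option. The sketch below targets ISO-8859-1, which coincides with Windows-1252 only for U+00A0 to U+00FF, so characters such as the euro sign come out as the replacement character. The function name utf8_to_latin1 is made up, and overlong encodings are not rejected:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts null-terminated UTF-8 to a single-byte string.
// Code points outside ASCII and U+00A0..U+00FF become `replacement`.
std::string utf8_to_latin1(const char* utf8, char replacement = '?')
{
    assert(utf8 != nullptr);
    std::string out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8);
    while (*p != 0) {
        const unsigned char lead = *p++;
        unsigned long cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out += replacement; continue; }  // stray continuation or invalid lead byte
        bool ok = true;
        for (; extra > 0; --extra) {
            if ((*p & 0xC0) != 0x80) { ok = false; break; }  // also stops at the terminator
            cp = (cp << 6) | (*p++ & 0x3Fu);
        }
        const bool representable = cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF);
        out += (ok && representable) ? static_cast<char>(cp) : replacement;
    }
    return out;
}
```

With this, the two bytes 0xC3 0xA1 become the single byte 0xE1, and a Chinese character becomes '?', which matches the lossy behaviour the question asks for.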

If you mean ASCII, just discard any byte that has bit 7 set, this will remove all multibyte sequences. Note that you could create more advanced algorithms, like removing the accent from the "á", but that would require much more work.
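
A minimal sketch of that idea, assuming a null-terminated input as in the question (the name strip_non_ascii is hypothetical):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: drops every byte that has bit 7 set, which removes
// all UTF-8 multibyte sequences and keeps plain ASCII untouched.
std::string strip_non_ascii(const char* utf8)
{
    assert(utf8 != nullptr);
    std::string out;
    for (const char* p = utf8; *p != '\0'; ++p) {
        if ((static_cast<unsigned char>(*p) & 0x80) == 0) {
            out += *p;
        }
    }
    return out;
}
```

Note that accented letters vanish entirely rather than being replaced, so "café!" in UTF-8 becomes "caf!".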

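This should work, provided the standard library ships a std::ctype<char32_t> facet (the standard only requires the char and wchar_t specializations):
#include <string>
#include <locale>
#include <codecvt>
using namespace std::string_literals;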
std::string to_utf8(const std::string& str, const std::locale& loc = std::locale{}) {
using wcvt = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;
std::u32string wstr(str.size(), U'\0');
std::use_facet<std::ctype<char32_t>>(loc).widen(str.data(), str.data() + str.size(), &wstr[0]);
return wcvt{}.to_bytes(wstr.data(),wstr.data() + wstr.size());
}
std::string from_utf8(const std::string& str, const std::locale& loc = std::locale{}) {
using wcvt = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;
auto wstr = wcvt{}.from_bytes(str);
std::string result(wstr.size(), '0');
std::use_facet<std::ctype<char32_t>>(loc).narrow(wstr.data(), wstr.data() + wstr.size(), '?', &result[0]);
return result;
}
int main() {
auto s0 = u8"Blöde C++ Scheiße äöü!!1Elf"s;
auto s1 = from_utf8(s0);
auto s2 = to_utf8(s1);
return 0;
}
For VC++:
#include <string>
#include <codecvt>
using namespace std::string_literals;
std::string to_utf8(const std::string& str, const std::locale& loc = std::locale{}) {
using wcvt = std::wstring_convert<std::codecvt_utf8<int32_t>, int32_t>;
std::u32string wstr(str.size(), U'\0');
std::use_facet<std::ctype<char32_t>>(loc).widen(str.data(), str.data() + str.size(), &wstr[0]);
return wcvt{}.to_bytes(
reinterpret_cast<const int32_t*>(wstr.data()),
reinterpret_cast<const int32_t*>(wstr.data() + wstr.size())
);
}
std::string from_utf8(const std::string& str, const std::locale& loc = std::locale{}) {
using wcvt = std::wstring_convert<std::codecvt_utf8<int32_t>, int32_t>;
auto wstr = wcvt{}.from_bytes(str);
std::string result(wstr.size(), '0');
std::use_facet<std::ctype<char32_t>>(loc).narrow(
reinterpret_cast<const char32_t*>(wstr.data()),
reinterpret_cast<const char32_t*>(wstr.data() + wstr.size()),
'?', &result[0]);
return result;
}
int main() {
auto s0 = u8"Blöde C++ Scheiße äöü!!1Elf"s;
auto s1 = from_utf8(s0);
auto s2 = to_utf8(s1);
return 0;
}

#include <stdio.h>
#include <string>
#include <codecvt>
#include <locale>
#include <vector>
using namespace std;
std::string utf8_to_string(const char *utf8str, const locale& loc){
// UTF-8 to wstring
wstring_convert<codecvt_utf8<wchar_t>> wconv;
wstring wstr = wconv.from_bytes(utf8str);
// wstring to string
vector<char> buf(wstr.size());
use_facet<ctype<wchar_t>>(loc).narrow(wstr.data(), wstr.data() + wstr.size(), '?', buf.data());
return string(buf.data(), buf.size());
}
int main(int argc, char* argv[]){
std::string ansi;
const char utf8txt[] = "\xc3\xa1"; // UTF-8 encoding of "á"
// I guess you want to use Windows-1252 encoding...
ansi = utf8_to_string(utf8txt, locale(".1252"));
// Now do something with the string
return 0;
}

Convert WCHAR[260] to std::string

I have gotten a WCHAR[MAX_PATH] from (PROCESSENTRY32) pe32.szExeFile on Windows. The following do not work:
std::string s;
s = pe32.szExeFile; // compile error. cast (const char*) doesnt work either
and
std::string s;
char DefChar = ' ';
WideCharToMultiByte(CP_ACP,0,pe32.szExeFile,-1, ch,260,&DefChar, NULL);
s = pe32.szExeFile;

For your first example you can just do:
std::wstring s(pe32.szExeFile);
and for the second:
char ch[260];
char DefChar = ' ';
WideCharToMultiByte(CP_ACP,0,pe32.szExeFile,-1, ch,260,&DefChar, NULL);
std::string s(ch);
as std::wstring has a constructor that takes a const wchar_t*, and std::string has one that takes a const char*.

Your call to WideCharToMultiByte looks correct, provided ch is a
sufficiently large buffer. After that, however, you want to assign the
buffer (ch) to the string (or use it to construct a string), not
pe32.szExeFile.

There are convenient conversion classes from ATL; you may want to use some of them, e.g.:
std::string s( CW2A(pe32.szExeFile) );
Note, however, that a conversion from Unicode UTF-16 to ANSI can be lossy. If you want a non-lossy conversion, you could convert from UTF-16 to UTF-8, and store UTF-8 inside std::string.
If you don't want to use ATL, there are some convenient freely available C++ wrappers around raw Win32 WideCharToMultiByte to convert from UTF-16 to UTF-8 using STL strings.

#ifndef STRINGCAST_H
#define STRINGCAST_H
#include <windows.h>
#include <vector>
#include <string>
#include <cstring>
#include <cwchar>
#include <cassert>
template<typename Td>
Td string_cast(const wchar_t* pSource, unsigned int codePage = CP_ACP);
template<>
inline std::string string_cast( const wchar_t* pSource, unsigned int codePage )
{
    assert(pSource != 0);
    const int sourceLength = static_cast<int>(std::wcslen(pSource));
    if(sourceLength > 0)
    {
        // First call: query the number of bytes needed for the converted string
        int length = ::WideCharToMultiByte(codePage, 0, pSource, sourceLength, NULL, 0, NULL, NULL);
        if(length == 0)
            return std::string();
        std::vector<char> buffer( length );
        // Second call: perform the conversion into the buffer
        ::WideCharToMultiByte(codePage, 0, pSource, sourceLength, &buffer[0], length, NULL, NULL);
        return std::string(buffer.begin(), buffer.end());
    }
    else
        return std::string();
}
#endif // STRINGCAST_H
and use this template as follows
PWSTR CurWorkDir;
std::string CurWorkLogFile;
CurWorkDir = new WCHAR[length];
CurWorkLogFile = string_cast<std::string>(CurWorkDir);
....
delete [] CurWorkDir;

how to convert char array to wchar_t array?

char cmd[40];
driver = FuncGetDrive(driver);
sprintf_s(cmd, "%c:\\test.exe", driver);
I cannot use cmd in
sei.lpFile = cmd;
so how do I convert a char array to a wchar_t array?

Just use this:
static wchar_t* charToWChar(const char* text)
{
const size_t size = strlen(text) + 1;
wchar_t* wText = new wchar_t[size];
mbstowcs(wText, text, size);
return wText;
}
Don't forget to call delete [] wCharPtr on the return result when you're done, otherwise this is a memory leak waiting to happen if you keep calling this without clean-up. Or use a smart pointer like the below commenter suggests.
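Or use standard strings, as follows:
#include <cstdlib>
#include <cstring>
#include <string>
static std::wstring charToWString(const char* text)
{
    const size_t size = std::strlen(text);
    std::wstring wstr;
    if (size > 0) {
        wstr.resize(size);
        // mbstowcs returns the number of wide characters written, which is
        // smaller than size when the input contains multibyte characters
        const size_t converted = std::mbstowcs(&wstr[0], text, size);
        wstr.resize(converted == static_cast<size_t>(-1) ? 0 : converted);
    }
    return wstr;
}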

From MSDN:
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <string>
using namespace std;
int main()
{
const char *orig = "Hello, World!";
cout << orig << " (char *)" << endl;
// Convert to a wchar_t*
size_t origsize = strlen(orig) + 1;
const size_t newsize = 100;
size_t convertedChars = 0;
wchar_t wcstring[newsize];
mbstowcs_s(&convertedChars, wcstring, origsize, orig, _TRUNCATE);
wcscat_s(wcstring, L" (wchar_t *)");
wcout << wcstring << endl;
}

From your example, using swprintf_s would work:
wchar_t wcmd[40];
driver = FuncGetDrive(driver);
swprintf_s(wcmd, L"%C:\\test.exe", driver);
Note that the C in %C has to be uppercase since driver is a normal char and not a wchar_t (in Microsoft's wide-character printf functions, %C and %S denote narrow arguments).
Passing your string to swprintf_s(wcmd, L"%S", cmd) should also work.
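
Outside Visual C++, swprintf_s is usually not available. The standard std::swprintf takes the buffer size explicitly, and in standard C a plain %c in a wide format string converts a narrow character argument to its wide form. A sketch under those assumptions (the helper name build_command is invented; the path comes from the question):

```cpp
#include <cassert>
#include <cctype>
#include <cwchar>
#include <string>

// Hypothetical helper: builds "<drive>:\test.exe" as a wide string.
std::wstring build_command(char drive)
{
    assert(std::isalpha(static_cast<unsigned char>(drive)));  // expects a drive letter
    wchar_t wcmd[40];
    const int written = std::swprintf(wcmd, 40, L"%c:\\test.exe", drive);
    return written < 0 ? std::wstring()
                       : std::wstring(wcmd, static_cast<std::size_t>(written));
}
```

The c_str() of the returned std::wstring can then be assigned to a const wchar_t* field such as sei.lpFile, as long as the string outlives the call that uses it.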
