Why is UChar* not working with this ICU conversion?


When converting from UTF-8 to ISO-8859-6 this code didn't work:
UnicodeString ustr = UnicodeString::fromUTF8(StringPiece(input));
const UChar* source = ustr.getBuffer();
char target[1000];
UErrorCode status = U_ZERO_ERROR;
UConverter *conv;
int32_t len;
// set up the converter
conv = ucnv_open("iso-8859-6", &status);
assert(U_SUCCESS(status));
// convert
len = ucnv_fromUChars(conv, target, 100, source, -1, &status);
assert(U_SUCCESS(status));
// close the converter
ucnv_close(conv);
string s(target);
return s;
However, when I replace the UChar* with a hard-coded UChar[], it works well.

It looks like you're taking the difficult approach. How about this:
static char const* const cp = "iso-8859-6";
UnicodeString ustr = UnicodeString::fromUTF8(StringPiece(input));
std::vector<char> buf(ustr.length() + 1);
std::vector<char>::size_type len = ustr.extract(0, ustr.length(), &buf[0], buf.size(), cp);
if (len >= buf.size())
{
buf.resize(len + 1);
len = ustr.extract(0, ustr.length(), &buf[0], buf.size(), cp);
}
std::string ret;
if (len)
ret.assign(buf.begin(), buf.begin() + len);
return ret;
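
The answer above avoids the problem without naming a cause. One likely explanation, offered as an assumption rather than a diagnosis: ICU documents that the buffer returned by UnicodeString::getBuffer() is not guaranteed to be NUL-terminated, while a source length of -1 tells ucnv_fromUChars() to scan for a terminator. A hard-coded UChar[] literal always ends in a NUL, which would explain why that variant worked. The sketch below uses no ICU at all; effective_length is a hypothetical stand-in that only mimics the -1 convention, to show how scanning for a terminator can run past the logical end of a string.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an API that treats length -1 as "scan for a NUL".
// It mirrors the calling convention only; it is not ICU code.
std::size_t effective_length(const char16_t* src, int length) {
    if (length >= 0) {
        return static_cast<std::size_t>(length);  // caller supplied the real length
    }
    std::size_t n = 0;
    while (src[n] != 0) {  // safe only if a terminator really follows the text
        ++n;
    }
    return n;
}

// Backing store whose logical content is "abc"; the rest is leftover data.
// effective_length(kStorage, -1) scans past the logical end and reports 5,
// while effective_length(kStorage, kLogicalLength) reports 3.
const char16_t kStorage[] = {u'a', u'b', u'c', u'X', u'Y', 0};
const int kLogicalLength = 3;
```

In the question's code, the corresponding change would be to pass ustr.length() instead of -1 as the source length.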

Related

On Windows “retsize <= sizeInWords” in mbstowcs

I am getting the error “retsize <= sizeInWords” in mbstowcs. Can someone please tell me where I am making a mistake?
const char *pChar = "[{\]}] ";
int len = strlen(pChar);
wchar_t *str = (wchar_t*)calloc(len+1, sizeof(wchar_t)); // L"";
size_t cSize = len + 1;
mbstowcs_s(&cSize, str, cSize+1, pChar, cSize);

Convert between wstring and string, got different results with the "same" approach

I use a function s2ws() (found on SO; if you see something wrong with it, please let me know) to convert from string to wstring, then I use tinyxml2 to read values from XML. Some of the tinyxml2 interfaces take char* as input and return char* as well.
The reason for converting from string to wstring is that the project uses wchar_t strings throughout.
/*
string converts to wstring
*/
std::wstring s2ws(const std::string& src)
{
std::wstring res = L"";
size_t const wcs_len = mbstowcs(NULL, src.c_str(), 0);
std::vector<wchar_t> buffer(wcs_len + 1);
mbstowcs(&buffer[0], src.c_str(), src.size());
res.assign(buffer.begin(), buffer.end() - 1);
return res;
}
/*
wstring converts to string
*/
std::string ws2s(const std::wstring & src)
{
setlocale(LC_CTYPE, "");
std::string res = "";
size_t const mbs_len = wcstombs(NULL, src.c_str(), 0);
std::vector<char> buffer(mbs_len + 1);
wcstombs(&buffer[0], src.c_str(), buffer.size());
res.assign(buffer.begin(), buffer.end() - 1);
return res;
}
ClassES->Attribute() returns char*, and the function s2ws converts a string to a wstring. The two approaches below produce different results in the map m_UpdateClassification. The second approach is the code between #if 0 and #endif. I think the two approaches should make no difference.
The second approach yields an empty string after conversion and I cannot figure out why. If you have any clue, please let me know.
typedef std::map<std::wstring, std::wstring> CMapString;
CMapString m_UpdateClassification;
const wchar_t * First = NULL;
const wchar_t * Second = NULL;
const char *name = ClassES->Attribute( "name" );
const char *value = ClassES->Attribute( "value" );
std::wstring wname = s2ws(name);
std::wcout<< wname << std::endl;
First = wname.c_str();
std::wstring wvalue = s2ws(value);
std::wcout<< wvalue << std::endl;
Second = wvalue.c_str();
#if 0
First = s2ws(ClassES->Attribute( "name" )).c_str();
if( !First ) { m_ProdectFamily.clear(); return FALSE; }
Second = s2ws(ClassES->Attribute( "value" )).c_str();
if( !Second ) { m_ProdectFamily.clear(); return FALSE; }
#endif
m_UpdateClassification[Second] = First;

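I think I found the reason: I was storing the result of c_str() on a temporary wstring in a wchar_t*, and that pointer dangles once the temporary is destroyed at the end of the statement. After changing the code to hold wstring objects, everything runs well.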
std::wstring First = L"";
std::wstring Second = L"";
First = s2ws(ClassES->Attribute("name"));
if( First.empty() ) { m_ProdectFamily.clear(); return FALSE; }
Second = s2ws(ClassES->Attribute("value"));
if( Second.empty() ) { m_ProdectFamily.clear(); return FALSE; }
Another question: should I check the results of s2ws (mbstowcs) and ws2s (wcstombs)?
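
On the last question: checking is worthwhile, because both mbstowcs and wcstombs return (size_t)-1 when they hit a sequence that is invalid in the current locale. In s2ws above, that value makes wcs_len + 1 wrap around to 0, so the vector is empty and &buffer[0] is invalid. A minimal sketch of a checked variant follows; the names try_s2ws and try_ws2s are hypothetical, and the result still depends on the locale selected with setlocale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical checked variant of s2ws: reports failure instead of
// returning a half-converted string.
bool try_s2ws(const std::string& src, std::wstring& out) {
    // First pass: required length in wide characters, terminator excluded.
    const std::size_t wcs_len = std::mbstowcs(nullptr, src.c_str(), 0);
    if (wcs_len == static_cast<std::size_t>(-1)) {
        return false;  // invalid multibyte sequence for the current locale
    }
    std::vector<wchar_t> buffer(wcs_len + 1);
    // Second pass: the limit is the destination capacity, not src.size().
    const std::size_t written =
        std::mbstowcs(buffer.data(), src.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.assign(buffer.data(), written);
    return true;
}

// Hypothetical checked variant of ws2s, following the same two-pass pattern.
bool try_ws2s(const std::wstring& src, std::string& out) {
    const std::size_t mbs_len = std::wcstombs(nullptr, src.c_str(), 0);
    if (mbs_len == static_cast<std::size_t>(-1)) {
        return false;  // a wide character has no representation in the locale
    }
    std::vector<char> buffer(mbs_len + 1);
    const std::size_t written =
        std::wcstombs(buffer.data(), src.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.assign(buffer.data(), written);
    return true;
}
```

Because both helpers fill a std::wstring or std::string owned by the caller, no pointer into a temporary is ever kept.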

serialization of float vector in a struct

I have the following code:
struct MsgDetectedTarget
{
int target_id;
float bbox[4]; // need to change
};
in a serialization function:
void SerializeToArray(std::vector<char>& buffer, int& dst_len, void* pMsg, int len){
buffer.resize(HEADER_LENGTH + len);
// encode message header
char header[HEADER_LENGTH + 1] = "";
std::sprintf(header, "%8d", len);
std::memcpy(&buffer[0], header, HEADER_LENGTH);
// encode message body
std::memcpy(&buffer[0]+HEADER_LENGTH, reinterpret_cast<char*>(pMsg), len);
dst_len = HEADER_LENGTH + len;
}
If the data bbox in MsgDetectedTarget has a fixed size, serialization is easy:
MsgDetectedTarget msg;
msg.target_id = 1;
msg.bbox[0] = 0;
msg.bbox[1] = 0;
msg.bbox[2] = 500;
msg.bbox[3] = 500;
std::vector<char> msgdata;
int destlen;
SerializeToArray(msgdata, destlen, &msg, sizeof(msg));
Problem:
I want to change bbox in MsgDetectedTarget to be a vector of float. How can I perform the corresponding serialization and deserialization?
Thanks very much.

If you cannot use Boost serialization, you can do what @Jean-FrancoisFabre suggested. Something like:
void SerializeToArray(std::vector<char>& buffer, int& dst_len, MsgDetectedTarget* msg) {
size_t numFloats = msg->bbox.size();
size_t len = sizeof(int) + sizeof(size_t) + sizeof(float)*numFloats;
buffer.resize(HEADER_LENGTH + len);
// encode message header
char header[HEADER_LENGTH + 1] = "";
std::sprintf(header, "%8d", static_cast<int>(len));
std::memcpy(&buffer[0], header, HEADER_LENGTH);
// encode target_id
std::memcpy(&buffer[0] + HEADER_LENGTH, &msg->target_id, sizeof(int));
// encode numFloats
std::memcpy(&buffer[0] + HEADER_LENGTH + sizeof(int), &numFloats, sizeof(size_t));
// encode the vector of float (skipped when empty, since bbox[0] would be invalid)
if (numFloats > 0)
std::memcpy(&buffer[0] + HEADER_LENGTH + sizeof(int) + sizeof(size_t),
&msg->bbox[0], sizeof(float)*numFloats);
dst_len = static_cast<int>(HEADER_LENGTH + len);
}
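
The question also asks about deserialization, which the answer above does not show. The following is a minimal round-trip sketch under stated assumptions: HEADER_LENGTH is taken to be 8 (matching the "%8d" format), the element count is stored as a fixed-width uint32_t rather than size_t so the layout does not depend on the platform's size_t width, and both ends share the same endianness and float representation. DeserializeFromArray is a hypothetical name, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Assumption: the header is 8 ASCII characters, as implied by "%8d".
constexpr std::size_t HEADER_LENGTH = 8;

struct MsgDetectedTarget {
    int target_id = 0;
    std::vector<float> bbox;
};

// Layout: [8-char length][target_id][uint32 count][count floats]
void SerializeToArray(std::vector<char>& buffer, const MsgDetectedTarget& msg) {
    const std::uint32_t count = static_cast<std::uint32_t>(msg.bbox.size());
    const std::size_t len = sizeof(int) + sizeof(count) + sizeof(float) * count;
    buffer.resize(HEADER_LENGTH + len);

    char header[HEADER_LENGTH + 1] = "";
    std::snprintf(header, sizeof(header), "%8u", static_cast<unsigned>(len));
    std::memcpy(buffer.data(), header, HEADER_LENGTH);

    char* p = buffer.data() + HEADER_LENGTH;
    std::memcpy(p, &msg.target_id, sizeof(int));
    p += sizeof(int);
    std::memcpy(p, &count, sizeof(count));
    p += sizeof(count);
    if (count > 0) {
        std::memcpy(p, msg.bbox.data(), sizeof(float) * count);
    }
}

// Returns false if the buffer is truncated or its lengths are inconsistent.
bool DeserializeFromArray(const std::vector<char>& buffer, MsgDetectedTarget& msg) {
    const std::size_t fixed = sizeof(int) + sizeof(std::uint32_t);
    if (buffer.size() < HEADER_LENGTH + fixed) {
        return false;
    }
    char header[HEADER_LENGTH + 1] = "";
    std::memcpy(header, buffer.data(), HEADER_LENGTH);
    const std::size_t len = std::strtoul(header, nullptr, 10);
    if (buffer.size() != HEADER_LENGTH + len) {
        return false;
    }
    const char* p = buffer.data() + HEADER_LENGTH;
    std::memcpy(&msg.target_id, p, sizeof(int));
    p += sizeof(int);
    std::uint32_t count = 0;
    std::memcpy(&count, p, sizeof(count));
    p += sizeof(count);
    if (len != fixed + sizeof(float) * static_cast<std::size_t>(count)) {
        return false;  // count does not match the declared body length
    }
    msg.bbox.resize(count);
    if (count > 0) {
        std::memcpy(msg.bbox.data(), p, sizeof(float) * count);
    }
    return true;
}
```

Serializing a message and passing the resulting buffer to DeserializeFromArray should reproduce target_id and bbox, while a truncated buffer is rejected.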

wcscpy_s not affecting wchar_t*

I'm trying to load some strings from a database into a struct, but I keep running into an odd issue. Using my struct datum,
struct datum {
wchar_t* name;
wchar_t* lore;
};
I tried the following code snippet
datum thisDatum;
size_t len = 0;
wchar_t wBuffer[2048];
mbstowcs_s(&len, wBuffer, (const char*)sqlite3_column_text(pStmt, 1), 2048);
if (len) {
thisDatum.name = new wchar_t[len + 1];
wcscpy_s(thisDatum.name, len + 1, wBuffer);
} else thisDatum.name = 0;
mbstowcs_s(&len, wBuffer, (const char*)sqlite3_column_text(pStmt, 2), 2048);
if (len) {
thisDatum.lore = new wchar_t[len + 1];
wcscpy_s(thisDatum.lore, len + 1, wBuffer);
} else thisDatum.name = 0;
However, while thisDatum.name copies correctly, thisDatum.lore is always garbage, except on two occasions. If the project is built as Debug, everything is fine, but that just isn't an option. I also discovered that rewriting the struct datum as
struct datum {
wchar_t* lore;
wchar_t* name;
};
completely fixes the issue for thisDatum.lore, but gives me garbage for thisDatum.name.

Try something more like this:
struct datum {
wchar_t* name;
wchar_t* lore;
};
wchar_t* widen(const char *str)
{
wchar_t *wBuffer = NULL;
size_t len = strlen(str) + 1;
size_t wlen = 0;
mbstowcs_s(&wlen, NULL, 0, str, len);
if (wlen)
{
wBuffer = new wchar_t[wlen];
mbstowcs_s(NULL, wBuffer, wlen, str, len);
}
return wBuffer;
}
datum thisDatum;
thisDatum.name = widen((const char*)sqlite3_column_text(pStmt, 1));
thisDatum.lore = widen((const char*)sqlite3_column_text(pStmt, 2));
...
delete[] thisDatum.name;
delete[] thisDatum.lore;
That being said, I would use std::wstring instead:
struct datum {
std::wstring name;
std::wstring lore;
};
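#include <locale>
#include <codecvt>
std::wstring widen(const char *str)
{
// std::codecvt has a protected destructor, so std::wstring_convert cannot use it directly.
// sqlite3_column_text() returns UTF-8, so convert from UTF-8 to UTF-16 (wchar_t on Windows).
std::wstring_convert< std::codecvt_utf8_utf16<wchar_t> > conv;
return conv.from_bytes(str);
}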
datum thisDatum;
thisDatum.name = widen((const char*)sqlite3_column_text(pStmt, 1));
thisDatum.lore = widen((const char*)sqlite3_column_text(pStmt, 2));

Unable to preserve newlines in RichEdit

I'm having problems preserving newlines from a RichEdit control inside strings.
What I'm doing is:
Get text from RichEdit control
Split everything delimited by a space
Add some RTF formatting
"Fuse" words back together
Send text to control
I'm not sure what part causes this so here's the most relevant bits:
int RichEdit::GetTextLength() const
{
GETTEXTLENGTHEX len;
len.codepage = 1200;
len.flags = GTL_NUMBYTES;
return (int)SendMessage(this->handle, EM_GETTEXTLENGTHEX, (WPARAM)&len, 0) + 1;
}
tstring RichEdit::GetText() const
{
auto len = this->GetTextLength();
GETTEXTEX str;
TCHAR* tmp = new TCHAR[len];
str.cb = len;
str.flags = GT_USECRLF;
str.codepage = 1200;
str.lpDefaultChar = NULL;
str.lpUsedDefChar = NULL;
(void)SendMessage(this->handle, EM_GETTEXTEX, (WPARAM)&str, (LPARAM)tmp);
tstring ret(tmp);
delete[] tmp;
return ret;
}
void RichEdit::SetRtfText(const tstring& text, int flags)
{
DWORD WideLength = text.length();
DWORD Length = WideLength * 4;
PSTR Utf8 = (PSTR)malloc(Length);
int ReturnedLength = WideCharToMultiByte(CP_UTF8,
0,
text.c_str(),
WideLength-1,
Utf8,
Length-1,
NULL,
NULL);
if (ReturnedLength)
Utf8[ReturnedLength] = 0;
SETTEXTEX st = {0};
st.flags = flags;
st.codepage = CP_UTF8;
(void)SendMessage(this->handle, EM_SETTEXTEX, (WPARAM)&st, (LPARAM)Utf8 );
free(Utf8);
}
void split ( tstring input , tstring split_id, std::vector<std::pair<tstring,bool>>& res ) {
std::vector<std::pair<tstring,bool>> result;
int i = 0;
bool add;
tstring temp;
std::wstringstream ss;
size_t found;
tstring real;
int r = 0;
while ( i != input.length() )
{
add = false;
ss << input.at(i);
temp = ss.str();
found = temp.find(split_id);
if ( found != tstring::npos )
{
add = true;
real.append ( temp , 0 , found );
} else if ( r > 0 && ( i+1 ) == input.length() )
{
add = true;
real.append ( temp , 0 , found );
}
if ( add )
{
result.emplace_back(std::make_pair(real,false));
ss.str(tstring());
ss.clear();
temp.clear();
real.clear();
r = 0;
}
i++;
r++;
}
res = result;
}
ps: tstring is just a typedef for std::wstring/std::string
How can I preserve the newlines?

There are quite a few problems with your code.
Your code is TCHAR based, but you are not actually retrieving/setting the RTF data using TCHAR correctly.
When retrieving the text, you are normalizing line breaks to CRLF, but you are not doing the same normalization when retrieving the text length, so the two are going to be out of sync with each other.
You are writing data to the RichEdit using UTF-8, but RTF is an ASCII-based format that uses escape sequences for Unicode data. If you are going to retrieve data as Unicode, you may as well write it using Unicode as well, and make sure you are doing all of that correctly to begin with. Let the RichEdit control handle the Unicode for you.
Your use of WideCharToMultiByte() is wrong. You should not be subtracting 1 from the string lengths at all. You are likely trying to account for null terminators, but the length values do not include null terminators to begin with. If you are going to stick with UTF-8, then you should be using WideCharToMultiByte() to calculate the correct UTF-8 length instead of hard-coding it:
int Length = WideCharToMultiByte(CP_UTF8, 0, text.c_str(), text.length(), NULL, 0, NULL, NULL);
char* Utf8 = new char[Length+1];
WideCharToMultiByte(CP_UTF8, 0, text.c_str(), text.length(), Utf8, Length, NULL, NULL);
Utf8[Length] = 0;
...
delete[] Utf8;
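
The length mismatch described above can be illustrated without any Win32 calls. The sketch assumes a simplified model in which the control stores each line break as a single '\r' and a GT_USECRLF-style retrieval expands it to "\r\n"; expand_cr_to_crlf is a hypothetical stand-in, not a RichEdit API. The point is only that a length measured before expansion is too small by one unit per line break.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of CRLF normalization on retrieval:
// every stored '\r' comes back as "\r\n".
std::u16string expand_cr_to_crlf(const std::u16string& stored) {
    std::u16string out;
    out.reserve(stored.size());
    for (char16_t c : stored) {
        out.push_back(c);
        if (c == u'\r') {
            out.push_back(u'\n');
        }
    }
    return out;
}

// Number of extra code units a buffer needs when the length was
// measured on the stored form but the text is fetched in expanded form.
std::size_t missing_units(const std::u16string& stored) {
    return expand_cr_to_crlf(stored).size() - stored.size();
}
```

For stored text with two line breaks, missing_units returns 2, which is how far a buffer sized from the unexpanded length would fall short.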
With that said, if you are going to stick with TCHAR then try this:
#ifdef UNICODE
#define RTFCodePage 1200
#else
#define RTFCodePage CP_ACP
#endif
int RichEdit::GetTextLength() const
{
GETTEXTLENGTHEX len = {0};
len.codepage = RTFCodePage;
len.flags = GTL_NUMCHARS | GTL_USECRLF;
return SendMessage(this->handle, EM_GETTEXTLENGTHEX, (WPARAM)&len, 0);
}
tstring RichEdit::GetText() const
{
int len = this->GetTextLength() + 1;
GETTEXTEX str = {0};
str.cb = len * sizeof(TCHAR);
str.flags = GT_USECRLF;
str.codepage = RTFCodePage;
vector<TCHAR> tmp(len);
len = SendMessage(this->handle, EM_GETTEXTEX, (WPARAM)&str, (LPARAM)&tmp[0]);
return tstring(&tmp[0], len-1);
}
void RichEdit::SetRtfText(const tstring& text, int flags)
{
SETTEXTEX st = {0};
st.flags = flags;
st.codepage = RTFCodePage;
#ifdef UNICODE
st.flags |= ST_UNICODE;
#endif
SendMessage(this->handle, EM_SETTEXTEX, (WPARAM)&st, (LPARAM)text.c_str());
}
It would be better to drop TCHAR and just use Unicode for everything:
int RichEdit::GetTextLength() const
{
GETTEXTLENGTHEX len = {0};
len.codepage = 1200;
len.flags = GTL_NUMCHARS | GTL_USECRLF;
return SendMessage(this->handle, EM_GETTEXTLENGTHEX, (WPARAM)&len, 0);
}
wstring RichEdit::GetText() const
{
int len = this->GetTextLength() + 1;
GETTEXTEX str = {0};
str.cb = len * sizeof(WCHAR);
str.flags = GT_USECRLF;
str.codepage = 1200;
vector<WCHAR> tmp(len);
len = SendMessage(this->handle, EM_GETTEXTEX, (WPARAM)&str, (LPARAM)&tmp[0]);
return wstring(&tmp[0], len-1);
}
void RichEdit::SetRtfText(const wstring& text, int flags)
{
SETTEXTEX st = {0};
st.flags = flags | ST_UNICODE;
st.codepage = 1200;
SendMessage(this->handle, EM_SETTEXTEX, (WPARAM)&st, (LPARAM)text.c_str());
}
Update: if you have to go back to UTF-8 for the EM_SETTEXTEX message then try this:
void RichEdit::SetRtfText(const tstring& text, int flags)
{
string Utf8;
int Length;
#ifdef UNICODE
Length = WideCharToMultiByte(CP_UTF8, 0, text.c_str(), text.length(), NULL, 0, NULL, NULL);
if (Length > 0)
{
Utf8.resize(Length);
WideCharToMultiByte(CP_UTF8, 0, text.c_str(), text.length(), &Utf8[0], Length, NULL, NULL);
}
#else
int WideLength = MultiByteToWideChar(CP_ACP, 0, text.c_str(), text.length(), NULL, 0);
if (WideLength > 0)
{
vector<WCHAR> tmp(WideLength);
MultiByteToWideChar(CP_ACP, 0, text.c_str(), text.length(), &tmp[0], WideLength);
Length = WideCharToMultiByte(CP_UTF8, 0, &tmp[0], WideLength, NULL, 0, NULL, NULL);
if (Length > 0)
{
Utf8.resize(Length);
WideCharToMultiByte(CP_UTF8, 0, &tmp[0], WideLength, &Utf8[0], Length, NULL, NULL);
}
}
#endif
SETTEXTEX st = {0};
st.flags = flags & ~ST_UNICODE;
st.codepage = CP_UTF8;
SendMessage(this->handle, EM_SETTEXTEX, (WPARAM)&st, (LPARAM)Utf8.c_str());
}
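
The remark above that RTF is an ASCII-based format with escape sequences for Unicode also bears on the newline question: in an RTF stream, raw CR and LF characters are not paragraph breaks, and a break has to be written as a control word such as \par. The following is a minimal sketch, assuming the text sent to the control is an RTF body; rtf_escape is a hypothetical helper, and it uses std::u16string so that it does not depend on the platform's wchar_t width.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: turns UTF-16 text into an ASCII-only RTF body fragment.
std::string rtf_escape(const std::u16string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t c = text[i];
        if (c == u'\r' || c == u'\n') {
            // Treat "\r\n", "\r" and "\n" each as a single paragraph break.
            if (c == u'\r' && i + 1 < text.size() && text[i + 1] == u'\n') {
                ++i;
            }
            out += "\\par ";
        } else if (c == u'\\' || c == u'{' || c == u'}') {
            // RTF syntax characters must be backslash-escaped.
            out += '\\';
            out += static_cast<char>(c);
        } else if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            // \uN takes a signed 16-bit decimal value; '?' is the fallback
            // character shown by readers that do not understand \u.
            int value = c;
            if (value > 32767) {
                value -= 65536;
            }
            out += "\\u" + std::to_string(value) + "?";
        }
    }
    return out;
}
```

Applied to each word before the RTF formatting is added, a helper like this keeps line breaks as \par and leaves the output plain ASCII, so it no longer matters which code page is used to send it.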
