Using rapidjson and ATL CString - c++

I am attempting to use the rapidjson library with Microsoft ATL CString type, as shown in the example below.
#include "stdafx.h"
#include "rapidjson\document.h"
using namespace rapidjson;
typedef GenericDocument<UTF16<> > WDocument;
int main()
{
WDocument document;
CString hello = _T("Hello");
document.SetObject();
document.AddMember(_T("Hello"), hello, document.GetAllocator());
return 0;
}
This fails with the compiler error
'rapidjson::GenericValue::GenericValue(rapidjson::GenericValue &&)': cannot convert argument 1 from 'CString' to 'rapidjson::Type' rapidjson document.h 1020
which implies that a conversion from CString to a type rapidjson accepts is required. I know that rapidjson internally uses wchar_t as the character type in the UTF16 versions of its functions; however, I am not sure how to convert a CString to a wchar_t (or array of wchar_t) in a way that lets rapidjson use the string as it uses string literals wrapped in the _T macro.
I have looked at the MSDN resources on converting between string types here, but these only show how to obtain a pointer to the first element of an array of wchar_t, which rapidjson cannot then use.

The correct way to do this is to use one of the constructors rapidjson provides for its GenericValue class, namely the one that takes a pointer to the character type together with a length:
GenericValue(const Ch* s, SizeType length) RAPIDJSON_NOEXCEPT : data_(), flags_() { SetStringRaw(StringRef(s, length)); }
This constructor accepts a pointer to any of the character types rapidjson supports, along with a length, and reads the string into a value. Note that this particular overload stores a reference to the buffer rather than copying it, so for a string whose lifetime is not guaranteed (such as a local CString), use the companion overload that additionally takes an allocator, which copies the characters. For the ATL::CString class, the pointer and length are obtained with the .GetString() and .GetLength() methods of a CString object. A function returning a Value that can be used in a DOM tree would look like this:
typedef GenericValue<UTF16<> > WValue;

WValue CStringToRapidjsonValue(const CString& in, WDocument::AllocatorType& allocator)
{
    // The allocator overload copies the characters, so the returned value
    // does not point into the CString's internal buffer.
    WValue out(in.GetString(), static_cast<SizeType>(in.GetLength()), allocator);
    return out;
}
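A minimal usage sketch (assuming a C++11 compiler so the value can be moved out of the function; the key here is a wide string literal, which rapidjson wraps as a StringRef):

WDocument document;
document.SetObject();

CString hello = _T("world");
WValue value = CStringToRapidjsonValue(hello, document.GetAllocator());

// AddMember takes ownership of 'value', leaving it empty afterwards.
document.AddMember(L"hello", value, document.GetAllocator());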

Related

How to convert typedef basic_string<char> string to byte array in C++

I am trying to pass a C++ string to Java on Android using JNI.
void Endpoint::utilLogWrite(int prmLevel,
                            const string &prmSender,
                            const string &prmMsg)
When I read prmMsg from Java using JNI, I get an exception while the C++ string is converted to a Java String:
JNI DETECTED ERROR IN APPLICATION
I have no control over the JNI methods, and searching Stack Overflow suggests sending a byte array instead of a C++ string.
The variable prmMsg is of type string (typedef basic_string<char>), so how do I convert it to a byte array? In Java there is the simple method
String.getBytes()
but how can I achieve the same in C++?
"input is not valid Modified UTF-8: illegal start byte": Yep, UTF-8 is not the same as Modified UTF-8, which is a crutch JNI offers [and, in some places (class paths and member names, etc), demands]. So, I like your approach of using Java to do any character encoding conversions.
To create a byte array, call NewByteArray. To fill it, call SetByteArrayRegion. To get jbytes out of a std::string, call data and cast.
std::string s = "\xF0\x9F\x9A\xB2";
// jbyte and char are the same size
const auto output = env->NewByteArray(s.length());
env->SetByteArrayRegion(output, 0, s.length(), reinterpret_cast<const jbyte *>(s.data()));
return output; // return or release output
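Wrapped up as a helper function, this looks roughly as follows (a sketch; the name stringToJByteArray is mine, and the null check simply propagates an allocation failure to the caller):

#include <jni.h>
#include <string>

jbyteArray stringToJByteArray(JNIEnv *env, const std::string &s)
{
    const jbyteArray output = env->NewByteArray(static_cast<jsize>(s.length()));
    if (output != nullptr)
        env->SetByteArrayRegion(output, 0, static_cast<jsize>(s.length()),
                                reinterpret_cast<const jbyte *>(s.data()));
    return output;
}

On the Java side, new String(bytes, StandardCharsets.UTF_8) then decodes the bytes as plain UTF-8, bypassing Modified UTF-8 entirely.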

Deep copy of TCHAR array is truncated

I've created a class to test some functionality I need to use. Essentially, the class takes a deep copy of the passed-in string and makes it available via a getter. I am using Visual Studio 2012, and Unicode is enabled in the project settings.
The problem is that the memcpy operation yields a truncated string. The output looks like this:
THISISATEST: InstanceDataConstructor: Testing testing 123
Testing te_READY
where the first line is the check of the passed-in TCHAR* string and the second line is the output from populating the allocated memory with the memcpy operation. The expected output is "Testing testing 123".
Can anyone explain what is wrong here?
N.B. Got the #ifndef UNICODE typedefs from here: how-to-convert-tchar-array-to-stdstring
#ifndef INSTANCE_DATA_H//if not defined already
#define INSTANCE_DATA_H//then define it

#include <string>
#include <cstring>//for std::memcpy
#include <windows.h>//for TCHAR and OutputDebugStringW

//TCHAR is just a typedef that, depending on your compilation configuration, defaults to either char or wchar_t.
//The Standard Template Library supports both ASCII (with std::string) and wide character sets (with std::wstring).
//All you need to do is to typedef String as either std::string or std::wstring depending on your compilation configuration.
//To maintain flexibility you can use the following code:
#ifndef UNICODE
typedef std::string String;
#else
typedef std::wstring String;
#endif
//Now you may use String in your code and let the compiler handle the nasty parts. String will now have constructors that let you convert TCHAR to std::string or std::wstring.

class InstanceData
{
public:
    InstanceData(TCHAR* strIn) : strMessage(strIn)//constructor
    {
        //Check the passed-in string
        String outMsg(L"THISISATEST: InstanceDataConstructor: ");//L for wide character string literal
        outMsg += strMessage;//concatenate message
        const wchar_t* finalMsg = outMsg.c_str();//prepare for outputting
        OutputDebugStringW(finalMsg);//print the message

        //Prepare TCHAR dynamic array. Deep copy.
        charArrayPtr = new TCHAR[strMessage.size() + 1];
        charArrayPtr[strMessage.size()] = 0;//null terminate
        std::memcpy(charArrayPtr, strMessage.data(), strMessage.size());//copy characters from the array pointed to by the passed-in TCHAR*
        OutputDebugStringW(charArrayPtr);//print the copied message to check
    }

    ~InstanceData()//destructor
    {
        delete[] charArrayPtr;
    }

    //Getter
    TCHAR* getMessage() const
    {
        return charArrayPtr;
    }

private:
    TCHAR* charArrayPtr;
    String strMessage;//is used to conveniently ascertain the length of the passed-in underlying TCHAR array
};
#endif//header guard
A solution without all of the dynamically allocated memory.
#include <tchar.h>
#include <vector>
//...
class InstanceData
{
public:
    InstanceData(TCHAR* strIn) : strMessage(strIn)
    {
        charArrayPtr.insert(charArrayPtr.begin(), strMessage.begin(), strMessage.end());
        charArrayPtr.push_back(0);
    }

    TCHAR* getMessage()
    { return &charArrayPtr[0]; }

private:
    String strMessage;
    std::vector<TCHAR> charArrayPtr;
};
This does what your class does, but the major difference being that it does not do any hand-rolled dynamic allocation code. The class is also safely copyable, unlike the code with the dynamic allocation (lacked a user-defined copy constructor and assignment operator).
The std::vector class has superseded having to do new[]/delete[] in almost all circumstances. The reason being that vector stores its data in contiguous memory, no different than calling new[].
Please pay attention to the following lines in your code:
// Prepare TCHAR dynamic array. Deep copy.
charArrayPtr = new TCHAR[strMessage.size() + 1];
charArrayPtr[strMessage.size()] = 0; // null terminate
// Copy characters from array pointed to by the passed in TCHAR*.
std::memcpy(charArrayPtr, strMessage.data(), strMessage.size());
The third argument to pass to memcpy() is the count of bytes to copy.
If the string is a simple ASCII string stored in a std::string, then the count of bytes is the same as the count of ASCII characters.
But, if the string is a wchar_t Unicode UTF-16 string, then each wchar_t occupies 2 bytes in Visual C++ (with GCC things are different, but this is a Windows Win32/C++ code compiled with VC++, so let's just focus on VC++).
So, you have to properly scale the size count for memcpy(), considering the proper size of a wchar_t, e.g.:
memcpy(charArrayPtr, strMessage.data(), strMessage.size() * sizeof(TCHAR));
So, if you compile in Unicode (UTF-16) mode, then TCHAR is expanded to wchar_t, and sizeof(wchar_t) is 2, so the content of your original string should be properly deep-copied.
As an alternative, for Unicode UTF-16 strings in VC++ you may use also wmemcpy(), which considers wchar_t as its "unit of copy". So, in this case, you don't have to scale the size factor by sizeof(wchar_t).
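For example (the same deep copy as above, assuming a Unicode build where TCHAR is wchar_t):

// wmemcpy (declared in <cwchar>) counts in wchar_t units, so no sizeof scaling is needed
wmemcpy(charArrayPtr, strMessage.data(), strMessage.size());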
As a side note, in your constructor you have:
InstanceData(TCHAR* strIn) : strMessage(strIn)//constructor
Since strIn is an input string parameter, consider passing it by const pointer, i.e.:
InstanceData(const TCHAR* strIn)

Convert Platform::Array<byte> to String

I have a function in C++ from a library that reads a resource and returns Platform::Array<byte>^.
How can I convert this into a Platform::String or an std::string?
BasicReaderWriter^ m_basicReaderWriter = ref new BasicReaderWriter();
Platform::Array<byte>^ data = m_basicReaderWriter("file.txt");
I need a Platform::String from data.
If your Platform::Array<byte>^ data contains an ASCII string (as you clarified in a comment to your question), you can convert it to std::string using proper std::string constructor overloads (note that Platform::Array offers STL-like begin() and end() methods):
// Using std::string's range constructor
std::string s( data->begin(), data->end() );
// Using std::string's buffer pointer + length constructor
std::string s( data->begin(), data->Length );
Unlike std::string, Platform::String contains Unicode UTF-16 (wchar_t) strings, so you need a conversion from your original byte array containing the ANSI string to a Unicode string. You can perform this conversion using the ATL conversion helper class CA2W (which wraps calls to the Win32 API MultiByteToWideChar()).
Then you can use Platform::String constructor taking a raw UTF-16 character pointer:
Platform::String^ str = ref new String( CA2W( data->begin() ) );
Note:
I currently don't have VS2012 available, so I haven't tested this code with the C++/CX compiler. If you get some argument matching errors, you may want to consider reinterpret_cast<const char*> to convert from the byte * pointer returned by data->begin() to a char * pointer (and similar for data->end()), e.g.
std::string s( reinterpret_cast<const char*>(data->begin()), data->Length );
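Putting the two steps together (a sketch: building a std::string first guarantees that CA2W receives a null-terminated buffer, which data->begin() alone does not):

#include <atlconv.h> // for CA2W

std::string ansi( reinterpret_cast<const char*>(data->begin()), data->Length );
Platform::String^ str = ref new Platform::String( CA2W( ansi.c_str() ) );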

How to convert a utf16 ushort array to a utf8 std::string?

Currently I'm writing a plugin which is just a wrapper around an existing library.
The plugin's host passes me a UTF-16 formatted string, defined as follows:
typedef unsigned short PA_Unichar;
The wrapped library accepts only a const char* or a UTF-8 formatted std::string.
I tried writing a conversion function like
std::string toUtf8(const PA_Unichar* data)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return std::string(convert.to_bytes(static_cast<const char16_t*>(data)));
}
But obviously this doesn't work, throwing me a compile error "static_cast from 'const pointer' (aka 'const unsigned short*') to 'const char16_t *' is not allowed"
So what's the most elegant/correct way to do it?
Thank you in advance.
You could convert the PA_Unichar string to a string of char16_t using the basic_string(Iterator, Iterator) constructor, then use the std::codecvt_utf8_utf16 facet as you attempted:
std::string conv(const PA_Unichar* str, size_t len)
{
    std::u16string s(str, str + len);
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.to_bytes(s);
}
I think that's right. Unfortunately I can't test this, as my implementation doesn't support it yet. I have an implementation of wstring_convert which I plan to include in GCC 4.9, but I don't have an implementation of codecvt_utf8_utf16 to test it with.
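A hypothetical call site, assuming the host's strings are null-terminated (the API shown passes no explicit length):

std::string toUtf8(const PA_Unichar* data)
{
    size_t len = 0;
    while (data[len] != 0) // find the terminator
        ++len;
    return conv(data, len);
}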

Assigning a "const char*" to std::string is allowed, but assigning to std::wstring doesn't compile. Why?

I assumed that std::wstring and std::string both provide more or less the same interface.
So I tried to enable Unicode capabilities for our application:
# ifdef APP_USE_UNICODE
typedef std::wstring AppStringType;
# else
typedef std::string AppStringType;
# endif
However that gives me a lot of compile errors when -DAPP_USE_UNICODE is used.
It turned out that the compiler chokes when a const char[] is assigned to std::wstring.
EDIT: improved example by removing the usage of literal "hello".
#include <string>

void myfunc(const char h[]) {
    std::string s = h;  // compiles OK
    std::wstring w = h; // compile error
}
Why does it make such a difference?
Assigning a const char* to std::string is allowed, but assigning to std::wstring gives compile errors.
Shouldn't std::wstring provide the same interface as std::string? At least for such a basic operation as assignment?
(environment: gcc-4.4.1 on Ubuntu Karmic 32bit)
You should do:
#include <string>

int main() {
    const wchar_t h[] = L"hello";
    std::wstring w = h;
    return 0;
}
std::string is a typedef of std::basic_string<char>, while std::wstring is a typedef of std::basic_string<wchar_t>. As such, the 'equivalent' C-string of a wstring is an array of wchar_ts.
The 'L' in front of the string literal is to indicate that you are using a wide-char string constant.
The relevant part of the string API is this constructor:
basic_string(const charT*);
For std::string, charT is char. For std::wstring it's wchar_t. So the reason it doesn't compile is that wstring doesn't have a char* constructor. Why doesn't wstring have a char* constructor?
There is no one unique way to convert a string of char to a string of wchar. What's the encoding used with the char string? Is it just 7 bit ASCII? Is it UTF-8? Is it UTF-7? Is it SHIFT-JIS? So I don't think it would entirely make sense for std::wstring to have an automatic conversion from char*, even though you could cover most cases. You can use:
w = std::wstring(h, h + strlen(h));
which will convert each char in turn to wchar_t, not including the NUL terminator (strlen is used rather than sizeof here, since the array parameter h has decayed to a pointer). As int3 says, though, if that's what you mean, it's most likely better to use a wide string literal in the first place.
To convert from a multibyte encoding to a wide character encoding, take a look at the header <locale> and the type std::codecvt. The Dinkumware library has a class Dinkum::wstring_convert that makes performing such multibyte-to-wide conversions easier.
The function std::codecvt_byname allows one to find a codecvt instance for a particular named encoding. Unfortunately, discovering the names of the encodings (or locales) on your system is implementation-specific.
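That Dinkumware class was later standardized in C++11 as std::wstring_convert (and deprecated again in C++17, but it is widely implemented). A minimal sketch of a UTF-8-to-wide conversion with it:

#include <codecvt>
#include <locale>
#include <string>

std::wstring from_utf8(const std::string& s)
{
    // codecvt_utf8 converts between UTF-8 and UCS-2/UCS-4 wchar_t
    std::wstring_convert<std::codecvt_utf8<wchar_t>> convert;
    return convert.from_bytes(s);
}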
Small suggestion... Do not use "Unicode" strings under Linux (a.k.a. wide strings). std::string is perfectly fine and holds Unicode very well (UTF-8).
Most Linux APIs work with char * strings, and the most popular encoding is UTF-8.
So... just don't bother with wstring.
In addition to the other answers, you could use a trick from Microsoft's book (specifically, tchar.h), and write something like this:
# ifdef APP_USE_UNICODE
typedef std::wstring AppStringType;
#define _T(s) (L##s)
# else
typedef std::string AppStringType;
#define _T(s) (s)
# endif
AppStringType foo = _T("hello world!");
(Note: my macro-fu is weak, and this is untested, but you get the idea.)
Looks like you can do something like this:
#include <sstream>
// ...
std::wstringstream tmp;
tmp << "hello world";
std::wstring our_string = tmp.str();
Although for a more complex situation, you may want to break down and use mbstowcs.
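A rough sketch of the mbstowcs route (it assumes the current C locale matches the narrow string's encoding; the helper name widen is mine):

#include <cstdlib>
#include <string>

std::wstring widen(const char* narrow)
{
    std::size_t needed = std::mbstowcs(nullptr, narrow, 0); // measure only
    if (needed == static_cast<std::size_t>(-1))
        return std::wstring(); // invalid multibyte sequence
    std::wstring out(needed, L'\0');
    std::mbstowcs(&out[0], narrow, needed);
    return out;
}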
You should use:
#include <tchar.h>
tstring (a typedef for std::basic_string<TCHAR>) instead of wstring/string
TCHAR* instead of char*
and _T("hello") instead of "hello" or L"hello"
This will use the appropriate form of string and char when _UNICODE is defined.
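For completeness, a sketch of the typedef this answer relies on (tchar.h defines TCHAR and _T, but tstring itself is a user-side typedef):

#include <string>
#include <tchar.h>

typedef std::basic_string<TCHAR> tstring;

tstring greeting = _T("hello");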