NewString crashes for Unicode chars - java-native-interface

I am using the following code to convert a wide string to UTF-8 on Linux. Note that sizeof(wchar_t) == 2 in my build because of a compiler flag.
void convert(const wchar_t* data, size_t len)
{
ASSERT(sizeof(wchar_t) == sizeof(jchar));
JNIEnv* env = GetEnv();
JString jStr = env->NewString((const jchar *)data, len);
int cbLen = jStr.GetStringUTFLength();
char* pUTF8Str = new (std::nothrow) char[cbLen + 1];
//IFALLOCFAILED_EXIT(pUTF8Str);
strncpy_s(pUTF8Str, cbLen + 1, jStr.GetUTFString(), cbLen);
// release memory...
}
The code crashes in NewString for a certain set of Unicode characters. Am I doing something wrong?
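The sketch below does not explain the crash. It only shows one way to do the same conversion without going through the JVM, assuming the input really is UTF-16 in 16-bit code units (as it is when sizeof(wchar_t) == 2). The helper name utf16ToUtf8 is hypothetical and not part of JNI. Note also that the JNI UTF functions return modified UTF-8, which differs from standard UTF-8 for U+0000 and for characters outside the Basic Multilingual Plane.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a JNI function): converts UTF-16 code units to
// standard UTF-8. Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const char16_t* data, std::size_t len) {
    std::string out;
    for (std::size_t i = 0; i < len; ++i) {
        std::uint32_t cp = data[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: valid only when followed by a low surrogate.
            if (i + 1 < len && data[i + 1] >= 0xDC00 && data[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (data[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Because the result is a std::string, there is no separate length query or manual buffer allocation to get wrong.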

Related

Simple C++ Logging

I have been set the following task:
To evolve a Logger class that can be integrated into your projects/developments, most probably by declaring a single global instance; this will be used to capture and save log information.
There are 2 levels that are acceptable:
Logger - essentially just debug code in the source, no class.
Logger - packaged in a class, with default behavior (not configurable). Controlled by _DEBUG and writes to std::clog (can be re-directed).
I really do not know where to start with this, and have spent hours trying to find help somewhere.

You may want a more complete Logger class that makes logging easier across your whole project, with features such as:
being able to break a message across multiple lines of code when debugging
separate methods for error logging and debug logging
being able to turn logging on or off from the Logger instance, so that every debug call in your project becomes a no-op (instead of having to comment them all out, for instance)
adding other methods specific to your case (as you will see below)
This is a Logger class I used for a Windows project. It supports all of the features mentioned above. Feel free to use it.
#include "Logger.h"
Logger::Logger(
wstring wstrComponentName,
wstring wstrFunctionName) :
_wstrApplicationName(APPLICATION_NAME),
_wstrComponentName(StripFileName(wstrComponentName)),
_wstrFunctionName(wstrFunctionName)
{}
// This function takes wide-string (wstring) arguments, not narrow strings
VOID
Logger::Debug(
LPCWSTR format,
...
)
{
wstring wstrBase =
_wstrApplicationName + wstring(L": ") +
_wstrComponentName + wstring(L": ") +
_wstrFunctionName + wstring(L": ");
va_list args;
va_start(args, format);
wchar_t* msg = new wchar_t[_1kB];
wvsprintf(msg, format, args);
va_end(args);
wstring wstrOutput = wstrBase + msg;
delete[] msg;
OutputDebugString(wstrOutput.c_str());
}
// This function takes wide-string (wstring) arguments, not narrow strings
VOID
Logger::Error(
LPCWSTR format,
...
)
{
wstring wstrBase = wstring(L"[ERROR] ") +
_wstrApplicationName + wstring(L": ") +
_wstrComponentName + wstring(L": ") +
_wstrFunctionName + wstring(L": ");
va_list args;
va_start(args, format);
wchar_t* msg = new wchar_t[_1kB];
wvsprintf(msg, format, args);
va_end(args);
wstring wstrOutput = wstrBase + msg;
delete[] msg;
OutputDebugString(wstrOutput.c_str());
}
wstring
Logger::StripFileName(
__in wstring wstrFileName
)
{
DWORD dwLeftLimit = 0;
DWORD dwRightLimit = wstrFileName.length();
wstring wstrResult(L"FileName unavailable");
if (wstrFileName.rfind(L"\\") != wstring::npos)
{
dwLeftLimit = wstrFileName.rfind(L"\\") + 1;
}
if (wstrFileName.rfind(L".") != wstring::npos)
{
dwRightLimit = wstrFileName.rfind(L".");
}
if (dwRightLimit > dwLeftLimit)
{
wstrResult = wstrFileName.substr(
dwLeftLimit,
dwRightLimit - dwLeftLimit
);
}
return wstrResult;
}

HMAC on Mountain lion OSX 10.8.3 EXC_CRASH

Looking for a bit of help using OpenSSL's HMAC function. Currently this function fails on the HMAC call, but only on OS X; Linux and Windows both work fine.
QString tradingDialog::HMAC_SHA512_SIGNER(QString UrlToSign, QString Secret){
QString retval = "";
QByteArray byteArray = UrlToSign.toUtf8();
const char* URL = byteArray.constData();
QByteArray byteArrayB = Secret.toUtf8();
const char* Secretkey = byteArrayB.constData();
const EVP_MD *md = EVP_sha512();
unsigned char* digest = NULL;
// Size the buffer for the chosen hash engine: SHA-512 produces a 64-byte digest, which renders as 128 hex characters,
// plus one byte for the terminating NUL. Change the length if you change the hash engine.
char mdString[129] = { 0 };
// Using sha512 hash engine here.
digest = HMAC(md, Secretkey, strlen( Secretkey), (unsigned char*) URL, strlen( URL), NULL, NULL);
for(int i = 0; i < 64; i++){
sprintf(&mdString[i*2], "%02x", (unsigned int)digest[i]);
}
retval = mdString;
return retval;
}

You don't say what the problem is on osx, but it looks like you're not nul terminating mdString, so try changing it to
char mdString[129] = { 0 };
The crashlog you linked to shows that your app is aborting because the stack has been corrupted (I assume this happens on exit).
I would say the final sprintf is causing this, as it is adding a nul byte after the end of your mdString array. Try the above modification and see if that helps.
This ought to crash on all platforms, but I guess you got "lucky".
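The sizing rule behind that fix is that a digest of n bytes needs 2n hex characters plus one terminating NUL. One way to avoid getting that arithmetic wrong is to build the hex string in a std::string instead of a fixed char array. The sketch below assumes only that you have a pointer to the digest bytes and their length; toHex is a hypothetical helper, not an OpenSSL function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: renders raw bytes as lowercase hex.
// The string grows as needed, so there is no fixed buffer to overrun.
std::string toHex(const unsigned char* bytes, std::size_t len) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);
    for (std::size_t i = 0; i < len; ++i) {
        out += digits[bytes[i] >> 4];    // high nibble
        out += digits[bytes[i] & 0x0F];  // low nibble
    }
    return out;
}
```

With the code in the question this would be called as `toHex(digest, 64)` for SHA-512, and the result can be passed to QString::fromStdString.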

Access violation reading location NULL in Circuit Satisfiability Solver

I am trying to solve the circuit satisfiability problem, reading the circuit from a file (in the form shown in the text visualizer, somewhat dynamic). If my circuit is small (fewer than about 16-18 wires), my solver works smoothly. With 25-30 wires, and therefore 2^25 to 2^30 possibilities, I run into an access violation. I tried to free memory wherever I can, and I tried to create a new pointer for my expression every time, but the access violation always occurs.
How is this possible?
int evalBoolExprForBinaryVector(char *expr, int n, int binaryVector[]){
// create boolean expression from logical expression
char* expression = (char*) malloc(sizeof(char) * strlen(expr) + 1);
strcpy(expression, expr);
for(int binaryVectorCounter=0; binaryVectorCounter<n; binaryVectorCounter++){
char* currentSearchedIdentifier = (char*) malloc(sizeof(char) * 10);
char* index =(char*) malloc(sizeof(char) * 10);
char* valueOfIndex = (char*) malloc(sizeof(char)*2);
strcpy(currentSearchedIdentifier,"v[");
sprintf(index, "%d", binaryVectorCounter);
strcat(currentSearchedIdentifier, index);
strcat(currentSearchedIdentifier, "]");
sprintf(valueOfIndex, "%d", binaryVector[binaryVectorCounter]);
expression = str_replace(expression,currentSearchedIdentifier,valueOfIndex);
free(currentSearchedIdentifier);
free(index);
free(valueOfIndex);
}
// here my expression will be something like
// ( 0 | 1 ) & (!0 | !1) & ...
// evaluate this
return evalBoolExpr(expression);
};
Here is my code for better understanding.
The program breaks with this exception in strlen.asm at:
main_loop:
mov eax,dword ptr [ecx] ; read 4 bytes
Thanks in advance for any thoughts.

I rewrote this part in a C++ style and everything worked smoothly (with some delay, but at least it finishes successfully).
void replaceAll(std::string& str, const std::string& from, const std::string& to) {
if(from.empty())
return;
size_t start_pos = 0;
while((start_pos = str.find(from, start_pos)) != std::string::npos) {
str.replace(start_pos, from.length(), to);
start_pos += to.length();
}
}
int evalBoolExprForBinaryVector(char *expr, int n, int binaryVector[]){
std::string expression(expr);
for(int binaryVectorCounter=0; binaryVectorCounter<n; binaryVectorCounter++){
std::string currentSearchedIdentifier, valueOfIndex;
currentSearchedIdentifier = "v[" + std::to_string(binaryVectorCounter) + "]";
valueOfIndex = std::to_string(binaryVector[binaryVectorCounter]);
replaceAll(expression,currentSearchedIdentifier,valueOfIndex);
}
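// evalBoolExpr takes a mutable C string, so evaluate a copy and free it afterwards
// (this assumes evalBoolExpr does not keep the pointer).
char *cstr = new char[expression.length() + 1];
strcpy(cstr, expression.c_str());
int result = evalBoolExpr(cstr);
delete[] cstr;
return result;
}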

How to set HTML Unicode text to clipboard in VC++?

I am a newbie to C++. I want to get the content of the clipboard, which might contain Unicode characters, append a div tag with some HTML-formatted content, and set the result back to the clipboard.
I have succeeded in getting the content and appending to it, but I could not set it back to the clipboard as HTML text; I have only managed to set it as plain text. Here is my code:
#include <shlwapi.h>
#include <iostream>
#include <conio.h>
#include <stdio.h>
using namespace std;
wstring getClipboard(){
if (OpenClipboard(NULL)){
HANDLE clip = GetClipboardData(CF_UNICODETEXT);
WCHAR * c;
c = (WCHAR *)clip;
CloseClipboard();
return (WCHAR *)clip;
}
return L"";
}
bool setClipboard(wstring textToclipboard)
{
if (OpenClipboard(NULL)){
EmptyClipboard();
HGLOBAL hClipboardData;
size_t size = (textToclipboard.length()+1) * sizeof(WCHAR);
hClipboardData = GlobalAlloc(NULL, size);
WCHAR* pchData = (WCHAR*)GlobalLock(hClipboardData);
memcpy(pchData, textToclipboard.c_str(), size);
SetClipboardData(CF_UNICODETEXT, hClipboardData);
GlobalUnlock(hClipboardData);
CloseClipboard();
return true;
}
return false;
}
int main (int argc, char * argv[])
{
wstring s = getClipboard();
s += std::wstring(L"some extra text <b>hello</b>");
setClipboard(s);
getch();
return 0;
}
I did try using the code described here and read the doc here. But I couldn't make it work. What I tried could be way off track or completely wrong.
Update: The code below is what I tried after the modifications suggested by Cody Gray to the original code presented here:
bool CopyHTML2(WCHAR *html ){
wchar_t *buf = new wchar_t [400 + wcslen(html)];
if(!buf) return false;
static int cfid = 0;
if(!cfid) cfid = RegisterClipboardFormat("HTML Format");
// Create a template string for the HTML header...
wcscpy(buf,
L"Version:0.9\r\n"
L"StartHTML:00000000\r\n"
L"EndHTML:00000000\r\n"
L"StartFragment:00000000\r\n"
L"EndFragment:00000000\r\n"
L"<html><body>\r\n"
L"<!--StartFragment -->\r\n");
// Append the HTML...
wcscat(buf, html);
wcscat(buf, L"\r\n");
// Finish up the HTML format...
wcscat(buf,
L"<!--EndFragment-->\r\n"
L"</body>\r\n"
L"</html>");
wchar_t *ptr = wcsstr(buf, L"StartHTML");
wsprintfW(ptr+10, L"%08u", wcsstr(buf, L"<html>") - buf);
*(ptr+10+8) = L'\r';
ptr = wcsstr(buf, L"EndHTML");
wsprintfW(ptr+8, L"%08u", wcslen(buf));
*(ptr+8+8) = '\r';
ptr = wcsstr(buf, L"StartFragment");
wsprintfW(ptr+14, L"%08u", wcsstr(buf, L"<!--StartFrag") - buf);
*(ptr+14+8) = '\r';
ptr = wcsstr(buf, L"EndFragment");
wsprintfW(ptr+12, L"%08u", wcsstr(buf, L"<!--EndFrag") - buf);
*(ptr+12+8) = '\r';
// Open the clipboard...
if(OpenClipboard(0)) {
EmptyClipboard();
HGLOBAL hText = GlobalAlloc(GMEM_MOVEABLE |GMEM_DDESHARE, wcslen(buf)+4);
wchar_t *ptr = (wchar_t *)GlobalLock(hText);
wcscpy(ptr, buf);
GlobalUnlock(hText);
SetClipboardData(cfid, hText);
CloseClipboard();
GlobalFree(hText);
}
// Clean up...
delete [] buf;
return true;
}
This code compiles successfully, but I get the following error at SetClipboardData: HEAP[Project1.exe]: Heap block at 007A8530 modified at 007A860A past requested size of d2
Project1.exe has triggered a breakpoint.
Please guide me on how to proceed. I am using Visual Studio Express 2012 on Windows 8. Thanks.

You're mismatching ANSI (narrow) and Unicode (wide) strings.
Unlike the wcscpy function, the w in the wsprintf function doesn't stand for "wide", it stands for "Windows". It is part of the Win32 API, rather than the C runtime library. All of the Win32 API functions that work with strings have two versions, one suffixed with an A that deals with ANSI strings and another suffixed with a W that deals with wide strings. The headers hide all of this from you with macros. I explain all of this in more detail here—recommended reading.
Anyway, the simple fix here is to explicitly call the wide variant of that function, since you're correctly using wide strings everywhere else. Make all the calls to wsprintf look like this:
wchar_t *ptr = wcsstr(buf, L"StartHTML");
wsprintfW(ptr+10, L"%08u", wcsstr(buf, L"<html>") - buf);
*(ptr+10+8) = L'\r';
Alternatively, you could use the swprintf function provided by the C runtime library instead of the Win32 version. This one works just like the wcsstr and wcscpy functions you're using elsewhere. The w in the name means "wide". The documentation for this series of functions is here.
Note also that when you use character or string literals, they also need to be wide characters. You accomplish that by prepending them with an L. You do that some places, but miss doing it others. Make sure that you do it consistently.
The compiler should warn you about all of this, though. Just make sure you turn your warning level up and don't ignore any of the warnings. Also make sure that both the UNICODE and _UNICODE preprocessor symbols are defined globally for your project; that ensures you always call the Unicode/wide versions of functions, although it should already be the default for all new projects.
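As a rough illustration of the C runtime alternative mentioned above, the sketch below formats one 8-digit offset field with swprintf, which takes the destination size in characters. The helper name formatOffset is hypothetical. The cast matters because %u expects an unsigned int, while a pointer difference such as wcsstr(...) - buf is a wider signed type on 64-bit builds.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: formats an offset as an 8-digit, zero-padded wide string.
// Returns an empty string if the value does not fit in 8 digits.
std::wstring formatOffset(std::size_t offset) {
    wchar_t field[9];  // 8 digits + terminating L'\0'
    // The C runtime swprintf takes the buffer size in characters,
    // so it cannot write past the end of 'field'.
    const int written =
        std::swprintf(field, 9, L"%08u", static_cast<unsigned>(offset));
    return written == 8 ? std::wstring(field) : std::wstring();
}
```

Formatting into a temporary and then copying exactly 8 characters into the header also avoids the step in the question's code that restores L'\r' after each call, which is needed there only because the formatting function writes a terminating null over the character that follows the field.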

This is the function I came up with, with the help of Jochen Arndt at codeproject.com. I hope it helps somebody. Here is the complete working code, if you are interested in checking it out.
It still has one problem: when pasted into OneNote, and only there, gibberish appears after an anchor tag. This does not happen with Word, PowerPoint or Excel, and it does not happen for plain English text. The problem seems to be with OneNote, not with the code. If you have a solution, please let me know.
bool setClipboard(LPCWSTR lpszWide){
int nUtf8Size = ::WideCharToMultiByte(CP_UTF8, 0, lpszWide, -1, NULL, 0, NULL, NULL);
if (nUtf8Size < 1) return false;
const int nDescLen = 105;
HGLOBAL hGlobal = ::GlobalAlloc(GMEM_MOVEABLE, nDescLen + nUtf8Size);
if (NULL != hGlobal)
{
bool bErr = false;
LPSTR lpszBuf = static_cast<LPSTR>(::GlobalLock(hGlobal));
LPSTR lpszUtf8 = lpszBuf + nDescLen;
if (::WideCharToMultiByte(CP_UTF8, 0, lpszWide, -1, lpszUtf8, nUtf8Size, NULL, NULL) <= 0)
{
bErr = true;
}
else
{
LPCSTR lpszStartFrag = strstr(lpszUtf8, "<!--StartFragment-->");
LPCSTR lpszEndFrag = strstr(lpszUtf8, "<!--EndFragment-->");
lpszStartFrag += strlen("<!--StartFragment-->") + 2;
int i = _snprintf(
lpszBuf, nDescLen,
"Version:1.0\r\nStartHTML:%010d\r\nEndHTML:%010d\r\nStartFragment:%010d\r\nEndFragment:%010d\r\n",
nDescLen,
nDescLen + nUtf8Size - 1, // offset to next char behind string
nDescLen + static_cast<int>(lpszStartFrag - lpszUtf8),
nDescLen + static_cast<int>(lpszEndFrag - lpszUtf8));
}
::GlobalUnlock(hGlobal);
if (bErr)
{
::GlobalFree(hGlobal);
hGlobal = NULL;
}
// Get clipboard id for HTML format...
static int cfid = 0;
cfid = RegisterClipboardFormat("HTML Format");
// Open the clipboard...
if(OpenClipboard(0)) {
EmptyClipboard();
HGLOBAL hText = GlobalAlloc(GMEM_MOVEABLE |GMEM_DDESHARE, strlen(lpszBuf)+4);
char *ptr = (char *)GlobalLock(hText);
strcpy(ptr, lpszBuf);
GlobalUnlock(hText);
::SetClipboardData(cfid, hText);
CloseClipboard();
GlobalFree(hText);
}
}
return NULL != hGlobal;
}

Your problem comes from using wchar_t instead of char in the cited example, which throws off the offset computations.
I would recommend avoiding wchar_t for transferring Unicode text to the clipboard in this format. A UTF-8 character is encoded as a sequence of 1 to 4 bytes, while wchar_t on Windows is a fixed 2-byte type.
As explained in the Microsoft documentation referenced in your question, the clipboard content must be UTF-8, which is identical to ASCII for the characters that appear in the header of the clipboard memory.
To put Unicode text on the clipboard, you can prepare the content with the standard char-based C++ facilities (std::string, for example).
While the cited example works, here is another code sample, using the C++ standard library, that copies UTF-8 text to the clipboard in HTML format:
void copyHTMLtoClipboard(const std::string& html) {
std::string contextStart("Version:0.9\r\nStartHTML:0000000000\r\nEndHTML:0000000000\r\nStartFragment:0000000000\r\nEndFragment:0000000000\r\n<html><body>\r\n<!--StartFragment -->\r\n");
std::string contextEnd("\r\n<!--EndFragment -->\r\n</body></html>");
std::stringstream aux;
aux << contextStart << html << contextEnd;
std::string res = aux.str();
size_t htmlStart = 105 * sizeof(char);
size_t fragmentStart = 119 * sizeof(char);
size_t htmlEnd = res.size() * sizeof(char);
size_t fragmentEnd = htmlEnd - 35 * sizeof(char);
aux.fill('0');
aux.width(10);
aux.seekp(23);
aux << htmlStart;
aux.seekp(43);
aux.fill('0');
aux.width(10);
aux << htmlEnd;
aux.seekp(69);
aux.fill('0');
aux.width(10);
aux << fragmentStart;
aux.seekp(93);
aux.fill('0');
aux.width(10);
aux << fragmentEnd;
res = aux.str();
HGLOBAL hdst = GlobalAlloc(GMEM_MOVEABLE | GMEM_DDESHARE, htmlEnd + sizeof(char));
LPSTR dst = (LPSTR)GlobalLock(hdst);
memcpy(dst, res.c_str(), htmlEnd);
dst[htmlEnd] = 0;
GlobalUnlock(hdst);
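if (OpenClipboard(NULL)) {
EmptyClipboard();
// On success the system owns hdst, so free it only if the call fails.
if (!SetClipboardData(RegisterClipboardFormat(L"HTML Format"), hdst))
GlobalFree(hdst);
CloseClipboard();
} else {
GlobalFree(hdst);
}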
}
Note that this code was compiled defining the macros _UNICODE and UNICODE.
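The offsets in the sample above are hard-coded (105, 119, 35), so they silently break if the header or the markers change. As a hedged alternative, the portable sketch below computes the same four byte offsets from the actual string lengths and leaves the Win32 clipboard calls out. The function name buildCfHtml is hypothetical, and the input is assumed to be already encoded as UTF-8.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: wraps a UTF-8 HTML fragment in the "HTML Format"
// clipboard envelope and fills in the byte offsets. Returns an empty string
// if the header cannot be formatted.
std::string buildCfHtml(const std::string& fragmentUtf8) {
    const std::string prefix = "<html><body>\r\n<!--StartFragment-->";
    const std::string suffix = "<!--EndFragment-->\r\n</body></html>";

    // Every offset is a fixed-width 10-digit field, so the header length
    // does not depend on the values written into it (up to 10 digits).
    const char* headerFormat =
        "Version:0.9\r\n"
        "StartHTML:%010zu\r\n"
        "EndHTML:%010zu\r\n"
        "StartFragment:%010zu\r\n"
        "EndFragment:%010zu\r\n";

    char header[128];
    // First pass with zeros only measures the header.
    const int headerLen =
        std::snprintf(header, sizeof header, headerFormat, std::size_t{0},
                      std::size_t{0}, std::size_t{0}, std::size_t{0});
    if (headerLen < 0 || static_cast<std::size_t>(headerLen) >= sizeof header) {
        return std::string();
    }

    // All offsets are byte counts from the start of the whole buffer.
    const std::size_t startHtml = static_cast<std::size_t>(headerLen);
    const std::size_t startFragment = startHtml + prefix.size();
    const std::size_t endFragment = startFragment + fragmentUtf8.size();
    const std::size_t endHtml = endFragment + suffix.size();

    // Second pass writes the real offsets into the same fixed-width fields.
    std::snprintf(header, sizeof header, headerFormat, startHtml, endHtml,
                  startFragment, endFragment);

    return std::string(header, startHtml) + prefix + fragmentUtf8 + suffix;
}
```

The returned string, plus a terminating zero byte, is what would be copied into the GlobalAlloc block. Because everything is counted in bytes of a char buffer, multi-byte UTF-8 characters in the fragment are handled without any special cases.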

C++/CLI UTF-8 & JNI Not Converting Unicode String Properly

I have a Java class that returns a Unicode string. Java has the correct version of the string, but when it comes through a JNI wrapper as a jstring it must be converted to a C++ or C++/CLI string. Here is some test code that works for most languages but not for the Asian character sets: Simplified Chinese and Japanese characters are garbled and I can't figure out why. I don't see anything wrong with either conversion method in the snippet below (the if statement checks the OS version, because I have two VMs with different operating systems, and runs the appropriate conversion method).
String^ JStringToCliString(const jstring string){
String^ converted = gcnew String("");
JNIEnv* envLoc = GetJniEnvHandle();
std::wstring value;
jboolean isCopy;
if(string){
try{
jsize len = env->GetStringLength(string);
if(Environment::OSVersion->Version->Major >= 6) // 6 is post XP/2003
{
TraceLog::Log("Using GetStringChars() for string conversion");
const jchar* raw = envLoc->GetStringChars(string, &isCopy);
// todo add exception handling here for jvm
if (raw != NULL) {
value.assign(raw, raw + len);
converted = gcnew String(value.c_str());
env->ReleaseStringChars(string, raw);
}
}else{
TraceLog::Log("Using GetStringUTFChars() for string conversion.");
const char* raw = envLoc->GetStringUTFChars(string, &isCopy);
if(raw) {
int bufSize = MultiByteToWideChar(CP_UTF8, 0 , raw , -1, NULL , 0 );
wchar_t* wstr = new wchar_t[bufSize];
MultiByteToWideChar( CP_UTF8 , 0 , raw , -1, wstr , bufSize );
String^ val = gcnew String(wstr);
delete[] wstr;
converted = val; // partially working
envLoc->ReleaseStringUTFChars(string, raw);
}
}
}catch(Exception^ ex){
TraceLog::Log(ex->Message);
}
}
return converted;
}

The answer was to enable East Asian language support in Windows XP; Windows 7 and later work fine. Super easy... and a waste of an entire day, lol.
