Get C++ wchar_t into Flash, via Lua

I am currently working on an application in C++, that ties into Lua, that ties into Flash (in that order). My goal at the moment is getting wchar_ts from C++ into Flash, via Lua. I would love any insights as to how I can accomplish this!
If any other information is required, please ask and I'll do my best to provide it.
What I have tried
It's my understanding that Lua is not a fan of Unicode, but it should still be able to receive the string of bytes from my C++ application. I imagine there must be a way to pass those bytes over to Flash and then render out my intended Unicode. So here's what I've done so far:
C++:
//an example wchar_t*
const wchar_t *text = L"Test!";
//this function pushes a char* to my Lua code
lua.PushString((char*)text); //directly casting text to a char*... D:
Lua:
theString = FunctionThatGetsWCharFromCpp();
flash.ShowString(theString);
Flash:
function ShowString(theString:String)
{
    myTextField.text = theString;
}
Now the outcome here is that myTextField only shows "T". This made sense to me. The cast from wchar_t to char would end up padding out the chars with some zeros, especially since "T" doesn't really utilize both bytes of a wchar_t. A quick look at the documentation yields:
lua_pushstring
The string cannot contain embedded zeros; it is assumed to end at the first zero.
So I ran a little test:
C++:
//prefixing with a Japanese character
//which will use both bytes of the wchar_t
const wchar_t *text = L"たTest!";
The Flash textbox now reads: "_0T", 3 characters. Makes total sense, the 2 bytes of the Japanese character + T, then termination.
I understand what is going on, but I am still completely unsure of how to tackle this problem, and I'm really unsure of what to search for. Is there a specific Lua function I can use to pass a wad of bytes over to Lua from C++ (I've read somewhere that lua_pushlstring is often used for this, but that also terminates at first zero)? Is there a Flash datatype that will accept these bytes, after which I'll need to do some sort of conversion to get them into a readable, multibyte string? Or is this just really not possible?
Note:
I'm not too familiar with Unicode and code pages and whatnot, so I'm not sure whether there'll also be a step where I'll need to specify the correct encoding in Flash to get the correct output. I'm happy to cross that bridge when I get there, but if anyone has any insight here too, that would be great!

I don't know if this will work, but I'd recommend trying to use UTF-8. A string encoded in UTF-8 doesn't have any embedded zeros in it, so Lua should be able to handle it, and Flash ought to be able to handle it as well, depending on how exactly the languages interface.
Here's one way to convert a wide-character string to UTF-8 using setlocale(3) and wcstombs(3):
// Error checking omitted for expository purposes
#include <clocale>  // setlocale
#include <cstdlib>  // wcstombs
#include <cwchar>   // wcslen

// Call this once at program startup. If you'd rather not change the locale,
// you can instead write your own conversion routine (but beware of UTF-16
// surrogate pairs if you do)
setlocale(LC_ALL, "en_US.UTF-8");

// Do this for each string you want to convert
const wchar_t *wideString = L"たTest!";
size_t len = wcslen(wideString);
size_t maxUtf8len = 4 * len + 1; // Each wchar_t encodes to at most 4 UTF-8 bytes
char *utf8String = new char[maxUtf8len];
wcstombs(utf8String, wideString, maxUtf8len);
...
// Do stuff with utf8String
...
delete [] utf8String;
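As for the Lua side of the question: per the Lua reference manual, lua_pushlstring takes an explicit length and does preserve embedded zeros (the note in the question has this backwards), so it is the right call for handing Lua an arbitrary byte buffer. Here is a minimal sketch, assuming the setlocale call above has already run; the function name and structure are mine, not the poster's wrapper:

#include <lua.hpp>   // lua_State, lua_pushlstring, luaL_error
#include <cstdlib>   // wcstombs
#include <cwchar>    // wcslen
#include <vector>

// Sketch: convert a wide string to UTF-8 and push the bytes to Lua
// with an explicit length, so embedded zeros would survive (though a
// valid UTF-8 string contains none anyway).
static int PushWideAsUtf8(lua_State *L, const wchar_t *wideString)
{
    size_t maxUtf8len = 4 * wcslen(wideString) + 1;
    std::vector<char> utf8(maxUtf8len);
    size_t written = wcstombs(&utf8[0], wideString, maxUtf8len);
    if (written == (size_t)-1)
        return luaL_error(L, "wide-to-UTF-8 conversion failed");
    lua_pushlstring(L, &utf8[0], written); // explicit length, no zero-scan
    return 1; // one value pushed
}

Lua itself treats strings as raw bytes, so the UTF-8 passes through untouched; decoding then becomes the Flash side's job.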
If you're on Windows, you can instead use the WideCharToMultiByte function with the CP_UTF8 code page to do the conversion, since I don't believe that the Visual Studio C runtime supports UTF-8 locales:
// Error checking omitted for expository purposes
#include <windows.h>  // WideCharToMultiByte, CP_UTF8
#include <cwchar>     // wcslen

const wchar_t *wideString = L"たTest!";
size_t len = wcslen(wideString);
size_t maxUtf8len = 4 * len + 1; // Each wchar_t encodes to at most 4 UTF-8 bytes
char *utf8String = new char[maxUtf8len];
// len + 1 so the terminating zero is converted as well
WideCharToMultiByte(CP_UTF8, 0, wideString, (int)(len + 1), utf8String, (int)maxUtf8len, NULL, NULL);
...
// Do stuff with utf8String
...
delete [] utf8String;
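Not part of the original answer, but if you'd rather avoid the manual new/delete, the same call wraps naturally in a std::string (the helper name is mine):

#include <windows.h>
#include <string>

// Sketch: WideCharToMultiByte with automatic buffer management.
std::string ToUtf8(const wchar_t *wideString)
{
    // First call asks how many bytes are needed; the count includes the
    // terminator because cchWideChar is -1.
    int needed = WideCharToMultiByte(CP_UTF8, 0, wideString, -1,
                                     NULL, 0, NULL, NULL);
    if (needed <= 0)
        return std::string();
    std::string utf8(needed, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wideString, -1,
                        &utf8[0], needed, NULL, NULL);
    utf8.resize(needed - 1); // drop the copied terminator
    return utf8;
}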

Related

What does the function setlocale do?

I wrote a function to convert a wstring to a string. If I remove the call setlocale(LC_CTYPE, ""), the program goes wrong. I checked the documentation on cplusplus.com:
C string containing the name of a C locale. These are system specific, but at least the two following locales must exist:
"C" Minimal "C" locale
"" Environment's default locale
If the value of this parameter is NULL, the function does not make any changes to the current locale, but the name of the current locale is still returned by the function.
My code is below; the source is from cplusplus.com (I added some Chinese characters):
/* wcstombs example */
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* wcstombs, wchar_t */
#include <locale.h>  /* setlocale */

int main()
{
    setlocale(LC_CTYPE, "");
    const wchar_t str[] = L"中国、wcstombs example";
    char buffer[64];
    int ret;

    printf("wchar_t string: %ls \n", str);

    ret = wcstombs(buffer, str, sizeof(buffer));
    if (ret == 64)
        buffer[63] = '\0';
    if (ret)
        printf("length:%d,multibyte string: %s \n", ret, buffer);
    return 0;
}
If I remove the call setlocale(LC_CTYPE, ""), the program does not run as I expect.
My question is: if I run it on different machines, will the program behave differently? As the doc says, if the locale is "", the function does not make any changes to the current locale, but the name of the current locale is still returned by the function.
Is that because the current locale may differ from machine to machine?
Here is my C++ version for converting between wstring and string. The string-to-wstring direction does not seem to need setlocale, and the program runs well:
#include <clocale>  // setlocale
#include <cstdlib>  // mbstowcs, wcstombs
#include <string>
#include <vector>

/*
string converts to wstring
*/
std::wstring s2ws(const std::string& src)
{
    std::wstring res = L"";
    size_t const wcs_len = mbstowcs(NULL, src.c_str(), 0);
    std::vector<wchar_t> buffer(wcs_len + 1);
    mbstowcs(&buffer[0], src.c_str(), src.size());
    res.assign(buffer.begin(), buffer.end() - 1);
    return res;
}

/*
wstring converts to string
*/
std::string ws2s(const std::wstring& src)
{
    setlocale(LC_CTYPE, "");
    std::string res = "";
    size_t const mbs_len = wcstombs(NULL, src.c_str(), 0);
    std::vector<char> buffer(mbs_len + 1);
    wcstombs(&buffer[0], src.c_str(), buffer.size());
    res.assign(buffer.begin(), buffer.end() - 1);
    return res;
}
If the second argument to setlocale is NULL, it does nothing apart from returning the current locale. But you're not doing that: you're passing an empty string, "", a string consisting of nothing but the terminating zero byte. My setlocale man page says:
If locale is an empty string, "", each part of the locale that should be modified is set according to the environment variables. The details are implementation-dependent.
So what this is doing for you is setting the locale to whatever the user has specified or to the system default.
Not calling setlocale at all leaves the program in the default "C" locale, which is why your program fails without that call.
Two other man pages, for functions you're using, say:
The behavior of mbstowcs() depends on the LC_CTYPE category of the current locale.
The behavior of wcstombs() depends on the LC_CTYPE category of the current locale.
Presumably these routines are what fails if you haven't set the locale at all.
I would guess that you probably don't need to run the setlocale statement on every invocation of these routines, but you do need to make sure it's run at least once before running them.
As far as what happens differently depending on the current locale: I believe that would be exactly how the multibyte string is converted to wide characters and vice versa. I think the man page for those routines stays vague because of that difference. Personally, I'd prefer it to give some examples, such as: "if the current locale is C, the multibyte string is ASCII characters." I would guess there's also at least one locale in which the string is interpreted as UTF-8, but I don't know enough about the different locales to say which one. There are probably also locales where the multibyte string happens to use another two-bytes-per-character encoding, but C and C++ would still treat it as bytes.
Edit: Thinking about this more, given the characters you added to the example code, it is worth stating explicitly that using a locale that does not support Chinese characters (including the default "C" locale) will cause the final printf to report a length of -1. In that case, the contents of the buffer are not clearly specified by the standard; my reading is that the buffer will probably hold all of the characters up to, but not including, the one that failed to convert. Neither the C++ documentation nor the C documentation states what happens to the character that could not be converted. I haven't paid for the official standards, but I do have copies of the last free drafts: C++17 defers to C17, and C17 also refrains from commenting on this aspect of the function. For wcsrtombs, it explicitly states that the conversion state is unspecified. For wcstombs_s, however, C17 states:
If the conversion stops without converting a null wide character and dst is not a null pointer, then a null character is stored into the array pointed to by dst immediately following any multibyte characters already stored.
In my own experiments with the code provided by the OP above, the wcstombs implementation on Fedora 28 simply refrains from making any further changes to the buffer. That suggests that, if the exact behavior matters in this situation, it may make sense to use wcstombs_s instead. At a minimum, you should check whether the returned length is -1, and if it is, report an error rather than assuming the conversion worked.
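To make that concrete, here is a sketch of the OP's example with that check added (the error message wording is mine):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    setlocale(LC_CTYPE, "");
    const wchar_t str[] = L"中国、wcstombs example";
    char buffer[64];

    size_t ret = wcstombs(buffer, str, sizeof(buffer));
    if (ret == (size_t)-1) {
        /* Hit a wide character the current locale cannot represent
           (e.g. the Chinese text under the default "C" locale). */
        fprintf(stderr, "wcstombs failed in this locale\n");
        return 1;
    }
    if (ret == sizeof(buffer))
        buffer[sizeof(buffer) - 1] = '\0'; /* buffer filled: no terminator was written */
    printf("length: %zu, multibyte string: %s \n", ret, buffer);
    return 0;
}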

Converting std::string to a const WCHAR *string required by a GDI+ parameter

Short version: I am using Unicode. I am attempting to pass a std::string to a function that requires a const WCHAR* string: DrawString(const WCHAR*, ...).
I compile with GCC, and I have specified Unicode everywhere.
I have been trying to convert a string into a wchar_t*, so that I can pass it to a GDI+ function whose parameters require it.
Here is how I have output string literals: no problems, it debugs fine, works fine.
http://msdn.microsoft.com/en-us/library/ms535991%28v=vs.85%29.aspx for reference why:
// works fine
wchar_t* wcBuff;
wcBuff = (wchar_t*)L"Some text here.\0";
AddString(wcBuff, wcslen(wcBuff), &gFontFamilyInfo, FontStyleBold, 20, ptOrg_Controls, &strFormat_Info);
Now this is what I have been trying, all day. A side note: my conversion function works fine elsewhere; it is not the issue, nor is it creating one.
// problems
string s = "Level " + convert::intToString(6) + "\0";
// try 1 - segfault
wchar_t* wcBuff = new wchar_t[s.length() + 1];
copy(s.begin(), s.end(), wcBuff);

// random tries: compiles, but access violations (my conversion function
// has worked in other places; I do not know for sure here)
wchar_t* wcBuff;
wstring wstr = convert::stringToWideChar(s);
wstring strvalue = convert::stringToWideChar(s);
wcBuff = (wchar_t*)strvalue.c_str();
wcBuff = (wchar_t*)wstr.c_str();

wstring foo;
foo.assign(s.begin(), s.end());
wcBuff = (wchar_t*)foo.c_str();
Everything compiles, but then presents problems. Some attempts produce runtime errors as soon as execution reaches that point, others access violations and segfaults. Some compile and debug with no problem, but the output string constantly changes to random characters.
Any ideas?
(this is not really an answer, but it's too big for a comment)
Try 1: you didn't null-terminate the string
Try 2: can't comment without seeing the conversion function. Remove the casts.
Try 3: Remove the casts, should be OK.
In all cases use wchar_t const *wcBuff. If "Try 3" fails, it means you have a bug somewhere else in your code that is showing up here. Try to produce an MCVE. You should be able to get it down to about 10-20 lines.
Even if you manage to write the correct code for what you're intending, this is a fairly naive conversion that doesn't handle characters outside the 0-127 range properly. You need to think about whether that is what you want, or whether you want to do a proper UTF-8 conversion, etc.
In Windows you can use MultiByteToWideChar.
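For illustration, a minimal sketch of that approach (the helper name ToWide is mine, and it assumes a toolchain where wchar_t is 16 bits; see the answer further down about gcc):

#include <windows.h>
#include <string>

// Sketch: UTF-8 std::string -> std::wstring via MultiByteToWideChar.
std::wstring ToWide(const std::string &utf8)
{
    if (utf8.empty())
        return std::wstring();
    // First call measures; second call converts.
    int needed = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                     (int)utf8.size(), NULL, 0);
    std::wstring wide(needed, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(),
                        &wide[0], needed);
    return wide;
}

With something like this in place you could write, for example, graphics.DrawString(ToWide(s).c_str(), -1, ...).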
#include <string>
#include <cwchar>  // wcscpy

int main()
{
    // Can use convenient wstring
    std::wstring wstr = L"My wide string";

    // When you need a wchar_t*, just do this
    const wchar_t* s = wstr.c_str();

    // wide-character form of strcpy
    wchar_t buf[100] = {0};
    wcscpy(buf, s);

    // And if you want to convert from string to wstring
    std::string thin = "I only take up one byte per character!";
    std::wstring wide(thin.begin(), thin.end());

    return 0;
}
I first get my data into a wstring. Like this:
(Converting from string):
std::string sString = "This is my string text";
std::wstring str1(sString.begin(), sString.end());
(Converting from int):
wstring str1 = std::to_wstring(BirthDate);
Then I use it in a GDI+ command like this:
graphics.DrawString(str1.c_str(), -1,
&font, PointF(10, 5), &st);
First things first: GDI+ is a C++ library. It uses the Microsoft C++ ABI, which is wildly incompatible with gcc, so you might just forget about using it. You can try to use WinAPI or any other library that uses C calling conventions.
Now for the wstring question. wchar_t is 32 bits wide under gcc's default configuration (MinGW builds targeting Windows are an exception), but Windows APIs require it to be 16 bits wide. You cannot use any native Windows call that expects wchar_t.
You can use the -fshort-wchar command-line option in gcc, which makes wchar_t 16 bits wide: you regain compatibility with Windows APIs but lose compatibility with libc, so no library functions that act on wchar_t for you. std::wstring will probably work, as it's header-only, but wprintf, wcscpy, and all the other compiled stuff won't.
None of this is detected at compile time, as the only things gcc sees are header files. It cannot tell whether corresponding libraries are compiled with 16-bit wchar_t or 32-bit wchar_t.
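One cheap way to surface the mismatch at build time (my suggestion, not from the original answer) is a C++11 static_assert in your own code:

// Fails the build unless wchar_t has the 16-bit width the Windows
// W-APIs assume.
static_assert(sizeof(wchar_t) == 2,
              "wchar_t is not 16 bits; build with -fshort-wchar or avoid W-APIs");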
You can use uint16_t when you need to pass an array of wchar_t to a Windows function. If you can use C++11, it has char16_t that you can use too. Here's an example that should work with Basic Multilingual Plane characters:
std::wstring myLittleNiceWstring;
...
std::vector<uint16_t> myUglyCompatibilityString;
std::copy(myLittleNiceWstring.begin(),
          myLittleNiceWstring.end(),
          std::back_inserter(myUglyCompatibilityString));
myUglyCompatibilityString.push_back(0);
UglyWindowsAPI(reinterpret_cast<WCHAR*>(myUglyCompatibilityString.data()));
If you have non-BMP characters, you need to convert UTF-32 to UTF-16 rather than just copy characters with std::copy. You can use libiconv for that, write a conversion routine yourself (it's rather simple), or just borrow some code from the internet.
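For reference, here is a minimal sketch of such a routine (mine, not the answer's); it splits non-BMP code points into surrogate pairs and does no validation of malformed input (lone surrogates, code points above U+10FFFF):

#include <cstdint>
#include <string>
#include <vector>

// Sketch: encode UTF-32 code points as UTF-16 units.
std::vector<uint16_t> Utf32ToUtf16(const std::wstring &src)
{
    std::vector<uint16_t> out;
    for (wchar_t wc : src) {
        uint32_t cp = static_cast<uint32_t>(wc);
        if (cp < 0x10000) {
            out.push_back(static_cast<uint16_t>(cp)); // BMP: one unit
        } else {
            cp -= 0x10000; // 20 bits left: split 10/10 across the pair
            out.push_back(static_cast<uint16_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<uint16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    out.push_back(0); // terminator for C-style consumers
    return out;
}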
It is my opinion that Windows-centric development with GCC is rather difficult because of this and other issues. You can use gcc as long as you stick to POSIX-ish APIs.

Win32 strange widechar behavior

I haven't used wide chars before. Here's the code from someone else:
char moduleFileName[512];
int size = ::GetModuleFileName(NULL, moduleFileName, sizeof(moduleFileName));

char c_drive[256];
char c_dir[256];
_splitpath_s(moduleFileName, c_drive, sizeof(c_drive), c_dir, sizeof(c_dir), NULL, 0, NULL, 0);
root = c_drive;
root.append(c_dir);

wchar_t moduleFileNameW[512];
int sizeW = ::GetModuleFileNameW(NULL, moduleFileNameW, sizeof(moduleFileNameW));

wchar_t w_drive[256];
wchar_t w_dir[256];
_wsplitpath_s(moduleFileNameW, w_drive, sizeof(w_drive), w_dir, sizeof(w_dir), NULL, 0, NULL, 0);
wroot = w_drive;
wroot.append(w_dir);

SEVEN_ZIP_EXE = wroot;
SEVEN_ZIP_EXE += L"\\7z.exe";
The point is to set a variable to where the 7z.exe file is. When I debug it on my Windows 7 Prof. 64-bit system, I end up with what look to me like invalid chars for wroot, e.g.
﻾﻾﻾﻾﻾﻾﻾﻾﻾﻾﻾﻾﻾﻾﻾﻾...쳌쳌쳌쳌쳌쳌쳌쳌쳌쳌쳌쳌쳌쳌쳌쳌 (a long run of ﻾ followed by a long run of 쳌)
Change your sizeof's to _countof's and that will fix your strange data problem.
You don't need both sets of functions. Use whichever set is appropriate for the application.
If you need SEVEN_ZIP_EXE to be a wstring, then you can eliminate the char stuff.
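Concretely, the wide-character calls from the question become something like this (a sketch of the fix; _countof comes from stdlib.h in MSVC):

// Pass element counts, not byte counts.
wchar_t moduleFileNameW[512];
int sizeW = ::GetModuleFileNameW(NULL, moduleFileNameW, _countof(moduleFileNameW));

wchar_t w_drive[256];
wchar_t w_dir[256];
// _wsplitpath_s takes buffer sizes in characters, not bytes
_wsplitpath_s(moduleFileNameW, w_drive, _countof(w_drive),
              w_dir, _countof(w_dir), NULL, 0, NULL, 0);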
The problem is sizeof; if you really want to use sizeof, write sizeof(w_dir)/sizeof(wchar_t).
I gather what you are trying to do is get the directory containing your executable.
Rather than using the clumsy splitpath, I suggest the following:
TCHAR szBuff[FILENAME_MAX];
TCHAR *ShortName;
GetFullPathName(moduleFileName, FILENAME_MAX, szBuff, &ShortName);
*(ShortName-1) = '\0'; // remove exe from path
szBuff will contain the directory, and ShortName points to the name.
The above code uses TCHAR and #define UNICODE, but you could change the function names to the wchar_t names.

Defining a string with no null terminating char(\0) at the end

What are various ways in C/C++ to define a string with no null terminating char(\0) at the end?
EDIT: I am interested in character arrays only, not in STL strings.
Typically as another poster wrote:
char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};
or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)
char s[6] = {115, 116, 114, 105, 110, 103};
There is also a largely ignored way that works only in C (not C++)
char s[6] = "string";
If the array size is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied; it's still valid C (but invalid C++).
Obviously you can also do it at run time:
char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';
or (same remark on ASCII charset as above)
char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;
Or use memcpy (or memmove, or bcopy, though in this case there is no benefit in doing so):
memcpy(s, "string", 6);
or strncpy:
strncpy(s, "string", 6);
What should be understood is that there is no such thing as a string in C (in C++ there are string objects, but that's a completely different story). So-called strings are just char arrays. And even the name char is misleading; it is not a character but just a kind of numerical type. We could probably have called it byte instead, but in the old times there was strange hardware around using 9-bit registers and such, and byte implies 8 bits.
As char is very often used to store a character code, C designers thought of a simpler way than storing a number in a char: you can put a letter between single quotes and the compiler will understand that it must store that character's code in the char.
What I mean is (for example) that you don't have to do
char c = '\0';
To store a code 0 in a char, just do:
char c = 0;
As we very often have to work with a bunch of chars of variable length, C designers also chose a convention for "strings": just put a code 0 where the text ends. There is a name for this kind of string representation, "zero-terminated string", and if you see the two letters sz at the beginning of a variable name, it usually means its content is a zero-terminated string.
"C sz strings" is not a type at all, just an array of chars as normal as, say, an array of int, but string manipulation functions (strcmp, strcpy, strcat, printf, and many many others) understand and use the 0 ending convention. That also means that if you have a char array that is not zero terminated, you shouldn't call any of these functions as it will likely do something wrong (or you must be extra carefull and use functions with a n letter in their name like strncpy).
The biggest problem with this convention is that there are many cases where it's inefficient. One typical example: you want to put something at the end of a zero-terminated string. If you had kept the size, you could jump straight to the end of the string; with the sz convention, you have to check it char by char. Other kinds of problems occur when dealing with encoded Unicode and such. But at the time C was created, this convention was very simple and did the job perfectly.
Nowadays, the letters between double quotes like "string" are not plain char arrays as in the past, but const char*. That means the pointer points to a constant that should not be modified (if you want to modify it, you must first copy it), and that is a good thing because it helps detect many programming errors at compile time.
The terminating null is there to terminate the string. Without it, you need some other method to determine its length.
You can use a predefined length:
char s[6] = {'s','t','r','i','n','g'};
You can emulate pascal-style strings:
unsigned char s[7] = {6, 's','t','r','i','n','g'};
You could use std::string in C++ (but you said you're not interested in that).
Preferably you would use some pre-existing technology that handles Unicode, or at least understands string encoding (e.g., wchar.h).
And a comment: if you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string" type. This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.
typedef struct {
    char characters[10];
} ThisIsNotACString;
C++ std::strings are length-based rather than NUL-terminated (although c_str() does hand you a NUL-terminated buffer).
P.S.: NULL is a macro [1]. NUL is '\0'. Don't mix them up.
[1] C.2.2.3 Macro NULL: "The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>, <ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International Standard (18.1)."
In C++ you can use the string class and not deal with the null char at all.
Just for the sake of completeness, and to nail this down completely: vector<char>.
Use std::string.
There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).
In C there generally won't be an easier solution. You could do what Pascal did and put the length of the string in the first character, but that is a bit of a pain and will limit your string length to the largest value that fits in a char (typically 255).
In C++ I'd definitely use the std::string class that can be accessed by
#include <string>
Being a commonly used library this will almost certainly be more reliable than rolling your own string class.
The reason for the NUL termination is so that the handler of the string can determine its length. If you don't use NUL termination, you need to pass the string's length, either through a separate parameter/variable or as part of the string. Otherwise, you could use another delimiter, so long as it isn't used within the string itself.
To be honest, I don't quite understand your question, or if it actually is a question.
Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.
I can't personally think of any realistic scenario for why you'd want to do this, since the null character is what signals the end of the string. If you're also storing the length of the string, then I guess you've saved one byte at the cost of the size of your length variable (likely 4 bytes), and gained faster access to the length of said string.

How to convert AS3 ByteArray into wchar_t const* filename? (Adobe Alchemy)

How to convert AS3 ByteArray into wchar_t const* filename?
So in my C code I have a function expecting a filename, void fun(wchar_t const* filename); how do I send my ByteArray to that function? (Or: how should I rewrite my function?)
A four month old question. Better late than never?
To convert a ByteArray to a String in AS3, there are two ways, depending on how the String was stored. First, if you use writeUTF, it writes an unsigned short with the String's length first, then the string data. The string is easiest to recover this way. Here's how it's done in AS3:
byteArray.position = 0;
var str:String = byteArray.readUTF();
Alternatively, toString also works in this case. The second way to store it is with writeUTFBytes. This won't write the length at the beginning, so you'll need to track it independently somehow. It sounds like you want the entire ByteArray to be a single String, so you can use the ByteArray's length:
byteArray.position = 0;
var str:String = byteArray.readUTFBytes(byteArray.length);
Since you want to do this with Alchemy, you just need to convert the above code. Here's a conversion of the second example:
#include <string>

std::string batostr(AS3_Val byteArray)
{
    AS3_SetS(byteArray, "position", AS3_Int(0));
    return AS3_StringValue(AS3_CallS("readUTFBytes", byteArray,
        AS3_Array("AS3ValType", AS3_GetS(byteArray, "length"))));
}
This has a ridiculous amount of memory leaks, of course, since I'm not calling AS3_Release anywhere. I use a RAII wrapper for AS3_Val in my own code... for the sake of my sanity. As should you.
Anyway, the std::string my function returns will be UTF-8 multibyte. Any standard C++ technique for converting to wide characters should work from here (search the site for endless reposts). I suggest leaving it as it is, though; I can't think of any advantage to using wide characters on this platform.
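If you do end up needing the wchar_t const* for the existing fun(wchar_t const*) signature, here is one standard-library sketch (the helper name is mine, and it assumes a UTF-8 locale has already been set and accepted):

#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Sketch: UTF-8 std::string -> std::wstring via mbstowcs.
// Assumes something like setlocale(LC_CTYPE, "en_US.UTF-8") succeeded earlier.
std::wstring Utf8ToWide(const std::string &utf8)
{
    size_t wideLen = mbstowcs(NULL, utf8.c_str(), 0); // measure first
    if (wideLen == (size_t)-1)
        return std::wstring(); // byte sequence invalid in this locale
    std::vector<wchar_t> buf(wideLen + 1);
    mbstowcs(&buf[0], utf8.c_str(), buf.size());
    return std::wstring(&buf[0], wideLen);
}

// Usage with the original function:
//   fun(Utf8ToWide(batostr(byteArray)).c_str());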