Convert dlang char/wchar to string/wstring - d

How do I convert a single char/wchar to a single-character string/wstring in d? I can't find anything online that doesn't talk about char* or wchar*.

As strings are just immutable(char)[], you can construct them like any other array with chars:
char a = 'a';
string s = [a];

There are a few different options. One is to get a pointer by just taking the address of the char. You generally shouldn't use this, but you should be aware that it is possible.
char a = 'a';
char[] b = (&a)[0 .. 1]; // &a gets a pointer, [0..1] slices the single element
string c = b.idup; // copy it into a new string
If you used a wchar you could get a wstring out of it this way. Then std.conv.to can convert between string and wstring.
Speaking of std.conv.to, that's the next option and is actually the easiest:
import std.conv;
char a = 'a'; // or wchar
string b = to!string(a); // or to!wstring
In the real world I'd probably suggest you use this for maximum convenience and simplicity, but you lose a bit of efficiency in some cases.
Thus, the third option I'll present is std.utf.encode.
import std.stdio; // for writeln
import std.utf;
char a = 'a'; // or wchar / dchar
char[4] buffer;
auto len = encode(buffer, a); // put the char in the buffer
writeln(buffer[0 .. len]); // slice the buffer. idup it if you want string specifically
This works for any input: char, wchar, or dchar, and will encode multi-byte code points into the string as well. To get a wstring, use wchar[2] for the buffer instead. This is a good balance of correctness and efficiency, at the cost of being a little less convenient.

Related

Convert std::wstring to WCHAR array

I have been searching the internet for days about this question. I have made a win32 project in which I want to convert a wstring to a WCHAR array
Please give an example
If you find out any mistakes please give an example
wstring timeNow = L"Hello";
WCHAR timeWchar[6] = {(WCHAR)timeNow.c_str()}; // Not Working
Instead of the text I see only a square when I run my program
I assume that by WCHAR, you mean wchar_t.
You can loop over the array and assign the elements.
Or if you don't feel like writing the loop yourself, you can use an algorithm from the standard library. Example:
assert(timeNow.size() < 6);
wchar_t timeWchar[6] {};
std::ranges::copy(timeNow, timeWchar);
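A minimal sketch of the same copy that truncates instead of relying on the assert, and always writes the terminator. The helper name copy_to_array is hypothetical, not a standard library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy `src` into a fixed-size wchar_t array,
// truncating if necessary and always writing a terminating L'\0'.
// Returns the number of characters copied (terminator not counted).
template <std::size_t N>
std::size_t copy_to_array(const std::wstring& src, wchar_t (&dst)[N]) {
    static_assert(N > 0, "destination needs room for the terminator");
    const std::size_t n = std::min(src.size(), N - 1);
    assert(n < N);                      // the terminator always fits
    std::copy_n(src.begin(), n, dst);   // copy at most N - 1 characters
    dst[n] = L'\0';
    return n;
}
```

With wchar_t timeWchar[6], calling copy_to_array(timeNow, timeWchar) copies all of L"Hello"; a longer string would be cut to five characters rather than overflowing the array.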
You can't really initialize an array with a pointer.
This is closer to what you want, without copying the string into an array. Just use a pointer to reference the chars.
wstring timeNow = L"Hello";
const WCHAR* timeWchar = timeNow.c_str();
99% of the time, the above works for whatever you need to do assuming you don't need to modify the chars.
If you really need to make a copy of the characters into a different array, such as when you need to manipulate the string, this will do just fine on Windows, assuming you have a fixed-size array that is big enough to receive the copy.
wstring timeNow = L"Hello";
WCHAR timeWchar[6];
StringCchCopyW(timeWchar, ARRAYSIZE(timeWchar), timeNow.c_str());
If you don't know the length ahead of time, then you'll need to allocate it before making the copy:
size_t allocSize = timeNow.size() + 1;
WCHAR* timeWchar = new WCHAR[allocSize]; // new[] returns a pointer, so timeWchar must be WCHAR*
StringCchCopyW(timeWchar, allocSize, timeNow.c_str());
/* don't forget to `delete [] timeWchar` when you are done */
But then again why mess with new and delete when you can just let C++ do the work for you. Make a copy of the string and then use a pointer to reference the characters in the copy.
wstring timeNowCopy = timeNow;
const WCHAR* timeWchar = timeNowCopy.c_str(); // timeWchar points to the array copied into timeNowCopy.
If you can maintain the lifetime of timeNowCopy and timeWchar on the stack together, then you don't need to explicitly new or delete anything.
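If the copy has to be writable, one way to avoid new and delete is to hold it in a std::vector<wchar_t>. This is only a sketch; the helper name make_writable_copy is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns a writable, null-terminated copy of `s`.
// The vector releases its memory automatically, and buf.data() can be
// handed to code that expects a wchar_t* (WCHAR* on Windows).
std::vector<wchar_t> make_writable_copy(const std::wstring& s) {
    std::vector<wchar_t> buf(s.begin(), s.end());
    buf.push_back(L'\0');                 // explicit terminator
    assert(buf.size() == s.size() + 1);
    return buf;
}
```

The characters can then be modified through buf.data() or buf[i] for as long as the vector is alive.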
Just use its .data() member.
WCHAR timeWchar[6] = ...
won't work anyway.
"in which I want to convert a wstring to a WCHAR array"
If you mean the wchar_t[6] type, you simply don't need it.
There is no situation where you can use a wchar_t[6] but can't use wstr.data().

How to convert wstring into byte vector

Hi I have a few typedefs:
typedef unsigned char Byte;
typedef std::vector<Byte> ByteVector;
typedef std::wstring String;
I need to convert String into ByteVector, I have tried this:
String str = L"123";
ByteVector vect(str.begin(), str.end());
As a result the vector contains 3 elements: 1, 2, 3. However, it is a wstring, so every character in this string is wide, and my expected result would be: 0, 1, 0, 2, 0, 3.
Is there any standard way to do that, or do I need to write a custom function?
Byte const* p = reinterpret_cast<Byte const*>(&str[0]);
std::size_t size = str.size() * sizeof(str.front());
ByteVector vect(p, p+size);
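Note that the cast above yields sizeof(wchar_t) bytes per character (commonly 2 on Windows and 4 on Linux) in the platform's native byte order, so the output is not necessarily two bytes per character with the high byte first. If that fixed layout is what is wanted, a sketch like the following makes it explicit. The name to_bytes_be16 is hypothetical, and the code assumes every character value fits in 16 bits.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef unsigned char Byte;
typedef std::vector<Byte> ByteVector;

// Hypothetical helper: emit each character as two bytes, high byte first.
// Assumption: every character value fits in 16 bits.
ByteVector to_bytes_be16(const std::wstring& str) {
    ByteVector out;
    out.reserve(str.size() * 2);
    for (wchar_t ch : str) {
        const unsigned long v = static_cast<unsigned long>(ch);
        assert(v <= 0xFFFF);                                // stated assumption
        out.push_back(static_cast<Byte>((v >> 8) & 0xFF));  // high byte
        out.push_back(static_cast<Byte>(v & 0xFF));         // low byte
    }
    return out;
}
```

For L"123" this gives 0x00, 0x31, 0x00, 0x32, 0x00, 0x33, because the characters '1', '2', '3' have the values 0x31, 0x32, 0x33.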
What is your actual goal? If you just want to get the bytes representing the wchar_t objects, a fairly trivial conversion would do the trick, although I wouldn't use just a cast to unsigned char const* but rather an explicit conversion.
On the other hand, if you actually want to convert the std::wstring into a sequence encoded using e.g. UTF8 or UTF16 as is usually the case when dealing with characters, the conversion used for the encoding becomes significantly more complex. Probably the easiest approach to convert to an encoding is to use C's wcstombs():
std::vector<char> target(source.size() * 4);
size_t n = wcstombs(&target[0], &source[0], target.size());
The above fragment assumes that source isn't empty and that the last wchar_t in source is wchar_t(). The conversion uses C's global locale and assumes to convert whatever character encoding is set up there. There is also a version wcstombs_l() where you can specify the locale.
C++ has similar functionality but it is a bit harder to use in the std::codecvt<...> facet. I can provide an example if necessary.

Assigning strings of any size to a pointer to char

Before all, I must state that I'm a beginner with C++ and programming overall.
I'll get straight to the point. I'm wondering if it's possible to assign a string of characters of any size to a pointer to a character (not arrays, just a char * pointer). Would that violate any Memory Addresses?
The book I'm learning from doesn't seem to say anything about that. I can't seem to find anything on Google either.
You have your character pointer and want to dynamically create C strings
char *str;
say. This pointer will be used to point to the first character of the string. The string is a series of sequential characters (bytes) in memory. What we want to achieve in memory is this:
str -> +---+---+---+---+---+----+
       | H | E | L | L | O | \0 |
       +---+---+---+---+---+----+
Note the final byte - This byte has the value 0 and is called the null character - it represents the end of the string and enables one to easily know when we have come to the end.
To give str a value we allocate this memory. In C++ this is done by the new operator like this
str = new char[6];
Note new has two versions, new[] and new: one allocates an array of objects, the other allocates a single object. ALWAYS use delete[] when you have allocated with new[]; similarly, pair new with delete. DO NOT MIX new[] with delete, or new with delete[].
This will allocate an array of 6 characters to place the string into. To place the characters into the string we could do this.
str[0] = 'H';
str[1] = 'E';
...
str[5] = 0;
But this would be tedious. Instead we can use strcpy to do this for us:
strcpy(str, "hello");
It knows all about the null character. There is a range of functions that operate on these types of strings; see the <cstring> header (string.h in C).
These are C strings. Once upon a time somebody invented a new language called C++. This language uses a different idea called objects that makes this stuff a lot easier. You need to look at the standard library (often still called the STL), and in particular at std::string from the <string> header. There are lots of other goodies in the standard library as well.
Hope this helps
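As a rough illustration of the difference, here is a sketch that builds the same text once with a manually managed char* and once with std::string. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Manual version: the caller must release the result with delete[].
char* make_c_string(const char* text) {
    assert(text != nullptr);
    char* str = new char[std::strlen(text) + 1];  // +1 for the terminating '\0'
    std::strcpy(str, text);                       // copies the '\0' as well
    return str;
}

// std::string version: memory is allocated and released automatically,
// and the length is stored rather than found by scanning for '\0'.
std::string make_string(const char* text) {
    assert(text != nullptr);
    return std::string(text);
}
```

With the first version every call needs a matching delete[]; with the second there is nothing to clean up.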
A char pointer can point to a string of any length, because the length of the string is determined by when you run into a NUL (0) byte in the string. When you store strings this way, it becomes a C-string. For instance:
const char* str = NULL; // at this point,
// doesn't point to anything (not even a string)
str = ""; // valid
str = "a"; // valid
str = "hello"; // valid
str = "farewell, cruel world"; // valid

connecting chars*

How would I connect two char* strings to each other?
For example:
char* a="Heli";
char* b="copter";
How would I connect them into one char* c which should be equal to "Helicopter"?
strncat
Or use strings.
size_t newlen = strlen(a) + strlen(b);
char *r = malloc(newlen + 1);
strcpy(r, a);
strcat(r, b);
In C++:
std::string foo(a);
std::string bar(b);
std::string result = foo+bar;
If your system has asprintf() (pretty common these days), then it's easy:
char* p;
int num_chars = asprintf(&p, "%s%s", a, b);
The second argument is a format string akin to printf(), so you can mix in constant text, ints, doubles etc., controlling field widths and precision, padding characters, justification etc. If num_chars != -1 (-1 signals an error), then p points to heap-allocated memory that can be released with free(). Using asprintf() avoids the relatively verbose and error-prone steps of calculating the required buffer size yourself.
In C++:
std::string result = std::string(a) + b;
Note: a + b adds two pointers - not what you want, hence at least one side of the + operator needs to see a std::string, which will ensure the string-specific concatenation operator is used.
(The accepted answer of strncat is worth further comment: it can be used to concatenate more textual data after an ASCIIZ string in an existing, writeable buffer, in-so-much as that buffer has space to spare. You can't safely/portably concatenate onto a string literal, and it's still a pain to create such a buffer. If you do it using malloc() to ensure it's exactly the right length, then strcat() can be used in preference to strncat() anyway.)
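A sketch of the exactly-sized-buffer approach, written in C++ so that the buffer is released automatically. The name concat_cstr is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: concatenate two C strings into a freshly allocated
// buffer of exactly the right size. The unique_ptr calls delete[] for us.
std::unique_ptr<char[]> concat_cstr(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    const std::size_t la = std::strlen(a);
    const std::size_t lb = std::strlen(b);
    std::unique_ptr<char[]> r(new char[la + lb + 1]);  // +1 for '\0'
    std::memcpy(r.get(), a, la);                       // a without its terminator
    std::memcpy(r.get() + la, b, lb + 1);              // b including its terminator
    return r;
}
```

Calling concat_cstr(a, b).get() yields a const-correct pointer to the joined text, valid for as long as the returned unique_ptr is kept alive.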

how to convert char * to uchar16 in JNI C++

here's what I am trying to do:
typedef uint16_t uchar16_t;
uchar16_t buf[32];
// buf will contain timezone information like GMT-6, Eastern Daylight Time, etc
char * str = "Test";
for (int i = 0; i <= strlen(str); i++)
buf[i] = str[i];
I guess that's not correct since uchar16_t would contain 2 bytes and str contains 1 byte.
What is it that I am supposed to do ?
Strlen? buf[32]? Trying to destroy the universe?
You want to use a wstringstream.
std::wstringstream lols;
lols << "Test";
std::wstring cakes;
lols >> cakes;
Edit (in reply to a comment):
You shouldn't use strlen because any decent string system allows embedded zeros, and strlen is seriously slow. In addition, you didn't resize your buffer as needed, so if you had a string of size > 31 you would get a buffer overflow. In addition, you would have to (if you did dynamically size your buffer) manually free it afterwards. Both of these things are serious failings of the C string system. My example code makes your standard library writer do all the work and avoid all these problems for you.
That's actually OK if your string will always be ASCII. To do it correctly, the portable function is mbstowcs which assumes you're converting from the default locale or if you're on Windows then there's API functions that let you specify the source code page explicitly.
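For the ASCII-only case, a sketch of the element-by-element copy with the length computed once and the buffer size checked. The helper name widen_ascii is hypothetical, and rejecting non-ASCII bytes is a design choice made here, not something the question requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef std::uint16_t uchar16_t;

// Hypothetical helper: widen an ASCII C string into a buffer of `cap`
// uchar16_t elements. Returns false if the text (plus terminator) does
// not fit or if a non-ASCII byte is found.
bool widen_ascii(const char* src, uchar16_t* dst, std::size_t cap) {
    assert(src != nullptr && dst != nullptr);
    const std::size_t len = std::strlen(src);  // computed once, not per iteration
    if (len >= cap) return false;              // need room for the terminator
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c > 0x7F) return false;            // not ASCII: needs a real conversion
        dst[i] = c;
    }
    dst[len] = 0;
    return true;
}
```

When this returns false because of a non-ASCII byte, a locale-aware conversion such as mbstowcs is the next thing to look at.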
Your code will work, as long as str is ASCII; calling strlen() in the loop condition is probably a bad idea, though. It might be easier to just use swprintf() if it's available on your system:
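uchar16_t buf[32];
const char *str = "Test";
// swprintf needs a wchar_t buffer, a size counted in elements, and a wide format string.
// The cast assumes wchar_t is 16 bits wide (as on Windows); %hs is the Microsoft specifier for a narrow string argument.
swprintf((wchar_t *)buf, sizeof buf / sizeof buf[0], L"%hs", str);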
Have a look here.
Also, is there a good reason you are defining your own type?
If you have a (narrow) char string, you cannot convert it to a wchar_t string by setting your locale to "C" and then passing the string through mbstowcs(). That's because the "C" locale specifies a -particular- character encoding, and that encoding might not match the encoding of the execution character set, so mbstowcs() might map the characters to something unexpected, or could even fail (if the execution character set happened to use encodings that were incompatible with the encoding structure for the C locale character set.)
Thus, in order to convert a char string into a wider string, you have to copy the chars one by one into an array of wchar_t. If you need to work with Unicode or utf-16 or whatever after that, then wcstombs() is what you should look at.