Converting an unsigned char (BYTE) array to const wchar_t* (LPCWSTR)

Alright so I have a BYTE array that I need to ultimately convert into a LPCWSTR or const WCHAR* to use in a built in function. I have been able to print out the BYTE array with printf but now that I need to convert it into a string I am having problems... mainly that I have no idea how to convert something like this into a non array type.
BYTE ba[0x10];
for(int i = 0; i < 0x10; i++)
{
printf("%02X", ba[i]); // Outputs: F1BD2CC7F2361159578EE22305827ECF
}
So I need to have this same thing basically but instead of printing the array I need it transformed into a LPCWSTR or WCHAR or even a string. The main problem I am having is converting the array into a non array form.

LPCWSTR represents a UTF-16 encoded string. The array contents you have shown are outside the 7-bit ASCII range, so unless the BYTE array is already encoded in UTF-16 (the array you showed is not, but if it were, you could just use a simple type-cast), you will need to do a conversion to UTF-16. You need to know the particular encoding of the array before you can do that conversion, such as with the Win32 API MultiByteToWideChar() function, or third-party libraries like iconv or ICU, or built-in locale converters in C++11, etc. So what is the actual encoding of the array, and where is the array data coming from? It is not UTF-8, for instance, so it has to be something else.

Alright I got it working. Now I can convert the BYTE array to a char* var. Thanks for the help guys but the formatting wasn't a large problem in this instance. I appreciate the help though, it's always nice to have some extra input.
#include <string.h> // strcat
// Helper function: writes the two hex digits of ch, plus a terminator, into szHex (at least 3 chars)
void Char2Hex(unsigned char ch, char* szHex)
{
unsigned char byte[2];
byte[0] = ch/16;
byte[1] = ch%16;
for(int i = 0; i < 2; i++)
{
if(byte[i] <= 9)
{
szHex[i] = '0' + byte[i];
}
else
szHex[i] = 'A' + byte[i] - 10;
}
szHex[2] = 0;
}
// Function used throughout code to convert; pszHexStr must hold at least 2*iSize + 1 chars
void CharStr2HexStr(unsigned char const* pucCharStr, char* pszHexStr, int iSize)
{
int i;
char szHex[3];
pszHexStr[0] = 0;
for(i = 0; i < iSize; i++)
{
Char2Hex(pucCharStr[i], szHex);
strcat(pszHexStr, szHex);
}
}
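Since the original question asked for an LPCWSTR rather than a char*, here is a minimal sketch of the same hex formatting done straight into a std::wstring. The function name bytes_to_hex_wstring is made up for illustration and is not part of any Windows API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: format each byte as two uppercase hex digits in a wide string.
std::wstring bytes_to_hex_wstring(const unsigned char* data, std::size_t size) {
    static const wchar_t digits[] = L"0123456789ABCDEF";
    std::wstring out;
    out.reserve(size * 2);
    for (std::size_t i = 0; i < size; ++i) {
        out.push_back(digits[data[i] >> 4]);    // high nibble
        out.push_back(digits[data[i] & 0x0F]);  // low nibble
    }
    return out;
}
```

On Windows, LPCWSTR is a const wchar_t*, so the result of c_str() on the returned string can be passed to an API call for as long as the std::wstring object stays alive.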

Related

Encoding Vietnamese characters from ISO88591, UTF8, UTF16BE, UTF16LE, UTF16 to Hex and vice versa using C++

I have edited my post. Currently what I'm trying to do is to encode an input string from the user and then convert it to Hex formats. I can do it properly if it does not contain any Vietnamese character.
It works if my inputString is "Hello", but when I try to input a string such as "Tôi", I don't know how to do it.
enum Encodings { USASCII, ISO88591, UTF8, UTF16BE, UTF16LE, UTF16, BIN, OCT, HEX };
switch (Encodings)
{
case USASCII:
ASCIIToHex(inputString, &ascii); //hello output 48656C6C6F
return new ByteField(ascii.c_str());
case ISO88591:
ASCIIToHex(inputString, &ascii);//hello output 48656C6C6F
//tôi output 54F469
return new ByteField(ascii.c_str());
case UTF8:
ASCIIToHex(inputString, &ascii);//hello output 48656C6C6F
//tôi output 54C3B469
return new ByteField(ascii.c_str());
case UTF16BE:
ToUTF16(inputString, &ascii, Encodings);//hello output 00480065006C006C006F
//tôi output 005400F40069
return new ByteField(ascii.c_str());
case UTF16:
ToUTF16(inputString, &ascii, Encodings);//hello output FEFF00480065006C006C006F
//tôi output FEFF005400F40069
return new ByteField(ascii.c_str());
case UTF16LE:
ToUTF16(inputString, &ascii, Encodings);//hello output 480065006C006C006F00
//tôi output 5400F4006900
return new ByteField(ascii.c_str());
}
void StringUtilLib::ASCIIToHex(std::string s, std::string * result)
{
int n = s.length();
for (int i = 0; i < n; i++)
{
unsigned char c = s[i];
long val = long(c);
std::string bin = "";
while (val > 0)
{
(val % 2) ? bin.push_back('1') :
bin.push_back('0');
val /= 2;
}
reverse(bin.begin(), bin.end());
result->append(ConvertBinToHex(bin));
}
}
std::string ToUTF16(std::string s, std::string * result, int encodings) {
int n = s.length();
if (encodings == UTF16) {
result->append("FEFF");
}
for (int i = 0; i < n; i++)
{
int val = int(s[i]);
std::string bin = "";
while (val > 0)
{
(val % 2) ? bin.push_back('1') :
bin.push_back('0');
val /= 2;
}
reverse(bin.begin(), bin.end());
if (encodings == UTF16 || encodings == UTF16BE) {
result->append("00" + ConvertBinToHex(bin));
}
if (encodings == UTF16LE) {
result->append(ConvertBinToHex(bin) + "00");
}
}
}
std::string ConvertBinToHex(std::string str) {
long long temp = atoll(str.c_str());
int dec_value = 0;
int base = 1;
int i = 0;
while (temp) {
int last_digit = temp % 10;
temp = temp / 10;
dec_value += last_digit * base;
base = base * 2;
}
char hexaDeciNum[10];
while (dec_value != 0)
{
int temp = 0;
temp = dec_value % 16;
if (temp < 10)
{
hexaDeciNum[i] = temp + 48;
i++;
}
else
{
hexaDeciNum[i] = temp + 55;
i++;
}
dec_value = dec_value / 16;
}
str.clear();
for (int j = i - 1; j >= 0; j--) {
str = str + hexaDeciNum[j];
}
return str;
}

The question is completely unclear. To encode something you need an input right? So when you say "Encoding Vietnamese Character to UTF8, UTF16" what's your input string and what's the encoding before converting to UTF-8/16? How do you input it? From file or console?
And why on earth are you converting to binary and then to hex? You can print directly to binary and hex from the bytes, no need to convert from binary to hex. Note that converting to binary like that is fine for testing but vastly inefficient in production code. I also don't know what you mean by "But what if my letter is "Á" or "À" which is a Vietnamese letter I cannot get the value of it". Please show a minimal, reproducible example along with the input/output
But I think you just want to output the UTF encoded bytes from a string literal in the source code like "ÁÀ". In that case it isn't called "encoding a string" but just "outputting a string"
Both Á and À in Unicode can be represented by precomposed characters (U+00C1 and U+00C0) or combining characters (A + U+0301 ◌́/U+0300 ◌̀). You can switch between them by selecting "Unicode dựng sẵn" or "Unicode tổ hợp" in Unikey. Suppose you have those characters in string literal form then std::string str = "ÁÀ" contains a series of bytes that corresponds to the above letters in the source file encoding. So depending on which encoding you save the *.cpp file as (CP1252, CP1258, UTF-8...), the output byte values will be different
To force UTF-8/16/32 encoding you just need to use the u8, u and U prefix respectively, along with the correct type (char8_t, char16_t, char32_t or std::u8string/std::u16string/std::u32string)
std::u8string utf8 = u8"ÁÀ";
std::u16string utf16 = u"ÁÀ";
std::u32string utf32 = U"ÁÀ";
Then just use c_str() to get the underlying buffers and print the bytes. Before C++20 std::u8string is not available, so just save the file as UTF-8 and use std::string. For a user-input string, note that std::cin yields plain char bytes in whatever encoding the console or locale uses, so those bytes have to be converted before they can be stored in a std::u*string
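As a rough sketch of "print the bytes", the helpers below hex-dump a std::u16string in big-endian order and the UTF-8 encoding of a std::u32string. The names append_hex_byte, utf16be_hex and utf8_hex are invented for this example, the UTF-8 encoder does no validation (surrogates and out-of-range values are not rejected), and \u escapes are used so the result does not depend on how the source file is saved.

```cpp
#include <string>

// Append one byte as two uppercase hex digits.
static void append_hex_byte(std::string& out, unsigned value) {
    static const char digits[] = "0123456789ABCDEF";
    out.push_back(digits[(value >> 4) & 0xF]);
    out.push_back(digits[value & 0xF]);
}

// Hex dump of UTF-16 code units in big-endian byte order, without a BOM.
std::string utf16be_hex(const std::u16string& s) {
    std::string out;
    for (char16_t unit : s) {
        append_hex_byte(out, unit >> 8);
        append_hex_byte(out, unit & 0xFF);
    }
    return out;
}

// Hex dump of the UTF-8 encoding of a sequence of code points (no validation).
std::string utf8_hex(const std::u32string& s) {
    std::string out;
    for (char32_t cp : s) {
        if (cp < 0x80) {
            append_hex_byte(out, cp);
        } else if (cp < 0x800) {
            append_hex_byte(out, 0xC0 | (cp >> 6));
            append_hex_byte(out, 0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            append_hex_byte(out, 0xE0 | (cp >> 12));
            append_hex_byte(out, 0x80 | ((cp >> 6) & 0x3F));
            append_hex_byte(out, 0x80 | (cp & 0x3F));
        } else {
            append_hex_byte(out, 0xF0 | (cp >> 18));
            append_hex_byte(out, 0x80 | ((cp >> 12) & 0x3F));
            append_hex_byte(out, 0x80 | ((cp >> 6) & 0x3F));
            append_hex_byte(out, 0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With these, utf16be_hex(u"T\u00F4i") gives 005400F40069 and utf8_hex(U"T\u00F4i") gives 54C3B469, which matches the outputs listed in the question for "Tôi".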
Edit:
To convert between UTF encodings use the standard std::codecvt, std::wstring_convert, std::codecvt_utf8_utf16... (note that std::wstring_convert and the std::codecvt_utf8* facets are deprecated since C++17)
Working on non-Unicode encodings is trickier and needs some external library like ICU or OS-dependent APIs
WideCharToMultiByte and MultiByteToWideChar on Windows
iconv on Linux
Limiting to ISO-8859-1 makes it easier but you still need many lookup tables, and there's no way to convert other encodings to ASCII without loss of information

-64 is the correct representation of À if you are using signed char and CP1258. If you want a positive number you need to cast to unsigned char first.
If you are indeed using CP1258, you are probably on Windows. To convert your input string to UTF-16, you probably want to use a Windows platform API such as MultiByteToWideChar which accepts a code page parameter (of course you have to use the correct code page). Alternatively you may try a standard function like mbstowcs but you need to set up your locale correctly before using it.
You might find it easier to switch to wide characters throughout your application, and avoid most transcoding.
As a side note, converting an integer to binary only to convert that to hexadecimal is not an easy or efficient way to display a hexadecimal representation of an integer.
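To illustrate both points (the unsigned char cast and going to hexadecimal without the binary detour), here is a small sketch. bytes_to_hex is a hypothetical name, and the function only dumps whatever bytes the string already holds; it does not transcode anything.

```cpp
#include <string>

// Hypothetical helper: hex-dump the bytes of a narrow string directly.
std::string bytes_to_hex(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (char c : s) {
        // Cast first: a plain char holding 0xC0 may read as -64 when char is signed.
        unsigned char b = static_cast<unsigned char>(c);
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

A string containing the single byte -64 comes out as C0, the same two digits you would get for 192.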

substitute strlen with sizeof for c-string

I want to use mbstowcs_s method but without iostream header. Therefore I cannot use strlen to predict the size of my buffer. The following method has to simply change c-string to wide c-string and return it:
wchar_t* changeToWide(char* value)
{
wchar_t* vOut = new wchar_t[strlen(value)+1];
mbstowcs_s(NULL,vOut,strlen(value)+1,value,strlen(value));
return vOut;
}
As soon as I change it to
wchar_t* changeToWide(char* value)
{
wchar_t* vOut = new wchar_t[sizeof(value)];
mbstowcs_s(NULL,vOut,sizeof(value),value,sizeof(value)-1);
return vOut;
}
I get wrong results (values are not the same in both arrays). What is the best way to work it out?
I am also open for other ideas how to make that conversion without using strings but pure arrays

Given a char* or const char* you cannot use sizeof() to get the size of the string pointed to by your char* variable. In this case, sizeof() will return you the number of bytes a pointer uses in memory (commonly 4 bytes in 32-bit architectures and 8 bytes in 64-bit architectures).
If you have an array of characters defined as array, you can use sizeof:
char text[] = "test";
auto size = sizeof(text); //will return you 5 because it includes the '\0' character.
But if you have something like this:
char text[] = "test";
const char* ptext = text;
auto size2 = sizeof(ptext); //will return you probably 4 or 8 depending on the architecture you are working on.
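A compact way to see the difference, as a sketch: the array size can be deduced only while the argument is still an array, whereas a pointer needs a run-time scan such as strlen (declared in <cstring>, not <iostream>). Both function names below are made up for the example.

```cpp
#include <cstddef>
#include <cstring>

// Size in chars of a real array, deduced at compile time (includes the '\0').
template <std::size_t N>
constexpr std::size_t size_of_array(const char (&)[N]) {
    return N;
}

// Length of a C string reached through a pointer (excludes the '\0').
std::size_t length_via_pointer(const char* s) {
    return std::strlen(s);
}
```

For char text[] = "test", size_of_array(text) is 5 and length_via_pointer(text) is 4; passing a plain char* to size_of_array does not compile, which is the safer failure mode compared with sizeof silently returning the pointer size.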

Not that I am an expert on this matter, but char to wchar_t conversion being made is seemingly nothing but using a wider space for the exact same bytes, in other words, prefixing each char with some set of zeroes.
I don't know C++ either, just C, but I can derive what it probably would look like in C++ by looking at your code, so here it goes:
wchar_t * changeToWide( char* value )
{
//counts the length of the value-array, not including the 0
int i = 0;
while ( value[i] != '\0' ) i++;
//allocates enough memory for the characters plus the terminating 0
wchar_t * vOut = new wchar_t[i + 1];
//assigns values including the 0
i = 0;
while ( ( vOut[i] = 0 | value[i] ) != '\0' ) i++;
return vOut;
}
The 0 | part looks truly redundant to me, but I felt like including it, don't really know why...
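For input that is not plain ASCII, widening byte by byte is not a real conversion. A hedged sketch using the standard std::mbstowcs (the portable cousin of mbstowcs_s) could look like this; to_wide is a hypothetical name, and the result depends on the current C locale, which is "C" unless the program calls setlocale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: convert a multibyte C string to a wide string.
std::wstring to_wide(const char* value) {
    // A multibyte string never decodes to more wide characters than it has bytes,
    // so strlen(value) + 1 elements are always enough.
    std::vector<wchar_t> buffer(std::strlen(value) + 1);
    std::size_t written = std::mbstowcs(buffer.data(), value, buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence for the current locale
    }
    return std::wstring(buffer.data(), written);
}
```

Returning std::wstring instead of a new[]-allocated buffer means the caller has nothing to delete; c_str() still gives a const wchar_t* when a raw pointer is needed.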

Converting unsigned char* returned from SHA1 to a string

I used this link to create a SHA1 hash for any data using C++. But the output buffer from SHA1 call is an unsigned char*. I want to store the hexadecimal values i.e. the Message Digest values so that I can use them for other operations.
As those are unsigned char* it doesn't make sense in converting them to a string and then performing hexadecimal conversion. So I have to do the hex conversion and then store the values as a string or char*. How can I do this?
SHA1(ibuf, strlen(ibuf), obuf);
for (i = 0; i < 20; i++) {
printf("%02x ", obuf[i]);
}

To format to a char[], use snprintf:
char out[61]; // 20 bytes * 3 chars each, plus the null terminator
for (i = 0; i < 20; i++) {
snprintf(out+i*3, 4, "%02x ", obuf[i]);
}
Edit: I see you've tagged your question C++. This is a purely C solution, mostly because I don't know C++.
We're using a max size of 4 because we need to include the null terminator in that count (by the function definition). We only move ahead by three specifically to overwrite the null terminator.
The extra space at the end is bound to happen based on our format string of "%02x ", but if we special-case the last element we can use a different format string of "%02x" to avoid that.
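For a C++ flavoured variant, here is a minimal sketch that builds a std::string with stream manipulators and no separator, so there is no trailing space to special-case. digest_to_hex is a hypothetical name; the 20-byte SHA-1 length is passed in rather than hard-coded.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: lowercase hex string of a digest buffer.
std::string digest_to_hex(const unsigned char* digest, std::size_t length) {
    std::ostringstream oss;
    oss << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < length; ++i) {
        // setw applies only to the next insertion, so repeat it for every byte.
        // The cast makes the stream print a number instead of a character.
        oss << std::setw(2) << static_cast<unsigned>(digest[i]);
    }
    return oss.str();
}
```

Calling digest_to_hex(obuf, 20) after SHA1() would then give the 40-character digest string.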

Convert wchar_t to char

I was wondering is it safe to do so?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
If I am pretty sure the wide char will fall within ASCII range.

Why not just use a library routine such as wcstombs?

assert is for ensuring that something is true in a debug mode, without it having any effect in a release build. Better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char version.
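The "if statement with an alternate plan" could be sketched like this. narrow_or is a made-up name, and the check is deliberately limited to 0..127 because, as noted above, values from 0x80 to 0xff may not map to the same character in the narrow encoding.

```cpp
// Hypothetical helper: narrow one wide character, using a fallback
// when the value is outside the 7-bit ASCII range.
char narrow_or(wchar_t wide, char fallback) {
    // The lower bound matters only on platforms where wchar_t is signed.
    if (wide >= 0 && wide < 128) {
        return static_cast<char>(wide);
    }
    return fallback;
}
```

Unlike assert, the check stays active in release builds, so an unexpected character turns into the fallback instead of a silently truncated value.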

You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
wchar_t is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.
char* may be a convenient typedef for a character string, but it is a novice mistake to think of this as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters with nonsense like this:
for(int i = 0; s[i]; ++i) {
wchar_t wc = s[i];
char c = doit(wc);
out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wcstombs() which is part of ANSI C already, but it's still wrong.
You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array), into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.

An easy way is :
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
const char* your_wchar_in_char = your_wchar_in_str.c_str();
I've been using this method for years :)

A short function I wrote a while back to pack a wchar_t array into a char array. Characters that aren't in the ASCII range (0-127) are replaced by '?' characters, and a surrogate pair is replaced by a single '?'.
size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
size_t i = 0; // read index into src
size_t j = 0; // write index into dest
if (dest_len == 0)
return 0;
while (src[i] != L'\0' && j < (dest_len - 1)){
wchar_t code = src[i];
if (code < 128)
dest[j] = char(code);
else{
dest[j] = '?';
if (code >= 0xD800 && code <= 0xDBFF && src[i + 1] != L'\0')
// lead surrogate, skip the next code unit, which is the trail
i++;
}
i++;
j++;
}
dest[j] = '\0';
return j; // number of characters written, excluding the terminator
}

Technically, 'char' could have the same range as either 'signed char' or 'unsigned char'. For the unsigned characters, your range is correct; theoretically, for signed characters, your condition is wrong. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last && in the assert is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.

Here's another way of doing it, remember to use free() on the result.
char* wchar_to_char(const wchar_t* pwchar)
{
// count the characters in the string; compare as wchar_t so wide values are not truncated
size_t length = 0;
while (pwchar[length] != L'\0')
{
length++;
}
// allocate one char per wide char, plus one for the terminator
char* result = (char*)malloc(length + 1);
if (result == NULL)
{
return NULL;
}
for (size_t i = 0; i < length; i++)
{
// narrowing conversion: only meaningful for values in the ASCII range
result[i] = (char)pwchar[i];
}
result[length] = '\0';
return result;
}

one could also convert wchar_t --> wstring --> string --> char
wchar_t wide = L'A'; // example value
wstring wstrValue(1, wide); // a wstring holding just that character
string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end()); // convert wstring to string
char char_value = strValue[0];

In general, no. int(wchar_t(255)) == int(char(255)) of course, but that just means they have the same int value. They may not represent the same characters.
You would see such a discrepancy in the majority of Windows PCs, even. For instance, on Windows Code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).
Note that it does not even hold for the ASCII range, as C++ doesn't even require ASCII. On IBM systems in particular you may see that 'A' != 65

How to read a chunk of memory as char in C++

Hello I have a chunk of memory (allocated with malloc()) that contains bits (bit literal), I'd like to read it as an array of char, or, better, I'd like to printout the ASCII value of 8 consecutively bits of the memory.
I have allocated the memory as char *, but I've not been able to take characters out in a better way than evaluating each bit, adding the value to a char and shifting left the value of the char, in a loop, but I was looking for a faster solution.
Thank you
What I've written for now is this:
for allocation:
char * bits = (char*) malloc(1);
for writing to mem:
ifstream cleartext;
cleartext.open(sometext);
while(cleartext.good())
{
c = cleartext.get();
for(int j = 0; j < 8; j++)
{ //set(index) and reset(index) set or reset the bit at bits[i]
(c & 0x80) ? (set(index)):(reset(index));//(*ptr++ = '1'):(*ptr++='0');
c = c << 1;
}..
}..
and until now I've not been able to get character back, I only get the bits printed out using:
printf("%s\n", bits);
An example of what I'm trying to do is:
input.txt contains the string "AAAB"
My program would have to write "AAAB" as "01000001010000010100000101000010" to memory
(it's the ASCII values in bit of AAAB that are 65656566 in bits)
Then I would like that it have a function to rewrite the content of the memory to a file.
So if memory contains again "01000001010000010100000101000010" it would write to the output file "AAAB".

int numBytes = 512;
char *pChar = (char *)malloc(numBytes);
for( int i = 0; i < numBytes; i++ ){
pChar[i] = '8';
}
Since this is C++, you can also use "new":
int numBytes = 512;
char *pChar = new char[numBytes];
for( int i = 0; i < numBytes; i++ ){
pChar[i] = '8';
}

If you want to visit every bit in the memory chunk, it looks like you need std::bitset.
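char* pChunk = static_cast<char*>( malloc( n ) );
// read in pChunk data
// iterate over all the bits.
for( int i = 0; i != n; ++i ){
// build the bitset from the byte's value; a bitset cannot be safely overlaid on raw memory
std::bitset<8> bits( static_cast<unsigned char>( pChunk[i] ) );
// print the most significant bit first
for( int iBit = 7; iBit >= 0; --iBit ) {
std::cout << bits[iBit];
}
}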
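For the round trip described in the question ("AAAB" to a string of '0'/'1' characters and back), a minimal sketch with std::bitset might look like the following. to_bits and from_bits are hypothetical names, and any trailing group shorter than 8 bits is simply ignored.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical helper: each input byte becomes eight '0'/'1' characters, MSB first.
std::string to_bits(const std::string& text) {
    std::string bits;
    for (char c : text) {
        bits += std::bitset<8>(static_cast<unsigned char>(c)).to_string();
    }
    return bits;
}

// Hypothetical helper: the inverse, consuming the bit string eight characters at a time.
std::string from_bits(const std::string& bits) {
    std::string text;
    for (std::size_t i = 0; i + 8 <= bits.size(); i += 8) {
        std::bitset<8> byte(bits, i, 8);  // throws std::invalid_argument on non-0/1 input
        text.push_back(static_cast<char>(byte.to_ulong()));
    }
    return text;
}
```

On an ASCII system, to_bits("AAAB") yields 01000001010000010100000101000010, and from_bits on that string gives "AAAB" back.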

I'd like to printout the ASCII value of 8 consecutively bits of the memory.
The possible value for any bit is either 0 or 1. You probably want at least a byte.
char * bits = (char*) malloc(1);
Allocates 1 byte on the heap. A much more efficient and hassle-free thing would have been to create an object on the stack i.e.:
char bits; // a single character, has CHAR_BIT bits
ifstream cleartext;
cleartext.open(sometext);
The above doesn't write anything to mem. It tries to open a file in input mode.
It has ascii characters and common eof or \n, or things like this, the input would only be a textfile, so I think it should only contain ASCII characters, correct me if I'm wrong.
If your file only has ASCII data you don't have to worry. All you need to do is read in the file contents and write it out. The compiler manages how the data will be stored (i.e. which encoding to use for your characters and how to represent them in binary, the endianness of the system etc). The easiest way to read/write files will be:
// include these on as-needed basis
#include <algorithm>
#include <iostream>
#include <iterator>
#include <fstream>
using namespace std;
// ...
/* read from standard input and write to standard output */
copy((istream_iterator<char>(cin)), (istream_iterator<char>()),
(ostream_iterator<char>(cout)));
/*-------------------------------------------------------------*/
/* read from standard input and write to text file */
/* (the file streams are named objects: a temporary cannot bind to the iterator's non-const reference) */
/* (note that istream_iterator<char> skips whitespace by default) */
ofstream out1("output.txt");
copy(istream_iterator<char>(cin), istream_iterator<char>(),
ostream_iterator<char>(out1, "\n") );
/*-------------------------------------------------------------*/
/* read from text file and write to text file */
ifstream in2("input.txt");
ofstream out2("output2.txt");
copy(istream_iterator<char>(in2), istream_iterator<char>(),
ostream_iterator<char>(out2, "\n") );
/*-------------------------------------------------------------*/
The last remaining question is: Do you want to do something with the binary representation? If not, forget about it. Else, update your question one more time.
E.g: Processing the character array to encrypt it using a block cipher
/* a per-character transform (placeholder only, not a real SHA-1) */
struct hash_sha1 {
unsigned char operator()(unsigned char x) {
unsigned char rc = x;
// process
return rc;
}
};
/* store house of characters; a vector (needs #include <vector>) avoids relying on char_traits<unsigned char> */
vector<unsigned char> line;
/* read from text file and write to a vector of unsigned chars */
ifstream input("input.txt");
copy(istream_iterator<unsigned char>(input),
istream_iterator<unsigned char>(),
back_inserter(line) );
/* Apply the transform to every character of the input */
vector<unsigned char> hashmsg;
transform(line.begin(), line.end(), back_inserter(hashmsg), hash_sha1());

Something like this?
char *buffer = (char*)malloc(42);
// ... put something into the buffer ...
printf("%c\n", buffer[0]);
But, since you're using C++, I wonder why you bother with malloc and such...

char* ptr = pAddressOfMemoryToRead;
while(ptr < pAddressOfMemoryToRead + blockLength)
{
char tmp = *ptr;
// tmp now has the char from this spot in memory
ptr++;
}

Is this what you are trying to achieve:
char* p = (char*)malloc(10 * sizeof(char));
char* p1 = p;
memcpy(p,"abcdefghij", 10);
for(int i = 0; i < 10; ++i)
{
char c = *p1;
cout<<c<<" ";
++p1;
}
cout<<"\n";
free(p);

Can you please explain in more detail, perhaps including code? What you're saying makes no sense unless I'm completely misreading your question. Are you doing something like this?
char * chunk = (char *)malloc(256);
If so, you can access any character's worth of data by treating chunk as an array: chunk[5] gives you the element at index 5 (the sixth one), etc. Of course, these will be characters, which may be what you want, but I can't quite tell from your question... for instance, if chunk[5] is 65, when you print it like cout << chunk[5];, you'll get a letter 'A'.
However, you may be asking how to print out the actual number 65, in which case you want to do cout << int(chunk[5]);. Casting to int will make it print as an integer value instead of as a character. If you clarify your question, either I or someone else can help you further.

Are you asking how to copy the memory bytes of an arbitrary struct into a char* array? If so this should do the trick
SomeType t = GetSomeType();
char* ptr = static_cast<char*>(malloc(sizeof(SomeType)));
if ( !ptr ) {
// Handle no memory. Probably should just crash
}
memcpy(ptr,&t,sizeof(SomeType));

I'm not sure I entirely grok what you're trying to do, but a couple of suggestions:
1) use std::vector instead of malloc/free and new/delete. It's safer and doesn't have much overhead.
2) when processing, try doing chunks rather than bytes. Even though streams are buffered, it's usually more efficient grabbing a chunk at a time.
3) there's a lot of different ways to output bits, but again you don't want a stream output for each character. You might want to try something like the following:
void outputbits(char *dest, char source)
{
dest[8] = 0;
for(int i=0; i<8; ++i)
dest[i] = source & (1<<(7-i)) ? '1':'0';
}
Pass it a char[9] output buffer and a char input, and you get a printable bitstring back. Decent compilers produce OK output code for this... how much speed do you need?
