#include <iostream>
#include <iconv.h>
using namespace std;

int changeCharEncoding(const char *from_charset, const char *to_charset, const char *input, char *output, int out_size);

int main()
{
    char str[200] = {0};
    char out[500] = {0};
    // UCS-2 big-endian bytes for "Rest"
    str[0]=0x00; str[1]=0x52; str[2]=0x00; str[3]=0x65; str[4]=0x00; str[5]=0x73; str[6]=0x00; str[7]=0x74;
    for(int i=0;i<sizeof(str);i++)
        cout<<"-"<<str[i];
    changeCharEncoding("UCS-2","ISO8859-1",str,out,sizeof(out));
    cout<<"\noutput : "<<out;
    for(int i=0;i<sizeof(out);i++)
        cout<<":"<<out[i];
}
//encoding function
int changeCharEncoding(const char *from_charset, const char *to_charset, const char *input, char *output, int out_size)
{
    size_t input_len = 8;
    size_t output_len = out_size;
    iconv_t l_cd;
    if ((l_cd = iconv_open(to_charset, from_charset)) == (iconv_t) -1)
    {
        return -1;
    }
    int rc = iconv(l_cd, (char **)&input, &input_len, (char **)&output, &output_len);
    if (rc == -1)
    {
        iconv_close(l_cd);
        return -2;
    }
    else
    {
        iconv_close(l_cd);
    }
}
Please suggest a method to convert 16-bit data to 8-bit; I have tried doing it with iconv. Also, please suggest if there is another way to do the same.
It looks like you are trying to convert between UTF-16 and UTF-8 encoding:
Try changing your call of changeCharEncoding() to:
changeCharEncoding("UTF-16","UTF-8",str,out,sizeof(out));
The resulting UTF-8 output should be
刀攀猀琀
On a side note: there are several things in your code that you should consider improving. For example, both changeCharEncoding and main are declared to return an int, but your implementations do not always return a value; in particular, changeCharEncoding returns nothing on its success path. A corrected version might look like the sketch below.
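For illustration, here is a minimal sketch of how changeCharEncoding could report its result consistently; the -1/-2 error codes are kept from your version, and taking the input length as a parameter (instead of hard-coding 8) is my own assumption about the intended interface:
#include <iconv.h>

// Returns the number of bytes written to `output`, or -1 if the conversion
// descriptor could not be opened, or -2 if iconv() itself failed.
int changeCharEncoding(const char *from_charset, const char *to_charset,
                       const char *input, size_t input_len,
                       char *output, size_t out_size)
{
    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)
        return -1;

    char *out_ptr = output;
    size_t out_left = out_size;
    // iconv() advances both pointers and decrements the "left" counters.
    size_t rc = iconv(cd, (char **)&input, &input_len, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -2;
    return (int)(out_size - out_left);   // number of bytes actually produced
}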
Generally speaking, you cannot convert arbitrary 16-bit data into 8-bit data; you will lose some data.
If you are converting between encodings, the same rule applies: some symbols cannot be represented in 8-bit ASCII, so they will be lost. Depending on the platform, you can use different functions (a Windows sketch follows this list):
Windows: WideCharToMultiByte
*nix: iconv
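On Windows, a minimal sketch of the WideCharToMultiByte route might look like this; the CP_ACP code page and the caller-supplied buffer size are illustrative assumptions, not requirements:
#include <windows.h>

// Convert a null-terminated UTF-16 (wchar_t) string to an 8-bit string in the
// current ANSI code page. Characters with no 8-bit equivalent are replaced
// with the system default character (often '?').
int wideToNarrow(const wchar_t *wide, char *narrow, int narrowSize)
{
    // -1 means "the input is null-terminated"; the terminator is converted too.
    return WideCharToMultiByte(CP_ACP, 0, wide, -1, narrow, narrowSize, NULL, NULL);
}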
I suspect you have an endianness problem. Try changing this
changeCharEncoding("UCS-2","ISO8859-1",str,out,sizeof(out));
to this
changeCharEncoding("UCS-2BE","ISO8859-1",str,out,sizeof(out));
I have a long array of char (coming from a raster file via GDAL), composed entirely of the characters 0 and 1. To compact the data, I want to convert it to an array of bits (thus dividing the size by 8), 4 bytes at a time, writing the result to a different file. This is what I have come up with so far:
uint32_t bytes2bits(char b[33]) {
    b[32] = 0;
    return strtoul(b,0,2);
}
const char data[36] = "00000000000000000000000010000000101"; // 101 is to be ignored
char word[33];
strncpy(word,data,32);
uint32_t byte = bytes2bits(word);
printf("Data: %d\n",byte); // 128
The code is working, and the result is going to be written in a separate file. What I'd like to know is: can I do that without copying the characters to a new array?
EDIT: I'm using a const variable here just to make a minimal, reproducible example. In my program it's a char *, which is continually changing value inside a loop.
Yes, you can, as long as you can modify the source string (in your example code you can't because it is a constant, but I assume in reality you have the string in writable memory):
uint32_t bytes2bits(const char* b) {
    return strtoul(b,0,2);
}
void compress (char* data) {
    // You would need to make sure that the `data` argument always has
    // at least 33 characters in length (the null terminator at the end
    // of the original string counts)
    char temp = data[32];
    data[32] = 0;
    uint32_t byte = bytes2bits(data);
    data[32] = temp;
    printf("Data: %d\n",byte); // 128
}
In this example, because the long data is already held in a char* buffer, it is not necessary to copy each part into a temporary buffer before converting it.
Just use a variable to step through the buffer in 32-byte periods; after each 32nd byte you temporarily need a 0 terminator byte.
So your code would look like:
uint32_t bytes2bits(const char* b) {
    return strtoul(b,0,2);
}

void compress (char* data) {
    int dataLen = strlen(data);
    int periodLen = 32;
    char* periodStr = &data[0];     // start of the current 32-character period
    char tmp;
    int periodPos = periodLen;      // index of the byte right after the current period
    uint32_t byte;
    while(periodPos < dataLen)
    {
        tmp = data[periodPos];
        data[periodPos] = 0;        // temporarily terminate the current period
        byte = bytes2bits(periodStr);
        printf("Data: %d\n",byte);  // 128
        data[periodPos] = tmp;      // restore the overwritten character
        periodStr = &data[periodPos];
        periodPos += periodLen;
    }
    if(periodPos - periodLen < dataLen)
    {
        // the last (possibly shorter) period is already 0-terminated by the string itself
        byte = bytes2bits(periodStr);
        printf("Data: %d\n",byte);
    }
}
Please also be careful with the last period, which could be smaller than 32 bytes.
const char data[36]
You are in violation of your contract with the compiler if you declare something as const and then modify it.
Generally speaking, the compiler won't let you modify it...so to even try to do so with a const declaration you'd have to cast it (but don't)
char *sneaky_ptr = (char*)data;
sneaky_ptr[0] = 'U'; /* the U is for "undefined behavior" */
See: Can we change the value of an object defined with const through pointers?
So if you wanted to do this, you'd have to be sure the data was legitimately non-const.
The right way to do this in modern C++ is by using std::string to hold your string and std::string_view to process parts of that string without copying it.
You can use string_view with the char array you have, though. It is commonly used to modernize code that handles classical null-terminated const char* strings. A minimal sketch follows.
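For example, with C++17 you can combine std::string_view with std::from_chars, which parses a character range directly and therefore needs neither a null terminator nor a copy (a sketch assuming C++17 and input that really consists of '0'/'1' characters):
#include <charconv>
#include <cstdint>
#include <cstdio>
#include <string_view>

// Parse one 32-character window of '0'/'1' characters without copying it.
uint32_t bits_of(std::string_view window) {
    uint32_t value = 0;
    // from_chars works on the range [first, last), so no null terminator is needed.
    std::from_chars(window.data(), window.data() + window.size(), value, 2);
    return value;
}

int main() {
    const char data[36] = "00000000000000000000000010000000101";
    std::string_view all(data, 35);
    for (std::size_t pos = 0; pos + 32 <= all.size(); pos += 32)
        std::printf("Data: %u\n", bits_of(all.substr(pos, 32)));  // prints 128
}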
The response payload of my HTTP request looks like this (but it can be changed to whatever string is best suited for the task):
"{0X00,0X01,0XC8,0X00,0XC8,0X00,
0XFF,0XFF,0XFF,0XFF,0XFF,0XFF,0XFF,0XFF,}"
How do I turn it into an unsigned char array containing the hex values like this:
unsigned char gImage_test[14] = { 0X00,0X01,0XC8,0X00,0XC8,0X00,
0XFF,0XFF,0XFF,0XFF,0XFF,0XFF,0XFF,0XFF};
Additional information: the length of the payload string is known in advance and is always the same. Some partial solutions I found can't be applied directly due to the limitations of Arduino's C++ wrapper. I'm looking for a simple solution within the Arduino IDE.
Use sscanf() with "%x"; here is an example with just 3 hex numbers:
const char *buffer = "{0X00,0X01,0XC8}";
unsigned int data[3];
int read_count = sscanf(buffer, "{%x,%x,%x}", data, data+1, data+2);
// if successful read_count will be 3
If using sscanf() (#include <stdio.h>) is within your limitations, then you can call it with "%hhx" to extract each individual hex value into an unsigned char, like this:
#include <stdio.h>

const int PAYLOAD_LENGTH = 14; // Known in advance
unsigned char gImage_test[PAYLOAD_LENGTH];

int main()
{
    const char* bufferPtr = "{0X00,0X01,0XC8,0X00,0XC8,0X00,0XFF,0XFF,0XFF,0XFF,0XFF,0XFF,0XFF,0XFF}";
    // Each "0Xhh," token is 5 characters wide: skip the '{' or ',' and parse the value after it.
    for (int i = 0; i < PAYLOAD_LENGTH && sscanf(bufferPtr + 1, "%hhx", &gImage_test[i]); i++, bufferPtr += 5)
        ;
    return 0;
}
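If sscanf() ever turns out to be unavailable or too heavy on your board, a strtoul()-based loop is one alternative; this is only a sketch under the assumption that the payload always has exactly the "{0Xhh,0Xhh,...}" layout shown above:
#include <stdlib.h>

const int PAYLOAD_LENGTH = 14; // Known in advance
unsigned char gImage_test[PAYLOAD_LENGTH];

void parsePayload(const char* payload)
{
    const char* p = payload + 1;                 // skip the leading '{'
    for (int i = 0; i < PAYLOAD_LENGTH; i++) {
        char* end = 0;
        gImage_test[i] = (unsigned char) strtoul(p, &end, 16);  // "0Xhh" parses as hex with base 16
        p = end + 1;                             // step past the ',' (or the final '}')
    }
}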
I'm trying to write a program that parses ID3 tags, for educational purposes (so please explain in depth, as I'm trying to learn). So far I've had great success, but I'm stuck on an encoding issue.
When reading the mp3 file, the default encoding for all text is ISO-8859-1. All header info (frame IDs etc) can be read in that encoding.
This is how I've done it:
ifstream mp3File("../myfile.mp3");
char mp3Header[10];
mp3File.read(mp3Header, 10);
// .... Parsing the header
// After reading the main header, we get into the individual frames.
// Read the first 10 bytes from buffer, get size and then read data
char encoding[1];
while(1){
    char frameHeader[10] = {0};
    mp3File.read(frameHeader, 10);
    ID3Frame frame(frameHeader); // Parses frameHeader
    if (frame.frameId[0] == 'T'){ // Text Information Frame
        mp3File.read(encoding, 1); // Get encoding
        if (encoding[0] == 1){
            // We're dealing with UCS-2 encoded Unicode with BOM
            char data[frame.size];
            mp3File.read(data, frame.size);
        }
    }
}
This is bad code, because data is a plain char buffer; its contents look like this (non-displayable chars shown as hex):
data = [0xFF, 0xFE, 'C', 0, 'r', 0, 'a', 0, 'z', 0, 'y', 0]
Two questions:
What are the first two bytes? - Answered: they are the byte order mark (BOM).
How can I read wchar_t from my already open file? And then get back to reading the rest of it?
Edit Clarification: I'm not sure if this is the correct way to do it, but essentially what I wanted to do was.. Read the first 11 bytes to a char array (header+encoding), then the next 12 bytes to a wchar_t array (the name of the song), and then the next 10 bytes to a char array (the next header). Is that possible?
I figured out a decent solution: create a new wchar_t buffer and add the characters from the char array in pairs.
#include <cstdint>
#include <cstdlib>

#define LITTLEENDIAN 0
#define BIGENDIAN    1

wchar_t* charToWChar(char* cArray, int len) {
    char wideChar[2];
    wchar_t wideCharW;
    wchar_t *wArray = (wchar_t *) malloc(sizeof(wchar_t) * len / 2);
    int counter = 0;
    int endian = BIGENDIAN;
    // Check the BOM (0xFF 0xFE = little endian, 0xFE 0xFF = big endian)
    if ((uint8_t) cArray[0] == 255 && (uint8_t) cArray[1] == 254)
        endian = LITTLEENDIAN;
    else if ((uint8_t) cArray[1] == 255 && (uint8_t) cArray[0] == 254)
        endian = BIGENDIAN;
    // Skip the 2-byte BOM and combine the remaining bytes in pairs
    for (int j = 2; j < len; j+=2){
        switch (endian){
            case LITTLEENDIAN: {wideChar[0] = cArray[j]; wideChar[1] = cArray[j + 1];} break;
            default:
            case BIGENDIAN: {wideChar[1] = cArray[j]; wideChar[0] = cArray[j + 1];} break;
        }
        wideCharW = (uint16_t)((uint8_t)wideChar[1] << 8 | (uint8_t)wideChar[0]);
        wArray[counter] = wideCharW;
        counter++;
    }
    wArray[counter] = '\0';
    return wArray;
}
Usage:
if (encoding[0] == 1){
    // We're dealing with UCS-2 encoded Unicode with BOM
    char data[frame.size];
    mp3File.read(data, frame.size);
    wcout << charToWChar(data, frame.size) << endl; // note: the returned buffer is malloc'd and never freed here
}
I'm trying to display an integer on an LCD display. The way the LCD works is that you send it an 8-bit ASCII character and it displays that character.
The code I have so far is:
unsigned char text[17] = "ABCDEFGHIJKLMNOP";
int32_t n = 123456;
lcd.printInteger(text, n);
//-----------------------------------------
void LCD::printInteger(unsigned char headLine[17], int32_t number)
{
    //......
    int8_t str[17];
    itoa(number,(char*)str,10);
    for(int i = 0; i < 16; i++)
    {
        if(str[i] == 0x0)
            break;
        this->sendCharacter(str[i]);
        _delay_ms(2);
    }
}
void LCD::sendCharacter(uint8_t character)
{
    //....
    *this->cOutputPort = character;
    //...
}
So if I try to display 123456 on the LCD, it actually displays -7616, which obviously is not the correct integer.
I know that there is probably a problem because I convert the characters to signed int8_t and then output them as unsigned uint8_t, but I have to output them in unsigned format. I don't know how I can convert the int32_t input integer to an ASCII uint8_t string.
On your architecture, int is an int16_t, not int32_t. Thus, itoa treats 123456 as -7616, because:
123456 = 0x0001_E240
-7616 = 0xFFFF_E240
They are the same if you truncate them down to 16 bits, and that is what your code is doing. Instead of using itoa, you have the following options:
calculate the ASCII representation yourself (a sketch of this appears at the end of this answer);
use ltoa(long value, char * buffer, int radix), if available, or
leverage s[n]printf if available.
For the last option you can use the following, "mostly" portable code:
void LCD::printInteger(unsigned char headLine[17], int32_t number) {
    ...
    char str[17];
    if (sizeof(int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%d", number);
    else if (sizeof(long int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%ld", number);
    else if (sizeof(long long int) == sizeof(int32_t))
        snprintf(str, sizeof(str), "%lld", number);
    ...
}
If, and only if, your platform doesn't have snprintf, you can use sprintf and remove the 2nd argument (sizeof(str)). Your go-to function should always be the n variant, as it gives you one less bullet to shoot your foot with :)
Since you're compiling with a C++ compiler that is, I assume, at least half-decent, the above should do "the right thing" in a portable way, without emitting all the unnecessary code. The test conditions passed to if are compile-time constant expressions. Even some fairly old C compilers could deal with such properly.
Nitpick: Don't use int8_t where a char would do. itoa, s[n]printf, etc. expect char buffers, not int8_t buffers.
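For completeness, the first option (computing the decimal digits yourself) could look like the sketch below; it avoids itoa and printf entirely, which can matter on very small AVR targets. The buffer size of 12 and the helper name int32ToAscii are my own choices, covering the longest 32-bit value plus sign and terminator:
#include <stdint.h>

// Write the decimal representation of `number` into `str` (at least 12 bytes)
// and return the number of characters written, excluding the terminator.
int int32ToAscii(int32_t number, char* str)
{
    char tmp[12];
    int len = 0;
    // Negate via (number + 1) to avoid overflow on INT32_MIN.
    uint32_t value = (number < 0) ? (uint32_t)(-(number + 1)) + 1u : (uint32_t)number;

    // Collect digits in reverse order (at least one, so 0 prints as "0").
    do {
        tmp[len++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0u);

    int pos = 0;
    if (number < 0)
        str[pos++] = '-';
    while (len > 0)
        str[pos++] = tmp[--len];   // reverse back into the output buffer
    str[pos] = '\0';
    return pos;
}
The resulting str can then be fed character by character to sendCharacter(), exactly as in your existing loop.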
Alright, so I have a BYTE array that I ultimately need to convert into an LPCWSTR or const WCHAR* to use in a built-in function. I have been able to print out the BYTE array with printf, but now that I need to convert it into a string I am having problems, mainly that I have no idea how to convert something like this into a non-array type.
BYTE ba[0x10];
for(int i = 0; i < 0x10; i++)
{
    printf("%02X", ba[i]); // Outputs: F1BD2CC7F2361159578EE22305827ECF
}
So basically I need the same thing, but instead of printing the array I need it transformed into an LPCWSTR or WCHAR string, or even a plain string. The main problem I am having is converting the array into a non-array form.
LPCWSTR represents a UTF-16 encoded string. The array contents you have shown are outside the 7-bit ASCII range, so unless the BYTE array is already encoded in UTF-16 (the array you showed is not, but if it were, you could just use a simple type-cast), you will need to do a conversion to UTF-16. You need to know the particular encoding of the array before you can do that conversion, for instance with the Win32 API MultiByteToWideChar() function, third-party libraries like iconv or ICU, or the built-in locale converters in C++11. So what is the actual encoding of the array, and where is the array data coming from? It is not UTF-8, for instance, so it has to be something else.
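If the bytes do turn out to hold text in some 8-bit code page, the MultiByteToWideChar() route could look roughly like this; the CP_ACP code page and the caller-supplied buffer are illustrative assumptions, so substitute the code page that actually matches your data:
#include <windows.h>

// Convert a null-terminated 8-bit string in the current ANSI code page to UTF-16.
// Returns the number of WCHARs written (including the terminator), or 0 on failure.
int narrowToWide(const char* narrow, WCHAR* wide, int wideSize)
{
    // -1 means "narrow is null-terminated"; the terminator is converted too.
    return MultiByteToWideChar(CP_ACP, 0, narrow, -1, wide, wideSize);
}
The filled WCHAR buffer can then be passed wherever an LPCWSTR is expected.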
Alright, I got it working. Now I can convert the BYTE array to a char* variable. Thanks for the help guys, but the formatting wasn't a large problem in this instance. I appreciate the help though, it's always nice to have some extra input.
#include <string.h>

// Helper function to convert one byte to two hex characters
void Char2Hex(unsigned char ch, char* szHex)
{
    unsigned char byte[2];
    byte[0] = ch/16;   // high nibble
    byte[1] = ch%16;   // low nibble
    for(int i = 0; i < 2; i++)
    {
        if(byte[i] <= 9)
        {
            szHex[i] = '0' + byte[i];
        }
        else
            szHex[i] = 'A' + byte[i] - 10;
    }
    szHex[2] = 0;
}

// Function used throughout code to convert a whole buffer
void CharStr2HexStr(unsigned char const* pucCharStr, char* pszHexStr, int iSize)
{
    int i;
    char szHex[3];
    pszHexStr[0] = 0;
    for(i = 0; i < iSize; i++)
    {
        Char2Hex(pucCharStr[i], szHex);
        strcat(pszHexStr, szHex);
    }
}
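A hypothetical usage sketch tying this back to the original goal; the mbstowcs() step is one simple way to widen the pure-ASCII hex string to a WCHAR string, and the buffer sizes assume the 16-byte array from the question:
#include <windows.h>
#include <stdlib.h>

void example(const BYTE* ba /* 16 bytes, as in the question */)
{
    char hex[2 * 0x10 + 1];      // two hex characters per byte plus terminator
    WCHAR wideHex[2 * 0x10 + 1];

    CharStr2HexStr(ba, hex, 0x10);
    // The hex string is plain ASCII, so a simple widening conversion suffices.
    mbstowcs(wideHex, hex, sizeof(wideHex) / sizeof(wideHex[0]));
    // wideHex can now be passed wherever an LPCWSTR is expected.
}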