iconv only works once - c++

I am trying to write a function that converts a Shift-JIS string to a UTF-8 string using iconv.
I wrote the code below:
#include <iconv.h>
#include <iostream>
#include <stdio.h>
using namespace std;
#define BUF_SIZE 1024
size_t z = (size_t) BUF_SIZE-1;
bool sjis2utf8( char* text_sjis, char* text_utf8 )
{
iconv_t ic;
ic = iconv_open("UTF8", "SJIS"); // sjis->utf8
iconv(ic , &text_sjis, &z, &text_utf8, &z);
iconv_close(ic);
return true;
}
int main(void)
{
char hello[BUF_SIZE] = "hello";
char bye[BUF_SIZE] = "bye";
char tmp[BUF_SIZE] = "something else";
sjis2utf8(hello, tmp);
cout << tmp << endl;
sjis2utf8(bye, tmp);
cout << tmp << endl;
}
The output should be
hello
bye
but what I actually get is
hello
hello
Does anyone know why this phenomenon occurs? What's wrong with my program?
Note that "hello" and "bye" are Japanese Shift-JIS strings in my original program; I replaced them here to make the program easier to read.

I think you are misusing the iconv function by passing it the global variable z for both length arguments. The first time you call sjis2utf8, z is decremented to 0. The second call to sjis2utf8 therefore has no effect (z == 0) and leaves tmp unchanged.
From the iconv documentation :
size_t iconv (iconv_t cd,
const char* * inbuf, size_t * inbytesleft,
char* * outbuf, size_t * outbytesleft);
The iconv function converts one multibyte character at a time, and for each character conversion it increments *inbuf and decrements *inbytesleft by the number of converted input bytes, it increments *outbuf and decrements *outbytesleft by the number of converted output bytes, and it updates the conversion state contained in cd.
You should use two separate variables for the buffer lengths:
size_t il = BUF_SIZE - 1 ;
size_t ol = BUF_SIZE - 1 ;
iconv(ic, &text_sjis, &il, &text_utf8, &ol) ;
Then check the return value of iconv and the remaining buffer lengths to confirm that the conversion succeeded.
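As a rough sketch of that advice, the helper below keeps both lengths in local variables and checks every return value. The name convert_encoding and the std::string interface are assumptions made for illustration, not part of iconv; the output buffer size (four bytes per input byte) is also just an assumption that is generous enough for Shift-JIS to UTF-8.

```cpp
#include <iconv.h>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of iconv): converts `input` from the encoding
// `from` to the encoding `to`. Returns false if the conversion fails.
bool convert_encoding(const char* to, const char* from,
                      const std::string& input, std::string& output)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return false;                        // conversion not supported

    std::string in_copy = input;             // iconv wants a non-const char**
    // Assumption: 4 output bytes per input byte is enough for the target.
    std::string out_buf(input.size() * 4 + 4, '\0');

    char* in_ptr = &in_copy[0];
    char* out_ptr = &out_buf[0];
    std::size_t in_left = in_copy.size();    // local, so it is fresh on every call
    std::size_t out_left = out_buf.size();   // separate variable for the output

    const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    // iconv returns (size_t)-1 on error; leftover input also means failure.
    if (rc == (std::size_t)-1 || in_left != 0)
        return false;

    output.assign(out_buf.data(), out_buf.size() - out_left);
    return true;
}
```

Called as convert_encoding("UTF-8", "SJIS", text, result), it can be used any number of times because no length survives between calls. The encoding names accepted depend on the iconv implementation installed.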

#include <iconv.h>
#include <iostream>
#include <stdio.h>
#include <string.h>
using namespace std;
const size_t BUF_SIZE=1024;
class IConv {
iconv_t ic_;
public:
IConv(const char* to, const char* from)
: ic_(iconv_open(to,from)) { }
~IConv() { iconv_close(ic_); }
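bool convert(char* input, char* output, size_t& out_size) {
// The s-jis string must be null-terminated. The +1 converts the terminator too,
// so the output is null-terminated as well. If the input is not null-terminated,
// or the encoding can contain null bytes inside multibyte characters,
// pass the input length some other way.
size_t inbufsize = strlen(input)+1;
// iconv returns (size_t)-1 on error, so compare explicitly instead of
// converting the returned count to bool.
return iconv(ic_, &input, &inbufsize, &output, &out_size) != (size_t)-1;
}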
};
int main(void)
{
char hello[BUF_SIZE] = "hello";
char bye[BUF_SIZE] = "bye";
char tmp[BUF_SIZE] = "something else";
IConv ic("UTF8","SJIS");
size_t outsize = BUF_SIZE;//you will need it
ic.convert(hello, tmp, outsize);
cout << tmp << endl;
outsize = BUF_SIZE;
ic.convert(bye, tmp, outsize);
cout << tmp << endl;
}
based on Kleist's answer

You must pass the length of the input string as the third parameter of iconv.
Try:
//...
size_t len = strlen(text_sjis); // iconv expects a size_t*, not an int*
iconv(ic , &text_sjis, &len, &text_utf8, &z);
//...

size_t iconv (iconv_t cd,
const char* * inbuf, size_t * inbytesleft,
char* * outbuf, size_t * outbytesleft);
iconv changes the value pointed to by inbytesleft, so after your first run z is 0. To fix this, calculate the length of inbuf and store it in a local variable before each conversion.
It is described here: http://www.gnu.org/s/libiconv/documentation/libiconv/iconv.3.html
And since you tagged this as C++, I would suggest wrapping everything up in a nice little class. As far as I can tell from the documentation, you can reuse the iconv_t obtained from iconv_open for as many conversions as you'd like.
#include <iconv.h>
#include <iostream>
#include <stdio.h>
#include <string.h>
using namespace std;
const size_t BUF_SIZE = 1024;
class IConv {
iconv_t ic_;
public:
IConv(const char* to, const char* from)
: ic_(iconv_open(to,from)) { }
~IConv() { iconv_close(ic_); }
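bool convert(char* input, char* output, size_t outbufsize) {
// +1 converts the terminating null too, so the output is null-terminated.
size_t inbufsize = strlen(input) + 1;
// iconv returns (size_t)-1 on error, so compare explicitly instead of
// converting the returned count to bool.
return iconv(ic_, &input, &inbufsize, &output, &outbufsize) != (size_t)-1;
}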
};
int main(void)
{
char hello[BUF_SIZE] = "hello";
char bye[BUF_SIZE] = "bye";
char tmp[BUF_SIZE] = "something else";
IConv ic("UTF8","SJIS");
ic.convert(hello, tmp, BUF_SIZE);
cout << tmp << endl;
ic.convert(bye, tmp, BUF_SIZE);
cout << tmp << endl;
}

Related

How to convert str number from dec to hex

Hello, I have a variable like this:
uint8_t *str_value = "100663296";
I just want to convert it to the hexadecimal representation, also as a string.
I don't need to do any math on this variable.
uint8_t *output_value = "6000000";
How do I do this correctly? I can't convert to int and use sprintf because I don't have enough memory for that on my MCU.
UPDATE: missed the part about MCU and memory limitations :) this answer won't be useful
You can try the following:
convert the string representation of the number to an integer value (you can use the int atoi( const char * str ); function)
once you have the integer, print it as hex using, for example, the sprintf function with %x as the format parameter and your integer as the value parameter
Here is a working example: https://ideone.com/axAPWH
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;
int main() {
int n;
char hex_val[50];
n = atoi("100663296");
sprintf(hex_val, "%x", n);
cout << hex_val;
return 0;
}
Could use <charconv> if available:
#include <cerrno>
#include <charconv>
#include <cstdio>
#include <cstdlib>
int main() {
auto const* str = "2147483647";
errno = 0;
char* end;
long i = std::strtol(str, &end, 10);
if (end == str || errno == ERANGE)
return 1;
char out[20];
// Leave one byte free for the terminator that is written below.
auto const [p, err] = std::to_chars(out, out + sizeof out - 1, i, 16);
if (err != std::errc{})
return 2;
*p = 0; // null terminate
std::puts(out);
}
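Both answers above go through an integer, which the question rules out. A possible alternative, sketched here under the assumption that the input is a writable buffer holding only ASCII digits, is to repeatedly divide the decimal string by 16 in place and collect the remainders as hex digits. The function name dec_to_hex and its signature are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: converts the decimal digit string in `dec` to a
// lowercase hex string in `hex`. `dec` is overwritten while working.
// Returns the number of hex digits written, or 0 if `hex` is too small.
std::size_t dec_to_hex(char* dec, char* hex, std::size_t hex_size)
{
    static const char kDigits[] = "0123456789abcdef";
    const std::size_t len = std::strlen(dec);
    std::size_t start = 0;  // first nonzero digit of the running quotient
    std::size_t n = 0;      // hex digits produced, least significant first

    while (start < len && dec[start] == '0')
        ++start;            // skip leading zeros

    while (start < len) {
        unsigned rem = 0;
        // Long division of the remaining decimal digits by 16.
        for (std::size_t i = start; i < len; ++i) {
            const unsigned cur = rem * 10u + static_cast<unsigned>(dec[i] - '0');
            dec[i] = static_cast<char>('0' + cur / 16u);
            rem = cur % 16u;
        }
        if (n + 1 >= hex_size)
            return 0;       // no room for this digit plus the terminator
        hex[n++] = kDigits[rem];
        while (start < len && dec[start] == '0')
            ++start;        // the quotient may have become shorter
    }

    if (n == 0) {           // the input was zero (or empty)
        if (hex_size < 2)
            return 0;
        hex[n++] = '0';
    }

    // The digits were produced in reverse order, so swap them around.
    for (std::size_t i = 0, j = n - 1; i < j; ++i, --j) {
        const char tmp = hex[i];
        hex[i] = hex[j];
        hex[j] = tmp;
    }
    hex[n] = '\0';
    return n;
}
```

This needs no heap and only a few bytes of stack, at the cost of one pass over the digits per hex digit produced.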

Why does my char* copier return different things?

I am writing a simple string copier and testing it in the main() function. What's odd is that sometimes the program prints
"Hello!Hello!"
like it should, but maybe every third time I run it, the program prints out:
"Hello!Hello!▌▌▌▌▌▌▌▌▌▌▒"UòB╚"
Why is the tail of garbage data only sometimes being added to the end of my second string?
#include <iostream>
using namespace std;
int strlength(const char* c)
{
int size = 0;
while (*c) {
++c;
++size;
}
return size;
}
char* mystrdup(const char* c)
{
int size = strlength(c);
char* result = new char;
copy(c, c + size, result);
return result;
}
void print_array(const char* c)
{
int size = strlength(c);
while (*c) {
cout << *c;
++c;
}
}
int main()
{
char test[] = "Hello!";
char* res = mystrdup(test);
print_array(test);
print_array(res);
}
The program has undefined behavior because you are not allocating enough memory for the result string: new char allocates a single char, not an array.
char* mystrdup(const char* c)
{
int size = strlength(c);
char* result = new char;
^^^^^^^^^^^^^^^^^^^^^^^
copy(c, c + size, result);
return result;
}
Moreover, you are not copying the terminating zero to the result string.
The two functions strlength and mystrdup could look like this:
size_t strlength( const char *s )
{
size_t size = 0;
while ( s[size] ) ++size;
return size;
}
char * mystrdup( const char *s )
{
size_t size = strlength( s ) + 1;
char *result = new char[size];
copy( s, s + size, result );
return result;
}
Of course instead of the standard algorithm std::copy you could use the standard C function strcpy declared in the header <cstring>.
strcpy( result, s );
And do not forget to delete the allocated array.
char* res = mystrdup(test);
//…
delete [] res;
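If remembering the delete [] is a concern, one option is to let a smart pointer own the array. This is only a sketch; the name dup_owned is made up for the example and std::strlen and std::memcpy stand in for the hand-written helpers above.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical variant of mystrdup: the returned unique_ptr releases the
// array with delete[] automatically when it goes out of scope.
std::unique_ptr<char[]> dup_owned(const char* s)
{
    const std::size_t size = std::strlen(s) + 1;     // +1 for the terminator
    std::unique_ptr<char[]> result(new char[size]);
    std::memcpy(result.get(), s, size);              // copies the terminator too
    return result;
}
```

The caller reads the copy through get(), for example std::cout << dup_owned(test).get();, and never calls delete [] itself.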
Note that the function print_array does not use the variable size. There is also no need to output a C-string character by character.
The function could be defined like this:
std::ostream & print_array( const char *s, std::ostream &os = std::cout )
{
return os << s;
}
Finally, the identifier c is usually used for single objects of type char. If you are dealing with a string, it is better to use the identifier s.
You have multiple bugs in your code. You allocate the wrong amount of memory (a single char instead of a char array), and you never delete the memory. Stop using C-strings and use std::string:
#include <iostream>
#include <string>
using std::cout;
void print_array(const char* c)
{
while (*c) {
cout << *c;
++c;
}
}
int main()
{
std::string test = "Hello!";
std::string res = test;
print_array(test.c_str());
print_array(res.c_str());
}
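With new you need to allocate an array that is big enough for the string, including the terminating null character.
char* mystrdup(const char* c)
{
int size = strlength(c);
char* result = new char[size + 1]; // +1 for the terminating null
copy(c, c + size + 1, result); // copy the terminator as well
return result;
}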

How to convert the template from C++ to C

I am trying to convert some C++ code to C because my compiler can't handle C++ code. I'd like to convert the template below to C. This template converts a decimal integer to hexadecimal and pads the value with leading zeros if the hexadecimal string is shorter than sizeof(T)*2 characters. The data type T can be unsigned char, char, short, unsigned short, int, unsigned int, long long, or unsigned long long.
template< typename T > std::string hexify(T i)
{
std::stringbuf buf;
std::ostream os(&buf);
os << std::setfill('0') << std::setw(sizeof(T) * 2)
<< std::hex << i;
std::cout<<"sizeof(T) * 2 = "<<sizeof(T) * 2<<" buf.str() = "<<buf.str()<<" buf.str.c_str() = "<<buf.str().c_str()<<std::endl;
return buf.str().c_str();
}
Thank you for your help.
Edit 1: I have tried to use the declaration
char * hexify (void data, size_t data_size)
but when I call with the int value int_value:
char * result = hexify(int_value, sizeof(int))
it doesn't work because of:
incompatible types (void and int).
So in this case, do I have to use a macro? I haven't tried with macro because it's complicated.
C does not have templates. One solution is to pass the maximum-width integer supported (uintmax_t, in Value below) and the size of the original integer (in Size). The routine can then use the size to determine the number of digits to print. Another complication is that C does not provide C++’s std::string with its automatic memory management. A typical way to handle this in C is for the called function to allocate a buffer and return it to the caller, who is responsible for freeing it when done.
The code below shows a hexify function that does this, and it also shows a Hexify macro that takes a single parameter and passes both its size and its value to the hexify function.
Note that, in C, character constants such as 'A' have type int, not char, so some care is needed in providing the desired size. The code below includes an example for that.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
char *hexify(size_t Size, uintmax_t Value)
{
// Allocate space for "0x", 2*Size digits, and a null character.
size_t BufferSize = 2 + 2*Size + 1;
char *Buffer = malloc(BufferSize);
// Ensure a buffer was allocated.
if (!Buffer)
{
fprintf(stderr,
"Error, unable to allocate buffer of %zu bytes in %s.\n",
BufferSize, __func__);
exit(EXIT_FAILURE);
}
// Format the value as "0x" followed by 2*Size hexadecimal digits.
snprintf(Buffer, BufferSize, "0x%0*" PRIxMAX, (int) (2*Size), Value);
return Buffer;
}
/* Provide a macro that passes both the size and the value of its parameter
to the hexify function.
*/
#define Hexify(x) (hexify(sizeof (x), (x)))
int main(void)
{
char *Buffer;
/* Show two examples of using the hexify function with different integer
types. (The examples assume ASCII.)
*/
char x = 'A';
Buffer = hexify(sizeof x, x);
printf("Character '%c' = %s.\n", x, Buffer); // Prints "0x41".
free(Buffer);
int i = 123;
Buffer = hexify(sizeof i, i);
printf("Integer %d = %s.\n", i, Buffer); // Prints "0x0000007b" when int is 4 bytes.
free(Buffer);
/* Show examples of using the Hexify macro, demonstrating that 'A' is an
int value, not a char value, so it would need to be cast if a char is
desired.
*/
Buffer = Hexify('A');
printf("Character '%c' = %s.\n", 'A', Buffer); // Prints "0x00000041".
free(Buffer);
Buffer = Hexify((char) 'A');
printf("Character '%c' = %s.\n", 'A', Buffer); // Prints "0x41".
free(Buffer);
}
You don't need templates if you step down to raw bits and bytes.
If performance is important, it is also best to roll out the conversion routine by hand, since the string handling functions in C and C++ come with lots of slow overhead. The somewhat well-optimized version would look something like this:
char* hexify_data (char*restrict dst, const char*restrict src, size_t size)
{
const char NIBBLE_LOOKUP[0xF+1] = "0123456789ABCDEF";
char* d = dst;
for(size_t i=0; i<size; i++)
{
size_t byte = size - i - 1; // assuming little endian
*d = NIBBLE_LOOKUP[ (src[byte]&0xF0u)>>4 ];
d++;
*d = NIBBLE_LOOKUP[ (src[byte]&0x0Fu)>>0 ];
d++;
}
*d = '\0';
return dst;
}
This breaks down any passed type byte by byte, using a character type, which is fine because character types specifically are allowed to access the bytes of any object. It also uses caller allocation for maximum performance. (It can also be made endianness-independent with an extra check per loop.)
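As a sketch of the endianness-independent variant mentioned in passing, the version below probes the byte order at run time, once per call rather than once per loop iteration. It is written in C++ for illustration; the names is_little_endian and hexify_any_endian are made up, and it only handles plain little-endian and big-endian layouts.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when the lowest-addressed byte of an integer holds its
// least significant bits.
static bool is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);   // read the byte at the lowest address
    return first == 1;
}

// Hypothetical variant of hexify_data: writes 2*size hex digits plus a
// terminator to dst, most significant byte first on either byte order.
char* hexify_any_endian(char* dst, const void* src, std::size_t size)
{
    static const char kNibble[] = "0123456789ABCDEF";
    const unsigned char* bytes = static_cast<const unsigned char*>(src);
    const bool little = is_little_endian();
    char* d = dst;
    for (std::size_t i = 0; i < size; ++i) {
        // Pick bytes from the most significant end downwards.
        const unsigned char b = bytes[little ? size - i - 1 : i];
        *d++ = kNibble[b >> 4];
        *d++ = kNibble[b & 0x0F];
    }
    *d = '\0';
    return dst;
}
```

Reading the source through unsigned char also avoids any sign-extension concerns when a byte has its top bit set.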
We can make the call a bit more convenient with a wrapper macro:
#define hexify(buf, var) hexify_data(buf, (char*)&var, sizeof(var))
Full example:
#include <string.h>
#include <stdint.h>
#include <stdio.h>
#define hexify(buf, var) hexify_data(buf, (char*)&var, sizeof(var))
char* hexify_data (char*restrict dst, const char*restrict src, size_t size)
{
const char NIBBLE_LOOKUP[0xF+1] = "0123456789ABCDEF";
char* d = dst;
for(size_t i=0; i<size; i++)
{
size_t byte = size - i - 1; // assuming little endian
*d = NIBBLE_LOOKUP[ (src[byte]&0xF0u)>>4 ];
d++;
*d = NIBBLE_LOOKUP[ (src[byte]&0x0Fu)>>0 ];
d++;
}
*d = '\0';
return dst;
}
int main (void)
{
char buf[50];
int32_t i32a = 0xABCD;
puts(hexify(buf, i32a));
int32_t i32b = 0xAAAABBBB;
puts(hexify(buf, i32b));
char c = 5;
puts(hexify(buf, c));
uint8_t u8 = 100;
puts(hexify(buf, u8));
}
Output:
0000ABCD
AAAABBBB
05
64
Another option is to select the width with a printf-style format string.
Note that you can't return a pointer to a local variable, but you can take the buffer as an argument (here without bounds checking).
#include <stdio.h>
#include <string.h>
/* The value is passed as unsigned long long so that every supported type fits.
It is masked to the width selected by the format before printing.
The widths assume a 2-byte short and a 4-byte int. */
char* hexify(char* result, const char* format, unsigned long long arg)
{
if(0 == strcmp(format,"%d") || 0 == strcmp(format,"%u"))
{
sprintf(result,"%08llx",arg & 0xFFFFFFFFull);
}
else if(0 == strcmp(format,"%hd") || 0 == strcmp(format,"%hu"))
{
sprintf(result,"%04llx",arg & 0xFFFFull);
}
else if(0 == strcmp(format,"%hhd")|| 0 == strcmp(format,"%hhu"))
{
sprintf(result,"%02llx",arg & 0xFFull);
}
else if(0 == strcmp(format,"%lld") || 0 == strcmp(format,"%llu") )
{
sprintf(result,"%016llx",arg);
}
else
{
result[0] = '\0'; /* unknown format: return an empty string */
}
return result;
}
int main()
{
char result[256];
printf("%s", hexify(result,"%hhu", 1));
return 0;
}

EXC_BAD_ACCESS occurs when assigning a char through a pointer

EXC_BAD_ACCESS occurred in *str++ = *end;. What's wrong with this?
#include <iostream>
using namespace std;
int main() {
char *hello = "abcdefgh";
char c = 'c';
char *str = hello;
//printf("%s",str);
char * end = str;
char tmp;
if (str) {
while (*end) {
++end;
}
--end;
while (str < end) {
tmp = *str;
printf("hello:%s str:%c, end:%c\n", hello, *str, *end);
*str++ = *end;
*end-- = tmp;
}
}
return 0;
}
It is undefined behavior to attempt to alter a string literal. Change your code to the equivalent below, and you will see the issue:
const char *hello = "abcdefgh";
const char *str = hello;
const char * end = str;
So do you see why the line *str++ = *end; had a problem? You're attempting to write to a const area, and you can't do that.
If you want an even simpler example:
int main()
{
char *str = "abc";
str[0] = 'x';
}
Don't be surprised if this simple program produces a crash or segmentation fault when the
str[0] = 'x';
line is executed.
Unfortunately, string-literals did not have to be declared as const char* in C, and C++ brought this syntax over. So even though it looks like you are not using const, you are.
If you want the code to actually work, declare a writable buffer, i.e. an array of char, and point str and end into it:
char hello[] = "abcdefgh";
char *str = hello;
char *end = str;
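For the reversal loop in the question, both pointers have to walk the same writable buffer. A minimal sketch of that, with the made-up name reverse_in_place, could look like this:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: reverses a null-terminated string in place.
// The caller must pass a writable buffer, never a string literal.
void reverse_in_place(char* s)
{
    const std::size_t len = std::strlen(s);
    if (len < 2)
        return;                 // nothing to swap
    char* str = s;              // walks forward from the first character
    char* end = s + len - 1;    // walks backward from the last character
    while (str < end) {
        const char tmp = *str;
        *str++ = *end;
        *end-- = tmp;
    }
}
```

Calling it on char hello[] = "abcdefgh"; is fine because the array is a private, writable copy of the literal.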
It seems like you're trying to reverse the string. It also looks like you're overcomplicating things.
A string literal is constant, so a pointer to one should be declared as const char *, which means you can't change the characters through it.
In C++ we use strings, and string iterators:
#include <algorithm>
#include <iostream>
#include <string>
using std::string;
using std::reverse;
using std::swap;
using std::cout;
using std::endl;
int main(int argc, const char * argv[])
{
string hello("abcdefgh");
reverse(hello.begin(), hello.end());
cout << hello << endl;
return 0;
}
Manually:
static void reverse(std::string & str)
{
if(str.empty())
return; // length() - 1 would wrap around for an empty string
string::size_type b = 0, e = str.length() - 1;
while(b < e)
{
swap(str[b++], str[e--]);
}
}
Recursively:
static void reverse_helper(std::string & str,
string::size_type b,
string::size_type e)
{
if(b >= e)
return;
swap(str[b], str[e]);
reverse_helper(str, ++b, --e);
}
static void reverse(std::string & str)
{
if(str.empty())
return; // length() - 1 would wrap around for an empty string
reverse_helper(str, 0, str.length() - 1);
}

Copy number of bytes to a position in memory

If I am not mistaken, the following code is used to copy an array of bytes to a position in memory in C#:
byte[] byteName = Encoding.ASCII.GetBytes("Hello There");
int positionMemory = getPosition();
Marshal.Copy(byteName, 0, new IntPtr(positionMemory), byteName.Length);
How can I achieve this in native C++?
Use a pointer and memcpy:
void * memcpy ( void * destination, const void * source, size_t num );
Suppose you want to copy an array A of length n into an array B:
memcpy (B, A, n * sizeof(char));
This is more C than C++; the string class has copy capabilities you can use.
size_t length;
char buffer[20];
string str ("Test string...");
length=str.copy(buffer,6,5);
buffer[length]='\0';
Here's a more specific, complete sample:
#include <stdio.h>
#include <string>
#include <string.h>
#include <iostream>
using namespace std;
int main()
{
string s("Hello World");
char buffer [255];
void * p = buffer; // Or void * p = getPosition()
memcpy(p,s.c_str(),s.length()+1);
cout << s << endl;
cout << buffer << endl;
return 0;
}
let me know if you need more details
memcpy(), memmove(), CopyMemory(), and MoveMemory() can all be used as native equivalents of Marshal.Copy(). As for the position handling, all the .NET code is doing is casting the integer to a pointer, which you can do in C++ as well. The .NET code you showed is equivalent to the following:
std::string byteName = "Hello There";
int positionMemory = getPosition();
memcpy(reinterpret_cast<void*>(positionMemory), byteName.c_str(), byteName.length());
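One caveat with the snippet above: on a 64-bit target an int may be narrower than a pointer. The sketch below assumes the address is carried in a std::uintptr_t instead; the name copy_to_address is made up for the example, and the caller is responsible for making sure the address really points at enough writable memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copies the bytes of `bytes` (without a terminator)
// to the raw address `address`, which must refer to writable memory of at
// least bytes.size() bytes.
void copy_to_address(std::uintptr_t address, const std::string& bytes)
{
    void* destination = reinterpret_cast<void*>(address);
    std::memcpy(destination, bytes.data(), bytes.size());
}
```

Like Marshal.Copy with byteName.Length, this copies only the characters themselves, so the destination is not null-terminated unless it already was.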