How to get string from CP866 bytes

How to get string from CP866 bytes - c++

My program read bytes from file and try convert it to string. Head of file is a text in CP866.
iconv_t cd = iconv_open("UTF-8","CP866");
char* iconv_in = bytes.data(); //Bytes is a char vector
char* iconv_out = (char *)malloc(counter * sizeof(char)); //counter is a length of bytes array (char vector)
size_t iconv_in_bytes =counter;
size_t iconv_out_bytes = counter;
size_t ret = iconv(cd, &iconv_in, &iconv_in_bytes, &iconv_out, &iconv_out_bytes);
if ((size_t) -1 == ret) {
cout << "Error convert";
return NULL;
}
Program ends with failure (Error convert). And this way is not simple and beautiful. There is exist more simple solution?

Related

How to use libiconv correctly in C so that it would not report "Arg list too long"?

Code
/*char* to wchar_t* */
wchar_t*strtowstr(char*str){
iconv_t cd=iconv_open("wchar_t","UTF-8");
if(cd==(iconv_t)-1){
return NULL;
}
size_t len1=strlen(str),len2=1024;
wchar_t*wstr=(wchar_t*)malloc((len1+1)*sizeof(wchar_t));
char*ptr1=str;
wchar_t*ptr2=wstr;
if((int)iconv(cd,&ptr1,&len1,(char**)&ptr2,&len2)<0){
free(wstr);
iconv_close(cd);
return NULL;
}
*ptr2=L'\0';
iconv_close(cd);
return wstr;
}
I use strerror(errno) to get the error message,it says "Arg list too long".
How can I solve it?
Thanks to the comments,I change the code above.
I just use the function to read a text file.I think it reports the error because the file is too large.So I want to know how to use iconv for long string.

According to the man page, you get E2BIG when there's insufficient room at *outbuf.
I think the fifth argument should be a number of bytes.
wchar_t *utf8_to_wstr(const char *src) {
iconv_t cd = iconv_open("wchar_t", "UTF-8");
if (cd == (iconv_t)-1)
goto Error1;
size_t src_len = strlen(src); // In bytes, excludes NUL
size_t dst_len = sizeof(wchar_t) * src_len; // In bytes, excludes NUL
size_t dst_size = dst_len + sizeof(wchar_t); // In bytes, including NUL
wchar_t *buf = malloc(dst_size);
if (!buf)
goto Error2;
wchar_t *dst = buf;
if (iconv(cd, &(char*)src, &src_len, &(char*)dst, &dst_len) == (size_t)-1)
goto Error3;
*dst = L'\0';
iconv_close(cd);
return buf;
Error3:
free(buf);
Error2:
iconv_close(cd);
Error1:
return NULL;
}

C++: Store read binary file into buffer

I'm trying to read a binary file and store it in a buffer. The problem is, that in the binary file are multiple null-terminated characters, but they are not at the end, instead they are before other binary text, so if I store the text after the '\0' it just deletes it in the buffer.
Example:
char * a = "this is a\0 test";
cout << a;
This will just output: this is a
here's my real code:
this function reads one character
bool CStream::Read (int * _OutChar)
{
if (!bInitialized)
return false;
int iReturn = 0;
*_OutChar = fgetc (pFile);
if (*_OutChar == EOF)
return false;
return true;
}
And this is how I use it:
char * SendData = new char[4096 + 1];
for (i = 0; i < 4096; i++)
{
if (Stream.Read (&iChar))
SendData[i] = iChar;
else
break;
}

I just want to mention that there is a standard way to read from a binary file into a buffer.
Using <cstdio>:
char buffer[BUFFERSIZE];
FILE * filp = fopen("filename.bin", "rb");
int bytes_read = fread(buffer, sizeof(char), BUFFERSIZE, filp);
Using <fstream>:
std::ifstream fin("filename.bin", ios::in | ios::binary );
fin.read(buffer, BUFFERSIZE);
What you do with the buffer afterwards is all up to you of course.
Edit: Full example using <cstdio>
#include <cstdio>
const int BUFFERSIZE = 4096;
int main() {
const char * fname = "filename.bin";
FILE* filp = fopen(fname, "rb" );
if (!filp) { printf("Error: could not open file %s\n", fname); return -1; }
char * buffer = new char[BUFFERSIZE];
while ( (int bytes = fread(buffer, sizeof(char), BUFFERSIZE, filp)) > 0 ) {
// Do something with the bytes, first elements of buffer.
// For example, reversing the data and forget about it afterwards!
for (char *beg = buffer, *end=buffer + bytes; beg < end; beg++, end-- ) {
swap(*beg, *end);
}
}
// Done and close.
fclose(filp);
return 0;
}

static std::vector<unsigned char> read_binary_file (const std::string filename)
{
// binary mode is only for switching off newline translation
std::ifstream file(filename, std::ios::binary);
file.unsetf(std::ios::skipws);
std::streampos file_size;
file.seekg(0, std::ios::end);
file_size = file.tellg();
file.seekg(0, std::ios::beg);
std::vector<unsigned char> vec;
vec.reserve(file_size);
vec.insert(vec.begin(),
std::istream_iterator<unsigned char>(file),
std::istream_iterator<unsigned char>());
return (vec);
}
and then
auto vec = read_binary_file(filename);
auto src = (char*) new char[vec.size()];
std::copy(vec.begin(), vec.end(), src);

The problem is definitievely the writing of your buffer, because you read a byte at a time.
If you know the length of the data in your buffer, you could force cout to go on:
char *bf = "Hello\0 world";
cout << bf << endl;
cout << string(bf, 12) << endl;
This should give the following output:
Hello
Hello world
However this is a workaround, as cout is foreseent to output printable data. Be aware that the output of non printable chars such as '\0' is system dependent.
Alternative solutions:
But if you manipulate binary data, you should define ad-hoc data structures and printing. Here some hints, with a quick draft for the general principles:
struct Mybuff { // special strtucture to manage buffers of binary data
static const int maxsz = 512;
int size;
char buffer[maxsz];
void set(char *src, int sz) // binary copy of data of a given length
{ size = sz; memcpy(buffer, src, max(sz, maxsz)); }
} ;
Then you could overload the output operator function:
ostream& operator<< (ostream& os, Mybuff &b)
{
for (int i = 0; i < b.size; i++)
os.put(isprint(b.buffer[i]) ? b.buffer[i]:'*'); // non printables replaced with *
return os;
}
ANd you could use it like this:
char *bf = "Hello\0 world";
Mybuff my;
my.set(bf, 13); // physical copy of memory
cout << my << endl; // special output

I believe your problem is not in reading the data, but rather in how you try to print it.
char * a = "this is a\0 test";
cout << a;
This example you show us prints a C-string. Since C-string is a sequence of chars ended by '\0', the printing function stops at the first null char.
This is because you need to know where the string ends either by using special terminating character (like '\0' here) or knowing its length.
So, to print whole data, you must know the length of it and use a loop similar to the one you use for reading it.

Are you on Windows? If so you need to execute _setmode(_fileno(stdout), _O_BINARY);
Include <fcntl.h> and <io.h>

stack check fail in sha-1 c++

I'm having a __stack_chk_fail in the main thread.
I have no idea why is this happening?
I got the codes from this website:
http://www.packetizer.com/security/sha1/
Im trying to add a function to compute the digest of a file using the example.
.h file
#include <stdio.h>
#include <string>
std::string digestFile( char *filename );
.cpp file
std::string SHA1::digestFile( char *filename )
{
Reset();
FILE *fp = NULL;
if (!(fp = fopen(filename, "rb")))
{
printf("sha: unable to open file %s\n", filename);
return NULL;
}
char c = fgetc(fp);
while(!feof(fp))
{
Input(c);
c = fgetc(fp);
}
fclose(fp);
unsigned message_digest[5];
if (!Result(message_digest))
{ printf("sha: could not compute message digest for %s\n", filename); }
std::string hash;
for (int i = 0; i < 5; i++)
{
char buffer[8];
int count = sprintf(buffer, "%08x", message_digest[i]);
if (count != 8)
{ printf("converting unsiged to char ERROR"); }
hash.append(buffer);
}
return hash;
}

__stack_chk_fail occurs when you write to invalid address.
It turns out you do:
char buffer[8];
int count = sprintf(buffer, "%08x", message_digest[i]);
C strings are NUL-terminated. That means that when sprintf writes 8 digits, it adds 9-th char, '\0'. But buffer only has space for 8 chars, so the 9-th goes past the end of the buffer.
You need char buffer[9]. Or do it the C++ way with std::stringstream, which does not involve any fixed sizes and thus no risk of buffer overrun.

simulate ulltoa() with a radix/base of 36

I need to convert an unsigned 64-bit integer into a string. That is in Base 36, or characters 0-Z. ulltoa does not exist in the Linux manpages. But sprintf DOES. How do I use sprintf to achieve the desired result? i.e. what formatting % stuff?
Or if snprintf does not work, then how do I do this?

You can always just write your own conversion function. The following idea is stolen from heavily inspired by this fine answer:
char * int2base36(unsigned int n, char * buf, size_t buflen)
{
static const char digits[] = "0123456789ABCDEFGHI...";
if (buflen < 1) return NULL; // buffer too small!
char * b = buf + buflen;
*--b = 0;
do {
if (b == buf) return NULL; // buffer too small!
*--b = digits[n % 36];
n /= 36;
} while(n);
return b;
}
This will return a pointer to a null-terminated string containing the base36-representation of n, placed in a buffer that you provide. Usage:
char buf[100];
std::cout << int2base36(37, buf, 100);
If you want and you're single-threaded, you can also make the char buffer static -- I guess you can figure out a suitable maximal length:
char * int2base36_not_threadsafe(unsigned int n)
{
static char buf[128];
static const size_t buflen = 128;
// rest as above

iconv encoding conversion problem

I am having trouble converting strings from utf8 to gb2312. My convert function is below
void convert(const char *from_charset,const char *to_charset, char *inptr, char *outptr)
{
size_t inleft = strlen(inptr);
size_t outleft = inleft;
iconv_t cd; /* conversion descriptor */
if ((cd = iconv_open(to_charset, from_charset)) == (iconv_t)(-1))
{
fprintf(stderr, "Cannot open converter from %s to %s\n", from_charset, to_charset);
exit(8);
}
/* return code of iconv() */
int rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
if (rc == -1)
{
fprintf(stderr, "Error in converting characters\n");
if(errno == E2BIG)
printf("errno == E2BIG\n");
if(errno == EILSEQ)
printf("errno == EILSEQ\n");
if(errno == EINVAL)
printf("errno == EINVAL\n");
iconv_close(cd);
exit(8);
}
iconv_close(cd);
}
This is an example of how I used it:
int len = 1000;
char *result = new char[len];
convert("UTF-8", "GB2312", some_string, result);
edit: I most of the time get a E2BIG error.

outleft should be the size of the output buffer (e.g. 1000 bytes), not the size of the incoming string.
When converting, the string length usually changes in the process and you cannot know how long it is going to be until afterwards. E2BIG means that the output buffer wasn't large enough, in which case you need to give it more output buffer space (notice that it has already converted some of the data and adjusted the four variables passed to it accordingly).

As others have noted, E2BIG means that the output buffer wasn't large enough for the conversion and you were using the wrong value for outleft.
But I've also noticed some other possible problems with your function. Namely, with the way your function works, your caller has no way of knowing how many bytes are in the output string. Your convert() function neither nul-terminates the output buffer nor does it have a means of telling its caller the number of bytes it wrote to outptr.
If you want to deal with nul-terminates strings (and it appears that's what you want to do since your input string is nul-terminated), you might find the following approach to be much better:
char *
convert (const char *from_charset, const char *to_charset, const char *input)
{
size_t inleft, outleft, converted = 0;
char *output, *outbuf, *tmp;
const char *inbuf;
size_t outlen;
iconv_t cd;
if ((cd = iconv_open (to_charset, from_charset)) == (iconv_t) -1)
return NULL;
inleft = strlen (input);
inbuf = input;
/* we'll start off allocating an output buffer which is the same size
* as our input buffer. */
outlen = inleft;
/* we allocate 4 bytes more than what we need for nul-termination... */
if (!(output = malloc (outlen + 4))) {
iconv_close (cd);
return NULL;
}
do {
errno = 0;
outbuf = output + converted;
outleft = outlen - converted;
converted = iconv (cd, (char **) &inbuf, &inleft, &outbuf, &outleft);
if (converted != (size_t) -1 || errno == EINVAL) {
/*
* EINVAL An incomplete multibyte sequence has been encoun-
* tered in the input.
*
* We'll just truncate it and ignore it.
*/
break;
}
if (errno != E2BIG) {
/*
* EILSEQ An invalid multibyte sequence has been encountered
* in the input.
*
* Bad input, we can't really recover from this.
*/
iconv_close (cd);
free (output);
return NULL;
}
/*
* E2BIG There is not sufficient room at *outbuf.
*
* We just need to grow our outbuffer and try again.
*/
converted = outbuf - out;
outlen += inleft * 2 + 8;
if (!(tmp = realloc (output, outlen + 4))) {
iconv_close (cd);
free (output);
return NULL;
}
output = tmp;
outbuf = output + converted;
} while (1);
/* flush the iconv conversion */
iconv (cd, NULL, NULL, &outbuf, &outleft);
iconv_close (cd);
/* Note: not all charsets can be nul-terminated with a single
* nul byte. UCS2, for example, needs 2 nul bytes and UCS4
* needs 4. I hope that 4 nul bytes is enough to terminate all
* multibyte charsets? */
/* nul-terminate the string */
memset (outbuf, 0, 4);
return output;
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to get string from CP866 bytes - c++

Related

How to use libiconv correctly in C so that it would not report "Arg list too long"?

C++: Store read binary file into buffer

stack check fail in sha-1 c++

simulate ulltoa() with a radix/base of 36

iconv encoding conversion problem

Categories

Resources