substitute strlen with sizeof for c-string - c++

I want to use the mbstowcs_s function but without the iostream header; therefore I cannot use strlen to predict the size of my buffer. The following function simply has to convert a C-string to a wide C-string and return it:
wchar_t* changeToWide(char* value)
{
    wchar_t* vOut = new wchar_t[strlen(value)+1];
    mbstowcs_s(NULL, vOut, strlen(value)+1, value, strlen(value));
    return vOut;
}
As soon as I change it to
wchar_t* changeToWide(char* value)
{
    wchar_t* vOut = new wchar_t[sizeof(value)];
    mbstowcs_s(NULL, vOut, sizeof(value), value, sizeof(value)-1);
    return vOut;
}
I get wrong results (the values are not the same in both arrays). What is the best way to work this out?
I am also open to other ideas for how to do the conversion without using strings, just plain arrays.

Given a char* or const char*, you cannot use sizeof() to get the size of the string pointed to by your char* variable. In this case, sizeof() returns the number of bytes a pointer occupies in memory (commonly 4 bytes on 32-bit architectures and 8 bytes on 64-bit architectures).
If you have the characters defined as an array, you can use sizeof:
char text[] = "test";
auto size = sizeof(text); // 5, because it includes the '\0' character
But if you have something like this:
char text[] = "test";
const char* ptext = text;
auto size2 = sizeof(ptext); // probably 4 or 8, depending on the architecture you are working on
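A corrected sketch of your original function therefore sticks with strlen; note that strlen only requires <cstring> (or <string.h>), not <iostream>, so your constraint about avoiding iostream still holds. This assumes the Microsoft mbstowcs_s signature:
#include <cstring>   // strlen lives here, not in <iostream>
#include <cstdlib>   // mbstowcs_s (Microsoft / Annex K variant)

wchar_t* changeToWide(const char* value)
{
    size_t length = strlen(value);            // characters, excluding '\0'
    wchar_t* vOut = new wchar_t[length + 1];  // +1 for the terminator
    mbstowcs_s(NULL, vOut, length + 1, value, length);
    return vOut;
}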

Not that I am an expert on this matter, but the char to wchar_t conversion being made here is seemingly nothing more than storing the exact same values in a wider type, in other words, prefixing each char with a run of zero bits.
I don't know C++ either, just C, but I can derive what it would probably look like in C++ from your code, so here it goes:
wchar_t * changeToWide( char* value )
{
    // counts the length of the value array, not including the terminating 0
    int i = 0;
    while ( value[i] != '\0' ) i++;
    // allocates enough memory, +1 for the terminating 0
    wchar_t * vOut = new wchar_t[i+1];
    // assigns values, including the 0
    i = 0;
    while ( ( vOut[i] = 0 | value[i] ) != '\0' ) i++;
    return vOut;
}
The 0 | part looks truly redundant to me, but I felt like including it; I don't really know why...

Related

Subsetting char array without copying it in C++

I have a long array of char (coming from a raster file via GDAL), all composed of 0 and 1. To compact the data, I want to convert it to an array of bits (thus dividing the size by 8), 4 bytes at a time, writing the result to a different file. This is what I have come up with so far:
uint32_t bytes2bits(char b[33]) {
b[32] = 0;
return strtoul(b,0,2);
}
const char data[36] = "00000000000000000000000010000000101"; // 101 is to be ignored
char word[33];
strncpy(word,data,32);
uint32_t byte = bytes2bits(word);
printf("Data: %d\n",byte); // 128
The code is working, and the result is going to be written in a separate file. What I'd like to know is: can I do that without copying the characters to a new array?
EDIT: I'm using a const variable here just to make a minimal, reproducible example. In my program it's a char *, which is continually changing value inside a loop.
Yes, you can, as long as you can modify the source string (in your example code you can't because it is a constant, but I assume in reality you have the string in writable memory):
uint32_t bytes2bits(const char* b) {
return strtoul(b,0,2);
}
void compress (char* data) {
// You would need to make sure that the `data` argument always has
// at least 33 characters in length (the null terminator at the end
// of the original string counts)
char temp = data[32];
data[32] = 0;
uint32_t byte = bytes2bits(data);
data[32] = temp;
printf("Data: %d\n",byte); // 128
}
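A usage sketch with hypothetical data, assuming the string lives in writable memory:
char data[] = "00000000000000000000000010000000101";
compress(data);   // prints "Data: 128"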
By using the char* buffer that already holds the long data, there is no need to copy each part into a temporary buffer to convert it to a long.
Just use a variable to step through the buffer in periods of 32 bytes, temporarily placing a 0 terminator byte after each 32-byte period.
So your code would look like:
uint32_t bytes2bits(const char* b) {
return strtoul(b,0,2);
}
void compress (char* data) {
int dataLen = strlen(data);
int periodLen = 32;
char* periodStr;
char tmp;
int periodPos = periodLen;
uint32_t byte;
periodStr = data;
while(periodPos < dataLen)
{
tmp = data[periodPos];
data[periodPos] = 0;
byte = bytes2bits(periodStr);
printf("Data: %d\n",byte); // 128
data[periodPos] = tmp;
periodStr = &data[periodPos];
periodPos += periodLen;
}
if(periodPos - periodLen <= dataLen)
{
byte = bytes2bits(periodStr);
printf("Data: %d\n",byte); // 128
}
}
Please be careful with the last period, which could be smaller than 32 bytes.
const char data[36]
You are in violation of your contract with the compiler if you declare something as const and then modify it.
Generally speaking, the compiler won't let you modify it...so to even try to do so with a const declaration you'd have to cast it (but don't)
char *sneaky_ptr = (char*)data;
sneaky_ptr[0] = 'U'; /* the U is for "undefined behavior" */
See: Can we change the value of an object defined with const through pointers?
So if you wanted to do this, you'd have to be sure the data was legitimately non-const.
The right way to do this in modern C++ is to use std::string to hold your string and std::string_view to process parts of that string without copying them.
You can use string_view with the char array you have, though; it is commonly used to modernize code built around the classic null-terminated const char*.
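For illustration, a minimal sketch (not your exact code) that walks the buffer in 32-character windows with std::string_view and std::from_chars, both C++17, so nothing is copied and no terminator needs to be patched in:
#include <charconv>
#include <cstdint>
#include <cstdio>
#include <string_view>

void compress(std::string_view bits)
{
    // walk the view in 32-character windows; a final partial window is skipped here
    for (std::size_t pos = 0; pos + 32 <= bits.size(); pos += 32)
    {
        std::string_view chunk = bits.substr(pos, 32);
        std::uint32_t value = 0;
        std::from_chars(chunk.data(), chunk.data() + chunk.size(), value, 2);
        std::printf("Data: %u\n", value);
    }
}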

Need to find the number of contents in the array

I have a char array and I want to find out the number of contents in it.
For example, my array is:
char myArray[10];
And after input its content is:
ABC
Now I want to store, in a variable size, the number of characters actually holding content. So, in this case:
size = 3
How do I find that?
A naive way of doing this would be to look for the null terminating character '\0'. This is already implemented for you in the C function strlen, so there are two ways of doing this:
int StringLength( const char* str, int maxLength )
{
for( int i = 0; i < maxLength; ++i )
{
if( str[i] == '\0' )
return i;
}
return -1;
}
Or you could just call strlen as follows:
int iLength = strlen( myArray );
However, as you have tagged this c++, the best way to do this would be to not deal with C-style character arrays and instead use the extremely useful std::string class.
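For example, a minimal sketch assuming the input comes from std::cin:
#include <iostream>
#include <string>

int main()
{
    std::string text;
    std::cin >> text;                  // e.g. "ABC"
    std::cout << text.size() << '\n';  // prints 3; no manual '\0' scanning needed
}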
strlen(myArray) is what you want.
Try this:
int len = strlen(myArray);
strlen is declared in the <string.h> header (<cstring> in C++). Don't forget to include it in your program.
Defining the array as char myArray[10]; will not always initialize its content to zeros, so depending on how you fill it with ABC you either can or cannot find the correct length. In the worst case, a plain strlen() may report numbers >10, or even result in a read access violation. Try initializing it like char myArray[10] = {}; first.

Dynamic memory allocation to char array

I have given the array size manually as below:
int main(int argc, char *argv[] )
{
char buffer[1024];
strcpy(buffer,argv[1]);
...
}
But if the data passed in the argument exceeds this size, it may create problems.
Is this the correct way to allocate memory dynamically?
int main(int argc, char *argv[] )
{
int length;
char *buffer;
length = sizeof(argv[1]); //or strlen(argv[1])?
buffer = (char*)malloc(length*sizeof(char *));
...
}
sizeof tells you the size of char*. You want strlen instead
if (argc < 2) {
printf("Error - insufficient arguments\n");
return 1;
}
length=strlen(argv[1]);
buffer = (char*)malloc(length+1); // cast required for C++ only
I've suggested a few other changes here:
you need to add an extra byte to buffer for the null terminator
you should check that the user passed in an argv[1]
sizeof(char *) is incorrect when calculating storage required for a string. A C string is an array of chars so you need sizeof(char), which is guaranteed to be 1 so you don't need to multiply by it
Alternatively, if you're running on a Posix-compatible system, you could simplify things and use strdup instead:
buffer = strdup(argv[1]);
Finally, make sure to free this memory when you're finished with it
free(buffer);
The correct way is to use std::string and let C++ do the work for you
#include <string>
int main(int argc, char* argv[])
{
std::string buffer = argv[1];
}
but if you want to do it the hard way then this is correct
int main(int argc, char* argv[])
{
int length = strlen(argv[1]);
char* buffer = (char*)malloc(length + 1);
}
Don't forget to +1 for the null terminator used in C style strings.
In C++, you can do this to get your arguments in a nice data structure:
const std::vector<std::string> args(argv, argv + argc);
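A minimal sketch of that in context (the args name is just a placeholder):
#include <string>
#include <vector>

int main(int argc, char* argv[])
{
    const std::vector<std::string> args(argv, argv + argc);
    if (args.size() > 1)
    {
        std::string buffer = args[1];  // copies argv[1]; the memory is managed for you
    }
}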
length = strlen(argv[1]); // not sizeof(argv[1])
and
//extra byte of space is to store Null character.
buffer = (char*)malloc((length+1) * sizeof(char));
Since sizeof(char) is always one, you can also use this:
buffer = (char*)malloc(length+1);
Firstly, if you use C++ I think it's better to use new instead of malloc.
Secondly, your malloc size is wrong: it should be buffer = malloc(sizeof(char) * length); because you are allocating a buffer of char, not of char*.
Thirdly, you must allocate 1 byte more for the end of your string, to store the '\0'.
Finally, sizeof only gives the size of the type, not of a string; you must use strlen to get the string length.
You need to add an extra byte to hold the terminating null byte of the string:
length = strlen(argv[1]) + 1;
Then it should be OK.

Convert wchar_t to char

I was wondering is it safe to do so?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
If I am pretty sure the wide char will fall within ASCII range.
Why not just use a library routine, wcstombs()?
assert is for ensuring that something is true in a debug mode, without it having any effect in a release build. Better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char version.
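For instance, a minimal sketch of the wcstombs route with an explicit check instead of an assert; the function name and buffer handling are my assumptions, not part of the original suggestion:
#include <cstdlib>

// returns false if some character could not be converted in the current locale
bool narrow_copy(const wchar_t* wide, char* out, std::size_t out_len)
{
    std::size_t written = std::wcstombs(out, wide, out_len);
    if (written == static_cast<std::size_t>(-1))
        return false;   // a character could not be converted; alternate plan goes here
    if (written == out_len)
        return false;   // buffer too small to hold the terminating '\0'
    return true;
}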
You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
wchar_t is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.
char* may be a convenient typedef for a character string, but it is a novice mistake to think of this as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters with nonsense like this:
for(int i = 0; s[i]; ++i) {
wchar_t wc = s[i];
char c = doit(wc);
out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wcstombs(), which is part of ANSI C already, but it's still wrong.
You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array), into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.
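For reference, a minimal iconv() sketch for POSIX systems; the encoding names ("WCHAR_T", "UTF-8") and the error handling are assumptions that may need adapting to your platform:
#include <cerrno>
#include <cwchar>
#include <iconv.h>
#include <string>

std::string narrow_with_iconv(const wchar_t* src)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");   // to-encoding, from-encoding
    if (cd == (iconv_t)-1)
        return std::string();                      // conversion pair not supported

    std::size_t in_left = std::wcslen(src) * sizeof(wchar_t);
    char* in_ptr = (char*)src;                     // iconv works on byte pointers

    std::string out;
    char buf[256];
    while (in_left > 0) {
        char* out_ptr = buf;
        std::size_t out_left = sizeof(buf);
        if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (std::size_t)-1
            && errno != E2BIG)
            break;                                 // invalid sequence; handle as needed
        out.append(buf, out_ptr - buf);            // keep whatever was converted so far
    }
    iconv_close(cd);
    return out;
}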
An easy way is :
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
const char* your_wchar_in_char = your_wchar_in_str.c_str();
I've been using this method for years :)
A short function I wrote a while back to pack a wchar_t array into a char array. Characters outside the ASCII range (0-127) are replaced by '?' characters, and it handles surrogate pairs correctly.
size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
    size_t si = 0;  // index into src
    size_t di = 0;  // index into dest
    while (src[si] != L'\0' && di < (dest_len - 1)){
        wchar_t code = src[si];
        if (code < 128)
            dest[di] = char(code);
        else{
            dest[di] = '?';
            if (code >= 0xD800 && code <= 0xDBFF)
                // lead surrogate, skip the next code unit, which is the trail
                si++;
        }
        si++;
        di++;
    }
    dest[di] = '\0';
    return di;  // number of characters written, not counting the terminator
}
Technically, 'char' could have the same range as either 'signed char' or 'unsigned char'. For the unsigned characters, your range is correct; theoretically, for signed characters, your condition is wrong. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last && in the assert is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.
Here's another way of doing it; remember to use free() on the result.
char* wchar_to_char(const wchar_t* pwchar)
{
    // get the number of characters in the string, including the terminating '\0'
    int charCount = 0;
    while (pwchar[charCount] != L'\0')
    {
        charCount++;
    }
    charCount++; // one extra slot for the '\0'

    // allocate a new block of memory sized in char (1 byte) instead of wide char
    char* filePathC = (char*)malloc(sizeof(char) * charCount);
    for (int i = 0; i < charCount; i++)
    {
        // convert to char (1 byte); values above 127 are simply truncated
        filePathC[i] = (char)pwchar[i];
    }
    return filePathC;
}
One could also convert wchar_t --> wstring --> string --> char:
wchar_t wide = /* something */;
wstring wstrValue(1, wide);          // wrap the single wide character in a wstring
string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end()); // convert wstring to string
char char_value = strValue[0];
In general, no. int(wchar_t(255)) == int(char(255)) of course, but that just means they have the same int value. They may not represent the same characters.
You would see such a discrepancy in the majority of Windows PCs, even. For instance, on Windows Code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).
Note that it does not even hold for the ASCII range, as C++ doesn't even require ASCII. On IBM systems in particular you may see that 'A' != 65

Very strange char array behaviour

unsigned int fname_length = 0;
//fname length equals 30
file.read((char*)&fname_length,sizeof(unsigned int));
//fname contains random data as you would expect
char *fname = new char[fname_length];
//fname contains all the data 30 bytes long as you would expect, plus 18 bytes of random data on the end (intellisense display)
file.read((char*)fname,fname_length);
//m_material_file (std::string) contains all 48 characters
m_material_file = fname;
// count = 48
int count = m_material_file.length();
Now when trying this way, intellisense still shows the 18 bytes of data after setting the char array to all ' ', and I get exactly the same results, even without the file read:
char name[30];
for(int i = 0; i < 30; ++i)
{
name[i] = ' ';
}
file.read((char*)name,30);
m_material_file = name;
int count = m_material_file.length();
Any idea what's going wrong here? It's probably something completely obvious, but I'm stumped!
Thanks
Sounds like the string in the file isn't null-terminated, and intellisense is assuming that it is. Or perhaps when you wrote the length of the string (30) into the file, you didn't include the null character in that count. Try adding:
fname[fname_length] = '\0';
after the file.read(). Oh yeah, you'll need to allocate an extra character too:
char * fname = new char[fname_length + 1];
I guess that intellisense is trying to interpret char* as C string and is looking for a '\0' byte.
fname is a char* so both the debugger display and m_material_file = fname will be expecting it to be terminated with a '\0'. You're never explicitly doing that, but it just happens that whatever data follows that memory buffer has a zero byte at some point, so instead of crashing (which is a likely scenario at some point), you get a string that's longer than you expect.
Use
m_material_file.assign(fname, fname + fname_length);
which removes the need for the zero terminator. Also, prefer std::vector to raw arrays.
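For illustration, a minimal sketch combining both suggestions; the stream and length parameters are assumed to match the surrounding code:
#include <istream>
#include <string>
#include <vector>

std::string read_material_name(std::istream& file, unsigned int fname_length)
{
    std::vector<char> fname(fname_length);           // no new/delete to get wrong
    file.read(fname.data(), fname.size());
    return std::string(fname.begin(), fname.end());  // length-based, no '\0' needed
}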
std::string::operator=(char const*) is expecting a sequence of bytes terminated by a '\0'. You can solve this with any of the following:
extend fname by a character and add the '\0' explicitly as others have suggested or
use m_material_file.assign(&fname[0], &fname[fname_length]); instead or
use repeated calls to file.get(ch) and m_material_file.push_back(ch)
Personally, I would use the last option since it eliminates the explicitly allocated buffer altogether. One fewer explicit new is one fewer chance of leaking memory. The following snippet should do the job:
std::string read_name(std::istream& is) {
unsigned int name_length;
std::string file_name;
if (is.read((char*)&name_length, sizeof(name_length))) {
for (unsigned int i=0; i<name_length; ++i) {
char ch;
if (is.get(ch)) {
file_name.push_back(ch);
} else {
break;
}
}
}
return file_name;
}
Note:
You probably don't want to use sizeof(unsigned int) to determine how many bytes to write to a binary file. The number of bytes read/written depends on the compiler and platform. If you have a maximum length, use it to determine a specific byte size to write out. If the length is guaranteed to be fewer than 255 bytes, then only write a single byte for the length. Then your code will not depend on the byte size of intrinsic types.
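For illustration, a minimal sketch of the single-byte length prefix described above; the function names are placeholders and the caller must guarantee the name fits in 255 bytes:
#include <istream>
#include <ostream>
#include <string>

void write_length_prefixed(std::ostream& os, const std::string& name)
{
    unsigned char len = static_cast<unsigned char>(name.size()); // caller ensures size() <= 255
    os.put(static_cast<char>(len));
    os.write(name.data(), len);
}

std::string read_length_prefixed(std::istream& is)
{
    std::string name;
    char len_byte = 0;
    if (is.get(len_byte))
    {
        unsigned char len = static_cast<unsigned char>(len_byte);
        name.resize(len);
        if (len > 0)
            is.read(&name[0], len);
    }
    return name;
}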