I'm writing code and a good portion of it requires returning wchar_t arrays. Returning wstrings isn't really an option (although I can use them), and I know I can pass a pointer as an argument and populate that, but I'm looking specifically to return a pointer to an array of wide chars. In my first few iterations the arrays seemed to be returned fine, but by the time they were processed and printed the memory had been overwritten and I was left with gibberish. To fix this I started using wcsdup, which fixed everything, but I'm struggling to grasp exactly what is happening, and therefore when it should be called so that the code works and leaks no memory. As it is, I use wcsdup pretty much every time I return a string and every time a string is returned to me, which I know leaks memory. Here is what I'm doing. Where and why should I use wcsdup, or is there a better solution than wcsdup altogether?
wchar_t *intToWChar(int toConvert, int base)
{
    wchar_t converted[12];
    /* Conversion happens... */
    return converted;
}
wchar_t *intToHexWChar(int toConvert)
{
    /* Largest int is 8 hex digits, plus "0x", plus \0 is 11 characters. */
    wchar_t converted[11];
    /* Prefix with "0x" for hex string. */
    converted[0] = L'0';
    converted[1] = L'x';
    /* Populate the rest of converted with the number in hex. */
    wchar_t *hexString = intToWChar(toConvert, 16);
    wcscpy((converted + 2), hexString);
    return converted;
}
int main()
{
    wchar_t *hexConversion = intToHexWChar(12345);
    /* Other code. */
    /* Without wcsdup calls, this spits out gibberish. */
    wcout << "12345 in Hex is " << hexConversion << endl;
}
wchar_t *intToWChar(int toConvert, int base)
{
    wchar_t converted[12];
    /* Conversion happens... */
    return converted;
}
This returns a pointer to a local variable.
wchar_t *hexString = intToWChar(toConvert, 16);
After this line, hexString points to memory that is no longer valid, and using it is undefined behavior (it may still hold the old value or it may be garbage!).
You do the same thing with the return from intToHexWChar.
Solutions:
use std::wstring
use std::vector<wchar_t>
pass in an array to the function for it to use
use smart pointers
use dynamic memory allocation (please don't!)
Note: you might also need to change to wcout instead of cout
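As a rough illustration of the first suggestion, here is a minimal sketch that returns a std::wstring by value. The name intToHexWString is made up for this example, and it formats the value as unsigned (an assumption, so that negative inputs print as their bit pattern):

```cpp
#include <sstream>
#include <string>

// Hypothetical replacement for intToHexWChar: the string is returned by
// value, so the caller owns the result and nothing dangles or leaks.
std::wstring intToHexWString(int toConvert)
{
    std::wostringstream out;
    // Assumption: format as unsigned so negative values show their bit pattern.
    out << L"0x" << std::hex << static_cast<unsigned int>(toConvert);
    return out.str();
}
```

A caller could write `std::wcout << intToHexWString(12345) << L'\n';` and use `.c_str()` on a named result wherever a `const wchar_t*` is required.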
Since you tagged your question with 'C++' the answer is a resounding: no, you should not use wcsdup at all. Instead, for passing arrays of wchar_t values around, use std::vector<wchar_t>.
If needed, you can turn those into a wchar_t* by taking the address of the first element (since vectors are guaranteed to be stored in contiguous memory), e.g.
wcout << "12345 in Hex is " << &hexConversion[0] << endl;
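For the vector approach, a sketch along these lines might work. The function name intToHexBuffer is hypothetical, and the buffer size of 11 assumes a 32-bit int as in the question:

```cpp
#include <cwchar>
#include <vector>

// Hypothetical helper: the vector owns the characters, so returning it
// by value is safe and needs no wcsdup or manual free.
std::vector<wchar_t> intToHexBuffer(int toConvert)
{
    // Assumption: 32-bit int, so "0x" + up to 8 hex digits + terminator.
    std::vector<wchar_t> buf(11);
    std::swprintf(buf.data(), buf.size(), L"0x%x",
                  static_cast<unsigned int>(toConvert));
    return buf;
}
```

The caller keeps the vector alive for as long as the pointer obtained from `&buf[0]` is in use.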
This is the scenario;
// I have created a buffer
void *buffer = operator new(100);
/* later some data from a different buffer is put into the buffer at this pointer
by a function in an external header so I don't know what it's putting in there */
cout << buffer;
I want to print out the data that was put into the buffer at this pointer to see what went in. I would like to just print it out as raw ASCII, I know there will be some non-printable characters in there but I also know some legible text was pushed there.
From what I have read on the Internet, cout can't print uncast data such as a void*, as opposed to an int or char. However, the compiler won't let me cast it on the fly using (char), for example. Should I create a separate variable that casts the value at the pointer and then cout that variable, or is there a way to do this directly and save on another variable?
Do something like:
// C++11
std::array<char,100> buf;
// use std::vector<char> for a large or dynamic buffer size
// buf.data() will return a raw pointer suitable for functions
// expecting a void* or char*
// buf.size() returns the size of the buffer
for (char c : buf)
    // cast first: passing a negative char to isprint is undefined
    std::cout << (isprint(static_cast<unsigned char>(c)) ? c : '.');
// C++98
std::vector<char> buf(100);
// The expression `buf.empty() ? NULL : &buf[0]`
// evaluates to a pointer suitable for functions expecting void* or char*
// The following struct needs to have external linkage
struct print_transform {
    // cast first: passing a negative char to isprint is undefined
    char operator() (char c) const { return isprint(static_cast<unsigned char>(c)) ? c : '.'; }
};
std::transform(buf.begin(), buf.end(),
               std::ostream_iterator<char>(std::cout, ""),
               print_transform());
Do this:
char* buffer = new char[100];
std::cout << buffer;
// at some point
delete[] buffer;
void* you only need in certain circumstances, mostly for interop with C interfaces, but this is definitely not a circumstance requiring a void*, which essentially loses all type information.
You need to cast it to char*: reinterpret_cast<char*>(buffer). The problem is that void* can point to anything, so only the pointer value is printed; when you cast it to char*, the contents of the memory are interpreted as a C-style string.
Note: use reinterpret_cast<> instead of the C-style (char *) to make your intent clear and avoid subtle-and-hard-to-find bugs later
Note: of course you might get a segfault instead: if the data is not a null-terminated C-style string, memory beyond the buffer might be accessed.
Update: You could allocate the memory as a char* buffer to begin with, which would solve your problem too: you could still call your 3rd party function (char* is implicitly convertible to void*, which I presume is the 3rd party function's parameter type) and you would not need the cast at all. Your best bet is to zero out the memory and make sure the 3rd party function copies no more than 99*sizeof(char) bytes into your buffer, so that the terminating '\0' of the C-style string is preserved.
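If the number of bytes written is known, the reliance on a terminating '\0' can be avoided entirely. The following is only a sketch; printableView is a made-up helper name, and it assumes you know how many bytes of the buffer are valid:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: renders `size` bytes starting at `data` as text,
// replacing every non-printable byte with '.'.
std::string printableView(const void* data, std::size_t size)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    out.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
        out += std::isprint(bytes[i]) ? static_cast<char>(bytes[i]) : '.';
    }
    return out;
}
```

With the buffer from the question this could be used as `std::cout << printableView(buffer, 100);`, which never reads past the 100 bytes that were allocated.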
If you want to go byte by byte you could use an unsigned char and iterate over it.
unsigned char* currByte = new unsigned char[100];
for(int i = 0; i < 100; ++i)
{
printf("| %02X |", currByte[i]);
}
It's not a very modern (or even very "C++") answer but it will print it as a hex value for you.
I have a const char* that is built by concatenation like this:
const char *fileName = "background1";
std::stringstream sstm;
sstm << fileName << "-hd.png";
fileName = sstm.str().c_str();
My problem is that the following instruction:
printf("const char = %s size = %d", fileName, sizeof(fileName));
returns:
"const char = background1-hd.png size = 4"
whereas I would expect that it returns:
"const char = background1-hd.png size = 19"
For example, the following gives the convenient result (as there is no concatenation):
const char *fileName2 = "background1-hd";
printf("const char = %s size = %d", fileName2, sizeof(fileName2));
returns:
"const char = background1-hd.png size = 19"
How can I avoid this issue and guarantee that the characters in my concatenated string are counted correctly?
Thanks !!
sizeof() returns the number of bytes the variable occupies in memory (in this case returns the size of the pointer fileName).
strlen() returns the length of the string (which is what you need).
You could as well try something like:
#include <iostream>
#include <cstdio>
#include <string>
int main()
{
    std::string fileName("background1");
    fileName.append("-hd.png");
    // length() returns a size_t, so print it with %zu rather than %d
    printf("const char = %s size = %zu", fileName.c_str(), fileName.length());
    return 0;
}
sizeof returns the size of the variable you give to it; it's evaluated at compile time. The "4" is the size of a pointer on your system. You want to use strlen() to determine the length of a string.
The result of sizeof(fileName) is related to fileName being a pointer, not an array. It literally returns the size of a pointer to a constant character string, and on a 32-bit system, all pointers are 32 bits (so sizeof == 4).
What you should use instead is strlen or similar, which will count the characters in the string, up to the trailing null, and return that. The results with strlen in place of sizeof will be about what you expect.
Side-related, with const char strings there is only ever one character per "cell" (actually byte). There are character sets which make for multiple bytes per character, but packing multiple characters into a single byte is quite rare, at least in C-family languages.
sizeof calculates the size of the data type in bytes and not the size of its contents (what it points to). In your example you are calculating the sizeof char* which is 4 bytes on your system. To get the length of a C string use strlen.
There is a distinction in the language between arrays and pointers, even if this distinction seems diluted both by implicit conversions (arrays tend to decay into pointers quite easily), and common statements that they are the same.
How does this even relate to your code?
Well, a string literal is actually an array of constant characters, not a pointer to character(s). In the initialization const char *fileName = "background1"; you are creating a pointer variable that points to the first element of the array ("background1" decays into a pointer to its first element), and from there on the variable you are managing is a pointer and not the literal.
If you combine this with the fact that sizeof tells you the size of the variable, you get that on a platform with 32-bit pointers and 8-bit chars, sizeof( const char* ) is always 4, regardless of the object the pointer points to (if there even is one).
Now, if you were treating the literal as what it actually is you would be having a bit more luck there:
const char filename[] = "background1";
assert( sizeof filename == 12 ); // note: NUL character is counted!
const char *fname = filename;
assert( sizeof fname == sizeof( void* ) ); // fname is a pointer, so this is the pointer's size
In real code you are not so lucky, and in many cases the literals have decayed into pointers well before you get a chance to ask for the compile-time size of the literal, so you cannot ask the compiler to tell you the size. In that case you need to calculate the length of the C-style string, which can be done by calling strlen.
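To make the array-versus-pointer distinction concrete, here is a small sketch. Both function names are invented for illustration; the template only accepts something that is still an array, which is exactly the case where the compiler knows the size:

```cpp
#include <cstddef>
#include <cstring>

// Length in characters of a C string, excluding the terminator.
// Works on any pointer to a null-terminated string.
std::size_t textLength(const char* s)
{
    return std::strlen(s);
}

// Size in bytes of a character array, including the terminator.
// Only compiles while the argument is still an array, not a decayed pointer.
template <std::size_t N>
std::size_t arrayBytes(const char (&)[N])
{
    return N;
}
```

For the file name in the question, `textLength("background1-hd.png")` gives 18 and `arrayBytes("background1-hd.png")` gives 19, the extra byte being the terminating '\0'.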
strlen has been suggested a number of times already, and for this case it's probably perfectly reasonable.
There is an alternative that will let you use sizeof though:
char fileName[] = "background1";
std::cout << sizeof(fileName) << "\n";
Since you're making fileName an array, it has all the characteristics of an array -- including the fact that your later attempt at assigning to it:
fileName = sstm.str().c_str();
...would fail (won't even compile when fileName is defined as an array). I should add, however, that it seems to me that you'd be better off just using std::string throughout:
std::string fileName("background1");
std::stringstream sstm;
sstm << fileName << "-hd.png";
fileName = sstm.str();
In this case, you can use string's size() or length() member.
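One more reason to keep a std::string around: in the question, sstm.str() returns a temporary string, so the pointer obtained from c_str() on it is only valid until the end of that statement. A possible sketch that avoids this, with hdFileName as a hypothetical helper name:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds the name and returns it by value, so the
// caller holds a std::string that stays valid for as long as it is needed.
std::string hdFileName(const std::string& base)
{
    std::stringstream sstm;
    sstm << base << "-hd.png";
    return sstm.str();
}
```

Store the result in a named variable, for example `std::string name = hdFileName("background1");`, and call `name.c_str()` only at the point where a `const char*` is actually required.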
I was working with a program that uses a function to set a new value in the registry, I used a const char * to get the value. However, the size of the value is only four bytes. I've tried to use std::string as a parameter instead, it didn't work.
I have a small example to show you what I'm talking about, and rather than solving my problem with the function I'd like to know the reason it does this.
#include <iostream>
void test(const char * input)
{
    std::cout << input;
    std::cout << "\n" << sizeof("THIS IS A TEST") << "\n" << sizeof(input) << "\n";
    /* The code above prints out the size of an explicit string (THIS IS A TEST), which is 15. */
    /* It then prints out the size of input, which is 4.*/
    int sum = 0;
    for(int i = 0; i < 15; i++) //Printed out each character, added the size of each to sum and printed it out.
                                //The result was 15.
    {
        sum += sizeof(input[i]);
        std::cout << input[i];
    }
    std::cout << "\n" << sum;
}
int main(int argc, char * argv[])
{
    test("THIS IS A TEST");
    std::cin.get();
    return 0;
}
Output:
THIS IS A TEST
15
4
THIS IS A TEST
15
What's the correct way to get string parameters? Do I have to loop through the whole array of characters and print each to a string (the value in the registry was only the first four bytes of the char)? Or can I use std::string as a parameter instead?
I wasn't sure if this was SO material, but I decided to post here as I consider this to be one of my best sources for programming related information.
sizeof(input) is the size of a const char*. What you want is strlen(input) + 1.
sizeof("THIS IS A TEST") is size of a const char[]. sizeof gives the size of the array when passed an array type which is why it is 15 .
For std::string use length()
sizeof gives a size based on the type you give it as a parameter. If you use the name of a variable, sizeof still only bases its result on the type of that variable. In the case of char *whatever, it's telling you the size of a pointer to char, not the size of the zero-terminated buffer it's pointing at. If you want the latter, you can use strlen instead. Note that strlen tells you the length of the content of the string, not including the terminating '\0'. As such, if (for example) you want to allocate space to duplicate a string, you need to add 1 to the result to get the total space occupied by the string.
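The "add 1" rule can be sketched as follows. copyWithTerminator is a made-up name, and a std::vector is used here only so that the copy cleans up after itself:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies a C string, terminator included, into
// storage that the caller owns.
std::vector<char> copyWithTerminator(const char* s)
{
    const std::size_t bytes = std::strlen(s) + 1; // +1 for the '\0'
    return std::vector<char>(s, s + bytes);
}
```

For "THIS IS A TEST" the copy holds 15 bytes, which matches what sizeof reports for the literal itself.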
Yes, as a rule in C++ you normally want to use std::string instead of pointers to char. In this case, you can use your_string.size() (or, equivalently, your_string.length()).
std::string is a C++ object, which cannot be passed to most APIs. Most APIs take char*, as you noticed, which is very different from a std::string. However, since this is a common need, std::string has a function for that: c_str.
std::string input;
const char* ptr = input.c_str(); //note, is const
In C++11, it is now also safe-ish to do this:
char* ptr = &input[0]; //nonconst
and you can alter the characters, but the size is fixed, and the pointer is invalidated if you call any mutating member of the std::string.
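As an illustration of the non-const route, a C-style function can write straight into a std::string that has been sized beforehand. This is a sketch only; formatNumber is a hypothetical name and the capacity of 32 is an assumption that comfortably fits any int:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: lets snprintf write directly into the string's
// own storage, then trims the string to the characters actually written.
std::string formatNumber(int value)
{
    std::string out(32, '\0'); // assumption: 32 chars is enough for an int
    const int written = std::snprintf(&out[0], out.size(), "%d", value);
    out.resize(written > 0 ? static_cast<std::size_t>(written) : 0);
    return out;
}
```

The pointer from `&out[0]` is not used again after the `resize` call, in line with the invalidation rule mentioned above.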
As for the code you posted, "THIS IS A TEST" has the type const char[15], which has a size of 15 bytes. The parameter input, however, has the type const char*, which has a size of 4 on your system (it may have other sizes on other systems).
To find the size of a C string pointed at by a char* pointer, you can call strlen(...) if it is null-terminated. It returns the number of characters before the first null character.
If the registry you speak of is the Windows registry, it may be an issue of Unicode vs. ASCII.
Modern Windows stores almost all strings as UTF-16, which uses two bytes per code unit.
If you put a UTF-16 string into a std::string, every other byte of plain ASCII text is 0 (null), which code that treats the data as a C string interprets as "end of string."
You may try using a std::wstring (wide string) or vector< wchar_t > (wide character type). These can store strings of two-byte characters.
sizeof() is also not giving you the value you may think it is giving you. Your system probably runs 32-bit Windows -- that "4" value is the size of the pointer to the first character of that string.
If this doesn't help, please post the specific results that occur when you use std::string or std::wstring (more than saying that it doesn't work).
To put it simply, the size of a const char * != the size of a const char[] (if they are equal, it's by coincidence). The former is a pointer. A pointer, in the case of your system, is 4 bytes REGARDLESS of the datatype. It could be int, char, float, whatever. This is because a pointer is always a memory address, and is numeric. Print out the value of your pointer and you'll see it's actually 4 bytes. const char[] now, is the array itself and will return the length of the array when requested.
I'm writing a small proof-of-concept console program with Visual Studio 2008 and I wanted it to output colored text for readability. For ease of coding I also wanted to make a quick printf-replacement, something where I could write like this:
MyPrintf(L"Some text \1[bright red]goes here\1[default]. %d", 21);
This will be useful because I also build and pass strings around in some places so my strings will be able to contain formatting info.
However I hit a wall against wsprintf because I can't find a function that would allow me to find out the required buffer size before passing it to the function. I could, of course, allocate 1MB just-to-be-sure, but that wouldn't be pretty and I'd rather leave that as a backup solution if I can't find a better way.
Also, alternatively I'm considering using std::wstring (I'm actually more of a C guy with little C++ experience so I find plain-old-char-arrays easier for now), but that doesn't have anything like wsprintf where you could build a string with values replaced in them.
So... what should I do?
Your question is tagged C++, in which case I'd say std::wstringstream is the way to go. Example:
#include <sstream>
void func()
{
    // ...
    std::wstringstream ss; // the string stream
    // like cout, you can add strings and numbers by operator<<
    ss << L"Some text \1[bright red]goes here\1[default]. " << 21;
    // function takes a C-style const wchar_t* string
    some_c_function(ss.str().c_str()); // convert to std::wstring then const wchar_t*
    // note: lifetime of the returned pointer probably temporary
    // you may need a permanent std::wstring to return the c_str() from
    // if you need it for longer.
    // ...
}
You want _snwprintf. That function takes a buffer size, and if the buffer isn't big enough, just double the size of the buffer and try again. To keep from having to do multiple _snwprintf calls each time, keep track of what the buffer size was that you ended up using last time, and always start there. You'll make a few excess calls here and there, and you'll waste a bit of ram now and then, but it works great, and can't over-run anything.
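The grow-and-retry idea can be sketched with the standard vswprintf, which also takes a buffer size and reports failure with a negative value; this is a swapped-in portable equivalent, not the _snwprintf call itself. The name formatWide, the starting size of 64 and the upper limit are all assumptions made for this example:

```cpp
#include <cstdarg>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: formats into a buffer and doubles the buffer
// whenever vswprintf reports that the result did not fit.
std::wstring formatWide(const wchar_t* fmt, ...)
{
    std::vector<wchar_t> buf(64); // assumption: a modest starting size
    for (;;) {
        va_list ap;
        va_start(ap, fmt);
        const int n = std::vswprintf(buf.data(), buf.size(), fmt, ap);
        va_end(ap);
        if (n >= 0) {
            return std::wstring(buf.data(), static_cast<std::size_t>(n));
        }
        // A negative result can also mean an encoding error, so give up
        // at an arbitrary limit instead of growing forever.
        if (buf.size() >= (1u << 20)) {
            return std::wstring();
        }
        buf.resize(buf.size() * 2);
    }
}
```

Remembering the last successful size between calls, as suggested above, would be a small addition on top of this.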
I'd go for a C++ stringstream. It's not as compact as sprintf but it will give you the functionality you want.
If you can afford to use Boost, you could consider boost::format. It would give you the flexibility of std::string and the formatting features of sprintf. It is fairly different from the C style, but is also fairly easy to use.
_scprintf, _scprintf_l, _scwprintf, _scwprintf_l
These functions return the number of characters in the formatted string.
Using std::wstring seems like a good solution if you plan on passing strings between your objects - it handles the size and has a nice c_str method that will give you the array of wide chars.
The additional benefit is that you can pass it by reference instead of by pointer.
When you need the actual string, just use the c_str method:
wprintf(L"string %ls received!", myWString.c_str());
This answer is an expansion of the answer from @mheyman that uses vswprintf().
I also struggled with the same problem. The Microsoft documentation is weak, but this page was helpful: https://en.cppreference.com/w/c/io/vfwprintf
CppRef Description: If bufsz is greater than zero, writes the results to a wide string buffer. At most bufsz-1 wide characters are written followed by null wide character. If bufsz is zero, nothing is written (and buffer may be a null pointer).
CppRef Return value: Number of wide characters written (not counting the terminating null wide character) if successful or negative value if an encoding error occurred or if the number of characters to be generated was equal or greater than size (including when size is zero).
Roughly:
Measure required buffer size by calling vswprintf() with buffer == NULL and bufsz == 0
Call malloc() (or friends) to allocate a buffer.
Again, call vswprintf() with allocated buffer and buffer size + 1
Use result
Call free() on allocated buffer
Your example uses wchar_t: MyPrintf(L"Some text \1[bright red]goes here\1[default]. %d", 21);, so I recommend something like this:
#include <stdarg.h> // va_list, va_start(), va_end()
#include <stdio.h>
#include <stdlib.h> // calloc(), free()
#include <wchar.h>  // vswprintf()
void MyPrintf(const wchar_t *lpFormatWCharArr,
              ...)
{
    // Ref: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/va-arg-va-copy-va-end-va-start?view=msvc-172
    va_list ap;
    va_start(ap, lpFormatWCharArr);
    // does not include trailing NUL char
    // note: per the return-value text quoted above, an implementation may
    // return a negative value when bufsz is zero; that takes the error path
    const int cch = vswprintf(NULL,             // wchar_t *buffer
                              0,                // size_t bufsz
                              lpFormatWCharArr, // const wchar_t *format
                              ap);              // va_list vlist
    va_end(ap);
    if (cch < 0)
    {
        // handle error
        return;
    }
    const size_t NUL_CHAR_LEN = 1;
    const size_t buf_len = cch + NUL_CHAR_LEN;
    // malloc() is faster, but does not memset() result to zero
    wchar_t *buf = (wchar_t *)calloc(buf_len,          // size_t number
                                     sizeof(wchar_t)); // size_t size
    if (NULL == buf)
    {
        // handle error
        return;
    }
    va_list ap2;
    va_start(ap2, lpFormatWCharArr);
    // does not include trailing NUL char
    const int cch2 = vswprintf(buf,              // wchar_t *buffer
                               buf_len,          // size_t bufsz
                               lpFormatWCharArr, // const wchar_t *format
                               ap2);             // va_list vlist
    va_end(ap2);
    if (cch2 < 0 || cch != cch2)
    {
        // handle error
        free(buf);
        return;
    }
    // use 'buf' and 'buf_len'
    free(buf);
}
There might be a (code) typo in this answer, but similar code was tested against 64-bit Win 10.
For C, we use char[] to represent strings.
For C++, I see examples using both std::string and char arrays.
#include <iostream>
#include <string>
using namespace std;
int main () {
    string name;
    cout << "What's your name? ";
    getline(cin, name);
    cout << "Hello " << name << ".\n";
    return 0;
}
#include <iostream>
using namespace std;
int main () {
    char name[256];
    cout << "What's your name? ";
    cin.getline(name, 256);
    cout << "Hello " << name << ".\n";
    return 0;
}
(Both examples adapted from http://www.cplusplus.com.)
What is the difference between these two types in C++? (In terms of performance, API integration, pros/cons, ...)
A char array is just that - an array of characters:
If allocated on the stack (like in your example), it will always occupy e.g. 256 bytes, no matter how long the text it contains is.
If allocated on the heap (using malloc() or new char[]) you're responsible for releasing the memory afterwards and you will always have the overhead of a heap allocation.
If you copy a text of more than 256 chars into the array, it might crash, produce ugly assertion messages or cause unexplainable (mis-)behavior somewhere else in your program.
To determine the text's length, the array has to be scanned, character by character, for a \0 character.
A string is a class that contains a char array, but automatically manages it for you. Most string implementations have a small built-in buffer (typically around 15 to 22 characters, depending on the library) so that short strings don't fragment the heap, and use the heap for longer strings.
You can access a string's char array like this:
std::string myString = "Hello World";
const char *myStringChars = myString.c_str();
C++ strings can contain embedded \0 characters, know their length without counting, are faster than heap-allocated char arrays for short texts and protect you from buffer overruns. Plus they're more readable and easier to use.
However, C++ strings are not (very) suitable for use across DLL boundaries, because this would require every user of such a DLL function to use the exact same compiler and C++ runtime implementation, or risk the string class behaving differently on each side.
Normally, a string class would also release its heap memory on the calling heap, so it will only be able to free memory again if you're using a shared (.dll or .so) version of the runtime.
In short: use C++ strings in all your internal functions and methods. If you ever write a .dll or .so, use C strings in your public (dll/so-exposed) functions.
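A boundary function in that style might look like the sketch below. Everything here is illustrative: getGreeting is an invented name, and the caller-supplied buffer convention is just one common way to keep std::string out of the exported signature:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical exported function: std::string is used internally, but only
// plain C types cross the boundary. The caller provides the buffer, so no
// memory allocated on one side is ever freed on the other.
extern "C" std::size_t getGreeting(char* out, std::size_t capacity)
{
    static const std::string greeting = "Hello World";
    if (out != nullptr && capacity > 0) {
        const std::size_t n =
            greeting.size() < capacity - 1 ? greeting.size() : capacity - 1;
        std::memcpy(out, greeting.data(), n);
        out[n] = '\0'; // always terminate, even when truncated
    }
    return greeting.size(); // full length, excluding the terminator
}
```

A caller can compare the return value with its buffer capacity to detect truncation and retry with a larger buffer.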
Arkaitz is correct that string is a managed type. What this means for you is that you never have to worry about how long the string is, nor do you have to worry about freeing or reallocating the memory of the string.
On the other hand, the char[] notation in the case above has restricted the character buffer to exactly 256 characters. If you tried to write more than 256 characters into that buffer, at best you will overwrite other memory that your program "owns". At worst, you will try to overwrite memory that you do not own, and your OS will kill your program on the spot.
Bottom line? Strings are a lot more programmer friendly, char[]s are a lot more efficient for the computer.
Well, string type is a completely managed class for character strings, while char[] is still what it was in C, a byte array representing a character string for you.
In terms of API and standard library everything is implemented in terms of strings and not char[], but there are still lots of functions from the libc that receive char[] so you may need to use it for those, apart from that I would always use std::string.
In terms of efficiency, a raw buffer of unmanaged memory will of course almost always be faster for lots of things, but take into account comparing strings, for example: std::string always has the size available to check first, while with char[] you need to compare character by character.
I personally do not see any reason why one would want to use char* or char[] except for compatibility with old code. std::string is no slower than using a C string, except that it will handle reallocation for you. You can set its size when you create it, and thus avoid reallocation if you want. Its indexing operator ([]) provides constant-time access (and is in every sense of the word the exact same thing as using a C-string indexer). Using the at method gives you bounds-checked safety as well, something you don't get with C strings unless you write it yourself. Your compiler will most often optimize out the indexer use in release mode. It is easy to make mistakes with C strings: delete vs. delete[], exception safety, even how to reallocate a C string.
And when you have to deal with advanced concepts like having COW strings, and non-COW for MT etc, you will need std::string.
If you are worried about copies, as long as you use references, and const references wherever you can, you will not have any overhead due to copies, and it's the same thing as you would be doing with the c-string.
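The points above about presizing, bounds-checked access and passing by const reference can be tried out with a short sketch. Both function names are invented for this example:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Builds a string of `count` copies of `c`, reserving the space up front
// so the loop does not trigger repeated reallocation.
std::string repeatChar(char c, std::size_t count)
{
    std::string out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(c);
    }
    return out;
}

// Returns true if at() rejects index i. The string is taken by const
// reference, so calling this with an existing std::string makes no copy.
bool indexRejected(const std::string& s, std::size_t i)
{
    try {
        (void)s.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Unlike at(), operator[] performs no such check, which is the same trade-off as indexing a raw char array.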
One difference is null termination (\0).
In C and C++, a function that takes a char* or char[] parameter receives a pointer to the first char and walks through memory until it reaches a 0 value (often called the null terminator).
C++ strings can contain embedded \0 characters, know their length without counting.
#include<stdio.h>
#include<string.h>
#include<iostream>
#include<string>
using namespace std;
void NullTerminatedString(string str){
    int NUll_term = 3;
    str[NUll_term] = '\0'; // std::string keeps the embedded '\0' and its full length
    cout << str << endl <<endl <<endl;
}
void NullTerminatedChar(char *str){
    int NUll_term = 3;
    str[NUll_term] = 0; // the C string now ends here; the rest is ignored
    cout << str << endl;
}
int main(){
    string str = "Feels Happy";
    printf("string = %s\n", str.c_str());
    // strlen, size() and sizeof all yield size_t, so print them with %zu
    printf("strlen = %zu\n", strlen(str.c_str()));
    printf("size = %zu\n", str.size());
    printf("sizeof = %zu\n", sizeof(str)); // sizeof std::string class and compiler dependent
    NullTerminatedString(str);
    char str1[12] = "Feels Happy";
    printf("char[] = %s\n", str1);
    printf("strlen = %zu\n", strlen(str1));
    printf("sizeof = %zu\n", sizeof(str1)); // sizeof char array
    NullTerminatedChar(str1);
    return 0;
}
Output:
strlen = 11
size = 11
sizeof = 32
Fee s Happy
strlen = 11
sizeof = 12
Fee
Think of (char *) as string.begin(). The essential difference is that (char *) is an iterator and std::string is a container. If you stick to basic strings a (char *) will give you what std::string::iterator does. You could use (char *) when you want the benefit of an iterator and also compatibility with C, but that's the exception and not the rule. As always, be careful of iterator invalidation. When people say (char *) isn't safe this is what they mean. It's as safe as any other C++ iterator.
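The iterator view can be illustrated with a standard algorithm that accepts either kind of range. countChar is a made-up wrapper; the only library call involved is std::count:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Counts occurrences of c in [first, last). Iter can be a const char*
// into a C string or a std::string::iterator; the algorithm does not care.
template <typename Iter>
std::size_t countChar(Iter first, Iter last, char c)
{
    return static_cast<std::size_t>(std::count(first, last, c));
}
```

With `const char* s = "Feels Happy";` the pointer range is `s` to `s + std::strlen(s)`, while for a std::string it is `begin()` to `end()`; both calls go through the same template.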
Strings have helper functions and manage char arrays automatically. You can concatenate strings directly, whereas for a char array you would need to copy it into a new, larger array. Strings can also change their length at runtime. A char array is harder to manage than a string, and certain functions may only accept a string as input, requiring you to convert the array to a string. It's better to use strings; they were made so that you don't have to use arrays. If arrays were objectively better, we wouldn't have strings.