char* convert()
{
std::string data = "stack\0over\0flow";
return data.c_str();
}
This returns a pointer, and when the caller builds a string from it, that string contains only "stack" instead of the complete string. Is there a workaround for this without changing the input and return types in C++11?
I want to rebuild the entire string on the caller side from char*.
Convert std::string to char* when string has nulls in middle
In order to convert a std::string that has nulls in the middle to a char*, you must first have a std::string that has nulls in the middle. You don't have such string.
Because you used the constructor std::string(const char*), the string that you created treated the passed pointer as a pointer to the first element of a null-terminated string, and as such the std::string only contains "stack".
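You can use the constructor that takes a pointer and a length instead (std::size is C++17, so in C++11 use sizeof on the array):
const auto& str = "stack\0over\0flow";
std::string data(str, sizeof(str) - 1); // sizeof(str) is 16: 15 chars plus the final '\0'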
This will return stack instead of complete string
If the string were to actually contain "stack\0over\0flow", then c_str will return a pointer to the first element of the complete string "stack\0over\0flow".
If you treat the pointer as a pointer to null terminated string, then the first null terminator character terminates the null terminated string. There is no way to avoid that if you treat the pointer as a pointer to null terminated string. So, if you wish to avoid the string being terminated by the first null terminator character, then don't treat it as a pointer to a null terminated string (such as when you used the string literal as a pointer to null terminated string in your example).
However, that's mostly a moot issue since the pointed string will have been deallocated and the returned pointer will be dangling when the function returns. Attempting to access through the dangling pointer will result in undefined behaviour.
Furthermore, c_str always returns a const char* and never char*.
To be able to use the full string with the \0's safely, you must put the data in a buffer, for example like this:
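#include <iostream>
#include <array>
#include <cstddef>
// Copies the whole char array, embedded '\0's included, into a std::array.
// The trailing return type keeps this valid C++11 (plain auto return deduction is C++14).
template<std::size_t N>
auto make_buffer(const char(&chars)[N]) -> std::array<char, N>
{
// note: returning an object by value uses RVO, so nothing is left dangling.
std::array<char, N> buffer{};
for (std::size_t n = 0; n < N; ++n) buffer[n] = chars[n];
return buffer;
}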
int main()
{
auto buffer = make_buffer("stack\0over\0flow");
// this is the closest you can get to having a char*
// pointing to ALL the data.
char* data_ptr = buffer.data();
// but you still must rely on the buffer size
// for correct looping over the valid values!
for (std::size_t n = 0; n < buffer.size(); ++n)
{
std::cout << data_ptr[n];
}
std::cout << std::endl;
// But with a buffer like this a range based for loop is recommended
for (const auto c : buffer)
{
std::cout << c;
}
std::cout << std::endl;
return 0;
}
Related
What I am trying to do is to save multiple different pointers to unique wchar_t strings into a vector. My current code is this:
std::vector<wchar_t*> vectorOfStrings;
wchar_t* bufferForStrings;
for (i = 0; i < some_source.length; i++) {
// copy some string to the buffer...
vectorOfStrings.push_back(bufferForStrings);
}
This results in bufferForStrings being added to the vector again and again, which is not what I want.
RESULT:
[0]: (pointer to buffer)
[1]: (pointer to buffer)
...
What I want is this:
[0]: (pointer to unique string)
[1]: (pointer to other unique string)
...
From what I know about this type of string, the pointer points to the beginning of an array of characters which ends in a null terminator.
So, the current code effectively results in the same string being copied to the buffer again and again. How do I fix this?
The simplest way is to use std::wstring, provided by the standard library, as the type for your vector's elements. Its constructor from a wchar_t pointer is called implicitly in the push_back() call and copies the contents of your wchar_t*-pointed buffer into the new vector element.
Here's a short demo:
#include <string>
#include <vector>
#include <iostream>
int main()
{
wchar_t test[][8] = { L"first", L"second", L"third", L"fourth" };
std::vector<std::wstring> vectorOfStrings;
wchar_t* bufferForStrings;
size_t i, length = 4;
for (i = 0; i < length; i++) {
// copy some string to the buffer...
bufferForStrings = test[i];
vectorOfStrings.push_back(bufferForStrings);
}
for (const auto& s : vectorOfStrings) {
std::wcout << s << std::endl;
}
return 0;
}
Further, if you later need access to the vector's elements as wchar_t* pointers, you can use each element's c_str() member function to retrieve such a pointer (though that will be const qualified).
There are other methods if you want to avoid using the std::wstring class. For 'ordinary' char* buffers, you could use the POSIX strdup() function to create a copy of the current buffer and pass that to push_back(). Unfortunately, the equivalent wcsdup() function is not part of the standard C++ library (though Microsoft and others have implemented it). Either way, each copy must later be released with free().
When you pass char arrays as arguments and try to find the length of the array, it returns the length without the null terminator?
For example, if I passed charArray[4] = "aaa" and found the length of this using strlen, the returned value would be 3. Why is this so?
More detailed example below:
#include <iostream>
#include <cstring> // for strlen
using namespace std;
int main() {
void function(char[]);
char charArray[4] = "aaa";
function(charArray);
cin.get();
return 0;
}
void function (char *array)
{
size_t index = 0;
index = strlen(array);
std::cout << index; // prints value: 3
}
You are confusing the size of a char array with the length of the C-style string (such as one initialized from a string literal) stored in it.
strlen() operates on NUL terminated character arrays and doesn't count the terminating \0 character by definition:
Returns the length of the given null-terminated byte string, that is, the number of characters in a character array whose first element is pointed to by str up to and not including the first null character.
The behavior is undefined if str is not a pointer to a null-terminated byte string.
To get the size of an array use sizeof() like so:
char arr[4] = "abc";
cout << sizeof(arr) << endl;
Note that this no longer gives the size of the array as soon as the array decays to a pointer, for example when it is passed to a function:
char arr[4] = "abc";
void func(char* arr)
{
cout << sizeof(arr) << endl; // Prints the size of the pointer variable itself
}
Such functions need to get the array size from an extra parameter:
void func(char* arr, size_t arrsize)
{
// ...
}
If you think of the null as more of an implementation detail, that might help.
If you know std::string at all, you want length() to return the actual length. Every other major language has a string of some sort, and they all have a function to get the length.
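Also, you really want strlen(a + b) == strlen(a) + strlen(b), where a + b means the concatenation of a and b (for example, the result of strcat); otherwise a lot of operations get more complicated. The concatenated string has only one null terminator, so you only want to count a null once.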
strlen does not count the null character because the null character determines where the C string ends in memory (i.e. it is not part of the string; it tells the runtime where the string ends).
See:
https://en.wikipedia.org/wiki/String_%28computer_science%29#Null-terminated
https://en.wikipedia.org/wiki/Null-terminated_string
I'm just starting C++ and am having difficulty understanding const char*. I'm trying to convert the input in the method to a string, change the string to add hyphens where I want, and ultimately convert it back to char* to return. So far when I try this it gives me a bus error 10.
char* getHyphen(const char* input){
string vowels [12] = {"A","E","I","O","U","Y","a","e","i","o","u","y"};
//convert char* to string
string a;
int i = 0;
while(input != '\0'){
a += input[i];
input++;
i++;
}
//convert a string to char*
return NULL;
}
A: The std::string class has a constructor that takes a char const*, so you simply create an instance to do your conversion.
B: Instances of std::string have a c_str() member function that returns a char const* that you can use to convert back to char const*.
auto my_cstr = "Hello"; // A
std::string s(my_cstr); // A
// ... modify 's' ...
auto back_to_cstr = s.c_str(); // B
First of all, you don't need all of that code to construct a std::string from the input. You can just use:
string a(input);
As far as returning a new char*, you can use:
return strdup(a.c_str()); // strdup is a non-standard function but it
// can be easily implemented if necessary.
Make sure the caller deallocates the returned value with free(), since strdup allocates it with malloc().
It will be better to just return a std::string so the users of your function don't have to worry about memory allocation/deallocation.
std::string getHyphen(const char* input){
Don't use char*. Use std::string, as everyone else here is telling you. This will eliminate all such problems.
However, for the sake of completeness and because you want to understand the background, let's analyse what is going on.
while(input != '\0'){
You probably mean:
while(*input != '\0') {
Your code compares the input pointer itself to \0, i.e. it checks for a null-pointer, which is due to the unfortunate automatic conversion from a \0 char. If you tried to compare with, say, 'x' or 'a', then you would get a compilation error instead of runtime crashes.
You want to dereference the pointer via *input to get to the char pointed to.
a += input[i];
input++;
i++;
This will also not work. You increment the input pointer, yet with [i] you advance even further. For example, if input has been incremented three times, then input[3] will be the 7th character of the original array passed into the function, not the 4th one. This eventually results in undefined behaviour when you leave the bounds of the array. Undefined behaviour can also be the "bus error 10" you mention.
Replace with:
a += *input;
input++;
i++;
(Actually, now that i is not used any longer, you can remove it altogether.)
And let me repeat it once again: Do not use char*. Use std::string.
Change your function declaration from
char* getHyphen(const char* input)
to
auto hyphenated( string const& input )
-> string
and avoid all the problems of conversion to char const* and back.
That said, you can construct a std::string from a char const* as follows:
string( "Blah" )
and you get back a temporary char const* by using the c_str method.
Do note that the result of c_str is only valid as long as the original string instance exists and is not modified. For example, applying c_str to a local string and returning that result yields Undefined Behavior and is not a good idea. If you absolutely must return a char* or char const*, allocate an array with new and copy the string data over with strcpy, like this: return strcpy( new char[s.length()+1], s.c_str() ), where the +1 is to accommodate a terminating zero-byte. The caller must then release the array with delete[].
In a custom string class called Str I have a function c_str() that just returns the private member char* data as const char* c_str() const { return data; }. This works when called after I create a new Str but if I then overwrite the Str using cin, calling c_str() on it only sometimes works, but always works if I cin a bigger Str than the original.
Str b("this is b");
cout << b.c_str() << endl;
cin >> b;
cout << b.c_str() << endl;
Here the first b.c_str() works but if I attempt to change Str b to just 'b' on the cin >> b; line then it outputs 'b' + a bit of garbage. But if I try to change it to 'bb' it usually works, and if I change it to something longer than "this is b", it always works.
This is odd because my istream operator (which is a friend) completely deallocates the Str's data and then allocates a new char array only 1 char larger for each char it reads in (just to see if that would help; it doesn't). So it seems like calling c_str() after reading in something else should return the new array that data is set to.
Relevant functions:
istream& operator>>(istream& is, Str& s) {
delete[] s.data;
s.data = nullptr;
s.length = s.limit = 0;
char c;
while (is.get(c) && isspace(c)) ;
if (is) {
do s.push_back(c);
while (is.get(c) && !isspace(c));
if (is)
is.unget();
}
return is;
}
void Str::push_back(char c) {
if (length == limit) {
++limit;
char* newData = new char[limit];
for (size_type i = 0; i != length; ++i)
newData[i] = data[i];
delete[] data;
data = newData;
}
data[length++] = c;
}
With push_back() like this, the array never has a capacity larger than what it holds, so I don't see how my c_str() could output any memory garbage.
Based on the push_back() in the question and the c_str() in the comment, there is no guarantee that the C-string returned from c_str() is null-terminated. Since a char const* doesn't know the length of the string without the null terminator, this is the source of the problem!
When allocating small blocks, you probably get back one of the small blocks previously used by your string class, which still contains non-null characters, so the printed string appears to extend up to the first null byte found in memory. When allocating bigger chunks, you seem to get back "fresh" memory that still contains null characters, making the situation appear as if all is OK.
There are basically two ways to fix this problem:
Add a null terminator before returning a char const* from c_str(). If you don't care about multi-threading for now, this can be done in the c_str() function. In contexts where multi-threading matters, it is probably a bad idea to make any mutations in const member functions, as these would introduce data races. Thus, the C++ standard string classes add the null terminator in one of the mutating operations.
Do not support a c_str() function at all but rather implement an output operator for your string class. This way, no null-termination is needed.
I have a situation in which I'm performing binary serialization of some items and I'm writing them to an opaque byte buffer:
int SerializeToBuffer(unsigned char* buffer)
{
stringstream ss;
vector<Serializable> items = GetSerializables();
string serializedItem("");
short len = 0;
for(int i = 0; i < items.size(); ++i)
{
serializedItem = items[i].Serialize();
len = serializedItem.length();
// Write the bytes to the stream
ss.write(*(char*)&(len), 2);
ss.write(serializedItem.c_str(), len);
}
buffer = reinterpret_cast<unsigned char*>(
const_cast<char*>(ss.str().c_str()));
return items.size();
}
Is it safe to remove the const-ness from the ss.str().c_str() and then reinterpret_cast the result as unsigned char* then assign it to the buffer?
Note: the code is just to give you an idea of what I'm doing, it doesn't necessarily compile.
No. Removing the const-ness of an inherently constant string and then modifying it results in Undefined Behavior.
const char* c_str ( ) const;
Get C string equivalent
Generates a null-terminated sequence of characters (c-string) with the same content as the string object and returns it as a pointer to an array of characters.
A terminating null character is automatically appended.
The returned array points to an internal location with the required storage space for this sequence of characters plus its terminating null-character, but the values in this array should not be modified in the program and are only guaranteed to remain unchanged until the next call to a non-constant member function of the string object.
Short answer: No
Long Answer: No. You really can't do that. The internal buffers of those objects belong to the objects. Taking a reference to an internal structure is definitely a no-no and breaks encapsulation. Anyway, those objects (with their internal buffers) will be destroyed at the end of the function, and your buffer variable will point at memory that has already been released.
Use of const_cast<> is usually a sign that something in your design is wrong.
Use of reinterpret_cast<> usually means you are doing it wrong (or you are doing some very low level stuff).
You probably want to write something like this:
std::ostream& operator<<(std::ostream& stream, Data const& serializable)
{
// Text output:
//   return stream << serializable.internalData;
// Or, if you want to write binary data to the file:
stream.write(reinterpret_cast<char const*>(&serializable.internalData), sizeof(serializable.internalData));
return stream;
}
This is unsafe, partially because you're stripping off const, but more importantly because you're returning a pointer to an array that will be reclaimed when the function returns.
When you write
ss.str().c_str()
The return value of c_str() is only valid as long as the string object you invoked it on still exists. The signature of stringstream::str() is
string stringstream::str() const;
Which means that it returns a temporary string object. Consequently, as soon as the line
ss.str().c_str()
finishes executing, the temporary string object is reclaimed. This means that the outstanding pointer you received via c_str() is no longer valid, and any use of it leads to undefined behavior.
To fix this, if you really must return an unsigned char*, you'll need to manually copy the C-style string into its own buffer:
/* Get a copy of the string that won't be automatically destroyed at the end of a statement. */
string value = ss.str();
/* Extract the C-style string. */
const char* cStr = value.c_str();
/* Allocate a buffer and copy the contents of cStr into it. */
unsigned char* result = new unsigned char[value.length() + 1];
copy(cStr, cStr + value.length() + 1, result);
/* Hand back the result. */
return result;
Additionally, as @Als has pointed out, stripping off const is a Bad Idea if you're planning on modifying the contents. If you aren't modifying the contents, it should be fine, but then you ought to be returning a const unsigned char* instead of an unsigned char*.
Hope this helps!
Since it appears that your primary consumer of this function is a C# application, making the signature more C#-friendly is a good start. Here's what I'd do if I were really crunched for time and didn't have time to do things "The Right Way" ;-]
using System::Runtime::InteropServices::OutAttribute;
void SerializeToBuffer([Out] array<unsigned char>^% buffer)
{
using System::Runtime::InteropServices::Marshal;
vector<Serializable> const& items = GetSerializables();
// or, if Serializable::Serialize() is non-const (which it shouldn't be)
//vector<Serializable> items = GetSerializables();
ostringstream ss(ios_base::binary);
for (size_t i = 0u; i != items.size(); ++i)
{
string const& serializedItem = items[i].Serialize();
unsigned short const len =
static_cast<unsigned short>(serializedItem.size());
ss.write(reinterpret_cast<char const*>(&len), sizeof(unsigned short));
ss.write(serializedItem.data(), len);
}
string const& s = ss.str();
buffer = gcnew array<unsigned char>(static_cast<int>(s.size()));
Marshal::Copy(
IntPtr(const_cast<char*>(s.data())),
buffer,
0,
buffer->Length
);
}
To C# code, this will have the signature:
void SerializeToBuffer(out byte[] buffer);
Here is the underlying problem:
buffer = ... ;
return items.size();
In the second-to-last line you're assigning a new value to the local variable that (up until that point) held the pointer your function was given as an argument. Then, immediately after, you return from the function, forgetting everything about the variable you just assigned to. That does not make sense!
What you probably want to do is to copy data from the memory pointed to by ss.str().c_str() to the memory pointed to by the pointer stored in buffer (which must be large enough to hold it). Something like
memcpy(buffer, ss.str().c_str(), <an appropriate length here>)