I have code similar to the following:
void myLibFunc(std::vector<char> &data)
{
    // do something with data
}
void myFunc(char *buffer, int bufferSize)
{
    std::vector<char> mydata(buffer, buffer + bufferSize);
    myLibFunc(mydata);
}
The code works, but the vector allocates its own memory instead of using the memory that is already available.
How can I change the code so that the vector uses the existing memory and does not allocate extra memory?
Note that I cannot change the function signatures.
Update
I have two functions:
In one of them, I receive a buffer, and I need to manipulate the memory and pass it to the next function as a vector. The function I am implementing is part of an interface, so I cannot change its signature. The other function belongs to a library that I need to call, so I cannot change its signature either.
The problem is that the above code allocates new memory and copies the data from the buffer into it, which is not optimal.
std::vector is designed to own its data exclusively, so copying the memory is the only safe way for std::vector to work. That leaves only unsafe hacks. If we can assume the function does not change the vector's size, you can abuse std::vector. With my compiler (tested on g++ 4.8 and cpp.sh), std::vector is implemented as three pointers (to the beginning of the data, the end of the used data, and the end of the allocation), so I can abuse the vector as follows:
#include <vector>
#include <iostream>
void myLibFunc( std::vector< char > & a )
{
for( char c : a )
{
std::cout << '[' << c << ']';
}
a[0] = 'B';
std::cout << '\n';
}
void myFunc(char *buffer,int bufferSize)
{
std::vector<char > mydata;
// cast to alterable pointers, cast should also keep
// mydata in scope until after the last assignment.
char ** abuser = (char**)&mydata;
// save old values and substitute new values
char *tmp0 = abuser[0];
abuser[0] = buffer;
char *tmp1 = abuser[1];
abuser[1] = buffer+bufferSize;
char *tmp2 = abuser[2];
abuser[2] = buffer+bufferSize;
myLibFunc(mydata);
// return old values to avoid crash when mydata goes out of scope.
abuser[0] = tmp0;
abuser[1] = tmp1;
abuser[2] = tmp2;
}
int main()
{
char p[] = "Hello World";
myFunc( &p[0] + 2, 5 );
std::cout << p << '\n';
return 0;
}
Note this abuse is likely to be non-portable and lead to unexplained crashes.
If you can not change the signature of your function it is not possible without the copy.
But a better approach is to rethink your interface. If you build myLibFunc on random access iterators, your problem is solved:
template <class CharRandomAccessIterator>
void myLibFunc(CharRandomAccessIterator begin, CharRandomAccessIterator end)
{
    // do something with data
    auto size = end - begin; // number of elements
    begin[xyz]; // access elements
}
void myFunc(char *buffer, int bufferSize)
{
    std::vector<char> mydata(buffer, buffer + bufferSize);
    myLibFunc(mydata.begin(), mydata.end()); // This will work
    myLibFunc(buffer, buffer + bufferSize); // This will work too
}
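To make the iterator idea concrete, here is a minimal compilable sketch. The names processRange, processBuffer and processVector are hypothetical stand-ins (they are not part of the original code), and the upper-casing is only there so that the effect is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library routine: upper-cases ASCII letters
// in [begin, end) and returns the number of elements it visited.
template <class RandomIt>
std::size_t processRange(RandomIt begin, RandomIt end)
{
    assert(begin <= end); // random access iterators can be compared
    auto count = end - begin;
    for (decltype(count) i = 0; i < count; ++i) {
        if (begin[i] >= 'a' && begin[i] <= 'z')
            begin[i] = static_cast<char>(begin[i] - 'a' + 'A');
    }
    return static_cast<std::size_t>(count);
}

// Works directly on the caller's memory: no vector, no copy.
std::size_t processBuffer(char *buffer, std::size_t bufferSize)
{
    return processRange(buffer, buffer + bufferSize);
}

// The same template also accepts vector iterators.
std::size_t processVector(std::vector<char> &data)
{
    return processRange(data.begin(), data.end());
}
```

Because a raw pointer is itself a random access iterator, the buffer can be handed over as is. This only helps when the library function can be changed or wrapped, which the question rules out.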
Related
I have a class foo that manages data using small buffer optimization (SBO).
When size < 16, the data is held locally (in buffer), otherwise it is stored on the heap, with reserved holding the allocated space.
class foo {
static const int sbo_size = 16;
long size = 0;
char *ptr;
union {
char buffer[sbo_size];
long reserved;
};
public:
foo()
{
for (int i = 0; i < sbo_size; ++i)
buffer[i] = 0;
}
void clone(const foo &f)
{
// release 'ptr' if necessary
if (f.size < sbo_size)
{
memcpy(this, &f, sizeof(foo));
ptr = buffer;
} else
{
// handle non-sbo case
}
}
};
Question about clone():
In the SBO case, it may not be clear to the compiler that union::buffer will be used.
Is it correct to use memcpy and set ptr accordingly?
If you can use C++17, I would side-step any potential type-punning problems by using std::variant in place of a union.
Although this uses a small amount of storage internally to keep track of the type it currently contains, it's probably a win overall, as your ptr variable can disappear (although that should be inside your union anyway).
It's also type-safe, which a union is not (std::get will throw if the variant doesn't contain the requested type), and it keeps track of the type of data it contains simply through assignment.
The resulting class fragment might look something like this (no doubt this code can be improved):
class foo
{
private:
static const size_t sbo_size = 16;
using small_buf = std::array <char, sbo_size>;
size_t size = 0;
std::variant <small_buf, char *> buf = { };
public:
void clone (const foo &f)
{
char **bufptr = std::get_if <char *> (&buf);
if (bufptr)
delete [] *bufptr;
size = f.size;
if (size < sbo_size)
buf = std::get <small_buf> (f.buf);
else
{
buf = new char [size];
std::memcpy (std::get <char *> (buf), std::get <char *> (f.buf), size);
}
}
};
Notes:
You will see that I've used std::array instead of a C-style array, because std::array has lots of nice features that C-style arrays do not.
Why clone and not a copy constructor?
If you want foo to have an empty state (after being default constructed, say), then you can look into the strangely named std::monostate.
For raw storage, std::byte is probably to be preferred over char.
Fully worked example here.
Edit: To answer the question as posed, I am no language lawyer but it seems to me that, inside clone, the compiler has no clue what the active member of f might be as it has, in effect, been parachuted in from outer space.
In such circumstances, I would expect compiler writers to play it safe and set the active member of the union to "don't know" until some concrete information comes along. But (and it's a big but), I wouldn't like to bet my shirt on that. It's a complex job and compiler writers do make mistakes.
So, in a spirit of sharing, here's a slightly modified version of your original code which fixes that. I've also moved ptr inside your union since it clearly belongs there:
class foo {
static const int sbo_size = 16;
long size = 0;
union {
std::array <char, sbo_size> buffer; // changing this
char *ptr;
long reserved;
};
public:
foo()
{
for (int i = 0; i < sbo_size; ++i)
buffer[i] = 0;
}
void clone(const foo &f)
{
// release 'ptr' if necessary
if (f.size < sbo_size)
{
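buffer = f.buffer; // lets me do this
// Note: do not assign ptr here. ptr now shares storage with buffer,
// so writing it would overwrite the bytes that were just copied.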
} else
{
// handle non-sbo case
}
}
};
So you can see, by using std::array for buffer (rather than one of those hacky C-style arrays), you can directly assign to it (rather than having to resort to memcpy) and the compiler will then make that the active member of your union and you should be safe.
In conclusion, the question is actually rather meaningless since one shouldn't (ever) need to write code like that. But no doubt someone will immediately come up with something that proves me wrong.
I have a class called Container that holds a std::unique_ptr<char[]> named _ptr.
This _ptr must stay alive for the entire run of the program; I cannot lose it. It is destroyed only when the Container object is finally destroyed.
At some point in the program I need to append a character to the _ptr array. For this purpose I use the following function:
void resizeUniquePtrArray(std::unique_ptr<char[]> &ptr) {
std::unique_ptr<char[]> newptr(new char[strlen(ptr.get())+2]);
memcpy(newptr.get(), ptr.get(),strlen(ptr.get()));
newptr[strlen(newptr.get())]= 'X';
newptr[strlen(newptr.get()) + 1]= '\0';
ptr = std::move(newptr);
}
Constraints
I'm pretty sure I am doing something wrong. In my project I sometimes lose the content of _ptr and sometimes I don't. The funny part is that valgrind gave no warnings.
I can't use a std::string instead of std::unique_ptr because I would later have to use const_cast on it, which feels like abusing the string that I created.
I tried using a vector and passing it as a char * via .data(), but valgrind went nuts and gave me all sorts of read and write errors.
The problem is the C library with its char* rawPointer. It's really hard to work around. I can't just use new char, because I would have to carry it with me through the whole program.
Question
Is the resizing of the std::unique_ptr<char[]> array _ptr done correctly?
Source Code
#include <iostream>
#include <string>
#include <memory>
#include <string.h>
class Container {
public:
Container(const std::string &data):_data(data),_ptr(new char[data.size() + 1]) {}
std::unique_ptr<char[]> & initPtr(){
strcpy(_ptr.get(),_data.c_str());
_ptr[strlen(_ptr.get()) + 1] = '\0';
return _ptr;
}
private:
std::unique_ptr<char[]> _ptr;
std::string _data;
};
void resizeUniquePtrArray(std::unique_ptr<char[]> &ptr) {
std::unique_ptr<char[]> newptr(new char[strlen(ptr.get())+2]);
memcpy(newptr.get(), ptr.get(),strlen(ptr.get()));
newptr[strlen(newptr.get())]= 'X';
newptr[strlen(newptr.get()) + 1]= '\0';
ptr = std::move(newptr);
}
int main()
{
std::string name = "hello";
Container c(name);
std::unique_ptr<char[]> &ptr = c.initPtr();
resizeUniquePtrArray(ptr);
char* rawPointer = ptr.get();
std::cout<< rawPointer << std::endl;
//API LIBRARY CALL TO rawPointer it's a looonng process
return 0;
}
I can't use a std::string instead of std::unique_ptr because I would later have to use const_cast on it, which feels like abusing the string that I created.
No const_cast is necessary to get a plain pointer to the underlying C-string:
std::string s("abc");
char* p = &s[0];
Or, in C++17:
char* q = s.data();
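As a rough sketch of how that could replace the manual resizing: the function c_api_touch below is a hypothetical stand-in for the real C library call (it is not from the question), and appendChar and demo are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C library call that needs a writable char*.
// It upper-cases the first character in place and returns the string length.
std::size_t c_api_touch(char *text)
{
    assert(text != nullptr);
    if (text[0] >= 'a' && text[0] <= 'z')
        text[0] = static_cast<char>(text[0] - 'a' + 'A');
    return std::strlen(text);
}

// Appending is a single call: std::string reallocates if needed and keeps
// the terminating '\0' in place by itself.
void appendChar(std::string &s, char c)
{
    s.push_back(c);
}

std::string demo()
{
    std::string data = "hello";
    appendChar(data, 'X');
    char *raw = &data[0]; // writable and null-terminated since C++11
    c_api_touch(raw);     // the C side modifies the string's own storage
    return data;          // "HelloX"
}
```

The raw pointer stays valid only as long as the string is not resized or destroyed, so it should be fetched again after every modification of the string.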
It might not be advisable according to what I have read in a couple of places (and that's probably the reason std::string doesn't do it already), but in a controlled environment and with careful usage, I think it might be OK to write a string class which can be implicitly converted to a proper writable char buffer when needed by third-party library methods (which take only char* as an argument), and which still behaves like a modern string with methods like Find(), Split(), SubString() etc. While I can try to implement the usual other string manipulation methods later, I first wanted to ask about an efficient and safe way to do this main task.
Currently, we have to allocate a char array of roughly the maximum size of the char* output that is expected from the third-party method, pass it there, then convert the returned char* to a std::string to be able to use the convenient methods it offers, and then pass its (const char*) result to another method using string.c_str(). This is both lengthy and makes the code look a little messy.
Here is my very initial implementation so far:
MyString.h
#pragma once
#include<string>
using namespace std;
class MyString
{
private:
bool mBufferInitialized;
size_t mAllocSize;
string mString;
char *mBuffer;
public:
MyString(size_t size);
MyString(const char* cstr);
MyString();
~MyString();
operator char*() { return GetBuffer(); }
operator const char*() { return GetAsConstChar(); }
const char* GetAsConstChar() { InvalidateBuffer(); return mString.c_str(); }
private:
char* GetBuffer();
void InvalidateBuffer();
};
MyString.cpp
#include "MyString.h"
MyString::MyString(size_t size)
:mAllocSize(size)
,mBufferInitialized(false)
,mBuffer(nullptr)
{
mString.reserve(size);
}
MyString::MyString(const char * cstr)
:MyString()
{
mString.assign(cstr);
}
MyString::MyString()
:MyString((size_t)1024)
{
}
MyString::~MyString()
{
if (mBufferInitialized)
delete[] mBuffer;
}
char * MyString::GetBuffer()
{
if (!mBufferInitialized)
{
mBuffer = new char[mAllocSize]{ '\0' };
mBufferInitialized = true;
}
if (mString.length() > 0)
memcpy(mBuffer, mString.c_str(), mString.length());
return mBuffer;
}
void MyString::InvalidateBuffer()
{
if (mBufferInitialized && mBuffer && strlen(mBuffer) > 0)
{
mString.assign(mBuffer);
mBuffer[0] = '\0';
}
}
Sample usage (main.cpp)
#include "MyString.h"
#include <iostream>
void testSetChars(char * name)
{
if (!name)
return;
//This length is not known to us, but the maximum
//return length is known for each function.
char str[] = "random random name";
strcpy_s(name, strlen(str) + 1, str);
}
int main(int, char**)
{
MyString cs("test initializer");
cout << cs.GetAsConstChar() << '\n';
testSetChars(cs);
cout << cs.GetAsConstChar() << '\n';
getchar();
return 0;
}
Now, I plan to call InvalidateBuffer() in almost all the methods before doing anything else. Some of my questions are:
Is there a better way to do it in terms of memory/performance and/or safety, especially in C++ 11 (apart from the usual move constructor/assignment operators which I plan to add to it soon)?
I had initially implemented the 'buffer' using a std::vector of chars, which was easier to implement and more C++-like, but I was concerned about performance. In that version the GetBuffer() method would just return a pointer to the beginning of the resized vector. Do you think there are any major pros/cons of using a vector instead of char* here?
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
Is it too much overkill compared with the usual way of passing a char array pointer, converting the result to a std::string, and doing our work with that? The third-party function calls expecting char* arguments are used heavily in the code, and I plan to completely replace both char* and std::string with this new string if it works.
Thank you for your patience and help!
If I understood you correctly, you want this to work:
mystring foo;
c_function(foo);
// use the filled foo
with a c_function like ...
void c_function(char * dest) {
strcpy(dest, "FOOOOO");
}
Instead, I propose this (ideone example):
template<std::size_t max>
struct string_filler {
char data[max+1];
std::string & destination;
string_filler(std::string & d) : destination(d) {
data[0] = '\0'; // paranoia
}
~string_filler() {
destination = data;
}
operator char *() {
return data;
}
};
and using it like:
std::string foo;
c_function(string_filler<80>{foo});
This way you provide a "normal" buffer to the C function with a maximum that you specify (which you should know either way ... otherwise calling the function would be unsafe). On destruction of the temporary (which, according to the standard, must happen after that expression with the function call) the string is copied (using std::string assignment operator) into a buffer managed by the std::string.
Addressing your questions:
Do you think there are any major pros/cons of using a vector instead of char* here?
Yes: using a vector frees you from manual memory management. This is a huge pro.
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
A union is a bad idea. How do you know which member is currently active? You need a flag outside of the union. Do you really want every string to carry that around? Instead, look at what the standard library does: it uses templates to provide this abstraction.
Is it too much overkill [..]
Writing a string class? Yes, way too much.
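To illustrate the remark about templates, here is one way the string_filler idea could be parameterised on the character type instead of using a union. This is a sketch only: basic_string_filler, fill_narrow and fill_wide are hypothetical names, and the two fill functions merely stand in for third-party C functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same idea as string_filler, but templated on the character type, in the
// way std::basic_string itself is.
template <class CharT, std::size_t Max>
struct basic_string_filler {
    CharT data[Max + 1];
    std::basic_string<CharT> &destination;

    explicit basic_string_filler(std::basic_string<CharT> &d) : destination(d)
    {
        data[0] = CharT(); // start out as an empty string
    }
    ~basic_string_filler() { destination = data; } // copy back on destruction

    operator CharT *() { return data; }
};

// Hypothetical C-style producers standing in for the third-party functions.
void fill_narrow(char *dest)
{
    assert(dest != nullptr);
    dest[0] = 'h'; dest[1] = 'i'; dest[2] = '\0';
}
void fill_wide(wchar_t *dest)
{
    assert(dest != nullptr);
    dest[0] = L'h'; dest[1] = L'i'; dest[2] = L'\0';
}

std::string demoNarrow()
{
    std::string s;
    fill_narrow(basic_string_filler<char, 16>{s}); // temporary dies at the ';'
    return s;
}

std::wstring demoWide()
{
    std::wstring s;
    fill_wide(basic_string_filler<wchar_t, 16>{s});
    return s;
}
```

Each instantiation carries exactly one character type, so no runtime flag is needed to tell narrow and wide strings apart.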
What you want to do already exists. For example with this plain old C function:
/**
* Write n characters into buffer.
* n can't be more than size
* Return number of written characters
*/
ssize_t fillString(char * buffer, ssize_t size);
Since C++11:
std::string str;
// Resize string to be sure to have memory
str.resize(80);
auto newSize = fillString(&str[0], str.size());
str.resize(newSize);
or, when reusing a string that was sized elsewhere:
std::string str; // assume it has been given a size before this point
if (!str.empty()) // To avoid UB
{
auto newSize = fillString(&str[0], str.size());
str.resize(newSize);
}
But before C++11, std::string isn't guaranteed to be stored in a single chunk of contiguous memory, so you have to go through a std::vector<char> first:
std::vector<char> v;
// Resize the vector to be sure to have memory
v.resize(80);
ssize_t newSize = fillString(&v[0], v.size());
std::string str(v.begin(), v.begin() + newSize);
You can use it easily with something like Daniel's proposition
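The resize, fill, shrink pattern can be tried out with a self-contained sketch like the following. The fillString defined here is a hypothetical stand-in with an assumed behaviour (it uses std::size_t rather than ssize_t to stay portable), and readName is an illustrative wrapper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C function: writes at most `size`
// characters into `buffer` and returns how many it wrote.
std::size_t fillString(char *buffer, std::size_t size)
{
    assert(buffer != nullptr);
    static const char text[] = "random random name";
    const std::size_t length = std::strlen(text);
    const std::size_t n = length < size ? length : size;
    std::memcpy(buffer, text, n);
    return n;
}

// Lets the C function write straight into the string's own storage.
std::string readName(std::size_t maxLength)
{
    std::string str(maxLength, '\0'); // make room for the longest result
    if (str.empty())
        return str;                   // nothing to write into
    const std::size_t written = fillString(&str[0], str.size());
    str.resize(written);              // drop the unused tail
    return str;
}
```

A wrapper of this kind keeps the maximum length in one place and hands the caller an ordinary std::string.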
How would I go about allocating an array of a class without constructing the class, so I could fill up the array later?
I was originally trying to use
Myclass * array = new Myclass[N];
But that tries to default-construct all N Myclass objects.
First just declare it without allocating
Myclass * array[N];
when you need it
for(int i=0;i<N;i++){
array[i] = new Myclass(/*params*/);
}
But consider using std::vector/std::list if you would rather not manage the memory yourself.
If you really want to do that (not sure why), you could try:
#include <iostream>
using namespace std;
class MyClass
{
public:
    MyClass()
    { cout << "hello" << endl; }
};
int main(int argc, char *argv[])
{
    int size = 4;
    // Here is the trick: an array of pointers, so no MyClass is constructed yet.
    MyClass **vec = new MyClass *[size];
    cout << "before" << endl;
    for (int i = 0; i < size; ++i)
        vec[i] = new MyClass;
    // remember to free the objects and then the array itself
    for (int i = 0; i < size; ++i)
        delete vec[i];
    delete[] vec;
    return 0;
}
Someone suggested placement new, so here it goes:
// allocate space
std::vector<unsigned char> mybuffer(N * sizeof(Myclass));
Myclass *array = reinterpret_cast<Myclass *>(&mybuffer[0]);
// when you're ready to use it
new( &array[0] ) Myclass(2);
new( &array[1] ) Myclass(3);
// etc...
// when you're done with it
array[0].~Myclass();
array[1].~Myclass();
// etc....
Of course, it is undefined behaviour to use array[x] before you have new'd it, or after you called the destructor.
This is generally something you wouldn't use as a solution to a "normal" problem. Consider actually defining a default constructor that does nothing, and having a function you call later which enhances the objects above their default state.
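For completeness, the placement-new bookkeeping can be wrapped so that destruction is not forgotten. The following is a minimal sketch under stated assumptions: Item and LazyArray are hypothetical names, and the class trades one pointer of overhead per slot for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

struct Item { // hypothetical class without a default constructor
    explicit Item(int v) : value(v) {}
    int value;
};

// Fixed-size storage whose elements are constructed one by one, on demand.
template <class T, std::size_t N>
class LazyArray {
public:
    LazyArray() = default;
    LazyArray(const LazyArray &) = delete;
    LazyArray &operator=(const LazyArray &) = delete;

    ~LazyArray()
    {
        // Destroy only the slots that were actually constructed.
        for (std::size_t i = 0; i < N; ++i)
            if (objects_[i]) objects_[i]->~T();
    }

    template <class... Args>
    T &construct(std::size_t i, Args &&... args)
    {
        assert(i < N && objects_[i] == nullptr);
        objects_[i] = new (storage_ + i * sizeof(T)) T(std::forward<Args>(args)...);
        return *objects_[i];
    }

    T &operator[](std::size_t i)
    {
        assert(i < N && objects_[i] != nullptr);
        return *objects_[i];
    }

private:
    alignas(T) unsigned char storage_[N * sizeof(T)]; // raw, correctly aligned
    T *objects_[N] = {};                              // nullptr = not built yet
};

int demo()
{
    LazyArray<Item, 4> items;
    items.construct(0, 2);
    items.construct(1, 3);
    return items[0].value + items[1].value; // 5
}
```

Keeping the pointer returned by placement new for each slot makes it easy to tell which elements exist and therefore which destructors to run.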
If you can use C++11, the optimal solution for you is probably std::vector<MyClass> with emplace-based insertion:
class MyClass {
public:
MyClass(int a, bool b, char c); // some non-default constructor
MyClass(double d); // another constructor
void bar();
};
void foo(int n) {
std::vector<MyClass> mv;
mv.reserve(n); // not even needed but beneficial if you know the final size.
// emplace_back uses perfect forwarding to call any arbitrary constructor:
mv.emplace_back(2, false, 'a');
mv.emplace_back(3, true, 'b');
mv.emplace_back(3.1415926535);
// can iterate vector easily:
for (auto &i : mv) {
i.bar();
}
// everything is destructed automatically when the collection goes out of scope ...
}
This creates the values in the collection directly without a copy and defers any construction of elements until you are ready, unlike new[], which makes a bunch of default objects at array-creation time. It is generally better than placement new as well, since it doesn't leave open opportunities for missed destruction or destructing an invalid memory location as well as being just easier to read.
Alternatively, you may use boost::optional.
So in your case:
std::vector<boost::optional<Myclass>> array(N);
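A small sketch of how such slots might be filled later, using C++17's std::optional in place of boost::optional (a substitution made here for the sake of a dependency-free example); Widget, makeSlots and demo are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct Widget { // hypothetical class without a default constructor
    explicit Widget(int v) : value(v) {}
    int value;
};

// Creates n empty slots; no Widget is constructed at this point.
std::vector<std::optional<Widget>> makeSlots(std::size_t n)
{
    return std::vector<std::optional<Widget>>(n);
}

int demo()
{
    auto slots = makeSlots(3);
    assert(!slots[0].has_value());

    slots[0].emplace(2);  // construct in place when the data is ready
    slots[2].emplace(40); // slot 1 is left empty

    int sum = 0;
    for (const auto &slot : slots)
        if (slot) sum += slot->value; // skip slots that were never filled
    return sum; // 42
}
```

Each optional stores a flag next to the object, so the memory cost is somewhat higher than a plain array, but destruction of the filled slots is handled automatically.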
When using arrays you can do something like
class SomeClass
{
public:
int* LockMember( size_t& numInts );
private:
int* member;
size_t numInts;
};
int* SomeClass::LockMember( size_t& out_numInts )
{
out_numInts = numInts - 1;
return member + 1;
}
This returns a pointer offset by some amount, so as to prevent someone from modifying some part of the contiguous memory, or at least to signal that this part of the object's contiguous memory should remain untouched.
Since I use vectors everywhere, I am wondering whether there is some way to accomplish the same sort of thing:
class SomeClass
{
public:
std::vector<int> LockMember( void );
private:
std::vector<int> member;
};
std::vector<int> SomeClass::LockMember( void )
{
// somehow make a vector with its beginning iterator pointing to member.begin() + 1
// have a size smaller by one, still the same end iterator. The vector must be
// pointing to the same data as in this class as it needs to be modifiable.
return magicOffsetVector;
}
With the commented part replaced by real code. Any ideas?
If I understand you correctly: You want some memory with two parts: At the beginning you want something that can't be touched, and after that you want something that is open for use by client code.
You could do something along the following code. This will give the client code a copy to play with. This does mean you would have to do a lot of copying, though.
class SomeClass
{
public:
std::vector<int> getMember( void ) const;
void setMember(const std::vector<int>& newContent);
private:
std::vector<int> member;
size_t magicOffset;
};
// Read restricted part
std::vector<int> SomeClass::getMember( void ) const
{
return std::vector<int>(member.begin() + magicOffset, member.end());
}
// Assign to restricted part
void SomeClass::setMember(const std::vector<int>& v)
{
std::copy(v.begin(), v.end(), member.begin() + magicOffset);
}
In order to avoid the copying, it is possible that you could allocate memory for two vectors, one for the protected part and one for the unprotected part, and use placement new to put both vectors into that memory, thus ensuring that they are in contiguous memory. And then give the client code more or less free access to the public part of the vector. However, there's still the thing with bookkeeping variables in vector, and basically this would be an awful hack that's just waiting to blow up.
However, if you only need access to the unrestricted part on a per-element basis, you could just do range-checking on the arguments, i.e.:
int getElement(size_t idx)
{
idx += magicOffset;
if (idx >= member.size()) throw std::out_of_range("Illegal index");
return member[idx];
}
And then either provide a setElement, or return int&.
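Another option that avoids both the copy and the per-element accessors is to return a small non-owning view instead of a vector. The sketch below uses a hand-written IntView helper (a hypothetical name; C++20's std::span<int> offers the same idea in the standard library), and the SomeClass shown is a simplified assumption of the class in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal non-owning view over a run of ints.
class IntView {
public:
    IntView(int *first, std::size_t count) : first_(first), count_(count) {}
    std::size_t size() const { return count_; }
    int &operator[](std::size_t i) const
    {
        assert(i < count_);
        return first_[i];
    }
    int *begin() const { return first_; }
    int *end() const { return first_ + count_; }

private:
    int *first_;
    std::size_t count_;
};

class SomeClass {
public:
    explicit SomeClass(std::vector<int> v) : member(std::move(v)) {}

    // Exposes everything except the first element. The caller can modify
    // the visible elements but cannot resize the vector through the view.
    IntView LockMember()
    {
        assert(!member.empty());
        return IntView(member.data() + 1, member.size() - 1);
    }

    int at(std::size_t i) const { return member.at(i); }

private:
    std::vector<int> member;
};

int demo()
{
    SomeClass s(std::vector<int>{10, 20, 30});
    IntView view = s.LockMember();
    view[0] = 21;   // writes through to member[1]
    return s.at(1); // 21
}
```

The view becomes invalid if the underlying vector reallocates, so it should be treated as a short-lived handle rather than stored.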