While reading this I was amazed at what a certain level of metaprogramming can do for your class layout. I must admit that I don't fully grasp what the proposed optimal layout is; if I had to state what I understood, it would be this:
order class members by descending alignment, i.e. the type with the greatest alignof result goes first, and so on. A quick illustration of my understanding follows below.
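For instance, here is my own toy example (the sizes assume a typical 64-bit ABI where double has 8-byte alignment):
#include <cstdint>
// Members in an unfortunate order: padding after c and after i.
struct Bad { char c; double d; int32_t i; }; // typically 24 bytes
// Same members sorted by descending alignment: no internal padding.
struct Good { double d; int32_t i; char c; }; // typically 16 bytes
static_assert(sizeof(Good) <= sizeof(Bad), "reordering never hurts here");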
Feel free to correct me if I got this wrong (a short explanation of why this happens would be even better; I couldn't copy-paste large chunks of the rationale into my question). But my question is on another topic:
Does any library implementation of std::tuple have such an optimization of layout?
If not, are there any standard algebraic data types that do so? Is there another way to achieve this for my class, apart from writing such machinery myself?
No library implementation I'm aware of optimizes layout for alignment. You can use a program such as this to inspect a tuple layout:
#include <iostream>
#include <tuple>

struct empty {};    // empty element: lets us see whether the implementation optimizes away empty members

int
main()
{
    using T = std::tuple<double, int, empty, short, long>;
    T t{};
    std::cout << &t << '\n';                 // start of the tuple
    std::cout << &std::get<0>(t) << '\n';    // double
    std::cout << &std::get<1>(t) << '\n';    // int
    std::cout << &std::get<2>(t) << '\n';    // empty
    std::cout << &std::get<3>(t) << '\n';    // short
    std::cout << &std::get<4>(t) << '\n';    // long
    std::cout << &t + 1 << '\n';             // one past the end
    std::cout << sizeof(T) << '\n';
}
libc++ stores elements in order of declaration, and optimizes space away for empty members. Empty members are shunted towards the front. Sample output:
0x7fff5ccf39f8
0x7fff5ccf39f8
0x7fff5ccf3a00
0x7fff5ccf39f8
0x7fff5ccf3a04
0x7fff5ccf3a08
0x7fff5ccf3a10
24
libstdc++ stores elements in reverse order of declaration, and optimizes space away for empty members. Empty members are shunted towards the front. Sample output:
0x7ffe4fc5b2a0
0x7ffe4fc5b2b0
0x7ffe4fc5b2ac
0x7ffe4fc5b2a0
0x7ffe4fc5b2a8
0x7ffe4fc5b2a0
0x7ffe4fc5b2b8
24
VS-2015 stores elements in reverse order of declaration and does not optimize away the space for empty members. Sample output:
0306FEF4
0306FF04
0306FF00
0306FEFC
0306FEF8
0306FEF4
0306FF0C
24
In this example we see that optimizing the space away for the empty member didn't buy anything since it fits in an area of padding anyway.
There are no facilities which automate the task of reducing padding in the standard.
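While the standard offers no automation, you can at least measure how much padding a given tuple carries. A small helper of my own (it assumes C++17 fold expressions, which postdate the discussion above):
#include <cstddef>
#include <tuple>
// Sum of the element sizes, ignoring any padding between them.
template <class... Ts>
constexpr std::size_t packed_size() { return (std::size_t{0} + ... + sizeof(Ts)); }
using T = std::tuple<double, int, short, long>;
// The difference sizeof(T) - packed_size<...>() is the padding overhead.
static_assert(sizeof(T) >= packed_size<double, int, short, long>(), "a tuple is never smaller than its payload");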
I have a question about how global variables work in C++.
I know global variables are evil, so that is not the point of this question; I just want a deeper understanding of what is happening in the following example.
#include <cstdint>
#include <iostream>
#include <vector>

class B
{
public:
    B();
    ~B();
private:
    static std::vector<int32_t> v_;
};

B::B()
{
    std::cout << "v_.size() = " << v_.size() << std::endl;
    for (auto i : v_)
        std::cout << i << std::endl;
}

B::~B()
{
    std::cout << "v_.size() = " << v_.size() << std::endl;
    std::cout << "v_[0] = " << v_[0] << std::endl;
    std::cout << "v_.at(0) = " << v_.at(0) << std::endl;
    for (auto i : v_)
        std::cout << i << std::endl;
}

B b;                                   // constructed first: its ctor uses v_ before...
std::vector<int32_t> B::v_ { 5, -7 }; // ...v_ itself is constructed

int main()
{
    return 0;
}
Gives this output:
$ ./test
v_.size() = 0
v_.size() = 2
v_[0] = 0
v_.at(0) = 0
0
0
Why is the size of the vector in the destructor of B still 2?
When I access the elements of the vector I get random memory, which I sort of understand, because the vector gets cleaned up before b. But to me the size of the vector should be 0, or, even better, asking for the size should throw some kind of error. Even the at() function doesn't throw, because the size is still 2.
I also know I can fix this by switching the order of initialization of b and the vector. My question is more about why this specific example doesn't throw some kind of error, because in my opinion it should.
Note (from my comment): why does this behavior fall under undefined behavior rather than reading or writing an illegal memory location, given that the vector doesn't exist at that point? I was thinking this would/should generate a segfault, and I don't understand why it doesn't.
Undefined behavior means that the behavior of your program is not defined by the C++ standard. A conforming C++ compiler can do anything with a program that has exhibited, or will exhibit, undefined behavior (yes, UB can time travel).
Your program exhibits undefined behavior by accessing v_ as an object prior to it being constructed, in B::B. Given that it does this, nothing about your program's execution is specified or constrained by the C++ standard.
In this case, the compiler treats the UB access as if it were accessing an empty std::vector. This is valid, because anything is valid. The program then proceeds as if you hadn't done the UB (other than the above symptom), which is also a valid option.
If we imagine removing the UB in the ctor, then during destruction your program again exhibits UB, this time by accessing v_ as a vector object after it was destroyed. Again, by doing this, the behavior of your program is not defined or constrained by the C++ standard, before, at, and after the UB.
In this case, it behaves as if you have a vector of 2 values whose values are 0. That is conforming, because anything is conforming.
One of many possibilities is that the data was recycled on the heap, but the pointers were left dangling. Treating the vector's "rotted" data as pointers, they still point 2 * sizeof(int) apart, and .size() reads that as 2. The data pointed to, however, has been recycled on the heap, and there is different data there.
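For completeness, a common way to sidestep the initialization-order problem is the construct-on-first-use idiom. This is my own sketch, not something from the question:
#include <cstdint>
#include <iostream>
#include <vector>

class B
{
public:
    B()  { std::cout << "v().size() = " << v().size() << std::endl; }
    ~B() { std::cout << "v().size() = " << v().size() << std::endl; }
private:
    // The vector is constructed the first time it is needed,
    // regardless of the order of definitions across the program.
    static std::vector<int32_t>& v()
    {
        static std::vector<int32_t> v_{ 5, -7 };
        return v_;
    }
};

B b; // safe: B's constructor triggers construction of the vector

int main()
{
    return 0;
}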
I have two questions regarding the std::shared_ptr control block:
(1) Regarding size:
How can I programmatically find the exact size of the control block for a std::shared_ptr?
(2) Regarding logic:
Additionally, the boost::shared_ptr documentation mentions that it is completely lock-free with respect to changes in the control block ("Starting with Boost release 1.33.0, shared_ptr uses a lock-free implementation on most common platforms."). I don't think std::shared_ptr makes the same guarantee. Is this planned for any future C++ version? Doesn't this also mean that boost::shared_ptr is a better idea for multithreaded cases?
(1) Regarding size: How can I programmatically find the exact size of the control block for a std::shared_ptr?
There is no way. It's not directly accessible.
(2) Regarding logic: Additionally, the boost::shared_ptr documentation mentions that it is completely lock-free with respect to changes in the control block ("Starting with Boost release 1.33.0, shared_ptr uses a lock-free implementation on most common platforms."). I don't think std::shared_ptr makes the same guarantee. Is this planned for any future C++ version? Doesn't this also mean that boost::shared_ptr is a better idea for multithreaded cases?
Absolutely not. Lock-free implementations are not always better than implementations that use locks. Having an additional constraint, at best, doesn't make the implementation worse, but it cannot possibly make the implementation better.
Consider two equally competent programmers, each doing their best to implement shared_ptr. One must produce a lock-free implementation. The other is completely free to use their best judgment. There is simply no way the one who must produce a lock-free implementation can produce a better implementation, all other things being equal. At best, lock-free is the optimal approach and both will produce it. At worst, on some platform a lock-free implementation has huge disadvantages and one implementer must use it anyway. Yuck.
The control block is not exposed. In the implementations I have read, it is dynamic in size, so as to store the deleter contiguously (and/or, in the case of make_shared, the object itself).
In general it contains at least three pointer-sized fields: the weak count, the strong count, and the deleter invoker.
At least one implementation relies on RTTI; others do not.
Operations on the counts use atomic operations in the implementations I have read; note that C++ does not require atomic operations to all be lock-free (I believe a platform without pointer-sized lock-free operations can still be a conforming C++ platform).
Their state is consistent with each other and with themselves, but no attempt is made to keep them consistent with the object's state. This is why using raw shared_ptrs as copy-on-write pimpls may be error-prone on some platforms.
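You can at least probe whether plain atomics are lock-free on your platform, which is a rough proxy for what a control block's count updates could use. A minimal sketch of my own (the variable names are only suggestive; this does not inspect a real control block):
#include <atomic>
#include <iostream>

int main()
{
    std::atomic<long>  strong_count{0};
    std::atomic<void*> deleter_ptr{nullptr};
    // is_lock_free() reports a per-platform property; false means the
    // implementation falls back to an internal lock for that type.
    std::cout << std::boolalpha
              << "atomic<long>  lock-free: " << strong_count.is_lock_free() << '\n'
              << "atomic<void*> lock-free: " << deleter_ptr.is_lock_free() << '\n';
}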
(1)
Of course it is best to check the implementation itself; however, you can still make some checks from within your program.
The control block is allocated dynamically, so to estimate its size you can overload operator new.
You can also check whether std::make_shared provides some optimization of the control block size.
In a typical implementation I would expect the following to perform two allocations (the A object and the control block):
std::shared_ptr<A> i(new A());
However, the following will perform only one allocation (with the A object then initialized via placement new):
auto a = std::make_shared<A>();
Consider the following example:
#include <cstdlib>   // for std::malloc
#include <iostream>
#include <memory>

// Replace the global operator new so that every dynamic allocation is reported.
void * operator new(std::size_t size)
{
    std::cout << "Requested allocation: " << size << std::endl;
    return std::malloc(size);
}

class A {};

class B
{
    int a[8];
};

int main()
{
    std::cout << "Sizeof int: " << sizeof(int) << ", A(empty): " << sizeof(A) << ", B(8 ints): " << sizeof(B) << std::endl;
    {
        std::cout << "Just new:" << std::endl;
        std::cout << "- int:" << std::endl;
        std::shared_ptr<int> i(new int());
        std::cout << "- A(empty):" << std::endl;
        std::shared_ptr<A> a(new A());
        std::cout << "- B(8 ints):" << std::endl;
        std::shared_ptr<B> b(new B());
    }
    {
        std::cout << "Make shared:" << std::endl;
        std::cout << "- int:" << std::endl;
        auto i = std::make_shared<int>();
        std::cout << "- A(empty):" << std::endl;
        auto a = std::make_shared<A>();
        std::cout << "- B(8 ints):" << std::endl;
        auto b = std::make_shared<B>();
    }
}
The output I received (it is of course hardware- and compiler-specific):
Sizeof int: 4, A(empty): 1, B(8 ints): 32
Just new:
- int:
Requested allocation: 4
Requested allocation: 24
The first allocation is for the int (4 bytes), the next one for the control block (24 bytes).
- A(empty):
Requested allocation: 1
Requested allocation: 24
- B(8 ints):
Requested allocation: 32
Requested allocation: 24
It looks like the control block is (most probably) 24 bytes.
And here is why make_shared is worth using:
Make shared:
- int:
Requested allocation: 24
Only one allocation: int + control block = 24 bytes, less than before.
- A(empty):
Requested allocation: 24
- B(8 ints):
Requested allocation: 48
Here one could expect 56 (32 + 24), but it looks like the implementation is optimized: if you use make_shared, a pointer to the actual object is not needed in the control block, so its size is only 16 bytes.
Another way to probe the size of the control block is:
std::cout << sizeof(std::enable_shared_from_this<int>);
In my case:
16
So I would say that the size of the control block in my case is 16-24 bytes, depending on how the shared_ptr was created.
#include <iostream>
using namespace std;

class Empty
{
    char omg[0];    // zero-length array: a compiler extension, not standard C++
};

int main()
{
    Empty em1, em2;
    Empty set[100];
    cout << sizeof(Empty) << " " << sizeof(em1) << " " << sizeof(em2) << endl;
    cout << (long*)&em1 << " " << (long*)&em2 << endl;
    cout << "total number of elements is: " << sizeof(set)/sizeof(*set) << endl;
    return 0;
}
Its output is:
0 0 0
0xbff36ad0 0xbff36ac8
total number of elements is: 4
These results are surprising.
As shown above, Empty is a class, yet its size and the sizes of its objects are all 0. Why?
My guess: an empty class's size is 1, and when the class is not empty its size is decided by its members; but here its member is special, a zero-length array whose size is 0, so the sizes of the class and its objects are all 0.
It's just my guess. As the program runs, we can see that the two objects both have addresses, and the addresses are different.
Here is my question: if objects of size 0 can be implemented, why does the C++ standard state that empty objects have sizeof() == 1? The reason given is "to ensure that the addresses of two different objects will be different" (see Why is the size of an empty class not zero?), but here we do get different addresses in the output. How does this happen?
Furthermore, no matter what the size of the array set is, the last line of output is always 4. Why?
Thanks :)
PS: I run this program on MacOS, and the compiler is Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
I'll take a stab since no one more experienced has:
As shown above, Empty is a class, yet its size and the sizes of its objects are all 0. Why?
Zero-sized arrays are prohibited by the standard, so as far as the standard is concerned sizeof(Empty) is a meaningless expression; you are already in the realm of undefined behaviour.
Here is my question: if objects of 0 size can be implemented, [...] why is the size of an empty class not zero? But we do get different addresses in the output; how does this happen?
As above, an object of size 0 cannot exist in a valid standard C++ program (with the exception of base-class subobjects).
Your compiler allows this as an extension to the standard, and as long as you use the extension within the scope it was intended for (i.e. as a pre-flexible-array-member hack, sketched below), you shouldn't have any problems, although your code is not portable. Your example above, however, is not how zero-sized arrays are meant to be used (not to mention there are better constructs in C++ for handling these situations anyway).
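For reference, the kind of use the extension was intended for looks roughly like this. A sketch of the pre-C99 "struct hack" (Packet and make_packet are made-up names):
#include <cstdlib>
#include <cstring>

struct Packet
{
    std::size_t len;
    char payload[0]; // extension: trailing zero-length array
};

Packet* make_packet(const char* data, std::size_t n)
{
    // A single allocation holds the header plus n bytes of payload.
    Packet* p = static_cast<Packet*>(std::malloc(sizeof(Packet) + n));
    p->len = n;
    std::memcpy(p->payload, data, n);
    return p;
}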
Your compiler is intelligent enough to provide separate addresses for em1 and em2, but you should find that all elements of set have in fact the same address.
Furthermore, no matter what the size of the array set is, the last line of output is always 4. Why?
Since your compiler considers sizeof(Empty) and arrays of Empty to be zero, you are dividing zero by zero, which is undefined behavior. You might find that your program crashes if you disable optimizations; with GCC, for instance, it crashes with -O0 but not with -O1.
I did not really expect the following example to work, but indeed it does (g++ 4.6.4, with --std=c++0x):
#include <complex>
#include <iostream>
#include <vector>
#include <boost/multiprecision/float128.hpp>
#include <blitz/array.h>
#include <fftw3.h>

int main(int /*argc*/, char** /*argv*/)
{
    // these are the same
    std::cout << sizeof(std::complex<boost::multiprecision::float128>) << " " << sizeof(fftwq_complex) << std::endl;

    typedef std::vector< std::complex<boost::multiprecision::float128> > boost128cvec;
    //typedef std::vector<std::complex<boost::multiprecision::float128> , fftw::allocator< std::complex<boost::multiprecision::float128> > > boost128cvec;

    // declare a std::vector consisting of std::complex<boost::multiprecision::float128>
    boost128cvec test_vector3(12);

    // cast its data storage to fftwq_complex*
    fftwq_complex* test_ptr3 = reinterpret_cast<fftwq_complex*>(test_vector3.data());

    // also create a view of the same data as a blitz::Array
    blitz::Array<std::complex<boost::multiprecision::float128>, 1> test_array3(test_vector3.data(), blitz::TinyVector<int, 1>(12), blitz::neverDeleteData);

    test_vector3[3] = std::complex<boost::multiprecision::float128>(1.23,4.56);

    // this line would not work with std::vector
    test_array3 = sin(test_array3);

    // this line would not work with the built-in type __float128
    test_vector3[4] = sin(test_vector3[3]);

    // all of these print the same numbers
    std::cout << "fftw::vector: " << test_vector3[3].real() << " + i " << test_vector3[3].imag() << std::endl;
    std::cout << "fftw_complex: " << (long double)test_ptr3[3][0] << " + i " << (long double)test_ptr3[3][1] << std::endl;
    std::cout << "blitz: " << test_array3(3).real() << " + i " << test_array3(3).imag() << std::endl << std::endl;
}
A few remarks:
The goal is to be able to use both fftw and blitz::Array operations on the same data, without the need to copy it around, while at the same time being able to use generic functions like sin() for complex variables with quad precision.
The blitz part works fine, which is expected. The surprise (to me) was that the fftwq_complex* part also works fine.
The fftw::allocator is a simple replacement for std::allocator which uses fftwq_malloc to ensure correct SIMD alignment, but that is not important for this question, so I left it out (at least I think it is not important here).
My question is: how thin is the ice I'm stepping on?
You're pretty much safe:
std::vector is compatible with a C array (you can access a pointer to the first element via vector.data(), as answered in this question).
std::complex<T> is designed to be compatible with an array of the form T[2], which is compatible with FFTW. This is described in the FFTW documentation:
C++ has its own complex template class, defined in the standard <complex> header file. Reportedly, the C++ standards committee has recently agreed to mandate that the storage format used for this type be binary-compatible with the C99 type, i.e. an array T[2] with consecutive real [0] and imaginary [1] parts. (See report http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2002/n1388.pdf WG21/N1388.) Although not part of the official standard as of this writing, the proposal stated that: "This solution has been tested with all current major implementations of the standard library and shown to be working." To the extent that this is true, if you have a variable complex<double> *x, you can pass it directly to FFTW via reinterpret_cast<fftw_complex*>(x).
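As a minimal sketch of that casting pattern with the plain double-precision API (my own example; it uses fftw_complex rather than the quad-precision fftwq_complex):
#include <complex>
#include <vector>
#include <fftw3.h>

int main()
{
    std::vector<std::complex<double>> signal(64);
    // std::complex<double> is layout-compatible with double[2],
    // which is exactly what fftw_complex is.
    fftw_complex* raw = reinterpret_cast<fftw_complex*>(signal.data());
    fftw_plan plan = fftw_plan_dft_1d(static_cast<int>(signal.size()),
                                      raw, raw, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);
    fftw_destroy_plan(plan);
}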
The only thing to keep in mind is that the pointer returned by data() is invalidated if you add values to your vector.
The last part is the compatibility between boost::multiprecision::float128 and __float128. The Boost documentation gives no guarantees about this.
What you can do, however, is add some static assertions to your code which fail if the conversion is not possible. This could look like this:
static_assert(std::is_standard_layout<float128>::value, "not a standard-layout type");
static_assert(sizeof(float128) == sizeof(__float128), "size mismatch");
Here, sizeof guarantees that the Boost type and __float128 have the same size, and is_standard_layout checks that:
A pointer to a standard-layout class may be converted (with reinterpret_cast) to a pointer to its first non-static data member and vice versa.
Of course, this only gives a hint as to whether it will work in the end, since you cannot tell whether the type really is a __float128; but as Boost states that their type is a thin wrapper around it, it should be fine. If there are changes in the design or structure of float128, the static assertions should fail.
In this example, I create a vector with one integer in it and then I erase that integer from the vector. The size of the vector decreases, but the integer is still there! Why is the integer still there? How is it possible for a vector of size 0 to contain elements?
#include <vector>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    vector<int> v;
    v.push_back(450);

    cout << "Before" << endl;
    cout << "Size: " << v.size() << endl;
    cout << "First element: " << (*v.begin()) << endl;

    v.erase(v.begin());

    cout << "After" << endl;
    cout << "Size: " << v.size() << endl;
    cout << "First element: " << *(v.begin()) << endl;   // dereferences an invalid iterator!
    return 0;
}
output:
Before
Size: 1
First element: 450
After
Size: 0
First element: 450
You are invoking undefined behavior by dereferencing an invalid memory location. Normally, the heap manager will not immediately release memory freed with delete, for efficiency purposes. However, that doesn't mean you may access that memory location: the heap manager can reuse it for other purposes whenever it likes. So your program will behave unpredictably if you dereference an invalid memory location.
IIRC, a vector doesn't release space unless specifically told to, so you're seeing an item which is still in the vector's memory but no longer tracked by it. This is part of the reason why you're supposed to check the size first (the other being that if you never assigned anything, you'll be dereferencing a garbage pointer).
To start, don't count on it being this way across all systems. How a vector works internally is completely implementation-dependent. By dereferencing an invalid memory location, you're circumventing the behavior that has been outlined in the documentation.
That is to say, you can only count on behavior that is outlined in the STL docs.
The reason you can still access that memory location is that this particular implementation doesn't release memory immediately, but keeps it around for a while (probably for performance reasons). Another implementation could very well release that memory immediately if the author so desired.
It is just that the vector has not freed the memory, but has kept it around for future use.
This is what we call "undefined behaviour". There is no guarantee that it will work next time, and it may easily crash the program on a future attempt. Don't do it.
What are your compiler options? I get a crash with the usual options, with both of the compilers I regularly use (g++ and VC++). In the case of g++, you have to set some additional options (-D_GLIBCXX_DEBUG, I think) for this behavior; as far as I can tell, it's the default for VC++. (My command for VC++ was just "cl /EHs bounds.cc".)
As others have said, it's undefined behavior, but with a good compiler, it will be defined to cause the program to crash.
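To tie the answers together, the defensive version of the original program simply checks before dereferencing. A minimal sketch:
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    v.push_back(450);
    v.erase(v.begin());
    // Safe pattern: never dereference begin() on a possibly-empty vector.
    if (!v.empty())
        std::cout << "First element: " << *v.begin() << std::endl;
    else
        std::cout << "Vector is empty" << std::endl;
    return 0;
}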