How does inlining work in LLVM? - llvm

I am trying to understand how LLVM inlining works (the Inliner class). The declaration that I don't understand is the following:
SmallVector<std::pair<CallSite, int>, 16> CallSites;
where SmallVector is an LLVM class. In particular, I don't understand what the "16" does in this code.

You're declaring a SmallVector whose elements are each a std::pair<CallSite, int>, with built-in room for 16 of them.
edit: As Eli has correctly pointed out, SmallVector can be dynamically resized. 16 is just the built-in size (this means that storing up to 16 elements doesn't incur any heap allocations).
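To make the idea of a built-in size concrete, here is a minimal sketch of a small-buffer container. TinySmallVec and all of its members are hypothetical names invented for illustration; this is not LLVM's SmallVector implementation, and it assumes trivially copyable, default-constructible element types only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical illustration of small-buffer storage; NOT llvm::SmallVector.
template <typename T, std::size_t N>
class TinySmallVec {
    static_assert(N > 0, "the inline buffer must hold at least one element");
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only handles trivially copyable types");

public:
    TinySmallVec() = default;
    TinySmallVec(const TinySmallVec&) = delete;             // keep the sketch short
    TinySmallVec& operator=(const TinySmallVec&) = delete;
    ~TinySmallVec() {
        if (on_heap()) delete[] data_;
    }

    void push_back(const T& value) {
        if (size_ == capacity_) grow();
        data_[size_++] = value;
    }

    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }

    // True once the elements have moved out of the inline buffer.
    bool on_heap() const { return data_ != inline_; }

private:
    void grow() {
        const std::size_t new_capacity = capacity_ * 2;
        T* bigger = new T[new_capacity];
        std::memcpy(bigger, data_, size_ * sizeof(T));
        if (on_heap()) delete[] data_;
        data_ = bigger;
        capacity_ = new_capacity;
    }

    T inline_[N];               // the built-in storage for the first N elements
    T* data_ = inline_;         // points at inline_ until the first spill
    std::size_t size_ = 0;
    std::size_t capacity_ = N;
};
```

With TinySmallVec<int, 16>, the first 16 push_back calls leave on_heap() false; the 17th copies everything into a heap buffer and on_heap() becomes true.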


How do I best force-flatten a (one dimensional) vector for N values?

I need something that behaves like an std::vector (interface/features/etc.) but I need it to be flat, i.e. it mustn't dynamically allocate a buffer. Clearly, this doesn't work in general, as the available size must be determined at compile time. But I want the type to be able to deal with N objects without additional allocations, and only if further items are pushed resort to dynamic allocation.
Some implementations of std::vector already do this, but only to the extent that it uses its existing members if the accumulated size of the content fits (I believe about three pointers-worth of payload). So, firstly, this is not a guarantee and secondly it is not configurable at compile time.
My thoughts are that I could either
A) self-cook a type (probably bad because I'd lose the ridiculous performance optimisations from vector)
B) use some sort of variant<vector<T>,array<T,N>> with an access wrapper (oh, the boilerplate)
C) come up with a MyAllocator<T,N> that has an array<T,N> member which then may be used to hold the first N items and after this defer to allocator<T> but I'm not sure if this can work because I cannot find out whether vector must permanently hold an instance of its allocator type as a member (I believe it does not)
I figure I'm not the first person to want this, so perhaps there are already approaches to this? Some empirical values or perhaps even a free library?
You might find folly/small_vector of use.
folly::small_vector is a sequence container that
implements small buffer optimization. It behaves similarly to
std::vector, except until a certain number of elements are reserved it
does not use the heap.
Like standard vector, it is guaranteed to use contiguous memory. (So,
after it spills to the heap all the elements live in the heap buffer.)
Simple usage example:
small_vector<int,2> vec;
vec.push_back(0); // Stored in-place on stack
vec.push_back(1); // Still on the stack
vec.push_back(2); // Switches to heap buffer.
// With space for 32 in situ unique pointers, and only using a
// 4-byte size_type.
small_vector<std::unique_ptr<int>, 32, uint32_t> v;
// An inline vector of up to 256 ints which will not use the heap.
small_vector<int, 256, NoHeap> v;
// Same as the above, but making the size_type smaller too.
small_vector<int, 256, NoHeap, uint16_t> v;
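A standard-library route that is close in spirit to option C from the question is C++17's std::pmr: a std::pmr::vector backed by a std::pmr::monotonic_buffer_resource that starts with a stack buffer and falls back to an upstream resource. The sketch below is only an illustration under a few assumptions: CountingResource and upstream_allocations_for are hypothetical helper names, the buffer size of 16 ints is arbitrary, and the reserve call is needed up front because a monotonic resource never reuses memory released by earlier, smaller growth steps.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical upstream resource that counts how often the fallback
// (heap) path is taken once the stack buffer is exhausted.
class CountingResource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        ++allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Pushes `count` ints into a vector whose first 16 elements fit in a
// stack buffer, and returns how many upstream (heap) allocations happened.
std::size_t upstream_allocations_for(std::size_t count) {
    constexpr std::size_t kInlineInts = 16;  // arbitrary choice for this sketch

    CountingResource upstream;
    alignas(int) std::byte buffer[kInlineInts * sizeof(int)];
    std::pmr::monotonic_buffer_resource arena(buffer, sizeof(buffer), &upstream);

    std::pmr::vector<int> v(&arena);
    v.reserve(kInlineInts);  // claim the whole stack buffer in one step
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
    }
    assert(v.size() == count);
    return upstream.allocations;
}
```

Under these assumptions, upstream_allocations_for(16) reports no upstream allocations, while pushing a 17th element forces the vector to grow past the stack buffer and the upstream resource gets used.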

total memory of a C++ class object

Why does the following piece of code give 24 as an answer? That is, how is the total size of an object of the following class X 24 bytes? I'm using a 64-bit machine.
#include <bits/stdc++.h>
using namespace std;
class X
{
    vector <bool> f;
    int b;
public:
    X(){
        f.push_back(true);
    }
};
int main(){
    X ob;
    cout<<sizeof(ob);
    return 0;
}
C++ makes few guarantees about type sizes and none about the memory layout of standard containers. For questions like this, it is therefore also important to state your compiler and the options with which you invoke it.
You can look at the individual results for sizeof(int) and sizeof(vector<bool>). They will probably reveal the following:
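8 bytes attributable to int b (sizeof(int) itself is typically 4 on mainstream platforms; the remainder would be alignment padding).
16 bytes for vector<bool> f.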
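To check these numbers on your own toolchain, a minimal sketch like the following reports the individual sizes together with the alignment of X. SizeReport and measure_sizes are hypothetical names used only for this illustration, and the actual values depend on your compiler, standard library and target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class X {
    std::vector<bool> f;
    int b;

public:
    X() : b(0) { f.push_back(true); }
};

// Hypothetical helper type: the sizes that together explain sizeof(X).
struct SizeReport {
    std::size_t int_size;          // sizeof(int)
    std::size_t vector_bool_size;  // sizeof(std::vector<bool>)
    std::size_t x_size;            // sizeof(X), including any padding
    std::size_t x_alignment;       // alignof(X)
};

SizeReport measure_sizes() {
    return SizeReport{sizeof(int), sizeof(std::vector<bool>), sizeof(X), alignof(X)};
}

// Padding is whatever remains once both members are accounted for.
std::size_t padding_in_x() {
    const SizeReport r = measure_sizes();
    assert(r.x_size >= r.int_size + r.vector_bool_size);
    return r.x_size - r.int_size - r.vector_bool_size;
}
```

Note that pushing an element in the constructor does not change sizeof(X): the element lives in dynamically allocated memory, not inside the object itself.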
The 16 bytes for vector<bool> are harder to analyse. Several things could be stored in the object, for example:
An instance of the std::allocator<bool> that you "invisibly" pass via a default argument when you construct the vector.
A pointer to the start of the dynamically allocated memory occupied by the data to represent the vector's bool elements.
A pointer to the end of that memory (for constant-time capacity() calls).
A pointer to the last element inside that dynamically allocated memory (for constant-time size() calls).
The current element count or the current capacity count (for constant-time size and capacity() calls).
If you want to know for sure, you can probably look into your implementation's header files to see how your compiler lays out std::vector<bool> in memory.
Note that the memory layout for a std::vector<bool> can be different from all other std::vectors due to special optimisations. For example, on my machine with MSVC 2013, compiled simply with cl /EHsc /Za /W4 stackoverflow.cpp, sizeof(std::vector<bool>) is 16 whereas sizeof(std::vector<int>) is 12 [*].
Since header files internal to the implementation can be quite hard to read, an alternative way is to run your program in a debugger and inspect the object there. Here's an example screenshot from Visual Studio Express 2013:
As you can see, the sizeof(std::vector<bool>) here comes from three times sizeof(unsigned int*), for the pointers to the first element, the last element and the capacity end in the dynamically allocated memory, plus one extra sizeof(unsigned int) for the element count. The extra count is necessary because of the aforementioned special optimisation for std::vector<bool>: the difference between the pointers to the first and last element does not necessarily reveal the number of elements that the vector represents to outside code.
std::vector<int> does not need that special handling, which explains why it's smaller.
The inherited std::_Container_base0 is apparently not taken into account due to Empty base optimization.
All things considered, this is all quite complicated stuff. But such is the world of standard-library implementors! Remember that all things you see inside of header files are strictly internal. You cannot, for example, suppose the existence of std::_Container_base0 in your own code in any way. Pretend that it does not exist.
Coming back to your original question, the most important point is that your compiler may lay out a std::vector<bool> in any way it wants to as long as it behaves correctly to the outside world according to the C++ standard. It may also choose not to optimise std::vector<bool> at all. We cannot tell you much more without knowing more about your compiler. The information that it runs on a 64-bit machine is not enough.
[*] std::vector<bool> is supposed to be a space-efficient optimisation, but apparently in this implementation this only relates to space occupied by the dynamically allocated elements, not the static size of the vector itself.
A vector maintains its own internal variables for bookkeeping, and possibly an allocator as well.
Add the size of int on your machine (plus any alignment padding) to the size of the vector and you have the total.
Note:
With regard to int: a pointer should be 8 bytes on any 64-bit C/C++ compiler, but int is not necessarily 8 bytes.
you can see the internals of vector(for gcc) here for a quick lookup:
https://gcc.gnu.org/onlinedocs/gcc-4.6.3/libstdc++/api/a01115_source.html

Memory allocation of C++ vector<bool>

The vector<bool> class in the C++ STL is optimized for memory to allocate one bit per bool stored, rather than one byte. Every time I output sizeof(x) for vector<bool> x, the result is 40 bytes for the vector structure itself. sizeof(x.at(0)) always returns 16 bytes, which must be the allocated memory for many bool values, not just the one at position zero. How many elements do the 16 bytes cover? 128 exactly? What if my vector has more or fewer elements?
I would like to measure the size of the vector and all of its contents. How would I do that accurately? Is there a C++ library available for viewing allocated memory per variable?
I don't think there's any standard way to do this. The only information a vector<bool> implementation gives you about how it works is the reference member type, but there's no reason to assume that this has any congruence with how the data are actually stored internally; it's just that you get a reference back when you dereference an iterator into the container.
So you've got the size of the container itself, and that's fine, but to get the amount of memory taken up by the data, you're going to have to inspect your implementation's standard library source code and derive a solution from that. Though, honestly, this seems like a strange thing to want in the first place.
Actually, using vector<bool> is kind of a strange thing to want in the first place. All of the above is essentially why its use is frowned upon nowadays: it's almost entirely incompatible with conventions set by other standard containers… or even those set by other vector specialisations.
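Short of reading the library source, one way to observe how much heap memory a vector<bool> requests is to give it an allocator that counts bytes. The sketch below is an assumption-laden illustration: CountingAllocator, g_live_heap_bytes and heap_bytes_for_bits are hypothetical names, and the number only covers what the container asks its allocator for, not any overhead the underlying memory manager adds.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Bytes currently requested through CountingAllocator (all rebinds share it).
static std::size_t g_live_heap_bytes = 0;

// Hypothetical minimal allocator that records how much memory is requested.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        g_live_heap_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        g_live_heap_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Heap bytes a vector<bool> of `bit_count` elements requests from its allocator.
std::size_t heap_bytes_for_bits(std::size_t bit_count) {
    const std::size_t before = g_live_heap_bytes;
    std::vector<bool, CountingAllocator<bool>> bits(bit_count);
    assert(bits.size() == bit_count);
    return g_live_heap_bytes - before;
}
```

Adding sizeof of the vector object itself to the returned value gives a rough total for that container on a given implementation; the exact figure is implementation-specific, as the answer above explains.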

c++ vector of bitset pass to a function

I want to implement an algorithm in C++ that needs a dynamically assigned huge vector of bitsets (512 x 18,000,000 bits; I have 16 GB of RAM).
a) This works fine
int nofBits=....(function read from db);
vector < bitset <nofBits> > flags;
flags.resize(512);
but how do I pass it (by reference) to a function? Keep in mind, I do not know nofBits at compile time.
I could use a
vector<vector<bool> >
but wouldn't that be worse in terms of memory usage?
I had the same problem recently. However, just like with std::array, you need to know the size of a bitset at compile time, since it's a template parameter. I found boost::dynamic_bitset as an alternative, and it worked like a charm.
std::vector<bool> is specialised to use memory efficiently. It is roughly as space efficient as std::bitset<N> (a few extra bytes because its size is dynamic and the bits live on the heap).
Note, however, that std::vector<bool> has issues, so tread lightly.
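For the part of the question about passing the flags by reference when the bit count is only known at run time, here is a minimal sketch using the vector<vector<bool>> variant. FlagMatrix, make_flags, set_flag and count_set are hypothetical names for illustration; boost::dynamic_bitset rows could be passed around in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row per flag set; every row holds nof_bits bits chosen at run time.
using FlagMatrix = std::vector<std::vector<bool>>;

FlagMatrix make_flags(std::size_t rows, std::size_t nof_bits) {
    return FlagMatrix(rows, std::vector<bool>(nof_bits, false));
}

// Mutating access: pass by non-const reference, no copy is made.
void set_flag(FlagMatrix& flags, std::size_t row, std::size_t bit) {
    assert(row < flags.size() && bit < flags[row].size());
    flags[row][bit] = true;
}

// Read-only access: pass by const reference.
std::size_t count_set(const FlagMatrix& flags) {
    std::size_t total = 0;
    for (const auto& row : flags) {
        for (bool bit : row) {
            if (bit) ++total;
        }
    }
    return total;
}
```

Because the dimensions are ordinary run-time values, the function signatures do not depend on nofBits at all, which is exactly what a bitset<N> template parameter cannot offer.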

stl vector size on 64-bit machines

I have an application that will use millions of vectors.
It appears that most implementations of std::vector use 4 pointers (_First, _Last, _End, and _Alloc), which consumes 32 bytes on 64-bit machines. For most "practical" use cases of vector, one could probably get away with a single pointer and two 'unsigned int' fields to store the current size & allocated size, respectively. Ignoring the potential challenge of supporting customized allocation (instead of assuming that allocations must go through the global new & delete operator), it seems that it is possible to build an STL compliant vector class that uses only 16 bytes (or at worst 24 bytes to support the _Alloc pointer).
Before I start coding this up, 1) are there any pitfalls I should be aware of and 2) does an open source implementation exist?
You could accomplish something like this -- but it's not likely you'd gain all that much.
First, there's the performance aspect. You are trading time for memory consumption. Whatever memory you save is going to be offset by having to do an addition and a multiply on every call to end (okay, if it's a vector where sizeof(vector<T>::value_type) == 1, the multiply can be optimized out). Note that most handwritten looping code over vectors calls end on every loop iteration. On modern CPUs the smaller footprint can actually be a major win, because it allows the processor to keep more things in cache (unless those couple of extra instructions in an inner loop force the processor to swap things in the instruction cache too often).
Moreover, the memory savings is likely to be small in terms of the overall memory use in the vector, for the following reasons:
Memory manager overhead. Each allocation from the memory manager, (which vector of course needs) is going to add 16-24 bytes of overhead on its own in most memory manager implementations. (Assuming something like dlmalloc (UNIX/Linux/etc.) or RtlHeap (Windows))
Overprovisioning load. In order to achieve amortized constant insertion at the end, when vector resizes, it resizes to some multiple of the size of the data in the vector. This means that the memory capacity vector allocates can be as much as 1.5 (MSVC++) or 2 (STLPort, libstdc++) times the number of elements actually stored in the vector.
Alignment restrictions. If you are putting those many vectors into an array (or another vector), then keep in mind the first member of that vector is still a pointer to the allocated memory block. This pointer generally needs to be 8 byte aligned anyway -- so the 4 bytes you save are lost to structure padding in arrays.
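The overprovisioning point can be observed directly by recording every capacity change while elements are appended. This is a minimal sketch; capacity_steps is a hypothetical helper name, and the exact sequence it returns depends on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends `count` ints and returns each distinct capacity the vector
// went through, in order. Few steps relative to `count` indicate
// geometric growth, i.e. spare capacity after each reallocation.
std::vector<std::size_t> capacity_steps(std::size_t count) {
    std::vector<int> v;
    std::vector<std::size_t> steps;
    std::size_t last = v.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last) {
            last = v.capacity();
            steps.push_back(last);
        }
    }
    assert(v.capacity() >= v.size());
    return steps;
}
```

Dividing consecutive entries of the returned sequence gives the growth factor of your implementation, and the gap between the last entry and count is the memory that is allocated but unused.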
I'd use the plain implementation of vector for now. If you run your code through a memory profiler and find that significant savings would be made by getting rid of these couple of pointers, then you're probably better off implementing your own optimized class which meets your performance characteristics rather than relying on the built-in vector implementation. (An example of one such optimized class is std::string on those platforms that implement the small string optimization.)
(Note: the only compiler of which I am aware that optimizes out the Alloc pointer is VC11, which isn't yet released. Though Nim says that the current prerelease version of libstdc++ does it as well...)
Unless these vectors are going to have contents that are extremely small, the difference between 16 vs. 32 bytes to hold the contents will be a small percentage of the total memory consumed by them. It will require lots of effort to reinvent this wheel, so be sure you're getting an adequate pay off for all that work.
BTW, there's value in education too, and you will learn a lot by doing this. If you choose to proceed, you might consider writing a test suite first, and exercise it on the current implementation and then on the one you invent.
To decide whether it is worth the effort, find or write a compatible implementation that fits your needs (maybe there are other things in std::vector that you do not need), and compare its performance with std::vector<your_type> on the relevant platforms. Your suggestion can at least improve the performance of the move constructor, as well as the move assignment operator:
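// GCC/Clang vector extension: a 16-byte SIMD type.
typedef int32_t v4si __attribute__ ((vector_size (16)));
// Member of the vector class template; T is its element type.
union
{
    v4si data;                 // the whole representation viewed as one 16-byte value
    struct
    {
        T* pointer;            // start of the allocated buffer
        uint32_t length;       // number of elements in use
        uint32_t capacity;     // number of elements allocated
    } content;
} m_data;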
This only covers "sane" T (noexcept move semantics). https://godbolt.org/g/d5yU3o
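Leaving the compiler-specific vector extension aside, the pointer-plus-two-counters layout and the cheap move it allows can be sketched in portable C++. CompactRep and take are hypothetical names for this illustration, and the sketch deliberately ignores ownership and destruction of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical compact representation: one pointer and two 32-bit counters.
// On a typical 64-bit target this is 16 bytes.
template <typename T>
struct CompactRep {
    T* pointer;              // start of the allocated buffer
    std::uint32_t length;    // number of elements in use
    std::uint32_t capacity;  // number of elements allocated
};

// Moving the representation is a plain copy followed by resetting the
// source, which is what a move constructor of such a vector would do.
template <typename T>
CompactRep<T> take(CompactRep<T>& source) noexcept {
    assert(source.length <= source.capacity);
    CompactRep<T> taken = source;
    source = CompactRep<T>{nullptr, 0, 0};
    return taken;
}
```

The 32-bit counters are the price of the smaller footprint: such a vector cannot hold more than 2^32 - 1 elements.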