I am working on some Core Audio code and have a problem that could be solved by a variable array in a struct -- a la Flexible Array Members. In doing a bit of looking around, I see that there is a lot of dialogue about the portability and viability of Flexible Array Members.
From what I understand, Objective-C is C99 compliant. For this reason, I think Flexible Array Members should be a fine solution. I also see that Flexible Array Members are not a good idea in C++.
What to do in Objective-C++? Technically, I won't use it in Objective-C++. I am writing callbacks that are C and C++ based... That seems like a point against.
Anyway, can I (should I) do it? If not, is there another technique with the same results?
You can always just declare a trailing array of size 1. In the worst case here, you waste a pretty small amount of memory, and it is very slightly more complicated to compute the right size for malloc.
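For illustration, a minimal sketch of that technique (the struct and function names are placeholders, and it assumes n >= 1); the size-1 member means the malloc size over-counts by at most one element:
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct NoteStartTimes {
    std::size_t   count;         // number of valid entries
    std::uint64_t startTimes[1]; // trailing array of size 1; real length chosen at allocation time
};

// Allocate room for n timestamps; the "- 1" accounts for the element already declared.
NoteStartTimes* makeNoteStartTimes(std::size_t n) {
    std::size_t bytes = sizeof(NoteStartTimes) + (n - 1) * sizeof(std::uint64_t);
    NoteStartTimes* p = static_cast<NoteStartTimes*>(std::malloc(bytes));
    if (p) p->count = n;
    return p;
}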
Don't bother. It's not compatible, and it is messy and error-prone. C++ had solutions which are managed more easily long before this feature existed. What are you tacking onto the end of your struct? Normally, you'd just use something like a std::vector, std::array, or a fixed-size array.
UPDATE
I want to have a list of note start times (uint64_t) and iterate through them to see which, if any, is playing. I was going to add a count variable to the struct to track how many items are in the flexible array.
OK, then a fixed-size array should be fine if you have fixed polyphony. You will not need more than one such array in most iOS synths. Of course, the 'upcoming note' array size could vary based on the app: synth? sampler? sequencer? live input?
template <size_t NumNotes_>
class t_note_start_times {
public:
static const size_t NumNotes = NumNotes_;
typedef uint64_t t_timestamp;
/*...*/
const t_timestamp& timestampAt(const size_t& idx) const {
assert(this->d_numFutureNotes <= NumNotes);
assert(idx < NumNotes);
assert(idx < this->d_numFutureNotes);
return this->d_startTimes[idx];
}
private:
t_timestamp d_presentTime;
size_t d_numFutureNotes; // presumably, this will be the number of active notes,
// and values will be compacted to [0...d_numFutureNotes)
t_timestamp d_startTimes[NumNotes];
};
// in use
const size_t Polyphony = 16;
t_note_start_times<Polyphony> startTimes;
startTimes.addNoteAtTime(noteTimestamp); // defined in the '...' ;)
startTimes.timestampAt(0);
If you need a dynamically sized array which could be very large, then use a vector. If you need only one instance of this and the max polyphony is (say) 64, then just use this.
Brief Preface
I recognize that there are many nuances and requirements for a standard-compatible allocator. There are a number of questions here covering a range of topics associated with allocators. I realize that the requirements set out by the standard are critical to ensuring that the allocator functions correctly in all cases, doesn't leak memory, doesn't cause undefined-behaviour, etc. This is particularly true where the allocator is meant to be used (or at least, can be used) in a wide range of use cases, with a variety of underlying types and different standard containers, object sizes, etc.
In contrast, I have a very specific use case where I personally strictly control all of the conditions associated with its use, as I describe in detail below. Consequently, I believe that what I've done is perfectly acceptable given the highly-specific nature of what I'm trying to implement.
I'm hoping someone with far more experience and understanding than me can either confirm that the description below is acceptable or point out the problems (and, ideally, how to fix them too).
Overview / Specific Requirements
In a nutshell, I'm trying to write an allocator that is to be used within my own code and for a single, specific purpose:
I need "a few" std::vector (probably uint16_t), with a fixed (at runtime) number of elements. I'm benchmarking to determine the best tradeoff of performance/space for the exact integer type[1]
As noted, the number of elements is always the same, but it depends on some runtime configuration data passed to the application
The number of vectors is also either fixed or at least bounded. The exact number is handled by a library providing an implementation of parallel::for(execution::par_unseq, ...)
The vectors are constructed by me (i.e. so I know with certainty that they will always be constructed with N elements)
[1] The value of the vectors are used to conditionally copy a float from one of 2 vectors to a target: c[i] = rand_vec[i] < threshold ? a[i] : b[i] where a, b, c are contiguous arrays of float, rand_vec is the std::vector I'm trying to figure out here, and threshold is a single variable of type integer_tbd. The code compiles as SSE SIMD operations. I do not remember the details of this, but I believe that this requires additional shifting instructions if the ints are smaller than the floats.
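Spelled out as a loop, the selection in [1] looks roughly like this (a sketch only; uint16_t stands in for the to-be-determined integer type, and n is the fixed element count):
#include <cstddef>
#include <cstdint>
#include <vector>

// a, b, c point to contiguous float arrays of length n; rand_vec is the vector discussed above.
void conditional_copy(const float* a, const float* b, float* c,
                      const std::vector<std::uint16_t>& rand_vec,
                      std::uint16_t threshold, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        c[i] = rand_vec[i] < threshold ? a[i] : b[i]; // the compare/select the compiler turns into SSE
}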
On this basis, I've written a very simple allocator, with a single static boost::lockfree::queue as the free-list. Given that I will construct the vectors myself and they will go out of scope when I'm finished with them, I know with certainty that all calls to alloc::deallocate(T*, size_t) will always return vectors of the same size, so I believe that I can simply push them back onto the queue without worrying about a pointer to a differently-sized allocation being pushed onto the free-list.
As noted in the code below, I've added runtime tests to both the allocate and deallocate functions for now, while I've been confirming for myself that these situations cannot and will not occur. Again, I believe it is unquestionably safe to delete these runtime tests, although some advice would be appreciated here too: considering the surrounding code, I think they should be handled adequately by the branch predictor, so they don't have a significant runtime cost (although without instrumenting, it's hard to say for 100% certain).
In a nutshell - as far as I can tell, everything here is completely within my control, completely deterministic in behaviour, and thus completely safe. This is also suggested when running the code under typical conditions -- there are no segfaults, etc. I haven't tried running with sanitizers yet -- I was hoping to get some feedback and guidance before doing so.
I should point out that my code runs 2x faster compared to using std::allocator, which is at least qualitatively to be expected.
CR_Vector_Allocator.hpp
#include <boost/lockfree/queue.hpp>
#include <cstddef>

class CR_Vector_Allocator {
using T = CR_Range_t; // probably uint16_t or uint32_t, set elsewhere.
private:
using free_list_type = boost::lockfree::queue<T*>;
static free_list_type free_list;
public:
T* allocate(size_t);
void deallocate(T* p, size_t) noexcept;
using value_type = T;
using pointer = T*;
using reference = T&;
template <typename U> struct rebind { using other = CR_Vector_Allocator; };
};
CR_Vector_Allocator.cc
#include "CR_Vector_Allocator.hpp"
#include <cstdlib>
#include <stdexcept>
#include <string>

CR_Vector_Allocator::T* CR_Vector_Allocator::allocate(size_t n) {
if (n <= 1)
throw std::runtime_error("Unexpected number of elements to initialize: " +
std::to_string(n));
T* addr_;
if (free_list.pop(addr_)) return addr_;
addr_ = reinterpret_cast<T*>(std::malloc(n * sizeof(T)));
return addr_;
}
void CR_Vector_Allocator::deallocate(T* p, size_t n) noexcept {
if (n <= 1) // should never happen. but just in case, I don't want to leak
free(p);
else
free_list.push(p);
}
CR_Vector_Allocator::free_list_type CR_Vector_Allocator::free_list;
It is used in the following manner:
using CR_Vector_t = std::vector<uint16_t, CR_Vector_Allocator>;
CR_Vector_t Generate_CR_Vector(){
/* total_parameters is a member of the same class
as this member function and is defined elsewhere */
CR_Vector_t cr_vec (total_parameters);
std::uniform_int_distribution<uint16_t> dist_;
/* urng_ is a member variable of type std::mt19937_64 in the class */
std::generate(cr_vec.begin(), cr_vec.end(), [this, &dist_](){
return dist_(this->urng_);});
return cr_vec;
}
void Prepare_Next_Generation(...){
/*
...
*/
using hpx::parallel::execution::par_unseq;
hpx::parallel::for_loop_n(par_unseq, 0l, pop_size, [this](int64_t idx){
auto crossovers = Generate_CR_Vector();
auto new_parameters = Generate_New_Parameters(/* ... */, std::move(crossovers));
});
}
Any feedback, guidance or rebukes would be greatly appreciated.
Thank you!!
An external library gives me a raw pointer of doubles that I want to map to an Eigen type. The raw array is logically a big ordered collection of small dense fixed-size matrices, all of the same size. The main issue is that the small dense matrices may be in row-major or column-major ordering and I want to accommodate them both.
My current approach is as follows. Note that all the entries of a small fixed-size block (in the array of blocks) need to be contiguous in memory.
template<int bs, class Mattype>
void block_operation(double *const vals, const int numblocks)
{
Eigen::Map<Mattype> mappedvals(vals,
Mattype::IsRowMajor ? numblocks*bs : bs,
Mattype::IsRowMajor ? bs : numblocks*bs
);
for(int i = 0; i < numblocks; i++)
if(Mattype::IsRowMajor)
mappedvals.template block<bs,bs>(i*bs,0) = block_operation_rowmajor(mappedvals);
else
mappedvals.template block<bs,bs>(0,i*bs) = block_operation_colmajor(mappedvals);
}
The calling function first figures out the Mattype (out of 2 options) and then calls the above function with the correct template parameter.
Thus all my algorithms need to be written twice and my code is interspersed with these layout checks. Is there a way to do this in a layout-agnostic way? Keep in mind that this code needs to be as fast as possible.
Ideally, I would Map the data just once and use it for all the operations needed. However, the only solution I could come up with was invoking the Map constructor once for every small block, whenever I need to access the block.
template<int bs, StorageOptions layout>
inline Map<Matrix<double,bs,bs,layout>> extractBlock(double *const vals,
const int bindex)
{
return Map<Matrix<double,bs,bs,layout>>(vals+bindex*bs*bs);
}
Would this function be optimized away to nothing (by a modern compiler like GCC 7.3 or Intel 2017 under -std=c++14 -O3), or would I be paying a small penalty every time I invoke this function (once for each block, and there are a LOT of small blocks)? Is there a better way to do this?
Your extractBlock is fine; a simpler but somewhat uglier solution is to use a reinterpret_cast at the start of block_operation:
using BlockType = Matrix<double,bs,bs,layout|DontAlign>;
BlockType* blocks = reinterpret_cast<BlockType*>(vals);
for(int i...)
blocks[i] = ...;
This will work for fixed-size matrices only. Also note the DontAlign, which is important unless you can guarantee that vals is aligned on 16 or even 32 bytes, depending on the presence of AVX and on bs... so just use DontAlign!
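For reference, here is a self-contained sketch of that cast-based approach; the function name and the scaling operation are mine, purely for illustration:
#include <Eigen/Dense>

template<int bs, Eigen::StorageOptions layout>
void scale_blocks(double* const vals, const int numblocks, const double factor)
{
    // View the raw buffer as an array of fixed-size bs x bs matrices.
    using BlockType = Eigen::Matrix<double, bs, bs, layout | Eigen::DontAlign>;
    BlockType* blocks = reinterpret_cast<BlockType*>(vals);
    for (int i = 0; i < numblocks; ++i)
        blocks[i] *= factor; // each block behaves like an ordinary Eigen matrix
}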
I have to create a three-dimensional array using class A as the element type. Class A is defined as below. Should I use vector<vector<vector<A> > > or boost::multi_array? Which one is better?
struct C
{
int C_1;
short C_2;
};
class B
{
public:
bool B_1;
vector<C> C_;
};
class A
{
public:
bool A_1;
B B_[6];
};
If you know the size of all three dimensions at the time that you write your code, and if you don't need checking for array bounds, then just use traditional arrays:
const int N1 = ...
const int N2 = ...
const int N3 = ...
A a[N1][N2][N3];
If the array dimensions can only be determined at run time, but remain constant after program initialization, and if array usage is distributed uniformly, then boost::multi_array is your friend. However, if a lot of dynamic extension is going on at runtime, and/or if array sizes are not uniform (for example, you need A[0][0][0...99] but only A[2][3][0...3]), then the nested vector is likely the best solution. In the case of non-uniform sizes, put the dimension whose size varies the most as the last dimension. Also, in the nested vector solution, it is generally a good idea to put small dimensions first.
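For reference, a minimal sketch of the boost::multi_array route with run-time dimensions, using class A from the question (n1, n2, n3 stand for whatever sizes are determined at initialization):
#include <boost/multi_array.hpp>
#include <cstddef>

boost::multi_array<A, 3> make_grid(std::size_t n1, std::size_t n2, std::size_t n3)
{
    boost::multi_array<A, 3> grid(boost::extents[n1][n2][n3]);
    grid[0][0][0].A_1 = true; // element access reads like a plain 3D array (assuming non-zero sizes)
    return grid;
}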
The main concern that I would have about using vector<vector<vector<A> > > would be making sure that the second- and third-level vectors all have the same length like they would in a traditional 3D array, since there would be nothing in the data type to enforce that. I'm not terribly familiar with boost::multi_array, but it looks like this isn't an issue there - you can resize() the whole array, but unless I'm mistaken you can't accidentally remove an item from the third row and leave it a different size than all of the other rows (for example).
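For comparison, a sketch of constructing the nested vector so that it at least starts out rectangular (again with run-time sizes n1, n2, n3); nothing stops later code from resizing one inner vector, which is exactly the concern above:
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<std::vector<A> > > Grid3;

Grid3 make_nested_grid(std::size_t n1, std::size_t n2, std::size_t n3)
{
    // Every inner vector gets the same length up front, but the type does not enforce it afterwards.
    return Grid3(n1, std::vector<std::vector<A> >(n2, std::vector<A>(n3)));
}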
So assuming concerns like file size and compile time aren't much of an issue, I would think you'd want boost::multi_array. If those things are an issue, you might want to consider using a plain-old 3D array, since that should beat either of the other two options hands-down in those areas.
I was wondering whether (apart from the obvious syntax differences) there would be any efficiency difference between having a class containing multiple instances of an object (of the same type) or a fixed size array of objects of that type.
In code:
struct A {
double x;
double y;
double z;
};
struct B {
double xvec[3];
};
In reality I would be using boost::array, which is a better C++ alternative to C-style arrays.
I am mainly concerned with construction/destruction and reading/writing such doubles, because these classes will often be constructed just to invoke one of their member functions once.
Thank you for your help/suggestions.
Typically the representation of those two structs would be exactly the same. It is, however, possible to have poor performance if you pick the wrong one for your use case.
For example, if you need to access each element in a loop, with an array you could do:
for (int i = 0; i < 3; i++)
dosomething(xvec[i]);
However, without an array, you'd either need to duplicate code:
dosomething(x);
dosomething(y);
dosomething(z);
This means code duplication - which can go either way. On the one hand there's less loop code; on the other hand very tight loops can be quite fast on modern processors, and code duplication can blow away the I-cache.
The other option is a switch:
for (int i = 0; i < 3; i++) {
double *r;
switch(i) {
case 0: r = &x; break;
case 1: r = &y; break;
case 2: r = &z; break;
}
dosomething(*r); // assume this is some big inlined code
}
This avoids the possibly-large i-cache footprint, but has a huge negative performance impact. Don't do this.
On the other hand, it is, in principle, possible for array accesses to be slower, if your compiler isn't very smart:
xvec[0] = xvec[1] + 1;
dosomething(xvec[1]);
Since xvec[0] and xvec[1] are distinct, in principle, the compiler ought to be able to keep the value of xvec[1] in a register, so it doesn't have to reload the value at the next line. However, it's possible some compilers might not be smart enough to notice that xvec[0] and xvec[1] don't alias. In this case, using separate fields might be a very tiny bit faster.
In short, it's not about one or the other being fast in all cases. It's about matching the representation to how you use it.
Personally, I would suggest going with whatever makes the code working on xvec most natural. It's not worth spending a lot of human time worrying about something that, at best, will probably only produce such a small performance difference that you'll only catch it in micro-benchmarks.
MSVC++ 2010 generated exactly the same code for reading/writing from two POD structs like in your example. Since the offsets to read/write to are computable at compile time, this is not surprising. Same goes for construction and destruction.
As for the actual performance, the general rule applies: profile it if it matters, if it doesn't - why care?
Indexing into an array member is perhaps a bit more work for the user of your struct, but then again, he can more easily iterate over the elements.
In case you can't decide and want to keep your options open, you can use an anonymous union:
#include <iostream>

struct Foo
{
union
{
struct
{
double x;
double y;
double z;
} xyz;
double arr[3];
};
};
int main()
{
Foo a;
a.xyz.x = 42;
std::cout << a.arr[0] << std::endl;
}
Some compilers also support anonymous structs, in that case you can leave the xyz part out.
It depends. For instance, the example you gave is a classic one in favor of 'old-school' arrays: a math point/vector (or matrix)
has a fixed number of elements
the data itself is usually kept private in an object
since (if?) it has a class as an interface, you can properly initialize them in the constructor (otherwise, classic array initialization is something I don't really like, syntax-wise)
In such cases (going with the math vector/matrix examples), I always ended up using C-style arrays internally, as you can loop over them instead of writing copy/pasted code for each component.
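A minimal sketch of that internal-array pattern (the class name and operations are mine, just to show looping over components instead of copy/pasting per-component code):
class Vec3 {
public:
    Vec3() {
        for (int i = 0; i < 3; ++i) v[i] = 0.0; // one loop instead of x = 0; y = 0; z = 0;
    }
    double& operator[](int i) { return v[i]; }
    const double& operator[](int i) const { return v[i]; }
    Vec3& operator+=(const Vec3& other) {
        for (int i = 0; i < 3; ++i) v[i] += other.v[i];
        return *this;
    }
private:
    double v[3]; // C-style array kept private behind the class interface
};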
But this is a special case -- for me, in C++ nowadays arrays == STL vector, it's fast and I don't have to worry about nuthin' :)
The difference can be in storing the variables in memory. In the first example the compiler can add padding to align the data. But in your particular case it doesn't matter.
Raw arrays offer better cache locality than C++ arrays. As presented, however, the array example's only advantage over the multiple objects is the ability to iterate over the elements.
The real answer is of course, create a test case and measure.
I have an array of constant data like following:
enum Language {GERMAN=LANG_DE, ENGLISH=LANG_EN, ...};
struct LanguageName {
Language language;
const char *name;
};
const LanguageName languages[] = {
{GERMAN, "German"},
{ENGLISH, "English"},
.
.
.
};
I have a function which accesses the array and finds the entry based on the Language enum parameter. Should I write a loop to find the specific entry in the array, or are there better ways to do this?
I know I could add the LanguageName objects to a std::map, but wouldn't this be overkill for such a simple problem? I do not have an object to store the std::map in, so the map would be constructed for every call of the function.
What way would you recommend?
Is it better to encapsulate this compile time constant array in a class which handles the lookup?
If the enum values are contiguous starting from 0, use an array with the enum as index.
If not, this is what I usually do:
const char* find_language(Language lang)
{
typedef std::map<Language,const char*> lang_map_type;
typedef lang_map_type::value_type lang_map_entry_type;
static const lang_map_entry_type lang_map_entries[] = { /*...*/ };
static const lang_map_type lang_map( lang_map_entries
, lang_map_entries + sizeof(lang_map_entries)
/ sizeof(lang_map_entries[0]) );
lang_map_type::const_iterator it = lang_map.find(lang);
if( it == lang_map.end() ) return NULL;
return it->second;
}
If you consider a map for constants, always also consider using a vector.
Function-local statics are a nice way to get rid of a good part of the dependency problems of globals, but are dangerous in a multi-threaded environment. If you're worried about that, you might rather want to use globals:
typedef std::map<Language,const char*> lang_map_type;
typedef lang_map_type::value_type lang_map_entry_type;
const lang_map_entry_type lang_map_entries[] = { /*...*/ };
const lang_map_type lang_map( lang_map_entries
, lang_map_entries + sizeof(lang_map_entries)
/ sizeof(lang_map_entries[0]) );
const char* find_language(Language lang)
{
lang_map_type::const_iterator it = lang_map.find(lang);
if( it == lang_map.end() ) return NULL;
return it->second;
}
There are three basic approaches that I'd choose from. One is the switch statement, and it is a very good option under certain conditions. Remember - the compiler is probably going to compile that into an efficient table-lookup for you, though it will be looking up pointers to the case code blocks rather than data values.
Options two and three involve static arrays of the type you are using. Option two is a simple linear search - which you are (I think) already doing - very appropriate if the number of items is small.
Option three is a binary search. Static arrays can be used with standard library algorithms - just use the first and first+count pointers in the same way that you'd use begin and end iterators. You will need to ensure the data is sorted (using std::sort or std::stable_sort), and use std::lower_bound to do the binary search.
The complication in this case is that you'll need a comparison function object which acts like operator< with a stored or referenced value, but which only looks at the key field of your struct. The following is a rough template...
class cMyComparison
{
private:
const fieldtype& m_Value; // Note - only storing a reference
public:
cMyComparison (const fieldtype& p_Value) : m_Value (p_Value) {}
bool operator() (const structtype& p_Struct) const
{
return (p_Struct.field < m_Value);
// Warning : I have a habit of getting this comparison backwards,
// and I haven't double-checked this
}
};
This kind of thing should get simpler in the next C++ standard revision, when IIRC we'll get anonymous functions (lambdas) and closures.
If you can't put the sort in your app's initialisation, you might need an already-sorted boolean static variable to ensure you only sort once.
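To make the lower_bound variant concrete, here is a rough sketch that uses a free two-argument comparison on the key field instead of the stored-value functor above (names are illustrative, and the array must already be sorted by language):
#include <algorithm>

// Order entries by their key field only.
inline bool compare_by_language(const LanguageName& a, const LanguageName& b)
{
    return a.language < b.language;
}

const char* find_language_sorted(const LanguageName* first, const LanguageName* last, Language lang)
{
    LanguageName probe = { lang, 0 };
    const LanguageName* it = std::lower_bound(first, last, probe, compare_by_language);
    if (it != last && it->language == lang) return it->name;
    return 0; // not found
}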
Note - this is for information only - in your case, I think you should either stick with linear search or use a switch statement. The binary search is probably only a good idea when...
There are a lot of data items to search
Searches are done very frequently (many times per second)
The key enumerate values are sparse (lots of big gaps) - otherwise, switch is better.
If the coding effort were trivial, it wouldn't be a big deal, but C++ currently makes this a bit harder than it should be.
One minor note - it may be a good idea to define an enumerate for the size of your array, and to ensure that your static array declaration uses that enumerate. That way, your compiler should complain if you modify the table (add/remove items) and forget to update the size enum, so your searches should never miss items or go out of bounds.
I think you have two questions here:
What is the best way to store a constant global variable (with possible Multi-Threaded access) ?
How to store your data (which container use) ?
The solution described by sbi is elegant, but you should be aware of 2 potential problems:
In case of Multi-Threaded access, the initialization could be screwed up.
You will potentially attempt to access this variable after its destruction.
Both issues on the lifetime of static objects are being covered in another thread.
Let's begin with the constant global variable storage issue.
The solution proposed by sbi is therefore adequate if you are not concerned by 1. or 2.; in any other case I would recommend the use of a Singleton, such as the ones provided by Loki. Read the associated documentation to understand the various policies on lifetime; it is very valuable.
I think that the use of an array + a map seems wasteful and it hurts my eyes to read this. I personally prefer a slightly more elegant (imho) solution.
const char* find_language(Language lang)
{
typedef std::map<Language, const char*> map_type;
typedef map_type::value_type value_type;
// I'll let you work out how 'my_stl_builder' works,
// it makes for an interesting exercise and it's easy enough
// Note that even if this is slightly slower (?), it is only executed ONCE!
static const map_type lang_map = my_stl_builder<map_type>()
<< value_type(GERMAN, "German")
<< value_type(ENGLISH, "English")
<< value_type(DUTCH, "Dutch")
....
;
map_type::const_iterator it = lang_map.find(lang);
if( it == lang_map.end() ) return NULL;
return it->second;
}
And now on to the container type issue.
If you are concerned about performance, then you should be aware that for small data collection, a vector of pairs is normally more efficient in look ups than a map. Once again I would turn toward Loki (and its AssocVector), but really I don't think that you should worry about performance.
I tend to choose my container depending on the interface I am likely to need first and here the map interface is really what you want.
Also: why do you use 'const char*' rather than a 'std::string'?
I have seen too many people using a 'const char*' like a std::string (like in forgetting that you have to use strcmp) to be bothered by the alleged loss of memory / performance...
It depends on the purpose of the array. If you plan on showing the values in a list (for a user selection, perhaps) the array would be the most efficient way of storing them. If you plan on frequently looking up values by their enum key, you should look into a more efficient data structure like a map.
There is no need to write a loop. You can use the enum value as an index into the array.
I would make an enum with sequential language codes
enum { GERMAN=0, ENGLISH, SWAHILI, ENOUGH };
Then put them all into an array:
const char *langnames[] = {
"German", "English", "Swahili"
};
Then I would check that sizeof(langnames)==sizeof(*langnames)*ENOUGH in a debug build.
And pray that I have no duplicates or swapped languages ;-)
If you want a fast and simple solution, you can try something like this:
enum ELanguage {GERMAN=0, ENGLISH=1};
static const string Ger="GERMAN";
static const string Eng="ENGLISH";
bool getLanguage(const ELanguage& aIndex,string & arName)
{
switch(aIndex)
{
case GERMAN:
{
arName=Ger;
return true;
}
case ENGLISH:
{
arName=Eng;
return true;
}
default:
{
// Log Error
return false;
}
}
}