Zero overhead subscript operator for a set of values - c++

Assume we have a function with the following signature (the signature may not be changed, since this function is part of a legacy API):
void Foo(const std::string& s, float v0, float v1, float v2)
{ ... }
How can one access the last three arguments by index using the subscript operator [] without actually copying the data into some sort of container?
Regularly when I come across this kind of issue I put the values in a container, like const std::array<float,3> args{v0,v1,v2}; and access these values using args[0], which unfortunately needs to copy the values.
Another idea would be to access the arguments using a parameter pack, which in turn involves the creation of a templated function which seems to be overkill for this task.
I'm aware that the version using the std::array<> might be suitable since the compiler probably will optimize this kind of stuff, however, this question is kind of academically motivated.

You can't. Not in a way that guarantees zero overhead, or overhead similar to that of array subscripting.
You could, of course, do something like float* vs[]{&v0, &v1, &v2};, and then dereference the result of vs[i]. For that matter, you could make a utility class to act as a transparent reference (to try to get around arrays of references being illegal), though the result is inevitably limited.
The ultimate problem, though, is that nothing in the standard guarantees (or even suggests) that function arguments be stored in any particular memory ordering. On most platforms, at least one of those floats is going to be in a register, meaning that there's just no way to natively subscript it.
If a group of objects does not start out as an array, it's not possible to treat them as an array.

Another idea would be to access the arguments using a parameter pack, which in turn involves the creation of a templated function which seems to be overkill for this task.
Not necessarily. One thing you can do is use std::tie to build a std::tuple of references to the function parameters and then access that tuple via std::get. That should optimize out, but let you refer to the parameters as if they are part of a single collection. That would look like
void Foo(const std::string& s, float v0, float v1, float v2)
{
auto args = std::tie(v0, v1, v2);
std::cout << std::get<1>(args);
}
It's not using operator [], and requires your indices be compile time constants, but you can now pass them to something else as one object.

Danger Wil Robinson! Danger!
This is going to be horribly implementation dependent, and an all around bad idea! This relies on undefined behavior. Less awful with set hardware and tools, but less as in "we're only going to eat 5 babies, not a full dozen".
Those three floats are on the stack next to each other. I don't know if there are any packing rules for the stack. I don't know which order on the the stack they'll be ("v0 v1 v2" vs "v2 v1 v0"). Hell, some optimized build might even put them in a different order just to optimize some oddball case that doesn't actually come up in real life. I dunno. But I suspect something like this will work.
void Foo(const std::string& s, float v0, float v1, float v2)
{
float* vp = &v2;
for (int i = 0; i < 3; ++i)
{
printf("%f\n", vp[i]);
}
}
void main(void)
{
Foo("", 1.0f, 2.0f, 3.0f);
}
3.0000
2.0000
1.0000
So it is possible. It's also ugly, vile, evil, and probably both fattening and carcinogenic.
On GodBolt.org, using gcc x86-64 9.3, the above code worked fine. In VS2017 intel/64, I had to use float* vp = &v0 and for (int i = 0; i < 5; i += 2). Different alignment, different order, and different output (1, 2, 3, not 3, 2 1).
I'm pretty sure I just consigned my soul to the Nth circle of hell.

Related

I can't get the right output that I want and the answer changes every time

So I am trying to code for this question:
Yes, I have to use arrays since it is a requirement.
Consider the problem of adding two n-bit binary integers, stored in two n-element arrays A and B. The sum of the two integers should be stored in binary form in an (n+1) element array C . State the problem formally and write pseudocode for adding the two integers.
I know that the ans array contains the correct output at the end of the addd function. However, I am not able to output that answer.
Below is my code. Please help me figure where in the code I'm going wrong, and what I can do to change it so it works. I will be very grateful.
#include <iostream>
using namespace std;
int * addd(int a[], int n1, int b[], int n2)
{
int s;
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
int ans[s];
int i=n1-1, j=n2-1, k=s-1;
int carry=0;
while(i>=0 && j>=0 && k>0)
{
ans[k]=(a[i]+b[j]+carry)%2;
//cout<<k<<" "<<ans[k]<<endl;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
//cout<<"Carry "<<carry<<endl;
ans[0]=carry;
return ans;
}
int main(int argc, const char * argv[]) {
// insert code here...
int a[]={0,0,0,1,1,1};
int n1=sizeof(a)/sizeof(a[0]);
int b[]={1,0,1,1,0,1};
int n2=sizeof(b)/sizeof(b[0]);
int *p=addd(a,6,b,6);
// cout<<p[1]<<endl;
// cout<<p[0]<<" "<<p[1]<<" "<<p[2]<<" "<<p[3]<<" "<<p[4]<<" "<<p[5]<<" "<<p[6]<<endl;
return 0;
}
using namespace std;
Don't write using namespace std;. I have a summary I paste in from a file of common issues when I'm active in the Code Review Stack Exchange, but I don't have that here. Instead, you should just declare the symbols you need, like using std::cout;
int * addd(int a[], int n1, int b[], int n2)
The parameters of the form int a[] are very odd. This comes from C and is actually transformed into int* a and is not passing the array per-se.
The inputs should be const.
The names are not clear, but I'm guessing that n1 is the size of the array? In the Standard Guidelines, you'll see that passing a pointer plus length is strongly discouraged. The Standard Guidelines Library supplies a simple span type to use for this instead.
And the length should be size_t not int.
Based on the description, I think each element is only one bit, right? So why are the arrays of type int? I'd use bool or perhaps int8_t as being easier to work with.
What are you returning? If a and b and their lengths are the input, where is the output that you are returning a pointer to the beginning of? This is not giving value semantics, as you are returning a pointer to something that must exist elsewhere so what is its lifetime?
int s;
int ans[s];
return ans;
Well, there's your problem. First of all, declaring an array of a size that's not a constant is not even legal. (This is a gnu extension that implements C's VLA feature but not without issues as it breaks the C++ type system)
Regardless of that, you are returning a pointer to the first element of the local array, so what happens to the memory when the function returns? Boom.
int s;
No. Initialize values when they are created.
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
Learn the library.
How about:
const size_t s = 1+std::max(n1,n2);
and then the portable way to get your memory is:
std::vector<int> ans(s);
Your main logic will not work if one array is shorter than the other. The shorter input should behave as if it had leading zeros to match. Consider abstracting the problem of "getting the next bit" so you don't duplicate the code for handling each input and make an unreadable mess. You really should have learned to use collections and iterators first.
now:
return ans;
would work as intended since it is a value. You just need to declare the function to be the right type. So just use auto for the return type and it knows.
int n1=sizeof(a)/sizeof(a[0]);
Noooooooo.
There is a standard function to give the size of a built-in primitive array. But really, this should be done automatically as part of the passing, not as a separate thing, as noted earlier.
int *p=addd(a,6,b,6);
You wrote 6 instead of n1 etc.
Anyway, with the previous edits, it becomes:
using std::size;
const auto p = addd (a, size(a), b, size(b));
Finally, concerning:
cout<<p[0]<<" "<<p[1]<<" "<<p[2]<<" "<<p[3]<<" "<<p[4]<<" "<<p[5]<<" "<<p[6]<<endl;
How about using loops?
for (auto val : p) cout << val;
cout << '\n';
oh, don't use endl. It's not needed for cout which auto-flushes anyway, and it's slow. Modern best practice is to use '\n' and then flush explicitly if/when needed (like, never).
Let's look at:
int ans[s];
Apart that this is not even part of the standard and probably the compiler is giving you some warnings (see link), that command allocate temporary memory in the stack which gets deallocated on function exit: that's why you are getting every time different results, you are reading garbage, i.e. memory that in the meantime might have been overwritten.
You can replace it for example with
int* ans = new int[s];
Don't forget though to deallocate the memory when you have finished using the buffer (outside the function), to avoid memory leakage.
Some other notes:
int s;
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
This can be more elegantly written as:
const int s = (n1 < n2) ? n2 + 1 : n1 + 1;
Also, the actual computation code is imprecise as it leads to wrong results if n1 is not equal to n2: You need further code to finish processing the remaining bits of the longest array. By the way you don't need to check on k > 0 because of the way you have defined s.
The following should work:
int i=n1-1, j=n2-1, k=s-1;
int carry=0;
while(i>=0 && j>=0)
{
ans[k]=(a[i]+b[j]+carry)%2;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
while(i>=0) {
ans[k]=(a[i]+carry)%2;
carry=(a[i]+carry)/2;
i--; k--;
}
while(j>=0) {
ans[k]=(b[j]+carry)%2;
carry=(b[j]+carry)/2;
j--; k--;
}
ans[0]=carry;
return ans;
}
If You Must Only Use C Arrays
Returning ans is returning the pointer to a local variable. The object the pointer refers to is no longer valid after then function has returned, so trying to read it would lead to undefined behavior.
One way to fix this is to pass in the address to an array to hold your answer, and populate that, instead of using a VLA (which is a non-standard C++ extension).
A VLA (variable length array) is an array which takes its size from a run-time computed value. In your case:
int s;
//... code that initializes s
int ans[s];
ans is a VLA because you are not using a constant to determine the array size. However, that is not a standard feature of the C++ language (it is an optional one in the C language).
You can modify your function so that ans is actually provided by the caller.
int * addd(int a[], int n1, int b[], int n2, int ans[])
{
//...
And then the caller would be responsible for passing in a large enough array to hold the answer.
Your function also appears to be incomplete.
while(i>=0 && j>=0 && k>0)
{
ans[k]=(a[i]+b[j]+carry)%2;
//cout<<k<<" "<<ans[k]<<endl;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
If one array is shorter than the other, then the index for the shorter array will reach 0 first. Then, when that corresponding index goes negative, the loop will stop, without handling the remaining terms in the longer array. This essentially makes the corresponding entries in ans be uninitialized. Reading those values results in undefined behavior.
To address this, you should populate the remaining entries in ans with the correct calculation based on carry and the remaining entries in the longer array.
A More C++ Approach
The original answer above was provided assuming you were constrained to only using C style arrays for both input and output, and that you wanted an answer that would allow you to stay close to your original implementation.
Below is a more C++ oriented solution, assuming you still need to provide C arrays as input, but otherwise no other constraint.
C Array Wrapper
A C array does not provide the amenities that you may be accustomed to have when using C++ containers. To gain some of these nice to have features, you can write an adapter that allows a C array to behave like a C++ container.
template <typename T, std::size_t N>
struct c_array_ref {
typedef T ARR_TYPE[N];
ARR_TYPE &arr_;
typedef T * iterator;
typedef std::reverse_iterator<T *> reverse_iterator;
c_array_ref (T (&arr)[N]) : arr_(arr) {}
std::size_t size () { return N; }
T & operator [] (int i) { return arr_[i]; }
operator ARR_TYPE & () { return arr_; }
iterator begin () { return &arr_[0]; }
iterator end () { return begin() + N; }
reverse_iterator rbegin () { return reverse_iterator(end()); }
reverse_iterator rend () { return reverse_iterator(begin()); }
};
Use C Array References
Instead of passing in two arguments as information about the array, you can pass in the array by reference, and use template argument deduction to deduce the array size.
Return a std::array
Although you cannot return a local C array like you attempted in your question, you can return an array that is wrapped inside a struct or class. That is precisely what the convenience container std::array provides. When you use C array references and template argument deduction to obtain the array size, you can now compute at compile time the proper array size that std::array should have for the return value.
template <std::size_t N1, std::size_t N2>
std::array<int, ((N1 < N2) ? N2 : N1) + 1>
addd(int (&a)[N1], int (&b)[N2])
{
Normalize the Input
It is much easier to solve the problem if you assume the arguments have been arranged in a particular order. If you always want the second argument to be the larger array, you can do that with a simple recursive call. This is perfectly safe, since we know the recursion will happen at most once.
if (N2 < N1) return addd(b, a);
Use C++ Containers (or Look-Alike Adapters)
We can now convert our arguments to the adapter shown earlier, and also create a std::array to hold the output.
c_array_ref<int, N1> aa(a);
c_array_ref<int, N2> bb(b);
std::array<int, std::max(N1, N2)+1> ans;
Leverage Existing Algorithms if Possible
In order to deal with the short comings of your original program, you can adjust your implementation a bit in an attempt to remove special cases. One way to do that is to store the result of adding the longer array to 0 and storing it into the output. However, this can mostly be accomplished with a simple call to std::copy.
ans[0] = 0;
std::copy(bb.begin(), bb.end(), ans.begin() + 1);
Since we know the input consists of only 1s and 0s, we can compute straight addition from the shorter array into the longer array, without concern for carry (that will be addressed in the next step). To compute this addition, we apply std::transform with a lambda expression.
std::transform(aa.rbegin(), aa.rend(), ans.rbegin(),
ans.rbegin(),
[](int a, int b) -> int { return a + b; });
Lastly, we can make a pass over the output array to fix up the carry computation. After doing so, we are ready to return the result. The return is possible because we are using std::array to represent the answer.
for (auto i = ans.rbegin(); i != ans.rend()-1; ++i) {
*(i+1) += *i / 2;
*i %= 2;
}
return ans;
}
A Simpler main Function
We now only need to pass in the two arrays to the addd function, since template type deduction will discover the sizes of the arrays. In addition, the output generator can be handled more easily with an ostream_iterator.
int main(int, const char * []) {
int a[]={1,0,0,0,1,1,1};
int b[]={1,0,1,1,0,1};
auto p=addd(a,b);
std::copy(p.begin(), p.end(),
std::ostream_iterator<int>(std::cout, " "));
return 0;
}
Try it online!
If I may editorialize a bit... I think this is a deceptively difficult question for beginners, and as-stated should flag problems in the design review long before any attempt at coding. It's telling you to do things that are not good/typical/idiomatic/proper in C++, and distracting you with issues that get in the way of the actual logic to be developed.
Consider the core algorithm you wrote (and Antonio corrected): that can be understood and discussed without worrying about just how A and B are actually passed in for this code to use, or exactly what kind of collection it is. If they were std::vector, std::array, or primitive C array, the usage would be identical. Likewise, how does one return the result out of the code? You populate ans here, and how it is gotten into and/or out of the code and back to main is not relevant.
Primitive C arrays are not first-class objects in C++ and there are special rules (inherited from C) on how they are passed as arguments.
Returning is even worse, and returning dynamic-sized things was a major headache in C and memory management like this is a major source of bugs and security flaws. What we want is value semantics.
Second, using arrays and subscripts is not idiomatic in C++. You use iterators and abstract over the exact nature of the collection. If you were interested in writing super-efficent back-end code that doesn't itself deal with memory management (it's called by other code that deals with the actual collections involved) it would look like std::merge which is a venerable function that dates back to the early 90's.
template< class InputIt1, class InputIt2, class OutputIt >
OutputIt merge( InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2,
OutputIt d_first );
You can find others with similar signatures, that take two different ranges for input and outputs to a third area. If you write addp exactly like this, you could call it with primitive C arrays of hardcoded size:
int8_t A[] {0,0,0,1,1,1};
int8_t B[] {1,0,1,1,0,1};
int8_t C[ ??? ];
using std::begin; std::end;
addp (begin(A),end(A), begin(B), end(B), begin(C));
Note that it's up to the caller to have prepared an output area large enough, and there's no error checking.
However, the same code can be used with vectors, or even any combination of different container types. This could populate a std::vector as the result by passing an insertion iterator. But in this particular algorithm that's difficult since you're computing it in reverse order.
std::array
Improving upon the situation with primitive C arrays, you could use the std::array class which is exactly the same array but without the strange passing/returning rules. It's actually just a primitive C array inside a wrapping struct. See this documentation: https://en.cppreference.com/w/cpp/container/array
So you could write it as:
using BBBNum1 = std::array<int8_t, 6>
BBBNum1 addp (const BBBNum1& A, const BBBNum1& B) { ... }
The code inside can use A[i] etc. in the same way you are, but it also can get the size via A.size(). The issue here is that the inputs are the same length, and the output is the same as well (not 1 larger). Using templates, it could be written to make the lengths flexible but still only specified at compile time.
std::vector
The vector is like an array but with a run-time length. It's dynamic, and the go-to collection you should reach for in C++.
using BBBNum2 = std::vector<int8_t>
BBBNum2 addp (const BBBNum2& A, const BBBNum2& B) { ... }
Again, the code inside this function can refer to B[j] etc. and use B.size() exactly the same as with the array collection. But now, the size is a run-time property, and can be different for each one.
You would create your result, as in my first post, by giving the size as a constructor argument, and then you can return the vector by-value. Note that the compiler will do this efficiently and not actually have to copy anything if you write:
auto C = addp (A, B);
now for the real work
OK, now that this distraction is at least out of the way, you can worry about actually writing the implementation. I hope you are convinced that using vector instead of a C primitive array does not affect your problem logic or even the (available) syntax of using subscripts. Especially since the problem referred to psudocode, I interpret its use of "array" as "suitable indexable collection" and not specifically the primitive C array type.
The issue of going through 2 sequences together and dealing with differing lengths is actually a general purpose idea. In C++20, the Range library has things that make quick work of this. Older 3rd party libraries exist as well, and you might find it called zip or something like that.
But, let's look at writing it from scratch.
You want to read an item at a time from two inputs, but neatly make it look like they're the same length. You don't want to write the same code three times, or elaborate on the cases where A is shorter or where B may be shorter... just abstract out the idea that they are read together, and if one runs out it provides zeros.
This is its own piece of code that can be applied twice, to A and to B.
class backwards_bit_reader {
const BBBnum2& x;
size_t index;
public:
backwards_bit_reader(const BBBnum2& x) : x{x}, index{x.size()} {}
bool done() const { return index == 0; }
int8_t next()
{
if (done()) return 0; // keep reading infinite leading zeros
--index;
return x[index];
}
};
Now you can write something like:
backwards_bit_reader A_in { A };
backwards_bit_reader B_in { B };
while (!A_in.done() && !B_in.done()) {
const a = A_in.next();
const b = B_in.next();
const c = a+b+carry;
carry = c/2; // update
C[--k]= c%2;
}
C[0]= carry; // the final bit, one longer than the input
It can be written far more compactly, but this is clear.
another approach
The problem is, is writing backwards_bit_reader beyond what you've learned thus far? How else might you apply the same logic to both A and B without duplicating the statements?
You should be learning to recognize what's sometimes called "code smell". Repeating the same block of code multiple times, and repeating the same steps with nothing changed but which variable it's applying to, should be seen as ugly and unacceptable.
You can at least cut back the cases by ensuring that B is always the longer one, if they are of different length. Do this by swapping A and B if that's not the case, as a preliminary step. (Actually implementing that well is another digression)
But the logic is still nearly duplicated, since you have to deal with the possibility of the carry propagating all the way to the end. Just now you have 2 copies instead of 3.
Extending the shorter one, at least in façade, is the only way to write one loop.
how realistic is this problem?
It's simplified to the point of being silly, but if it's not done in base 2 but with larger values, this is actually implementing multi-precision arithmetic, which is a real thing people want to do. That's why I named the type above BBBNum for "Bad Binary Bignum".
Getting down to an actual range of memory and wanting the code to be fast and optimized is also something you want to do sometimes. The BigNum is one example; you often see this with string processing. But we'll want to make an efficient back-end that operates on memory without knowing how it was allocated, and higher-level wrappers that call it.
For example:
void addp (const int8_t* a_begin, const int8_t* a_end,
const int8_t* b_begin, const int8_t* b_end,
int8_t* result_begin, int8_t* result_end);
will use the provided range for output, not knowing or caring how it was allocated, and taking input that's any contiguous range without caring what type of container is used to manage it as long as it's contiguous. Note that as you saw with the std::merge example, it's more idiomatic to pass begin and end rather than begin and size.
But then you have helper functions like:
BBBNum2 addp (const BBBNum2& A, const BBBNum2& B)
{
BBBNum result (1+std::max(A.size(),B.size());
addp (A.data(), A.data()+A.size(), B.data(), B.data()+B.size(), C.data(), C.data()+C.size());
}
Now the casual user can call it using vectors and a dynamically-created result, but it's still available to call for arrays, pre-allocated result buffers, etc.

std::vector<float> to std::vector<glm::vecX> without copying

I am writing a small toy game engine using Tinyobjloader for loading .obj files. I store the vertex data and everything using glm::vecX to make things easier.
Tinyobjloader gives me an std::vector<float>, when I want an std::vector<glm::vecX>. How would I do this without copying?
To be clear, a glm::vecX is a simple struct containing, for example, the float members x, y, z.
I was thinking that since structs can behave a bit like arrays, that std::move would work, but no luck.
Thanks!
Edit:
I know I wasn't clear about this, sorry. I would like to either move the std::vector<float> into an std::vector<glm::vecX> or pass it as a std::vector<glm::vecX>&.
Copying the data using std::memcpy works fine, but it copies the data, which I would like to avoid.
It may be possible to directly interpret the contents of the vector as instances of the struct, without having to copy the data. If you can guarantee the representation is compatible, that is. The contents of a vector<float> are laid out in memory as a sequence of float values directly following each other (an array) with no extra padding, while the contents of a vector<glm::vecX> are laid out as a sequence of vecX. Thus, you need to ensure the following conditions hold:
That glm::vecX is exactly the size of X floats, with no padding. Depending on the declaration of the struct, this may be platform-dependant.
That the contents of the vector<float> are in the correct sequence, i.e. as [x1,y1,z1, x2,y2,z2, ...] for a vec3 instead of [x1,x2,...,xN,y1,y2...].
In that case, you can safely reinterpret the data pointer of the float vector as pointer to an array of vecX as in this example:
std::vector<float> myObjData = ...;
auto nVecs = myObjData.size() / 3; // You should check that there are no remainders!
glm::vec3* vecs = reinterpret_cast<glm::vec3*>(myObjData.data());
std::cout << vecs[0]; // Use vecs[0..nVecs-1]
You cannot, however, safely reinterpret the vector itself as a vector of glm::vecX, not even as a const vector, because the number of elements stored in the vector might not be consistent after the reinterpretation. It depends on whether the vector<T> code stores the number of elements directly, or the number of allocated bytes (and then size() divides that by sizeof(T)):
// Don't do this, the result of .size() and .end() may be wrong!
const std::vector<glm::vec3>& bad = *reinterpret_cast<std::vector<glm::vec3>*>(&myObjData);
bad[bad.size()-1].z = 0; // Potential BOOM!
Most of the time, however, you don't need to pass an actual vector, since most functions in the standard library accept a container range, which is easy to give for arrays like the one in the first example. So, if you wanted to sort your vec3 array based on z position, and then print it out you would do:
// nVecs and vecs from the first example
std::sort(vecs, vecs+nVecs, // Sort by Z position
[](const glm::vec3& a, const glm::vec3& b) { return a.z < b.z; });
std::copy(vecs, vecs+nVecs, std::ostream_iterator<glm::vec3>(std::cout, "\n"));
In short: It is - to the best of my knowledge - not possible without copying.
And in my opinion, std::memcpy has no business being used with std::vector.

How to calculate the standard deviation with iterators and lambda functions

After learning that one can calculate the mean of data, which is stored in a std::vector< std::vector<double> > data, can be done the following way:
void calculate_mean(std::vector<std::vector<double>>::iterator dataBegin,
std::vector<std::vector<double>>::iterator dataEnd,
std::vector<double>& rowmeans) {
auto Mean = [](std::vector<double> const& vec) {
return std::accumulate(vec.begin(), vec.end(), 0.0) / vec.size(); };
std::transform(dataBegin, dataEnd, rowmeans.begin(), Mean);
}
I made a function which takes the begin and the end of the iterator of the data vector to calculate the mean and std::vector<double> is where I store the result.
My first question is, how to handle the return value of function, when working with vectors. I mean in this case I make an Alias and modify in this way the vector I initialized before calling this function, so there is no copying back which is nice. So is this good programming practice?
Second my main questions is, how to adapt this function so one can calculate the standard deviation of each row in a similar way. I tried really hard but it only gives a huge mess, where nothing is working properly. So if someone sees it right away how to do that, I would be glad, for a insight. Thank you.
Edit: Solution
So here is my solution for the problem. Given a std::vector< vector<double> > data (rows, std::vector<double>(columns)), where the data is stored in the rows. The following function calculates the sample standard deviation of each row simultaneously.
auto begin = data.begin();
auto end = data.end();
std::vector<double> std;
std.resize(data.size());
void calculate_std(std::vector<std::vector<double>>::iterator dataBegin,
std::vector<std::vector<double>>::iterator dataEnd,
std::vector<double>& rowstds){
auto test = [](std::vector<double> const& vec) {
double sum = std::accumulate(vec.begin(), vec.end(), 0.0);
double mean = sum / vec.size();
double stdSum = 0.0;
auto Std = [&](const double x) { stdSum += (x - mean) * (x - mean); };
std::for_each(vec.begin(), vec.end(), Std);
return sqrt(stdSum / (vec.size() - 1));
};
std::transform(dataBegin, dataEnd, rowstds.begin(), test);
}
I tested it and it works just fine. So if anyone has some suggestions for improvement, please let me know. And is this piece of code good performance wise?
You will find relatively often the convention to write functions with input parameters first, followed by input / output parameters.
Output parameters (that you write to with the return values of your function) are often a pointer to the data, or a reference.
So your solution seems perfect, from that point of view.
Source:
Google's C++ coding conventions
I mean in this case I make an Alias and modify in this way the vector I initialized before calling this function, so there is no copying back which is nice. So is this good programming practice?
No, you should use a local vector<double> variable and return by value. Any compiler worth using would optimize away the copying/moving, and any conforming C++11 compiler is required to perform a move if for whatever reason it cannot elide the copy/move altogether.
Your code as written imposes additional requirements on the caller that are not obvious. For instance, rowmeans must contain enough elements to store the means, or undefined behavior results.

What is the better Matrix4x4 class design c++ newbie

What would be better to use as a way to store matrix values?
float m1,m2,m3 ... ,m16
or
float[4][4].
I first tried float[16] but when im debugging and testing VS wont show what is inside of the array :( could implement a cout and try to read answer from a console test application.
Then i tried using float m1,m2,m3,etc under testing and debugging the values could be read in VS so it seemed easier to work with.
My question is because im fairly new with c++ what is the better design?
I find the float m1,m2 ... ,m16 easier to work with when debugging.
Would also love if someone could say from experience or has benchmark data what has better performance my gut says it shouldn't really matter because the matrix data should be laid out the same in memory right?
Edit:
Some more info its a column major matrix.
As far as i know i only need a 4x4 Matrix for the view transformation pipeline.
So nothing bigger and so i have some constant values.
Busy writing a simple software renderer as a way to learn more c++ and get some more experiences and learn/improve my Linear algebra skills. Will probably only go to per fragment shading and some simple lighting models and so far that i have seen 4x4 matrix is the biggest i will need for rendering.
Edit2:
Found out why i couldn't read the array data it was a float pointer i used and debugging menu only showed the pointer value i did discover a way to see the array value in watch where you have to do pointer, n where n = the element you want to see.
Everybody that answered thanks i will use the Vector4 m[4] answer for now.
You should consider a Vector4 with float [4] members, and a Matrix4 with Vector4 [4] members. With operator [], you have two useful classes, and maintain the ability to access elements with: [i][j] - in most cases, the element data will be contiguous, provided you don't use virtual methods.
You can also benefit from vector (SIMD) instructions this way, e.g., in Vector4
union alignas(16) { __m128 _v; float _s[4]; }; // members
inline float & operator [] (int i) { return _s[i]; }
inline const float & operator [] (int i) const { return _s[i]; }
and in Matrix4
Vector4 _m[4]; // members
inline Vector4 & operator [] (int i) { return _m[i]; }
inline const Vector4 & operator [] (int i) const { return _m[i]; }
The float m1, m2 .. m16; becomes very awkward to deal with when it comes to using loops to iterate through things. Using arrays of some sort is much easier. And, most likely, the compiler will generate AT LEAST as efficient code when you use loops as if you "hand-code", unless you actually write inline assembler or use SSE intrinsics.
The 16 float solution is fine as long as the code doesn't evolve (it is a hassle to maintain and it is not really readable)
The float[4][4] is a way better design (in terms of size parametrization) but you have to understand the notion of pointers.
I would use an array of 16 floats like float m[16]; with the sole reason being that it is very easy to pass it to a library like openGL, using the Matrix4fv suffix functions.
A 2D array like float m[4][4]; should also be configured in memory identically to float m[16] (see May I treat a 2D array as a contiguous 1D array?) and using that would be more convenient as far as having [row][col] (or [col][row] I am not sure which is correct in terms of openGL) indexing (compare m[1][1] vs m[5]).
Using separate variables for matrix elements may prove to be problematic. What are you planning to do when dealing with big matrices like 100x100?
Ofcourse you need to use some array-like structure and I strongly recommend you at least to use arrays

Swap 2D Double Arrays in c++

I have the following method to swap two double arrays (double**) in c++. Profiling the code, the method is accounting for 7% of the runtime... I was thinking that this should be a low cost operation, any suggestions? I am new with c++, but i was hoping to just swap the references to the arrays.
62 void Solver::Swap(double** &v1, double** &v2)
63 {
64 double** vswap = NULL;
65 vswap = v2;
66 v2 = v1;
67 v1 = vswap;
68 }
1) Make sure your function is inlined.
2) You can inplace swap, using a XOR for instance
3) Try to force the compiler to pass arguments using register instead of the stack (even though there's lot of register stress on x86, it's worth trying) - you can use the standard register keyword or play with fastcall on MS' compiler.
typedef double** TwoDimArray;
class Solver
{
inline void Swap( register TwoDimArray& a, register TwoDimArray& b )
{
a ^= b ^= a ^= b;
}
};
4) Don't bother giving default values for temporaries like vswap.
The code looks fine. Its is just a pointer assignment. It depends on how many times the method was got called.
I guess your profiler is getting confused here a bit, as this method really only swaps two pointers, which is very cheap. Unless this method is called a lot, it shouldn't show up in a profile. Does your profiler tell you how often this method gets called?
One issue you have to be aware of with swapping is that one array might be in the cache and the other not (especially if they are large), so constantly swapping pointers might trash the cache, but this would show up as a general slow-down.
Are you sure you profiled fully optimized code?
You should inlinethis function.
Other than that the only thing I see is that you first assign NULL to vswap and immediately afterwards some other value - but this should be taken care of by the optimizer.
inline void Solver::Swap(double** &v1, double** &v2)
{
double** vswap = v2;
v2 = v1;
v1 = vswap;
}
However, why don't you use std::swap()?
Don't assume that the 7% means this operation is slow - it depends on what else is going on.
You can have an operation that only takes 1 nanosecond, and make it take nearly 100% of the time, by doing nothing else.