Original Code used Int Arrays. Need to change to Vector Arrays - c++

TL;DR: Help me pass vector array input into a function that originally took in int pointers and ints.
I am in a college class that is learning about well-known algorithms. In this class, we use C++ to code the given algorithms. There was no pre-req class to learn C++, so my knowledge is pretty low when it comes to some of the major stuff with programming.
My problem: I have to create a program that takes an input file, sorts it with the user's choice of sorting algorithm, and writes the results to an output file. My original code, which works perfectly, uses an input file of 20 items placed into an array of length 20, and sorts without problems with each individual sorting algorithm.
Since last night, the only thing I have changed is that my input goes to a vector array, since the teacher will give us files of varying length (10 items to 1,000,000 items). I have four sorting algorithms that need to sort these given input files. Only one of them works, and it does not pass any variables into the function.
The other 3 originally passed in array pointers and other variables; however, they do not work now that my input goes to a vector array instead of an int array. I know that what I am passing in needs to be changed, but I have no idea how to do this correctly. I have tried many different ways from sources found on the internet (with pointers and references), but I have had no luck. Here are some snippets of the code I'm using.
vector<int> A;
void insertionSort() // This works no problem as is.
void split(int* A, int* B, int low, int high) //code for Merge-Sort
{
//recursively calls split() and splitMerge()
}
void splitMerge(int* A, int* B, int low, int mid, int high) // more code for Merge-Sort
{
// more code for Merge-Sort
}
//quickSort() and countSort() also pass ints and do not work either.
//small part of main()
for (i = 0; unsorted >> temp; i++)
{
A.push_back(temp);
cout << A[i] << "\n";
length++; //I use int length extensively in for-loops in my functions
}
Last thing. I do not get an error when trying to run the Merge-Sort. It just freezes my console window.

Conversion from a vector to an array pointer is done this way:
vector<int> vInts;
...fill the vector
int * intArr = &vInts[0]; // the vector must not be empty; vInts.data() also works since C++11
so you don't need to modify your code too much.
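A minimal sketch of this idea, assuming the sorting functions keep their original pointer-based signatures. The names `insertionSortRaw` and `sortVector` are hypothetical stand-ins for the routines in the question; `std::vector::data()` returns a pointer to the vector's contiguous storage.

```cpp
#include <vector>

// Hypothetical stand-in for a sorting routine that still takes a raw pointer and a length.
void insertionSortRaw(int* a, int length) {
    for (int i = 1; i < length; ++i) {
        int key = a[i];
        int j = i - 1;
        // Shift larger elements one slot to the right.
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = key;
    }
}

// Sorts a vector by handing its underlying buffer to the pointer-based routine.
void sortVector(std::vector<int>& v) {
    if (v.empty()) return;  // nothing to sort
    insertionSortRaw(v.data(), static_cast<int>(v.size()));
}
```

With this wrapper, `sortVector(A);` can be called on the global `vector<int> A` from the question without touching the body of the sorting routine.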

I believe there is not enough code to make an accurate prediction about where the error may be, but I think the problem is that the sorting algorithms are doing something like A++ with your pointers to access the next member.
A vector object is not itself the array: it holds a pointer to separately allocated storage. If the vector object (rather than its elements) is treated as an int*, your algorithms are cycling through stuff they shouldn't.
If this is the case, the solution to your problem is to use an iterator instead of a pointer.
void split(vector<int>::iterator A, vector<int>::iterator B, int low, int high)
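As an alternative to iterators, the vectors can also be passed by reference, which keeps the index-based style of the question. The following is only a sketch of a top-down merge sort under that assumption; the signatures of `split` and `splitMerge` are adapted from the question, and `mergeSort` is a hypothetical wrapper.

```cpp
#include <cstddef>
#include <vector>

// Merges the sorted halves a[low, mid) and a[mid, high) using b as scratch space.
void splitMerge(std::vector<int>& a, std::vector<int>& b,
                std::size_t low, std::size_t mid, std::size_t high) {
    std::size_t i = low, j = mid;
    for (std::size_t k = low; k < high; ++k) {
        if (i < mid && (j >= high || a[i] <= a[j])) {
            b[k] = a[i++];
        } else {
            b[k] = a[j++];
        }
    }
    // Copy the merged run back into a.
    for (std::size_t k = low; k < high; ++k) a[k] = b[k];
}

// Recursively sorts the half-open range a[low, high).
void split(std::vector<int>& a, std::vector<int>& b,
           std::size_t low, std::size_t high) {
    if (high - low < 2) return;  // 0 or 1 element: already sorted
    const std::size_t mid = low + (high - low) / 2;
    split(a, b, low, mid);
    split(a, b, mid, high);
    splitMerge(a, b, low, mid, high);
}

// Hypothetical convenience wrapper that allocates the scratch vector once.
void mergeSort(std::vector<int>& a) {
    std::vector<int> scratch(a.size());
    split(a, scratch, 0, a.size());
}
```

The base case `high - low < 2` is what stops the recursion; a missing or wrong base case is a common reason for a merge sort that hangs instead of reporting an error.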

Related

I can't get the right output that I want and the answer changes every time

So I am trying to code for this question:
Yes, I have to use arrays since it is a requirement.
Consider the problem of adding two n-bit binary integers, stored in two n-element arrays A and B. The sum of the two integers should be stored in binary form in an (n+1) element array C . State the problem formally and write pseudocode for adding the two integers.
I know that the ans array contains the correct output at the end of the addd function. However, I am not able to output that answer.
Below is my code. Please help me figure where in the code I'm going wrong, and what I can do to change it so it works. I will be very grateful.
#include <iostream>
using namespace std;
int * addd(int a[], int n1, int b[], int n2)
{
int s;
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
int ans[s];
int i=n1-1, j=n2-1, k=s-1;
int carry=0;
while(i>=0 && j>=0 && k>0)
{
ans[k]=(a[i]+b[j]+carry)%2;
//cout<<k<<" "<<ans[k]<<endl;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
//cout<<"Carry "<<carry<<endl;
ans[0]=carry;
return ans;
}
int main(int argc, const char * argv[]) {
// insert code here...
int a[]={0,0,0,1,1,1};
int n1=sizeof(a)/sizeof(a[0]);
int b[]={1,0,1,1,0,1};
int n2=sizeof(b)/sizeof(b[0]);
int *p=addd(a,6,b,6);
// cout<<p[1]<<endl;
// cout<<p[0]<<" "<<p[1]<<" "<<p[2]<<" "<<p[3]<<" "<<p[4]<<" "<<p[5]<<" "<<p[6]<<endl;
return 0;
}
using namespace std;
Don't write using namespace std;. I have a summary I paste in from a file of common issues when I'm active in the Code Review Stack Exchange, but I don't have that here. Instead, you should just declare the symbols you need, like using std::cout;
int * addd(int a[], int n1, int b[], int n2)
The parameters of the form int a[] are very odd. This comes from C and is actually transformed into int* a and is not passing the array per-se.
The inputs should be const.
The names are not clear, but I'm guessing that n1 is the size of the array? In the C++ Core Guidelines, you'll see that passing a pointer plus length is strongly discouraged. The Guidelines Support Library (GSL) supplies a simple span type to use for this instead.
And the length should be size_t not int.
Based on the description, I think each element is only one bit, right? So why are the arrays of type int? I'd use bool or perhaps int8_t as being easier to work with.
What are you returning? If a and b and their lengths are the input, where is the output that you are returning a pointer to the beginning of? This is not giving value semantics, as you are returning a pointer to something that must exist elsewhere so what is its lifetime?
int s;
int ans[s];
return ans;
Well, there's your problem. First of all, declaring an array of a size that's not a constant is not even legal. (This is a gnu extension that implements C's VLA feature but not without issues as it breaks the C++ type system)
Regardless of that, you are returning a pointer to the first element of the local array, so what happens to the memory when the function returns? Boom.
int s;
No. Initialize values when they are created.
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
Learn the library.
How about:
const size_t s = 1+std::max(n1,n2);
and then the portable way to get your memory is:
std::vector<int> ans(s);
Your main logic will not work if one array is shorter than the other. The shorter input should behave as if it had leading zeros to match. Consider abstracting the problem of "getting the next bit" so you don't duplicate the code for handling each input and make an unreadable mess. You really should have learned to use collections and iterators first.
now:
return ans;
would work as intended since it is a value. You just need to declare the function to be the right type. So just use auto for the return type and it knows.
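Putting these suggestions together, a sketch of the function could look like the following. It assumes the inputs are passed as `std::vector<int>` with the most significant bit first; the name `addBits` is illustrative and not from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Adds two binary numbers stored most-significant-bit first.
// The result has one more element than the longer input.
std::vector<int> addBits(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t s = 1 + std::max(a.size(), b.size());
    std::vector<int> ans(s, 0);
    std::size_t i = a.size(), j = b.size();
    int carry = 0;
    // Fill ans from the last position down to position 1.
    for (std::size_t k = s; k-- > 1; ) {
        const int bitA = (i > 0) ? a[--i] : 0;  // shorter input acts as if padded with zeros
        const int bitB = (j > 0) ? b[--j] : 0;
        const int sum = bitA + bitB + carry;
        ans[k] = sum % 2;
        carry = sum / 2;
    }
    ans[0] = carry;  // the extra leading bit
    return ans;      // returned by value, so no dangling pointer
}
```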
int n1=sizeof(a)/sizeof(a[0]);
Noooooooo.
There is a standard function to give the size of a built-in primitive array (std::size, available since C++17). But really, this should be done automatically as part of the passing, not as a separate thing, as noted earlier.
int *p=addd(a,6,b,6);
You wrote 6 instead of n1 etc.
Anyway, with the previous edits, it becomes:
using std::size;
const auto p = addd (a, size(a), b, size(b));
Finally, concerning:
cout<<p[0]<<" "<<p[1]<<" "<<p[2]<<" "<<p[3]<<" "<<p[4]<<" "<<p[5]<<" "<<p[6]<<endl;
How about using loops?
for (auto val : p) cout << val;
cout << '\n';
oh, don't use endl. It's not needed for cout which auto-flushes anyway, and it's slow. Modern best practice is to use '\n' and then flush explicitly if/when needed (like, never).
Let's look at:
int ans[s];
Apart from the fact that this is not part of the C++ standard (and the compiler is probably giving you some warnings), this statement allocates temporary memory on the stack, which gets deallocated on function exit. That's why you get different results every time: you are reading garbage, i.e. memory that might have been overwritten in the meantime.
You can replace it for example with
int* ans = new int[s];
Don't forget, though, to deallocate the memory with delete[] when you have finished using the buffer (outside the function), to avoid a memory leak.
Some other notes:
int s;
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
This can be more elegantly written as:
const int s = (n1 < n2) ? n2 + 1 : n1 + 1;
Also, the actual computation code is imprecise as it leads to wrong results if n1 is not equal to n2: You need further code to finish processing the remaining bits of the longest array. By the way you don't need to check on k > 0 because of the way you have defined s.
The following should work:
int i=n1-1, j=n2-1, k=s-1;
int carry=0;
while(i>=0 && j>=0)
{
ans[k]=(a[i]+b[j]+carry)%2;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
while(i>=0) {
ans[k]=(a[i]+carry)%2;
carry=(a[i]+carry)/2;
i--; k--;
}
while(j>=0) {
ans[k]=(b[j]+carry)%2;
carry=(b[j]+carry)/2;
j--; k--;
}
ans[0]=carry;
return ans;
}
If You Must Only Use C Arrays
Returning ans is returning the pointer to a local variable. The object the pointer refers to is no longer valid after the function has returned, so trying to read it would lead to undefined behavior.
One way to fix this is to pass in the address to an array to hold your answer, and populate that, instead of using a VLA (which is a non-standard C++ extension).
A VLA (variable length array) is an array which takes its size from a run-time computed value. In your case:
int s;
//... code that initializes s
int ans[s];
ans is a VLA because you are not using a constant to determine the array size. However, that is not a standard feature of the C++ language (it is an optional one in the C language).
You can modify your function so that ans is actually provided by the caller.
int * addd(int a[], int n1, int b[], int n2, int ans[])
{
//...
And then the caller would be responsible for passing in a large enough array to hold the answer.
Your function also appears to be incomplete.
while(i>=0 && j>=0 && k>0)
{
ans[k]=(a[i]+b[j]+carry)%2;
//cout<<k<<" "<<ans[k]<<endl;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
If one array is shorter than the other, then the index for the shorter array will reach 0 first. Then, when that corresponding index goes negative, the loop will stop, without handling the remaining terms in the longer array. This essentially makes the corresponding entries in ans be uninitialized. Reading those values results in undefined behavior.
To address this, you should populate the remaining entries in ans with the correct calculation based on carry and the remaining entries in the longer array.
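A sketch of both points combined, assuming the caller supplies an output array with room for max(n1, n2) + 1 elements. The name `adddInto` is hypothetical, and the function returns nothing since the caller already owns the buffer.

```cpp
// Writes the sum of a and b into ans; bits are stored most significant first.
// Assumption: the caller guarantees ans has room for max(n1, n2) + 1 ints.
void adddInto(const int a[], int n1, const int b[], int n2, int ans[]) {
    int i = n1 - 1, j = n2 - 1;
    int k = (n1 < n2 ? n2 : n1);  // index of the last element of ans
    int carry = 0;
    while (k > 0) {
        // Once an input is exhausted it contributes zeros.
        const int bitA = (i >= 0) ? a[i--] : 0;
        const int bitB = (j >= 0) ? b[j--] : 0;
        const int sum = bitA + bitB + carry;
        ans[k--] = sum % 2;
        carry = sum / 2;
    }
    ans[0] = carry;
}
```

Because every entry of `ans` from index 0 up to max(n1, n2) is written, no element is left uninitialized even when the inputs differ in length.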
A More C++ Approach
The original answer above was provided assuming you were constrained to only using C style arrays for both input and output, and that you wanted an answer that would allow you to stay close to your original implementation.
Below is a more C++ oriented solution, assuming you still need to provide C arrays as input, but otherwise no other constraint.
C Array Wrapper
A C array does not provide the amenities that you may be accustomed to have when using C++ containers. To gain some of these nice to have features, you can write an adapter that allows a C array to behave like a C++ container.
template <typename T, std::size_t N>
struct c_array_ref {
typedef T ARR_TYPE[N];
ARR_TYPE &arr_;
typedef T * iterator;
typedef std::reverse_iterator<T *> reverse_iterator;
c_array_ref (T (&arr)[N]) : arr_(arr) {}
std::size_t size () { return N; }
T & operator [] (int i) { return arr_[i]; }
operator ARR_TYPE & () { return arr_; }
iterator begin () { return &arr_[0]; }
iterator end () { return begin() + N; }
reverse_iterator rbegin () { return reverse_iterator(end()); }
reverse_iterator rend () { return reverse_iterator(begin()); }
};
Use C Array References
Instead of passing in two arguments as information about the array, you can pass in the array by reference, and use template argument deduction to deduce the array size.
Return a std::array
Although you cannot return a local C array like you attempted in your question, you can return an array that is wrapped inside a struct or class. That is precisely what the convenience container std::array provides. When you use C array references and template argument deduction to obtain the array size, you can now compute at compile time the proper array size that std::array should have for the return value.
template <std::size_t N1, std::size_t N2>
std::array<int, ((N1 < N2) ? N2 : N1) + 1>
addd(int (&a)[N1], int (&b)[N2])
{
Normalize the Input
It is much easier to solve the problem if you assume the arguments have been arranged in a particular order. If you always want the second argument to be the larger array, you can do that with a simple recursive call. This is perfectly safe, since we know the recursion will happen at most once.
if (N2 < N1) return addd(b, a);
Use C++ Containers (or Look-Alike Adapters)
We can now convert our arguments to the adapter shown earlier, and also create a std::array to hold the output.
c_array_ref<int, N1> aa(a);
c_array_ref<int, N2> bb(b);
std::array<int, std::max(N1, N2)+1> ans;
Leverage Existing Algorithms if Possible
In order to deal with the shortcomings of your original program, you can adjust your implementation a bit in an attempt to remove special cases. One way to do that is to compute the result of adding the longer array to 0 and store it in the output. This can mostly be accomplished with a simple call to std::copy.
ans[0] = 0;
std::copy(bb.begin(), bb.end(), ans.begin() + 1);
Since we know the input consists of only 1s and 0s, we can compute straight addition from the shorter array into the longer array, without concern for carry (that will be addressed in the next step). To compute this addition, we apply std::transform with a lambda expression.
std::transform(aa.rbegin(), aa.rend(), ans.rbegin(),
ans.rbegin(),
[](int a, int b) -> int { return a + b; });
Lastly, we can make a pass over the output array to fix up the carry computation. After doing so, we are ready to return the result. The return is possible because we are using std::array to represent the answer.
for (auto i = ans.rbegin(); i != ans.rend()-1; ++i) {
*(i+1) += *i / 2;
*i %= 2;
}
return ans;
}
A Simpler main Function
We now only need to pass in the two arrays to the addd function, since template type deduction will discover the sizes of the arrays. In addition, the output generator can be handled more easily with an ostream_iterator.
int main(int, const char * []) {
int a[]={1,0,0,0,1,1,1};
int b[]={1,0,1,1,0,1};
auto p=addd(a,b);
std::copy(p.begin(), p.end(),
std::ostream_iterator<int>(std::cout, " "));
return 0;
}
If I may editorialize a bit... I think this is a deceptively difficult question for beginners, and as-stated should flag problems in the design review long before any attempt at coding. It's telling you to do things that are not good/typical/idiomatic/proper in C++, and distracting you with issues that get in the way of the actual logic to be developed.
Consider the core algorithm you wrote (and Antonio corrected): that can be understood and discussed without worrying about just how A and B are actually passed in for this code to use, or exactly what kind of collection it is. If they were std::vector, std::array, or primitive C array, the usage would be identical. Likewise, how does one return the result out of the code? You populate ans here, and how it is gotten into and/or out of the code and back to main is not relevant.
Primitive C arrays are not first-class objects in C++ and there are special rules (inherited from C) on how they are passed as arguments.
Returning is even worse, and returning dynamic-sized things was a major headache in C and memory management like this is a major source of bugs and security flaws. What we want is value semantics.
Second, using arrays and subscripts is not idiomatic in C++. You use iterators and abstract over the exact nature of the collection. If you were interested in writing super-efficient back-end code that doesn't itself deal with memory management (it's called by other code that deals with the actual collections involved) it would look like std::merge which is a venerable function that dates back to the early 90's.
template< class InputIt1, class InputIt2, class OutputIt >
OutputIt merge( InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2,
OutputIt d_first );
You can find others with similar signatures, that take two different ranges for input and outputs to a third area. If you write addp exactly like this, you could call it with primitive C arrays of hardcoded size:
int8_t A[] {0,0,0,1,1,1};
int8_t B[] {1,0,1,1,0,1};
int8_t C[ ??? ];
using std::begin; using std::end;
addp (begin(A),end(A), begin(B), end(B), begin(C));
Note that it's up to the caller to have prepared an output area large enough, and there's no error checking.
However, the same code can be used with vectors, or even any combination of different container types. This could populate a std::vector as the result by passing an insertion iterator. But in this particular algorithm that's difficult since you're computing it in reverse order.
std::array
Improving upon the situation with primitive C arrays, you could use the std::array class which is exactly the same array but without the strange passing/returning rules. It's actually just a primitive C array inside a wrapping struct. See this documentation: https://en.cppreference.com/w/cpp/container/array
So you could write it as:
using BBBNum1 = std::array<int8_t, 6>;
BBBNum1 addp (const BBBNum1& A, const BBBNum1& B) { ... }
The code inside can use A[i] etc. in the same way you are, but it also can get the size via A.size(). The issue here is that the inputs are the same length, and the output is the same as well (not 1 larger). Using templates, it could be written to make the lengths flexible but still only specified at compile time.
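The template variant mentioned here could be sketched as follows. This is an illustration under the assumption that bits are stored most significant first; the lengths N1 and N2 are deduced from the arguments, and the result type is one element longer than the longer input.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Adds two MSB-first bit arrays whose lengths are known at compile time.
template <std::size_t N1, std::size_t N2>
std::array<std::int8_t, (N1 < N2 ? N2 : N1) + 1>
addp(const std::array<std::int8_t, N1>& A, const std::array<std::int8_t, N2>& B) {
    constexpr std::size_t S = (N1 < N2 ? N2 : N1) + 1;
    std::array<std::int8_t, S> C{};  // zero-initialized result
    std::size_t i = N1, j = N2;
    int carry = 0;
    for (std::size_t k = S; k-- > 1; ) {
        const int a = (i > 0) ? A[--i] : 0;  // missing leading bits count as zero
        const int b = (j > 0) ? B[--j] : 0;
        const int sum = a + b + carry;
        C[k] = static_cast<std::int8_t>(sum % 2);
        carry = sum / 2;
    }
    C[0] = static_cast<std::int8_t>(carry);
    return C;
}
```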
std::vector
The vector is like an array but with a run-time length. It's dynamic, and the go-to collection you should reach for in C++.
using BBBNum2 = std::vector<int8_t>;
BBBNum2 addp (const BBBNum2& A, const BBBNum2& B) { ... }
Again, the code inside this function can refer to B[j] etc. and use B.size() exactly the same as with the array collection. But now, the size is a run-time property, and can be different for each one.
You would create your result, as in my first post, by giving the size as a constructor argument, and then you can return the vector by-value. Note that the compiler will do this efficiently and not actually have to copy anything if you write:
auto C = addp (A, B);
now for the real work
OK, now that this distraction is at least out of the way, you can worry about actually writing the implementation. I hope you are convinced that using vector instead of a C primitive array does not affect your problem logic or even the (available) syntax of using subscripts. Especially since the problem referred to pseudocode, I interpret its use of "array" as "suitable indexable collection" and not specifically the primitive C array type.
The issue of going through 2 sequences together and dealing with differing lengths is actually a general purpose idea. In C++20, the Range library has things that make quick work of this. Older 3rd party libraries exist as well, and you might find it called zip or something like that.
But, let's look at writing it from scratch.
You want to read an item at a time from two inputs, but neatly make it look like they're the same length. You don't want to write the same code three times, or elaborate on the cases where A is shorter or where B may be shorter... just abstract out the idea that they are read together, and if one runs out it provides zeros.
This is its own piece of code that can be applied twice, to A and to B.
class backwards_bit_reader {
const BBBNum2& x;
size_t index;
public:
backwards_bit_reader(const BBBNum2& x) : x{x}, index{x.size()} {}
bool done() const { return index == 0; }
int8_t next()
{
if (done()) return 0; // keep reading infinite leading zeros
--index;
return x[index];
}
};
Now you can write something like:
backwards_bit_reader A_in { A };
backwards_bit_reader B_in { B };
while (!A_in.done() || !B_in.done()) { // keep going until both inputs are exhausted
const auto a = A_in.next();
const auto b = B_in.next();
const auto c = a+b+carry;
carry = c/2; // update
C[--k]= c%2;
}
C[0]= carry; // the final bit, one longer than the input
It can be written far more compactly, but this is clear.
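For reference, here is a self-contained sketch that puts the reader class and the loop into one function. It assumes the `BBBNum2` alias from above and MSB-first storage; the constructor parameter is renamed to `src` only to avoid shadowing the member.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using BBBNum2 = std::vector<std::int8_t>;

// Reads bits from the back (least significant) to the front,
// then yields zeros forever once the input is exhausted.
class backwards_bit_reader {
    const BBBNum2& x;
    std::size_t index;
public:
    explicit backwards_bit_reader(const BBBNum2& src) : x(src), index(src.size()) {}
    bool done() const { return index == 0; }
    std::int8_t next() {
        if (done()) return 0;  // infinite leading zeros
        --index;
        return x[index];
    }
};

BBBNum2 addp(const BBBNum2& A, const BBBNum2& B) {
    BBBNum2 C(1 + std::max(A.size(), B.size()));  // value-initialized to zeros
    backwards_bit_reader A_in(A);
    backwards_bit_reader B_in(B);
    std::size_t k = C.size();
    int carry = 0;
    // Runs once per bit of the longer input.
    while (!A_in.done() || !B_in.done()) {
        const int c = A_in.next() + B_in.next() + carry;
        carry = c / 2;
        C[--k] = static_cast<std::int8_t>(c % 2);
    }
    C[0] = static_cast<std::int8_t>(carry);  // the final bit, one longer than the input
    return C;
}
```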
another approach
The problem is, is writing backwards_bit_reader beyond what you've learned thus far? How else might you apply the same logic to both A and B without duplicating the statements?
You should be learning to recognize what's sometimes called "code smell". Repeating the same block of code multiple times, and repeating the same steps with nothing changed but which variable it's applying to, should be seen as ugly and unacceptable.
You can at least cut back the cases by ensuring that B is always the longer one, if they are of different length. Do this by swapping A and B if that's not the case, as a preliminary step. (Actually implementing that well is another digression)
But the logic is still nearly duplicated, since you have to deal with the possibility of the carry propagating all the way to the end. Just now you have 2 copies instead of 3.
Extending the shorter one, at least in façade, is the only way to write one loop.
how realistic is this problem?
It's simplified to the point of being silly, but if it's not done in base 2 but with larger values, this is actually implementing multi-precision arithmetic, which is a real thing people want to do. That's why I named the type above BBBNum for "Bad Binary Bignum".
Getting down to an actual range of memory and wanting the code to be fast and optimized is also something you want to do sometimes. The BigNum is one example; you often see this with string processing. But we'll want to make an efficient back-end that operates on memory without knowing how it was allocated, and higher-level wrappers that call it.
For example:
void addp (const int8_t* a_begin, const int8_t* a_end,
const int8_t* b_begin, const int8_t* b_end,
int8_t* result_begin, int8_t* result_end);
will use the provided range for output, not knowing or caring how it was allocated, and taking input that's any contiguous range without caring what type of container is used to manage it as long as it's contiguous. Note that as you saw with the std::merge example, it's more idiomatic to pass begin and end rather than begin and size.
But then you have helper functions like:
BBBNum2 addp (const BBBNum2& A, const BBBNum2& B)
{
BBBNum2 C (1+std::max(A.size(),B.size()));
addp (A.data(), A.data()+A.size(), B.data(), B.data()+B.size(), C.data(), C.data()+C.size());
return C;
}
Now the casual user can call it using vectors and a dynamically-created result, but it's still available to call for arrays, pre-allocated result buffers, etc.
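A possible body for the pointer-based back end, together with the vector wrapper, is sketched below. It rests on an assumption the text does not spell out: the result range is at least one element longer than the longer input, and any extra leading positions are simply filled with zeros.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Back end: works on any contiguous ranges, most significant bit first.
// Assumption: [result_begin, result_end) holds at least max(len a, len b) + 1 elements.
void addp(const std::int8_t* a_begin, const std::int8_t* a_end,
          const std::int8_t* b_begin, const std::int8_t* b_end,
          std::int8_t* result_begin, std::int8_t* result_end) {
    int carry = 0;
    // Walk all three ranges from the back; exhausted inputs contribute zeros.
    while (result_end != result_begin) {
        const int a = (a_end != a_begin) ? *--a_end : 0;
        const int b = (b_end != b_begin) ? *--b_end : 0;
        const int sum = a + b + carry;
        *--result_end = static_cast<std::int8_t>(sum % 2);
        carry = sum / 2;
    }
}

using BBBNum2 = std::vector<std::int8_t>;

// Front end: allocates the result and forwards to the back end.
BBBNum2 addp(const BBBNum2& A, const BBBNum2& B) {
    BBBNum2 C(1 + std::max(A.size(), B.size()));
    addp(A.data(), A.data() + A.size(),
         B.data(), B.data() + B.size(),
         C.data(), C.data() + C.size());
    return C;
}
```

The same back end can then be called with plain arrays, for example `addp(a, a + 3, b, b + 2, out, out + 4)`.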

C++: What is causing this stack smashing error?

Disclaimer: I have limited knowledge of C++ due to switching from a college where they didn't teach C++ to another where it was the only language that was taught.
I'm trying to implement the box counting method for a randomly generated 2D cluster in a lattice that's 54x54.
One of the requirements is that we use a 1D array to represent the 2D square lattice, so a transformation is required to associate x and y values (columns and lines, respectively) to the actual positions of the array.
The transformation is "i = x + y*N", with N being the length of the side of the square lattice (in this case, it would be 54) and i being the position of the array.
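As a small illustration of this mapping, the helpers below convert between (x, y) coordinates and the array position. The function names are made up for this sketch; N is passed as a parameter rather than read from a global.

```cpp
#include <cstddef>

// Maps lattice coordinates (x = column, y = row) to a 1D array position.
constexpr std::size_t toIndex(std::size_t x, std::size_t y, std::size_t N) {
    return x + y * N;
}

// Inverse mapping: recovers the column and the row from an array position.
constexpr std::size_t columnOf(std::size_t i, std::size_t N) { return i % N; }
constexpr std::size_t rowOf(std::size_t i, std::size_t N) { return i / N; }
```

For N = 54, the cell at x = 3, y = 2 lands at position 3 + 2*54 = 111, and the largest valid position is 53 + 53*54 = 2915.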
The box-counting method, simply put, involves splitting a grid into large squares that get progressively smaller and counting how many contain the cluster in each instance.
The code works in the way that it should for smaller lattice sizes, at least the ones that I could verify (for obvious reasons, I can't verify even a 10x10 lattice by hand). However, when I run it, the box size goes all the way to 1/37 and gives me a "stack smashing detected" error.
From what I understand, the error may have something to do with array sizes, but I've checked the points where the arrays are accessed and made sure they're within the actual dimensions of the array.
A "for" in the function "boxTransform(int grid[], int NNew, int div)" is responsible for the error in question, but I added other functions that I believe are relevant to it.
The rest of the code is just defining a lattice and isolating the aggregate, which is then passed to boxCounting(int grid[]), and creating a .dat file. Those work fine.
To "fit" the larger array into the smaller one, I divide each coordinate (x, y) by the ratio of squares on the large array to the small array. This is how my teacher explained it, and as mentioned before, works fine for smaller array sizes.
EDIT: Thanks to a comment by VTT, I went back and checked if the array index goes out of bounds with the code itself. It is indeed the case, which is likely the origin of the problem.
EDIT #2: It was indeed the origin of the problem. There was a slight error in the calculations that didn't appear for smaller lattice sizes (or I just missed it).
//grid[] is an array containing the cluster
//that I want to analyze.
void boxCounting(int grid[]) {
//N is a global constant; it's the length of the
//side of the square lattice that's being analyzed.
//NNew is the side of the larger squares. It will
//be increased until it reaches N
for (int NNew = 1; N - NNew > 0; NNew++) {
int div = N/NNew;
boxTransform(grid, NNew, div);
}
}
void boxTransform(int grid[], int NNew, int div) {
int gridNew[NNew*NNew];
//Here the array elements are set to zero, which
//I understand C++ cannot do natively
for (int i = 0; i < NNew*NNew; i++) {
gridNew[i] = 0;
}
for (int row = 0; row < N; row++) {
for (int col = 0; col < N; col++) {
if (grid[col + row*N] == 1) {
//This is where the error occurs. The idea here is
//that if a square on the initial grid is occupied,
//the corresponding square on the new grid will have
//its value increased by 1, so I can later check
//how many squares on the larger grid are occupied
gridNew[col/div + (row/div)*NNew]++;
}
}
}
int boxes = countBox(gridNew, NNew);
//Creates a .dat file with the relevant values
printResult(boxes, NNew);
}
int countBox(int grid[], int NNew) {
int boxes = 0;
//Any array values that weren't touched remain at zero,
//so I just have to check that it's greater than zero
//to know if the square is occupied or not
for(int i = 0; i < NNew*NNew; i++) {
if(grid[i] > 0) boxes++;
}
return boxes;
}
Unfortunately this is not enough information to find the exact problem for you but I will try to help.
There are multiple reasons why you should use a dynamic array instead of the fixed-size arrays you are using, unless fixed-size arrays are required by your exercise.
If you've been learning other languages you might think that a fixed-size array is good enough, but it's far more dangerous in C++ than in most other languages.
int gridNew[NNew*NNew]; You should know that this is not valid according to the C++ standard; it only compiles because some compilers, such as GCC, accept it as an extension. In standard C++ the size of a fixed-size array must be known at compile time, which means you can't use a run-time variable to declare an array.
You keep updating global variables to track the size of the array which makes your code super hard to read. You are probably doing this because you know that you are not able to query the size of the array once you pass it to a function.
For both of these problems a dynamic array is the perfect solution. The standard dynamic array implementation in C++ is the std::vector: https://en.cppreference.com/w/cpp/container/vector
When you create a vector you can define its size, and you can query the length of the vector with the size() member function.
Even better: you can use the at() function instead of square brackets ([]) to access an element by index. It performs a bounds check for you and throws an exception if the index is out of bounds, which helps a lot in locating this kind of error. In C++, simply indexing an array with an index that does not exist is undefined behaviour, which might be your problem.
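A short sketch of the difference, using a hypothetical helper `isOutOfBounds` that reports whether `at()` rejects an index. `std::vector::at` throws `std::out_of_range` for an invalid index, whereas `operator[]` performs no check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading v.at(index) throws, i.e. the index is out of bounds.
bool isOutOfBounds(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);  // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```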
I wouldn't like to write any more features of the vector because it's really easy to find examples on how to do these things, I just wanted to help you where to start.
VTT was right in his comment. There was a small issue with the transformation to fit the large array into the smaller one that made the index go out of bounds. I only checked this on pen and paper when I should've put it in the actual code, which is why I didn't notice it. Since he didn't post it as an answer, I'm doing so on his behalf.
The int gridNew[NNew*NNew]; bit was kind of a red herring, but I appreciate the lesson and will take that into account when coding in C++ in the future.
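The post does not show the corrected calculation, so the following is only one possible fix. It assumes the overflow came from `col/div` with `div = N/NNew` truncated by integer division: for N = 54 and NNew = 37, `div` is 1 and `col/div` reaches 53, which is past the end of a 37-wide row. The name `countOccupiedBoxes` and the scaling `col * NNew / N` are illustrative choices, not the author's code.

```cpp
#include <cstddef>
#include <vector>

// Counts how many cells of an NNew x NNew coarse grid contain at least one occupied
// cell of the N x N fine grid (stored row by row in a 1D vector).
int countOccupiedBoxes(const std::vector<int>& grid, std::size_t N, std::size_t NNew) {
    std::vector<int> gridNew(NNew * NNew, 0);  // zero-initialized, sized at run time
    for (std::size_t row = 0; row < N; ++row) {
        for (std::size_t col = 0; col < N; ++col) {
            if (grid[col + row * N] == 1) {
                // Multiply before dividing so the result is always below NNew.
                const std::size_t newCol = col * NNew / N;
                const std::size_t newRow = row * NNew / N;
                ++gridNew.at(newCol + newRow * NNew);  // at() throws if out of range
            }
        }
    }
    int boxes = 0;
    for (int cell : gridNew) {
        if (cell > 0) ++boxes;
    }
    return boxes;
}
```

Since col is at most N - 1, `col * NNew / N` is strictly less than NNew, so the index stays inside the coarse grid for every NNew between 1 and N.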

Copying vector elements to a vector pair

In my C++ code,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
So I combined these two vectors into a single one,
void combineVectors(vector<string>& strVector, vector <int>& intVector, vector < pair <string, int>>& pairVector)
{
for (int i = 0; i < strVector.size() || i < intVector.size(); ++i )
{
pairVector.push_back(pair<string, int> (strVector.at(i), intVector.at(i)));
}
}
Now this function is called like this,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
vector < pair <string, int>> pairVector;
combineVectors(strVector, intVector, pairVector);
//rest of the implementation
The combineVectors function uses a loop to add the elements of the other 2 vectors to the vector of pairs. I doubt this is an efficient way, as this function gets called hundreds of times with different data. This might cause a performance issue because it goes through the loop every time.
My goal is to copy both vectors in "one go" to the vector of pairs, i.e., without using a loop. I am not sure whether that's even possible.
Is there a better way of achieving this without compromising the performance?
You have clarified that the arrays will always be of equal size. That's a prerequisite condition.
So, your situation is as follows. You have vector A over here, and vector B over there. You have no guarantees whether the actual memory that vector A uses and the actual memory that vector B uses are next to each other. They could be anywhere.
Now you're combining the two vectors into a third vector, C. Again, no guarantees where vector C's memory is.
So, you have really very little to work with, in terms of optimizations. You have no additional guarantees whatsoever. This is pretty much fundamental: you have two chunks of bytes, and those two chunks need to be copied somewhere else. That's it. That's what has to be done, that's what it all comes down to, and there is no other way to get it done, other than doing exactly that.
But there is one thing that can be done to make things a little bit faster. A vector will typically allocate memory for its values in incremental steps, reserving some extra space, initially, and as values get added to the vector, one by one, and eventually reach the vector's reserved size, the vector has to now grab a new larger block of memory, copy everything in the vector to the larger memory block, then delete the older block, and only then add the next value to the vector. Then the cycle begins again.
But you know, in advance, how many values you are about to add to the vector, so you simply instruct the vector to reserve() enough size in advance, so it doesn't have to repeatedly grow itself, as you add values to it. Before your existing for loop, simply:
pairVector.reserve(pairVector.size()+strVector.size());
Now, the for loop will proceed and insert new values into pairVector which is guaranteed to have enough space.
A couple of other things are possible. Since you have stated that both vectors will always have the same size, you only need to check the size of one of them:
for (int i = 0; i < strVector.size(); ++i )
Next step: at() performs bounds checking. This loop ensures that i will never be out of bounds, so at()'s bound checking is also some overhead you can get rid of safely:
pairVector.push_back(pair<string, int> (strVector[i], intVector[i]));
Next: with a modern C++ compiler, the compiler should be able to optimize away, automatically, several redundant temporaries, and temporary copies here. It's possible you may need to help the compiler, a little bit, and use emplace_back() instead of push_back() (assuming C++11, or later):
pairVector.emplace_back(strVector[i], intVector[i]);
Going back to the loop condition, strVector.size() gets evaluated on each iteration of the loop. It's very likely that a modern C++ compiler will optimize it away, but just in case you can also help your compiler check the vector's size() only once:
int n=strVector.size();
for (int i = 0; i < n; ++i )
This is really a stretch, but it might eke out a few extra quanta of execution time. And that is pretty much all of the obvious optimizations here. Realistically, the most to be gained here is by using reserve(). The other optimizations might help things a little bit more, but it all boils down to moving a certain number of bytes from one area in memory to another area. There aren't really any special ways of doing that which are faster than the others.
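Put together, the suggestions above might look like the following sketch. The function name zipVectors is an assumption made for this example. The vector names follow the question, and the assert reflects the question's statement that both vectors always have the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sketch combining reserve(), operator[] and emplace_back().
// zipVectors is a hypothetical name, not taken from the question.
std::vector<std::pair<std::string, int>>
zipVectors(const std::vector<std::string>& strVector,
           const std::vector<int>& intVector)
{
    // The question states that both vectors always have the same size.
    assert(strVector.size() == intVector.size());

    std::vector<std::pair<std::string, int>> pairVector;
    pairVector.reserve(strVector.size());     // single allocation up front

    const std::size_t n = strVector.size();   // read size() only once
    for (std::size_t i = 0; i < n; ++i)
        pairVector.emplace_back(strVector[i], intVector[i]);  // no at(), pair built in place

    return pairVector;
}
```

Taking both inputs by const reference avoids copying them on the way in. The only copies left are the strings and ints that have to end up in the new vector anyway.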
We can use std::generate() to achieve this:
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
vector<string> strVector{ "hello", "world" };
vector<int> intVector{ 2, 3 };
// Returns the next (string, int) pair on each call. Because of the static
// counter, f() is only good for a single pass over the two vectors.
pair<string, int> f()
{
    static int i = -1;
    ++i;
    return make_pair(strVector[i], intVector[i]);
}
int main() {
    int min_Size = min(strVector.size(), intVector.size());
    vector< pair<string,int> > pairVector(min_Size);
    generate(pairVector.begin(), pairVector.end(), f);
    for( int i = 0 ; i < min_Size ; i++ )
        cout << pairVector[i].first << " " << pairVector[i].second << endl;
}
I'll try to summarize what you want, with some possible answers depending on your situation. You say you want a new vector that is essentially a zipped version of two other vectors which contain two heterogeneous types, so that you can access the two types as some sort of pair.
If you want to make this more efficient, you need to think about what you are using the new vector for. I can see three scenarios for what you are doing.
1) The new vector is a copy of your data so you can do stuff with it without affecting the original vectors (i.e. you still need the original two vectors).
2) The new vector is now the storage mechanism for your data (i.e. you no longer need the original two vectors).
3) You are simply coupling the vectors together to make use and representation easier (i.e. where they are stored doesn't actually matter).
1) Not much you can do aside from copying the data into your new vector. Explained more in Sam Varshavchik's answer.
3) You do something like Shakil's answer or here or some type of customized iterator.
2) Here you can make an optimisation that copies none of the data by using a wrapper class. Note: a wrapper class works if you don't need to use the actual std::vector < std::pair > class. You can make a class that you move the data into and give it access operators. If you can do this, it also allows you to decompose the wrapper back into the original two vectors without copying. Something like this might suffice.
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>
class StringIntContainer {
public:
    // Takes ownership of both vectors: the arguments are left in a moved-from state.
    StringIntContainer(std::vector<std::string>& _string_vec, std::vector<int>& _int_vec)
        : string_vec_(std::move(_string_vec)), int_vec_(std::move(_int_vec))
    {
        assert(string_vec_.size() == int_vec_.size());
    }
    // Returns a copy of the i-th (string, int) pair.
    std::pair<std::string, int> operator[] (std::size_t _i) const
    {
        return std::make_pair(string_vec_[_i], int_vec_[_i]);
    }
    /* You may want methods that return reference to data so you can edit it*/
    // Hands both vectors back without copying; the members are left moved-from.
    std::pair<std::vector<std::string>, std::vector<int>> Decompose()
    {
        return std::make_pair(std::move(string_vec_), std::move(int_vec_));
    }
private:
    std::vector<std::string> string_vec_;
    std::vector<int> int_vec_;
};

Alternative to using mpfr arrays

I am trying to write a function in C++ using MPFR to calculate multiple values. I am currently using an mpfr array to store those values. It is unknown how many values need to be calculated and stored each time. Here is the function:
void Calculator(mpfr_t x, int v, mpfr_t *Values, int numOfTerms, int mpfr_bits) {
for (int i = 0; i < numOfTerms; i++) {
mpfr_init2(Values[i], mpfr_bits);
mpfr_set(Values[i], x, GMP_RNDN);
mpfr_div_si(Values[i], Values[i], pow(-1,i+1)*(i+1)*pow(v,i+1), GMP_RNDN);
}
}
The program itself has a while loop that has a nested for loop that takes these values and does calculations with them. In this way, I don't have to recalculate these values each time within the for loop. When the for loop is finished, I clear the memory with
delete[] Values;
before the while loop starts again, in which case it redeclares the array with
mpfr_t *Values;
Values = new mpfr_t[numOfTerms];
The number of values that need to be stored is calculated by a different function and passed to this function through the variable numOfTerms. The problem is that, for some reason, the array slows the program down tremendously. I am working with very large numbers, so I expected recalculating those values each time to be extremely expensive, but storing them is significantly slower than just recalculating the values in each iteration of the for loop. Is there an alternative method to this?
EDIT** Instead of redeclaring the array over each time, I moved the declaration and the delete[] Values outside of the while loop. Now I am just clearing each element of the array with
for (int i = 0; i < numOfTerms; i++) {
mpfr_clear(Values[i]);
}
inside of the while loop before the while loop starts over. The program has gotten noticeably faster but is still much slower than just calculating each value over.
If I understand correctly, you are doing inside a while loop: mpfr_init2 (at the beginning of the iteration) and mpfr_clear (at the end of the iteration) on numOfTerms MPFR numbers, and the value of numOfTerms depends on the iteration. And this is what takes most of the time.
To avoid these many memory allocations by mpfr_init2 and deallocations by mpfr_clear, I suggest that you declare the array outside the while loop and make the initial mpfr_init2 calls outside the while loop as well. The length of the array (i.e. the number of terms) should be what you think is the maximum number of terms. It can happen that, for some iterations, the chosen number of terms turns out to be too small. In such a case, you need to increase the length of the array (this will need a reallocation) and call mpfr_init2 on the new elements. This will be the new length of the array for the remaining iterations, until the array needs to be enlarged again. After the while loop, do the mpfr_clear calls.
When you need to enlarge the array, have a good strategy for choosing the new number of elements. Just taking the needed value of numOfTerms for the current iteration may not be a good one, since it may yield many reallocations. For instance, make sure that you have at least an N% increase. Do some tests to choose the best value for N... See Dynamic array for instance. In particular, you may want to use the C++ implementation of dynamic arrays, as mentioned in this Wikipedia article.
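The "at least N% increase" rule can be sketched as a small helper that decides the new array length. The name nextCapacity and its parameters are hypothetical, and the sketch deliberately leaves out the MPFR calls: after enlarging, only the newly added elements would need mpfr_init2.

```cpp
#include <cstddef>

// Hypothetical helper: returns the array length to use when `needed`
// elements are required. Grows by at least `percent` percent so that a
// slowly rising numOfTerms does not force a reallocation every iteration.
std::size_t nextCapacity(std::size_t current, std::size_t needed, std::size_t percent)
{
    if (needed <= current)
        return current;                         // still fits, nothing to do
    const std::size_t grown = current + current * percent / 100;
    return needed > grown ? needed : grown;     // whichever is larger
}
```

For example, with a current length of 100 and N = 50, a request for 101 terms yields a new length of 150, so the next 49 increases cost no reallocation at all.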

push_back/append or appending a vector with a loop in C++ Armadillo

I would like to create a vector (arma::uvec) of integers, but I do not know the size of the vector ex ante. I could not find an appropriate function in the Armadillo documentation, and I was also not successful in creating the vector with a loop. I think the issue is in initializing the vector or in keeping track of its length.
arma::uvec foo(arma::vec x){
arma::uvec vect;
int nn=x.size();
vect(0)=1;
int ind=0;
for (int i=0; i<nn; i++){
if ((x(i)>0)){
ind=ind+1;
vect(ind)=i;
}
}
return vect;
}
The error message is: Error: Mat::operator(): index out of bounds.
I would not want to assign 1 to the first element of the vector, but could live with that if necessary.
PS: I would really like to know how to obtain the vector of unknown length by appending, so that I could use it even in more general cases.
Repeatedly appending elements to a vector is a really bad idea from a performance point of view, as it can cause repeated memory reallocations and copies.
There are two main solutions to that.
Set the size of the vector to the theoretical maximum length of your operation (nn in this case), and then use a loop to set some of the values in the vector. You will need to keep a separate counter for the number of set elements in the vector so far. After the loop, take a subvector of the vector, using the .head() function. The advantage here is that there will be only one copy.
An alternative solution is to use two loops, to reduce memory usage. In the first loop work out the final length of the vector. Then set the size of the vector to the final length. In the second loop set the elements in the vector. Obviously using two loops is less efficient than one loop, but it's likely that this is still going to be much faster than appending.
If you still want to be a lazy coder and inefficiently append elements, use the .insert_rows() function.
As a sidenote, your foo(arma::vec x) is already making an unnecessary copy of the input vector. Arguments in C++ are by default passed by value, which basically means C++ will make a copy of x before running your function. To avoid this unnecessary copy, change your function to foo(const arma::vec& x), which means it takes a constant reference to x. The & is critical here.
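A minimal sketch of the first approach (size to the maximum, count, then trim) is shown below. It uses std::vector as a stand-in so that it compiles without Armadillo, and the name positiveIndices is made up for this example. With arma::uvec, the sizing step would be arma::uvec vect(nn) and the trimming step would be vect.head(count).

```cpp
#include <cstddef>
#include <vector>

// Stand-in using std::vector instead of arma::uvec / arma::vec.
// positiveIndices is a hypothetical name. The input is taken by const
// reference so that no copy is made.
std::vector<std::size_t> positiveIndices(const std::vector<double>& x)
{
    std::vector<std::size_t> idx(x.size());   // theoretical maximum length
    std::size_t count = 0;                    // number of elements set so far
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] > 0)
            idx[count++] = i;                 // store the index, advance the counter
    }
    idx.resize(count);                        // keep only the filled part
    return idx;
}
```

Unlike the code in the question, this does not need a dummy first element, because the counter starts at zero and the first match is written to position 0.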
In addition to mtall's answer, which I agree with, for a case in which performance wasn't needed I used this:
void uvec_push(arma::uvec & v, unsigned int value) {
arma::uvec av(1);
av.at(0) = value;
v.insert_rows(v.n_rows, av.row(0));
}