I am trying to write a function in C++ using MPFR to calculate multiple values. I am currently using an mpfr array to store those values. It is unknown how many values need to be calculated and stored each time. Here is the function:
void Calculator(mpfr_t x, int v, mpfr_t *Values, int numOfTerms, int mpfr_bits) {
for (int i = 0; i < numOfTerms; i++) {
mpfr_init2(Values[i], mpfr_bits);
mpfr_set(Values[i], x, GMP_RNDN);
mpfr_div_si(Values[i], Values[i], pow(-1,i+1)*(i+1)*pow(v,i+1), GMP_RNDN);
}
}
The program itself has a while loop that has a nested for loop that takes these values and does calculations with them. In this way, I don't have to recalculate these values each time within the for loop. When the for loop is finished, I clear the memory with
delete[] Values;
before the while loop starts again, at which point it redeclares the array with
mpfr_t *Values;
Values = new mpfr_t[numOfTerms];
The number of values that need to be stored is calculated by a different function and passed to this function through the variable numOfTerms. The problem is that, for some reason, the array slows down the program tremendously. I am working with very large numbers, so the thought was that recalculating those values each time would be extremely expensive, but this method is significantly slower than just recalculating the values in each iteration of the for loop. Is there an alternative method to this?
EDIT: Instead of redeclaring the array each time, I moved the declaration and the delete[] Values outside of the while loop. Now I am just clearing each element of the array with
for (int i = 0; i < numOfTerms; i++) {
mpfr_clear(Values[i]);
}
inside the while loop before it starts over. The program has gotten noticeably faster but is still much slower than simply recalculating each value.
If I understand correctly, you are doing inside a while loop: mpfr_init2 (at the beginning of the iteration) and mpfr_clear (at the end of the iteration) on numOfTerms MPFR numbers, and the value of numOfTerms depends on the iteration. And this is what takes most of the time.
To avoid these many memory allocations by mpfr_init2 and deallocations by mpfr_clear, I suggest that you declare the array outside the while loop and initially call the mpfr_init2 outside the while loop. The length of the array (i.e. the number of terms) should be what you think is the maximum number of terms. What can happen is that for some iterations, the chosen number of terms was too small. In such a case, you need to increase the length of the array (this will need a reallocation) and call mpfr_init2 on the new elements. This will be the new length of the array for the remaining iterations, until the array needs to be enlarged again. After the while loop, do the mpfr_clear's.
When you need to enlarge the array, have a good strategy to choose the new number of elements. Just taking the needed value of numOfTerms for the current iteration may not be a good one, since it may lead to many reallocations. For instance, make sure that you have at least an N% increase. Do some tests to choose the best value for N. See the Wikipedia article on dynamic arrays, for instance. In particular, you may want to use the C++ implementation of dynamic arrays (std::vector) mentioned in that article.
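The growth strategy described above can be sketched as follows. This is only an illustration under stated assumptions: BigNumber is a hypothetical stand-in for an MPFR number, TermPool and g_initCalls are made-up names, and the 50% growth factor is an arbitrary starting point to tune. With real MPFR the marked line would call mpfr_init2, and because mpfr_t is an array type it would need a small wrapper struct before it could be stored in a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expensive-to-initialise number such as mpfr_t.
struct BigNumber {
    double value = 0.0;
};

// Counts how often the expensive "init" step runs, so the saving is visible.
static int g_initCalls = 0;

// Hypothetical pool: slots are initialised once and reused on every iteration.
class TermPool {
public:
    // Make sure at least `needed` initialised slots exist.
    // Grows by at least 50% so that repeated small increases stay cheap.
    void ensure(std::size_t needed) {
        if (needed <= slots_.size()) return;   // common case: nothing to do
        std::size_t grown = slots_.size() + slots_.size() / 2;
        std::size_t newSize = needed > grown ? needed : grown;
        std::size_t oldSize = slots_.size();
        slots_.resize(newSize);
        for (std::size_t i = oldSize; i < newSize; ++i) {
            ++g_initCalls;                     // with MPFR: mpfr_init2 on the new slot
        }
    }

    BigNumber& operator[](std::size_t i) {
        assert(i < slots_.size());
        return slots_[i];
    }

    std::size_t capacity() const { return slots_.size(); }

private:
    std::vector<BigNumber> slots_;
};
```

Inside the while loop one would call pool.ensure(numOfTerms) and then overwrite the first numOfTerms slots; the matching clear step runs once, after the loop.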
Disclaimer: I have limited knowledge of C++ due to switching from a college where they didn't teach C++ to another where it was the only language that was taught.
I'm trying to implement the box counting method for a randomly generated 2D cluster in a lattice that's 54x54.
One of the requirements is that we use a 1D array to represent the 2D square lattice, so a transformation is required to associate x and y values (columns and lines, respectively) to the actual positions of the array.
The transformation is "i = x + y*N", with N being the length of the side of the square lattice (in this case, it would be 54) and i being the position of the array.
The box-counting method, simply put, involves splitting a grid into large squares that get progressively smaller and counting how many contain the cluster in each instance.
The code works in the way that it should for smaller lattice sizes, at least the ones that I could verify (for obvious reasons, I can't verify even a 10x10 lattice by hand). However, when I run it, the box size goes all the way to 1/37 and gives me a "stack smashing detected" error.
From what I understand, the error may have something to do with array sizes, but I've checked the points where the arrays are accessed and made sure they're within the actual dimensions of the array.
A "for" in the function "boxTransform(int grid[], int NNew, int div)" is responsible for the error in question, but I added other functions that I believe are relevant to it.
The rest of the code is just defining a lattice and isolating the aggregate, which is then passed to boxCounting(int grid[]), and creating a .dat file. Those work fine.
To "fit" the larger array into the smaller one, I divide each coordinate (x, y) by the ratio of squares on the large array to the small array. This is how my teacher explained it, and as mentioned before, works fine for smaller array sizes.
EDIT: Thanks to a comment by VTT, I went back and checked if the array index goes out of bounds with the code itself. It is indeed the case, which is likely the origin of the problem.
EDIT #2: It was indeed the origin of the problem. There was a slight error in the calculations that didn't appear for smaller lattice sizes (or I just missed it).
//grid[] is an array containing the cluster
//that I want to analyze.
void boxCounting(int grid[]) {
//N is a global constant; it's the length of the
//side of the square lattice that's being analyzed.
//NNew is the side of the larger squares. It will
//be increased until it reaches N
for (int NNew = 1; N - NNew > 0; NNew++) {
int div = N/NNew;
boxTransform(grid, NNew, div);
}
}
void boxTransform(int grid[], int NNew, int div) {
int gridNew[NNew*NNew];
//Here the array elements are set to zero, which
//I understand C++ cannot do natively
for (int i = 0; i < NNew*NNew; i++) {
gridNew[i] = 0;
}
for (int row = 0; row < N; row++) {
for (int col = 0; col < N; col++) {
if (grid[col + row*N] == 1) {
//This is where the error occurs. The idea here is
//that if a square on the initial grid is occupied,
//the corresponding square on the new grid will have
//its value increased by 1, so I can later check
//how many squares on the larger grid are occupied
gridNew[col/div + (row/div)*NNew]++;
}
}
}
int boxes = countBox(gridNew, NNew);
//Creates a .dat file with the relevant values
printResult(boxes, NNew);
}
int countBox(int grid[], int NNew) {
int boxes = 0;
//Any array values that weren't touched remain at zero,
//so I just have to check that it's greater than zero
//to know if the square is occupied or not
for(int i = 0; i < NNew*NNew; i++) {
if(grid[i] > 0) boxes++;
}
return boxes;
}
Unfortunately this is not enough information to find the exact problem for you but I will try to help.
There are multiple reasons why you should use a dynamic array instead of the fixed-size arrays you are using, unless fixed-size arrays are required by your exercise.
If you've been learning other languages you might think that a fixed-size array is good enough, but it is far more dangerous in C++ than in most other languages.
int gridNew[NNew*NNew]; is not valid standard C++. It compiles only because GCC supports variable-length arrays as an extension. In standard C++ the size of a fixed-size array must be known at compile time, which means you can't use a runtime variable to declare one.
You keep relying on global variables to track the size of the array, which makes your code very hard to read. You are probably doing this because you cannot query the size of an array once you have passed it to a function.
For both of these problems a dynamic array is the perfect solution. The standard dynamic array implementation in C++ is the std::vector: https://en.cppreference.com/w/cpp/container/vector
When you create a vector you can define its size, and you can query its length with the size() member function.
Even better: you can use the at() function instead of square brackets ([]) to access an element by index. It performs a bounds check for you and throws an exception if the index is out of bounds, which helps a lot in locating this kind of error. In C++, accessing an array with an index that does not exist is undefined behaviour, which might be your problem.
I won't describe any more features of vector because it's really easy to find examples of how to use them; I just wanted to show you where to start.
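As a rough sketch of that suggestion, the coarse-graining step could look like this with std::vector and at(). The function name countOccupiedBoxes, its parameters, and the choice to return -1 on a bad index are assumptions made for illustration, not the asker's code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Coarse-grain an n-by-n lattice (stored row by row in a 1D vector) into
// boxesPerSide-by-boxesPerSide boxes and count the occupied boxes.
// Returns -1 if any computed index falls outside one of the vectors.
int countOccupiedBoxes(const std::vector<int>& grid, int n,
                       int boxesPerSide, int div) {
    // The vector is sized at run time and zero-initialised in one step.
    std::vector<int> coarse(
        static_cast<std::size_t>(boxesPerSide) * boxesPerSide, 0);
    try {
        for (int row = 0; row < n; ++row) {
            for (int col = 0; col < n; ++col) {
                if (grid.at(col + row * n) == 1) {
                    // at() throws std::out_of_range instead of silently
                    // writing past the end of the storage.
                    coarse.at(col / div + (row / div) * boxesPerSide)++;
                }
            }
        }
    } catch (const std::out_of_range&) {
        return -1;
    }
    int boxes = 0;
    for (int v : coarse) {
        if (v > 0) ++boxes;
    }
    return boxes;
}
```

With a 54x54 lattice, 37 boxes per side and div = 54/37 = 1 (integer division), this sketch reports the bad index through the exception instead of corrupting the stack.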
VTT was right in his comment. There was a small issue with the transformation to fit the large array into the smaller one that made the index go out of bounds. I only checked this on pen and paper when I should've put it in the actual code, which is why I didn't notice it. Since he didn't post it as an answer, I'm doing so on his behalf.
The int gridNew[NNew*NNew]; bit was kind of a red herring, but I appreciate the lesson and will take that into account when coding in C++ in the future.
DISCLAIMER: I'm very new to C++ so I'm sorry if this is a stupid question!
I'm trying to read data into a 1000-element array (double) and then, if there are fewer than 1000 data points to read in, ignore the excess elements for the rest of my program.
I've defined a 1000-element array and read in the data, and now want to call a function on each element that has been set from a data point. How do I test whether an element has been set yet? I would use a Boolean test, i.e. if(array[i]) { /* function */ }, but the data points can be any natural number including zero, so I don't know if this would work. How would I solve this problem?
The most typical approach to the problem of "the number of things in my array is not fixed ahead of time" is to have a variable that keeps track of how many things are actually in the array. Then, you just loop over that many things.
Since you added the C++ tag, you can (and should) use the vector class to manage everything for you, and you even get the added benefit that it can grow beyond 1000 elements should you happen to have more than that.
(aside: if you insist on sticking with a 1000-long array, you really should make sure you do something appropriate should you actually get more than 1000 data points)
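A minimal sketch of the vector approach, assuming the data arrive as whitespace-separated numbers on a stream; readValues and sumOfSquares are hypothetical names, and sumOfSquares merely stands in for whatever function is applied to each element.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Read as many doubles as the stream provides; the vector grows as needed,
// so there is no fixed 1000-element limit and no "unset" element to detect.
std::vector<double> readValues(std::istream& in) {
    std::vector<double> values;
    double x;
    while (in >> x) {
        values.push_back(x);
    }
    return values;
}

// Placeholder for "carry out a function on each element": only the elements
// that were actually read are visited, and zeros are handled like any value.
double sumOfSquares(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v * v;
    }
    return total;
}
```

For example, passing a std::istringstream holding "1.5 0 2.5" to readValues yields a vector whose size() is 3, and the zero is processed like any other data point.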
You could initialize your array with a sentinel value like NAN (i.e., not a number; the macro comes from <cmath>, and std::fill from <algorithm>):
double array[1000];
std::fill(std::begin(array), std::end(array), NAN);
Then fill your array sequentially:
array[0] = 1.2;
array[1] = 2.3;
array[2] = 3.4;
And then break the loop as soon as this value is met:
for(int i(0); i < 1000; ++i) {
if(isnan(array[i])) break;
function(array[i]);
}
I want to sort an array with a huge number of elements (millions or even billions), where the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, is std::sort or the parallelized version __gnu_parallel::sort the best choice for me?
Actually, I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, two objects with the same integer member used for comparison might still not be the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0,m), the most efficient way to do so is to have a vector in which the index represents the element and the value represents the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
if (counts.size() <= static_cast<size_t>(i)) { // index i needs size i+1
counts.resize(i+1, 0);
}
counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
const int i = t.sort_field();
if (count_sorted.size() <= static_cast<size_t>(i)) { // index i needs size i+1
count_sorted.resize(i+1, {});
}
count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers. The space complexity goes from O(m) to O(n + m). The time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort stays in scope for the lifetime of count_sorted. If your Ts implement move semantics you can store the objects themselves and move them in. If you need count_sorted to outlive to_sort, you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array and taking the value of each count into account. If you want STL-like iterators you might need a custom data structure that encapsulates that behavior.
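To make that last step concrete, here is a small sketch for plain integers in the range [0, m). The function name countingSort and the decision to materialise the result into a new vector are assumptions for illustration; the same loop could drive a custom iterator instead.

```cpp
#include <cstddef>
#include <vector>

// Counting sort for integers in [0, m): count each value, then walk the
// counts in increasing value order and emit each value `count` times.
std::vector<int> countingSort(const std::vector<int>& to_sort, int m) {
    std::vector<int> counts(static_cast<std::size_t>(m), 0); // sized once, m known
    for (int v : to_sort) {
        counts.at(static_cast<std::size_t>(v))++; // at() throws if v is outside [0, m)
    }
    std::vector<int> sorted;
    sorted.reserve(to_sort.size());
    for (int value = 0; value < m; ++value) {
        // Append `counts[value]` copies of `value`.
        sorted.insert(sorted.end(),
                      static_cast<std::size_t>(counts[value]), value);
    }
    return sorted;
}
```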
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some java implementations (I believe the Guava implementation would be efficient) but not in C++ where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
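The cumulative-sum step can be sketched like this. startOffsets is a hypothetical name, and the values 1, 2, 3 from the example are assumed to be mapped to vector indices 0, 1, 2.

```cpp
#include <cstddef>
#include <vector>

// Turn per-value counts into starting offsets (an exclusive prefix sum).
// offsets[k] is the index where the first object with the k-th value goes;
// the extra last entry is the total number of objects.
std::vector<std::size_t> startOffsets(const std::vector<std::size_t>& counts) {
    std::vector<std::size_t> offsets(counts.size() + 1, 0);
    for (std::size_t i = 0; i < counts.size(); ++i) {
        offsets[i + 1] = offsets[i] + counts[i];
    }
    return offsets;
}
```

With counts {3, 5, 7} this gives {0, 3, 8, 15}, matching the example above.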
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jump ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X) where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen on some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(Nlog(N))). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions but, since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place we have to increment the value in the prefix sum so that the next element with same value doesn't remove the previous element from its place.
Second: either
keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place (this requires a second copy of the counts array, taken prior to calculating the prefix sum, as well as a "move count" array), or
keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
template<class It, class KeyOf>
void countsort (It begin, It end, KeyOf key_of) {
constexpr int max_value = 1000;
int final_destination[max_value] = {}; // zero initialized
int destination[max_value] = {}; // zero initialized
// Record counts
for (It it = begin; it != end; ++it)
final_destination[key_of(*it)]++;
// Build prefix sum of counts
for (int i = 1; i < max_value; ++i) {
final_destination[i] += final_destination[i-1];
destination[i] = final_destination[i-1];
}
for (auto it = begin; it != end; ++it) {
auto key = key_of(*it);
// while item is not in the correct position
while ( std::distance(begin, it) != destination[key] &&
// and not all items of this value have reached their final position
final_destination[key] != destination[key] ) {
// swap into the right place
std::iter_swap(it, begin + destination[key]);
// tidy up for next iteration
++destination[key];
key = key_of(*it);
}
}
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const &p){
return p.id()-1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort. Here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue=0;
for (int i : to_sort) {
if(i > maxvalue) maxvalue = i;
counts[i]++;
}
counts.resize(maxvalue+1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.
I would like to create a vector (arma::uvec) of integers, but I do not know the size of the vector in advance. I could not find an appropriate function in the Armadillo documentation, and I was not successful in creating the vector with a loop either. I think the issue is in initializing the vector or in keeping track of its length.
arma::uvec foo(arma::vec x){
arma::uvec vect;
int nn=x.size();
vect(0)=1;
int ind=0;
for (int i=0; i<nn; i++){
if ((x(i)>0)){
ind=ind+1;
vect(ind)=i;
}
}
return vect;
}
The error message is: Error: Mat::operator(): index out of bounds.
I would not want to assign 1 to the first element of the vector, but could live with that if necessary.
PS: I would really like to know how to obtain the vector of unknown length by appending, so that I could use it even in more general cases.
Repeatedly appending elements to a vector is a really bad idea from a performance point of view, as it can cause repeated memory reallocations and copies.
There are two main solutions to that.
Set the size of the vector to the theoretical maximum length of your operation (nn in this case), and then use a loop to set some of the values in the vector. You will need to keep a separate counter for the number of set elements in the vector so far. After the loop, take a subvector of the vector, using the .head() function. The advantage here is that there will be only one copy.
An alternative solution is to use two loops, to reduce memory usage. In the first loop work out the final length of the vector. Then set the size of the vector to the final length. In the second loop set the elements in the vector. Obviously using two loops is less efficient than one loop, but it's likely that this is still going to be much faster than appending.
If you still want to be a lazy coder and inefficiently append elements, use the .insert_rows() function.
As a side note, your foo(arma::vec x) is already making an unnecessary copy of the input vector. Arguments in C++ are passed by value by default, which basically means C++ will make a copy of x before running your function. To avoid this unnecessary copy, change your function to foo(const arma::vec& x), which means it takes a constant reference to x. The & is critical here.
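The first solution (allocate the maximum, count, then trim) can be sketched with std::vector standing in for arma::uvec, so that the example needs no external library. positiveIndices is a hypothetical name; in Armadillo the final resize would correspond to taking the subvector with .head(), as described above.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of the strictly positive entries of x.
// The input is taken by const reference, so no copy of x is made.
std::vector<std::size_t> positiveIndices(const std::vector<double>& x) {
    std::vector<std::size_t> indices(x.size()); // theoretical maximum length
    std::size_t count = 0;                      // number of elements set so far
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] > 0) {
            indices[count++] = i;
        }
    }
    indices.resize(count);                      // keep only the filled head
    return indices;
}
```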
In addition to mtall's answer, which I agree with, for a case in which performance wasn't needed I used this:
void uvec_push(arma::uvec & v, unsigned int value) {
arma::uvec av(1);
av.at(0) = value;
v.insert_rows(v.n_rows, av.row(0));
}
I'm working on threads, however before I use threads I am to write 2 programs.
Set up an array and write a sequential program that accesses the whole of the array and performs some simple task on the contents.
Modify the program so that it is still sequential but accesses the array by a series of calls to a function. Each call to that function will process a number of rows of the array as defined by a parameter passed to the function.
I'm having problems understanding the questions, it seems so simple but yet I can't seem to get my head around it. I am to write the programs based on the above two questions before I start creating a program that will allow the processing to be carried out in one or more threads. Each thread should access a different set of rows of the array.
For the first question, the code I have written so far is
#include <iostream>
#include <stdio.h>
int main()
{
int array [20][20];
int i, j;
/* output each array element's value */
for ( i = 0; i < 20; i++ )
{
for ( j = 0; j < 20; j++ )
{
printf("a[%d][%d] = %d\n", i,j, array[i][j] );
}
}
system ("PAUSE");
return 0;
}
I want to know whether the above program is a sequential program. I have run the program, and it accesses the whole array and performs one task, which is to print out all the data in the array.
I researched online what a sequential program is and found the following: perform task a before task b, but not at the same time. Is this right?
For the second part I have done the following:
#include <iostream>
#include <stdio.h>
void print_array(int array[20][20]);
int main()
{
int array [20][20];
int i, j;
print_array(array);
system ("PAUSE");
return 0;
}
// Output data in an array
void print_array(int array)
{
int i, j;
for ( i = 0; i < 20; i++ )
{
for ( j = 0; j < 20; j++ )
{
printf("a[%d][%d] = %d\n", i,j, array[i][j] );
}
}
}
Am I going in the right direction? I also have to write a version of the program that will allow the processing to be carried out in one or more threads.
EDIT: I am to use 2D arrays, sorry it wasn't clear above
I don't think you're going in the right direction, but you're not far off. What the instructions are asking for are some of the preliminary steps needed to take the work of processing an array sequentially and make it run in parallel. When writing a parallel program, it is often useful to start with a working sequential program and slowly transform it into a parallel program. Following the instructions is a way to do this.
Let's consider the parts of the question separately:
Set up an array and write a sequential program that accesses the whole of the array and performs some simple task on the contents.
The simple task that you chose for your array is to print the contents, but this isn't a suitable task, because it has no functional result. A more suitable task would be to sum the elements in the array. Other tasks might be to count the elements that meet some condition, or to multiply each element by two.
So, first try to modify your initial program to sum the elements instead of printing them.
(In your code you are using a two-dimensional array. I would suggest using a 1-dimensional array for simplicity.)
Modify the program so that it is still sequential but accesses the array by a series of calls to a function. Each call to that function will process a number of rows of the array as defined by a parameter passed to the function.
In this part what you are trying to do is break up the functionality into small pieces of work. (Eventually you will send these units of work to threads for processing, but you are just doing the preliminary steps now.) If we did a sum in part 1, then here you might write a function which is int sumKitems(int *array, int startIndex, int numItems). The main program would then call this on each set of (say) 10 items in the array, and combine the full results by summing the results from each sumKitems call.
So, if there are 100 items in the array, you could make 10 calls to sumKitems(...), telling the function to process 0...9, 10...19, ..., 90...99. This would be in place of doing the sum on all 100 items individually.
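A minimal sketch of that decomposition, using a 1-dimensional array as suggested. sumKitems follows the signature given above (with const added); sumInChunks is a hypothetical helper that plays the role of the main program's loop.

```cpp
#include <cassert>

// Sum numItems consecutive elements, starting at startIndex.
int sumKitems(const int* array, int startIndex, int numItems) {
    int sum = 0;
    for (int i = startIndex; i < startIndex + numItems; ++i) {
        sum += array[i];
    }
    return sum;
}

// Still sequential: process the array chunk by chunk and combine the
// partial sums. The last chunk may be shorter than chunkSize.
int sumInChunks(const int* array, int length, int chunkSize) {
    assert(chunkSize > 0);
    int total = 0;
    for (int start = 0; start < length; start += chunkSize) {
        int remaining = length - start;
        int n = remaining < chunkSize ? remaining : chunkSize;
        total += sumKitems(array, start, n);
    }
    return total;
}
```

Calling sumInChunks(data, 100, 10) makes the ten calls described above, and the result should equal a plain loop over all 100 items whatever chunk size is chosen.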
--
To summarize, part one would be a simple for loop, not too different from what you've written, just with some computation being performed.
The second part should do exactly the same computation and return exactly the same result, just using another function call which handles k items at a time. (If you pass the number of items to handle at a time as a parameter, you will be able to balance the cost of communication against the work being done when moving to a threaded implementation.)
In the end, you will probably be asked to replace the call to sumKitems(...) with a queue that sends work to the threads to do independently.
I believe that if you are not creating separate threads in any way then you are in fact writing a sequential program. There is no part of your code where you jump into a new thread to do some operation while the main thread does something else.
Summary: your code runs on a single thread - it is sequential
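You must pass the array not as a plain int but as a two-dimensional array parameter: void print_array(int array[20][20]), or equivalently int array[][20] or int (*array)[20]. Only the first dimension may be omitted, and a 2D array does not convert to int**.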
To perform operations on the array sequentially would be to start at the first place in the array and increment through the array performing operations as you go
array[10] = {0,1,2,3,4,5,6,7,8,9}
iterate through array with some action such as adding +5
array [10] = {5,6....}
To make this multi-threaded you need to have different threads operate on different segments of the array such as places 0-4,5-9 and perform the action so it can be done in less time. If you're doing it this way you will not need to worry about mutexes.
So thread one increments through array[10] {0,1,2,3,4}
Thread two increments through array[10] {5,6,7,8,9}
Each thread advances one place at a time, and both threads run concurrently
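A rough sketch of that split using std::thread, under the assumption that the action is "add a constant to every element". addToRange and addInTwoThreads are hypothetical names. Because the two index ranges do not overlap, no mutex is used.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Add delta to every element in the half-open index range [begin, end).
void addToRange(std::vector<int>& data, std::size_t begin,
                std::size_t end, int delta) {
    for (std::size_t i = begin; i < end; ++i) {
        data[i] += delta;
    }
}

// Split the vector into two halves and let one thread handle each half.
void addInTwoThreads(std::vector<int>& data, int delta) {
    std::size_t mid = data.size() / 2;
    // std::ref is needed because std::thread copies its arguments by default.
    std::thread first(addToRange, std::ref(data), std::size_t{0}, mid, delta);
    std::thread second(addToRange, std::ref(data), mid, data.size(), delta);
    first.join();   // wait for both halves before the caller reads the data
    second.join();
}
```

On some toolchains this needs the -pthread compiler flag.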