c++ cplex access current solution to add constraint - c++

I am trying to solve an LP model in CPLEX using C++ and Concert Technology.
I want to implement constraints (the subtour elimination constraints, to be more specific) that need to query the values of two of my variables in the current solution:
The variable array xvar represents the edges, and yvar represents the nodes.
I implement these constraints by solving n (= number of nodes) min-cut problems on a modified graph, which is constructed by adding an artificial source and an artificial sink and connecting them to every node of the original graph.
From what I've read so far, I am not sure: do I need a lazy constraint, a callback, or neither?
This is where I create the model, solve it, and access the values of the variables in the solution:
// Step 2: Construct the necessary CPLEX objects and the LP model
IloCplex solver(env);
std::cout<< "Original Graph g: " <<std::endl;
std::cout<< net.g() <<std::endl;
MCFModel model(env, net);
// Step 3: Load the model into cplex and solve
solver.extract(model);
solver.solve();
// Step 4: Extract the solution from the solver
if(solver.getStatus() != IloAlgorithm::Optimal) throw "Could not solve to optimality!";
IloNumArray xsol ( env, net.g().nEdges() );
IloNumArray ysol ( env, net.g().nNodes() );
IloNumArray rsol ( env, net.g().nGroups() );
IloNumArray wisol ( env, net.g().nGroups() );
IloNum ksol;
NumMatrix wsol ( env, net.g().nGroups());
for(IloInt i = 0; i < net.g().nGroups(); i++){
wsol[i] = IloNumArray( env, net.g().nGroups() );
}
solver.getValues(xsol, model.xvar());
solver.getValues(ysol, model.yvar());
solver.getValues(rsol, model.rvar());
solver.getValues(wisol, model.wivar());
ksol=solver.getValue(model.kvar());
for (IloInt i = 0; i < net.g().nGroups(); i++){
wsol[i] = wisol;
}
// Step 5: Print the solution.
The constraint for which I need the current values of the variables xvar and yvar is created here:
//build subset constraint y(S) -x(E(S))>= y_i
void MCFModel::buildSubsetCons(){
IloExpr lhs(m_env);
IloCplex cplex(m_env);
IloNumArray xtemp ( m_env, m_net.g().nEdges() );
IloNumArray ytemp ( m_env, m_net.g().nNodes() );
std::vector<Edge> mg_twin;
std::vector<int> mg_weights;
int mg_s;
int mg_t;
SGraph mgraph;
std::vector<int> f;
int nOrigEdges = m_net.g().nEdges();
int nOrigNodes = m_net.g().nNodes();
cplex.getValues(xtemp, m_xvar);
cplex.getValues(ytemp, m_yvar);
mgraph = m_net.g().mod_graph();
mg_s = mgraph.nNodes()-1;
mg_t = mgraph.nNodes();
std::cout<<"modified graph:"<<std::endl;
std::cout<<mgraph<<std::endl;
// fill the weight of original edges with 1/2*x_e
foreach_edge(e, m_net.g()){
mg_weights.push_back((xtemp[e->idx()])/2);
}
// fill the weight of the edges from artificial source with zero
for(int i=0; i<m_net.g().nNodes(); i++){
mg_weights.push_back(0);
}
// fill the weight of the edges to artificial sink with f(i)
// first step: calculate f(i):
f.resize(m_net.g().nNodes(), 0); // f must be sized before f[i] is accessed below
foreach_node(i, m_net.g()){
foreach_adj_edge(e, i, m_net.g()){
f[i] = f[i] + xtemp[e->idx()];
}
f[i] = (-1)*f[i]/2;
f[i] = f[i] + ytemp[i];
}
// second step: fill the weights vector with it
for(int i=0; i<m_net.g().nNodes(); i++){
mg_weights.push_back(f[i]);
}
// calculate the big M = abs(sum_(i in N) f(i))
int M = 0; // must be initialized before accumulating
foreach_node(i, m_net.g()){
M = M + abs(f[i]);
}
// Build the twin vector of the not artificial edges for mgraph
mg_twin.resize(2*nOrigEdges + 2*nOrigNodes);
for(int i=0; i < nOrigEdges ; ++i){
mg_twin[i] = mgraph.edges()[nOrigEdges + i];
mg_twin[nOrigEdges + i] = mgraph.edges()[i];
}
//Start the PreflowPush for every node in the original graph
foreach_node(v, m_net.g()){
// "contract" the edge between s and v
// this equals to higher the weights of the edge (s,v) to a big value M
// weight of the edge from v to s lies in mg_weights[edges of original graph + index of node v]
mg_weights[m_net.g().nEdges() + v] = M;
//Start PreflowPush for v
PreflowPush<int> pp(mgraph, mg_twin, mg_weights, mg_s, mg_t);
std::cout << "Flowvalue modified graph: " << pp.minCut() << std::endl;
}
}
The object pp solves the min-cut problem on the modified graph mgraph with the artificial source and sink. The original graph is in m_net.g().
When I compile and run it, I get the following error:
terminate called after throwing an instance of 'IloCplex::Exception'
Aborted
It seems that it is not possible to access the values of xvar and yvar like this?
I would appreciate any help, since I am quite lost as to how to do this.
Thank you very much!!

Two things...
I. I strongly suggest using a try-catch to better understand CPLEX exceptions; the message usually reveals the nature of the problem. In fact, I suggest a try-catch-catch setup, something like:
try {
//... YOUR CODE ...//
}
catch(IloException& e) {
cerr << "CPLEX found the following exception: " << e << endl;
e.end();
}
catch(...) {
cerr << "The following unknown exception was found: " << endl;
}
II. The only way to interact with CPLEX during the optimization process is via a callback, and in the case of subtour elimination constraints (SECs) you will need to separate both integer and fractional SECs.
II.1 INTEGER: This is the easier case. An O(n) routine identifies all the connected components of a node solution; you can then add the corresponding cuts to prevent this particular subtour from appearing in other nodes. You can enforce your cuts either locally, i.e. only on the current subtree, using the addLocal() function, or globally, i.e. on the entire branch-and-cut tree, using the add() function. In any case, ALWAYS remember to call .end() on the cut container once you are done with it. Otherwise you WILL have serious memory leak issues, trust me on this, lol. This separation must be done via a lazy constraint callback (ILOLAZYCONSTRAINTCALLBACK).
II.2 FRACTIONAL: This case is by far more complex. The easiest way to make it work is to use Professor Lysgaard's CVRPSEP library. It is nowadays the most efficient way of computing capacity, multistar, generalized multistar, framed capacity, strengthened comb and hypotour cuts. Additionally, it is rather easy to link with any existing code. The separation also needs to be embedded in the solution process, so a callback is required here too; in this case, it would be a user cut callback (ILOUSERCUTCALLBACK).
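The connected-component step mentioned in II.1 can be sketched independently of CPLEX. The following is only an illustration under stated assumptions: the graph is given as a plain edge list, xsol holds the solution value of each edge, and the name selected_components is hypothetical (it is not part of Concert Technology or of the asker's code). Inside a lazy constraint callback, the values would come from the callback's own getValue/getValues calls rather than from a separate IloCplex object.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: label the connected components of the subgraph formed
// by the edges selected in an integer solution (value > 0.5).
// Returns comp[v] = index of the component containing node v.
std::vector<int> selected_components(int nNodes,
                                     const std::vector<std::pair<int, int>>& edges,
                                     const std::vector<double>& xsol) {
    // Adjacency lists restricted to the selected edges.
    std::vector<std::vector<int>> adj(static_cast<std::size_t>(nNodes));
    for (std::size_t e = 0; e < edges.size(); ++e) {
        if (xsol[e] > 0.5) {
            adj[edges[e].first].push_back(edges[e].second);
            adj[edges[e].second].push_back(edges[e].first);
        }
    }
    // Iterative depth-first search from every unlabeled node.
    std::vector<int> comp(static_cast<std::size_t>(nNodes), -1);
    int nComp = 0;
    for (int s = 0; s < nNodes; ++s) {
        if (comp[s] != -1) continue;
        std::vector<int> stack{s};
        comp[s] = nComp;
        while (!stack.empty()) {
            int v = stack.back();
            stack.pop_back();
            for (int w : adj[v]) {
                if (comp[w] == -1) {
                    comp[w] = nComp;
                    stack.push_back(w);
                }
            }
        }
        ++nComp;
    }
    return comp;
}
```

If more than one component contains selected edges, each such component is a candidate node set S for a violated constraint; how the cut is built from S depends on the model.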
One is glad to be of service
Y


C++ - How to keep a semi-random array sorted in a loop [closed]

I have a for loop in my code which generates an array of numbers each time like this:
1,2,3,4
and on next round of for:
2.001, 4.008, 1.002, 2.099
As you can see, each element is close to an element from the previous round, but the order has changed. This loop runs thousands of times and I need to see how these values change, but with the order changing this is impossible.
How to keep them sorted?
My Attempt:
1- I tried sorting them each time from the largest number to smallest with BubbleSort. This would work fine if elements were all increasing or decreasing. But not when some of them increase and some decrease.
2- I thought of a way to store the elements of the first round of the loop and compare them to the next round and change order of the elements so they have minimum change compared to the first round and so on with next rounds. But I couldn't write a working code to do it.
EDIT:
My code is a very large and complicated one and I'm sure copying it does nothing but adding to the confusion. But here is a sample code of what it looks like:
for(x=50;x<=65;x+=0.01){
for(i=0;i<100;i++) w[i] = SomeCalculations(i);
output<<x<<" "<<w[1]<<endl;
output<<x<<" "<<w[2]<<endl;
...
}
You could perhaps monitor the differences (first-order derivative) for each eigenvalue. Then, you can continue "tracking" the eigenvalue even if it intersects by making educated guesses on whichever has the closest derivative(s).
For this, we need a distance function (or cost function). One such example:
struct Eigenvalue {
    int id;    // unique identifier for eigenvalue
    double x;  // actual value
    double dx; // first order derivative
};
double dist(const Eigenvalue& e1, const Eigenvalue& e2) {
    double x_dist = std::abs(e1.x - e2.x);    // std::abs from <cmath>
    double dx_dist = std::abs(e1.dx - e2.dx);
    return x_dist + dx_dist; // example distance function
}
We now match pairs of eigenvalues which have the least distance between each other:
void track_evals(std::vector<Eigenvalue>& evals,
const std::vector<Eigenvalue>& old_evals) {
// Loop through new eigenvalues (evals) and match with
// old eigenvalues (old_evals)
for (auto& e : evals) {
// Find closest match
auto old = std::min_element(old_evals.begin(), old_evals.end(),
[e](const Eigenvalue& a, const Eigenvalue& b) {
return dist(a, e) < dist(b, e); });
// Match by copying identifier
// You can use a dictionary or some other data structure,
// if you prefer
e.id = (*old).id;
}
}
Of course, for all this to work, you need to maintain the correct values of your Eigenvalues:
std::vector<Eigenvalue> evals;
std::vector<Eigenvalue> old_evals;
// Precompute eigenvalues
evals = SomeCalculations(0);
// Assign unique identifiers
int id = 0;
for (auto& e : evals) {
e.id = id++;
}
// Your for loop
for (int i = 1; i < 100; i++) {
// Save old eigenvalues
std::swap(old_evals, evals);
// Perform SomeCalculations() and update evals
evals = SomeCalculations(i);
// Also update derivatives (dx) for each of the evals!
auto old = old_evals.begin();
for (auto& e : evals) {
e.dx = e.x - (*old++).x;
}
// Track
track_evals(evals, old_evals);
// Sort evals into same order (if desired)
std::sort(evals.begin(), evals.end(),
[](const Eigenvalue& a, const Eigenvalue& b) { return a.id < b.id; });
}
This method is not foolproof. There may be collisions, in which case you might want to try more orders of derivatives or try to reduce the speed at which the eigenvalues change.
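One way to reduce the collisions mentioned above is to make the matching one-to-one, so that two new values can never claim the same old value. The sketch below is an illustration only, not the answer's method: it uses plain doubles and the absolute difference as the distance, and the function name match_to_previous is hypothetical. It greedily accepts the closest still-unmatched pair first.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <tuple>
#include <vector>

// Reorder `current` so that result[k] is the value matched to previous[k].
// Greedy one-to-one matching: the closest unmatched pair is accepted first.
// If the sizes differ, `current` is returned unchanged.
std::vector<double> match_to_previous(const std::vector<double>& previous,
                                      const std::vector<double>& current) {
    const std::size_t n = previous.size();
    if (current.size() != n) return current;

    // All candidate pairs as (distance, index in previous, index in current).
    std::vector<std::tuple<double, std::size_t, std::size_t>> pairs;
    pairs.reserve(n * n);
    for (std::size_t p = 0; p < n; ++p)
        for (std::size_t c = 0; c < n; ++c)
            pairs.emplace_back(std::abs(previous[p] - current[c]), p, c);
    std::sort(pairs.begin(), pairs.end());

    std::vector<bool> prev_used(n, false), cur_used(n, false);
    std::vector<double> ordered(n, 0.0);
    for (const auto& candidate : pairs) {
        const std::size_t p = std::get<1>(candidate);
        const std::size_t c = std::get<2>(candidate);
        if (prev_used[p] || cur_used[c]) continue;  // one side already matched
        prev_used[p] = true;
        cur_used[c] = true;
        ordered[p] = current[c];
    }
    return ordered;
}
```

This costs O(n^2 log n) per round, which is acceptable for around a hundred values. A greedy matching is not guaranteed to minimize the total distance; an assignment algorithm would be needed for that.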
Use std::sort
std::sort(std::begin(w), std::end(w));
for (auto a : w) {
output << x << " "<< a << endl;
}

Comparing Values in a Single Vector

I'm working on a GA and seem to be having problems with the tournament selection. I think this is due to the fact that I'm not comparing what I want to compare (in terms of fitness values)
srand(static_cast <unsigned> (time(0)));
population Pop;
vector<population> popvector;
vector<population> survivors;
population *ptrP;
for (int i = 0; i <= 102; i++)
{
ptrP = new population;
ptrP->generatefit;
ptrP->findfit;
popvector.push_back(*ptrP);
//include finding the persons "overall". WIP
}
cout << "The fit values of the population are listed here: " << endl;
vector<population> ::iterator it; //iterator to print everything in the vector
for (it = popvector.begin(); it != popvector.end(); ++it)
{
it->printinfo();
}
unsigned seed = std::chrono::system_clock::now().time_since_epoch().count(); // generate a seed for the shuffle process of the vector.
cout << "Beggining selection process" << endl;
shuffle(popvector.begin(), popvector.end(), std::default_random_engine(seed));
//Shuffling done to randomize the parents I will be taking.
// I also want want to pick consecutive parents
for (int i = 0; i <= 102; i = i + 3)
{
if (popvector[i] >= popvector[i++]);
}
}
Now, what I think my problem is: when I'm trying to compare the Overall values (not found yet; I'm still working on how to properly model them to give accurate overall fitness values), I'm not comparing what I should be.
I'm thinking that once I find the persons "Overall" I should store it in a Float vector and proceed from there, but I'm unsure if this is the right way to proceed if I wish to create a new "parent" pool, since (I think) the "parent pool" has to be part of my population class.
Any feedback is appreciated.
srand(static_cast <unsigned> (time(0)));
This is useless: you're calling std::shuffle in a form not based on std::rand:
shuffle(popvector.begin(), popvector.end(), std::default_random_engine(seed));
If somewhere else in the program you need to generate random numbers, do it via the functions / distributions / engines in the <random> pseudo-random number generation library (do not use std::rand).
Also consider that, for debugging purposes, you should have a way to initialize the random engine with a fixed seed (debugging needs repeatable results).
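As a small sketch of the fixed-seed idea (the helper name shuffled_indices is hypothetical, and shuffling indices instead of the population itself is only one possible design), the seed can be passed in explicitly so that a debug run repeats exactly:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Return the indices 0..n-1 in a pseudo-random order determined by `seed`.
// Calling it twice with the same seed yields the same order.
std::vector<std::size_t> shuffled_indices(std::size_t n, unsigned seed) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 engine(seed);  // the engine comes from <random>, not std::rand
    std::shuffle(idx.begin(), idx.end(), engine);
    return idx;
}
```

In a normal run the seed could come from std::random_device or the clock; in a debug run a constant is passed instead.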
for (int i = 0; i <= 102; i++)
Do not use magic numbers.
Why 102? If it's the population size, store it in a constant / variable (populationSize?), document the variable's use and "enjoy" the fact that when you need to change the value you don't have to remember every location where it's used (just in this simple snippet there are two distinct use points).
Also consider that the population size is one of those parameters you need to change quite often in GA.
ptrP = new population;
ptrP->generatefit;
ptrP->findfit;
popvector.push_back(*ptrP);
Absolutely consider Sam Varshavchik's and paddy's remarks.
for (int i = 0; i <= 102; i = i + 3)
{
if (popvector[i] >= popvector[i++]);
// ...
Generally it's not a good practice to change the index variable inside the body of a for loop (in some languages, not C / C++, the loop variable is immutable within the scope of the loop body).
Here you also have undefined behaviour:
popvector[i] >= popvector[i++]
is equivalent to
operator>=(popvector[i], popvector[i++])
The order in which function arguments are evaluated is unspecified. So you may have:
auto a = popvector[i];
auto b = popvector[i++];
operator>=(a, b); // i.e. popvector[i] >= popvector[i]
or
auto b = popvector[i++];
auto a = popvector[i];
operator>=(a, b); // i.e. popvector[i + 1] >= popvector[i]
Both cases are wrong.
In the first case you're comparing the same elements and the expression is always true.
In the second case the comparison probably is the opposite of what you were thinking.
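A sketch of comparing consecutive elements without modifying the index inside the expression follows. The struct Individual and its fitness member are hypothetical stand-ins for the question's population class, whose real interface is not shown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one member of the population.
struct Individual {
    double fitness;
};

// Compare elements pairwise (0 vs 1, 2 vs 3, ...) and keep the fitter one.
// A trailing element without a partner is ignored.
std::vector<Individual> pairwise_winners(const std::vector<Individual>& pop) {
    std::vector<Individual> winners;
    for (std::size_t i = 0; i + 1 < pop.size(); i += 2) {
        const Individual& a = pop[i];
        const Individual& b = pop[i + 1];  // index computed, i is not modified
        winners.push_back(a.fitness >= b.fitness ? a : b);
    }
    return winners;
}
```

The loop bound is derived from pop.size(), so no magic number is needed, and the step of 2 is the only place where i changes.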
Take a look at:
Undefined behavior and sequence points
What are all the common undefined behaviours that a C++ programmer should know about?
and always compile source code with -Wall -Wextra (or their equivalent).
I'm not sure I correctly understand the role of the class population. The name may be misleading.
Other questions / answers you could find interesting:
C++: "std::endl" vs "\n"
http://herbsutter.com/2013/05/13/gotw-2-solution-temporary-objects/ (the section about premature pessimization)

first value of a loop in c++ different for the others

I need the first value of a loop to be 0, and then continue over a range.
In MatLab this is possible: x = [0 -range:range] (range is an integer)
This gives the values [0, -range, -range+1, -range+2, ..., range-1, range]
The problem is that I need to do this in C++. I tried to build an array and then use its values in the loop, without success.
//After loading 2 images, put it into matrix values and then trying to compare each one.
for r=1:bRows
for c=1:bCols
rb=r*blockSize;
cb=c*blockSize;
%%for each block search in the near position(1.5 block size)
search=blockSize*1.5;
for dr= [0 -search:search] //Here's the problem.
for dc= [0 -search:search]
%%check if it is inside the image
if(rb+dr-blockSize+1>0 && rb+dr<=rows && cb+dc-blockSize+1>0 && cb+dc<=cols)
%compute the error and check if it is lower then the previous or not
block=I1(rb+dr-blockSize+1:rb+dr,cb+dc-blockSize+1:cb+dc,1);
TE=sum( sum( abs( block - cell2mat(B2(r,c)) ) ) );
if(TE<E)
M(r,c,:)=[dr dc]; %store the motion vector
Err(r,c,:)=TE; %store th error
E=TE;
end
end
end
end
%reset the error for the next search
E=255*blockSize^2;
end
end
C++ doesn't natively support ranges of the kind you know from MatLab, although external solutions are available, if somewhat of an overkill for your use case. However, C++ allows you to implement them easily (and efficiently) using the primitives provided by the language, such as for loops and resizable arrays. For example:
// Return a vector consisting of
// {0, -limit, -limit+1, ..., limit-1, limit}.
std::vector<int> build_range0(int limit)
{
std::vector<int> ret{0};
for (auto i = -limit; i <= limit; i++)
ret.push_back(i);
return ret;
}
The resulting vector can be easily used for iteration:
for (int dr: build_range0(search)) {
for (int dc: build_range0(search)) {
if (rb + dr - blockSize + 1 > 0 && ...)
...
}
}
The above of course wastes some space to create a temporary vector, only to throw it away (which I suspect happens in your MatLab example as well). If you want to just iterate over the values, you will need to incorporate the loop such as the one in build_range0 directly in your function. This has the potential to reduce readability and introduce repetition. To keep the code maintainable, you can abstract the loop into a generic function that accepts a callback with the loop body:
// Call fn(0), fn(-limit), fn(-limit+1), ..., fn(limit-1), and fn(limit)
template<typename F>
void for_range0(int limit, F fn) {
fn(0);
for (auto i = -limit; i <= limit; i++)
fn(i);
}
The above function can be used to implement iteration by providing the loop body as an anonymous function:
for_range0(search, [&](int dr) {
for_range0(search, [&](int dc) {
if (rb + dr - blockSize + 1 > 0 && ...)
...
});
});
(Note that both anonymous functions capture enclosing variables by reference in order to be able to mutate them.)
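Note that, like the MatLab expression, the sequence 0, -limit, ..., limit contains 0 twice. For a block search that is harmless but redundant. If each offset should be visited exactly once, a variant along these lines could be used; the names for_range0_once and offsets_once are hypothetical and not part of the answer above.

```cpp
#include <vector>

// Call fn(0) first, then fn(i) for every i in [-limit, limit] except 0,
// so that each offset is visited exactly once.
template <typename F>
void for_range0_once(int limit, F fn) {
    fn(0);
    for (int i = -limit; i <= limit; ++i) {
        if (i != 0)
            fn(i);
    }
}

// Usage example: collect the visited offsets in order.
std::vector<int> offsets_once(int limit) {
    std::vector<int> seen;
    for_range0_once(limit, [&](int d) { seen.push_back(d); });
    return seen;
}
```

With limit = 2 the visited offsets are 0, -2, -1, 1, 2.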
Reading your comment, you could do something like this:
bool zero = true; // true only during the first pass
for (int i = 0; i < 5; i++)
{
    cout << "hi" << endl;
    if (zero)
    {
        i = 3;
        zero = false;
    }
}
This starts at 0, then, after the body has run once, assigns i the value 3; the loop then continues incrementing i on each iteration.

Large vector "Segmentation fault" error

I have gathered a large amount of extremely useful information from other peoples' questions and answers on SO, and have searched duly for an answer to this one as well. Unfortunately I have not found a solution to this problem.
The following function to generate a list of primes:
void genPrimes (std::vector<int>* primesPtr, int upperBound = 10)
{
std::ofstream log;
log.open("log.txt");
std::vector<int>& primesRef = *primesPtr;
// Populate primes with non-neg reals
for (int i = 2; i <= upperBound; i++)
primesRef.push_back(i);
log << "Generated reals successfully." << std::endl;
log << primesRef.size() << std::endl;
// Eratosthenes sieve to remove non-primes
for (int i = 0; i < primesRef.size(); i++) {
if (primesRef[i] == 0) continue;
int jumpStart = primesRef[i];
for (int jump = jumpStart; jump < primesRef.size(); jump += jumpStart) {
if (primesRef[i+jump] == 0) continue;
primesRef[i+jump] = 0;
}
}
log << "Executed Eratosthenes Sieve successfully.\n";
for (int i = 0; i < primesRef.size(); i++) {
if (primesRef[i] == 0) {
primesRef.erase(primesRef.begin() + i);
i--;
}
}
log << "Cleaned list.\n";
log.close();
}
is called by:
const int SIZE = 500;
std::vector<int>* primes = new std::vector<int>[SIZE];
genPrimes(primes, SIZE);
This code works well. However, when I change the value of SIZE to a larger number (say, 500000), the program crashes with a "segmentation fault" error. I'm not familiar enough with vectors to understand the problem. Any help is much appreciated.
You are accessing primesRef[i + jump] where i could be primesRef.size() - 1 and jump could be primesRef.size() - 1, leading to an out of bounds access.
It also happens with a limit of 500; you just happen not to see any bad side effects from the out-of-bounds access at the moment.
Also note that using a vector here is a bad choice as every erase will have to move all of the following entries in memory.
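A sketch of a sieve that avoids both problems (no index can exceed the container size, and nothing is erased from the middle of a vector) is shown below. The function name gen_primes and its return-by-value interface are assumptions for illustration and differ from the question's genPrimes.

```cpp
#include <cstddef>
#include <vector>

// Sieve of Eratosthenes: return all primes <= upperBound.
// composite[k] is indexed by the number k itself, so every access stays
// within [0, upperBound].
std::vector<int> gen_primes(int upperBound) {
    std::vector<int> primes;
    if (upperBound < 2) return primes;

    std::vector<bool> composite(static_cast<std::size_t>(upperBound) + 1, false);
    for (int i = 2; i <= upperBound; ++i) {
        if (composite[static_cast<std::size_t>(i)]) continue;
        primes.push_back(i);  // primes are appended, never erased
        // Start at i*i; use long long so the product cannot overflow int.
        for (long long j = static_cast<long long>(i) * i; j <= upperBound; j += i)
            composite[static_cast<std::size_t>(j)] = true;
    }
    return primes;
}
```

Calling gen_primes(30) yields 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.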
Are you sure you wanted to do
new std::vector<int> [500];
and not
new std::vector<int> (500);
In the latter case, you are specifying the size of the vector, whose location is available to you via the variable named 'primes'.
In the former, you are requesting space for 500 vectors, each sized to the default that the STL library wants.
That would be something like (on my system : 24*500 bytes). In the latter case, 500 length vector(only one vector) is what you are asking for.
EDIT: look at the usage - he needs just one vector.
std::vector<int>& primesRef = *primesPtr;
The problem lies here:
// Populate primes with non-neg reals
for (int i = 2; i <= upperBound; i++)
primesRef.push_back(i);
You only push back N-1 elements (the values 2 through N), but then index the vector at i+jump, which can go past the last element. The fact that it did not fail with 500 is just dumb luck: the memory being overwritten happened not to be catastrophic.
This code works well. However, when I change the value of SIZE to a larger number (say, 500000), ...
That may blow your stack, as the array may be too big to be allocated on it. You need dynamic memory allocation for all of the std::vector<int> instances you believe you need.
To achieve that, simply use a nested std::vector like this:
std::vector<std::vector<int>> primes(SIZE);
instead.
But, to be straight, I seriously doubt you need SIZE vector instances to store all of the prime numbers found; you probably need just a single one, initialized like this:
std::vector<int> primes(SIZE);

openMP slows down when passing from 2 to 4 threads doing binary searches in a custom container

I'm currently having a problem parallelizing a program in C++ using OpenMP. I am implementing a recommendation system with a user-based collaborative filtering method. To do that, I implemented a sparse_matrix class as a dictionary of dictionaries (similar to a Python dictionary). In my case, since insertion is only done at the beginning of the algorithm, when data is read from file, I implemented a dictionary as a std::vector of (key, value) pair objects with a flag that indicates whether the vector is sorted. If the vector is sorted, a key is looked up using binary search; otherwise the vector is first sorted and then searched. Alternatively, it is possible to scan the dictionary's entries linearly, for example in loops over all the keys of the dictionary. The relevant portion of the code that is causing problems is the following:
void compute_predicted_ratings_omp (sparse_matrix &targets,
sparse_matrix &user_item_rating_matrix,
sparse_matrix &similarity_matrix,
int k_neighbors)
{
// Auxiliary private variables
int user, item;
double predicted_rating;
dictionary<int,double> target_vector, item_rating_vector, item_similarity_vector;
#pragma omp parallel shared(targets, user_item_rating_matrix, similarity_matrix)\
private(user, item, predicted_rating, target_vector, item_rating_vector, item_similarity_vector)
{
if (omp_get_thread_num() == 0)
std::cout << " - parallelized on " << omp_get_num_threads() << " threads: " << std::endl;
#pragma omp for schedule(dynamic, 1)
for (size_t iter_row = 0; iter_row < targets.nb_of_rows(); ++iter_row)
{
// Retrieve target user
user = targets.row(iter_row).get_key();
// Retrieve the user rating vector.
item_rating_vector = user_item_rating_matrix[user];
for (size_t iter_col = 0; iter_col < targets.row(iter_row).value().size(); ++iter_col)
{
// Retrieve target item
item = targets.row(iter_row).value().entry(iter_col).get_key();
// retrieve similarity vector associated to the target item
item_similarity_vector = similarity_matrix[item];
// Compute predicted rating
predicted_rating = predict_rating(item_rating_vector,
item_similarity_vector,
k_neighbors);
// Set result in targets
targets.row(iter_row).value().entry(iter_col).set_value(predicted_rating);
}
}
}
}
In this function I compute the predicted rating for a series of target pairs (user, item) (this is simply a weighted average). To do that, I do an outer loop on the target users (which are on the rows of the targets sparse matrix) and I retrieve the rating vector for the current user performing a binary search on the rows of the user_item_rating_matrix. Then, for each column in the current row (i.e. for each item) I retrieve another vector associated to the current item from the sparse matrix similarity_matrix. With these two vectors, I compute the prediction as a weighted average of their elements (on a subset of the items in common between the two vectors).
My problem is the following: I want to parallelize the outer loop using OpenMP. In the serial version, this function takes around 3 seconds. With OpenMP on 2 threads, it takes around 2 seconds (which is not bad, since I still have some work imbalance in the outer loop). When using 4 threads, it takes 7 seconds. I cannot understand the cause of this slowdown. Do you have any idea?
I have already thought about the problem and I share my considerations with you:
I access the sparse_matrices only in read mode. Since the matrices are pre-sorted, the binary searches should not modify the matrices, and no race conditions should arise.
Various threads could access the same vector of the sparse matrix at the same time. I read something about false sharing, but since I do not write to these vectors I think this should not be the reason for the slowdown.
The parallel version seems to work fine with two threads (even if the speedup is lower than expected).
No problem is observed with 4 threads for other choices of the parameters. In particular (cf. "Further details on predict_rating function" below), when I consider all the similar items for the weighted average and I scan the rating vector and search in the similarity vector (the opposite of what I normally do), the execution time scales well on 4 threads.
Further details on predict_rating function: This function works in the following way. The smaller of item_rating_vector and item_similarity_vector is scanned linearly, and I do a binary search on the larger of the two. If the rating/similarity is positive, it is considered in the weighted average.
double predict_rating (dictionary<int, double> &item_rating_vector,
dictionary<int, double> &item_similarity_vector)
{
size_t size_item_rating_vector = item_rating_vector.size();
size_t size_item_similarity_vector = item_similarity_vector.size();
if (size_item_rating_vector == 0 || size_item_similarity_vector == 0)
return 0.0;
else
{
double s, r, sum_s = 0.0, sum_sr = 0.0;
int temp_item = 0;
if (size_item_rating_vector < size_item_similarity_vector)
{
// Scan item_rating_vector and search in item_similarity_vector
for (dictionary<int,double>::const_iterator iter = item_rating_vector.begin();
iter != item_rating_vector.end();
++iter)
{
// scan the rating vector forwards: iterate until the whole vector has
// been scanned.
temp_item = (*iter).get_key();
// Retrieve similarity between temp_item and the target item (0.0 if absent)
try { s = item_similarity_vector[temp_item]; }
catch (const std::out_of_range &e) { s = 0.0; }
if (s > 0.0)
{
// temp_item is positively similar to the target item. consider it in the average
// Retrieve rating that the user gave to temp_item
r = (*iter).get_value();
// increment the sums
sum_s += s;
sum_sr += s * r;
}
}
}
else
{
// Scan item_similarity_vector and search in item_rating_vector
for (dictionary<int,double>::const_iterator iter = item_similarity_vector.begin();
iter != item_similarity_vector.end();
++iter)
{
// scan the similarity vector forwards: iterate until the whole vector has
// been scanned.
temp_item = (*iter).get_key();
s = (*iter).get_value();
if (!(s > 0.0))
continue;
// Retrieve rating that user gave to temp_item (0.0 if not given)
try { r = item_rating_vector[temp_item]; }
catch (const std::out_of_range &e) { r = 0.0; }
if (r > 0.0)
{
// temp_item is positively similar to the target item: increment the sums
sum_s += s;
sum_sr += s * r;
}
}
}
if (sum_s > 0.0)
return sum_sr / sum_s;
else
return 0.0;
}
}
Further details on the hardware: I am running this program on a Dell XPS 15 with a quad-core i7 processor and 16 GB RAM. I execute the code in a Linux VirtualBox VM (I set the VM to use 4 processors and 4 GB RAM).
Thanks in advance,
Pierpaolo
It appears you might have a false sharing problem with your targets variable. False sharing is when different threads frequently write to locations near each other (same cache line). By explicitly setting the schedule to dynamic with a chunk size of 1, you are telling OpenMP to only have each thread take tasks one element at a time, thus allowing different threads to work on data that may be near each other in targets.
I would recommend removing the schedule directive just to see how the default scheduler and chunk size do. Then I would try both static and dynamic schedules while varying the chunk size substantially. If your workload or hardware platform is unbalanced, dynamic will probably win, but I would still try static.
Well, I found the solution to the problem myself, and I am posting the explanation for the community. In the predict_rating function I used try/catch to handle out_of_range errors thrown by my dictionary structure when a key that is not contained in the dictionary is searched. I read in "Are exceptions in C++ really slow" that exception handling is computationally heavy when an exception is actually thrown. In my case, each call of predict_rating threw and handled multiple out_of_range errors. I simply removed the try/catch blocks and wrote a function that searches the dictionary and returns a default value if the key does not exist. This modification produced a speedup of around 2000x, and now the program scales well with the number of threads, even on the VM.
Thanks to all of you and if you have other suggestions don't hesitate!
Pierpaolo
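A lookup that returns a default instead of throwing could look roughly like this. It is a sketch under assumptions: the dictionary class itself is not shown in the post, so a sorted std::vector of (key, value) pairs stands in for it, and the names Entries and value_or are hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Stand-in for the dictionary: (key, value) pairs sorted by key.
using Entries = std::vector<std::pair<int, double>>;

// Binary search for `key`; return its value, or `fallback` if it is absent.
// No exception is thrown on a miss, so misses stay cheap.
double value_or(const Entries& sorted_entries, int key, double fallback) {
    auto it = std::lower_bound(
        sorted_entries.begin(), sorted_entries.end(), key,
        [](const std::pair<int, double>& entry, int k) { return entry.first < k; });
    if (it != sorted_entries.end() && it->first == key)
        return it->second;
    return fallback;
}
```

In predict_rating, a call such as value_or(entries, temp_item, 0.0) would then replace each try/catch pair.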