Why does my ConcurrentSkipListSet get stuck during a multi-threaded add? - java.util.concurrent

I want to test the performance of ConcurrentSkipListSet vs. ConcurrentLinkedQueue, so I wrote a test:
ConcurrentSkipListSet<Integer> concurrentSkipListSet = new ConcurrentSkipListSet<>((o1, o2) -> { return 1; });
HashSet<Callable<Integer>> sets = new HashSet<>();
for (int i = 0; i < 1000; i++) {
    final int j = i;
    sets.add(() -> {
        concurrentSkipListSet.add(j);
        System.out.println(j);
        return null;
    });
}
Long c = System.currentTimeMillis();
System.out.println(c);
ExecutorService service = Executors.newFixedThreadPool(10);
try {
    service.invokeAll(sets);
} catch (Exception e) {}
System.out.println(System.currentTimeMillis() - c);
I am confused: the program gets stuck after printing about 20~50 values of j, and it does not finish even after about an hour. If I change the loop bound to i < 10, it sometimes finishes in about 3 milliseconds and sometimes gets stuck after printing about 4~5 values of j.
A newCachedThreadPool performs the same as the newFixedThreadPool, in both IDEA and Eclipse.
Please help me analyze it, thank you.
Now I think it is not a problem with newCachedThreadPool but with concurrentSkipListSet.add(j): when I changed the ConcurrentSkipListSet to a ConcurrentLinkedQueue or a synchronized HashSet, it worked well and finished in about 168 or 170 milliseconds.

The problem may be in the comparator you're supplying to the ConcurrentSkipListSet constructor. It's always returning 1 which may lead to some kind of infinite loop in ConcurrentSkipListSet implementation. You could use ConcurrentSkipListSet constructor with no parameters to use natural ordering for Integer.
Consider what's going on when you're always returning 1 from a comparator:
Suppose we have two objects, A and B. A sorting algorithm at some point may ask your comparator "is A greater than B?" by calling compare(A, B). You return 1, which means that indeed A > B and B should precede A in sorted order. Then at some point there's a chance that the algorithm will ask "is B greater than A?", and your compare(B, A) will also return 1, which means B > A and A should precede B in sorted order.
You can see that this comparator behavior is completely inconsistent. For some algorithms this may lead to infinite loops; for instance, an algorithm may endlessly swap a pair of elements.


How do I calculate the time complexity of the following function?

Here is a recursive function that traverses a map of strings (multimap<string, string> graph). It checks each itr->second (s_tmp); if s_tmp is equal to the desired string (Exp), it prints itr->first and the function is executed again for that itr->first.
string findOriginalExp(string Exp) {
    cout << "*****findOriginalExp Function*****" << endl;
    string str;
    if (graph.empty()) {
        str = "map is empty";
    } else {
        for (auto itr = graph.begin(); itr != graph.end(); itr++) {
            string s_tmp = itr->second;
            string f_tmp = itr->first;
            string nll = "null";
            //s_tmp.compare(Exp) == 0
            if (s_tmp == Exp) {
                if (f_tmp.compare(nll) == 0) {
                    cout << Exp << " :is original experience.";
                    return Exp;
                } else {
                    return findOriginalExp(itr->first);
                }
            } else {
                str = "No element is equal to Exp.";
            }
        }
    }
    return str;
}
There are no rules for stopping and it seems to be completely random. How is the time complexity of this function calculated?
I am not going to analyse your function but will instead try to answer in a more general way. It seems like you are looking for a simple expression such as O(n) or O(n^2) for the complexity of your function. However, complexity is not always that simple to estimate.
In your case it strongly depends on the contents of graph and on what the user passes as the parameter.
As an analogy consider this function:
int foo(int x) {
    if (x == 0) return x;
    if (x == 42) return foo(42);
    if (x > 0) return foo(x - 1);
    return foo(x / 2);
}
In the worst case it never returns to the caller. If we ignore x >= 42 then the worst-case complexity is O(n). This alone isn't that useful as information for the user. What I really need to know as a user is:
Don't ever call it with x >= 42.
O(1) if x==0
O(x) if x>0
O(log(|x|)) if x < 0
Now try to make similar considerations for your function. The easy case is when Exp is not in graph; in that case there is no recursion. I am almost sure that for the "right" input your function can be made to never return. Find out what those cases are and document them. In between you have cases that return after a finite number of steps. If you have no clue at all how to get your hands on them analytically, you can always set up a benchmark and measure. Measuring the runtime for input sizes 10, 50, 100, 1000, ... should be sufficient to distinguish between linear, quadratic and logarithmic dependence.
PS: Just a tip: Don't forget what the code is actually supposed to do and what time complexity is needed to solve that problem (often it is easier to discuss that in an abstract way rather than diving too deep into code). In the silly example above the whole function can be replaced by its equivalent int foo(int){ return 0; } which obviously has constant complexity and does not need to be any more complex than that.
This function takes a directed graph and a vertex in that graph and chases edges going into it backwards to find a vertex with no edge pointing into it. The operation of finding the vertex "behind" any given vertex takes O(n) string comparisons in n the number of k/v pairs in the graph (this is the for loop). It does this m times, where m is the length of the path it must follow (which it does through the recursion). Therefore, it has time complexity O(m * n) string comparisons in n the number of k/v pairs and m the length of the path.
Note that there's generally no such thing as "the" time complexity for just some function you see written in code. You have to define what variables you want to describe the time in terms of, and also the operations with which you want to measure the time. E.g. if we want to write this purely in terms of n the number of k/v pairs, you run into a problem, because if the graph contains a suitably placed cycle, the function doesn't terminate! If you further constrain the graph to be acyclic, then the maximum length of any path is constrained by m < n, and then you can also get that this function does O(n^2) string comparisons for an acyclic graph with n edges.
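For instance (a hypothetical input, not taken from the question), a graph that contains a cycle and has no "null" entry sends findOriginalExp chasing its own tail; the includes and using-directives from the question are assumed:

multimap<string, string> graph = {
    {"A", "B"},   // B was derived from A...
    {"B", "A"}    // ...and A was derived from B: a cycle
};

findOriginalExp("A") finds the entry whose second is "A" (that is {"B", "A"}), sees that "B" is not "null", and recurses on "B"; the same reasoning leads straight back to "A", so the calls never terminate (in practice the program dies with a stack overflow).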
You should approximate the control flow of the recursive calls with a recurrence relation. It's been about 30 years since I took college classes in discrete math, but generally you write something like pseudocode, just enough to see how many calls there are. In some cases just counting how many calls appear on the right-hand side of the longest case is useful, but you generally need to plug one expansion back in and from that derive a polynomial or power relationship.

In the subset sum problem, when I use memo[sum][size] I have to add a sum < 0 check, but not in the case of memo[size][sum]. I do not know why, please explain.

#include <bits/stdc++.h>
using namespace std;

int issubseset(vector<int> subset, int size, int sum, vector<vector<int>>& memo) {
    // if(sum < 0) return 0;
    if (sum == 0) return 1;
    if (size < 0) return 0;
    if (subset[size] > sum) issubseset(subset, size - 1, sum, memo);
    if (memo[size][sum] >= 0) return memo[size][sum];
    memo[size][sum] = issubseset(subset, size - 1, sum - subset[size], memo) || issubseset(subset, size - 1, sum, memo);
    return memo[size][sum];
}

int main() {
    vector<int> subset{3, 34, 4, 12, 5, 2};
    int sum = 9;
    std::cout << subset.size() << std::endl;
    vector<vector<int>> memo(subset.size(), vector<int>(sum + 1, INT_MIN));
    printf("%s", issubseset(subset, subset.size() - 1, sum, memo) ? "true" : "false");
}
Question:
Given a set of non-negative integers, and a value sum, determine if there is a subset of the given set with sum equal to given sum.
When I interchange the memo 2D array from memo[size][sum] to memo[sum][size], I have to uncomment the first line in the issubseset function. If I am just changing the shape of memo, it should not have any effect, since the array will be filled as per the recursion and I am already covering the base cases. If memo[size][sum] can work without the if(sum < 0) line, why can't memo[sum][size]?
Your code exhibits undefined behaviour thanks to sum being used as an index even though it is sometimes negative. This is true of the code you posted as well as the equivalent with the shape of memo changed.
To find out why this happens, we'll have to look closely at your code. I'll reproduce it here with a couple of helpful labels:
#include <bits/stdc++.h>
using namespace std;

int issubseset(vector<int> subset, int size, int sum, vector<vector<int>>& memo) {
    // (1)
    if (sum == 0) return 1;
    if (size < 0) return 0;
    // (2)
    if (subset[size] > sum) issubseset(subset, size - 1, sum, memo);
    // (3)
    if (memo[size][sum] >= 0) return memo[size][sum];
    memo[size][sum] = issubseset(subset, size - 1, sum - subset[size], memo) || issubseset(subset, size - 1, sum, memo);
    return memo[size][sum];
}
Now let's walk through the code, assuming sum is negative and size is non-negative. If we get to (3), we've encountered undefined behaviour.
The checks for base cases at (1) do not trigger in this case, so execution carries on.
Now we're at (2), which is a very important line. It is the last line before the potentially troublesome (3), so there's a lot riding on it. We had better be sure it doesn't let execution go to (3). Unfortunately, even without looking deeply, we can tell that it's not up to the task: there isn't any control flow in this line (aside from the branching for the if of course). There's no question about it now: execution will definitely go ahead to (3), resulting in undefined behaviour.
Thankfully the fix is easy. Add a return for the recursive call in (2):
// (2)
if(subset[size]>sum) return issubseset(subset,size-1,sum,memo);
This will prevent execution from continuing to (3) whenever sum is negative: since subset[size] is non-negative and sum is negative, subset[size] > sum will be true and the return path will be taken. I'll leave it to you to determine whether this is the correct thing to do for your given problem.
The same analysis holds when the shape of memo is changed. The fact that you only noticed a problem with one shape and not the other is luck of the draw, really. There is no "why"; it just happens to be that way. Either version of the code could literally have done (or not done) anything else (we don't call it undefined behaviour for nothing). I'll avoid going on a tangent about best practices, but I will give one piece of advice: use .at() instead of [], at least until you've proven the code correct (and even then, keeping .at() around may not be a bad idea). .at() will check each index and will scream at you (throw an exception) if it is invalid. Unlike [], .at() will not silently break your code when given a bad index, making it much nicer from a debugging standpoint.
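To tie the two suggestions together, here is a minimal sketch of the function with the return added at (2) and .at() used in place of [] (the subset parameter is also taken by const reference as a small extra tweak; main from the question is assumed unchanged):

#include <vector>

// Sketch: with the return at (2), sum can no longer be negative at (3),
// and .at() throws std::out_of_range instead of silently misindexing.
int issubseset(const std::vector<int>& subset, int size, int sum,
               std::vector<std::vector<int>>& memo) {
    if (sum == 0) return 1;                 // (1) exact sum reached
    if (size < 0) return 0;                 // (1) ran out of elements
    if (subset.at(size) > sum)              // (2) element too big: skip it
        return issubseset(subset, size - 1, sum, memo);
    if (memo.at(size).at(sum) >= 0)         // (3) already computed
        return memo.at(size).at(sum);
    memo.at(size).at(sum) =
        issubseset(subset, size - 1, sum - subset.at(size), memo) ||
        issubseset(subset, size - 1, sum, memo);
    return memo.at(size).at(sum);
}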

Is there any elegant way of iterating through a list whose elements' positions can change?

I am currently running into a nasty problem. Suppose there is a list aList of objects (whose type we call Object), and I want to iterate through it. Basically, the code would look like this:
for (int i = 0; i < aList.Size(); ++i)
{
    aList[i].DoSth();
}
The difficult part here is, the DoSth() method could change the caller's position in the list! So two consequences could occur: first, the iteration might never be able to come to an end; second, some elements might be skipped (the iteration is not necessarily like above, since it might be a linked list). Of course, the first one is the major concern.
The problem must be solved with these constraints:
1) The possibility of doing position-exchanging operations cannot be excluded;
2) The position-exchanging operations can be delayed until the iteration finishes, if necessary and doable;
3) Since it happens quite often, the iteration can be modified only minimally (so actions like creating a copy of the list is not recommended).
The language I'm using is C++, but I think there are similar problems in Java and C#, etc.
The following are what I've tried:
a) Try forbidding the position-exchanging operations during the iteration. However, that involves too many client code files and it's just not practical to find and modify all of them.
b) Modify every single method (e.g., Method()) of Object that can change its position and that will be called by DoSth() directly or indirectly, in this way: first, we can know that aList is in the middle of its iteration, and we'll treat Method() accordingly. If the iteration is in progress, then we delay what Method() wants to do; otherwise, it does it right away. The question here is: what is the best (easy-to-use, yet efficient enough) way of delaying a function call here (a rough sketch of one possibility appears right after this question)? The parameters of Method() could be rather complex. Moreover, this approach will involve quite a few functions, too!
c) Try modifying the iteration process. The real situation I encounter here is quite complex because it involves two layers of iterations: the first of them is a plain array iteration, while the second is a typical linked list iteration lying in a recursive function. The best I can do about the second layer of iteration for now, is to limit its iteration times and prevent the same element from being iterated more than once.
So I guess there could be some better way to tackle this problem? Maybe some awesome data structure will help?
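For what it's worth, here is a rough, self-contained sketch of the deferral idea from (b): during the iteration DoSth() only records the position-changing work as callbacks, and the callbacks run once the iteration is over. Object, MoveSomewhereElse and the id field are made-up placeholders, not the real types from the question:

#include <functional>
#include <iostream>
#include <list>
#include <vector>

struct Object {
    int id;

    // Hypothetical position-changing operation that DoSth would normally call.
    void MoveSomewhereElse() { std::cout << id << " moves\n"; }

    // DoSth records the move as a callback instead of performing it immediately.
    void DoSth(std::vector<std::function<void()>>& deferred) {
        deferred.push_back([this] { MoveSomewhereElse(); });
    }
};

int main() {
    std::list<Object> aList{{1}, {2}, {3}};
    std::vector<std::function<void()>> deferred;

    for (Object& obj : aList)   // the iteration order is never disturbed...
        obj.DoSth(deferred);
    for (auto& op : deferred)   // ...because the moves run only afterwards
        op();
}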
Your question is a little light on detail, but from what you have written it seems that you are making the mistake of mixing concerns.
It is likely that your object can perform some action that causes it to either continue to exist or not. The decision that it should no longer exist is a separate concern to that of actually storing it in a container.
So let's split those concerns out:
#include <vector>

enum class ActionResult {
    Dies,
    Lives,
};

struct Object
{
    ActionResult performAction();
};

using Container = std::vector<Object>;

void actions(Container& cont)
{
    // end(cont) is re-evaluated each time around, because erase()
    // invalidates any previously saved end iterator
    for (auto first = begin(cont); first != end(cont); )
    {
        auto result = first->performAction();
        switch (result)
        {
        case ActionResult::Dies:
            first = cont.erase(first); // object wants to die, so remove it
            break;
        case ActionResult::Lives:      // object wants to live, so continue
            ++first;
            break;
        }
    }
}
If there are indeed only two results of the operation, lives and dies, then we could express this iteration idiomatically:
#include <algorithm>
// ...

void actions(Container& cont)
{
    auto actionResultsInDeath = [](Object& o)
    {
        auto result = o.performAction();
        return result == ActionResult::Dies;
    };
    cont.erase(std::remove_if(begin(cont), end(cont),
                              actionResultsInDeath),
               end(cont));
}
Well, problem solved, at least in regard to the situation I'm interested in right now. In my situation, aList is really a linked list and the Object elements are accessed through pointers. If the size of aList is relatively small, then we have an elegant solution just like this:
void Object::DoSthBig()
{
    Object* pNext = GetNext();
    if (pNext)
        pNext->DoSthBig();
    DoSth();
}
This has the underlying hypothesis that each pNext keeps being valid during the process. But if the element-deletion operation has already been dealt with discreetly, then everything is fine.
Of course, this is a very special example and is unable to be applied to other situations.

for(int i=0;i<myVector.size();++i) How many times is size() called?

If I have myVector, which is an STL vector, and execute a loop like this:
for (int i = 0; i < myVector.size(); ++i) { ... }
Does the C++ compiler play some trick to call size() only once, or will it be called size()+1 times?
I am a little confused; can anyone help?
Logically, myVector.size() will be called each time the loop is iterated - or at least the compiler must produce code as if it's called each time.
If the optimizer can determine that the size of the vector will not change in the body of the loop, it could hoist the call to size() outside the loop. Note that usually, vector::size() is an inline that's just a simple difference between pointers to the end and beginning of the vector (or something similar - maybe a simple load of a member that keeps track of the number of elements).
So there's actually probably little reason for concern about what happens for vector::size().
Note that list::size() could be a different story - the C++03 standard permits it to be linear complexity (though I think this is rare, and the C++0x standard changes list::size() requirements to be constant complexity).
I'm assuming that the vector doesn't change size in the loop. If it does change size, it's impossible to tell without knowing how it changes size.
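A deliberately silly sketch of why that matters: here the condition has to be re-evaluated on every pass, and the loop never finishes because the vector grows exactly as fast as i does:

#include <vector>

int main() {
    std::vector<int> myVector{0};
    // size() grows by one every iteration, so i never catches up:
    // the loop only stops once memory runs out.
    for (unsigned int i = 0; i < myVector.size(); ++i) {
        myVector.push_back(0);
    }
}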
On the C++ abstract machine it will be called exactly size()+1 times. And on a concrete implementation it will have an observable behaviour equivalent to it having been called size()+1 times (this is called the as if rule).
This means that the compiler can choose to call it just once, because the observable behaviour is the same. In fact, by following the as if rule, if the body of the loop is empty, the compiler can even choose to not call it at all and just skip the whole thing altogether. The observable behaviour is the same, because making your code run faster is not considered different observable behaviour.
It will be called size + 1 times. Changing the size of the vector will affect the number of iterations.
It may be called once, may be called size+1 times, or it may never be called at all. Assuming that the vector size doesn't change, your program will behave as if it had been called size+1 times.
There are two optimizations at play here: first, std::vector::size() is probably inlined, so it may never be "called" at all in the traditional sense. Second, the compiler may determine that it needs to evaluate size() only once, or perhaps never:
For example, this code might never evaluate std::vector::size():
for(int i = 0; i < myVector.size(); ++i) { ; }
Either of these loops might evaluate std::vector::size() only once:
for(int i = 0; i < myVector.size(); ++i) { std::cout << "Hello, world.\n"; }
for(int i = 0; i < myVector.size(); ++i) { sum += myVector[i]; }
While this loop might evaluate std::vector::size() many times:
for(int i = 0; i < myVector.size(); ++i) { ExternalFunction(&myVector); }
In the final analysis, the key questions are:
Why do you care?, and
How would you know?
Why do you care how many times size() is invoked? Are you trying to make your program go faster?
How would you even know? Since size() has no visible side effects, how would you even know how many times it was called (or otherwise evaluated)?
It will be called size() + 1 times (it may be that the compiler can recognize it as invariant in the loop, but you shouldn't count on it)
It will be called until the condition is falsified (size() could change each time, for example). If size() remains constant, that's size() + 1 times.
From the MSDN page about for:
for ( init-expression ; cond-expression ; loop-expression )
    statement

cond-expression: An expression that evaluates to an integral type or a class type that has an unambiguous conversion to an integral type; normally used to test for loop-termination criteria. It is evaluated before execution of each iteration of statement, including the first iteration; statement is executed only if cond-expression evaluates to true (nonzero).
It will be called size + 1 times, as Ernest has mentioned. However, if you are sure that the size is not changing, you can apply an optimisation and make your code look like this:
for (unsigned int i = 0, e = myVector.size(); i < e; ++i)
... in which case size() will be called only once.
Actually myVector.size() will be inlined, so there will be no call at all; just a comparison of a register value with a memory location. Of course, I am talking about a release build.
In a debug build it will be called size()+1 times.
EDIT: No doubt there exists some compiler or STL implementation that cannot optimize myVector.size(), but the chances of running into one are very low.

STL map insertion efficiency: [] vs. insert

There are two ways of map insertion:
m[key] = val;
Or
m.insert(make_pair(key, val));
My question is, which operation is faster?
People usually say the first one is slower, because the STL standard first inserts a default-constructed element if 'key' does not already exist in the map, and then assigns 'val' to that element.
But I don't see why the second way would be better because of 'make_pair'. make_pair is really just a convenient way to make a 'pair' compared to pair<T1, T2>(key, val). Either way, both do two assignments: one assigns 'key' to 'pair.first' and the other assigns 'val' to 'pair.second'. After the pair is made, the map inserts an element initialized from it.
So the first way is: 1. default construction of typeof(val), 2. assignment;
the second way is: 1. assignment, 2. copy construction of typeof(val).
Both accomplish different things.
m[key] = val;
Will insert a new key-value pair if the key doesn't exist already, or it will overwrite the old value mapped to the key if it already exists.
m.insert(make_pair(key, val));
Will only insert the pair if the key doesn't exist yet; it will never overwrite the old value. So choose according to what you want to accomplish.
For the question of which is more efficient: profile. :P Probably the first way, I'd say. The assignment (a.k.a. copy) happens in both ways, so the only difference lies in construction. As we all know and should implement, a default construction should basically be a no-op, and thus be very efficient. A copy is exactly that - a copy. So in way one we get a "no-op" and a copy, and in way two we get two copies.
Edit: In the end, trust what your profiling tells you. My analysis was off, as #Matthieu mentions in his comment, but that was my guess. :)
Then we have C++0x coming, and the double copy in the second way will be naught, as the pair can simply be moved now. So in the end, I think it falls back on my first point: use the right way to accomplish the thing you want to do.
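A small sketch of that semantic difference (the key and values are made up for the example):

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<int, std::string> m;

    m[1] = "first";                        // inserts {1, "first"}
    m[1] = "second";                       // overwrites: the value is now "second"

    m.insert(std::make_pair(1, "third")); // key 1 already exists: no effect

    std::cout << m[1] << '\n';            // prints "second"
}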
On a lightly loaded system with plenty of memory, this code:
#include <map>
#include <iostream>
#include <ctime>
#include <string>
using namespace std;

typedef map<unsigned int, string> MapType;

const unsigned int NINSERTS = 1000000;

int main() {
    MapType m1;
    string s = "foobar";
    clock_t t = clock();
    for (unsigned int i = 0; i < NINSERTS; i++) {
        m1[i] = s;
    }
    cout << clock() - t << endl;
    MapType m2;
    t = clock();
    for (unsigned int i = 0; i < NINSERTS; i++) {
        m2.insert(make_pair(i, s));
    }
    cout << clock() - t << endl;
}
produces:
1547
1453
or similar values on repeated runs. So insert is (in this case) marginally faster.
Performance-wise, I think they are mostly the same in general. There may be some exceptions for a map with large objects, in which case you should use [] or perhaps emplace, which creates fewer temporary objects than 'insert'. See the discussion here for details.
You can, however, get a performance bump in special cases if you use the 'hint' overload of insert. So, looking at option 2 from here:
iterator insert (const_iterator position, const value_type& val);
the 'insert' operation can be reduced to amortised constant time (from O(log n)) if you give a good hint (often the case if you know you are adding things at the back of your map).
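A sketch of that hinted form for the common "appending in key order" case (the keys and value here are made up for the example):

#include <map>
#include <string>

int main() {
    std::map<int, std::string> m;

    // Keys arrive in increasing order, so every new element belongs at the end.
    // Passing m.end() as the hint makes each insertion amortised constant time
    // instead of O(log n).
    for (int key = 0; key < 1000; ++key) {
        m.insert(m.end(), std::make_pair(key, "value"));
    }
}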
We have to refine the analysis by mentioning that the relative performance depends on the type (size) of the objects being copied as well.
I did a similar experiment (to nbt's) with a map of (int -> set). I know it is a terrible thing to do, but it is illustrative for this scenario. The "value", in this case a set of ints, has 20 elements in it.
I execute a million iterations of the []= vs. insert operations and measure RDTSC / iteration count.
[] = set | 10731 cycles
insert(make_pair<>) | 26100 cycles
It shows the magnitude of the penalty added due to the copying. Of course, C++11 (move constructors) will change the picture.
My take on it:
Worth remembering that a map is a balanced binary tree; most of the modifications and checks take O(log N).
It really depends on the problem you are trying to solve.
1) If you just want to insert the value, knowing that it is not there yet, then [] does two things: a) checks whether the item is there or not, and b) if it is not there, creates the pair and does what insert does (double work of O(log N)), so I would use insert.
2) If you are not sure whether it is there or not, then: a) if you checked whether the item is there by doing something like if (map.find(item) == map.end()) a couple of lines above somewhere, then use insert, because of the double work [] would perform (see the sketch below); b) if you didn't check, then it depends, because insert won't modify the value if the key is already there while [] will; otherwise they are equal.
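Regarding the "check first, then insert" pattern in 2a: the pair returned by insert already tells you whether the key was present, so the separate find can be skipped. A sketch with a made-up map:

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, int> m{{"apple", 1}};

    // insert returns {iterator, bool}; the bool says whether an insertion
    // happened, so one call both checks for the key and inserts it if missing.
    auto result = m.insert(std::make_pair(std::string("apple"), 2));
    if (!result.second) {
        std::cout << "already present, value = " << result.first->second << '\n';
    }
}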
My answer is not on efficiency but on safety, which is relevant to choosing an insertion algorithm:
The [] and insert() calls would trigger destructors of the elements. This may have dangerous side effects if, say, your destructors have critical behaviors inside.
After such a hazard, I stopped relying on STL's implicit lazy insertion features and always use explicit checks if my objects have behaviors in their ctors/dtors.
See this question:
Destructor called on object when adding it to std::list