I just want to know if any of you knows what's faster:
L=[1,2,3,4,5], all_different(L). % needs use_module(library(clpfd)).
or
L=[1,2,3,4,5], is_set(L).
Does anyone know? I need the faster solution for my Sudoku solver. Thanks!
Use the predicate time/1 to measure the number of inferences and actual time taken to do the computation.
In your example you would do something like
time((L=[1,2,3,4,5], all_different(L))) vs. time((L=[1,2,3,4,5], is_set(L)))
Note that the time measured is up to the first success.
A distinction between all_different/1 and is_set/1 is that the former uses constraint logic: it can impose a restriction prospectively, before the entries of a list are fully instantiated, so that failure occurs as soon as the Prolog engine is forced to bind two of the list's entries to equal values.
We can illustrate the "constraint logic" of all_different with the following pair of queries:
?- length(L,5), all_different(L), L=[1,2,3,4,5].
L = [1, 2, 3, 4, 5].
?- length(L,5), all_different(L), L=[1,2,3,4,1].
false.
It is necessary to provide a proper list to all_different, but its entries need not be fully bound ("ground"). The above shows that all_different can impose a constraint on a list's entries prospectively.
Compare the results with is_set instead:
?- length(L,5), is_set(L), L=[1,2,3,4,5].
L = [1, 2, 3, 4, 5].
?- length(L,5), is_set(L), L=[1,2,3,4,1].
L = [1, 2, 3, 4, 1].
Once is_set succeeds, it cannot prevent future bindings that create equal entries.
So the predicate all_different relies on extra machinery in the constraint logic library to do what is_set cannot, and in most cases this extra machinery will add overhead. However, in the simple way it was used in viktor's question, the extra machinery is barely exercised: the checks are done on fully bound terms, not prospectively, and the efficiency is comparable.
I am supposed to write a rule in SWI-Prolog which takes a list of characters as input and then, with a probability of 0.01 per letter, replaces that letter by a random other character.
Example:
?- mutate([a,b,c,d,e,f,g],MutatedList).
MutatedList = [a,b,c,a,e,f,g].
Can anyone tell me how that could be implemented? I am totally clueless so far about how this could work out in Prolog.
Thanks to anyone who can help!^^
This is relatively easy. You can use maplist/3 to relate the elements of the lists in a pairwise way. (Take a look at some notes on maplist/3.)
For each pair InputItem, OutputItem taken pairwise from InputList and OutputList, maplist/3 will call a predicate; call it choose(InputItem,OutputItem).
That predicate will relate InputItem either to the same value, InputItem, or to a randomly chosen character (an atom of length 1), which can be generated by selecting randomly from a list of characters. The decision whether to mutate can be made using the arithmetic function random_float, for example.
Of course, choose(InputItem,OutputItem) is not really a predicate (it is just called that way, both in name and at runtime), as it does not behave "predicately" at all, i.e. it will have different outcomes depending on the time of day. It's an oracle getting information from a magic reservoir. But that's okay.
Now you are all set. Not more than 4 lines!
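For instance, a minimal sketch along those lines (the alphabet and the names mutate/2 and choose/2 are assumptions, not part of the original question; random_member/2 comes from library(random)):

:- use_module(library(random)).

% mutate(+InputList, -OutputList): each character is kept as-is, or,
% with probability 0.01, replaced by a random character from an alphabet.
mutate(InputList, OutputList) :-
    maplist(choose, InputList, OutputList).

choose(In, Out) :-
    X is random_float,                    % uniform float in (0.0,1.0)
    (   X < 0.01
    ->  random_member(Out, [a,b,c,d,e,f,g])   % assumed alphabet
    ;   Out = In
    ).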
How can I combine a list of lists into one single list, interleaving the sublists?
For example, recons([[1,2],[3,4]],X) should give X = [1,3,2,4].
I have been trying for hours and my code always gives me very strange results or an infinite loop.
What I was thinking was something like this:
recons([[A|R],REST],List):-
recons(R,REST),
append(A,[R|REST],List).
I know it's completely wrong, but I don't know how to fix this.
Instead of thinking about efficiency, we can think about correctness first of all.
interleaving_join( [[]|X], Y):-
interleaving_join( X, Y).
That much is clear, but what else?
interleaving_join( [[H|T]|X], [H|Y]):-
append( X, [T], X2),
interleaving_join( X2, Y).
But when does it end? When there's nothing more there:
interleaving_join( [], []).
Indeed,
2 ?- interleaving_join([[1,2],[3,4]], Y).
Y = [1, 3, 2, 4] ;
false.
4 ?- interleaving_join([[1,4],[2,5],[3,6,7]], X).
X = [1, 2, 3, 4, 5, 6, 7] ;
false.
This assumes we only want to join the lists inside the list, whatever the elements are, like [[...],[...]] --> [...]. In particular, we don't care whether the elements might themselves be lists, or not.
It might sometimes be interesting to collect all the non-list elements of the inner lists, however deeply nested, into one list (without any nesting structure). Such nested lists are actually trees, and that operation is known as flattening, or collecting the tree's fringe. It is a different problem.
I need to check if a std::set contains any elements in a range. For example, if the set is a set<int> {1, 2, 4, 7, 8}, and given an int interval [3, 5] (inclusive at both endpoints), I need to know if the set has any elements in that interval. In this case, return true. But if the interval is [5, 6], return false. The interval may be [4, 4], but not [5, 3].
Looks like I can use set::lower_bound, but I am not sure whether this is the correct approach. I also want to keep the complexity as low as possible. I believe using lower_bound is logarithmic, correct?
You can use lower_bound and upper_bound together. Your example of testing for elements between 3 and 5, inclusive, could be written as follows:
bool contains_elements_in_range = s.lower_bound(3) != s.upper_bound(5);
You can make the range inclusive or exclusive on either end by switching which function you are using (upper_bound or lower_bound):
s.upper_bound(2) != s.upper_bound(5); // Tests (2, 5]
s.lower_bound(3) != s.lower_bound(6); // Tests [3, 6)
s.upper_bound(2) != s.lower_bound(6); // Tests (2, 6)
Logarithmic time is the best you can achieve for this, since the set is sorted and you need to find an element in the sorted range, which requires a dichotomic search.
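For reference, a self-contained sketch using the set and intervals from the question (contains_in_range is an illustrative name):

#include <iostream>
#include <set>

// True if s contains an element in the inclusive range [lo, hi].
bool contains_in_range(const std::set<int>& s, int lo, int hi) {
    return s.lower_bound(lo) != s.upper_bound(hi);
}

int main() {
    std::set<int> s{1, 2, 4, 7, 8};
    std::cout << std::boolalpha
              << contains_in_range(s, 3, 5) << '\n'   // true: 4 lies in [3, 5]
              << contains_in_range(s, 5, 6) << '\n';  // false: nothing in [5, 6]
}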
If you're certain that you're going to use a std::set, then I agree that its lower_bound method is the way to go. As you say, it will have logarithmic time complexity.
But depending what you're trying to do, your program's overall performance might be better if you use a sorted std::vector and the standalone std::lower_bound algorithm (std::lower_bound(v.begin(), v.end(), 3)). This is also logarithmic, but with a lower constant. (The downside, of course, is that inserting elements into a std::vector, and keeping it sorted, is usually much more expensive than inserting elements into a std::set.)
I'm translating R code to C++ and I'd like to find an equivalent (optimal) structure which would allow the same kind of operations as a data frame, but in C++.
The operations are:
add elements (rows)
remove elements (rows) from index
get the index of the lowest value
e.g.:
a <- data.frame(i = c(4, 9, 3, 1, 8, 2, 7, 10, 6, 6),
j = c(8, 8, 8, 4, 3, 9, 1, 4, 8, 9) ,
v = c(1.9, 18, 1.3, 17, 1.5, 14, 11, 1.4, 18, 2.0),
o = c(3, 3, 3, 3, 1, 2, 1, 2, 3, 3))
a[which.min(a$v), c('i', 'j')] # find lowest v value and get i,j value
a <- a[-which.min(a$v), ] # remove that row from the data frame
a <- rbind(a, data.frame(i = 3, j = 9, v = 2, o = 2)) # add a row
As I'm using Rcpp, Rcpp::DataFrame might be an option (I don't know how I would which.min it however), but I guess it's quite slow for the task as these operations need to be repeated a lot and I don't need to ship it back to R.
EDIT:
Target. Just to make it clear, the goal here is to gain speed. That is the obvious reason why one would translate code from R to C++ (there might be others; that's why I clarify). However, maintenance and ease of implementation come second.
More precision on the operations. The algorithm is: add lots of data to the array (multiple rows), then extract the lowest value and delete it. Repeat.
That's why I wouldn't go for a sorted vector, but would instead always search for the lowest value on demand, as the array is updated (additions) frequently. I think that's faster, but maybe I'm wrong.
I think a vector of vectors should do what you want. You would need to implement the min-finding manually (two nested loops), which is the fastest you can do without adding overhead.
You can speed up the min-finding by keeping track of the position of the smallest element in each row along with the row.
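A minimal sketch of that manual scan, assuming each row is stored as a vector {i, j, v, o} with v at index 2 (for the single column v, one pass over the rows suffices):

#include <cstddef>
#include <vector>

// Index of the row with the smallest v; assumes rows is non-empty.
std::size_t which_min_v(const std::vector<std::vector<double>>& rows) {
    std::size_t best = 0;
    for (std::size_t r = 1; r < rows.size(); ++r)
        if (rows[r][2] < rows[best][2])
            best = r;
    return best;
}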
This question is a bit stale, but I thought I would offer some general observations pertaining to this kind of task.
If you are keeping the collection of rows in an ordered state, which might be an assumption of your which.min strategy, the most difficult operation to support efficiently is row insert, if this is a common operation. You'd be hard pressed not to use a list<> data structure, with the likely consequence that which.min turns into a linear operation, since lists aren't great for bisection search.
If you keep an unordered collection, you can deal with deletes by copying records off the end of the frame to the row vacated by the deletion, and subtracting 1 from your row count. Alternatively, you can just flag deletions with another vector of bool, until the delete count hits a threshold such as sqrt(N), at which time you perform a coalescence copy. You'll come out better than amortized O(N^2) for insert/delete, but which.min will be a linear search through the entire vector each time.
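A minimal sketch of that copy-from-the-end deletion for a column-wise frame (the Frame struct and names are illustrative, mirroring the columns in the question):

#include <cstddef>
#include <vector>

struct Frame {                        // column-wise storage: i, j, o, v
    std::vector<int>    i, j, o;
    std::vector<double> v;
};

// Unordered delete: overwrite row r with the last row, then shrink by one.
void erase_row(Frame& a, std::size_t r) {
    a.i[r] = a.i.back(); a.i.pop_back();
    a.j[r] = a.j.back(); a.j.pop_back();
    a.o[r] = a.o.back(); a.o.pop_back();
    a.v[r] = a.v.back(); a.v.pop_back();
}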
The normal thing to do when you need to identify the min/max element as a common operation is to employ a priority queue of some kind, sometimes duplicating the data for the column indexed. Off the top of my head, it would be tricky to synchronize a priority queue over one data column with rows of a data frame that are moving around as a result of delete operations in the non-list implementation.
If the rows are merely marked as deleted, the priority queue would stay in sync (though you would have to discard elements popped off the queue corresponding to subsequently deleted rows until you get a good one); after the coalescence copy, you would re-index the priority queue, which is pretty fast if you're not doing it too often. Usually if you had enough memory to grow the structure to a large size, you're not overly pressed to give the memory back again if the structure shrinks; it's not obvious you ever need to coalesce if your structure tends to persist at a size anywhere near the high water mark, but beware of the case where your priority queue has both expired and fresh references to the same storage row, because you wrote new data to a row previously deleted. For efficiency, sometimes you end up using an auxiliary list to keep track of rows marked deleted so you can find storage for inserted rows in less than linear time.
It can be hard to extract stale items from the bowels of a priority queue, since these tend to be designed only for removal at the top of the queue; often you have to leave the stale items in there and arrange to ignore them if they surface at some later time.
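A minimal sketch of that lazy-deletion scheme (names are illustrative; as noted, reusing a row's storage for fresh data would require version counters rather than a single flag):

#include <cstddef>
#include <queue>
#include <vector>

struct Entry { double v; std::size_t row; };
struct ByV {   // greater-than comparator makes priority_queue a min-heap on v
    bool operator()(const Entry& a, const Entry& b) const { return a.v > b.v; }
};
using MinQueue = std::priority_queue<Entry, std::vector<Entry>, ByV>;

// Pops stale entries until a live row surfaces, then returns its index.
// Precondition: at least one live entry remains in the queue.
std::size_t pop_min(MinQueue& pq, const std::vector<bool>& deleted) {
    while (deleted[pq.top().row])
        pq.pop();                     // skip rows deleted after insertion
    std::size_t r = pq.top().row;
    pq.pop();
    return r;
}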
When you get into C++ with performance objectives, there are many ways to skin the cat, and you need to be far more precise about performance trade-offs than what the original R code expressed to obtain good execution time for all required operations.
A data.frame is really just a list of vectors. In C++, we really only have those lists of vectors too, which makes adding rows hard.
Idem for removing rows -- as Rcpp works on the original R representation, you always need to copy all remaining values.
As for the which.min() equivalent: I think that came up once on the list, and you can do something simple with STL idioms. I don't recall us having that in the API.
An R data frame in C++ terms is a container of objects. (An R matrix might be a vector of vectors, but if you care about efficiency you are unlikely to implement it that way.)
So, represent the data frame with this class:
class A {
public:
    int i, j, o;
    double v;
    A(int i_, int j_, double v_, int o_) : i(i_), j(j_), o(o_), v(v_) {}
};
And prepare this comparison function, to be passed to the algorithm, to help find the minimum:
bool comp(const A &x, const A &y) {
    return x.v < y.v;
}
(In production code I'd more likely use a functor object (see Meyers' Effective STL, item 46), or Boost lambda or, best of all, C++0x's lambdas.)
And then this code body:
std::vector<A> a;
a.push_back(A(4,8,1.9,3));
a.push_back(A(9,8,18,3));
a.push_back(A(3,8,1.3,3));
//...
std::vector<A>::iterator lowest=std::min_element(a.begin(),a.end(),comp);
std::cout<< lowest->i << ',' << lowest->j <<"\n";
a.erase(lowest);
a.push_back( A(3,9,2,2) );
Depending on what you are really doing, it may be more efficient to sort a first, greatest first. Then if you wish to erase the lowest item(s) you simply truncate the vector.
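A sketch of that variant, using a C++0x lambda as mentioned above (it assumes a stays sorted between erasures):

#include <algorithm>

// Sort descending by v once; the smallest item then sits at the back.
std::sort(a.begin(), a.end(), [](const A& x, const A& y){ return y.v < x.v; });
std::cout << a.back().i << ',' << a.back().j << "\n";
a.pop_back();                         // truncate: erase the lowest item in O(1)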
If you are actually deleting all over the place, and which.min() was just for the sake of example, you may find a linked list more efficient.
No,
a data.frame is a bit more complex than a vector of vectors.
I would say the simplest design for speed in all cases is to store each column in a typed vector and create a list as a header for the rows. Then, on top of that, create an intrusive list.
Here we go, bear with me. The overall goal is to return the maximal alignment between two lists. If there is more than one alignment with the same length, it can just return the first.
By alignment I mean the elements the two lists share, in the correct order but not necessarily consecutive. Take 1,2,3 and 1,2,9,3; here 1,2,3 would be the longest alignment. Anyhow, now for the predicates that I have already defined.
align(Xs, Ys, [El | T]) :-
    append(_, [El | T1], Xs),
    append(_, [El | T2], Ys),
    align(T1, T2, T).
align(_Xs, _Ys, []).
Then I use the built-in predicate findall to get a list of all the alignments between these lists. In this case it puts the biggest alignment first, but I'm not sure why.
findall(X,align([1,2,3],[1,2,9,3],X),L).
That would return the following:
L = [[1, 2, 3], [1, 2], [1, 3], [1], [2, 3], [2], [3], []].
That is correct, but now I need a predicate that combines these two and returns the biggest list in the list of lists.
Use the solution given in this answer.
You could also try to avoid using findall/3,
because you don't really need to build a list in order to find its largest element.
So you just need to find the largest item in the list?
Edit:
Ok, so a better answer is this:
If you care about performance, then you need to write your own predicate which scans the list keeping track of the largest item.
If you don't care so much about performance and you just want it to work, you could sort the list by descending length and then take the first item in the sorted list (note that the standard order of terms does not sort by length, so you'd pair each list with its length first). The advantage of this is that by using a library sort predicate you should be able to implement it in a few lines.
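A minimal sketch of both ideas (longest_list/2 and by_length/2 are assumed names, not from the original posts):

% Linear scan with foldl/4 from library(apply); keeps the first of
% several lists of equal maximal length.
longest_list([L|Ls], Longest) :-
    foldl(longer, Ls, L, Longest).

longer(X, Acc, Out) :-
    length(X, LX),
    length(Acc, LA),
    (   LX > LA
    ->  Out = X
    ;   Out = Acc
    ).

% Sort-based variant: key each list by its negated length, then keysort
% (which is stable, so the first longest list wins).
by_length(Lists, Longest) :-
    map_list_to_pairs(neg_length, Lists, Pairs),
    keysort(Pairs, [_-Longest|_]).

neg_length(L, NLen) :- length(L, Len), NLen is -Len.

For example, findall(X, align([1,2,3],[1,2,9,3],X), L), longest_list(L, Longest) yields Longest = [1,2,3].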