Stata: Generate all possible pairs of elements in a list - combinations

I have a list of elements as a macro, and I would like to generate a macro that contains all possible pairs of these elements, separated by an ampersand. For example, with three elements az, by and cx:
local elementList az by cx
I would like to dynamically generate a macro pairList containing:
az&by az&cx by&cx
(The order of the two elements within a pair should not matter, so that either az&by or by&az should be in pairList but not both.)
It sounds fairly simple but I'm not sure how to do it elegantly. (In my case I have around ten elements to start with.)

I agree with Nick's recommendation of tuples for this task. The example below suggests a slightly more elegant version of the approach you gave.
local elementList a b c d e f g h i j k l
local pairList // initialize the list of pairs
local seenList // initialize the list of elements already seen
foreach first of local elementList {
    foreach second of local seenList {
        local pairList `pairList' `second'&`first'
    }
    local seenList `seenList' `first'
}
display "List of unique pairs: `pairList'"

I wasn't sure how to use a recursive algorithm like the ones I found on other S.O. threads, so here is my "brute force" approach. Definitely not the most elegant, but it does the job:
local elementList a b c d e f g h i j k l
local pairList // initialize the list of pairs
foreach first in `elementList' {
    foreach second in `elementList' {
        // only do something if the elements are not the same
        if("`first'" != "`second'") {
            local pair `first'&`second' // pair
            local pairReverse `second'&`first' // pair in reverse order
            // if the pair (or its reverse) is not already in the list, add the pair
            // note: strpos() does substring matching, so with elements such as a and ba
            // the pair a&c would wrongly be "found" inside ba&cx
            if(strpos("`pairList'","`pair'") == 0 & strpos("`pairList'","`pairReverse'") == 0 ) {
                local pairList `pairList' `pair'
            }
        }
    } // end of loop on second element
} // end of loop on first element
display "List of unique pairs: `pairList'"


Most efficient algorithm for Two-sum problem (involving indices)

The problem statement: given an array and a sum "T", find all pairs of indices of elements in the array that add up to T. Additional requirements/constraints:
Indexing starts from 0
The indices must be displayed with lower index first (Ex: 24, 30 instead of 30, 24)
The index pairs must be displayed in ascending order (Ex: if we find (1,3), (0,2) and (5,8), the output must be (0,2) (1,3) (5,8))
There can be duplicate elements in the array, which also have to be considered
Here's my code in C++; I used the hash-table approach with unordered_set:
void Twosum(vector <int> res, int T){
    int temp; int ti = -1;
    unordered_set<int> s;
    vector <int> res2 = res; //Just a copy of the input vector
    vector <tuple<int, int>> indices; //Result to be output
    for (int i = 0; i < (int)res.size(); i++){
        temp = T - res[i];
        if (s.find(temp) != s.end()){
            while(ti < (int)res.size()){ //While loop for finding all the instances of temp in the array,
                //not part of the original hash-table algorithm, something I added
                ti = find(res2.begin(), res2.end(), temp) - res2.begin();
                //Here find() takes O(n) time which is an issue
                res2[ti] = lim; //To remove that instance of temp so that new instances
                //can be found in the while loop, here lim = 10^9
                if(i <= ti) indices.push_back(make_tuple(i, ti));
                else indices.push_back(make_tuple(ti, i));
            }
        }
        s.insert(res[i]);
    }
    if(ti == -1){
        cout<<"-1 -1"; //if no indices were found
        return;
    }
    sort(indices.begin(), indices.end()); //sorting since unordered_set stores elements randomly
    for(int i=0; i<(int)indices.size(); i++)
        cout<<get<0>(indices[i])<<" "<<get<1>(indices[i])<<endl;
}
This has multiple issues:
Firstly, that while loop doesn't work as intended; instead it fails with a SIGABRT error (free(): invalid pointer). The ti index also somehow goes beyond the vector bounds, even though I have that check in the while loop.
Secondly, find() runs in O(n) time, which raises the overall complexity to O(n^2) and causes my program to time out. However, I need it because we have to output indices.
Lastly, this unordered_set implementation doesn't seem to work when there are many duplicate elements in the array (since sets only store unique elements), and handling duplicates is one of the main constraints of the problem. This makes me think we need some sort of hash function or hash map to deal with the duplicates? I'm not sure...
All the algorithms I've found for this on the internet deal with printing the elements rather than the indices, so I've had no luck with this problem.
If any of you know an optimal algorithm for this that satisfies the constraints and runs in O(n) time, your help would be highly appreciated. Thank you in advance.
Here is pseudocode answering your question, using hash tables (or maps) and sets. I'll leave the translation to C++ to you, using suitable data structures (in this case, classic hash maps and sets will do the job well).
Notations: we will denote A the array, n its length, and T the "sum".
// First we build a map element -> {set of indices corresponding to this element}
Let M be an empty map; // or hash map, or hash table, or dictionary
for i from 0 to n-1 do {
    Let e = A[i];
    if e is not a key of M then {
        M[e] = new_set();
    }
    M[e].add(i);
}
// Now we iterate over the elements.
// When e != T-e, the keys e and T-e are each visited once, so every combination
// of their indices is seen twice (once from each side); display_combinations
// only reports e1 < e2, so each pair of indices is displayed exactly once.
for each key e of M do {
    if T-e is a key of M then {
        display_combinations(M[e], M[T-e]);
    }
}
// The helper function display_combinations
function display_combinations(set1, set2) {
    for each element e1 of set1 do {
        for each element e2 of set2 do {
            if e1 < e2 then {
                display "(e1, e2)";
            }
        }
    }
}
As said in the comments, the worst-case complexity of this algorithm is O(n²). One way to see that we cannot do better is that the output itself may have size O(n²), for instance when all elements of the array have the value T/2. More precisely, the work done (with expected O(1) hash operations) is linear in n plus the number of pairs reported, so it stays O(n) when there are few matching pairs.
Edit: this pseudocode does not output the pairs in order. Just store them in an array of pairs and sort this array before displaying it. Likewise, I did not treat the case where a pair (i, i) may satisfy the requirement (when A[i] + A[i] = T). You may have to consider it: just change e1 < e2 to e1 <= e2 in the helper function.

Invalid use of a type field or array element as a loop counter

In the following code, I have tried using a field variable (of class or record) or an array element directly as a loop counter, but this was illegal ("error: invalid index expression"). Is this simply because the loop counter must be a scalar variable?
class Cls {
    var n: int;
}
proc main()
{
    var x = new Cls( 100 );
    var k: [1..10] int;
    for x.n in 1..3 do // error: invalid index expression
        writeln( x.n );
    for k[1] in 1..3 do // error: invalid index expression
        writeln( k[1] );
}
On the other hand, if I create a reference to x.n, it compiles successfully, but x in the outer scope is not modified. Is this because a new loop variable named n is created in the for-loop? (I'm afraid this is almost the same as my other question...)
proc main()
{
    var x = new Cls( 100 );
    ref n = x.n;
    for n in 1..3 do
        writeln( n );
    writeln( "x = ", x ); // x = {n = 100}
}
If a loop variable is created independently, I guess something like "var x.n = ..." might happen (internally) if I write for x.n in 1..3, which seems really invalid (because it means that I'm trying to declare a loop variable with a name x.n).
You're correct that this relates to your other question. As described there, Chapel's for-loops create new index variables to store the values yielded by the iterator expression(s), so a loop like for i in ... results in a new variable i being declared rather than using an existing variable or expression. If you think the error message should be improved to make this clearer, please consider suggesting a new wording in a GitHub issue.
Note that in addition to single variable names, a loop can also use tuples of index variables to capture the results of a zippered iteration or an iterand that yields tuple values. For instance, the values of the following zippered iteration can either be captured as scalar values i and j:
for (i,j) in zip(1..3, 2..6 by 2) do // store values in 'i' and 'j' respectively
    writeln((i,j));
or as a single variable of tuple type:
for ij in zip(1..3, 2..6 by 2) do // store values in 2-tuple 'ij'
    writeln(ij);
Similarly, when iterating over something that yields tuple values, such as a multidimensional index set, the results can be captured either as scalar values or tuples:
const D = {1..3, 0..2}; // declare a 2D rectangular domain
for (i,j) in D do // store indices in new scalars 'i' and 'j'
    writeln((i,j));
for ij in D do // store indices in new 2-tuple 'ij'
    writeln(ij);
More complex iterators that yield larger or nested tuples can similarly be de-tupled (or not) in the declaration of the index variable(s).

how to get pairs of elements efficiently in a linked list (Haxe)

I have a list of objects and I would like to return each possible unique pair of objects within this list. Is the following the most efficient way to do that in Haxe?
for (elem1 in my_list)
{
    for (elem2 in my_list)
    {
        if (elem1 == elem2)
        {
            break;
        }
        trace(elem1, elem2);
    }
}
I would rather avoid the equality check if possible. The reason that I am not using arrays or vectors is that these lists will be added to/removed from very frequently and I have no need for index level access.
If you want it to be efficient (fewer iterations), you could loop like this:
for (i in 0 ... my_list.length-1)        // loop to total minus 1
    for (j in i+1 ... my_list.length)    // start 1 further than i, loop to end
        if (my_list[i] != my_list[j])    // skip elements that match
            trace(my_list[i], my_list[j]); // use the pair
By the way, whether a linked list or an array is actually faster depends on the content, since this version uses indexes (so my_list must support index access, such as an Array). You should test/measure it in your case (don't assume anything if it's a performance-critical piece of code).
try online:
http://try.haxe.org/#2Ab3F

Vector of pairs to map

I have a little problem.
I have a vector of pairs patternOccurences. The pairs are <string,int>, where the string is the pattern (name) and the int is the index where it appears. My problem is that patternOccurences has multiple pairs with the same .first (same pattern) but different int values.
For example, the vector has 10 entries: 5 of pattern "a" and 5 of pattern "b", all with different indices. Now I want a map (or something similar) with each pattern (in my example "a" and "b") as a key and a vector of its indices as the value. The indices are spread across the pairs in my vector of pairs, and I want all indices for pattern "a" in an int vector as the value for key "a".
I tried the following:
std::map<std::string,std::vector<int>> occ;
for(int i = 0;i<patternOccurences.size();i++){
    if(occ.find(patternOccurences.at(i).first)==occ.end()){
        occ[patternOccurences.at(i).first]=std::vector<int>(patternOccurences.at(i).second);
    }
    else{
        occ[patternOccurences.at(i).first].push_back(patternOccurences.at(i).second);
    }
}
patternOccurences is the vector of pairs and occ the desired map. First I check whether there is already an entry for the string (pattern), and if not, I create one with a vector as the value. If there is already one, I push_back the index onto its vector. However, it doesn't seem to work right: for the first pattern I get a vector containing only 0s, and for the second there are only 3 correct indices while the others are 0 as well.
I hope you can help me.
Kazoooie
You are calling the constructor for the vector in the wrong way:
std::vector<int>(patternOccurences.at(i).second);
This creates a vector with N default constructed elements, not a vector with one element with value N. You need:
std::vector<int>(1, patternOccurences.at(i).second);
That should fix the problem, but your code doesn't have to be that complicated. The following would work just fine:
for(int i = 0;i<patternOccurences.size();i++){
    occ[patternOccurences.at(i).first].push_back(patternOccurences.at(i).second);
}
or with C++11, the even simpler:
for(auto& p:patternOccurences) {
    occ[p.first].push_back(p.second);
}
What you are asking for already exists in the STL: it's called std::multimap (and std::unordered_multimap).
Basically, it's a map that allows multiple values to have the same key.
std::multimap<std::string, int> occ;
occ.insert(std::pair<std::string,int>("foo", 5));
occ.insert(std::pair<std::string,int>("foo", 10));
std::pair<std::multimap<std::string,int>::iterator, std::multimap<std::string,int>::iterator> group = occ.equal_range("foo");
std::multimap<std::string,int>::iterator it;
for (it = group.first; it != group.second; ++it) {
    // it->second visits each index stored under "foo" (here 5, then 10)
}
Change this statement
occ[patternOccurences.at(i).first]=std::vector<int>(patternOccurences.at(i).second);
to
occ[patternOccurences.at(i).first]=std::vector<int>(1, patternOccurences.at(i).second);

Traverse MultiMap to Find Path from a Given Value to a Given Key

Details:
I have a multimap implementation that represents the adjacency list for the subset of a graph.
I need to find a path through this subset of the graph, which contains all the possible paths from a start node F to an end node G, obtained by running a breadth-first search on the full graph.
Implementation Ideas:
The BFS quits once G is found, so G ends up only among the values of the multimap. My idea is to start at the value G and get G's key (let's call it H); if H == F, then we have our path. Otherwise, continue by looking for H as a value associated with another key (call it D); if D == F, then we have our path, and it would look like F -> H -> G.
Issues:
Will this scale? Since the map only contains the subset of possible paths from F to G, stopping at G, it shouldn't accidentally create a circular path or duplicate keys. And if the subset has cardinality n, then n is the maximum number of values for any given key, and therefore the number of edges you follow can never be more than n.
How Would You Code This??
I can think through the logic and the math involved, but I don't understand the map library well enough yet to write it out myself. After reading the C++ reference, I get the idea that I may use the map methods upper_bound/lower_bound, but I can't find an example that supports that.
Turns out to be relatively trivial:
typedef multimap<int, int> MapType;
typedef MapType::const_iterator MapItr;
vector<int> path;
path.push_back(G);
int end = G; // we know G, so mark it
while ( end != F ) { // as long as mark is not F
    // note: this loops forever if no chain of keys leads back to F
    // now loop through the map searching for a value that matches the mark
    for (MapItr iter = pathMap.begin(); iter != pathMap.end(); iter++)
    {
        if (iter->second == end) {       // once we find our marked value/vertex
            path.push_back(iter->first); // push its key onto the vector
            end = iter->first;           // and mark its key for the next iteration
            // continue this until end reaches F,
        }                                // at which point we will have our path
                                         // from G to F
    }
}
// avoid this step by using a container with a push_front method
reverse(path.begin(), path.end()); // reverse path
You can loop through the entire map like this:
C++11
for(const auto& key_val: the_map)
{
    std::cout<<key_val.first<<":"<<key_val.second<<std::endl;
}
Non C++11
for(the_map_type::const_iterator itr = the_map.begin(); itr != the_map.end();++itr)
{
    std::cout<<itr->first<<":"<<itr->second<<std::endl;
}
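the_map.lower_bound(key) gives you an iterator to the first element whose key is not less than key (the first element with key key, if there is one).
the_map.upper_bound(key) gives you an iterator to the first element whose key is greater than key, i.e. one past the last element with key key, so [lower_bound(key), upper_bound(key)) covers every element with that key.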