Finding the intersection of two vectors of strings - c++

I have two vectors of strings and want to find the strings which are present in both, filling a third vector with the common elements. EDIT: I've added the complete code listing with the respective output so that things are clear.
std::cout << "size " << m_HLTMap->size() << std::endl;
/// Vector to store the wanted, present and found triggers
std::vector<std::string> wantedTriggers;
wantedTriggers.push_back("L2_xe25");
wantedTriggers.push_back("L2_vtxbeamspot_FSTracks_L2Star_A");
std::vector<std::string> allTriggers;
// Push all the trigger names to a vector
std::map<std::string, int>::iterator itr = m_HLTMap->begin();
std::map<std::string, int>::iterator itrLast = m_HLTMap->end();
for(;itr!=itrLast;++itr)
{
allTriggers.push_back((*itr).first);
}; // End itr
/// Sort the list of trigger names and find the intersection
/// Build a typedef to make things clearer
std::vector<std::string>::iterator wFirst = wantedTriggers.begin();
std::vector<std::string>::iterator wLast = wantedTriggers.end();
std::vector<std::string>::iterator aFirst = allTriggers.begin();
std::vector<std::string>::iterator aLast = allTriggers.end();
std::vector<std::string> foundTriggers;
for(;aFirst!=aLast;++aFirst)
{
std::cout << "Found:" << (*aFirst) << std::endl;
};
std::vector<std::string>::iterator it;
std::sort(wFirst, wLast);
std::sort(aFirst, aLast);
std::set_intersection(wFirst, wLast, aFirst, aLast, back_inserter(foundTriggers));
std::cout << "Found this many triggers: " << foundTriggers.size() << std::endl;
for(it=foundTriggers.begin();it!=foundTriggers.end();++it)
{
std::cout << "Found in both" << (*it) << std::endl;
}; // End for intersection
Here is the partial output; the vector has over 1000 elements, so I didn't include the full output:
Found:L2_te1400
Found:L2_te1600
Found:L2_te600
Found:L2_trk16_Central_Tau_IDCalib
Found:L2_trk16_Fwd_Tau_IDCalib
Found:L2_trk29_Central_Tau_IDCalib
Found:L2_trk29_Fwd_Tau_IDCalib
Found:L2_trk9_Central_Tau_IDCalib
Found:L2_trk9_Fwd_Tau_IDCalib
Found:L2_vtxbeamspot_FSTracks_L2Star_A
Found:L2_vtxbeamspot_FSTracks_L2Star_B
Found:L2_vtxbeamspot_activeTE_L2Star_A_peb
Found:L2_vtxbeamspot_activeTE_L2Star_B_peb
Found:L2_vtxbeamspot_allTE_L2Star_A_peb
Found:L2_vtxbeamspot_allTE_L2Star_B_peb
Found:L2_xe25
Found:L2_xe35
Found:L2_xe40
Found:L2_xe45
Found:L2_xe45T
Found:L2_xe55
Found:L2_xe55T
Found:L2_xe55_LArNoiseBurst
Found:L2_xe65
Found:L2_xe65_tight
Found:L2_xe75
Found:L2_xe90
Found:L2_xe90_tight
Found:L2_xe_NoCut_allL1
Found:L2_xs15
Found:L2_xs30
Found:L2_xs45
Found:L2_xs50
Found:L2_xs60
Found:L2_xs65
Found:L2_zerobias_NoAlg
Found:L2_zerobias_Overlay_NoAlg
Found this many triggers: 0
Possible Reason
I am starting to think that the way in which I compile my code is to blame. I am currently compiling with ROOT (the physics data analysis framework) instead of doing a standalone compile. I get the feeling that it doesn't work all that well with the STL algorithm library and that this is the cause of the issue, especially given how many people seem to have the code working for them. I will try a standalone compilation and re-run.

Passing foundTriggers.begin(), with foundTriggers empty, as the output argument will not cause the output to be pushed onto foundTriggers. Instead, it will increment the iterator past the end of the vector without resizing it, randomly corrupting memory.
You want to use an insert iterator:
std::set_intersection(wFirst, wLast, aFirst, aLast,
std::back_inserter(foundTriggers));
UPDATE: As pointed out in the comments, the vector is resized to be at least large enough for the result, so your code should work. Note that you should use the iterator returned from set_intersection to indicate the end of the intersection - your code ignores it, so you will also iterate over the empty strings left at the end of the output.
Could you post a complete test case so that we can see whether the intersection is actually empty or not?

Your allTriggers vector is empty, after all. You never reset itr to the beginning of the map when you're filling it.
EDIT:
Actually, you never reset aFirst:
for(;aFirst!=aLast;++aFirst)
{
std::cout << "Found:" << (*aFirst) << std::endl;
};
// here aFirst == aLast
std::vector<std::string>::iterator it;
std::sort(wFirst, wLast);
std::sort(aFirst, aLast); // **** sorting empty range ****
std::set_intersection(wFirst, wLast, aFirst, aLast, back_inserter(foundTriggers));
//                                   ^^^^^^^^^^^^^
//                                   ***** empty range *****
I hope you can now see why it is good practice to narrow down the scope of your variables.
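One way to apply this advice is to wrap the sort-and-intersect step in a small function that calls begin() and end() at the point of use, so no long-lived iterator can be left sitting at the end of a range. This is a minimal sketch, not the asker's code: the name common_strings is made up for illustration, and the inputs are taken by value so that the caller's vectors stay unsorted.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns the strings
// present in both inputs. Taking the vectors by value lets us sort copies.
std::vector<std::string> common_strings(std::vector<std::string> wanted,
                                        std::vector<std::string> all)
{
    // set_intersection requires both input ranges to be sorted.
    std::sort(wanted.begin(), wanted.end());
    std::sort(all.begin(), all.end());

    std::vector<std::string> found;
    std::set_intersection(wanted.begin(), wanted.end(),
                          all.begin(), all.end(),
                          std::back_inserter(found));
    return found;
}
```

Because every iterator here is a temporary created in the call that uses it, a printing loop elsewhere cannot silently turn one of the inputs into an empty range.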

You never use the return value of set_intersection. In this case you could use it to resize foundTriggers after set_intersection has returned, or as the upper limit of the for loop. Otherwise your code seems to work. Can we see a full compilable program and its actual output please?
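The return value matters when the output is a pre-sized vector rather than a back_inserter. The following is a rough sketch under the assumption that both inputs are already sorted; intersect_sorted is a made-up name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: intersects two already-sorted vectors into a
// pre-sized output vector instead of using back_inserter.
std::vector<std::string> intersect_sorted(const std::vector<std::string>& a,
                                          const std::vector<std::string>& b)
{
    // The intersection can never hold more elements than the smaller input.
    std::vector<std::string> out(std::min(a.size(), b.size()));

    // set_intersection returns an iterator one past the last element written.
    std::vector<std::string>::iterator last =
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), out.begin());

    // Drop the unused, default-constructed (empty) strings at the end.
    out.erase(last, out.end());
    return out;
}
```

Without the final erase, a loop over the whole output vector would also visit the leftover empty strings.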

Related

list<pair<float,float>> iterating through a list that holds pairs?

As a part of runtime analysis I've got a small game that after calculating every Frame puts a new element in this list:
typedef std::list<std::pair<float, float>> PairList;
PairList Frames; //in pair: index 0 = elapsed time, index 1 = frames
I decided to use a list, because while playing I do not need to process the data held in the list, and I think lists are the fastest containers when it comes to only adding or deleting items. As a next step I want to write the frames to an external txt file. The txt file is later used to draw a graph.
void WriteStats(PairList &pairList)
{
// open a file in write mode.
std::ofstream outfile;
outfile.open("afile.dat");
PairList::iterator itBegin = pairList.begin();
PairList::iterator itEnd = pairList.end();
for (auto it = itBegin; it != itEnd; ++it)
{
outfile << *it.first << "\t" << *it.second;
}
outfile.close();
}
With normal lists the pointer to "it" should return the item right?
Except visual studio says pair<float, float>* does not have a member called first
How do I want to do it then, when access via my iterator does not work? Is it because I pass in the reference to the list?
*it.first is parsed as *(it.first).
You need (*it).first or, better yet, it->first.
Or, even better, use a range-based for loop:
for (auto& elem : pairList)
{
float a = elem.first;
}
I decided to use a list, because [...] I think lists are the fastest containers when it comes to only adding or deleting items.
The first go-to container should be std::vector. In practice it often outperforms std::list, even for algorithms that on paper should be faster on std::list, because of cache locality. So if performance is a concern, I would test your theory with a benchmark.
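A benchmark along those lines could look roughly like the sketch below. The helper name time_appends and the constant 60.0f are made up for illustration, and the timings depend heavily on the machine, compiler and optimization flags, so treat the numbers as indicative only.

```cpp
#include <chrono>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical helper: appends n (time, frames) pairs to a container of
// type Container and returns the elapsed wall-clock time in microseconds.
template <typename Container>
long long time_appends(std::size_t n)
{
    const auto start = std::chrono::steady_clock::now();
    Container frames;
    for (std::size_t i = 0; i < n; ++i) {
        frames.emplace_back(static_cast<float>(i), 60.0f);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_appends<std::vector<std::pair<float, float>>>(1000000) and time_appends<std::list<std::pair<float, float>>>(1000000) in an optimized build, several times each, gives a first impression of which container suits the frame log better.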
The issue is one of operator precedence. Specifically, the member access operator '.' has higher precedence than indirection '*' so *it.first is effectively parsed as...
*(it.first)
Hence the error. Instead use...
it->first
Use a range-based for loop instead of messing with iterators:
void WriteStats(const PairList &pairList)
{
// open a file in write mode.
std::ofstream outfile("afile.dat");
for (const auto &elem : pairList) {
outfile << elem.first << "\t" << elem.second << '\n';
}
}

How to search in a map/multimap starting from specific position

I want to search in a map/multimap but not all of it. Instead I want to start in a specific position.
In the following example I want to find the first two numbers that sum to b and return their values.
multimap<int, int> a;
a.insert(make_pair(2, 0));
a.insert(make_pair(2, 1));
a.insert(make_pair(5, 2));
a.insert(make_pair(8, 3));
int b = 4;
for(auto it = a.begin(); it != a.end(); ++it) {
auto it2 = a.find(b - it->first); //Is there an equivalent that starts from "it+1"?
if(it2 != a.end()) {
cout << it->second << ", " << it2->second << endl;
break;
}
}
output:
0, 0
desired output:
0, 1
Is it possible to achieve specific position search in a map?
How to search in a map starting from specific position
You could use std::find. But this is not ideal, since it has linear complexity compared to the logarithmic complexity of a map lookup. The interface of std::map doesn't support such an operation for lookups.
If you need such an operation, then you need to use another data structure. It should be possible to implement by augmenting a (balanced) search tree with a parent node pointer. The downside is of course increased memory use and constant overhead on operations that modify the tree structure.
not from the beginning to the end.
Map look ups do not start from "the beginning" of the range. They start from the root of the tree.
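For the specific example in the question, one workaround that keeps the logarithmic lookup is to ask for every entry with the complementary key via equal_range and skip the entry currently being visited. This does not start the search at "it+1"; it only avoids pairing an entry with itself. A minimal sketch, with find_pair_with_sum as a made-up name:

```cpp
#include <map>
#include <utility>

// Hypothetical helper: finds the first two distinct entries whose keys sum
// to `target` and stores their mapped values in `result`.
// Returns false if no such pair of entries exists.
bool find_pair_with_sum(const std::multimap<int, int>& m, int target,
                        std::pair<int, int>& result)
{
    for (auto it = m.begin(); it != m.end(); ++it) {
        // All entries with the complementary key, found in logarithmic time.
        auto range = m.equal_range(target - it->first);
        for (auto jt = range.first; jt != range.second; ++jt) {
            if (jt == it) {
                continue; // do not pair an entry with itself
            }
            result = std::make_pair(it->second, jt->second);
            return true;
        }
    }
    return false;
}
```

With the multimap from the question and a target of 4, the entry (2, 0) is paired with (2, 1), which gives the desired values 0 and 1.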
If you're using an ordered map (which it sounds like you are), then its find member function already performs a logarithmic-time tree lookup (unlike the free function std::find, which scans linearly). This function returns an iterator, so assuming you were looking for the value of some key x, consider the following lines:
std::map<char,int> mymap;
mymap['x'] = 24;
std::map<char,int>::iterator itr = mymap.find('x');
std::cout << "x=" << itr->second << std::endl;
The reason your code wasn't compiling was likely because you tried to return a pair iterator, which won't exactly print to output all that well. Instead, calling itr->second allows you to retrieve the value associated with the desired key.

Why iterator is not dereferenced as an lvalue

Apologies if my question does not contain all relevant info. Please comment and I will amend accordingly.
I use CLion on Win7 with MinGW and gcc
I have been experimenting with circular buffers and came across boost::circular_buffer, but for the size of my project I want to use circular buffer by Pete Goodlife, which seems like a solid implementation in just one .hpp.
Note: I am aware of how to reduce boost dependencies thanks to Boost dependencies and bcp.
However, the following example with Pete's implementation does not behave as expected, i.e. the result to std::adjacent_difference(cbuf.begin(),cbuf.end(),df.begin()); comes out empty. I would like to understand why and possibly correct its behaviour.
Follows a MWE:
#include "circular.h"
#include <iostream>
#include <algorithm>
#include <numeric> // std::adjacent_difference is declared here
typedef circular_buffer<int> cbuf_type;
void print_cbuf_contents(cbuf_type &cbuf){
std::cout << "Printing cbuf size("
<<cbuf.size()<<"/"<<cbuf.capacity()<<") contents...\n";
for (size_t n = 0; n < cbuf.size(); ++n)
std::cout << " " << n << ": " << cbuf[n] << "\n";
if (!cbuf.empty()) {
std::cout << " front()=" << cbuf.front()
<< ", back()=" << cbuf.back() << "\n";
} else {
std::cout << " empty\n";
}
}
int main()
{
cbuf_type cbuf(5);
for (int n = 0; n < 3; ++n) cbuf.push_back(n);
print_cbuf_contents(cbuf);
cbuf_type df(5);
std::adjacent_difference(cbuf.begin(),cbuf.end(),df.begin());
print_cbuf_contents(df);
}
Which prints the following:
Printing cbuf size(3/5) contents...
0: 0
1: 1
2: 2
front()=0, back()=2
Printing cbuf size(0/5) contents...
empty
Unfortunately, being new to c++ I can’t figure out why the df.begin() iterator is not dereferenced as an lvalue.
I suspect the culprit is (or I don't completely understand) the member call of the circular_buffer_iterator on line 72 in Pete's circular.h:
elem_type &operator*() { return (*buf_)[pos_]; }
Any help is very much appreciated.
The iterator you pass as the output iterator is dereferenced and treated as an lvalue, and most probably the data you expect is actually stored in the circular buffer's buffer.
The problem is that, apart from the actual storage buffer, most containers also hold some internal bookkeeping state that has to be maintained (for instance, how many elements are in the buffer, how much free space is left, etc.).
Dereferencing and incrementing the iterator doesn't update that internal state, so the container does not "know" that new data has been added.
Consider the following code:
std::vector<int> v;
v.reserve(3);
auto i = v.begin();
*(i++) = 1; // this simply writes to memory
*(i++) = 2; // but doesn't update the internal
*(i++) = 3; // state of the vector
assert(v.size() == 0); // so the vector still "thinks" it's empty
Using push_back would work as expected:
std::vector<int> v;
v.reserve(3);
v.push_back(1); // adds to the storage AND updates internal state
v.push_back(2);
v.push_back(3);
assert(v.size() == 3); // so the vector "knows" it has 3 elements
In your case, you should use std::back_inserter, which creates an output iterator that calls "push_back" on the container every time a value is assigned through it:
std::adjacent_difference(
cbuf.begin(), cbuf.end(),
std::back_inserter(df));
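The same pattern can be tried without circular.h. The sketch below uses std::vector as a stand-in for the circular buffer, on the assumption that the buffer exposes push_back and value_type the way standard containers do; the helper name differences is made up.

```cpp
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper using std::vector in place of circular_buffer.
std::vector<int> differences(const std::vector<int>& values)
{
    std::vector<int> out;
    out.reserve(values.size()); // capacity only; size() stays 0

    // back_inserter calls out.push_back() for every value written,
    // so out.size() grows along with the stored data.
    std::adjacent_difference(values.begin(), values.end(),
                             std::back_inserter(out));
    return out;
}
```

For the input 0, 1, 2 this yields 0, 1, 1: the first element is copied as is, and each later element is the difference to its predecessor.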
std::adjacent_difference writes to the result iterator. In your case, that result iterator points into df, which has a size of 0 and a capacity of 5. Those writes will be into the reserved memory of df, but will not change the size of the container, so size will still be 0, and the first 3 ints of the reserved container space will have your difference. In order to see the results, the container being written into must already have data stored in the slots being written to.
So to see the results you must put data into the circular buffer before computing the difference, then resize the container to the appropriate size (based on the iterator returned by adjacent_difference).

How to chain delete pairs from a vector in C++?

I have this text file where I am reading each line into a std::vector<std::pair>,
handgun bullets
bullets ore
bombs ore
turret bullets
The first item depends on the second item. And I am writing a delete function where, when the user inputs an item name, it deletes the pair containing the item as second item. Since there is a dependency relationship, the item depending on the deleted item should also be deleted since it is no longer usable. For example, if I delete ore, bullets and bombs can no longer be usable because ore is unavailable. Consequently, handgun and turret should also be removed since those pairs are dependent on bullets which is dependent on ore i.e. indirect dependency on ore. This chain should continue until all dependent pairs are deleted.
I tried to do this for the current example and came with the following pseudo code,
for vector_iterator_1 = vector.begin to vector.end
{
if user_input == vector_iterator_1->second
{
for vector_iterator_2 = vector.begin to vector.end
{
if vector_iterator_1->first == vector_iterator_2->second
{
delete pair_of_vector_iterator_2
}
}
delete pair_of_vector_iterator_1
}
}
Not a very good algorithm, but it explains what I intend to do. In the example, if I delete ore, then bullets and bombs get deleted too. Subsequently, pairs depending on ore and bullets will also be deleted (bombs have no dependency). Since there is only a single-length chain (ore-->bullets), there is only one nested for loop to check for it. However, there may be zero or a large number of dependencies in a single chain, resulting in many or no nested for loops. So, this is not a very practical solution. How would I do this with a chain of dependencies of variable length? Please tell me. Thank you for your patience.
P. S. : If you didn't understand my question, please let me know.
One (naive) solution:
Create a queue of items-to-delete
Add in your first item (user-entered)
While(!empty(items-to-delete)) loop through your vector
Every time you find your current item as the second-item in your list, add the first-item to your queue and then delete that pair
Easy optimizations:
Ensure you never add an item to the queue twice (hash table/etc)
Personally, I would just use the standard library for removal:
vector.erase(remove_if(vector.begin(), vector.end(), [](const pair<string,string>& p){ return p.second == "ore"; }), vector.end());
remove_if() returns an iterator to the new logical end of the range; the elements from there to end() are left in an unspecified state, which is why erase() needs both that iterator and vector.end(). So you could have a function that takes a .second value to erase, records the .first values of the matching pairs inside the predicate, and erases those pairs. From there, you could loop until nothing is removed.
For your solution, it might be simpler to use find_if inside a loop, but either way, the standard library has some useful things you could use here.
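A rough sketch of that loop follows. It swaps remove_if for std::stable_partition, because the elements behind remove_if's returned iterator are left unspecified, whereas partitioning keeps the doomed pairs intact so their .first values can still be read. The names Dependency and remove_dependents are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Dependency; // (item, what it needs)

// Hypothetical helper: removes every pair that depends, directly or
// indirectly, on `item`. Returns how many pairs were removed.
std::size_t remove_dependents(std::vector<Dependency>& deps, const std::string& item)
{
    std::size_t removed = 0;
    std::deque<std::string> pending(1, item); // items still to process

    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop_front();

        // Move the pairs that need `current` to the back, keeping them intact.
        std::vector<Dependency>::iterator first_doomed = std::stable_partition(
            deps.begin(), deps.end(),
            [&current](const Dependency& d) { return d.second != current; });

        // Their .first items become unusable too, so queue them up.
        for (std::vector<Dependency>::iterator it = first_doomed; it != deps.end(); ++it) {
            pending.push_back(it->first);
        }
        removed += static_cast<std::size_t>(deps.end() - first_doomed);
        deps.erase(first_doomed, deps.end());
    }
    return removed;
}
```

The loop terminates even if the dependencies contain a cycle, since every pair can be erased at most once and an item with no remaining dependents adds nothing to the queue.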
I couldn't resist writing a solution using standard algorithms and data structures from the C++ standard library. I'm using a std::set to remember which objects we delete (I prefer it since it has logarithmic access and does not contain duplicates). The algorithm is basically the same as the one proposed by @Beth Crane.
#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>
#include <string>
#include <set>
int main()
{
std::vector<std::pair<std::string, std::string>> v
{ {"handgun", "bullets"},
{"bullets", "ore"},
{"bombs", "ore"},
{"turret", "bullets"}};
std::cout << "Initially: " << std::endl << std::endl;
for (auto && elem : v)
std::cout << elem.first << " " << elem.second << std::endl;
// let's remove "bullets"; this set is our "queue"
std::set<std::string> to_remove{"bullets"}; // unique elements
while (!to_remove.empty()) // loop as long we still have elements to remove
{
// "pop" an element, then remove it via erase-remove idiom
// and a bit of lambdas
std::string obj = *to_remove.begin();
v.erase(
std::remove_if(v.begin(), v.end(),
[&to_remove](const std::pair<const std::string,
const std::string>& elem)->bool
{
// is it on the first position?
if (to_remove.find(elem.first) != to_remove.end())
{
return true;
}
// is it in the queue?
if (to_remove.find(elem.second) != to_remove.end())
{
// add the first element in the queue
to_remove.insert(elem.first);
return true;
}
return false;
}
),
v.end()
);
to_remove.erase(obj); // delete it from the queue once we're done with it
}
std::cout << std::endl << "Finally: " << std::endl << std::endl;
for (auto && elem : v)
std::cout << elem.first << " " << elem.second << std::endl;
}
@vsoftco I looked at Beth's answer and went off to try the solution. I did not see your code until I came back. On closer examination of your code, I see that we have done pretty much the same thing. Here's what I did,
std::string Node;
std::cout << "Enter Node to delete: ";
std::cin >> Node;
std::queue<std::string> Deleted_Nodes;
Deleted_Nodes.push(Node);
while(!Deleted_Nodes.empty())
{
std::vector<std::pair<std::string, std::string>>::iterator Current_Iterator = Pair_Vector.begin(), Temporary_Iterator;
while(Current_Iterator != Pair_Vector.end())
{
Temporary_Iterator = Current_Iterator;
Temporary_Iterator++;
if(Deleted_Nodes.front() == Current_Iterator->second)
{
Deleted_Nodes.push(Current_Iterator->first);
Pair_Vector.erase(Current_Iterator);
}
else if(Deleted_Nodes.front() == Current_Iterator->first)
{
Pair_Vector.erase(Current_Iterator);
}
Current_Iterator = Temporary_Iterator;
}
Deleted_Nodes.pop();
}
To answer your question in the comment of my question, that's what the else if statement is for. It's supposed to be a directed graph so it removes only next level elements in the chain. Higher level elements are not touched.
1 --> 2 --> 3 --> 4 --> 5
Remove 5: 1 --> 2 --> 3 --> 4
Remove 3: 1 --> 2 4 5
Remove 1: 2 3 4 5
Although my code is similar to yours, I am no expert in C++ (yet). Tell me if I made any mistakes or overlooked anything. Thanks. :-)
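One thing to watch in the loop above: std::vector::erase invalidates every iterator at or after the erased position, so Temporary_Iterator can no longer be used safely after the erase. A possible variant, sketched under the assumption that the intended behavior is otherwise unchanged, continues from the iterator that erase returns. The names PairVector and remove_node are made up.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > PairVector;

// Sketch of the same loop with iterator invalidation handled.
void remove_node(PairVector& pairs, const std::string& node)
{
    std::queue<std::string> deleted_nodes;
    deleted_nodes.push(node);

    while (!deleted_nodes.empty()) {
        const std::string current = deleted_nodes.front();
        deleted_nodes.pop();

        PairVector::iterator it = pairs.begin();
        while (it != pairs.end()) {
            if (it->second == current) {
                deleted_nodes.push(it->first); // dependent item: process later
                it = pairs.erase(it);          // erase returns the next valid iterator
            } else if (it->first == current) {
                it = pairs.erase(it);
            } else {
                ++it;
            }
        }
    }
}
```

The front of the queue is copied into current before popping, which keeps the comparison string alive while the inner loop pushes new entries.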

Erasing an element from a list container

I am having difficulty understanding why the code is behaving this way. First of all, I have read the relevant answered material and still found the explanations a bit advanced. So I'm wondering if someone could explain this in a simple fashion.
Ok, so I am erasing elements from a list.
The list contains int elements that are both odd and even numbers. This part I understand.
Here is the code I originally wrote to remove the odd numbers from the list
for(list<int>::iterator i = lNo.begin(); i != lNo.end(); i++)
{
if(*i%2 == 0 )
{
lNo.erase(i);
}
else
{
cout << " " << *i;
}
}
With this code, the program compiles but crashes: I get a message stating that the program has to shut down.
The erase function works when I write this code:
for(list<int>::iterator i = lNo.begin(); i != lNo.end(); i++)
{
if(*i%2 == 0 )
{
i = lNo.erase(i);
}
else
{
cout << " " << *i;
}
}
I just need to understand why the program works when I code i = lNo.erase(i) and not with just lNo.erase(i).
A simple concise answer would be much appreciated.
I know that different containers have different constraints, so which constraint did I violate with the original piece of code?
As stated in the documentation, the erase function invalidates the iterator passed in. That means it cannot be used again. The loop cannot proceed with that iterator.
The documentation also states that it returns an iterator to the element that was after the erased one. That iterator is valid and can be used to proceed.
Note, however, that since it returns an iterator to the element after the one that was erased, there is no need to increment it to advance; otherwise that element will not be checked for oddness. The loop should cater for that and only increment when no erasure was done.
Even your second code is incorrect.
The correct code should be this:
for(list<int>::iterator i = lNo.begin(); i != lNo.end(); /*NOTHING HERE*/ )
{
if(*i%2 == 0 )
{
i = lNo.erase(i);
}
else
{
cout << " " << *i;
++i; //INCREMENT HERE, not in the for loop
}
}
Note that erase() erases the item and returns the iterator to the next item. That means, you don't need to increment i in your code when you erase; instead you just need to update i with the returned value from erase.
You could use erase-remove idiom as:
lNo.erase(std::remove_if(lNo.begin(),
lNo.end(),
[](int i) { return i%2 == 0; }),
lNo.end());
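Since lNo is a std::list, its remove_if member function is another option that avoids manual iterator handling altogether. A minimal sketch, with remove_even as a made-up wrapper name:

```cpp
#include <list>

// Hypothetical helper: removes all even numbers from the list in place.
void remove_even(std::list<int>& numbers)
{
    // list::remove_if unlinks and destroys every node for which the
    // predicate returns true; no erase call or loop is needed.
    numbers.remove_if([](int n) { return n % 2 == 0; });
}
```

Iterators to the surviving elements stay valid, because a list only relinks the neighbours of the removed nodes.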
The thing is that you're using an iterator that doesn't expect the links of your list to be modified.
So when you call erase() on your list, the links are modified and your iterator isn't valid anymore. The i++ statement doesn't work anymore.
But in the 2nd version, you re-assign your iterator to a valid object that still has its links intact, so the i++ statement can still work.
In some frameworks, you have 2 kinds of iterators: the kind that immediately reflects what's happening to the underlying dataset (this is what you're using here), and the kind that keeps its links unchanged whatever happens to the underlying dataset (so you don't have to use the weird trick of the 2nd version).