to compare sub-items in vector - c++

Overview of the problem: I am using a std::vector to hold objects of type Subject. The vector is small (10-20 objects at most).
These objects have string members such as category and sub_category.
Both category and sub_category can hold the same string as another object's category or sub_category.
Issue: I want my std::vector to contain only objects whose sub_category is unique. It does not matter if category is not unique.
If two objects have the same sub_category, one of them has to be deleted from the vector, according to some rules.
The rules for deleting are:
i) if one Subject has category = "Land" or category = "Jungle", delete the other duplicate object;
ii) if the above condition doesn't match, delete either of them.
I am wondering how I would compare the sub-items in the vector. For example,
I have a class, say Subject:
class Subject
{
public :
// some constructors,
// functions to get ., set category and sub category
std::string get_sub_category();
std::string get_category();
private:
std::string category;
std::string sub_category;
};
I have a vector which stores Subject objects, for example:
vector<Subject> sub_vec;
Now what I want is to delete from the vector the objects that share a sub_category.
I am not looking for source code, but I need a starting point.
example
say
sub_vec[0] = Animal object that has sub_category Tiger
sub_vec [1] = Animal object with Lion as sub category
sub_vec[2] = Forest object with sub_category Tiger
So, based on some conditions (which I can do), I want to remove either the Forest or the Animal object containing Tiger.
But how would I do the comparison for that?
Thanks everyone for the help. I have written the function and checked it, but I am sure there is a lot of room for improvement. Could you please point out my pitfalls?
std::vector< Subject > copy_vector; // copy_vector contains all the Subject objects, including those with redundant sub_category
for( std::vector< Subject >::iterator ii = copy_vector.begin() ; ii != copy_vector.end() ; ++ii )
{
sub_category = ii->get_sub_category();
std::cout <<" sub_category-- in main for loop " << sub_category << std::endl ;
std::vector< Subject >::iterator it = ii+1;
while( it != copy_vector.end() )
{
std::cout <<" the size of copy _vector is = " << copy_vector.size() << std::endl ; // for debug purpose
if( it->get_sub_category() == sub_category )
{
std::cout <<" we got a match here" << std::endl ;
// Since both are duplicates, we have to delete one of them. Rules for deleting:
// i) if a Subject has category "Land" or "Jungle", delete the other duplicate object;
// ii) if the above condition doesn't match, delete either of them.
if( ( it->get_category() == "Land" ) || ( it->get_category() == "Jungle" ) )
{
std::cout <<" we are deleting it reference value " << std::endl ;
it = copy_vector.erase(ii);
// increment the counter
++ii;
}
else if( ( ii->get_category() == "Land" ) || ( ii->get_category() == "Jungle" ) )
{
std::cout <<" we are deleting from copy_vector " << std::endl ;
it = copy_vector.erase(it);
}
else
{
std::cout <<" we are deleting from copy_vector when there is no match for rules " << std::endl ;
it = copy_vector.erase(it);
}
std::cout <<" the size of copy _vector is = " << copy_vector.size() << std::endl ;
}
else
{
std::cout <<" No Match" << std::endl;
// increase main iterator
if( it != copy_vector.end() )
{
++it;
}
}
}
}
//print value
for( std::vector< Subject >::iterator ii = copy_vector.begin() ; ii != copy_vector.end() ; ++ii )
{
std::cout <<" New list = " << ii->get_category() <<" \t " << ii->get_sub_category() << std::endl;
}

One way to do it is by using remove_if. To check whether an object has a duplicate sub_category, you can use a function or functor that stores the subcategories it finds in a set or an unordered_map and then remove all objects whose sub_category already exists in the set/unordered_map.
Note that unordered_map is only available from C++11 onwards.
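A minimal sketch of that idea, assuming C++11 and a simplified, hypothetical Subject with public members (the real class uses getters). The lambda captures the set by reference, so copies of the predicate share the same state. It relies on remove_if visiting the elements from front to back, which the mainstream implementations do.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the Subject class from the question.
struct Subject {
    std::string category;
    std::string sub_category;
};

// Keeps the first Subject seen for each sub_category and removes later ones.
void remove_duplicate_sub_categories(std::vector<Subject>& subjects) {
    std::set<std::string> seen;
    subjects.erase(
        std::remove_if(subjects.begin(), subjects.end(),
                       [&seen](const Subject& s) {
                           // insert().second is false if the key was already present
                           return !seen.insert(s.sub_category).second;
                       }),
        subjects.end());
}
```

This version always keeps the first occurrence, so it does not yet apply the "Land"/"Jungle" preference from the question.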

Your solution has time complexity O(n*n) but the problem can be solved with complexity O(n*log(n)) or even O(n).
First, let's define a category comparison function (a category that is "Land" or "Jungle" is greater than any other category):
bool CategoryLess(const string& sCategory1, const string& sCategory2){
return sCategory1 != "Land" && sCategory1 != "Jungle"
&& (sCategory2 == "Land" || sCategory2 == "Jungle");
}
Now iterate through the vector and store every subcategory found, together with the corresponding Subject, in a std::unordered_map (or a std::map if you don't use C++11). If the subcategory is already in the map, replace the stored Subject when its category is less than the category of the new Subject:
unordered_map<string, Subject*> Subcategories;
for (size_t i=0; i<sub_vec.size(); ++i){
unordered_map<string, Subject*>::iterator
it = Subcategories.find(sub_vec[i].get_sub_category());
if (it != Subcategories.end()){
// keep the Subject with the greater category
if (CategoryLess(it->second->get_category(), sub_vec[i].get_category()))
it->second = &sub_vec[i];
}
else
Subcategories[sub_vec[i].get_sub_category()] = &sub_vec[i];
}
Now you have the map of all subcategories and corresponding Subjects.
If we found two or more Subjects with the same subcategory then the map contains a pointer to the Subject with greater category.
Now iterate sub_vec once more and delete Subjects if
Subcategories[sub_vec[i].get_sub_category()] != &sub_vec[i];
Time complexity:
If we use std::unordered_map, the expected time complexity is O(n) for both loops (O(n*n) in the worst case).
If we use std::map, the time complexity is O(n*log(n)) for both loops.
(I didn't take into account the time complexities of string comparison and vector.erase, as they are irrelevant here.)
Please note that when you delete a Subject from the vector, the addresses of the other Subjects can change. So you need to take care when comparing pointers to Subjects (for example, copy the needed Subjects to another vector instead of deleting the other Subjects from the vector). But this doesn't change the general idea of my solution.
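One possible way to put the pieces together, sketched under a few assumptions: C++11, a simplified hypothetical Subject with public members, and indices stored in the map instead of pointers so that nothing can dangle. The kept Subjects are copied into a new vector rather than erased in place.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the Subject class from the question.
struct Subject {
    std::string category;
    std::string sub_category;
};

// True for the categories that win when two Subjects share a sub_category.
bool is_preferred(const std::string& category) {
    return category == "Land" || category == "Jungle";
}

std::vector<Subject> unique_by_sub_category(const std::vector<Subject>& input) {
    // sub_category -> index (into input) of the Subject to keep
    std::unordered_map<std::string, std::size_t> keep;
    for (std::size_t i = 0; i < input.size(); ++i) {
        auto found = keep.find(input[i].sub_category);
        if (found == keep.end()) {
            keep.emplace(input[i].sub_category, i);
        } else if (!is_preferred(input[found->second].category) &&
                   is_preferred(input[i].category)) {
            found->second = i;  // the new Subject has the greater category
        }
    }
    // Second pass: copy the kept Subjects, preserving their original order.
    std::vector<Subject> result;
    result.reserve(keep.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (keep[input[i].sub_category] == i) {
            result.push_back(input[i]);
        }
    }
    return result;
}
```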

You could try using BOOST_FOREACH to iterate through the vector elements.
I'm doing something similar to this:
BOOST_FOREACH( Subject& f, sub_vec )
{
///TODO: do your filtering here
if(f.get_sub_category() == "<bla bla>")
{
// handle the match
}
}
What I like about BOOST_FOREACH is that it makes the code very readable, which is certainly a factor to consider when you are dealing with many vector elements and many filtering possibilities.

You should either use a lambda expression or define a function object.
An example using a lambda expression:
#include <vector>
#include <string>
#include <algorithm>
// ...
std::string tiger = "Tiger";
sub_vec.erase( std::remove_if( sub_vec.begin(), sub_vec.end(),
[&]( const Subject &s ) { return ( s.sub_category == tiger ); } ),
sub_vec.end() );
Take into account that the code above removes all objects whose sub_category equals "Tiger". If you need to remove only duplicates, first find the first object with that sub category and then remove all other objects with the same sub category. In this case the code could look like this:
#include <vector>
#include <string>
#include <algorithm>
#include <iterator> // std::next
// ...
std::string tiger = "Tiger";
auto equal_sb_category = [&]( const Subject &s ) { return ( s.sub_category == tiger ); };
auto it = std::find_if( sub_vec.begin(), sub_vec.end(), equal_sb_category );
if ( it != sub_vec.end() )
{
sub_vec.erase( std::remove_if( std::next( it ), sub_vec.end(), equal_sb_category ),
sub_vec.end() );
}

Related

Simple pivot table display in c++? (not full pivot table feature)

I am writing a program making the inventory of the computers of a school.
I have a class Equip with two string member variables, m_owner and m_type.
I have a vector<Equip> v filled with a bunch of objects Equip.
I am using a csv file as argument to my program to fill the vector.
Say v contains 5 objects :
Equip e1("LaboA", "laptop");
Equip e2("LobaA", "server");
Equip e3("HR", "printer");
Equip e4("LobaA", "laptop");
Equip e5("LobaC", "router");
I am trying to display the content of v in the form of a pivot table like on the following screenshot :
I have tried the following :
void matrix(const vector<Equip>& v)
{
vector<Equip> tmp = v;
sort(tmp.begin(), tmp.end());
vector<Equip>::iterator it = unique(tmp.begin(), tmp.end(), SameOwner());
tmp.resize(std::distance(tmp.begin(), it));
for(auto i = tmp.begin(); i != tmp.end(); ++i)
{
int j = count_if(v.begin(), v.end(), IsPrinter());
cout << i->getOwner() << " : " << j << endl;
}
}
I have all the methods written so the code compiles and executes but it does not do what I want.
I understand why but I don't understand how to get this right.
Can you guys give me hints on how to achieve that please? I am at a loss here.
Thank you,
J.

Sorting of two vectors separately?

I have to make a program which uses the following two vectors:-
vector<double> age;
vector<string> name;
I take their input separately. I have to make a function sort() such that it sorts name alphabetically and then reorganizes age accordingly to match name.
Please help!!
If you can't group them within a struct or equivalent, you may create an additional vector of indexes that you sort and use for indirection:
std::vector<double> ages = /**/;
std::vector<string> names = /**/;
// ages.size() == names.size()
std::vector<std::size_t> indexes(names.size());
std::iota(indexes.begin(), indexes.end(), 0u);
std::sort(indexes.begin(), indexes.end(), [&](std::size_t lhs, std::size_t rhs) {
return names[lhs] < names[rhs];
});
for (auto index : indexes) {
std::cout << names[index] << " has " << ages[index] << std::endl;
}
And with range-v3 you can do:
std::vector<double> ages = /**/;
std::vector<string> names = /**/;
auto zip = ranges::view::zip(names, ages);
ranges::sort(zip);
for (const auto& z : zip) {
std::cout << std::get<0>(z) << " " << std::get<1>(z) << std::endl;
}
If the sort function accepts both the vectors, the easiest way is to copy everything to std::set<std::pair<string,double>> which sorts first on name and then copy the sorted entries to the input vectors. If you can't use sets, you can use vector and sort yourself.
The reason is that sorting changes the order so you lose the link between the entries of both vectors. If you can't or won't use the combined set method, you need to make sure that the link is maintained in another way, probably via a temporary container with references.
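A rough sketch of the temporary-container variant, assuming C++11. It uses a vector of pairs rather than a set, so repeated name/age entries are not dropped, and stable_sort so that equal names keep their input order. The function name sort_by_name is made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sorts names alphabetically and reorders ages to match.
// Returns false and leaves both vectors untouched if their sizes differ.
bool sort_by_name(std::vector<std::string>& names, std::vector<double>& ages) {
    if (names.size() != ages.size()) {
        return false;
    }
    typedef std::pair<std::string, double> Row;
    std::vector<Row> rows;
    rows.reserve(names.size());
    for (std::size_t i = 0; i < names.size(); ++i) {
        rows.emplace_back(names[i], ages[i]);  // keep each name tied to its age
    }
    // Compare on the name only; stable_sort preserves the order of equal names.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const Row& a, const Row& b) { return a.first < b.first; });
    for (std::size_t i = 0; i < rows.size(); ++i) {
        names[i] = rows[i].first;
        ages[i] = rows[i].second;
    }
    return true;
}
```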
Assuming you really need a function that takes two vectors and modifies them.
The sort function can be implemented as:
void sort ( vector<double>& ages, vector<string>& names)
{
if ( ages.size() != names.size() )
return;
std::map< string, double > helper_map;
for ( size_t id = 0; id < names.size(); ++id)
{
helper_map.emplace( names[id], ages[id] ); // note: for a repeated name, only the first entry is kept
}
names.clear();
ages.clear();
for (const auto& helper : helper_map)
{
names.push_back( helper.first );
ages.push_back( helper.second );
}
}
Working example:
http://coliru.stacked-crooked.com/a/2457c832c0b612b2
However keep in mind that this problem should be solved using different approaches as pointed out in the comments. As homework those things don't always apply though.

How to remove/erase rows from a selection of a tree view

I want to remove selected rows from a treeview or the underlying model.
The following code snippet works, but I have no idea which function of which class I have to call to remove the selected elements.
std::vector<Gtk::TreeModel::Path> pathlist;
pathlist = get_selection()->get_selected_rows();
for ( std::vector<Gtk::TreeModel::Path>::iterator it = pathlist.begin(); it!=pathlist.end(); it++)
{
Gtk::TreeModel::iterator iter = get_model()->get_iter( *it );
Gtk::TreeModel::Row row = *iter;
int val;
std::string str;
row.get_value( 0, val );
row.get_value( 1, str );
std::cout << "val " << val << std::endl;
std::cout << "String:" << str << std::endl;
}
The above code works fine.
Now I want to delete the elements which are selected!
Attention: MULTIPLE selection is activated.
I understand that the main problem is the MULTIPLE selection. If you get only one item, it's simple:
get_model()->erase(iter);
The problem comes after that: the rest of the iterators become invalid.
Do you have a unique ID for each row? If yes, you can use that:
Store all IDs in a container,
then go through all items and delete the stored ones.
Something like this:
std::vector<Gtk::TreeModel::Path> pathlist;
pathlist = get_selection()->get_selected_rows();
std::set<int> IDs;
for ( std::vector<Gtk::TreeModel::Path>::iterator it = pathlist.begin(); it!=pathlist.end(); it++)
{
Gtk::TreeModel::iterator iter = get_model()->get_iter( *it );
int id;
iter->get_value(ID, id); // ID is assumed to be the index of the column holding the unique row id
IDs.insert(id);
}
auto iter = get_model()->erase( get_model()->get_iter( *pathlist.begin() ) );
while (iter)
{
int id;
iter->get_value(ID, id);
if (IDs.find(id) != IDs.end()) {
iter = get_model()->erase( iter );
} else {
++iter;
}
}
Or something like that. Sorry, I don't remember the whole API.
I assume that erasing a row from the model also changes the paths.

Deleting an object in a vector

I am trying to delete an element in a vector of Objects. The vector is filled with instances of Object and at some point, I want to remove a certain element in a vector not by index, but by the element itself.
A simple example would be:
std::vector< string > strVector;
strVector.push_back( "abc" );
strVector.push_back( "def" );
strVector.push_back( "ghi" ); // So strVector should contain "abc", "def", and "ghi"
How do I remove "ghi" from that vector? Note that I don't know where "ghi" is in that vector.
// Something like this. Assume strVector = [ "abc", "cba", "ccb", "bac", "aaa" ]
strVector.removeElement( "ccb" );
A more relevant example of what I am working on:
class MyClass {
std::vector< Object > myObjVector;
void main( ARGS ) {
for ( int i = 0; i < 10; i++ ) {
Object myObject = Object( );
myObjVector.push_back( myObject );
}
int j = getANumber( ); // j could be any number within the size of the vector
Object myOtherObject = myObjVector.at( j );
// How do I erase myOtherObject (which is an object inside the vector) ?
removeFromVector( myOtherObject );
}
}
I hope the question's clear. Thanks in advance.
EDIT: I figured it out, thanks to all those who answered. The trick was to give the class something unique that identifies it (like a name or a tag, as long as they are guaranteed to be unique) then use the erase-remove idiom to remove the object from the array.
If your use-case has no duplicates, then you are better off using an std::set and using the std::set::erase which takes a value.
std::set< string > strSet;
strSet.insert( "abc" );
strSet.insert( "def" );
strSet.insert( "ghi" );
strSet.insert( "ccb" );
strSet.erase("ccb");
If you need to cope with duplicates, then you have to specify the desired behaviour of the removal. Should it remove one or all of the elements matching a value? Do you care about preserving the order of the remaining elements? If you require using a vector, then look at the erase-remove idiom. But note that std::vector::erase has linear time complexity, whereas the relevant variant of std::set::erase has logarithmic time complexity. And erase-remove would remove all elements equal to the given value.
Note: if you want to use an std::set for a user defined type, you must provide either a less-than bool operator<(const UserType&) const or a comparison function or functor, implementing strict weak ordering.
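As an illustration of that note, here is a small sketch with a made-up type Tagged that is ordered by its tag only. The type and the helper erase_by_tag are hypothetical, not taken from the question.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical user-defined type: the tag identifies the object.
struct Tagged {
    std::string tag;
    int payload;
    // Strict weak ordering on the tag only; the payload is ignored.
    bool operator<(const Tagged& other) const { return tag < other.tag; }
};

// Erases the element with the given tag; returns how many were removed (0 or 1).
std::size_t erase_by_tag(std::set<Tagged>& items, const std::string& tag) {
    Tagged key = {tag, 0};  // the payload does not take part in the lookup
    return items.erase(key);
}
```

Because the ordering looks only at the tag, two objects with the same tag count as the same element, so the set holds at most one of them.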
If you must use a vector, then use erase(remove()):
#include <algorithm>
#include <string>
#include <vector>
strVector.erase(std::remove(strVector.begin(), strVector.end(), "ghi"),
strVector.end());
this will remove all instances of "ghi" from strVector.
If the objects in the vector support equality, and that's the condition
for removal, then you can use:
v.erase( std::remove( v.begin(), v.end(), "ghi" ), v.end() );
Otherwise, you'll need remove_if, with a functional object (or lambda,
if you have C++11) which returns true if the element is to be removed.
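For example, a sketch with a lambda (C++11), assuming each object carries a unique id. The Object struct and the function remove_by_id here are made up for the illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical element type with a unique identifier.
struct Object {
    int id;
};

// Removes every element whose id matches, using the erase-remove idiom.
void remove_by_id(std::vector<Object>& objects, int id) {
    objects.erase(
        std::remove_if(objects.begin(), objects.end(),
                       [id](const Object& o) { return o.id == id; }),
        objects.end());
}
```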
#include <iostream>
#include <vector>
class Object
{
public:
Object(int n){secret_num = n;}
virtual ~Object(){}
int getSecretNum(){return secret_num;}
private:
int secret_num;
};
int main()
{
int index= -1;
Object *urobj = new Object(104);
std::vector<Object*> urvector;
for(int i = 0; i < 10; ++i)
{
Object *obj = new Object(i+1);
urvector.push_back(obj);
}
urvector.push_back(urobj);
for(int j = 0; j < urvector.size(); ++j)
{
Object *tmp = urvector.at(j);
std::cout << tmp->getSecretNum() << std::endl;
if(urobj == tmp)
index = j;
}
if(index == -1)
std::cout << " not match " << std::endl;
else
std::cout << " match " << index << std::endl;
return 0;
}

Efficient way to re-order a C++ map-based collection

I have a large(ish - >100K) collection mapping a user identifier (an int) to the count of different products that they've bought (also an int.) I need to re-organise the data as efficiently as possible to find how many users have different numbers of products. So for example, how many users have 1 product, how many users have two products etc.
I have achieved this by reversing the original data from a std::map into a std::multimap (where the key and value are simply reversed). I can then pick out the number of users having N products using count(N) (although I also stored the unique values in a set so I could be sure of the exact number of values I was iterating over and their order).
Code looks like this:
// uc is a std::map<int, int> containing the original
// mapping of user identifier to the count of different
// products that they've bought.
std::set<int> uniqueCounts;
std::multimap<int, int> cu; // This maps count to user.
for ( map<int, int>::const_iterator it = uc.begin();
it != uc.end(); ++it )
{
cu.insert( std::pair<int, int>( it->second, it->first ) );
uniqueCounts.insert( it->second );
}
// Now write this out
for ( std::set<int>::const_iterator it = uniqueCounts.begin();
it != uniqueCounts.end(); ++it )
{
std::cout << "==> There are "
<< cu.count( *it ) << " users that have bought "
<< *it << " products(s)" << std::endl;
}
I just can't help feeling that this is not the most efficient way of doing this. Anyone know of a clever method of doing this?
I'm limited in that I can't use Boost or C++11 to do this.
Oh, also, in case anyone is wondering, this is neither homework, nor an interview question.
Assuming you know the maximum number of products that a single user could have bought, you might see better performance just using a vector to store the results of the operation. As it is you're going to need an allocation for pretty much every entry in the original map, which likely isn't the fastest option.
It would also cut down on the lookup overhead on a map, gain the benefits of memory locality, and replace the call to count on the multimap (which is not a constant time operation) with a constant time lookup of the vector.
So you could do something like this:
// +1 so that index MAX_PRODUCTS_PER_USER itself is valid
std::vector< int > uniqueCounts( MAX_PRODUCTS_PER_USER + 1 );
for ( map<int, int>::const_iterator it = uc.begin();
it != uc.end(); ++it )
{
uniqueCounts[ it->second ]++;
}
// Now write this out
int i = 0;
for ( std::vector< int >::const_iterator it = uniqueCounts.begin();
it != uniqueCounts.end(); ++it, ++i )
{
std::cout << "==> There are "
<< *it << " users that have bought "
<< i << " products(s)" << std::endl;
}
Even if you don't know the maximum number of products, it seems like you could just guess a maximum and adapt this code to increase the size of the vector if required. It's sure to result in fewer allocations than your original example anyway.
All this assumes that you don't actually require the user ids after you've processed this data, of course (and, as pointed out in the comments below, that the number of products bought by each user comes from a relatively small and contiguous set). Otherwise you might be better off using a map in place of a vector: you'll still avoid calling the multimap::count function, but potentially lose some of the other benefits.
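A sketch of the "grow the vector if required" variant, written without C++11 features since the question rules them out. The function name count_frequencies is invented here, and skipping negative counts is an assumption about the data.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Returns a vector where element n is the number of users who bought n products.
std::vector<int> count_frequencies(const std::map<int, int>& uc) {
    std::vector<int> freq;
    for (std::map<int, int>::const_iterator it = uc.begin(); it != uc.end(); ++it) {
        if (it->second < 0) {
            continue;  // assumption: negative counts are invalid and skipped
        }
        std::size_t n = static_cast<std::size_t>(it->second);
        if (n >= freq.size()) {
            freq.resize(n + 1, 0);  // grow on demand; new slots start at 0
        }
        ++freq[n];
    }
    return freq;
}
```

The vector ends up with one slot per value up to the largest count seen, so this only pays off when that maximum is reasonably small.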
It depends on what you mean by "more efficient". First off, is this really a bottleneck? Sure, 100k entries is a lot, but if you only have to do this every few minutes, it's OK if the algorithm takes a couple of seconds.
The only area for improvement I see is memory usage. If this is a concern, you can skip the generation of the multimap and just keep a counter map around, something like this (beware, my C++ is a little rusty):
std::map<int, int> countFrequency; // count => how many customers with that count
for ( std::map<int, int>::const_iterator it = uc.begin();
it != uc.end(); ++it )
{
// If it->second is not yet in countFrequency,
// the default constructor initializes it to 0.
countFrequency[it->second] += 1;
}
// Now write this out
for ( std::map<int, int>::const_iterator it = countFrequency.begin();
it != countFrequency.end(); ++it )
{
std::cout << "==> There are "
<< it->second << " users that have bought "
<< it->first << " products(s)" << std::endl;
}
If a user is added and buys count items, you can update countFrequency with
countFrequency[count] += 1;
If an existing user goes from oldCount to newCount items, you can update countFrequency with
countFrequency[oldCount] -= 1;
countFrequency[newCount] += 1;
Now, just as an aside, I recommend using an unsigned int for count (unless there's a legitimate reason for negative counts) and typedef'ing a userID type, for added readability.
If you can, I would recommend keeping both pieces of data current all the time. In other words, I would maintain a second map which is mapping number of products bought to number of customers who bought that many products. This map contains the exact answer to your question if you maintain it. Each time a customer buys a product, let n be the number of products this customer has now bought. Subtract one from the value at key n-1. Add one to the value at key n. If the range of keys is small enough this could be an array instead of a map. Do you ever expect a single customer to buy hundreds of products?
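A possible shape for this, as a sketch only: the class PurchaseStats and its member names are invented for the example, and it assumes every call records one additional product for the user. It avoids C++11, in line with the constraint in the question.

```cpp
#include <map>

// Keeps the per-user counts and the count-frequency map in sync.
class PurchaseStats {
public:
    // Records that userId has bought one more product.
    void record_purchase(int userId) {
        int& n = perUser_[userId];  // 0 for a user seen for the first time
        if (n > 0) {
            // The user leaves the group of users with n products.
            std::map<int, int>::iterator old = frequency_.find(n);
            if (--old->second == 0) {
                frequency_.erase(old);
            }
        }
        ++n;
        ++frequency_[n];  // the user joins the group with n products
    }

    // Number of users who have bought exactly productCount products.
    int users_with(int productCount) const {
        std::map<int, int>::const_iterator it = frequency_.find(productCount);
        return it == frequency_.end() ? 0 : it->second;
    }

private:
    std::map<int, int> perUser_;    // user id -> number of products bought
    std::map<int, int> frequency_;  // number of products -> number of users
};
```

With this in place the report is just a walk over the frequency map, at the cost of a little extra work on every purchase.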
Just for larks, here's a mixed approach that uses a vector if the data is smallish, and a map to cover the case where one user has bought a truly absurd number of products. I doubt you'll really need the latter in a store app, but a more general version of the problem might benefit from it.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <map>
#include <vector>
typedef std::map<int, int> Map;
typedef Map::const_iterator It;
template <typename Container>
void get_counts(const Map &source, Container &dest) {
for (It it = source.begin(); it != source.end(); ++it) {
++dest[it->second];
}
}
// contains() is declared before print_counts so that the template can find it.
// As an alternative to this overloaded contains(), you could write
// an overloaded print_counts -- after all the one below is not an
// efficient way to iterate a sparsely-populated map.
// Or you might prefer a template function that visits
// each entry in the container, calling a specified functor that
// will print the output, and passing it the key and value.
// This is just the smallest point of customization I thought of.
bool contains(const Map &c, int key) {
return c.count(key) != 0;
}
bool contains(const std::vector<int> &c, int key) {
// also check 0 <= key < c.size() for a more general-purpose function
return c[key] != 0;
}
template <typename Container>
void print_counts(Container &people, int max_count) {
for (int i = 0; i <= max_count; ++i) {
if (contains(people, i)) {
std::cout << "==> There are "
<< people[i] << " users that have bought "
<< i << " products(s)" << std::endl;
}
}
}
void do_everything(const Map &uc) {
// first get the max product count
int max_count = 0;
for (It it = uc.begin(); it != uc.end(); ++it) {
max_count = std::max(max_count, it->second);
}
if (static_cast<std::size_t>(max_count) > uc.size()) { // or some other threshold
Map counts;
get_counts(uc, counts);
print_counts(counts, max_count);
} else {
std::vector<int> counts(max_count+1);
get_counts(uc, counts);
print_counts(counts, max_count);
}
}
From here you could refactor, to create a class template CountReOrderer, which takes a template parameter telling it whether to use a vector or a map for the counts.