I have to develop a component that will hold more than 100,000 instances of a class, and I want to generate reports based on different criteria (members) of that class. For example, an Employee class has the data fields id, names, addr and phoneno. Report generation will be based on:
names_ascending
names_descending
addr_ascending
phoneno_ascending
unique_names
unique_addr
unique_phoneno
Iterating over the instances at runtime for each call is very slow, since it is a linear operation on a large number of instances and also requires sorting.
So I have stored pointers to each instance in several containers, each sorted in a different way. But this requires more memory than I would like. Please suggest a better way of doing this. Below is a sample code snippet showing what I did to achieve the above.
class Employee
{
int m_id;
string m_name;
string m_addr;
string m_phone;
public:
Employee(int id, string name, string addr, string phone) :
m_id(id), m_name(name), m_addr(addr), m_phone(phone) { }
int id() const { return m_id; }
string name() const { return m_name; }
string addr() const { return m_addr; }
string phoneno() const { return m_phone; }
};
//custom predicates for std containers
struct IDComparator
{
bool operator() (const Employee* e1, const Employee* e2 ) const
{
return e1->id() < e2->id();
}
};
struct NameComparator
{
bool operator() (const Employee* e1, const Employee* e2 ) const
{
return e1->name() < e2->name();
}
};
struct AddressComparator
{
bool operator() (const Employee* e1, const Employee* e2 ) const
{
return e1->addr() < e2->addr();
}
};
struct PhoneComparator
{
bool operator() (const Employee* e1, const Employee* e2 ) const
{
return e1->phoneno() < e2->phoneno();
}
};
//Class which holds huge number of employee instances
class Dept
{
private:
typedef set<Employee*, IDComparator> EMPID; //unique id
typedef EMPID::iterator EMPID_ITER;
typedef multiset<const Employee*, NameComparator> EMPNAME; // for sorted names
typedef EMPNAME::iterator NAME_ITER;
typedef multiset<const Employee*, AddressComparator> EMPADDR; // for sorted addr
typedef EMPADDR::iterator ADDR_ITER;
typedef multiset<const Employee*, PhoneComparator> EMPPHONE; // for sorted phoneno
typedef EMPPHONE::iterator PHONE_ITER;
private:
EMPID m_empids;
EMPNAME m_names ;
EMPADDR m_addr;
EMPPHONE m_phoneno;
public:
Dept() { }
~Dept() { /* delete the instances of employees */ }
void add(Employee* e)
{
EMPID_ITER iter = m_empids.insert(e).first;
const Employee* empptr = *iter; // the set stores Employee*, so dereference the iterator once
m_names.insert(empptr); // adds employee pointer to name multiset
m_addr.insert(empptr); // adds employee pointer to addr multiset
m_phoneno.insert(empptr); // adds employee pointer to phone multiset
}
void print_emp_dtls() const; //prints all the emp dtls iterating through EMPID
void print_unique_names() const; //iterate EMPNAME & use upperbound & lowerbound, prints unique names
void print_asc_name() const; //iterate EMPNAME & prints all names in ascending order
void print_desc_name() const; //back iterate EMPNAME & prints all names in descending order
void print_unique_addr() const; //iterate EMPADDR & use upperbound & lowerbound, prints unique address
void print_asc_addr() const; //iterate EMPADDR & prints all addr in ascending order
void print_desc_addr() const; //back iterate EMPADDR & prints all address in descending order
void print_unique_phoneno() const; //iterate EMPPHONE & use upperbound & lowerbound, prints unique phoneno
void print_asc_phoneno() const; //iterate EMPPHONE & prints all phoneno in ascending order
void print_desc_phoneno() const; //back iterate EMPPHONE & prints all phoneno in descending order
};
Seems like a perfect candidate for Boost.MultiIndex:
The Boost Multi-index Containers Library provides a class template named multi_index_container which enables the construction of containers maintaining one or more indices with different sorting and access semantics.
I have used Boost.MultiIndex successfully in the past. It may look strange at first, but it is actually quite an interesting library. Keep in mind when using it that you specify "what" your customized container should do, not "how". Assume that you have the following type:
struct user_t
{
string id, name, email;
int age;
friend ostream& operator<<(ostream& output_stream, const user_t& user)
{
return output_stream
<< user.id << " "
<< user.name << " "
<< user.age << " "
<< user.email << "\n";
}
friend istream& operator>>(istream& input_stream, user_t& user)
{
return input_stream >> user.id >> user.name >> user.age >> user.email;
}
};
What happens is that you create one container that holds the objects, together with as many indices as you want. Before we start, let's define the tags of the indices. Tags are just empty types that let you access your indices by name instead of by magic numbers:
struct by_id { };
struct by_name { };
struct by_age { };
struct by_email { };
Then we define our "data base" with the required indices:
typedef multi_index_container<
user_t,
indexed_by
<
ordered_unique<tag<by_id>, member<user_t, string, &user_t::id> >,
ordered_non_unique<tag<by_name>, member<user_t, string, &user_t::name> >,
ordered_non_unique<tag<by_age>, member<user_t, int, &user_t::age> >,
ordered_non_unique<tag<by_email>, member<user_t, string, &user_t::email> >
>
> user_db;
The first template argument is the type of the elements in the container. Then you say which indices you want on this container:
indexed_by
<
ordered_unique<tag<by_id>, member<user_t, string, &user_t::id> >,
ordered_non_unique<tag<by_name>, member<user_t, string, &user_t::name> >,
ordered_non_unique<tag<by_age>, member<user_t, int, &user_t::age> >,
ordered_non_unique<tag<by_email>, member<user_t, string, &user_t::email> >
>
You just specify the type of index you want to expose. There are various index types, and the right one depends on the semantics of your data. It is good to give each index a tag (the first parameter); the second template parameter specifies the key by which the elements are indexed. There are various ways to choose the key, and with ordered_non_unique the key is not required to be unique.
From now on, you deal with user_db much like a regular std::multiset, with a small difference that makes all the difference ;) Let's say you want to load serialized user information from a file and write it back out, ordered according to the indices we created:
user_db load_information()
{
ifstream info_file("information.txt");
user_db db;
user_t user;
while(info_file >> user)
db.insert(user);
return db;
}
template <typename index_t>
void save_information_by(ostream& output_stream, const index_t& index)
{
ostream_iterator<user_t> serializer(output_stream);
copy(index.begin(), index.end(), serializer);
}
int main()
{
ofstream
by_id_file("by_id.txt"),
by_name_file("by_name.txt"),
by_age_file("by_age.txt"),
by_email_file("by_email.txt");
user_db db = load_information();
// You see why we created the tags,
// if we didn't we had to specify the index like the following:
// const auto& name_index = db.get<by_name>(); ==
// const auto& name_index = db.get<1>();
const auto& id_index = db.get<by_id>();
const auto& name_index = db.get<by_name>();
const auto& age_index = db.get<by_age>();
const auto& email_index = db.get<by_email>();
save_information_by(by_id_file, id_index);
save_information_by(by_name_file, name_index);
save_information_by(by_age_file, age_index);
save_information_by(by_email_file, email_index);
}
Look at boost::multi_index here. There is a container, boost::multi_index_container, which allows you to search for items using various keys.
Related
I create an array of a struct:
struct student {
char name[20];
int num1;
int num2;
} temp[20] = {0};
Its internal data is like this:
temp[0]={'a',1,11};
temp[1]={'b',2,12};
temp[2]={'c',3,13};
temp[3]={'d',4,14};
temp[4]={'e',5,15};
...
I know that I can define a compare function to tell the sort() function how to order the elements, for example:
bool cmp (student a,student b){
return a.num1 > b.num1;
}
sort(temp, temp+20, cmp);
My question is: How could I sort the array using this sort() function based on the items I read in with scanf()?
Specifically, if I scanf() the num1 field, the sort function sorts the data based on num1.
If I scanf() the num2 field, the sort function sorts the data based on num2.
In other words, the sort criterion should follow the field I read with scanf().
How can I achieve this?
You can have both the reading and the sorting depend on a pointer-to-member.
So instead of
void read_num1(student & s) {
std::scanf("%d", std::addressof(s.num1));
}
bool compare_num1(student lhs, student rhs) {
return lhs.num1 < rhs.num1;
}
int main() {
student students[20];
for (student & s : students) {
read_num1(s);
}
std::sort(std::begin(students), std::end(students), compare_num1);
}
you have functions that take an extra parameter, and close over it where necessary:
using member_t = int (student::*);
void read(student & s, member_t member) {
std::scanf("%d", std::addressof(s.*member));
}
bool compare(student lhs, student rhs, member_t member) {
return (lhs.*member) < (rhs.*member);
}
int main() {
member_t member = /* some condition */ true ? &student::num1 : &student::num2;
student students[20];
for (student & s : students) {
read(s, member);
}
std::sort(std::begin(students), std::end(students), [member](student lhs, student rhs) { return compare(lhs, rhs, member); });
}
I recommend using the ranges interface for sort in the STL. Change your array to use std::array or better std::vector and then you can do sorting as simple as this:
ranges::sort(s, [](int a, int b) { return a > b; });
print("Sort using a lambda expression", s);
Particle particles[] {
{"Electron", 0.511}, {"Muon", 105.66}, {"Tau", 1776.86},
{"Positron", 0.511}, {"Proton", 938.27}, {"Neutron", 939.57},
};
ranges::sort(particles, {}, &Particle::name);
print("\nSort by name using a projection", particles, '\n');
ranges::sort(particles, {}, &Particle::mass);
print("Sort by mass using a projection", particles, '\n');
And if you want to sort by multiple fields, e.g. last_name then first_name, you can use a std::tuple with references into your struct as the projection. std::tuple has lexicographical comparison, so you really just have to give the order of the fields and the rest follows from the tuple comparison.
I have a map of <MyObject, int> to count the occurrences of each instance. When I overwrite the key myObject1 with an equal myObject2, will myObject1 be deleted and the memory allocated to it be recovered?
For example, I have a text file that consists of the name, gender, age and height of people. Assume I just want to count how many unique (name, gender) pairs there are. So I create my Person (string name, int gender) objects and add them to a std::map (let's say I have to use a map instead of a set):
std::map<Person, int> myMapCounter;
//for each line
Person newperson(name, gender);
myMapCounter[newperson] = 1 ;// just a dummy value
//end for
int number = myMapCounter.size();
Upon creating a new Person object that is equal to an old one from a previous line, will myMapCounter[newperson] = 1 delete the old object (recovering the memory, so that there is only one block of memory for this particular "person"), or will the "old" object still exist in memory?
Well, it worked for me once I implemented an operator< function.
When you add a Person struct/class to your map, std::map compares the new key with the existing keys using operator< to decide where it belongs.
Try code like this to see what really happens:
#include <map>
#include <iostream>
#include <string>
class Person {
private:
std::string name;
std::string gender;
public:
Person(std::string name, std::string gender): name(name), gender(gender) {}
Person(const Person &person): name(person.name), gender(person.gender) {}
inline std::string getName() const { return name; }
inline std::string getGender() const { return gender; }
friend bool operator <(const Person &first, const Person &other);
};
inline bool operator <(const Person &first, const Person &other)
{
std::cout << "Comparing " << first.name << " with " << other.name << "\n";
if (first.name == other.name) {
return first.gender < other.gender;
} else {
return first.name < other.name;
}
}
int main() {
std::map<Person, int> map;
map[Person("One", "male")] = 1;
map[Person("Two", "male")] = 4;
map[Person("A", "male")] = 3;
std::cout << "Ending adding\n";
std::cout << map[Person("One", "male")];
}
You would see something like this:
Comparing Two with One
Comparing One with Two
Comparing A with One
Ending adding
Comparing One with One
Comparing One with One
Can a std::map's or std::unordered_map's key be shared with part of the value? Especially if the key is non-trivial, say like a std::string?
As a simple example let's take a Person object:
struct Person {
// lots of other values
std::string name;
};
std::unordered_map<std::string, std::shared_ptr<Person>> people;
void insertPerson(std::shared_ptr<Person>& p) {
people[p->name] = p;
// ^^^^^^^
// copy of name string
}
std::shared_ptr<Person> lookupPerson(const std::string& name) {
return people[name];
}
My first thought is a wrapper around the name that points to the person, but I cannot figure out how to do a lookup by name.
For your purpose, a std::map can be considered a std::set containing std::pair's which is ordered (and thus efficiently accessible) according to the first element of the pair.
This view is particularly useful if key and value elements are partly identical, because then you do not need to artificially separate value and key elements for a set (and neither you need to write wrappers around the values which select the key).
Instead, one only has to provide a custom ordering function which works on the set and extracts the relevant key part.
Following this idea, your example becomes
auto set_order = [](auto const& p, auto const& s) { return p->name < s->name; };
std::set<std::shared_ptr<Person>, decltype(set_order)> people(set_order);
void insertPerson(std::shared_ptr<Person>& p) {
people.insert(p);
}
As an alternative, here you could also drop the custom comparison and order the set by the addresses in the shared pointer (which supports < and thus can be used directly in the set):
std::set<std::shared_ptr<Person> > people;
void insertPerson(std::shared_ptr<Person>& p) {
people.insert(p);
}
Replace set by unordered_set where needed (in general you then also need to provide a suitable hash function).
EDIT: The lookup can be performed using std::lower_bound:
std::shared_ptr<Person> lookupPerson(std::string const& s)
{
auto comp = [](auto const& p, auto const& s) { return p->name < s; };
return *std::lower_bound(std::begin(people), std::end(people), s, comp);
}
DEMO.
EDIT 2: However, given this more-or-less ugly stuff, you can also follow the lines of your primary idea and use a small wrapper around the value as key, something like
struct PersonKey
{
PersonKey(std::shared_ptr<Person> const& p) : s(p->name) {}
PersonKey(std::string const& _s) : s(_s) {}
std::string s;
bool operator<(PersonKey const& rhs) const
{
return s < rhs.s;
}
};
Use it like (untested)
std::map<PersonKey, std::shared_ptr<Person> > m;
auto sptr = std::make_shared<Person>("Peter");
m[PersonKey(sptr)]=sptr;
Lookup is done through
m[PersonKey("Peter")];
Now I like this better than my first suggestion ;-)
Here's an alternative to davidhigh's answer.
struct Person {
// lots of other values
std::string name;
};
struct StrPtrCmp {
bool operator()(const std::string* a, const std::string* b) const {
return *a < *b;
}
};
std::map<const std::string*, std::shared_ptr<Person>, StrPtrCmp> people;
void insertPerson(std::shared_ptr<Person>& p) {
people[&(p->name)] = p;
}
std::shared_ptr<Person> lookupPerson(const std::string& name) {
return people[&name];
}
And a few edits to make it work with std::unordered_map:
struct StrPtrHash {
size_t operator()(const std::string* p) const {
return std::hash<std::string>()(*p);
}
};
struct StrPtrEquality {
bool operator()(const std::string* a, const std::string* b) const {
return std::equal_to<std::string>()(*a, *b);
}
};
std::unordered_map<const std::string*, std::shared_ptr<Person>, StrPtrHash, StrPtrEquality> people;
I have a struct
typedef struct
{
int id;
string name;
string address;
string city;
// ...
} Customer;
I will have multiple customers, so I need to store these structs in some sort of list and then sort them by id. There are probably multiple solutions here and I have a few ideas myself, but I am looking for the best solution in terms of performance.
Use the sort provided by the stl algorithms package, example:
struct Customer {
int id;
Customer(int i) : id(i) {}
};
bool sortfunc(struct Customer i, struct Customer j) {
return (i.id < j.id);
}
int main() {
vector<Customer> customers;
customers.push_back(Customer(32));
customers.push_back(Customer(71));
customers.push_back(Customer(12));
customers.push_back(Customer(45));
customers.push_back(Customer(26));
customers.push_back(Customer(80));
customers.push_back(Customer(53));
customers.push_back(Customer(33));
sort(customers.begin(), customers.end(), sortfunc);
cout << "customers:";
vector<Customer>::iterator it;
for (it = customers.begin(); it != customers.end(); ++it)
cout << " " << it->id;
return 0;
}
I would recommend storing the Customers in a std::set.
You should define operator< for it:
bool Customer::operator < (const Customer& other) const {
return id < other.id;
}
Now, after each insert, the collection is already sorted by id.
And you can iterate over whole collection by:
for(std::set<Customer>::iterator it = your_collection.begin(); it != your_collection.end(); it++)
This avoids a separate sorting step: each insert takes O(log n) time and the collection is always ordered by id.
Using the std::list::sort method should be the fastest way.
Define operator< for your structure to provide an ordering relation between Customer instances:
struct Customer {
...
friend bool operator<(const Customer& a, const Customer& b) {
return a.id < b.id;
}
};
Use this cheatsheet (in particular the flowchart at the bottom) to decide which container you should use in your particular program. If it’s std::set, your sorting is done whenever you insert a Customer into the set. If it’s std::list, call the sort() member function on the list:
std::list<Customer> customers;
customers.push_back(Customer(...));
...
customers.sort();
If it’s std::vector or std::deque, use std::sort() from the <algorithm> header:
std::vector<Customer> customers;
customers.push_back(Customer(...));
...
std::sort(customers.begin(), customers.end());
If you need to sort in multiple ways, define a sorting function for each ordering:
struct Customer {
...
static bool sort_by_name(const Customer& a, const Customer& b) {
return a.name < b.name;
}
};
Then tell std::list::sort() or std::sort() to use that comparator:
customers.sort(Customer::sort_by_name);
std::sort(customers.begin(), customers.end(), Customer::sort_by_name);
Having the following code:
#include <iostream>
#include <set>
#include <string>
#include <functional>
using namespace std;
class Employee {
// ...
int _id;
string _name;
string _title;
public:
Employee(int id): _id(id) {}
string const &name() const { return _name; }
void setName(string const &newName) { _name = newName; }
string const &title() const { return _title; }
void setTitle(string const &newTitle) { _title = newTitle; }
int id() const { return _id; }
};
struct compEmployeesByID: public binary_function<Employee, Employee, bool> {
bool operator()(Employee const &lhs, Employee const &rhs) {
return lhs.id() < rhs.id();
}
};
int wmain() {
Employee emplArr[] = {0, 1, 2, 3, 4};
set<Employee, compEmployeesByID> employees(emplArr, emplArr + sizeof emplArr/sizeof emplArr[0]);
// ...
set<Employee, compEmployeesByID>::iterator iter = employees.find(2);
if (iter != employees.end())
iter->setTitle("Supervisor");
return 0;
}
I cannot compile this code having (MSVCPP 11.0):
1> main.cpp
1>d:\docs\programming\test01\test01\main.cpp(40): error C2662: 'Employee::setTitle' : cannot convert 'this' pointer from 'const Employee' to 'Employee &'
1> Conversion loses qualifiers
This helps to compile:
if (iter != employees.end())
const_cast<Employee &>(*iter).setTitle("Supervisor");
The question: I know that map and multimap store their values as pair(const K, V), where K is a key and V is a value. We cannot change the K object. But set<T> and multiset<T> store their objects as T, not as const T. So why do I need this const_cast?
In C++11, set (and multiset) specify that iterator as well as const_iterator is a constant iterator, i.e. you cannot use it to modify the key. This is because any modification of the key risks breaking the set's invariant. (See 23.2.4/6.)
Your const_cast opens the door to undefined behaviour.
The values in a set are not supposed to be modified. For example, if you modified your Employee's ID, then it would be in the wrong position in the set and the set would be broken.
Your Employee has three fields, and your set is using the _id field in your operator<.
class Employee {
// ...
int _id;
string _name;
string _title;
};
Therefore, you should probably use a map<int,Employee> instead of your set, then you would be able to modify the name and title. I would also make the _id field of Employee a const int _id.
(By the way, identifiers beginning with _ are reserved in some contexts, for example at global scope or when followed by an uppercase letter, so they are best avoided. It has never caused me any trouble, but now I prefer to put the underscore at the end of the variable name.)
In C++, you cannot modify keys of associative STL containers because you would break their ordering. When you wish to change a key, you're supposed to (1) find the existing key, (2) delete it, and (3) insert the new key.
Unfortunately, while this isn't overly appealing, it's how associative containers work in the STL.
You can work around the const with just a level of indirection, for example by storing pointers in the set.
But be careful not to change the ordering of the elements in a given sorted container.