Array, Vector, Map and List data searching - c++

I'm currently working on a C++ school assignment and I have 3 object types. Customer, Hire and Tour. Customer is liked to Tour and Hire. Requirement is to use Array, Vector, Map and List to hold this information based on users's selection of the data structure type. There are data files with 1000s of records and the application will read them and create the necessary objects. For a example if the user selects vector, it'll create 3 vectors containing above objects. Then following operations will be preformed on them.
load in the large datasets we have provided for you. If the
datastructure you are using supports sorting, it should be sorted by
description.
Prepare a list of customers whose have booked a tour that will occur
before the end of the year
Prepare a list of tours booked by customers who owe us more than
$2000, sorted by the date their account is due
Prepare a list of Hires by customers whose postcodes begin with a 5
I have following in my main application header file.
private:
string structureType;
Customer** customerListArray;
Tour** tourListArray;
EquipmentHire** equipmentsListArray;
vector<Customer *> customerListVector;
vector<Tour *> tourListVector;
vector<EquipmentHire *> equipmentsListVector;
std::map<string, Customer*> customerListMap;
std::map<string, Tour*> tourListMap;
std::map<string, EquipmentHire*> equipmentsListMap;
list<Customer *> customerListList;
list<Tour *> tourListList;
list<EquipmentHire *> equipmentsListList;
Then I load data on to those objects based on the user's selection. However my question is, Do I need to write different functions for each type of data structures to preform above operations, or is there a common interface I can use on all of them?
My C++ knowledge is very limited and requirement is to use C++98.
Thank you.

The common interface is "STL container." This is not a strong interface defined in C++, but is a concept that you can rely on all STL containers to implement. You can therefore use templates to write code once that will apply to all of them.
(Note that maps and vectors differ in that maps have std::pair<> as their element type. In practice this can be worked around by providing "identity" and "map" selectors to your search function.)

Related

STL map like data structure to allow searching based on two keys

I want to make a map like structure to allow searching by two keys both will be strings, here's an example:
Myclass s;
Person p = s.find("David"); // searching by name
// OR
p = s.find("XXXXX"); // searching by ID
i don't want a code solution, i just want some help to get started like the structures i can use to achieve what i want, help is appreciated guys, it's finals week.
Put your records into a vector (or list). Add a pointer to the record objects to two maps, one with one key and one with the other.
There are many different ways how this could be achieved. The question is: what are the complexities of insert, delete and lookup operations that you aim for?
std::map is implemented as red-black tree that provides increadibly quick self-balancing (rotations) and all of mentioned operations (lookup/find, insert, delete) with complexity of O(log(n)). Note that this suits the idea of single key.
With 2 keys you can not keep elements sorted because the order based on one key will be most likely different than order based on the other one. The most straightforward and natural approach would be storing records in one container and holding the keys used by this container in 2 different structures, one optimized for retrieving this key given id and the other one for retrieving it given name.
If there is a constraint of storing everything at one place while you'd like to optimize find operation that will support two different keys, then you could create a wrapper of std::map<std::string, Person> where each element would be contained twice (each time under a different key), i.e. something like:
std::map<std::string, Person> myContainer;
...
Person p;
std::string id = "1E57A";
std::string name = "David";
myContainer[id] = p;
myContainer[name] = p;
I can think of 2 advantages of doing this:
quite satisfying performance:
lookup with complexity O(log(2*n))
insertion & deletion with complexity O(2*log(2*n))
extremely simple implementation (using existing container)
you just need to remember than the "expected" size of the container is half of its actual size
both of the keys: id and name should be attributes of Person so that when you find a concrete element given one of these keys, you immediately have the other one too
Disadvantage is that it will consume 2x so much memory and there might even be a constraint that:
none of the names should be an id of some other person at the same time and vice versa (no id should be a name of some other person)

C++: Implementing Row Iterator for Table

I have a generic table class implemented in C++ that uses a shared_ptr< ptr_vector< vector<T> > > as its backing, where T is an arbitrary typename; the ptr_vector contains pointers to the vectors corresponding to the columns in the table. I decided to wrap the ptr_vector in a shared_ptr since the tables may contain many millions of rows, and the vectors containing data for each column in a ptr_vector for the same reason. (Please tell me if this can be improved.)
Implementing column-wise operations on this table is trivial, since I have access to the native iterator supplied by the vector. However, I also need the table to support row-wise operations: relatively mundane operations such as adding and removing rows should be supported, as well as the ability to use the STL algorithms with the table. Now, I have run across some design issues that I need some help to address:
It seems that implementing a custom iterator to conduct row-wise operations is necessary to accomplish what is describe above. Would boost::iterator_adaptor be the right way to go about doing this?
When the user adds rows to the table, I do not wish to impose a specific data structure upon the user -- how would I go about doing this? I am thinking of accepting iterators as parameters to the add_row() method.
If you think that I should be implementing this table structure differently, I am open to any suggestions that you may have for me. It was originally designed with the intent to store strings read from tab-delimited files containing hundreds of thousands of row entries.
Thank you very much for your help!
The Boost library has a container called multi_array which provides and n-dimensional dynamic array which can be accessed and manipulated along each dimension. This seems to be very similar to what you are trying to build, perhaps you could use it instead?

Finding top elements of a map using Heap

I have a unordered_hashmap that maps a string (say personName or SSN) to a struct Attributes that has many attributes of that person including annualIncome. There are many such hash maps corresponding to different organizations such as mapOrganizationA, mapOrganizationB etc. I need to find the people (with attributes) with the top-k annual incomes. I was thinking of using a min-heap with k-nodes (with the minimum salary as root), so that I can scan the maps one by one, of the current element has income more than the root of the min-heap, the root can be updated. Is this the right approach to get top-k from different maps? Is there a min-heap datastucture in STL I can make use of.
You can use make_heap, push_heap, pop_heap, sort_heap, is_heap to treat any non-associative container (or sequence, really) as a heap.
That would not fit you map nicely, but I assume nothing would prevent you from storing the values (or pointers/references to those) inside, say, a list for this purpose?
Also, perhaps look at Boost.MultiIndex which is a library precisely focused on providing multiple (efficient!) indexes on the same data

Fastest C++ Container: Unique Values

I am writing an email application that interfaces with a MySQL database. I have two tables that are sourcing my data, one of which contains unsubscriptions, the other of which is a standard user table. As of now, I'm creating a vector of pointers to email objects, and storing all of the unsubscribed emails in it, initially. I then have a standard SQL loop in which I'm checking to see if the email is not in the unsubscribe vector, then adding it to the global send email vector. My question, is, is there a more efficient way of doing this? I have to search the unsub vector for every single email in my system, up to 50K different ones. Is there a better structure for searching? And, a better structure for maintaining a unique collection of values? Perhaps one that would simply discard the value if it already contains it?
If your C++ Standard Library implementation supports it, consider using a std::unordered_set or a std::hash_set.
You can also use std::set, though its overhead might be higher (it depends on the cost of generating a hash for the object versus the cost of comparing two of the objects several times).
If you do use a node based container like set or unordered_set, you also get the advantage that removal of elements is relatively cheap compared to removal from a vector.
Tasks like this (set manipulations) are better left to what is MEANT to execute them - the database!
E.g. something along the lines of:
SELECT email FROM all_emails_table e WHERE NOT EXISTS (
SELECT 1 FROM unsubscribed u where e.email=u.email
)
If you want an ALGORITHM, you can do this fast by retrieving both the list of emails AND a list of unsubscriptions as ORDERED lists. Then you can go through the e-mail list (which is ordered), and as you do it you glide along the unsubscribe list. The idea is that you move 1 forward in whichever list has the "biggest" current" element. This algo is O(M+N) instead of O(M*N) like your current one
Or, you can do a hash map which maps from unsubscribed e-mail address to 1. Then you do find() calls on that map whcih for correct hash implementations are O(1) for each lookup.
Unfortunately, there's no Hash Map standard in C++ - please see this SO question for existing implementations (couple of ideas there are SGI's STL hash_map and Boost and/or TR1 std::tr1::unordered_map).
One of the comments on that post indicates it will be added to the standard: "With this in mind, the C++ Standard Library Technical Report introduced the unordered associative containers, which are implemented using hash tables, and they have now been added to the Working Draft of the C++ Standard."
Store your email adresses in a std::set or use std::set_difference().
The best way to do this is within MySQL, I think. You can modify your users table schema with another column, a BIT column, for "is unsubscribed". Better yet: add a DATETIME column for "date deleted" with a default value of NULL.
If using a BIT column, your query becomes something like:
SELECT * FROM `users` WHERE `unsubscribed` <> 0b1;
If using a DATETIME column, your query becomes something like:
SELECT * FROM `users` WHERE `date_unsubscribed` IS NULL;

How to implement an associative array/map/hash table data structure (in general and in C++)

Well I'm making a small phone book application and I've decided that using maps would be the best data structure to use but I don't know where to start. (Gotta implement the data structure from scratch - school work)
Tries are quite efficient for implementing maps where the keys are short strings. The wikipedia article explains it pretty well.
To deal with duplicates, just make each node of the tree store a linked list of duplicate matches
Here's a basic structure for a trie
struct Trie {
struct Trie* letter;
struct List *matches;
};
malloc(26*sizeof(struct Trie)) for letter and you have an array. if you want to support punctuations, add them at the end of the letter array.
matches can be a linked list of matches, implemented however you like, I won't define struct List for you.
Simplest solution: use a vector which contains your address entries and loop over the vector to search.
A map is usually implemented either as a binary tree (look for red/black trees for balancing) or as a hash map. Both of them are not trivial: Trees have some overhead for organisation, memory management and balancing, hash maps need good hash functions, which are also not trivial. But both structures are fun and you'll get a lot of insight understanding by implementing one of them (or better, both :-)).
Also consider to keep the data in the vector list and let the map contain indices to the vector (or pointers to the entries): then you can easily have multiple indices, say one for the name and one for the phone number, so you can look up entries by both.
That said I just want to strongly recommend using the data structures provided by the standard library for real-world-tasks :-)
A simple approach to get you started would be to create a map class that uses two vectors - one for the key and one for the value. To add an item, you insert a key in one and a value in another. To find a value, you just loop over all the keys. Once you have this working, you can think about using a more complex data structure.