C++ double sorting data with multiple elements - c++

I have multiple data entries that contain the following information:
id_number
name1
date
name2
It is possible to put this into a struct like this:
struct entry {
int id_number;
string name1;
int date;
string name2;
};
In my data, I have many such entries and I would like to sort. First, I want to sort alphabetically based on name1, then sort by date. However, the sort by date is a subset of the alphabetical sort, e.g. if I have two entries with the same name1, I then want to order those entries by date. Furthermore, when I sort, I want the elements of the entry to remain together, so all four values go together.
My questions are the following:
1) What type of data structure should I use to hold this data so I can keep the set of four elements together when I sort by any one of them?
2) What is the quickest way to do this sorting (in terms of the amount of time to write the code)? Ideally, I want to use something like the sort in <algorithm>, since it is already built in.
3) Does STL have some built in data structure that can handle the double sorting I described efficiently?

The struct you have is fine, except that you may want to add an overload of operator< to do comparison. Here I'm doing the "compare by name, then date" comparison:
// Add this as a member function to `entry`.
bool operator<(entry const &other) const {
if (name1 < other.name1)
return true;
if (name1 > other.name1)
return false;
// otherwise name1 == other.name1
// so we now fall through to use the next comparator.
if (date < other.date)
return true;
return false;
}
[Edit: What's required is called a "strict weak ordering". If you want to get into detail about what that means, and what alternatives are possible, Dave Abrahams wrote quite a detailed post on C++ Next about it.
In the case above, we start by comparing the name1 fields of the two. If a<b, then we immediately return true. Otherwise, we check for a>b, and if so we return false. At that point, we've eliminated a<b and a>b, so we've determined that a==b, in which case we test the dates -- if a<b, we return true. Otherwise, we return false -- either the dates are equal, or b>a, either of which means the test for a<b is false. If the sort needs to sort out (no pun intended) which of those is the case, it can call the function again with the arguments swapped. The names will still be equal, so it'll still come down to the dates -- if we get false, the dates are equal. If we get true on the swapped dates, then what started as the second date is actually greater. ]
The operator< you define in the structure defines the order that will be used by default. When/if you want you can specify another order for the sorting to use:
struct byid {
    // A comparison functor is called through operator(), not operator<.
    bool operator()(entry const &a, entry const &b) const {
        return a.id_number < b.id_number;
    }
};
std::vector<entry> entries;
// sort by name, then date
std::sort(entries.begin(), entries.end());
// sort by ID
std::sort(entries.begin(), entries.end(), byid());
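The "compare by name, then date" logic described above can also be written more compactly with std::tie, which compares tuples of references lexicographically. The following is only a sketch: the helper names by_name_then_date and sorted_by_name_then_date are made up for illustration, and the struct simply repeats the one from the question.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

struct entry {
    int id_number;
    std::string name1;
    int date;
    std::string name2;
};

// Hypothetical helper: orders by name1 first, then by date.
// std::tie builds tuples of references; tuple's operator< is lexicographic.
inline bool by_name_then_date(entry const &a, entry const &b) {
    return std::tie(a.name1, a.date) < std::tie(b.name1, b.date);
}

// Hypothetical helper: sorts a copy so the caller's vector is left untouched.
inline std::vector<entry> sorted_by_name_then_date(std::vector<entry> entries) {
    std::sort(entries.begin(), entries.end(), by_name_then_date);
    return entries;
}
```

Because whole entry objects are moved around by std::sort, all four fields of an entry stay together.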

That data structure right there should work just fine. What you should do is overload the less-than operator; then you can just insert all the entries into a map and they will be kept sorted. Here is more info on the comparison operators for a map
Update: upon further reflection, I would use a set, and not a map, because there is no need for a value. But here is proof it still works
Proof this works:
#include<string>
#include<map>
#include<stdio.h>
#include <sstream>
using namespace std;
struct entry {
int m_id_number;
string m_name1;
int m_date;
string m_name2;
entry( int id_number, string name1, int date, string name2) :
m_id_number(id_number),
m_name1(name1),
m_date(date),
m_name2(name2)
{
}
// Add this as a member function to `entry`.
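bool operator<(entry const &other) const {
// Compare field by field and only fall through to the next field on a tie;
// otherwise the result is not a strict weak ordering.
if (m_name1 != other.m_name1)
return m_name1 < other.m_name1;
if (m_date != other.m_date)
return m_date < other.m_date;
return m_name2 < other.m_name2;
}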
string toString() const
{
string returnValue;
stringstream out;
string dateAsString;
out << m_date;
dateAsString = out.str();
returnValue = m_name1 + " " + m_name2 + " " + dateAsString;
return returnValue;
}
};
int main(int argc, char *argv[])
{
string names1[] = {"Dave", "John", "Mark", "Chris", "Todd"};
string names2[] = {"A", "B", "C", "D", "E", "F", "G"};
std::map<entry, int> mymap;
for(int x = 0; x < 100; ++x)
{
mymap.insert(pair<entry, int>(entry(0, names1[x%5], x, names2[x%7]), 0));
}
std::map<entry, int>::iterator it = mymap.begin();
for(; it != mymap.end() ;++it)
{
printf("%s\n ", it->first.toString().c_str());
}
return 0;
}

You can also use a function object to implement your sorting criterion.
Suppose that you would like to store the entries in a set:
//EntrySortCriteria.h
class EntrySortCriteria
{
public:
    bool operator()(const entry &e1, const entry &e2) const
    {
        // name1 decides first; date is consulted only when the names are equal
        return e1.name1 < e2.name1 ||
               (!(e2.name1 < e1.name1) && e1.date < e2.date);
    }
};
//main.cc
#include <iostream>
#include <set>
#include "EntrySortCriteria.h"
using namespace std;
int main(int argc, char **argv)
{
set<entry, EntrySortCriteria> entrySet;
//then you can put entries into this set,
//they will be sorted automatically according to your criteria
//syntax of set:
//entrySet.insert(newEntry);
//where newEntry is a object of your entry type
}
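One thing to keep in mind with the set approach: a std::set treats two entries as the same element when neither compares less than the other, so two entries with the same name1 and date would collapse into one. A std::multiset keeps both. The sketch below illustrates the difference; the comparator name ByNameThenDate and the two type aliases are assumptions made for this example.

```cpp
#include <set>
#include <string>
#include <tuple>

struct entry {
    int id_number;
    std::string name1;
    int date;
    std::string name2;
};

// Hypothetical comparator: orders by name1, then by date.
struct ByNameThenDate {
    bool operator()(entry const &a, entry const &b) const {
        return std::tie(a.name1, a.date) < std::tie(b.name1, b.date);
    }
};

// A set keeps at most one entry per (name1, date) combination;
// a multiset keeps every inserted entry, still in sorted order.
using UniqueEntries = std::set<entry, ByNameThenDate>;
using AllEntries = std::multiset<entry, ByNameThenDate>;
```

If entries that share a name and date must all survive, either use the multiset or extend the comparator with a field that is unique, such as id_number.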

Related

I need to create MultiMap using hash-table but I get time-limit exceeded error (C++)

I'm trying to solve algorithm task: I need to create MultiMap(key,(values)) using hash-table. I can't use Set and Map libraries. I send code to testing system, but I get time-limit exceeded error on test 20. I don't know what exactly this test contains. The code must do following tasks:
put x y - add pair (x,y).If pair exists, do nothing.
delete x y - delete pair(x,y). If pair doesn't exist, do nothing.
deleteall x - delete all pairs with first element x.
get x - print number of pairs with first element x and second elements.
The amount of operations <= 100000
Time limit - 2s
Example:
multimap.in:
put a a
put a b
put a c
get a
delete a b
get a
deleteall a
get a
multimap.out:
3 b c a
2 c a
0
#include <iostream>
#include <fstream>
#include <vector>
using namespace std;
inline long long h1(const string& key) {
long long number = 0;
const int p = 31;
int pow = 1;
for(auto& x : key){
number += (x - 'a' + 1 ) * pow;
pow *= p;
}
return abs(number) % 1000003;
}
inline void Put(vector<vector<pair<string,string>>>& Hash_table,const long long& hash, const string& key, const string& value) {
int checker = 0;
for(int i = 0; i < Hash_table[hash].size();i++) {
if(Hash_table[hash][i].first == key && Hash_table[hash][i].second == value) {
checker = 1;
break;
}
}
if(checker == 0){
pair <string,string> key_value = make_pair(key,value);
Hash_table[hash].push_back(key_value);
}
}
inline void Delete(vector<vector<pair<string,string>>>& Hash_table,const long long& hash, const string& key, const string& value) {
for(int i = 0; i < Hash_table[hash].size();i++) {
if(Hash_table[hash][i].first == key && Hash_table[hash][i].second == value) {
Hash_table[hash].erase(Hash_table[hash].begin() + i);
break;
}
}
}
inline void Delete_All(vector<vector<pair<string,string>>>& Hash_table,const long long& hash,const string& key) {
for(int i = Hash_table[hash].size() - 1;i >= 0;i--){
if(Hash_table[hash][i].first == key){
Hash_table[hash].erase(Hash_table[hash].begin() + i);
}
}
}
inline string Get(const vector<vector<pair<string,string>>>& Hash_table,const long long& hash, const string& key) {
string result="";
int counter = 0;
for(int i = 0; i < Hash_table[hash].size();i++){
if(Hash_table[hash][i].first == key){
counter++;
result += Hash_table[hash][i].second + " ";
}
}
if(counter != 0)
return to_string(counter) + " " + result + "\n";
else
return "0\n";
}
int main() {
vector<vector<pair<string,string>>> Hash_table;
Hash_table.resize(1000003);
ifstream input("multimap.in");
ofstream output("multimap.out");
string command;
string key;
int k = 0;
string value;
while(true) {
input >> command;
if(input.eof())
break;
if(command == "put") {
input >> key;
long long hash = h1(key);
input >> value;
Put(Hash_table,hash,key,value);
}
if(command == "delete") {
input >> key;
input >> value;
long long hash = h1(key);
Delete(Hash_table,hash,key,value);
}
if(command == "get") {
input >> key;
long long hash = h1(key);
output << Get(Hash_table,hash,key);
}
if(command == "deleteall"){
input >> key;
long long hash = h1(key);
Delete_All(Hash_table,hash,key);
}
}
}
How can I make my code run faster?
First, a matter of design: normally, one would pass only the key to the function and calculate the hash inside it. Your variant allows a caller to place elements anywhere within the hash table (by passing bad hash values), so a caller could easily break it.
So, e.g. for put:
using HashTable = std::vector<std::vector<std::pair<std::string, std::string>>>;
void put(HashTable& table, std::string const& key, std::string const& value)
{
auto hash = h1(key);
// ...
}
If at all, the hash function could be parametrised, but then you'd write a separate class wrapping the vector of vectors and provide the hash function in its constructor, so that a user cannot exchange it arbitrarily (and again break the hash table). A class would come with additional benefits, most importantly better encapsulation (hiding the vector away, so a user cannot change it through the vector's own interface):
class HashTable
{
public:
// IF you want to provide hash function:
template <typename Hash>
HashTable(Hash hash) : hash(hash) { }
void put(std::string const& key, std::string const& value);
void remove(std::string const& key, std::string const& value); //(delete is keyword!)
// ...
private:
std::vector<std::vector<std::pair<std::string, std::string>>> data;
// if hash function parametrized:
std::function<size_t(std::string)> hash; // #include <functional> for std::function
};
I'm not 100% sure how efficient std::function really is, so for high-performance code you would preferably use your hash function h1 directly (not implementing the constructor as illustrated above).
Coming to optimisations:
For the hash key I would prefer unsigned value: Negative indices are meaningless anyway, so why allow them at all? long long (signed or unsigned) might be a bad choice if testing system is a 32 bit system (might be unlikely, but still...). size_t covers both issues at once: it is unsigned and it is selected in size appropriately for given system (if interested in details: actually adjusted to address bus size, but on modern systems, this is equal to register size as well, which is what we need). Select type of pow to be the same.
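To make the size_t suggestion concrete, here is a minimal sketch of the same polynomial hash using unsigned arithmetic throughout. The function name bucket_index and the bucket_count parameter are assumptions for this example; they are not part of the original code. With unsigned types, overflow wraps around in a well-defined way, so neither abs nor a signed pow is needed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical replacement for h1: a polynomial hash over the key's bytes,
// reduced to a bucket index. Unsigned arithmetic wraps on overflow instead
// of invoking undefined behaviour.
inline std::size_t bucket_index(const std::string& key, std::size_t bucket_count) {
    std::size_t number = 0;
    std::size_t pow = 1;
    const std::size_t p = 31;
    for (unsigned char c : key) {
        number += (static_cast<std::size_t>(c) - 'a' + 1) * pow;
        pow *= p;
    }
    return number % bucket_count;  // always in [0, bucket_count)
}
```

Passing the bucket count as a parameter also makes it easy to experiment with different table sizes.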
deleteAll is implemented inefficiently: With each element you erase you move all the subsequent elements one position towards front. If you delete multiple elements, you do this repeatedly, so one single element can get moved multiple times. Better:
auto pos = vector.begin();
for(auto& pair : vector)
{
if(pair.first != keyToDelete)
*pos++ = std::move(pair); // move semantics: faster than copying!
}
vector.erase(pos, vector.end());
This will move each element at most once, erasing all surplus elements in one single go. Apart from the final erase (which you then have to do explicitly), this is more or less what std::remove and std::remove_if from the algorithm library do as well. Are you allowed to use them? Then your code might look like this:
auto condition = [&keyToDelete](std::pair<std::string, std::string> const& p)
{ return p.first == keyToDelete; };
vector.erase(std::remove_if(vector.begin(), vector.end(), condition), vector.end());
and you profit from already highly optimised algorithm.
Just a minor performance gain, but still: You can spare variable initialisation, assignment and conditional branch (the latter one can be relatively expensive operation on some systems) within put if you simply return if an element is found:
//int checker = 0;
for(auto& pair : hashTable[hash]) // just a little more comfortable to write...
{
if(pair.first == key && pair.second == value)
return;
}
auto key_value = std::make_pair(key, value);
hashTable[hash].push_back(key_value);
Again, with algorithm library:
auto key_value = std::make_pair(key, value);
// std::pair's operator== compares both key and value, like the loop above
if(std::find(vector.begin(), vector.end(), key_value) == vector.end())
{
    vector.push_back(key_value);
}
Also, at most 100000 operations does not mean that each operation introduces a separate key/value pair. We can expect keys to be added, removed, re-added, ..., so you most likely don't have to cope with 100000 different values. I'd assume your table is much too large (be aware that it requires the initialisation of 1000003 vectors as well) and that a much smaller one would already suffice (possibly 1009 or 10007? You might have to experiment a little...).
Keeping the inner vectors sorted might give you some performance boost as well:
put: You could use a binary search to find the two elements in between a new one is to be inserted (if one of these two is equal to given one, no insertion, of course)
delete: Use binary search to find the element to delete.
deleteAll: Find upper and lower bounds for elements to be deleted and erase whole range at once.
get: find lower and upper bound as for deleteAll, distance in between (number of elements) is a simple subtraction and you could print out the texts directly (instead of first building a long string). Which of outputting directly or creating a string really is more efficient is to be found out, though, as outputting directly involves multiple system calls, which in the end might cost previously gained performance again...
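The four operations on a sorted inner vector could be sketched roughly as follows. This assumes each bucket is kept sorted by (key, value); the names Bucket, KeyLess, put_sorted, remove_sorted, remove_all_sorted and count_sorted are invented for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;
using Bucket = std::vector<Pair>;  // invariant: sorted by (key, value)

// Compares a stored pair with a bare key, looking at the key only.
struct KeyLess {
    bool operator()(const Pair& p, const std::string& k) const { return p.first < k; }
    bool operator()(const std::string& k, const Pair& p) const { return k < p.first; }
};

// put: binary search for the insertion point; skip pairs already present.
inline void put_sorted(Bucket& b, const std::string& key, const std::string& value) {
    Pair kv(key, value);
    auto pos = std::lower_bound(b.begin(), b.end(), kv);
    if (pos == b.end() || *pos != kv) b.insert(pos, std::move(kv));
}

// delete: binary search for the exact pair.
inline void remove_sorted(Bucket& b, const std::string& key, const std::string& value) {
    Pair kv(key, value);
    auto pos = std::lower_bound(b.begin(), b.end(), kv);
    if (pos != b.end() && *pos == kv) b.erase(pos);
}

// deleteall: pairs sharing a key are adjacent, so erase the whole range at once.
inline void remove_all_sorted(Bucket& b, const std::string& key) {
    auto range = std::equal_range(b.begin(), b.end(), key, KeyLess{});
    b.erase(range.first, range.second);
}

// get: the number of pairs is the distance between the two bounds.
inline std::size_t count_sorted(const Bucket& b, const std::string& key) {
    auto range = std::equal_range(b.begin(), b.end(), key, KeyLess{});
    return static_cast<std::size_t>(range.second - range.first);
}
```

Insertion into the middle of a vector still shifts elements, so whether this beats the unsorted version depends on bucket sizes and is worth measuring.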
Considering your input loop:
Checking only for eof() is dangerous! If there is an error in the file, you'll end up in an endless loop: once the fail bit is set, operator>> won't read anything any more and you will never reach the end of the file. This might even be the reason for your 20th test failing.
Additionally: you have line-based input (each command on a separate line), so reading a whole line at once and parsing it afterwards will save you some system calls. If an argument is missing, you will detect it correctly instead of (illegally) reading the next command (e.g. put) as the argument; similarly, you won't interpret a surplus argument as the next command. If a line is invalid for whatever reason (wrong number of arguments as above, or an unknown command), you can then decide individually what you want to do (just ignore the line or abort processing entirely). So:
std::string line;
while(std::getline(std::cin, line))
{
// parse the string; if line is invalid, appropriate error handling
// (ignoring the line, exiting from loop, ...)
}
if(!std::cin.eof())
{
// some error occurred, print error message!
}
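The "parse the string" step inside that loop might look like the sketch below. The Command struct and the parse_command function are hypothetical names introduced here; the set of accepted commands is taken from the task description.

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for one parsed input line.
struct Command {
    std::string name;
    std::string key;
    std::string value;  // stays empty for get / deleteall
};

// Returns true and fills `out` only if the line is a well-formed command;
// on failure `out` is left unchanged.
inline bool parse_command(const std::string& line, Command& out) {
    std::istringstream in(line);
    Command c;
    if (!(in >> c.name >> c.key)) return false;  // command or key missing
    const bool needs_value = (c.name == "put" || c.name == "delete");
    const bool known = needs_value || c.name == "get" || c.name == "deleteall";
    if (!known) return false;                            // unknown command
    if (needs_value && !(in >> c.value)) return false;   // value missing
    std::string extra;
    if (in >> extra) return false;                       // surplus argument
    out = c;
    return true;
}
```

The caller can then decide whether an invalid line is skipped or ends processing.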

fast way to compare two vector containing strings

I have a vector of strings that I pass to my function and I need to compare it with some pre-defined values. What is the fastest way to do this?
The following code snippet shows what I need to do (This is how I am doing it, but what is the fastest way of doing this):
bool compare(vector<string> input1,vector<string> input2)
{
if(input1.size() != input2.size())
{
return false;
}
for(size_t i=0;i<input1.size();i++)
{
if(input1[i] != input2[i])
{
return false;
}
}
return true;
}
int compare(vector<string> inputData)
{
if (compare(inputData,{"Apple","Orange","three"}))
{
return 129;
}
if (compare(inputData,{"A","B","CCC"}))
{
return 189;
}
if (compare(inputData,{"s","O","quick"}))
{
return 126;
}
if (compare(inputData,{"Apple","O123","three","four","five","six"}))
{
return 876;
}
if (compare(inputData,{"Apple","iuyt","asde","qwe","asdr"}))
{
return 234;
}
return 0;
}
Edit1
Can I compare two vector like this:
if(inputData=={"Apple","Orange","three"})
{
return 129;
}
You are asking what is the fastest way to do this, and you are indicating that you are comparing against a set of fixed and known strings. I would argue that you would probably have to implement it as a kind of state machine. Not that this is very beautiful...
if (inputData.size() != 3) return 0;
if (inputData[0].size() == 0) return 0;
const char inputData_0_0 = inputData[0][0];
if (inputData_0_0 == 'A') {
// possibly "Apple" or "A"
...
} else if (inputData_0_0 == 's') {
// possibly "s"
...
} else {
return 0;
}
The weakness of your approach is its linearity. You want a binary search for teh speedz.
By utilising the sortedness of a map, the binaryness of finding in one, and the fact that equivalence between vectors is already defined for you (no need for that first compare function!), you can do this quite easily:
std::map<std::vector<std::string>, int> lookup{
{{"Apple","Orange","three"}, 129},
{{"A","B","CCC"}, 189},
// ...
};
int compare(const std::vector<std::string>& inputData)
{
auto it = lookup.find(inputData);
if (it != lookup.end())
return it->second;
else
return 0;
}
Note also the reference passing for extra teh speedz.
(I haven't tested this for exact syntax-correctness, but you get the idea.)
However! As always, we need to be context-aware in our designs. This sort of approach is more useful at larger scale. At the moment you only have a few options, so the addition of some dynamic allocation and sorting and all that jazz may actually slow things down. Ultimately, you will want to take my solution, and your solution, and measure the results for typical inputs and whatnot.
Once you've done that, if you still need more speed for some reason, consider looking at ways to reduce the dynamic allocations inherent in both the vectors and the strings themselves.
To answer your follow-up question: almost; you do need to specify the type:
// new code is here
// ||||||||||||||||||||||||
if (inputData == std::vector<std::string>{"Apple","Orange","three"})
{
return 129;
}
As explored above, though, let std::map::find do this for you instead. It's better at it.
One key to efficiency is eliminating needless allocation.
Thus, it becomes:
bool compare(
std::vector<std::string> const& a,
std::initializer_list<const char*> b
) noexcept {
return std::equal(begin(a), end(a), begin(b), end(b));
}
Alternatively, make them static const, and accept the slight overhead.
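As an aside, C++17 std::string_view (or the Boost equivalent) together with C++20 std::span (or the span from the Guidelines Support Library (GSL)) also allows a nicer alternative:
bool compare(std::span<const std::string> a, std::span<const std::string_view> b) noexcept {
    // std::span itself has no operator==, so compare element by element
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}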
The other is minimizing the number of comparisons. You can either use hashing, binary search, or manual ordering of comparisons.
Unfortunately, transparent comparators are a C++14 thing, so you cannot use std::map.
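For the hashing option mentioned above, one possible sketch uses std::unordered_map. The standard library does not provide std::hash for std::vector<std::string>, so the hasher below (VectorHash, a made-up name with a simple and unoptimised combining scheme) is an assumption of this example rather than anything from the answers.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical hasher: combines the hashes of the individual strings.
struct VectorHash {
    std::size_t operator()(const std::vector<std::string>& v) const {
        std::size_t h = v.size();
        for (const auto& s : v)
            h = h * 31 + std::hash<std::string>{}(s);
        return h;
    }
};

inline int compare(const std::vector<std::string>& inputData) {
    // Built once on first call; lookups are then O(1) on average
    // plus the cost of hashing the input.
    static const std::unordered_map<std::vector<std::string>, int, VectorHash> lookup{
        {{"Apple", "Orange", "three"}, 129},
        {{"A", "B", "CCC"}, 189},
        {{"s", "O", "quick"}, 126},
    };
    auto it = lookup.find(inputData);
    return it != lookup.end() ? it->second : 0;
}
```

With only a handful of candidates, hashing every string of the input may well cost more than a few direct comparisons, so this too should be measured against typical inputs.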
If you want a fast way to do it where the vectors to compare to are not known in advance, but are reused so can have a little initial run-time overhead, you can build a tree structure similar to the compile time version Dirk Herrmann has. This will run in O(n) by just iterating over the input and following a tree.
In the simplest case, you might build a tree for each letter/element. A partial implementation could be:
typedef std::vector<std::string> Vector;
typedef Vector::const_iterator Iterator;
typedef std::string::const_iterator StrIterator;
struct Node
{
std::unique_ptr<Node> children[256];
std::unique_ptr<Node> new_str_child;
int result;
bool is_result;
};
Node root;
int compare(Iterator vec_it, Iterator vec_end, StrIterator str_it, StrIterator str_end, const Node *node);
int compare(const Vector &input)
{
return compare(input.begin(), input.end(), input.front().begin(), input.front().end(), &root);
}
int compare(Iterator vec_it, Iterator vec_end, StrIterator str_it, StrIterator str_end, const Node *node)
{
if (str_it != str_end)
{
// Check next character
auto next_child = node->children[(unsigned char)*str_it].get();
if (next_child)
return compare(vec_it, vec_end, str_it + 1, str_end, next_child);
else return -1; // No string matched
}
// At end of input string
++vec_it;
if (vec_it != vec_end)
{
auto next_child = node->new_str_child.get();
if (next_child)
return compare(vec_it, vec_end, vec_it->begin(), vec_it->end(), next_child);
else return -1; // Have another string, but not in tree
}
// At end of input vector
if (node->is_result)
return node->result; // Got a match
else return -1; // Run out of input, but all possible matches were longer
}
Which can also be done without recursion. For use cases like yours you will find most nodes only have a single success value, so you can collapse those into prefix substrings, to use the OP example:
"A"
|-"pple" - new vector - "O" - "range" - new vector - "three" - ret 129
| |- "i" - "uyt" - new vector - "asde" ... - ret 234
| |- "0" - "123" - new vector - "three" ... - ret 876
|- new vector "B" - new vector - "CCC" - ret 189
"s" - new vector "O" - new vector "quick" - ret 126
You could make use of the std::equal function like below:
bool compare(const vector<string>& input1, const vector<string>& input2)
{
    if(input1.size() != input2.size())
    {
        return false;
    }
    return std::equal(input1.begin(), input1.end(), input2.begin());
}
Can I compare two vector like this
The answer is no; you need to compare a vector with another vector, like this:
vector<string>data = {"ab", "cd", "ef"};
if(data == vector<string>{"ab", "cd", "efg"})
cout << "Equal" << endl;
else
cout << "Not Equal" << endl;
What is the fastest way to do this?
I'm not an expert in asymptotic analysis, but:
Using the equality operator (==) gives you a shortcut for comparing two vectors: it first checks the sizes and then compares the elements one by one. This gives linear execution time (T(n), where n is the size of the vector), but each element is a string, and comparing two strings is generally linear as well (T(m), where m is the length of the string).
Suppose that every string has the same length (m) and the vector has size n; then each vector comparison can take T(nm).
So:
If you want a shortcut for comparing two vectors, you can use the equality operator.
If you want a program that performs a fast comparison, you should look for an algorithm for comparing strings.

Rearrange list the same way as another one

I bumped into a page where there were a lot of categories and next to each one the number of items in each category, wrapped in parenthesis. Something really common. It looked like this:
Category 1 (2496)
Category 2 (34534)
Category 3 (1039)
Category 4 (9)
...
So I was curious and I wanted to see which categories had more items and such, and since all categories were all together in the page I could just select them all and copy them in a text file, making things really easy.
I made a little program that reads all the numbers, store them in a list and sort them. In order to know what category the number it belonged to I would just Ctrl + F the number in the browser.
But I thought it would be nice to have the name of the category next to the number in my text file, and I managed to parse them in another file. However, they are not ordered, obviously.
This is what I could do so far:
bool is_number(const string& s) {
return !s.empty() && find_if(s.begin(), s.end(), [](char c) { return !isdigit(c); }) == s.end();
}
int main() {
ifstream file;
ofstream file_os, file_t_os;
string word, text; // word is the item count and text the category name
list<int> words_list; // list of item counts
list<string> text_list; // list of category names
file.open("a.txt");
file_os.open("a_os.txt");
file_t_os.open("a_t_os.txt");
while (file >> word) {
if (word.front() == '(' && word.back() == ')') { // check if it's being read something wrapped in parenthesis
string old_word = word;
word.erase(word.begin());
word.erase(word.end()-1);
if (is_number(word)) { // check if it's a number (item count)
words_list.push_back(atoi(word.c_str()));
text.pop_back(); // get rid of an extra space in the category name
text_list.push_back(text);
text.clear();
} else { // it's part of the category name
text.append(old_word);
text.append(" ");
}
} else {
text.append(word);
text.append(" ");
}
}
words_list.sort();
for (list<string>::iterator it = text_list.begin(); it != text_list.end(); ++it) {
file_t_os << *it << endl;
}
for (list<int>::iterator it = words_list.begin(); it != words_list.end(); ++it) {
file_os << fixed << *it << endl;
}
cout << text_list.size() << endl << words_list.size() << endl; // I'm getting the same count
}
Now forget about having the name next to the number, because something more interesting occurred to me. I thought it would be interesting to find a way to rearrange the strings in text_list, which contains the names of the categories, in exactly the same way the list with the item counts was sorted.
Let me explain with an example, lets say we have the following categories:
A (5)
B (3)
C (10)
D (6)
The way I'm doing it I will have a list<int> containing this: {10, 6, 5, 3} and a list<string> containing this: {A, B, C, D}.
What I'm saying is I want to find a way I can keep track of the way the elements were rearranged in the first list and apply that very pattern to the second list. What would be the rearrange pattern? It would be: the first item (5) goes to the third position, the second one (3) to the fourth one, the third one (10) to the first one, and so on.... Then this pattern should be applied to the other list, so that it would end up like this: {C, D, A, B}.
The thing would be to keep track of the Pattern and apply it to the list below.
Is there any way I can do this? Any particular function that could help me? Any way to track all the swaps and switches the sort algorithm does so it can be applied to a different list with the same size? What about a different sorting algorithm?
I know this might be highly inefficient and a bad idea, but it seemed like a little challenge.
I also know I could just pair both string and int, category and item count, in some sort of container like pair or map or make a container class of my own and sort the items based on the item count (I guess map would be the best choice, what do you think?), but this is not what I am asking.
The best way to do this would be to create a list that contains both sets of information you want to sort and feed in a custom sorting function.
For instance:
struct Record {
string name;
int count;
};
list<Record> myList;
// std::sort needs random-access iterators, so use std::list's member sort
myList.sort([](const Record& a, const Record& b){
return a.count < b.count;
});
In the general case, it's always better to manage one list of a complex datatype, than to try to separately manage two or more lists of simple datatypes, especially when they're mutable.
A somewhat improved approach.
First some notes:
It's recommended to store the category name and the item count together, for clarity, readability of the code, etc.
It's better to use std::vector instead of std::list (see Bjarne Stroustrup's opinion).
The code loads the file in the format specified in your question and stores each name/count pair in the vector.
It uses std::sort to sort by item count only, so categories with the same count end up in an unspecified order. If you would like to order those by category name, change the lambda body to return std::tie(left.items, left.name) > std::tie(right.items, right.name); (this needs #include <tuple>).
I added a version with the info split: one collection holds the item counts plus an index (to correlate each count with its name), and the other holds the names.
Code:
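#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
bool is_number(const std::string& s) {
// take the character as unsigned char: std::isdigit is undefined for negative values
return !s.empty() &&
std::find_if(s.begin(), s.end(), [](unsigned char c) { return !std::isdigit(c); }) ==
s.end();
}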
struct category_info {
std::string name;
int items;
};
struct category_items_info {
int items;
size_t index;
};
int main() {
std::ifstream file("H:\\save.txt");
std::vector<category_info> categories;
std::vector<category_items_info> categories_items;
std::vector<std::string> categories_names;
std::string word;
std::string text;
while (file >> word) {
if (word.front() == '(' && word.back() == ')') {
std::string inner_word = word.substr(1, word.size() - 2);
if (is_number(inner_word)) {
std::string name = text.substr(0, text.size() - 1);
int items = atoi(inner_word.c_str());
categories.push_back(category_info{name, items});
categories_names.push_back(name);
categories_items.push_back(
category_items_info{items, categories_items.size()});
text.clear();
} else { // it's part of the category name
text.append(word);
text.append(" ");
}
} else {
text.append(word);
text.append(" ");
}
}
std::sort(categories.begin(), categories.end(),
[](const category_info& left, const category_info& right) {
return left.items > right.items;
});
std::sort(
categories_items.begin(), categories_items.end(),
[](const category_items_info& left, const category_items_info& right) {
return left.items > right.items;
});
std::cout << "Using the same storage." << std::endl;
for (auto c : categories) {
std::cout << c.name << " (" << c.items << ")" << std::endl;
}
std::cout << std::endl;
std::cout << "Using separated storage." << std::endl;
for (auto c : categories_items) {
std::cout << categories_names[c.index] << " (" << c.items << ")"
<< std::endl;
}
}
Output obtained:
Using the same storage.
Category 2 (34534)
Category 1 (2496)
Category 3 (1039)
Category 4 (9)
Using separated storage.
Category 2 (34534)
Category 1 (2496)
Category 3 (1039)
Category 4 (9)
Lists do not support random access iterators, so this is going to be a problem, since a list can't be permuted based on a vector (or array) of indices without doing a lot of list traversal back and forth to emulate random access iteration. NetVipeC's solution was to use vectors instead of lists to get around this problem. If using vectors, then you could generate a vector (or array) of indices to the vector to be sorted, then sort the vector indices using a custom compare operator. You could then copy the vectors according to the vector of sorted indices. It's also possible to reorder a vector in place according to the indices, but that algorithm also sorts the vector of indices, so you're stuck making a copy of the sorted indices (to sort the second vector), or copying each vector in sorted index order.
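The index-vector idea described above, which is also the "track the pattern and apply it to the other list" behaviour the question asks for, can be sketched with vectors as follows. The function names sort_permutation and apply_permutation are made up for this example, and the copying variant is shown rather than the in-place one.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Returns the permutation that sorts `counts` in descending order:
// result[i] is the original position of the element that ends up at i.
inline std::vector<std::size_t> sort_permutation(const std::vector<int>& counts) {
    std::vector<std::size_t> idx(counts.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, 2, ...
    // Sort the indices, comparing the values they refer to;
    // stable_sort keeps equal counts in their original relative order.
    std::stable_sort(idx.begin(), idx.end(),
                     [&counts](std::size_t a, std::size_t b) { return counts[a] > counts[b]; });
    return idx;
}

// Builds a reordered copy of `v` according to the permutation.
template <typename T>
std::vector<T> apply_permutation(const std::vector<T>& v, const std::vector<std::size_t>& idx) {
    std::vector<T> out;
    out.reserve(v.size());
    for (std::size_t i : idx) out.push_back(v[i]);
    return out;
}
```

For the example in the question, counts {5, 3, 10, 6} give the permutation {2, 3, 0, 1}, and applying it to {A, B, C, D} yields {C, D, A, B}.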
If you really want to use lists, you could implement your own std::list::sort, that would perform the same operations on both lists. The Microsoft version of std::list::sort uses an array of lists where the number of nodes in array[i] = 2^i, and it merges nodes one at a time into the array, then when all nodes are processed, it merges the lists in the array to produce a sorted list. You'd need two arrays, one for each list to be sorted. I can post example C code for this type of list sort if you want.

To find duplicate entry in c++ using 2D Vector (std::vector)

I wrote a program to find duplicate entries in a table. I am a beginner in C++, so I don't know how efficient this program is. Is there a better way to write it? I have 3 tables (2D vectors): 1) aRecord_arr, 2) mainTable and 3) idxTable. idxTable is used to identify the key columns to check for duplicate entries. The records in aRecord_arr are to be added to mainTable. If a record already exists in mainTable, the program shows the error "Duplicate Entry". Please check this program and give your suggestions.
typedef vector<string> rec_t;
typedef vector<rec_t> tab_t;
typedef vector<int> cn_t;
int main()
{
tab_t aRecord_arr= { {"a","apple","fruit"},
{"b","banana","fruit"} };
tab_t mainTable = { {"o","orange","fruit"},
{"p","pineapple","fruit"},
{"b","banana","fruit"},
{"m","melon","fruit"},
{"a","apple","fruit"},
{"g","guava","fruit"} };
tab_t idxTable = { {"code","k"},
{"name","k"},
{"category","n"}};
size_t Num_aRecords = aRecord_arr.size();
int idxSize = idxTable.size();
int mainSize = mainTable.size();
rec_t r1;
rec_t r2;
tab_t t1,t2;
cn_t idx;
for(int i=0;i<idxSize;i++)
{
if(idxTable[i][1]=="k")
{
idx.push_back(i);
}
}
for(size_t j=0;j<Num_aRecords;j++)
{
for(unsigned int id=0;id<idx.size();id++)
{
r1.push_back(aRecord_arr[j][idx[id]]);
}
t1.push_back(std::move(r1));
}
for(int j=0;j<mainSize;j++)
{
for(unsigned int id=0;id<idx.size();id++)
{
r2.push_back(mainTable[j][idx[id]]);
}
t2.push_back(std::move(r2));
}
for(size_t i=0;i<t1.size();i++)
{
for(size_t j=0;j<t2.size();j++)
{
if(t1[i]==t2[j])
{
cout<<"Duplicate Entry"<<endl;
exit(0);
}
}
}
}
If you want to avoid duplicate entries in an array, you should consider using a std::set instead.
What you want is probably a std::map or a std::set
Don't reinvent the wheel, the STL is full of goodies.
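As a rough sketch of the std::set suggestion: project each record onto its key columns and keep the projected keys in a set, so each check is a tree lookup rather than a scan of the whole table. The helper names `key_of` and `has_duplicate` are made up for this example, and the key columns are passed in directly instead of being read from idxTable.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> rec_t;
typedef std::vector<rec_t> tab_t;

// Project a record onto its key columns (hypothetical helper).
rec_t key_of(const rec_t& rec, const std::vector<std::size_t>& key_cols) {
    rec_t key;
    for (std::size_t c : key_cols) {
        assert(c < rec.size());   // every record must contain the key columns
        key.push_back(rec[c]);
    }
    return key;
}

// Return true if any record in `incoming` has the same key as a record in `table`.
bool has_duplicate(const tab_t& table, const tab_t& incoming,
                   const std::vector<std::size_t>& key_cols) {
    std::set<rec_t> seen;
    for (const rec_t& rec : table)
        seen.insert(key_of(rec, key_cols));
    for (const rec_t& rec : incoming)
        if (seen.count(key_of(rec, key_cols)) != 0)
            return true;
    return false;
}
```

With the tables from the question and key columns 0 and 1, `has_duplicate(mainTable, aRecord_arr, keys)` would report a duplicate, since "a"/"apple" and "b"/"banana" are already in mainTable.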
You seem to be rooted in a weakly typed language - but C++ is strongly typed.
You will 'pay' the disadvantage of strong typing almost no matter what you do, but you almost painstakingly avoid the advantage.
Let me start with the field that always says 'fruit' - my suggestion is to make this an enum, like:
enum PlantType { fruit, veggie };
Second, you have a vector that always contains 3 strings, each with the same meaning in every record. This seems to be a job for a struct, like:
struct Post {
PlantType kind;
char firstchar;
string name;
// possibly other characteristics
};
The 'firstchar' is probably premature optimization, but let's keep it for now.
Now you want to add a new Post, to an existing vector of Posts, like:
vector<Post> mainDB;
bool AddOne( const Post& p )
{
for( auto& pp : mainDB )
if( pp.name == p.name )
return false;
mainDB.push_back(p);
return true;
}
Now you can use it like:
if( ! AddOne( Post{ fruit, 'b', "banana" } ) )
cerr << "duplicate entry";
If you need speed (at the cost of memory), switch your mainDB to a map, like:
map<string,Post> mainDB;
bool AddOne( const Post& p )
{
if( mainDB.find(p.name) != mainDB.end() )
return false;
mainDB[p.name]=p;
return true;
}
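This also makes it easier (and faster) to find and use a specific post, like
cout << "the fruit is called " << mainDB["banana"].name ;
Beware that operator[] does not fail when the post doesn't exist: it silently inserts a default-constructed Post under that key. Use find() or at() when the key may be missing.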
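For lookups that must not touch the map, std::map::find is the usual tool, because operator[] default-constructs and inserts a value when the key is missing. A small sketch, with `find_post` and `require_post` as made-up helper names:

```cpp
#include <cassert>
#include <map>
#include <string>

enum PlantType { fruit, veggie };

struct Post {
    PlantType kind;
    char firstchar;
    std::string name;
};

// Look up a post without modifying the map; returns nullptr when the name is absent.
const Post* find_post(const std::map<std::string, Post>& db, const std::string& name) {
    std::map<std::string, Post>::const_iterator it = db.find(name);
    return it == db.end() ? nullptr : &it->second;
}

// Variant for callers that treat a missing post as a programming error.
const Post& require_post(const std::map<std::string, Post>& db, const std::string& name) {
    const Post* p = find_post(db, name);
    assert(p != nullptr);
    return *p;
}
```

Both helpers take the map by const reference, so the compiler rejects any accidental use of operator[] inside them.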
As you can see, firstchar was never used, and could be omitted. std::map
keeps its keys in a balanced tree, so a lookup costs O(log n) string comparisons
instead of the linear scan in the vector version. If you want hashed lookup on
string keys, std::unordered_map (C++11) provides it.
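If average constant-time lookup matters, std::unordered_map is the standard hashed container. The sketch below also uses the return value of insert to detect a duplicate in a single lookup, instead of find followed by operator[]. The function name `add_one` and the trimmed-down Post are assumptions for the example, as is the rule that an empty name is a programming error.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

enum PlantType { fruit, veggie };

struct Post {
    PlantType kind;
    std::string name;
};

// Insert p unless a post with the same name already exists.
// insert() returns a pair whose .second is false when the key was already present.
bool add_one(std::unordered_map<std::string, Post>& db, const Post& p) {
    assert(!p.name.empty());   // assumption: the name is the key, so it must not be empty
    return db.insert(std::make_pair(p.name, p)).second;
}
```

The same `insert(...).second` idiom works unchanged with std::map if you prefer ordered keys.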
All of the above assumes inclusion of the correct headers, and
using namespace std;
If you don't like using namespace, prepend std:: in all the right places.
hope it helps :)

How to improve a natural sort program for decimals?

I have std::strings containing numbers in the leading section that I need to sort. The numbers can be integers or floats.
The vector<std::string> sort was not optimal. I found the following natural sort program, which was much better. I still have a small issue with numbers smaller than one, which do not sort quite right. Does anyone have a suggestion to improve it? We're using Visual Studio 2003.
The complete program follows.
TIA,
Bert
#include <cstdlib>
#include <list>
#include <string>
#include <iostream>
using namespace std;
class MyData
{
public:
string m_str;
MyData(string str) {
m_str = str;
}
long field1() const
{
int second = m_str.find_last_of("-");
int first = m_str.find_last_of("-", second-1);
return atol(m_str.substr(first+1, second-first-1).c_str());
}
long field2() const
{
return atol(m_str.substr(m_str.find_last_of("-")+1).c_str());
}
bool operator < (const MyData& rhs) const
{
if (field1() < rhs.field1()) {
return true;
} else if (field1() > rhs.field1()) {
return false;
} else {
return field2() < rhs.field2();
}
}
};
int main()
{
// Create list
list<MyData> mylist;
mylist.push_front(MyData("93.33"));
mylist.push_front(MyData("0.18"));
mylist.push_front(MyData("485"));
mylist.push_front(MyData("7601"));
mylist.push_front(MyData("1001"));
mylist.push_front(MyData("0.26"));
mylist.push_front(MyData("0.26"));
// Sort the list
mylist.sort();
// Dump the list to check the result
for (list<MyData>::const_iterator elem = mylist.begin(); elem != mylist.end(); ++elem)
{
cout << (*elem).m_str << endl;
}
return 1;
}
GOT:
0.26
0.26
0.18
93.33
485
1001
7601
EXPECTED:
0.18
0.26
0.26
93.33
485
1001
7601
Use atof() instead of atol() so that the comparison takes the fractional part of the number into account. You will also need to change the return types to double.
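A minimal sketch of that change, assuming the strings hold a single number each, as in the sample data. The names `numeric_less` and `sort_numeric` are made up here; the comparison is passed to std::list::sort as a predicate rather than written as a member operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <list>
#include <string>

// Compare two strings by the floating-point value at their start.
// atof() parses the fractional part, which atol() ignores.
bool numeric_less(const std::string& a, const std::string& b) {
    return std::atof(a.c_str()) < std::atof(b.c_str());
}

// Sort a list of numeric strings by value.
void sort_numeric(std::list<std::string>& items) {
    items.sort(numeric_less);
    assert(std::is_sorted(items.begin(), items.end(), numeric_less));
}
```

Note that this reparses both strings on every comparison, which is fine for short lists but wasteful for large ones.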
If it's just float strings, I'd rather suggest creating a table with two columns (the first column contains the original string, the second column is filled with the string converted to float), sorting it by the float column, and then outputting/using the sorted string column.
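One way to sketch the two-column table is a vector of (value, string) pairs that is sorted on the value only and then copied back. The names `by_value` and `sort_by_value` are invented for the example, and atof() is assumed to be an adequate parser for the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<double, std::string> row_t;   // (parsed value, original string)

// Order rows by the numeric column only.
bool by_value(const row_t& a, const row_t& b) {
    return a.first < b.first;
}

// Parse each string once, sort by the parsed value, then write the strings back.
void sort_by_value(std::vector<std::string>& strings) {
    std::vector<row_t> table;
    table.reserve(strings.size());
    for (std::size_t i = 0; i < strings.size(); ++i)
        table.push_back(std::make_pair(std::atof(strings[i].c_str()), strings[i]));

    // stable_sort keeps strings with equal values in their input order.
    std::stable_sort(table.begin(), table.end(), by_value);

    assert(table.size() == strings.size());
    for (std::size_t i = 0; i < strings.size(); ++i)
        strings[i] = table[i].second;
}
```

Each string is converted exactly once, at the cost of a temporary copy of the data.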
If the data are all numbers, I would create a new class to contain the data.
It can have a string to hold the data, but it then allows you to have better methods to model behaviour, in this case especially to implement operator <.
The implementation could also use a library that calculates to exact precision, e.g. GNU Multiple Precision (GMP), which would do the comparison and the conversion from string (or, if the numbers do not have that many significant figures, you could use doubles).
I would compute the values once and store them.
Because they are not actually part of the object's state (they are just calculated values), mark them as mutable. Then they can also be set inside const methods.
Also note that a class's member functions can access the private members of another object of the same class, so MyData has no need for the extraneous accessor methods. Remember that accessor methods are there to protect other classes from changes in the implementation, not the class you are implementing.
The problem with the ordering is that atol() only reads the integer part (i.e. it stops at the '.' character). Thus all your numbers smaller than 1 have a value of zero for comparison and compare equal to each other; std::list::sort is stable, so they simply keep their original relative order. To compare against the full value you need to extract them as a floating point value (double).
#include <string>
#include <boost/lexical_cast.hpp>
class MyData
{
private:
mutable bool gotPos;
mutable double f1;
mutable double f2;
public:
/*
* Why is this public?
*/
std::string m_str;
MyData(std::string str)
:gotPos(false)
,m_str(str) // Use initializer list
{
// If you are always going to build f1,f2 then call BuildPos()
// here and then you don't need the test in the operator <
}
bool operator < (const MyData& rhs) const
{
if (!gotPos)
{ buildPos();
}
if (!rhs.gotPos)
{ rhs.buildPos();
}
if (f1 < rhs.f1) return true;
if (f1 > rhs.f1) return false;
return f2 < rhs.f2;
}
private:
void buildPos() const
{
int second = m_str.find_last_of("-");
int first = m_str.find_last_of("-", second-1);
// Use boost lexical cast as it handles doubles
// As well as integers.
f1 = boost::lexical_cast<double>(m_str.substr(first + 1, second-first - 1));
f2 = boost::lexical_cast<double>(m_str.substr(second + 1));
gotPos = true;
}
};