Finding common (or repeated) keys in unordered_multimap - C++

Any idea how to get the common keys from a large unordered_multimap? I use file_name (string) as the key and its size (int) as the value. Basically, I am scanning a directory for duplicate files using Boost and holding an entry for each file in an unordered_multimap. Once this map is ready, I need to output the common keys (file_name) and their sizes as a list of duplicate files.

How to find common keys of an unordered_multimap?
The following code searches for a specific filename, and iterates through all elements with the same key:
std::unordered_multimap<std::string, int> mymulti; // key: filename, value: size
//... fill the multimap
for (auto x = mymulti.find("fileb"); x != mymulti.end() && x->first == "fileb"; x++) {
std::cout << x->second << " "; // do something !
}
std::cout << "}\n"; // end something !
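As an alternative to the find-and-compare loop above, the container's equal_range member returns the whole run of entries sharing a key. The sketch below assumes the same key/value types as the answer; the helper name sizes_for is made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: collects every size stored under `name`.
std::vector<int> sizes_for(const std::unordered_multimap<std::string, int>& files,
                           const std::string& name) {
    std::vector<int> sizes;
    // equal_range returns [first, second), spanning all entries with this key.
    auto range = files.equal_range(name);
    for (auto it = range.first; it != range.second; ++it)
        sizes.push_back(it->second);
    return sizes;
}
```

If the key is absent, both iterators of the range are equal and the returned vector is empty, so no separate end() check is needed.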
How to iterate through an unordered_multimap, grouping the processing by common keys?
The following code iterates through the whole map and, for each key, processes the related values in a subloop:
for (auto i = mymulti.begin(); i != mymulti.end(); ) { // iterate through multimap
auto group = i->first; // start a new group
std::cout << group << "={"; // start doing something for the group
do {
std::cout << i->second << " "; // do something for every value of the group
} while (++i != mymulti.end() && i->first == group); // until the key changes
std::cout << "}\n"; // end something for the group
}
// end overall processing of the map
How to find duplicate files (same key and same value)?
Using the building blocks above, you could, for every filename, create a temporary unordered_map of the sizes, checking whether each element is already in the temporary map (duplicate) or adding it (not a duplicate).
If the whole purpose of your unordered_multimap is to process these duplicates, then it would be better to build, from the start, an unordered_map with filenames as keys and, as value, a multimap with the size as sorted key and, as values, the other data you collect on the file (full URL? inode? whatever):
unordered_map<string, multimap<long, filedata>> myspecialmap;
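A minimal sketch of the duplicate check, assuming a duplicate means "same name and same size" as in the heading above. It is a variation on the temporary-map idea rather than the exact structure suggested: it tallies each (name, size) pair in one pass. The Duplicate struct and find_duplicates name are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One reported duplicate: a (file name, size) pair and how often it occurred.
struct Duplicate {
    std::string name;
    int size;
    int count;
};

// Hypothetical helper: counts each (name, size) pair and keeps those seen more than once.
std::vector<Duplicate> find_duplicates(
        const std::unordered_multimap<std::string, int>& files) {
    // An ordered map is used for the tally so the report comes out in a stable order.
    std::map<std::pair<std::string, int>, int> tally;
    for (const auto& entry : files)
        ++tally[std::make_pair(entry.first, entry.second)];

    std::vector<Duplicate> result;
    for (const auto& t : tally)
        if (t.second > 1)
            result.push_back({t.first.first, t.first.second, t.second});
    return result;
}
```

Files with the same name but different sizes are deliberately not reported here; dropping the size from the tally key would report every repeated name instead.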

Related

Is there a way to make selecting a specific variable more efficient?

So I'm writing a program that has 9 different mazes in it, stored in 2D arrays all filled with hard-coded values. When the player chooses a maze, I want to copy the hard-coded values from the selected maze into the 2D array of the active maze. When I wrote it out, I did it in the most straightforward way possible, as you can see below. Then I wanted to make it better, as it seems... bloated. A switch case wouldn't reduce the number of lines, so I wanted some way to put the int mazeSelection variable directly into the variable name of the selected maze. But it seems you can't alter a variable name at runtime, nor use a string variable to represent the name of another variable. For example, string mazenumber = "maze" + tostring(mazeSelection); followed by mazenumber[11][11] doesn't work, but that's the basic idea of what I want to do.
So the upshot is, is there a way to make this code more efficient?
if(mazeSelection == 1)
maze[11][11] = maze1[11][11];
if(mazeSelection == 2)
maze[11][11] = maze2[11][11];
if(mazeSelection == 3)
maze[11][11] = maze3[11][11];
if(mazeSelection == 4)
maze[11][11] = maze4[11][11];
if(mazeSelection == 5)
maze[11][11] = maze5[11][11];
if(mazeSelection == 6)
maze[11][11] = maze6[11][11];
if(mazeSelection == 7)
maze[11][11] = maze7[11][11];
if(mazeSelection == 8)
maze[11][11] = maze8[11][11];
if(mazeSelection == 9)
maze[11][11] = maze9[11][11];
Your question is lacking in detail, but let's assume you have this
int maze[11][11], maze1[11][11]; // etc
Then the first thing to say is that
maze[11][11] = maze1[11][11];
does not copy your maze. It's a very common beginner misunderstanding that you can refer to a whole array this way, but maze1[11][11] just refers to one element of the maze at coordinates (11,11) not to the whole maze. And worse since the size of the array is 11 by 11, that element doesn't actually exist, so the code is just an error. There is (surprisingly) no simple way to copy an array in C++.
The simplest suggestion (thanks to @molbdnilo) is to put your maze inside a struct.
struct Maze
{
int tiles[11][11];
};
Maze maze, maze1; // etc
Now structs can be copied in the usual way, so
maze = maze1;
is legal code and does copy the maze.
Then you can go further and make an array of mazes, and write this simple code
Maze selected_maze, all_mazes[10];
selected_maze = all_mazes[mazeSelection];
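The same idea can also be sketched with std::array, which is copyable with plain assignment just like the struct. This is an illustration under assumptions, not part of the answer above: the names kSize and select_maze are made up, and the selection is treated as 1-based (as in the question), so it is shifted by one before indexing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSize = 11;
using Maze = std::array<std::array<int, kSize>, kSize>;  // copyable with plain '='

// Copies maze number `selection` (1-based) into `active`.
// Returns false and leaves `active` untouched if the number is out of range.
template <std::size_t N>
bool select_maze(const std::array<Maze, N>& all_mazes, int selection, Maze& active) {
    if (selection < 1 || static_cast<std::size_t>(selection) > N)
        return false;
    active = all_mazes[selection - 1];  // copies the whole maze
    return true;
}
```

The range check matters because indexing past the end of the array is undefined behaviour, exactly like the maze1[11][11] access discussed above.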
The easiest way in your case is to store the values to choose from in a vector and then just index that vector. I.e. instead of writing
if (x == 0)
y = z0;
else if (x == 1)
y = z1;
else if (x == 2)
y = z2;
you do
static vector<ValueT> values = {z0, z1, z2, ValueT()/*no answer for 3*/, z4, z5};
if (x < values.size())
y = values[x];
else
y = ValueT(); // No value found!
Note that the solution above works well if the keys being checked (inside the if conditions) densely cover the vector's whole range [0;size). If some values are absent, you may store special values in the corresponding slots to signify that there is no answer.
If the key space is too sparse, there will be too many no-value elements in the vector and this solution may be too memory-wasteful; in that case the other solutions below will do. But the vector solution is the fastest way of getting the right value for a given key.
In more difficult cases, when you need an arbitrary expression inside the if condition or need to run arbitrary code in the if body, you need more advanced solutions like those in the code below (the cases are coded in order of growing complexity). All these solutions implement fast ways to make an if/then choice.
I'll explain the code a bit.
When the if condition just checks for equality to a number in the range [0;size) and the result is just a value, we use a vector. The vector's values can be plain objects to return, or function objects to be run (in the case of complex handlers inside the if body). This choice works in constant O(1) time, i.e. very fast (see Time Complexity).
If the keys to compare against are sparse (e.g. the numbers 10, 20, 30, 40), we use unordered_map or map. map can be used for any keys that are orderable, unordered_map for keys that support equality comparison and hashing. This solution works in O(1) average time for unordered_map, i.e. very fast (although the constant may not be very small); use unordered_map if you have dozens of if-cases. For map it works in O(log(N)) time (N is the number of handlers/if-bodies), which is also very fast; use it for fewer than a dozen cases, where map can be faster than unordered_map.
The most complex case, when the if-conditions are complex expressions and the if-bodies are also complex code blocks (i.e. a func-to-func mapping), can also be solved fast, in logarithmic O(log(N)) time, but only if all if-cases can be ordered (sorted) in such a way that for each if-case we can tell whether the matching case (if it exists at all) lies among the handlers to the left of the current one (flag -1), is the current one (flag 0), or lies among the handlers to the right of the current one (flag 1). In other words, the handlers must be orderable in one definite way for all possible input arguments (of the if-condition function). In this case we just do a logarithmic-time binary search using std::lower_bound(...).
So the recommendations are:
1. If the keys are non-negative integers (or can be mapped to such integers via some simple function) and this integer space is not too sparse (otherwise the vector solution is memory-wasteful), use std::vector for the mapping. Fetching from a vector by index takes O(1) time with a small constant (several CPU instructions), i.e. very fast.
2. If there are very many keys (more than hundreds) and the keys are equality-comparable and hashable, use std::unordered_map. Fetching by key takes O(1) average time, but with a not-so-small constant (around a hundred CPU instructions), i.e. also very fast, and the fetch time doesn't grow with the number of map elements.
3. If there are not too many keys (below a hundred) and the keys are fully orderable (i.e. can be sorted), use std::map. It has O(log(N)) lookup time with a small constant, i.e. also very fast.
4. If there are no keys, i.e. the if-conditions are complex expressions, and there are many (more than a dozen) if-cases, use a std::vector of sorted pairs of functions, each pair representing (if_condition_code_matcher, if_body_code). Finding the matching case needs O(log(N)) if-condition evaluations, i.e. also very fast.
5. If there are very few (below 5-6) if-cases, or you don't need speed, or the if-case handlers can't be sorted with respect to all arguments, or rules 1-4 don't apply, or you simply don't want any complex solution, just use a plain set of ifs.
Rules 1-5 are all about different ways of quickly finding the matching if-case. Regarding values: all the structures above can store any type of value. So:
Store regular objects as the structure's values (int, string, or any class object) if your if-cases were just returning some data without any computation, like in your case. Just return this value after obtaining the structure's slot by key.
Store functions as the structure's values if your if-bodies contain complex code. After fetching by key, just run this function value as a handler.
Also, when the keys are integers (or mappable to them) as in rule 1, you can use a good old switch statement. Clever compilers may optimize it so that the code jumps directly to the matching case. But for this to happen, the cases within the switch should be ordered and should cover all values of the range [0;size). Such an optimization is not guaranteed, though; the compiler may still test the cases sequentially. Hence the vector solution is the only guaranteed optimization.
The full code is below.
#include <unordered_map>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
#include <utility>
#include <algorithm>
using namespace std;
#define ASSERT_MSG(cond, msg) { if (!(cond)) throw runtime_error("Failed assertion [" #cond "]! Msg: " + string(msg)); }
typedef string ValT;
// Makes choice in fixed O(1) time.
// Suitable only for having possible mapping for all keys in range [0;size).
ValT const & HandleValToValVec(size_t i) {
static std::vector<ValT> handlers = {"zero", "one", "two", "three"};
if (i < handlers.size()) {
ValT const & val = handlers[i];
cout << "Key " << i << " was mapped to \"" << val << "\"" << endl;
return val;
} else {
cout << "Key " << i << " has no mapping!" << endl;
static ValT null_val;
return null_val;
}
}
typedef size_t KeyT;
// Makes choice in fixed O(1) time.
// Suitable for any comparable keys or if keys space is sparse (not covering range [0;size)).
ValT const & HandleValToValMap(KeyT const & key) {
static std::unordered_map<KeyT, ValT> handlers = {{10, "ten"}, {20, "twenty"}};
auto it = handlers.find(key);
if (it == handlers.end()) {
cout << "Key " << key << " has no mapping!" << endl;
static ValT null_val;
return null_val;
} else {
cout << "Key " << key << " was mapped to \"" << it->second << "\"" << endl;
return it->second;
}
}
// Makes choice in fixed O(1) time.
void HandleValToFunc(KeyT const & key) {
// Handlers containing any arbitrary code, "static" here is important to re-create array only once.
static std::unordered_map< KeyT, function<void()> > handlers = {
{KeyT(10), [&](){
cout << "Chosen Key 10" << endl;
}},
{KeyT(15), [&](){
cout << "Chosen Key 15" << endl;
}},
};
auto it = handlers.find(key);
if (it == handlers.end())
cout << "No Handler for Key " << key << endl;
else
it->second();
}
typedef string ArgT;
// Makes choice in logarithmic O(log(N)) time, where N is number of handlers.
void HandleFuncToFunc(ArgT const & arg0) {
// Handlers containing any arbitrary code, "static" here is important to re-create array only once.
// First function in handlers's pair should return -1 if matching handler probably lies to the left,
// 0 if this handler is matching, 1 if probably lies to the right.
static std::vector< pair< function<int(ArgT const &)>, function<void()> > > handlers = {
{[](ArgT const & arg0)->int{
return arg0.size() < 3 ? -1 : arg0.size() < 5 ? 0 : 1;
}, [&](){
cout << "Chosen String with len within range [3;5)." << endl;
}},
{[](ArgT const & arg0)->int{
return arg0.size() < 6 ? -1 : arg0.size() < 8 ? 0 : 1;
}, [&](){
cout << "Chosen String with len within range [6;8)." << endl;
}},
};
auto it = std::lower_bound(handlers.begin(), handlers.end(), arg0, [](auto const & handler, ArgT const & arg0) {
return handler.first(arg0) > 0;
});
if (it == handlers.end() || it->first(arg0) != 0)
cout << "No Handler for String \"" << arg0 << "\"" << endl;
else
it->second();
}
int main() {
try {
HandleValToValVec(0); HandleValToValVec(3); HandleValToValVec(5);
HandleValToValMap(10); HandleValToValMap(20); HandleValToValMap(30);
HandleValToFunc(10); HandleValToFunc(15); HandleValToFunc(20);
HandleFuncToFunc("ab"); HandleFuncToFunc("abcd"); HandleFuncToFunc("abcde"); HandleFuncToFunc("abcdef"); HandleFuncToFunc("abcdefgh");
return 0;
} catch (exception const & ex) {
cerr << "Exception: " << ex.what() << endl;
return -1;
}
}
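The listing above covers the vector, unordered_map and sorted-handlers cases but not the std::map variant from rule 3. A minimal sketch of that variant follows; the function name and the sample keys are made up, and it mirrors the empty-value fallback used in the other functions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Makes the choice in O(log(N)) time using an ordered map.
// Returns an empty string when the key has no mapping (hypothetical helper).
const std::string& HandleValToValOrderedMap(int key) {
    // "static" so the table is built only once, on the first call.
    static const std::map<int, std::string> handlers = {
        {10, "ten"}, {20, "twenty"}, {30, "thirty"}};
    static const std::string null_val;
    auto it = handlers.find(key);  // member find: a tree lookup, not a linear scan
    return it == handlers.end() ? null_val : it->second;
}
```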

How to search in a map/multimap starting from specific position

I want to search in a map/multimap but not all of it. Instead I want to start in a specific position.
In the following example I want to find the first two numbers that sum to b, and return their values.
multimap<int, int> a;
a.insert(make_pair(2, 0));
a.insert(make_pair(2, 1));
a.insert(make_pair(5, 2));
a.insert(make_pair(8, 3));
int b = 4;
for(auto it = a.begin(); it != a.end(); ++it) {
auto it2 = a.find(b - it->first); //Is there an equivalent that starts from "it+1"?
if(it2 != a.end()) {
cout << it->second << ", " << it2->second << endl;
break;
}
}
output:
0, 0
desired output:
0, 1
Is it possible to achieve specific position search in a map?
How to search in a map starting from specific position
You could use std::find. But this is not ideal, since it has linear complexity compared to logarithmic complexity of a map lookup. The interface of std::map doesn't support such operation for lookups.
If you need such operation, then you need to use another data structure. It should be possible to implement by augmenting a (balanced) search tree with a parent node pointer. The downside is of course increased memory use and constant overhead on operations that modify the tree structure.
not from the beginning to the end.
Map look ups do not start from "the beginning" of the range. They start from the root of the tree.
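For the specific example in the question, a positioned search is not strictly needed. One possible workaround, sketched below, is to look up the whole run of matching keys with equal_range and skip the entry the outer loop is currently on. The helper name find_pair_with_sum and the {-1, -1} "not found" result are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical helper: finds the first two distinct entries whose keys sum to
// `target` and returns their mapped values, or {-1, -1} if there is no such pair.
std::pair<int, int> find_pair_with_sum(const std::multimap<int, int>& a, int target) {
    for (auto it = a.begin(); it != a.end(); ++it) {
        // All entries whose key complements it->first (a logarithmic lookup).
        auto range = a.equal_range(target - it->first);
        for (auto it2 = range.first; it2 != range.second; ++it2) {
            if (it2 != it)  // skip the entry we started from
                return {it->second, it2->second};
        }
    }
    return {-1, -1};
}
```

With the question's data (keys 2, 2, 5, 8 and b = 4) this pairs the two entries with key 2 instead of pairing the first entry with itself. The inner loop skips at most one entry, so each outer step stays logarithmic.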
If you're using an ordered map (which it sounds like you are), then its member function find already performs a logarithmic-time tree search (unlike the free function std::find, which scans linearly). This function returns an iterator, so assuming you were looking for the value of some key 'x', consider the following lines:
std::map<char,int> mymap;
mymap['x'] = 24;
std::map<char,int>::iterator itr = mymap.find('x');
std::cout << "x=" << itr->second << std::endl;
The reason your code wasn't compiling was likely because you tried to return a pair iterator, which won't exactly print to output all that well. Instead, calling itr->second allows you to retrieve the value associated with the desired key.

How to create and push table(with key/value pair) to lua from C++?

I wanted to return a table (with key/value pairs) containing functions to Lua from a C++ function.
On the Lua side, the return value of the function was a table, but the table was empty.
I tried a string instead of a function, but that didn't work either.
If I use an index instead of a key, it works. But I want to use a key, not an index.
lua_newtable(L);
for(list<NativeFunction_t>::iterator it = nativeFuncs.begin(); it != nativeFuncs.end(); it++)
{
NativeFunction_t tmp = *it;
cout << "Loading " << tmp.Name << " to lua...";
lua_pushstring(L, tmp.Name);
//If I do lua_pushstring(L, (Index)) instead of above, it works.
//lua_pushstring(L, tmp.Name);
lua_pushcfunction(L, tmp.Func);
lua_settable(L, -3);
cout << "Success" << endl;
}
//lua_setglobal(L, loadAs);
cout << "Done" << endl;
return 1;
Is something wrong with the way I create and return the table?
And here is lua code.
print("Loading NativeLoader...")
NativeLoader = require("Module")
print("Loading library...")
NativeLoader.Load("Hello", "TestLibrary")
print("Looking library...")
print("TestLibrary: " ..#TestLibrary)
for n, item in ipairs(TestLibrary) do
print(item)
end
--t.hello()
--t.bye()
It looks like you are using string keys in the table.
ipairs only traverses through integer keys from 1 to n.
As suggested by siffejoe in the comments, you should be using pairs instead.
However you should note that pairs does not loop through the elements in the order they were inserted in the table.
If you need the elements to be in a specific order, you might want to return an additional table containing the string keys in the specific order. Or you may want to make the original table you return into an array that contains tables that contain the name and the function at different table keys.
Also note that the length operator only works on sequences of integer keys.
So for your tables using only string keys it would always return 0.
I suggest you read the Lua reference manual sections on the length operator, pairs and ipairs. Here are links to the Lua 5.3 manual:
http://www.lua.org/manual/5.3/manual.html#3.4.7
http://www.lua.org/manual/5.3/manual.html#pdf-pairs
http://www.lua.org/manual/5.3/manual.html#pdf-ipairs

Boost R-tree : unable to remove values?

First, my code:
// Reads the index
bi::managed_mapped_file file(bi::open_or_create, indexFile.c_str(), bf::file_size(indexFile.c_str()));
allocator_t alloc(file.get_segment_manager());
rtree_t * rtree_ptr = file.find_or_construct<rtree_t>("rtree")(params_t(), indexable_t(), equal_to_t(), alloc);
std::cout << "The index contains " << rtree_ptr->size() << " entries." << std::endl;
std::ifstream inf(transFile.c_str());
std::string line;
while(getline(inf,line))
{
transition t = transition(line);
point A;
A.set<0>(t.getQ1mz()-1);
A.set<1>(t.getQ3mz()-1);
A.set<2>(0.3);
A.set<3>(0.2);
value_t v = std::make_pair(A,t);
rtree_ptr->insert(v);
rtree_ptr->remove(v);
}
std::cout << "Finished. The index now contains " << rtree_ptr->size() << " entries." << std::endl;
It reads the R-tree from a memory-mapped file. Then it reads an input file, transFile, makes ten so-called "transition" objects from its content, and inserts them into the tree. Immediately afterwards, it removes them. This is a useless case, but it illustrates well the problem that the removal steps don't work. The output I get is:
The index contains 339569462 entries.
Finished. The index now contains 339569472 entries.
So clearly, the size of the tree increases by ten, because the ten insertions worked like a charm; but if the removals were working, the tree should end up the same size as before, which is not the case.
I have followed the syntax about removing values from an R-tree described here, and all compiles properly, but for some strange reason it just doesn't remove the value. My guess might be that since it deletes by value, it might just not find the value to delete, but how can it be since the value is the one just inserted one line ago?

Rearrange list the same way as another one

I bumped into a page where there were a lot of categories and next to each one the number of items in each category, wrapped in parenthesis. Something really common. It looked like this:
Category 1 (2496)
Category 2 (34534)
Category 3 (1039)
Category 4 (9)
...
So I was curious and I wanted to see which categories had more items and such, and since all categories were all together in the page I could just select them all and copy them in a text file, making things really easy.
I made a little program that reads all the numbers, store them in a list and sort them. In order to know what category the number it belonged to I would just Ctrl + F the number in the browser.
But I thought it would be nice to have the name of the category next to the number in my text file, and I managed to parse them in another file. However, they are not ordered, obviously.
This is what I could do so far:
bool is_number(const string& s) {
return !s.empty() && find_if(s.begin(), s.end(), [](char c) { return !isdigit(c); }) == s.end();
}
int main() {
ifstream file;
ofstream file_os, file_t_os;
string word, text; // word is the item count and text the category name
list<int> words_list; // list of item counts
list<string> text_list; // list of category names
file.open("a.txt");
file_os.open("a_os.txt");
file_t_os.open("a_t_os.txt");
while (file >> word) {
if (word.front() == '(' && word.back() == ')') { // check if it's being read something wrapped in parenthesis
string old_word = word;
word.erase(word.begin());
word.erase(word.end()-1);
if (is_number(word)) { // check if it's a number (item count)
words_list.push_back(atoi(word.c_str()));
text.pop_back(); // get rid of an extra space in the category name
text_list.push_back(text);
text.clear();
} else { // it's part of the category name
text.append(old_word);
text.append(" ");
}
} else {
text.append(word);
text.append(" ");
}
}
words_list.sort();
for (list<string>::iterator it = text_list.begin(); it != text_list.end(); ++it) {
file_t_os << *it << endl;
}
for (list<int>::iterator it = words_list.begin(); it != words_list.end(); ++it) {
file_os << fixed << *it << endl;
}
cout << text_list.size() << endl << words_list.size() << endl; // I'm getting the same count
}
Then I forgot about having the name next to the number, because something more interesting occurred to me. I thought it would be interesting to find a way to rearrange the strings in text_list, which contains the names of the categories, in exactly the same way the list with the item counts was sorted.
Let me explain with an example. Let's say we have the following categories:
A (5)
B (3)
C (10)
D (6)
The way I'm doing it I will have a list<int> containing this: {10, 6, 5, 3} and a list<string> containing this: {A, B, C, D}.
What I'm saying is I want to find a way I can keep track of the way the elements were rearranged in the first list and apply that very pattern to the second list. What would be the rearrange pattern? It would be: the first item (5) goes to the third position, the second one (3) to the fourth one, the third one (10) to the first one, and so on.... Then this pattern should be applied to the other list, so that it would end up like this: {C, D, A, B}.
The thing would be to keep track of the Pattern and apply it to the list below.
Is there any way I can do this? Any particular function that could help me? Any way to track all the swaps and switches the sort algorithm does so it can be applied to a different list with the same size? What about a different sorting algorithm?
I know this might be highly inefficient and a bad idea, but it seemed like a little challenge.
I also know I could just pair both string and int, category and item count, in some sort of container like pair or map or make a container class of my own and sort the items based on the item count (I guess map would be the best choice, what do you think?), but this is not what I am asking.
The best way to do this would be to create a list that contains both sets of information you want to sort and feed in a custom sorting function.
For instance:
struct Record {
string name;
int count;
};
list<Record> myList;
// std::sort needs random-access iterators, so use std::list's member sort instead
myList.sort([](const Record& a, const Record& b) {
return a.count < b.count;
});
In the general case, it's always better to manage one list of a complex datatype, than to try to separately manage two or more lists of simple datatypes, especially when they're mutable.
A somewhat improved way:
First, some notes:
It's recommended to store the category name and item count together, for clarity, ease of reading the code, etc.
It's better to use std::vector instead of std::list (see Bjarne Stroustrup's opinion).
The code loads the file in the format specified in your question and stores the info pairs in the vector.
It uses std::sort to sort by item count only (categories with the same count may end up in any order; if you would like to sort those by category name, change the lambda body to return std::tie(left.items, left.name) > std::tie(right.items, right.name);).
I added a version with the info split: one collection holds the item counts plus an index (to correlate counts with names), and the other holds the names.
Code:
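#include <algorithm>
#include <cctype>   // std::isdigit
#include <cstdlib>  // atoi
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
// True if s is non-empty and consists only of decimal digits.
bool is_number(const std::string& s) {
return !s.empty() &&
std::find_if(s.begin(), s.end(), [](unsigned char c) { return !std::isdigit(c); }) ==
s.end();
}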
struct category_info {
std::string name;
int items;
};
struct category_items_info {
int items;
size_t index;
};
int main() {
std::ifstream file("H:\\save.txt");
std::vector<category_info> categories;
std::vector<category_items_info> categories_items;
std::vector<std::string> categories_names;
std::string word;
std::string text;
while (file >> word) {
if (word.front() == '(' && word.back() == ')') {
std::string inner_word = word.substr(1, word.size() - 2);
if (is_number(inner_word)) {
std::string name = text.substr(0, text.size() - 1);
int items = atoi(inner_word.c_str());
categories.push_back(category_info{name, items});
categories_names.push_back(name);
categories_items.push_back(
category_items_info{items, categories_items.size()});
text.clear();
} else { // it's part of the category name
text.append(word);
text.append(" ");
}
} else {
text.append(word);
text.append(" ");
}
}
std::sort(categories.begin(), categories.end(),
[](const category_info& left, const category_info& right) {
return left.items > right.items;
});
std::sort(
categories_items.begin(), categories_items.end(),
[](const category_items_info& left, const category_items_info& right) {
return left.items > right.items;
});
std::cout << "Using the same storage." << std::endl;
for (auto c : categories) {
std::cout << c.name << " (" << c.items << ")" << std::endl;
}
std::cout << std::endl;
std::cout << "Using separated storage." << std::endl;
for (auto c : categories_items) {
std::cout << categories_names[c.index] << " (" << c.items << ")"
<< std::endl;
}
}
Output obtained:
Using the same storage.
Category 2 (34534)
Category 1 (2496)
Category 3 (1039)
Category 4 (9)
Using separated storage.
Category 2 (34534)
Category 1 (2496)
Category 3 (1039)
Category 4 (9)
Lists do not support random access iterators, so this is going to be a problem, since a list can't be permuted based on a vector (or array) of indices without doing a lot of list traversal back and forth to emulate random access iteration. NetVipeC's solution was to use vectors instead of lists to get around this problem. If using vectors, then you could generate a vector (or array) of indices to the vector to be sorted, then sort the vector indices using a custom compare operator. You could then copy the vectors according to the vector of sorted indices. It's also possible to reorder a vector in place according to the indices, but that algorithm also sorts the vector of indices, so you're stuck making a copy of the sorted indices (to sort the second vector), or copying each vector in sorted index order.
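The index-vector approach described above could be sketched roughly as follows, using vectors and the copy-by-sorted-index variant. The function names sort_permutation and apply_permutation are made up, and descending order is assumed to match the question's example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Returns the indices of `counts`, arranged so that counts[result[0]] is the largest.
std::vector<std::size_t> sort_permutation(const std::vector<int>& counts) {
    std::vector<std::size_t> order(counts.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
    // Sort the indices, comparing the counts they refer to (descending).
    std::stable_sort(order.begin(), order.end(),
                     [&counts](std::size_t a, std::size_t b) { return counts[a] > counts[b]; });
    return order;
}

// Builds a reordered copy of `values` that follows `order`.
template <typename T>
std::vector<T> apply_permutation(const std::vector<T>& values,
                                 const std::vector<std::size_t>& order) {
    assert(values.size() == order.size());
    std::vector<T> result;
    result.reserve(values.size());
    for (std::size_t i : order)
        result.push_back(values[i]);
    return result;
}
```

For the question's example, computing the order from the counts {5, 3, 10, 6} and applying it to {A, B, C, D} gives {C, D, A, B}; applying the same order to the counts themselves gives {10, 6, 5, 3}. std::stable_sort keeps categories with equal counts in their original relative order.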
If you really want to use lists, you could implement your own std::list::sort, that would perform the same operations on both lists. The Microsoft version of std::list::sort uses an array of lists where the number of nodes in array[i] = 2^i, and it merges nodes one at a time into the array, then when all nodes are processed, it merges the lists in the array to produce a sorted list. You'd need two arrays, one for each list to be sorted. I can post example C code for this type of list sort if you want.