Split an std::set into vectors of equal size - c++

I have an std::set of integers of say 550 elements. I want to split this into vectors of size 100 or less. So, lets say, in this case we will get 6 vectors in all. First 5 vectors will be of sizes 100 and the sixth vector will be of size 50.
So, effectively I need to break std::setmasterSet into std::vector>final. Is there a good and crisp way to do it using some standard std algorithms. I dont need a class to have this logic, infact I can write a function to do this.

Hmm...I guess you could do something on this general order:
std::vector<vector<int>> ints(myset.size()/max_size);
size_t i = 0;
for (auto d : my_set)
ints[i++/max_size].push_back(d);

Related

Is there a way to switch the dimensions order in a multi-dimensional vector?

I have a 3D vector and I want to be able to chose which dimension to plot as a function of another dimension.
So far, I am doing this manually: I create a second 3D vector and re-organize the data accordingly. This solution is not very practical since I need to switch the indexes (inside the nested loop) every time I want to switch the dimensions...
Is there a better/cleaner solution ?
Thanks.
C++ does not provide multidimensional containers for the same reason that containers like std::vector do not provide a standard operator+ etc.: there is no standard that fits everyone's needs (in the case of a + operator, this could be concatenation, element-wise addition, increasing the dimensionality, who knows). If instead of a vector you take a class
<template typename T>
class volume {
private:
std::vector<T> data; // e.g. 3x2 array { 0, 1, 2, 3, 4, 5 }
std::vector<size_t> sizes; // e.g. 3x2 array { 3, 2 }
std::vector<size_t> strides; // e.g. 3x2 array { 1, 3 }
};
then you have all the flexibility you want - no need to stop at 3D!
As an example the data vector of a 3x2 array could be the first 6 natural numbers, the sizes vector would be { 3, 2 } and the strides array { 1, 3 }: in a row (of which there are 2) the elements are next to each other, to increase the row you need to move 3 positions forward.
In the general n-dimensional case you can make an at() operator that takes a vector (or an initializer_list) as a position argument, and the offset corresponding to that position is its inner product with strides.
If you don't feel like programming this from scratch then libraries like Blitz++ already provide this functionality.
No.
<RANT> C++ has no notion of multidimensional vectors. And has poor support for multidimensional arrays, because arrays are far from first class objects. So you are left with vectors of vectors [of vectors ...] and have to carefully control that all vectors in a containing vector have the same size (the language will not help you there). Or with multidimentional raw arrays... provided that the size for all dimensions but the last are known at compile time. </RANT>
As the number of dimensions are known (3D) I would go with a dedicated container using a 1D vector of size l*m*n (where l, m, and n are the size in the 3 dimensions). And a dedicated accessor function data(i,j,k). For there, it is possible to build another accessor tool that gives the data in one dimension starting from another one...
If you do not really like all that boiler plate code, you could have a look at the boost libraries. I do not use it, but if I correctly remember it contains a matrix class...

How to use lower_bound on vector of vectors?

I am relative new at C++ and I have little problem. I have vector and in that vector are vectors with 3 integers.
Inner vector represents like one person. 3 integers inside that inner vector represents distance from start, velocity and original index (because in input integers aren't sorted and in output I need to print original index not index in this sorted vector).
Now I have given some points representing distance from start and I need to find which person will be first at that point so I have been thinking that my first step would be that I would find closest person to the given point so basically I need to find lower_bound/upper_bound.
How can I use lower_bound if I want to find the lower_bound of first item in inner vectors? Or should I use struct/class instead of inner vectors?
You would use the version of std::lower_bound which takes a custom comparator (the versions marked "(2)" at the link); and you would write a comparator of vectors which compares vectors by their first item (or whatever other way you like).
Howerver:
As #doctorlove points out, std::lower_bound doesn't compare the vectors to each other, it compares them to a given value (be it a vector or a scalar). So it's possible you actually want to do something else.
It's usually not a good idea to keep fixed-length sequences of elements in std::vector's. Have you considered std::array?
It's very likely that your "vectors with 3 integers" actually stand for something else, e.g. points in a 3-dimensional geometric space; in which case, yes, they should be in some sort of class.
I am not sure that your inner things should be std::vector-s of 3 elements.
I believe that they should std::array-s of 3 elements (because you know that the size is 3 and won't change).
So you probably want to have
typedef std::array<double,3> element_ty;
then use std::vector<element_ty> and for the rest (your lower_bound point) do like in einpoklum's answer.
BTW, you probably want to use std::min_element with an explicit compare.
Maybe you want something like:
std::vector<element_ty> vec;
auto minit =
std::min_element(vec.begin(), vec.end(),
[](const element_ty& x, const element_ty&y) {
return x[0] < y[0]));

Iterating through one variable in a vector of struct with lower/upper bound

I have 2 structs, one simply has 2 values:
struct combo {
int output;
int input;
};
And another that sorts the input element based on the index of the output element:
struct organize {
bool operator()(combo const &a, combo const &b)
{
return a.input < b.input;
}
};
Using this:
sort(myVector.begin(), myVector.end(), organize());
What I'm trying to do with this, is iterate through the input varlable, and check if each element is equal to another input 'in'.
If it is equal, I want to insert the value at the same index it was found to be equal at for input, but from output into another temp vector.
I originally went with a more simple solution (when I wasn't using a structs and simply had 2 vectors, one input and one output) and had this in a function called copy:
for(int i = 0; i < input.size(); ++i){
if(input == in){
temp.push_back(output[i]);
}
}
Now this code did work exactly how I needed it, the only issue is it is simply too slow. It can handle 10 integer inputs, or 100 inputs but around 1000 it begins to slow down taking an extra 5 seconds or so, then at 10,000 it takes minutes, and you can forget about 100,000 or 1,000,000+ inputs.
So, I asked how to speed it up on here (just the function iterator) and somebody suggested sorting the input vector which I did, implemented their suggestion of using upper/lower bound, changing my iterator to this:
std::vector<int>::iterator it = input.begin();
auto lowerIt = std::lower_bound(input.begin(), input.end(), in);
auto upperIt = std::upper_bound(input.begin(), input.end(), in);
for (auto it = lowerIt; it != upperIt; ++it)
{
temp.push_back(output[it - input.begin()]);
}
And it worked, it made it much faster, I still would like it to be able to handle 1,000,000+ inputs in seconds but I'm not sure how to do that yet.
I then realized that I can't have the input vector sorted, what if the inputs are something like:
input.push_back(10);
input.push_back(-1);
output.push_back(1);
output.push_back(2);
Well then we have 10 in input corresponding to 1 in output, and -1 corresponding to 2. Obviously 10 doesn't come before -1 so sorting it smallest to largest doesn't really work here.
So I found a way to sort the input based on the output. So no matter how you organize input, the indexes match each other based on what order they were added.
My issue is, I have no clue how to iterate through just input with the same upper/lower bound iterator above. I can't seem to call upon just the input variable of myVector, I've tried something like:
std::vector<combo>::iterator it = myVector.input.begin();
But I get an error saying there is no member 'input'.
How can I iterate through just input so I can apply the upper/lower bound iterator to this new way with the structs?
Also I explained everything so everyone could get the best idea of what I have and what I'm trying to do, also maybe somebody could point me in a completely different direction that is fast enough to handle those millions of inputs. Keep in mind I'd prefer to stick with vectors because not doing so would involve me changing 2 other files to work with things that aren't vectors or lists.
Thank you!
I think that if you sort it in smallest to largest (x is an integer after all) that you should be able to use std::adjacent_find to find duplicates in the array, and process them properly. For the performance issues, you might consider using reserve to preallocate space for your large vector, so that your push back operations don't have to reallocate memory as often.

Hashing Function/Code

so I'm just learning (or trying to) a bit about hashing. I'm attempting to make a hashing function, however I'm confused where I save the data to. I'm trying to calculate the number of collisions and print that out. I have made 3 different files, one with 10,000 words, 20,000 words and 30,000 words. Each word is just 10 random numbers/letters.
long hash(char* s]){
long h;
for(int i = 0; i < 10; i++){
h = h + (int)s[i];
}
//A lot of examples then mod h by the table size
//I'm a bit confused what this table is... Is it an array of
//10,000 (or however many words)?
//h % TABLE_SIZE
return h
}
int main (int argc, char* argv[]){
fstream input(argv[1]);
char* nextWord;
while(!input.eof()){
input >> nextWord;
hash(nextWord);
}
}
So that's what I currently have, but I can't figure out what the table is exactly, as I said in the comments above... Is it a predefined array in my main with the number of words in it? For example, if I have a file of 10 words, do I make an array a of size 10 in my main? Then if/when I return h, lets say the order goes: 3, 7, 2, 3
The 4th word is a collision, correct? When that happens, I add 1 to collision and then add 1 to then check if slot 4 is also full?
Thanks for the help!
The point of hashing is to have a constant time access to every element you store. I'll try to explain on simple example bellow.
First, you need to know how much data you'd have to store. If for example you want to store numbers and you know, that you won't store numbers greater than 10. Simpliest solution is to create an array with 10 elements. That array is your "table", where you store your numbers. So how do I achieve that amazing constant time access? Hashing function! It's point is to return you an index to your array. Let's create a simple one: If you'd like to store 7, you just save it to array on position 7. Every time, you'd like to look, for element 7, you just pass it to your hasning funcion and bzaah! You got an position to your element in constant time! But what if you'd like to store more elements with value 7? Your simple hashing function is returning 7 for every element and now its position i already occupied! How to solve that? Well, there is not many solution, the simpliest are:
1: Chaining - you simply save element on first free position. This has significant draw back. Imagine, you want to delete some element ... (this is the method, you describing in question)
2: Linked list - if you create an array of pointers on some linked lists, you can easilly add your new element at the end of linked list, that is on position 7!
Both of this simple solutions has its drawbacks and cons. I guess you can see them. As #rwols has said, you don't have to use array. You can also use a tree or be a real C++ master and use unordered_map and unordered_set with custom hash function, which is quite cool. Also there is structure named trie, which is usefull, when you'd like to create some sort of dictionary (where is really hard to know, how many words you will need to store)
To sum it up. You has to know, how many things, you wan't to store and then, create ideal hashing function, that covers up array of apropriate size and in perfect world, it has to have uniform index distribution, with no colisions. (Achiving this is pretty hard and in the real world, I guess, this is impossible, so the less colisions, the better.)
Your hash function, is pretty bad. It will have lot of colisions (like strings "ab" and "ba") and also, you need to mod m it with m being the size of you array (aka. table), so you can save it to some array and you can profit of it. The modus is a way of simplyfiing the has function, because has function has to "fit" in table, that you specified in beginning, because you can't save element on position 11, 12, ... if you have array of 10.
How should good hashing function look like? Well, there is better sources than me. Some example (Alert! It's in Java)
To your example: You simply can't save 10k or even more words into table of size 10. That'll create a lot of collisions and you loose the main benefit of hashing function - constant access to elements you saved.
And how would your code look? Something like this:
int main (int argc, char* argv[]){
fstream input(argv[1]);
char* nextWord;
TypeOfElement table[size_of_table];
while(!input.eof()){
input >> nextWord;
table[hash(nextWord)] = // desired element which you want to save
}
}
But I guess, your goal isn't to save something somewhere, but to count number of colisions. Also note that code above doesn't solve colisions. If you'd like to count colisions, create array table of ints and initialize it to zero. Than, just increment the value, which is stored on index, which is returned by your hash funcion, like this:
table[hash(nextWord)]++;
I hope I helped. Please specify, what else you want to know.
If a hash table is required then as others have stated std::unordered_map will work in most cases. Now if you need something more powerful because of a large entry base, then I would suggest looking into tries. Tries combine the concepts of (Vector-Array) insertion, (Hashing) & Linked Lists. The run time is close to O(M) where M is the amount of characters in a string if you are hashing a string. It helps to remove the chance of collisions. And the more you add to a trie structure the less work has to be done as certain nodes are opened and created. The one draw back is that tries require more memory. Here is a diagram
Now your trie may vary on the size of the array due to what you are storing, but the overall concept and construction of one is the same. If you was doing a word - definition look up then you may want an array of 26 or a few more for each possible hashing character.
To count a number of words which have same hash, we should know hashes of all previous words. When you count a hash of some word, you should write it down, for example in some array. So you need an array with size equal to the number of words.
Then you should compare the new hash with all previous ones. Method of counting depends on what you need - number of pair of collisions or number off same elements.
Hash function should not be responsible for storing data. Normally you would have a container that uses hash function internally.
From what you wrote I understood that you want to create hashtable. One way you could do that (probably not the most efficient one, but should give you an idea):
#include <fstream>
#include <vector>
#include <string>
#include <map>
#include <memory>
using namespace std;
namespace example {
long hash(char* s){
long h;
for(int i = 0; i < 10; i++){
h = h + (int)s[i];
}
return h;
}
}
int main (int argc, char* argv[]){
fstream input(argv[1]);
char* nextWord;
std::map<long, std::unique_ptr<std::vector<std::string>>> hashtable;
while(!input.eof()){
input >> nextWord;
long newHash = example::hash(nextWord);
auto it = hashtable.find(newHash);
// Collision detected?
if (it == hashtable.end()) {
hashtable.insert(std::make_pair(newHash, std::unique_ptr<std::vector<std::string>>(new std::vector<std::string> { nextWord } )));
}
else {
it->second->push_back(nextWord);
}
}
}
I used some C++ 11 features to write an example faster.
I am not sure that I understand what you do not understand. The explanations below might help you.
A hash table is a kind of associative array. It is used to map keys to values in a similar manner an array is used to map indexes (keys) to values. For instance, an array of three numbers, { 11, -22, 33 }, associates index 0 to 11, index 1 to -22 and index 2 to 33.
Now, let us assume that we would like to associate 1 to 11, 2 to -22 and 3 to 33. The solution is simple: we keep the same array, only we transform the key by subtracting one from it, thus obtaining the original index
This is fine until we realize that this is just a particular case. What if the keys are not so “predictable”? A solution would be to put the associations in a list of {key, value} pairs and when someone is asking for a key, just search the list: { 123, 11}, {3, -22}, {0, 33} If the value associated to 3 is asked, we simply search the keys in list for a match and find -22. That’s fine, but if the list is large we’re in trouble. We could speed the search if we sort the array by keys and use binary search, but still the search may take some time if the list is large.
The search speed may be further enhanced if we break the list in sub-lists (or buckets) made of related pairs. This is what a hash function does: puts together pairs by related keys (an ideal hash function would associate one key to one value).
A hash table is a two columns table (an array):
The first column is the hash key (the index computed by a hash function). The size of the hash table is given by the maximum value of the hash function. If, for instance, the last step in computing the hash function is modulo 10, the size of the table will be 10; the pairs list will be broken into 10 sub-lists.
The second column is a list (bucket) of key/values pairs (the sub-list I was taking about).

std::vector is very slow?

Here is what I'm doing.
I have a class which I instance and it has a std::vector.
when I first instance the class this std::vector is empty.
The way I use it is I exponentially add to it and clear. Ex:
Add a number, clear the vector:
Add 2 numbers, clear the vector:
Add 3 numbers, clear the vector,
Add 4 numbers, clear the vector.
......
Is a std::vector the bst way to do what I'm doing?
I tried to do reserve(100,000) in the constructor but this did not help.
Is there maybe a better container for my usage?
Thanks
Your algorithm appears to be quadratic. If you really need 100,000 elements, you're adding an element 1 + 2 + 3 + ... + 100,000 times. That's about 5,000,000,000 operations. That many operations, no matter how trivial they are, will take a while on a standard pc, regardless of whether you're using std::vector or hand-crafted assembly language.
Something like:
struct A {
std::vector <int> v;
A() : v(100000) {}
};
is probably what you want. When you say:
A a;
This creates an instance the struct A (works for classes too) containing a vector of the required size.