Finding which bin a value falls into - C++

I'm trying to find which category C a double x belongs to.
My categories are defined as string names and double values in a file like this
A 1.0
B 2.5
C 7.0
which should be interpreted like this
"A": 0 < x <= 1.0
"B": a < x <= 2.5
"C": b < x <= 7.0
(the input can have arbitrary length and may have to be sorted by its values). I simply need a function like this
std::string findCategory(categories_t categories, double x) {
...insert magic here
}
so for this example I'd expect
findCategory(categories, 0.5) == "A"
findCategory(categories, 1.9) == "B"
findCategory(categories, 6.0) == "C"
So my question is a) how to write the function and b) what the best choice of categories_t may be (using the STL in pre-11 C++). I made several attempts, all of which were... less than successful.

One option would be to use the std::map container with doubles as keys and values corresponding to what value is assigned to the range whose upper endpoint is the given value. For example, given your file, you would have a map like this:
std::map<double, std::string> lookup;
lookup[1.0] = "A";
lookup[2.5] = "B";
lookup[7.0] = "C";
Then, you could use the std::map::lower_bound function, given some point, to get back the key/value pair whose key (upper endpoint) is the first key in the map that is at least as large as the point in question. For example, with the above map, lookup.lower_bound(1.37) would return an iterator whose value is "B." lookup.lower_bound(2.56) would return an iterator whose value is "C." These lookups are fast; they take O(log n) time for a map with n elements.
In the above, I'm assuming that the values you're looking up are all nonnegative. If negative values are allowed, you can add a quick test in to check whether the value is negative before you do any lookups. That way, you can eliminate spurious results.
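For concreteness, here is a minimal sketch of the map-based lookup described above. The "Unknown" return value for out-of-range inputs is my own convention, not something from the question:

```cpp
#include <map>
#include <string>

typedef std::map<double, std::string> categories_t;

// Returns the category whose range contains x, or "Unknown" if x falls
// below 0 or above the largest upper edge (an assumed convention).
std::string findCategory(const categories_t& categories, double x) {
    if (x <= 0.0) return "Unknown";               // below the first bin
    categories_t::const_iterator it = categories.lower_bound(x);
    if (it == categories.end()) return "Unknown"; // above the last bin
    return it->second;
}
```

Because lower_bound(x) returns the first key that is >= x, a value exactly on an upper edge lands in that bin, matching the "0 < x <= 1.0" semantics.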
For what it's worth, if you happen to know something about the distribution of your lookups (say, they're uniformly distributed) it's possible to build a special data structure called an optimal binary search tree that will give better access times than the std::map. Also, depending on your application, there may be even faster options available. For example, if you're doing this because you want to randomly choose one of the outcomes with differing probabilities, then I would suggest looking into this article on the alias method, which lets you generate random values in O(1) time.
Hope this helps!

You can use the pair type and std::lower_bound from <algorithm>:
http://www.cplusplus.com/reference/algorithm/lower_bound/.
Let's define your categories in terms of the upper edge:
typedef pair<double, string> category_t;
Then just make a vector of those edges and search it using binary search. See the full example below.
#include <string>
#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;

typedef pair<double, string> category_t;

std::string findCategory(const vector<category_t>& categories, double x) {
    vector<category_t>::const_iterator it =
        std::lower_bound(categories.begin(), categories.end(), category_t(x, ""));
    if (it == categories.end()) {
        return "";
    }
    return it->second;
}

int main() {
    vector<category_t> edges;
    edges.push_back(category_t(0, "bin n with upper edge at 0 (underflow)"));
    edges.push_back(category_t(1, "bin A with upper edge at 1"));
    edges.push_back(category_t(2.5, "bin B with upper edge at 2.5"));
    edges.push_back(category_t(7, "bin C with upper edge at 7"));
    edges.push_back(category_t(8, "bin D with upper edge at 8"));
    edges.push_back(category_t(9, "bin E with upper edge at 9"));
    edges.push_back(category_t(10, "bin F with upper edge at 10"));

    vector<double> examples;
    examples.push_back(1);
    examples.push_back(3.3);
    examples.push_back(7.4);
    examples.push_back(-5);
    examples.push_back(15);

    for (vector<double>::const_iterator eit = examples.begin(); eit != examples.end(); ++eit)
        cout << "value " << *eit << " : " << findCategory(edges, *eit) << endl;
}
The comparison works the way we want because the double comes first in the pair, and pairs compare lexicographically: first by the first component, then by the second. Otherwise we would define a compare predicate, as described on the page linked above.

Related

Efficient way of ensuring newness of a set

Given set N = {1,...,n}, consider P different pre-existing subsets of N. A subset S_p is characterized by the 0-1 n-vector x_p, where the ith element is 1 or 0 depending on whether the ith (of n) items is part of the subset or not. Let us call such x_p's indicator vectors.
For example, if N={1,2,3,4,5}, subset {1,2,5} is represented by vector (1,1,0,0,1).
Now, given P pre-existing subsets and their associated vectors x_ps.
A candidate subset, denoted by vector y, is computed.
What is the most efficient way of checking whether y is already part of the set of P pre-existing subsets or whether y is indeed a new subset not part of the P subsets?
The following are the methods I can think of:
(Method 1) Basically, we have to do an element by element check against all pre-existing sets. Pseudocode follows:
for(int p = 0; p < P; p++){
    // check if x_p == y by an element-by-element comparison
    bool match = true;
    for(int i = 0; i < n; i++){
        if(x_p[i] != y[i]){
            match = false;
            break;
        }
    }
    if(match)
        return that y is pre-existing
}
return that y is new
(Method 2) Another thought that comes to mind is to store the decimal equivalent of the indicator vectors x_ps (where the indicator vectors are taken to be binary representations) and compare it with the decimal equivalent of y. That is, if set of P pre-existing sets is: { (0,1,0,0,1), (1,0,1,1,0) }, the stored decimals for this set would be {9, 22}. If y is (0,1,1,0,0), we compute 12 and check this against the set {9, 22}. The benefit of this method is that for each new y, we don't have to check against the n elements of every pre-existing set. We can just compare the decimal numbers.
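If n is known at compile time, Method 2's binary-to-decimal conversion can be done with std::bitset and its to_ulong member. A sketch for n = 5, reading the vector MSB-first to match the examples above (the function name toDecimal is mine):

```cpp
#include <bitset>

const int N = 5;

// Treat the indicator vector as a binary number read left to right,
// so element 0 is the high bit (matching (0,1,0,0,1) -> 9 above).
unsigned long toDecimal(const bool (&indicator)[N]) {
    std::bitset<N> bits;
    for (int i = 0; i < N; ++i)
        bits[N - 1 - i] = indicator[i];  // leftmost element is the high bit
    return bits.to_ulong();
}
```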
Question 1. It appears to me that (Method 2) should be more efficient than (Method 1). For (Method 2), is there an efficient way (inbuilt library function in C/C++) that converts the x_ps and y from binary to decimal? What should be data type of these indicator variables? For e.g., bool y[5]; or char y[5];?
Question 2. Is there any method more efficient than (Method 2)?
As you've noticed, there's a trivial isomorphism between your indicator vectors and n-bit integers. That means the answer to your question 2 is "no": the tools available for maintaining a set and testing membership in it are the same as for integers (hash tables being the normal approach). A commenter mentioned Bloom filters, which can efficiently test membership at the risk of some false positives, but Bloom filters are generally for much larger data sizes than you're looking at.
As for your question 1: Method 2 is reasonable, and it's even easier than you think. While vector<bool> doesn't give you an easy way to turn it into integer blocks, on implementations I'm aware of it's already implemented this way (the C++ standard allows special treatment of that particular vector type, something that is generally considered nowadays to have been a poor decision, but which occasionally yields some benefit). And those vectors are hashable. So just keep an unordered_set<vector<bool>> around, and you'll get performance which is reasonably close to the optimum. (If you know N at compile time you may want to prefer bitset to vector<bool>.)
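A minimal C++11 sketch of that suggestion, relying on the standard std::hash specialization for std::vector<bool> (the function name insertIfNew is mine):

```cpp
#include <unordered_set>
#include <vector>

typedef std::vector<bool> Indicator;

// Returns true if y is new (and records it), false on a duplicate.
// insert().second is false exactly when the element was already present.
bool insertIfNew(std::unordered_set<Indicator>& seen, const Indicator& y) {
    return seen.insert(y).second;
}
```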
Method 2 can be optimized by calculating the decimal equivalent of the given subset modulo a large prime such as 1e9+7 and storing those values. With N <= 1000 distinct subsets will almost always hash to distinct numbers, though with a modular hash collisions are possible in principle, so strictly speaking a match should be confirmed with a full comparison.
#include <unordered_set>
#include <vector>

#define M 1000000007 // big prime number

std::unordered_set<long long> subset; // decimal representations of all
                                      // previously found subsets

/* fast computation of powers of 2 (mod M) by binary exponentiation */
long long Pow(long long num, long long pow) {
    long long result = 1;
    while (pow) {
        if (pow & 1) {
            result *= num;
            result %= M;
        }
        num *= num;
        num %= M;
        pow >>= 1;
    }
    return result;
}

/* returns true if the subset has NOT been seen before */
bool check(const std::vector<bool>& booleanVector) {
    long long result = 0;
    for (std::size_t i = 0; i < booleanVector.size(); i++)
        if (booleanVector[i])
            result = (result + Pow(2, i)) % M;
    return subset.find(result) == subset.end();
}

Is there a way to use an enum for characters in a string? C++

This was taken off LeetCode, but basically: given a string composed of a few unique characters that each have an associated integer value, I need to quickly compute the total integer value of the string. I thought enums would be useful since you know what will compose your strings.
The enum lists the types of characters that can appear in my string (you can see it's limited). If a character with a smaller value comes before a character with a bigger value, like IV, then I subtract the preceding character's value from the one after it; otherwise I add. The code is my attempt, but I can't get enums to work with my algorithm...
std::string s = "III";
int sum = 0;
enum {I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000};
// O(n) iteration.
for (int i = 0; i < s.length(); i++) {
// Must subtract.
if (s[i] < s[i+1]) {
sum += s[i+1] - s[i];
}
// Add.
else {
sum += s[i];
}
}
std::cout << "sum is: " << sum;
My questions then are: 1) Is using an enum with a string possible? 2) I know it's possible with an unordered_map, but I think an enum would be much quicker.
If you won't mind minor memory overhead, you can do something like this:
int table[256];
table['I']=1;
table['V']=5;
...
and then
sum += table[s[i]];
and so on. This approach is guaranteed O(1) per lookup, which is basically the fastest you are able to get. You can also use std::array instead of a POD array, encapsulate all this in a class, and add assertions, but this is the idea.
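Putting the table together with the subtractive rule from the question, a sketch might look like this (the bounds check before reading s[i + 1] also fixes the out-of-range read in the original attempt):

```cpp
#include <string>

// Sums a Roman-numeral string using a 256-entry lookup table.
int romanToInt(const std::string& s) {
    int table[256] = {0};
    table['I'] = 1;   table['V'] = 5;   table['X'] = 10;  table['L'] = 50;
    table['C'] = 100; table['D'] = 500; table['M'] = 1000;
    int sum = 0;
    for (std::string::size_type i = 0; i < s.length(); ++i) {
        int value = table[(unsigned char)s[i]];
        // A smaller value before a larger one means subtraction (e.g. IV).
        if (i + 1 < s.length() && value < table[(unsigned char)s[i + 1]])
            sum -= value;
        else
            sum += value;
    }
    return sum;
}
```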
2) I know it's possible to do with a unordered_map but I think enums
is much quicker.
You're comparing apples with oranges.
First, an enum is not a container; it's basically just a list of named constants.
If you mean the access time of operator[]:
for unordered_map:
Unordered map is an associative container that contains key-value
pairs with unique keys. Search, insertion, and removal of elements
have average constant-time complexity.
for a string, operator[] is also constant-time access.
1) Is using enum with a string possible
No. An enum constant is basically an "alias" for its value. Note that each string is a sequence of characters:
V != "V"
It is not possible to convert a char or a string to an enum without some kind of mapping, because the compiler replaces each enum with its underlying value during compilation. So you cannot dynamically look up an enumerator by a name stored in a string.
You have to use one of the map family, or an if/else or switch construct, to achieve what you need.
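For example, a minimal sketch of such a mapping, using a switch to bridge from char to the enum values (the function name valueOf is mine):

```cpp
// The enum names the constants; the switch provides the explicit
// mapping from a character to the corresponding enumerator's value.
enum Numeral { I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000 };

int valueOf(char ch) {
    switch (ch) {
        case 'I': return I;
        case 'V': return V;
        case 'X': return X;
        case 'L': return L;
        case 'C': return C;
        case 'D': return D;
        case 'M': return M;
        default:  return 0;  // not a Roman numeral character
    }
}
```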

Pseudocode or algorithm for finding 'n' sets that have the most unique values?

Say I have many sets of integers. The number of integers can vary between each set. I am looking for 'n' number of sets that has the most unique integers between them. If n=4, then I'm looking for 4 sets out of all of the available sets that has the greatest possible number of unique integers between them (so not counting duplicates).
If total number of sets = N is not too large:
a brute force approach would be the following one:
consider each of the (N choose n) possible combinations of sets and evaluate the number of unique integers it yields by merging the sets, removing duplicates, and checking the size; keep the maximum over all evaluations.
Starting from this you can build more and more efficient algorithms, using dynamic programming or by pruning many of the (N choose n) combinations: for example, after finding some maximum K so far, if the total number of ints in a candidate combination of n sets is less than K, you don't need to evaluate it, etc.
That's a rough draft to get you started.
You tagged C++. If I understand you correctly, the following does what you described. Since std::set stores unique values anyway, the C++ code solution becomes straightforward.
#include <vector>
#include <set>
#include <algorithm>
#include <iostream>
typedef std::set<int> IntSet;
typedef std::vector<IntSet> IntSetV;
// sort the sets in descending order, by size
bool SortBySetSize(const IntSet& s1, const IntSet& s2)
{ return s1.size() > s2.size(); }
void OutputResults(const IntSet& s)
{ std::cout << "There are " << s.size() << " unique integers in this set" << std::endl; }
void InputData(IntSet& s)
{
// routine to input data into s
}
using namespace std;
int main()
{
size_t nSets;
cout << "Enter number of sets: ";
cin >> nSets;
IntSetV VSets(nSets);
//... input to fill in the sets in the vector
for_each(VSets.begin(), VSets.end(), InputData);
// sort the sets by size
std::sort(VSets.begin(), VSets.end(), SortBySetSize);
// VSets is now sorted so the n largest sets of unique integers come first.
for_each(VSets.begin(), VSets.end(), OutputResults);
}
If you need to remember the original inputted values, maybe store:
typedef std::pair<std::vector<int>, IntSet> PairSet;
typedef std::vector<PairSet> IntSetV;
The first value in the pair is the original vector; the second is the set representing the numbers in the vector. Then the general code solution can be used, with the requisite changes added.
This is the NP-hard maximum coverage problem. The greedy algorithm (grow the union by the set with the most new elements) achieves a solution that is within a factor 1 - 1/e (~ 63%) of optimum. Even though maximum coverage is NP-hard, integer programming often can find optimal solutions for "natural" instances (as opposed to those resulting from an intelligently designed NP-hardness reduction). The main challenge would be integrating a solver; in particular, the solver implements all of the relevant algorithms. The most straightforward formulation for maximum coverage is this.
maximize sum_{elements e} x_e
subject to
for all elements e, x_e - sum_{input sets S such that e in S} y_S <= 0
sum_{input sets S} y_S <= n
for all elements e, 0 <= x_e <= 1
for all input sets S, y_S in {0, 1}
The meaning of variable x_e is whether x_e appears in the union. The meaning of y_S is that it's 1 if S appears in the union, and 0 otherwise.
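If a solver is overkill, the greedy algorithm mentioned above is only a few lines of C++. A sketch (the function name greedyCover is mine) that returns the indices of up to n chosen sets:

```cpp
#include <set>
#include <vector>
#include <cstddef>

// Greedy maximum coverage: repeatedly pick the set contributing the
// most elements not yet covered. Achieves a (1 - 1/e)-approximation.
std::vector<std::size_t> greedyCover(const std::vector<std::set<int> >& sets,
                                     std::size_t n) {
    std::set<int> covered;
    std::vector<std::size_t> chosen;
    for (std::size_t round = 0; round < n && round < sets.size(); ++round) {
        std::size_t best = sets.size();
        std::size_t bestGain = 0;
        for (std::size_t i = 0; i < sets.size(); ++i) {
            std::size_t gain = 0;  // number of not-yet-covered elements
            for (std::set<int>::const_iterator it = sets[i].begin();
                 it != sets[i].end(); ++it)
                if (covered.count(*it) == 0) ++gain;
            if (gain > bestGain) { bestGain = gain; best = i; }
        }
        if (best == sets.size()) break;  // no set adds anything new
        covered.insert(sets[best].begin(), sets[best].end());
        chosen.push_back(best);
    }
    return chosen;
}
```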

boost::unordered_map is... ordered?

I have a boost::unordered_map, but it appears to be in order, giving me an overwhelming feeling of "You're Doing It Wrong". Why is the output to this in order? I would've expected the underlying hashing algorithm to have randomized this order:
#include <iostream>
#include <boost/unordered_map.hpp>
int main()
{
boost::unordered_map<int, int> im;
for(int i = 0; i < 50; ++i)
{
im.insert(std::make_pair(i, i));
}
boost::unordered_map<int, int>::const_iterator i;
for(i = im.begin(); i != im.end(); ++i)
{
std::cout << i->first << ", " << i->second << std::endl;
}
return 0;
}
...gives me...
0, 0
1, 1
2, 2
...
47, 47
48, 48
49, 49
Upon examination of boost's source code:
inline std::size_t hash_value(int v)
{
return static_cast<std::size_t>(v);
}
...which would explain it. The answers below hold the higher level thinking, as well, which I found useful.
While I can't speak to the boost internals as I'm not a C++ guy, I can propose a few higher-level questions that may alleviate your concerns:
1) What are the guarantees of an "unordered" map? Say you have an ordered map, and you want to create a map that does not guarantee ordering. An initial implementation may simply use the ordered map. It's almost never a problem to provide stronger guarantees than you advertise.
2) A hash function is something that hashes X -> int. If you already have an integer, you could use the identity function. While it may not be the most efficient in all cases, it could explain the behavior you're seeing.
Basically, seeing behavior like this is not necessarily a problem.
It is probably because your hashes are small integers.
Hash tables usually calculate the bucket in which to put an item like this: bucket_index = hash % p, where p is the number of hash-table buckets, chosen large enough (and often prime) to keep the frequency of collisions low.
For integers hash equals to the value of the integer.
You have a lot of data, so hashtable selects a large p.
For any p larger than i, bucket_index = i%p = i.
When iterating, the hashtable returns items from its buckets in order of their indexes, which for you is the order of keys. :)
Try using larger numbers if you want to see some randomness.
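You can inspect this with the standard bucket interface (boost::unordered_map exposes the same one; the sketch below uses std::unordered_map). Whether hash(int) is the identity is implementation-specific, but it is on common implementations such as libstdc++ and libc++:

```cpp
#include <unordered_map>

// Builds the same map as in the question. With an identity hash and
// bucket_count() > i, key i lands in bucket i, which is why iteration
// comes back looking sorted on common implementations.
std::unordered_map<int, int> buildMap(int n) {
    std::unordered_map<int, int> im;
    for (int i = 0; i < n; ++i)
        im.insert(std::make_pair(i, i));
    return im;
}
```

After 50 inserts the bucket count must be at least 50 (the default max load factor is 1.0), so under an identity hash every key 0..49 gets its own bucket.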
You're doing it right. unordered_map doesn't claim to have random order. In fact, it makes no claims about order whatsoever. You shouldn't expect anything whatsoever in terms of order, and that goes for disorder!
This is because small integer keys typically hash to themselves, so keys 1,2,3,4,5 land in buckets 1,2,3,4,5 and come back in that order when you iterate, which merely looks ordered. Try inserting large or random key values and look at the result: it will not generally come back sorted, and it does not have to.

looking for an efficient data structure to do a quick searches

I have a list of around 1000 elements. Each element (objects that I read from a file, hence I can arrange them efficiently at the beginning) contains 4 variables. So now I am doing the following, which is very inefficient in the grand scheme of things:
void func(double value1, double value2, double value3)
{
    foo fooArr[1000];
    for (int i = 0; i < 1000; ++i)
    {
        // they are all numeric! ranges are < 1000
        if (fooArr[i].a == value1
            && fooArr[i].b >= value2
            && fooArr[i].c <= value2 // yes, value2 again
            && fooArr[i].d <= value3)
        {
            /* yay, found it -- now do something! */
        }
    }
}
Space is not too important!
MODIFIED per REQUEST
If space isn't too important, the easiest thing to do is to create a hash based on "a". Depending on how many conflicts you get on "a", it may make sense to make each node in the hash table point to a binary tree keyed on "b". If "b" has a lot of conflicts, do the same for "c".
That first index into the hash, depending on how many conflicts, will save you a lot of time for very little coding or data structures work.
First, sort the list on increasing a and decreasing b. Then build an index on a (the values are integers from 0 to 999), so we've got:
int a_index[1001]; // contains the starting subscript for each value of a
a_index[1000] = 1000; // sentinel: one past the last element
for (i = a_index[value1]; i < a_index[value1 + 1] && fooArr[i].b >= value2; ++i)
{
    if (fooArr[i].c <= value2 && fooArr[i].d <= value3) /* do stuff */
}
Assuming I haven't made a mistake here, this limits the search to the subscripts where a and b are valid, which is likely to cut your search times drastically.
Since you have only a few properties to match, you could use a hash table. When performing a search, you use the hash table (which indexes the a-property) to find all entries where a matches SomeConstant. After that you check whether b and c also match your constants. This way you can reduce the number of comparisons; I think it would speed the search up quite a bit.
Other than that, you could build three binary search trees, one sorted by each property. After searching all three of them, you perform your action for those elements which match your values in each tree.
Based on what you've said (in both the question and the comments) there are only a very few values for a (something like 10).
That being the case, I'd build an index on the values of a where each one points directly to all the elements in the fooArr with that value of a:
std::vector<std::vector<foo *> > index(num_a_values);
for (int i=0; i<1000; i++)
index[fooArr[i].a].push_back(&fooArr[i]);
Then when you get a value to look up an item, you go directly to those for which fooArr[i].a==value1:
std::vector<foo *> const &values = index[value1];
for (int i=0; i<values.size(); i++) {
if (value2 <= values[i]->b
&& value2 >= values[i]->c
&& value3 >= values[i]->d) {
// yay, found something
}
}
This way, instead of looking at 1000 items in fooArray each time, you look at an average of 100 each time. If you want still more speed, the next step would be to sort the items in each vector in the index based on the value of b. This will let you find the lower bound for value2 using a binary search instead of a linear search, reducing ~50 comparisons to ~10. Since you've sorted it by b, from that point onward you don't have to compare value2 to b -- you know exactly where the rest of the numbers that satisfy the inequality are, so you only have to compare to c and d.
You might also consider another approach based on the limited range of the numbers: 0 to 1000 can be represented in 10 bits. Using some bit-twiddling, you could combine three fields into a single 32-bit number, which would let the compiler compare all three at once, instead of in three separate operations. Getting this right is a little tricky, but once you do, it could roughly triple the speed again.
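A sketch of the packing half of that idea (the field order and helper names are my choice). Note this alone only gives you one-shot equality tests; the range comparisons in the question need the extra care mentioned above:

```cpp
#include <cstdint>

// Three values in [0, 1000) each fit in 10 bits, so they pack into one
// 32-bit word. Equal packed words mean all three fields are equal,
// giving one comparison instead of three.
inline uint32_t pack(uint32_t b, uint32_t c, uint32_t d) {
    return (b << 20) | (c << 10) | d;
}

inline uint32_t unpackB(uint32_t w) { return (w >> 20) & 0x3FF; }
inline uint32_t unpackC(uint32_t w) { return (w >> 10) & 0x3FF; }
inline uint32_t unpackD(uint32_t w) { return w & 0x3FF; }
```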
I think using kd-tree would be appropriate.
If there aren't many conflicts with a then hashing/indexing a might resolve your problem.
Anyway if that doesn't work I suggest using kd-tree.
First do a table of multiple kd-trees. Index them with a.
Then implement a kd-tree for each a value with 3-dimensions in directions b, c, d.
Then when searching - first index to appropriate kd-tree with a, and then search from kd-tree with your limits. Basically you'll do a range search.
Kd-tree
You'll get your answer in O(L^(2/3)+m), where L is the number of elements in appropriate kd-tree and m is the number of matching points.
Something better that I found is Range Tree. This might be what you are looking for.
It's fast. It'll answer your query in O(log^3(L)+m). (Unfortunately don't know about Range Tree much.)
Well, let's have a go.
First of all, the == operator calls for a pigeon-hole approach. Since we are talking about int values in the [0,1000] range, a simple table is good.
std::vector<Bucket1> myTable(1001, /*MAGIC_1*/); // suspense
The idea of course is that you will find YourObject instance in the bucket defined for its a attribute value... nothing magic so far.
Now on the new stuff.
&& fooArr[i].b >= value2
&& fooArr[i].c <= value2 //yes again value2
&& fooArr[i].d <= value3
The use of value2 is tricky, but you said you did not care for space right ;) ?
typedef std::vector<Bucket2> Bucket1;
/*MAGIC_1*/ <-- Bucket1(1001, /*MAGIC_2*/) // suspense ?
A Bucket1 instance will have in its ith position all instances of YourObject for which yourObject.c <= i <= yourObject.b
And now, same approach with the d.
typedef std::vector< std::vector<YourObject*> > Bucket2;
/*MAGIC_2*/ <-- Bucket2(1001)
The idea is that the std::vector<YourObject*> at index ith contains a pointer to all instances of YourObject for which yourObject.d <= i
Putting it altogether!
class Collection
{
public:
Collection(size_t aMaxValue, size_t bMaxValue, size_t dMaxValue);
// prefer to use unsigned type for unsigned values
void Add(const YourObject& i);
// Pred is a unary operator taking a YourObject& and returning void
template <class Pred>
void Apply(int value1, int value2, int value3, Pred pred);
// Pred is a unary operator taking a const YourObject& and returning void
template <class Pred>
void Apply(int value1, int value2, int value3, Pred pred) const;
private:
// List behaves nicely with removal,
// if you don't plan to remove, use a vector
// and store the position within the vector
// (NOT an iterator because of reallocations)
typedef std::list<YourObject> value_list;
typedef std::vector<value_list::iterator> iterator_vector;
typedef std::vector<iterator_vector> bc_buckets;
typedef std::vector<bc_buckets> a_buckets;
typedef std::vector<a_buckets> buckets_t;
value_list m_values;
buckets_t m_buckets;
}; // class Collection
Collection::Collection(size_t aMaxValue, size_t bMaxValue, size_t dMaxValue) :
    m_values(),
    m_buckets(aMaxValue + 1,
              a_buckets(bMaxValue + 1, bc_buckets(dMaxValue + 1)))
{
}
void Collection::Add(const YourObject& object)
{
    value_list::iterator iter = m_values.insert(m_values.end(), object);
    a_buckets& a_bucket = m_buckets[object.a];
    for (int i = object.c; i <= object.b; ++i)
    {
        bc_buckets& bc_bucket = a_bucket[i];
        // register the object at every index j with object.d <= j,
        // matching the spec above
        for (size_t j = object.d; j < bc_bucket.size(); ++j)
        {
            bc_bucket[j].push_back(iter);
        }
    }
} // Collection::Add
template <class Pred>
void Collection::Apply(int value1, int value2, int value3, Pred pred)
{
    iterator_vector const& indexes = m_buckets[value1][value2][value3];
    BOOST_FOREACH(value_list::iterator it, indexes)
    {
        pred(*it);
    }
} // Collection::Apply<Pred>

template <class Pred>
void Collection::Apply(int value1, int value2, int value3, Pred pred) const
{
    iterator_vector const& indexes = m_buckets[value1][value2][value3];
    // Promotion from value_list::iterator to value_list::const_iterator is ok
    // The reverse is not, which is why we store iterators
    BOOST_FOREACH(value_list::const_iterator it, indexes)
    {
        pred(*it);
    }
} // Collection::Apply<Pred>
So, admittedly, adding and removing items to and from that collection will cost.
Furthermore, you have (aMaxValue + 1) * (bMaxValue + 1) * (dMaxValue + 1) std::vector<value_list::iterator> stored, which is a lot.
However, Collection::Apply complexity is roughly k applications of Pred where k is the number of items which match the parameters.
I am looking for a review there, not sure I got all the indexes right oO
If your app is already using a database then just put them in a table and use a query to find it. I use mysql in a few of my apps and would recommend it.
First, build a different table for each value of a:
a table num of the numbers that share that same a,
and two index tables, each with 1000 rows.
Each index-table row contains an integer (a bitmask) representation of which numbers
are involved for that split value.
For example let's say you have values in the array
(ignoring a because we have a table for each a value)
b = 96 46 47 27 40 82 9 67 1 15
c = 76 23 91 18 24 20 15 43 17 10
d = 44 30 61 33 21 52 36 70 98 16
then the index table values for the row 50, 20 are:
idx[a].bc[50] = 0000010100
idx[a].d[50] = 1101101001
idx[a].bc[20] = 0001010000
idx[a].d[20] = 0000000001
so let's say you do func(a, 20, 50).
Then to get which numbers are involved you do:
g = idx[a].bc[20] & idx[a].d[50];
Then g has 1-s for each number you have to deal with. If you don't
need the array values then you can just do a populationCount on g. And
do the inner thing popCount(g) times.
You can do:
tg = g;
n = 0;
while (tg > 0) {
    if (tg & 1) {
        // do your stuff with number n
    }
    tg >>= 1; // use an unsigned type; C++ has >> rather than Java's >>>
    n++;
}
Maybe the tg >>= 1; n++; part can be improved by skipping over runs of zeros, but I have no idea if that's possible. It should be considerably faster than your current approach because all the loop variables fit in registers.
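Skipping the zeros is in fact possible: g & (g - 1) clears the lowest set bit, so the loop runs once per set bit instead of once per bit position. A sketch (processSetBits is a stand-in for the real per-number work; it just counts visits):

```cpp
#include <cstdint>

// Visits each set bit of g exactly once. g & (~g + 1) isolates the
// lowest set bit; g & (g - 1) clears it. Returns how many bits were visited.
int processSetBits(uint64_t g) {
    int count = 0;
    while (g) {
        uint64_t lowest = g & (~g + 1);  // the lowest set bit, as a mask
        (void)lowest;                    // ... do your stuff with it ...
        g &= g - 1;                      // clear the lowest set bit
        ++count;
    }
    return count;
}
```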
As pmg said, the idea is to eliminate as many comparisons as possible. Obviously you won't have 4000 comparisons. That would require that all 1000 elements pass the first test, which would then be redundant. Apparently there are only 10 values of a, hence 10% passes that check. So, you'd do 1000 + 100 + ? + ? checks. Let's assume +50+25, for a total of 1175.
You'd need to know how a,b,c,d and value1, 2 and 3 are distributed to decide exactly what's fastest. We only know that a can have 10 values, and we presume that value1 has the same domain. In that case, binning by a can reduce it to an O(1) operation to get the right bin, plus the same 175 checks further on. But if b,c and value2 effectively form 50 buckets, you could find the right bucket again in O(1). Yet each bucket would now have an average of 20 elements, so you'd only need 35 tests (80% reduction). So, data distribution matters here. Once you understand your data, the algorithm will be clear.
Look, this is just a linear search. It would be nice if you could do a search that scales up better, but your complex matching requirements make it unclear to me whether it's even possible to, say, keep it sorted and use a binary search.
Having said this, perhaps one possibility is to generate some indexes. The main index might be a dictionary keyed on the a property, associating it with a list of elements with the same value for that property. Assuming the values for this property are well-distributed, it would immediately eliminate the overwhelming majority of comparisons.
If the property has a limited number of values, then you could consider adding an additional index which sorts items by b and maybe even another that sorts by c (but in the opposite order).
You can use hash_set from the (non-standard) SGI STL extensions; this will give you a very efficient implementation: the complexity of your search would be O(1).
Here is the link: http://www.sgi.com/tech/stl/hash_set.html
--EDIT--
Declare a new struct which will hold your variables, overload the comparison operators, and make a hash_set of this new struct. Every time you want to search, create a new object with your variables and pass it to the hash_set method find.
Since hash_set is not part of the C++ standard library, you can use set instead, which gives you O(log N) complexity for searching.
Here is an example:
#include <cstdlib>
#include <iostream>
#include <set>
using namespace std;

struct Obj {
    Obj(double a, double b, double c, double d) {
        this->a = a;
        this->b = b;
        this->c = c;
        this->d = d;
    }
    double a;
    double b;
    double c;
    double d;

    friend bool operator<(const Obj& l, const Obj& r) {
        if (l.a != r.a) return l.a < r.a;
        if (l.b != r.b) return l.b < r.b; // compare b with b, not a with b
        if (l.c != r.c) return l.c < r.c;
        if (l.d != r.d) return l.d < r.d;
        return false;
    }
};

int main(int argc, char* argv[]) {
    set<Obj> A;
    A.insert(Obj(1, 2, 3, 4));
    A.insert(Obj(16, 23, 36, 47));
    A.insert(Obj(15, 25, 35, 43));
    Obj c(1, 2, 3, 4);
    cout << A.count(c); // 1 if the object is in the set
    system("PAUSE");
    return EXIT_SUCCESS;
}