Mapping two sets of values in C++

My sincere apologies for such a naive question. I know this is simple. But nothing comes to my mind now.
I am using C++. I'm a bit concerned about efficiency, since this is targeted at embedded hardware with very little processing power and RAM.
I have two integer arrays with 50 members, local to a function. Given an element in the first array, I need to determine the corresponding number in the second array, and vice versa. I am told which array the element provided for look-up belongs to, i.e. array 1 or array 2.
Ex : Array1 => 500 200 1000 300 .....
Array2 => 250 170 500 400 .....
Input 500, output will be 250
Input 400, output will be 300
Input 200, output will be 170, and so on
I think a linear array look-up will be the least efficient. Is std::map the best option, or do I have to look for a more efficient search algorithm? If you had to do this, which option would you choose?
Any thoughts?

You can use std::map for readability, and a little efficiency as well, though in your case efficiency matters little:
std::map<int, int> mapping;
// ... populate
std::cout << mapping[200]; // 170
This only covers one direction (Array 1 -> Array 2), though. I'm not aware of an easier way to handle the other direction than creating a second map.
To support reverse lookup, i.e. going from (Array 2 -> Array 1), Reverse map lookup suggests using Boost.Bimap.

As I see it, there are two ways of doing it, both of which have already been suggested:
Put both arrays into a map as key/value pairs and look up the corresponding value or key.
Traverse the array that contains the input to find its index, then read the value at that index in the other array.
I would go for the second solution, as it is easier. Moreover, with only 50 elements in a static array, you don't need to worry about performance.

Related

Redundant static data

This question applies to any type of static data. I'm only using int to keep the example simple.
I am reading in a large XML data file containing ints and storing them in a vector<int>. For the particular data I'm using, it's very common for the same value to be repeated consecutively many times.
<Node value="4" count="4000">
The count attribute means that the value is to be repeated x number of times:
for(int i = 0; i < 4000; i++)
vec.push_back(4);
It seems like a waste of memory to store the same value repeatedly when I already know that it is going to appear 4000 times in a row. However, I need to be able to index into the vector at any point.
For larger data objects, I know that I can just store pointers, but that would still involve storing 4000 identical pointers in the example above.
Is there any type of strategy to deal with an issue like this?
Use two vectors. The first vector contains the indices, the second one the actual values.
Fill in the indices vector such that the value for all positions between indices[i-1] and indices[i] is values[i].
Then use binary search on the indices array to locate the position in the values array. Binary search is very efficient (O(log n)), and you will only use a fraction of the memory compared to the original approach.
If you assume the following data:
4000 ints with value "4"
followed by 200 ints with value "3"
followed by 5000 ints with value "10"
You would create an index vector and value vector and fill it like this:
indices = {4000, 4200, 9200}; // indices[i] = (i > 0 ? indices[i-1] : 0) + count_of_run_i
values = {4,3,10};
As suggested in the other answers, you should probably wrap this in an operator[].
I would suggest writing a specific class instead of using vector.
Your class would just hold the number of times each item occurs in the list and compute the index in a smart way, so you can easily retrieve an element by index.
Try to wrap your data in an object with a vector-like interface (operator[] and so on), so you can hide the implementation detail (that you are not actually storing 4000 numbers) yet provide a familiar interface.

C++ std::map vs dynamic array

I'm trying to make a 3 dimensional array of booleans that tells me if I previously visited a location in 3d space for a simple navigation algorithm. The array could be quite large (something along the lines of 1,000,000 x 1,000,000 x 1,000,000 or maybe larger), so I'm wondering if it would be faster to declare an array of that size and set each boolean value to false, or to make a map with a key of coordinate (x, y, z) and a value of type bool.
From what I figure, the array would take O(1) to find or modify a coordinate, and the map would take O(log n) to find or insert a value. Obviously, for accessing values, the array is faster. However, does this offset the time it takes to declare such an array?
Thanks
Even at 1 bit per bool, your array would need 10^18 bits, which is about 2^57 bytes. I'd suggest a set if there aren't too many elements that will be true.
You can use a class to hide the implementation details, and use a 1D set.
Have you tried calculating how much memory would be needed for an array like this? A lot!
Use std::map if ordering of the points is important, or std::unordered_map if not. The unordered map also gives you average constant-time insertion and lookup.
I guess that some kind of search tree is probably what you're looking for (k-d tree for example).
You're going to make an array that is one exabyte, assuming that you use 8 bits per point? Wow, you have a lot of RAM!
I think you should re-think your approach.

Listing specific subsets using STL

Say I have a range of numbers, say {2,3,4,5}, stored in this order in a std::vector v, and that I want to list all possible subsets which end with 5 using the STL... that is:
2 3 4 5
2 3 5
2 4 5
3 4 5
2 5
3 5
4 5
5
(I hope I didn't forget any :))
I tried using while(next_permutation(v.begin(), v.end())) but it didn't produce the wanted result :)
Does anyone have an idea?
PS : those who have done the archives of google code jam 2010 may recognize this :)
Let's focus on the problem of printing all subsets. As you know, if you have a vector of n elements, you'll have 2^n possible subsets. It's no coincidence that an n-bit integer can store 2^n distinct values. If you consider each integer as a vector of bits, then iterating over all possible values gives all possible subsets of bits. So we get subsets for free by iterating over an integer!
Assuming the vector has no more than 31 elements (already over 2 billion possible subsets, and shifting 1u by 32 would be undefined), this piece of code will print every subset of vector v (excluding the empty one):
for (uint32_t mask = 1; mask < (1u << v.size()); ++mask)
{
    std::vector<int>::const_iterator it = v.begin();
    for (uint32_t m = mask; m; m >>= 1, ++it)
    {
        if (m & 1) std::cout << *it << " ";
    }
    std::cout << std::endl;
}
I just create all possible bit masks for size of vector, and iterate through every bit; if it's set, I print appropriate element.
Now applying the rule of ending with some specific number is a piece of cake (just check an additional condition while looping through the masks). Better yet, if there is only one 5 in your vector, you can swap it to the end, print all subsets of the vector without its last element, and append 5 to each.
I'm effectively using std::vector, const_iterator and std::cout, so you might consider this solved using the STL. If I come up with something more STL-ish, I'll let you know (though really, it's just iterating). You can use this function as a benchmark for your STL solutions, though ;-)
EDIT: As pointed out by Jørgen Fogh, this doesn't solve your subset blues if you want to operate on large vectors. In fact, printing all subsets of just 32 elements would generate terabytes of data. You could use a 64-bit integer if you feel limited by the constant 32, but you would never finish iterating through all the numbers. If your problem is just counting the desired subsets, you definitely need another approach, and the STL won't be much help either ;-)
As you can use any container, I would use std::set, because it is closest to what we want to represent.
Now your task is to find all subsets ending with 5 so we take our initial set and remove 5 from it.
Now we want to have all subsets of this new set and append 5 to them at the end.
void subsets(std::set<std::set<int>> &sets, std::set<int> initial)
{
    if (initial.empty())
        return;
    sets.insert(initial); // save the current set in the set of sets
    for (std::set<int>::iterator i = initial.begin(); i != initial.end(); ++i) // for each item in the set
    {
        std::set<int> new_set(initial);  // copy the set
        new_set.erase(new_set.find(*i)); // remove the current item
        subsets(sets, new_set);          // recurse on the smaller set
    }
}
sets is a set that contains all subsets you want.
initial is the set that you want to have the subsets of.
Finally call this with subsets(all_subsets, initial_list_without_5);
This should create the subsets and finally you can append 5 to all of them. Btw don't forget the empty set :)
Also note that creating and erasing all these sets is not very efficient. If you want it faster, the final set should hold pointers to sets, and new_set should be allocated dynamically...
tomasz describes a solution which is workable as long as n <= 32, although it will take a very long time to print 2^32 different subsets. Since the bounds for the large dataset are 2 <= n <= 500, generating all the subsets is definitely not the way to go. You need to come up with some clever way to avoid generating them at all. In fact, this is the whole point of the problem.
You can probably find solutions by googling the problem if you want. My hint is that you need to look at the structure of the sets and avoid generating them at all. You should only calculate how many there are.
Use permutations to create a vector of vectors. Then use std::partition with a predicate to separate the vectors that end with 5 from those that don't.

Indexing hash tables

I am just starting to learn hash tables. So far, I know that you take the object you want to hash and put it through a hash function, then use the index it returns to get the corresponding object. There is something I don't understand, though:
What structure do you use to store the objects so you can quickly index them with the code returned by the hash function? The only thing I can think of is an array, but to handle all possible keys you'd have to allocate one that's 9999999999999 elements big, or something ridiculous like that. Or is it as simple as iterating over a linked list and comparing the ID in each element with the key from the hash function? And if so, doesn't that seem inefficient?
Normally, you use an array (or something similar like a vector). You pick a reasonable size (e.g., 20% larger than the number of items you expect) and some method of resolving collisions when/if two keys produce the same hash value (e.g., each of those locations is the head of a linked list of items that hashed to that value).
Yes, you usually use an array but then you do a couple of things:
You convert the hash code to an array index by taking the remainder of the hash code divided by the array size.
You make the size of the array a prime number, as that makes step #1 more effective (some hash algorithms need this to get a uniform distribution).
You come up with a design to handle hash collisions. Jerry Coffin's answer gives you more detail.
Generally it's an array. If the array size is N, then use a hash function that returns numbers in the range 0..(N-1), for example by applying modulo N to the hash function's result.
Then use one of the collision resolution strategies described on Wikipedia.

What algorithm is best adapt for a non-contiguous Array with Index Grouping?

I need some help writing an algorithm in C/C++ (although an example in any language would work). The goal is a container/array which allows insertion at any index. However, if an element is inserted at an index that is not close to an existing index, i.e. one that would create a large run of empty buckets, the container should minimise the empty buckets.
Say you have a set of elements which need to be inserted at the following indexes:
14
54
56
57
12
8
6
5678
A contiguous array would produce a data structure something like this:
0
1
2
3
4
5
6 val
7
8 val
9
10
11
12 val
...
However, I'm looking for a solution that creates a new array when an index is not within x buckets of its nearest neighbour.
Something like this:
Array1
6 val
7
8 val
10
11
12 val
13
14 val
Array2
54 val
56 val
57 val
Array3
5678 val
Then use some kind of index map to find which array an index is in during a lookup. My question is: what kind of algorithm should I be looking at to group the indexes together during inserts (while still keeping a good space/time trade-off)?
Edit:
Thanks for the answers so far. The data I'm going to be looking at will contain one or two very large index ranges with no gaps, then one or two very large gaps, then possibly a couple of "straggling" single values. Also, the data needs to be sorted, so hash tables are out.
Why not just use a hashtable/dictionary? If you really need something this specific, the first thing that comes to mind is a B-tree. But there are probably much better solutions than that, too.
I believe you are looking for a hashmap or more generally a map. You can use the STL provided map class.
This sounds like exactly what you are looking for:
http://www.cplusplus.com/reference/stl/map/
Maybe what you want is a sparse vector? Try the Boost implementation.
You're looking either to use sparse arrays or some sort of hash, depending on circumstances. In general:
If you're going to eventually end up with long runs of filled buckets separated by large gaps, then you're better off with a sparse array, as they optimize memory use well in this situation.
If you're going to just end up with scattered entries in a huge sea of empty holes, you're better off with a hash.