Array with custom indices - c++

So I want to create an array of nine elements, but I want the indices to be specified by me, that is, instead of accessing elements of my array,
std::array<bool,9> myarray
using myarray[0], myarray[1], myarray[2]... I want to access them, for example, as
myarray[21], myarray[34], myarray[100], myarray[9], myarray[56]...
But still preserving the properties of the standard library array and storing only 9 elements.
More specifically, I need an easy access to the elements of a boolean matrix.
That is, suppose I have the matrix:
std::array<std::array<bool,100>,100> mymatrix;
And that it is going to be used for checking certain positions (say, position (x, y)) simply by using mymatrix[x][y]. I also know that some of the elements are never going to be checked, so they are not really needed. In order to save as much memory as possible, the idea is to get rid of those unneeded elements while still keeping the structure that lets me check my elements.

Such an array is best represented with one of the associative containers provided by the Standard C++ Library - i.e. either a std::map<int,bool> or an std::unordered_map<int,bool>. These containers provide an idiomatic way of doing this in C++.
One added benefit of using an associative container is the ability to iterate the values along with their external "indexes".
If you insist on using an array to store the values, you would have to make your own class that builds a "mapping" between external and internal indexes. This would either take a significant amount of memory for an O(1) access time, use CPU cycles for binary search plus an index-to-index map, use linear search, or hard-code the external indexes.
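A minimal sketch of the associative-container approach, using the custom indices from the question (only the entries you insert are stored):

#include <iostream>
#include <map>

int main() {
    std::map<int, bool> myarray;       // stores only the entries you insert
    myarray[21] = true;
    myarray[34] = false;
    myarray[100] = true;

    std::cout << std::boolalpha << myarray[34] << '\n';   // access by custom index

    for (const auto& kv : myarray)     // iterate values together with their external "indexes"
        std::cout << kv.first << " -> " << kv.second << '\n';
}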

At first glance, what you want is a std::map<int, bool>, which allows you to have your own indices. But a map is not fixed in size.
In order to get both a fixed size and custom indices, you can combine a map and an array with custom add and access functions:
std::map<int, std::size_t> indices; // fill it with custom indices mapped onto array positions
std::array<bool, 9> data;           // 9 elements, as in the question

bool get(int index) {
    return data[indices.at(index)]; // at() throws if the custom index is unknown
}
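For example, the mapping could be filled and used like this (a self-contained sketch; the external indices beyond those named in the question are invented for illustration):

#include <array>
#include <cstddef>
#include <map>

std::map<int, std::size_t> indices;    // external index -> slot in data
std::array<bool, 9> data{};

int main() {
    const int external[9] = {21, 34, 100, 9, 56, 3, 7, 42, 77};  // 3, 7, 42, 77 are made up
    for (std::size_t slot = 0; slot < data.size(); ++slot)
        indices[external[slot]] = slot;

    data[indices.at(34)] = true;             // behaves like "myarray[34] = true"
    bool checked = data[indices.at(100)];    // behaves like "myarray[100]"
    (void)checked;
}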

Related

C++ Fixed Size Container to Store Most Recent Values

I would like to know what the most suitable data structure is for the following problem in C++.
I want to store 100 floats ordered by recency, so when I add (push) a new item, the other elements are moved up one position. Every time an event is triggered I receive a value and then add it to my data structure.
When the number of elements reaches 100, I would like to remove (pop) the item at the end (the oldest).
I want to be able to iterate over all the elements and perform some mathematical operations on them.
I have looked at all the standard C++ containers but none of them fulfill all my needs. What's the easiest way to achieve this with standard C++ code?
You want a circular buffer. You can use Boost's implementation or make your own by allocating an array, and keeping track of the beginning and end of the used range. This boils down to doing indexing modulo 100.
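If Boost is an option, a minimal sketch of that idea with boost::circular_buffer (names are illustrative):

#include <boost/circular_buffer.hpp>
#include <numeric>

int main() {
    boost::circular_buffer<float> recent(100);    // fixed capacity of 100

    // push_back on a full buffer overwrites the oldest element
    for (int i = 0; i < 250; ++i)
        recent.push_back(static_cast<float>(i));

    // iterate over all stored elements, e.g. to compute an average
    float sum = std::accumulate(recent.begin(), recent.end(), 0.0f);
    float avg = sum / recent.size();
    (void)avg;
}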
Without creating your own or using a library, std::vector is the most efficient standard data structure for this. Once it has reached its maximum size, there will be no more dynamic memory allocations. The cost of moving up 100 floats is trivial compared to the cost of dynamic memory allocations. (This is why std::list is a slow data structure for this). There is no push_front function for vector. Instead you have to use v.insert(v.begin(), f)
Of course this assumes what you are doing is performance-critical, which it probably isn't. In that case I would use std::deque for more convenient usage.
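A rough sketch of the vector variant described above (illustrative names, not a definitive implementation):

#include <vector>

std::vector<float> recent;   // newest value at the front, at most 100 elements

void add(float f) {
    recent.insert(recent.begin(), f);   // acts as push_front for a vector
    if (recent.size() > 100)
        recent.pop_back();              // drop the oldest value
}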
Just saw that you need to iterate over them. Use a list.
Your basic function would look something like this:
std::list<int> list100;  // holds the most recent values, at most 100 of them

void addToList(int value) {
    list100.push_back(value);
    if (list100.size() > 100) {
        list100.pop_front();  // drop the oldest value
    }
}
Iterating over them is easy as well:
int sum = 0;
for (int val : list100) {
    sum += val;
}
// Average, or whatever you need to do
Obviously, if you're using something besides int, you'll need to change that. Although this adds a little bit more functionality than you need, it's very efficient since it's a doubly linked list.
http://www.cplusplus.com/reference/list/list/
You can use either std::array, std::deque, std::list or std::priority_queue.
A map (std::map) should be able to solve your requirement. Use the value as the key and the current push number nPushCount as the mapped value; nPushCount gets incremented whenever you add an element to the map.
When adding a new element to the map, if you have fewer than 100 elements, just add the number to the map as the key and nPushCount as the value.
If you already have 100 elements, check whether the number exists in the map and do the following:
If the number already exists in the map, add the number as key and nPushCount as value;
If it doesn't, delete the entry with the lowest nPushCount as its value and then add the desired number with the updated nPushCount.
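A rough sketch of that bookkeeping (nPushCount is the answer's name; the eviction scan is linear in the map size):

#include <algorithm>
#include <map>

std::map<float, long> recent;   // value -> push count of its most recent insertion
long nPushCount = 0;

void add(float value) {
    ++nPushCount;
    if (recent.size() >= 100 && recent.count(value) == 0) {
        // evict the entry with the lowest (oldest) push count
        auto oldest = std::min_element(
            recent.begin(), recent.end(),
            [](const std::pair<const float, long>& a,
               const std::pair<const float, long>& b) { return a.second < b.second; });
        recent.erase(oldest);
    }
    recent[value] = nPushCount;   // insert, or refresh the push count
}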

Complexity of boost::multi_array reshape() function

What is the complexity of boost::multi_array reshape() function? I expect it to be O(1) but I can't find this info in the documentation. The documentation for this library is actually pretty scarce.
The reason I'm asking is that I would like to iterate through a multi_array object using a single loop (I don't care about array indices). It seems like the library doesn't provide a way of iterating through an array using a single iterator. So, as a workaround, I'd like to reshape the array along a single dimension first (with other dimensions set to 1). Then I can iterate through the array using a single loop. However, I'm not sure how efficient the reshape() operation is.
Hence my second question: Is there an easy way to iterate through all the elements of a multi-array object using a single loop?
Below is the implementation of the reshape function in the multi_array_ref.hpp file.
template <typename SizeList>
void reshape(const SizeList& extents) {
    boost::function_requires<
        CollectionConcept<SizeList> >();
    BOOST_ASSERT(num_elements_ ==
                 std::accumulate(extents.begin(), extents.end(),
                                 size_type(1), std::multiplies<size_type>()));

    std::copy(extents.begin(), extents.end(), extent_list_.begin());
    this->compute_strides(stride_list_, extent_list_, storage_);

    origin_offset_ =
        this->calculate_origin_offset(stride_list_, extent_list_,
                                      storage_, index_base_list_);
}
It looks like the function just recomputes the indexing metadata (extents, strides, and origin offset) stored alongside the array. The function is linear in the number of entries in extents, i.e. in the number of dimensions, but its complexity is constant in the total number of elements in the array.
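For the second question, a hedged sketch of both options: the reshape workaround described in the question, and iterating the contiguous storage directly via data() and num_elements(), which avoids the reshape entirely:

#include <boost/multi_array.hpp>
#include <vector>

int main() {
    boost::multi_array<double, 3> a(boost::extents[4][5][6]);

    // Workaround from the question: reshape along a single dimension
    // (the other extents set to 1), then use one loop.
    std::vector<std::size_t> flat = {a.num_elements(), 1, 1};
    a.reshape(flat);
    for (std::size_t i = 0; i < a.num_elements(); ++i)
        a[i][0][0] = static_cast<double>(i);

    // Alternative: data() exposes the contiguous element storage directly.
    double* p = a.data();
    for (std::size_t i = 0; i < a.num_elements(); ++i)
        p[i] += 1.0;
}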

c++: Random access to Dynamic Array

Following is my scenario:
I am making use of a large 2D dynamic array to store elements with following attributes:
int
vector
Now, the array elements are accessed randomly. As a result, element access time varies greatly.
I want the access time to be small as well as constant across all accesses.
Is a dynamic array best suited for my scenario?
I tried using Boost's unordered_map, but it seems that the unordered map takes more time to access elements compared to a dynamic array.
Please give suggestions.
Code:
for (counter1 = 0; counter1 < sizeof(chunk1); ++counter1)
{
    // code lines skipped
    IndexEntries &data = IndexTable[chunk1[counter1]][chunk1[counter1 + 1]];
    DoubleTableEntries &GetValue = NewDoubleTable[NextState_chunk1][data.index];
    NextState_chunk1 = GetValue.Next_State;
    ++Bcount;
    buffer[Bcount] = NextState_chunk1;
    ++counter1;
    // code lines skipped
}
Here NewDoubleTable is the 2d Array from which I am accessing randomly elements.
There is nothing that can beat an array access in terms of speed; all the higher-level containers like unordered_map<> add additional work. When you can use a plain array or vector<>, that is always the fastest you can get.
You need unordered_map<> only if you have a sparsely populated keyspace which prohibits use of a plain array/vector due to space considerations. In that case, the unordered_map<> can translate the keys in the sparse keyspace to a hash index into the tightly populated hash table, which in turn is nothing more or less than an array.
For random access, nothing can beat an array (dynamic or not). Only this data structure provides O(1) access time on average, because it uses consecutive memory.
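If the 2D table itself is the concern, a flat std::vector with manual index arithmetic keeps the contiguous memory and O(1) access; a hypothetical sketch (the struct and sizes are placeholders, not the question's real types):

#include <cstddef>
#include <vector>

struct DoubleTableEntries { int Next_State; };          // placeholder for the real entry type

const std::size_t rows = 1024, cols = 1024;             // illustrative sizes
std::vector<DoubleTableEntries> NewDoubleTable(rows * cols);

// Element (r, c) lives at offset r * cols + c: one contiguous block, O(1) access.
DoubleTableEntries& at(std::size_t r, std::size_t c) {
    return NewDoubleTable[r * cols + c];
}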

Efficient frequency counter

I have 15,000,000 std::vectors of 6 integers.
Those 15M vectors contain duplicates.
Duplicate example:
(4,3,2,0,4,23)
(4,3,2,0,4,23)
I need to obtain a list of unique sequences with their associated counts. (A sequence that is only present once would have a count of 1.)
Is there an algorithm in standard C++ (C++11 is fine) that does that in one shot?
Windows, 4GB RAM, 30+GB hdd
There is no such algorithm in the standard library which does exactly this, however it's very easy with a single loop and by choosing the proper data structure.
For this you want to use std::unordered_map, which is typically a hash map. It has expected constant time per access (insert and look-up) and is thus the first choice for huge data sets.
The following access-and-increment trick will automatically insert a new entry in the counter map if it's not yet there; then it will increment and write back the count.
typedef std::vector<int> VectorType; // Please consider std::array<int,6>!

// Note: std::hash is not specialized for std::vector<int>, so a hash functor
// (for example boost::hash<VectorType> from <boost/functional/hash.hpp>) has to be supplied.
std::unordered_map<VectorType, int, boost::hash<VectorType>> counters;

for (const VectorType& vec : vectors) {
    counters[vec]++;
}
For further processing, you most probably want to sort the entries by the number of occurrence. For this, either write them out in a vector of pairs (which encapsulates the number vector and the occurrence count), or in an (ordered) map which has key and value swapped, so it's automatically ordered by the counter.
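Continuing the snippet above, the post-processing might look like this (copy into a vector of pairs, then sort by count, most frequent first):

#include <algorithm>
#include <utility>
#include <vector>

std::vector<std::pair<VectorType, int>> byCount(counters.begin(), counters.end());
std::sort(byCount.begin(), byCount.end(),
          [](const std::pair<VectorType, int>& a,
             const std::pair<VectorType, int>& b) { return a.second > b.second; });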
In order to reduce the memory footprint of this solution, try this:
If you don't need to get the keys back from this hash map, you can use a hash map which doesn't store the keys but only their hashes. For this, use size_t as the key type, a pass-through (identity) functor as the internal hash function, and access it with a manual call to a hash function for VectorType (again boost::hash<VectorType>, since std::hash is not specialized for vectors).
struct IdentityHash { std::size_t operator()(std::size_t h) const { return h; } };  // keys are already hashes

std::unordered_map<std::size_t, int, IdentityHash> counters;
boost::hash<VectorType> hashFunc;

for (const VectorType& vec : vectors) {
    counters[hashFunc(vec)]++;
}
This reduces memory but requires additional effort to interpret the results, as you have to loop over the original data structure a second time in order to find the original vectors (then look them up in your hash map by hashing them again).
Yes: first std::sort the list (std::vector uses lexicographic ordering, the first element is the most significant), then loop with std::adjacent_find to find duplicates. When a duplicate is found, use std::adjacent_find again but with an inverted comparator to find the first non-duplicate.
Alternatively, you could use std::unique with a custom comparator that flags when a duplicate is found, and maintains a count through the successive calls. This also gives you a deduplicated list.
The advantage of these approaches over std::unordered_map is space complexity proportional to the number of duplicates. You don't have to copy the entire original dataset or add a seldom-used field for dup-count.
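A simplified sketch of the sort-based idea (one linear pass over the sorted data in place of the adjacent_find bookkeeping described above; std::array<int,6> stands in for the 6-integer sequences):

#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

using Seq = std::array<int, 6>;

// Sort, then count consecutive equal runs; output is (sequence, count).
std::vector<std::pair<Seq, std::size_t>> countUnique(std::vector<Seq> data) {
    std::sort(data.begin(), data.end());                        // lexicographic order
    std::vector<std::pair<Seq, std::size_t>> result;
    for (std::size_t i = 0; i < data.size(); ) {
        std::size_t j = i;
        while (j < data.size() && data[j] == data[i]) ++j;      // extend the run of duplicates
        result.emplace_back(data[i], j - i);
        i = j;
    }
    return result;
}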
You could convert each vector to a string, element by element, like this: "4,3,2,0,4,23".
Then add the strings to a new string vector, checking for existing entries with find().
If you need the original vectors back, convert the string vector to another vector of integer sequences.
If you do not, just drop the duplicated elements while building the string vector.

How to associate to a number another number without using array

Let's say we have read these values:
3
1241
124515
5322353
341
43262267234
1241
1241
3213131
And I have an array like this (with the elements above):
a[0]=1241
a[1]=124515
a[2]=43262267234
a[3]=3
...
The thing is that the elements' order in the array is not constant (I have to change it somewhere else in my program).
How can I know at which positions an element appears in the input I read?
Note that I can not do:
vector <int> a[1000000000000];
a[number].push_back(all_positions);
Because a would be too large (there's a memory restriction). (Let's say I have only 3000 elements, but their values range from 0 to 2^32.)
So, in the example above, I would want to know all the positions 1241 appears at without iterating again through all the read elements.
In other words, how can I associate to the number "1241" the positions "1, 6, 7" so I can simply access them in O(1) (where the 1 is actually the number of positions at which the element appears)?
If there's no O(1) solution, I want to know what the optimal one is ...
I don't know if I've made myself clear. If not, just say it and I'll update my question :)
You need to use some sort of dynamic array, like a vector (std::vector) or other similar containers (std::list, maybe, it depends on your needs).
Such data structures are safer and easier to use than C-style arrays, since they take care of memory management.
If you also need to look up an element in O(1), you should consider using some structure that associates both an index to an item and an item to an index. I don't think the STL provides any, but Boost should have something like that.
If O(log n) is a cost you can afford, also consider std::map
You can use what is commonly referred to as a multimap. That is, it stores a key together with multiple values. Look-up time is O(log n).
If you're working with Visual Studio, it provides its own hash_multimap; otherwise, may I suggest using boost::unordered_map with a list as your value?
You don't need a sparse array of 1000000000000 elements; use an std::map to map positions to values.
If you want bi-directional lookup (that is, you sometimes want "what are the indexes for this value?" and sometimes "what value is at this index?") then you can use a boost::bimap.
Things get further complicated as you have values appearing more than once. You can sacrifice the bi-directional lookup and use a std::multimap.
You could use a map for that. Like:
std::map<int, std::vector<int>> MyMap;
So every time you encounter a value while reading the file, you append its position to the map. Say X is the value you read and Y is the position; then you just do
MyMap[X].push_back( Y );
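For illustration, a small self-contained sketch of that approach (reading values from std::cin; names are made up):

#include <iostream>
#include <map>
#include <vector>

int main() {
    std::map<long long, std::vector<int>> MyMap;   // value -> positions it appears at

    long long X;
    int Y = 0;
    while (std::cin >> X)
        MyMap[X].push_back(Y++);

    for (int pos : MyMap[1241])    // all positions where 1241 appeared (1, 6, 7 in the example)
        std::cout << pos << ' ';
    std::cout << '\n';
}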
Instead of your array, use
std::map<int, std::vector<int>> a;
You need an associative collection, but you might want to associate multiple values with a single key.
You can use std::multimap< int, int >
or
you can use std::map< int, std::set< int > >
I have found in practice the latter is easier for removing items if you just need to remove one element. It is unique on key-value combinations but not on key or value alone.
If you need higher performance then you may wish to use a hash_map instead of map. For the inner collection, though, you will not gain much performance by using a hash, as you will have very few duplicates, and std::set is the better choice there.
There are many implementations of hash_map, and it is in the new standard. If you don't have the new standard, go for boost.
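A brief sketch of the std::map< int, std::set< int > > variant mentioned above (using long long for the key so values up to 2^32 fit; removing a single key-value combination is straightforward):

#include <map>
#include <set>

std::map<long long, std::set<int>> positions;   // value -> set of positions

void record(long long value, int pos) { positions[value].insert(pos); }

void removeOne(long long value, int pos) {
    auto it = positions.find(value);
    if (it == positions.end()) return;
    it->second.erase(pos);            // remove just this key-value combination
    if (it->second.empty())
        positions.erase(it);          // drop the key once no positions remain
}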
It seems you need a std::map<int,int>. You can store the mapping, such as 1241 -> 0, 124515 -> 1, etc. Then perform a lookup on this map to get the array index.
Besides the std::map solution offered by others here (O(log n)), there's the approach of a hash map (implemented as boost::unordered_map or std::unordered_map in C++0x, supported by modern compilers).
It would give you O(1) lookup on average, which often is faster than a tree-based std::map. Try for yourself.
You can use a std::multimap to store both a key (e.g. 1241) and multiple values (e.g. 1, 6 and 7).
An insert has logarithmic complexity, but you can speed it up if you give the insert method a hint where it can insert the item.
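A hedged sketch of the std::multimap approach (equal_range retrieves all positions for a key; the hinted insert appends at the end of that key's range):

#include <iostream>
#include <map>

int main() {
    std::multimap<long long, int> positions;   // value -> one entry per position

    positions.insert({1241, 1});
    positions.insert({1241, 6});
    // hint: upper_bound(1241) marks the end of 1241's range, so the new
    // entry is placed right there
    positions.insert(positions.upper_bound(1241), {1241, 7});

    auto range = positions.equal_range(1241);
    for (auto it = range.first; it != range.second; ++it)
        std::cout << it->second << ' ';        // prints: 1 6 7
    std::cout << '\n';
}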
For O(1) lookup you could hash the number to find its entry (key) in a hash map (boost::unordered_map, dictionary, stdex::hash_map etc)
The value could be a vector of indices where the number occurs, or a 3000-bit array (375 bytes) in which the bit for each index where the number (key) occurs is set.
boost::unordered_map<unsigned long, std::vector<unsigned long>> myMap;

for (unsigned long i = 0; i < sizeof(a) / sizeof(*a); ++i)
{
    myMap[a[i]].push_back(i);
}
Instead of storing an array of integers, you could store an array of structures containing the integer value and all its positions in an array or vector.