The following is my scenario:
I am using a large 2D dynamic array to store elements with the following attributes:
int
vector
Now, the array elements are accessed randomly, so the access time varies greatly between elements.
I want the access time to be small and constant for all accesses.
Is a dynamic array best suited for my scenario?
I tried using Boost's unordered_map, but it seems that the unordered map takes more time to access elements than the dynamic array.
Please give suggestions:
Code:
for (counter1 = 0; counter1 < sizeof(chunk1); ++counter1)
{
    // code lines skipped
    IndexEntries &data = IndexTable[chunk1[counter1]][chunk1[counter1+1]];
    DoubleTableEntries &GetValue = NewDoubleTable[NextState_chunk1][data.index];
    NextState_chunk1 = GetValue.Next_State;
    ++Bcount;
    buffer[Bcount] = NextState_chunk1;
    ++counter1;
    // code lines skipped
}
Here NewDoubleTable is the 2D array whose elements I am accessing randomly.
There is nothing that can beat an array access in terms of speed; all the higher-level containers like unordered_map<> add additional work. When you can use a plain array or vector<>, that is always the fastest you can get.
You need unordered_map<> only if you have a sparsely populated keyspace which prohibits use of a plain array/vector due to space considerations. In that case, the unordered_map<> can translate the keys in the sparse keyspace to a hash index into the tightly populated hash table, which in turn is nothing more or less than an array.
For random access, nothing can beat an array (dynamic or not). Only this data structure provides O(1) access time on average, because it uses contiguous memory.
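To make the distinction concrete, here is a minimal sketch of both options; the container names and sizes are just assumptions for illustration:
#include <cstddef>
#include <unordered_map>
#include <vector>

int main() {
    // Dense keys 0..N-1: a plain vector is a single indexed memory access.
    const std::size_t N = 1000;
    std::vector<int> dense(N, 0);
    dense[42] = 7;                       // O(1), no hashing involved
    int a = dense[42];

    // Sparse keys spread over a huge range: unordered_map hashes the key
    // into its internal (array-backed) bucket table.
    std::unordered_map<long long, int> sparse;
    sparse[1000000000LL] = 7;            // O(1) on average, plus hashing overhead
    int b = sparse.at(1000000000LL);

    return a + b;
}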
I have a simple requirement: I need a map of type <int,int>. However, I need the fastest theoretically possible retrieval time.
I used both map and the newly proposed unordered_map from TR1.
I found that, at least while parsing a file and creating the map by inserting one element at a time, map took only 2 minutes while unordered_map took 5 minutes.
As it is going to be part of code executed on a Hadoop cluster and will contain ~100 million entries, I need the smallest possible retrieval time.
Another piece of helpful information:
currently the data (keys) being inserted is a range of integers from 1, 2, ... to ~10 million.
I can also require the user to specify a maximum value and to insert in increasing order as above; will that significantly affect my implementation? (I heard map is based on red-black trees, and inserting in increasing order leads to better performance (or worse?).)
Here is the code:
map<int,int> Label; // this is being changed to unordered_map
fstream LabelFile("Labels.txt");

// Creating the map from Labels.txt
if (LabelFile.is_open())
{
    while (! LabelFile.eof() )
    {
        getline(LabelFile, inputLine);
        try
        {
            curnode = inputLine.substr(0, inputLine.find_first_of("\t"));
            nodelabel = inputLine.substr(inputLine.find_first_of("\t")+1, inputLine.size()-1);
            Label[atoi(curnode.c_str())] = atoi(nodelabel.c_str());
        }
        catch (char* strerr)
        {
            failed = true;
            break;
        }
    }
    LabelFile.close();
}
Tentative solution: After reviewing the comments and answers, I believe a dynamic C++ array would be the best option, since the implementation will use dense keys. Thanks
Insertion for unordered_map should be O(1) and retrieval should be roughly O(1) (it's essentially a hash table).
Your timings are therefore way off, or there is something wrong with your implementation or usage of unordered_map.
You need to provide some more information, and possibly show how you are using the container.
As per section 6.3 of n1836, the complexities for insertion/retrieval are given:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
One issue you should consider is that your implementation may need to continually rehash the structure, as you say you have 100M+ items. In that case, when instantiating the container, if you have a rough idea of how many "unique" elements will be inserted, you can pass that in as a parameter to the constructor and the container will be instantiated with a bucket table of appropriate size.
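For example, a minimal sketch of passing an initial bucket count to the constructor (the entry count is taken from the question and is otherwise an assumption):
#include <cstddef>
#include <unordered_map>

int main() {
    // ~100 million entries expected; constructing with a bucket count up front
    // avoids repeated rehashing while the map is being filled.
    const std::size_t expected_entries = 100000000;
    std::unordered_map<int, int> Label(expected_entries);

    Label[1] = 42; // insertions now proceed without growing the bucket table
    return 0;
}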
The extra time loading the unordered_map is due to dynamic array resizing. The resizing schedule doubles the number of cells each time the table exceeds its load factor, so starting from an empty table, expect O(lg n) copies of the entire data table. You can eliminate these extra copies by sizing the hash table upfront. Specifically
Label.rehash(expected_number_of_entries / Label.max_load_factor());
Dividing by the max_load_factor accounts for the empty cells that are necessary for the hash table to operate.
unordered_map (at least in most implementations) gives fast retrieval, but relatively poor insertion speed compared to map. A tree is generally at its best when the data is randomly ordered, and at its worst when the data is ordered (you constantly insert at one end of the tree, increasing the frequency of re-balancing).
Given that it's ~10 million total entries, you could just allocate a large enough array, and get really fast lookups -- assuming enough physical memory that it didn't cause thrashing, but that's not a huge amount of memory by modern standards.
Edit: yes, a vector is basically a dynamic array.
Edit 2: The code you've added has some problems. Your while (! LabelFile.eof() ) loop is broken: testing eof() before reading means a failed read still gets processed and the last line may be handled twice. You normally want to do something like while (LabelFile >> inputdata) instead. You're also reading the data somewhat inefficiently -- what you're apparently expecting is two numbers separated by a tab. That being the case, I'd write the loop something like:
while (LabelFile >> node >> label)
    Label[node] = label;
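Putting that together, a self-contained sketch of the corrected reading loop (the file name and the reserve size come from the question; treat the rest as assumptions):
#include <fstream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> Label;
    Label.reserve(10000000);            // ~10 million keys expected, per the question

    std::ifstream LabelFile("Labels.txt");
    int node, label;
    // operator>> skips the tab/newline whitespace between the two numbers,
    // and the loop stops cleanly at end of file or on a malformed line.
    while (LabelFile >> node >> label)
        Label[node] = label;

    return 0;
}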
So I want to create an array of nine elements, but I want the indices to be specified by me. That is, instead of accessing the elements of my array,
std::array<bool,9> myarray
using myarray[0], myarray[1], myarray[2]... I want to access them, for example, as
myarray[21], myarray[34], myarray[100], myarray[9], myarray[56]...
But still preserving the properties of a standard library array and storing only 9 elements.
More specifically, I need an easy access to the elements of a boolean matrix.
That is, suppose I have the matrix:
array<array<bool,100>,100> mymatrix;
And that it is going to be used for checking certain positions (say, position x, y) simply by using mymatrix[x][y]. I also know that some of the elements are never going to be checked, so they are not really needed. In order to save as much memory as possible, the idea is to get rid of those unneeded elements while still preserving the structure used to check my elements.
Such an array is best represented with one of the associative containers provided by the Standard C++ Library - i.e. either a std::map<int,bool> or an std::unordered_map<int,bool>. These containers provide an idiomatic way of doing this in C++.
One added benefit of using an associative container is the ability to iterate the values along with their external "indexes".
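A minimal sketch of that approach; the particular indices are the ones from the question, and the values are made up:
#include <iostream>
#include <unordered_map>

int main() {
    // Only the "interesting" indices ever get stored.
    std::unordered_map<int, bool> myarray;
    myarray[21]  = true;
    myarray[34]  = false;
    myarray[100] = true;

    // Iterate the values together with their external "indexes".
    for (const auto& kv : myarray)
        std::cout << kv.first << " -> " << kv.second << '\n';

    return 0;
}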
If you insist on using an array to store the values, you would have to make your own class that builds a "mapping" between external and internal indexes. This would either take a significant amount of memory for an O(1) access time, use CPU cycles for binary search plus an index-to-index map, use linear search, or hard-code the external indexes.
At first glance, what you want is a std::map<int, bool>, which allows you to have your own indices. But a map is not fixed in size.
In order to get both a fixed size and custom indices, you may combine a map and an array with custom add and access functions:
map<int, int> indices; // fill it with the custom indices, mapped onto positions in the array
array<bool, n> data;   // the actual fixed-size storage

bool get(int index) {
    return data[indices[index]];
}
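As a rough, self-contained version of that idea (everything here is a sketch; the class name and the choice of nine slots are assumptions):
#include <array>
#include <cstddef>
#include <map>

// The map translates a caller-chosen external index into a slot of the
// fixed-size std::array that actually holds the values.
class IndexedBoolArray9 {
    std::map<int, std::size_t> indices; // external index -> slot in data
    std::array<bool, 9> data{};

public:
    void set(int index, bool value) {
        auto it = indices.find(index);
        if (it == indices.end())
            it = indices.emplace(index, indices.size()).first; // claim the next free slot
        data[it->second] = value;                              // sketch only: no check for >9 distinct indices
    }

    bool get(int index) const {
        return data[indices.at(index)]; // throws std::out_of_range for unknown indices
    }
};

int main() {
    IndexedBoolArray9 a;
    a.set(21, true);
    a.set(100, false);
    return a.get(21) ? 0 : 1;
}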
I've encountered this problem pattern multiple times in some work I'm doing, and I'm wondering if a known solution exists.
It's simple: I have a vector of elements, which in turn are vectors of some dynamic size. I know the size of the inner vectors will be relatively small (i.e. on the order of tens of items in the average case), but there will be a lot of them.
I can solve this naively:
vector<vector<item>> vss;
Using this approach memory allocations in the inner vector will be all over the place. Iterating over all elements within vss will be a mess cache-wise, and this may cause me performance problems.
I'm thinking this could be solved using some sort of linked list-structure with multiple heads within the same block of memory.
Assuming that the size of the inner vectors can't be predetermined, is there a way to construct and fill vss such that iterating over the elements is not going to be a cache disaster?
Thank you.
I just wanted to add my current, but hopefully temporary, solution. Instead of filling up vss directly, I use a temporary vector of pairs:
vector<pair<size_t, item>> temporaries;
Each pair denotes that some item should be inserted at a specific outer index. From here I count up the number of entries per index, allocate a single block of memory to hold the items, and move the data. Some additional vectors are used for book-keeping (i.e. the number of items per index and their starting positions).
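A sketch of that flattening step under the same assumptions (item is a placeholder element type; the counting, prefix sums and single backing block are exactly the book-keeping described above):
#include <cstddef>
#include <utility>
#include <vector>

using item = int; // placeholder element type for this sketch

int main() {
    // Stand-in for the temporaries described above: (outer index, item) pairs.
    std::vector<std::pair<std::size_t, item>> temporaries = {
        {0, 1}, {2, 7}, {0, 3}, {1, 5}
    };
    const std::size_t outer_size = 3;

    // 1. Count how many items each outer index will hold.
    std::vector<std::size_t> counts(outer_size, 0);
    for (const auto& t : temporaries)
        ++counts[t.first];

    // 2. Prefix sums give every outer index its starting offset in one flat block.
    std::vector<std::size_t> offsets(outer_size + 1, 0);
    for (std::size_t i = 0; i < outer_size; ++i)
        offsets[i + 1] = offsets[i] + counts[i];

    // 3. Move the items into a single contiguous allocation.
    std::vector<item> flat(offsets[outer_size]);
    std::vector<std::size_t> cursor(offsets.begin(), offsets.end() - 1);
    for (auto& t : temporaries)
        flat[cursor[t.first]++] = std::move(t.second);

    // Elements of outer index i now live in flat[offsets[i]] .. flat[offsets[i+1]-1],
    // so iterating over everything is one linear, cache-friendly pass.
    return static_cast<int>(flat.size());
}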
I want to allocate memory for a 10^9 * 10^9 two-dimensional array, but this is not possible. Is there any way out?
I think vector could be a solution to this, but I don't know how to do it.
You cannot allocate 10^18 bytes of memory in any computer today (that's roughly a million terabytes). However, if your data is mostly zeros (i.e. it is a sparse matrix), then you can use a different kind of data structure to store it. It all depends on what kind of data you are storing and whether it has any redundant characteristics.
Assuming that the number of non-zero elements is much less than 10^18, you'll want to read up on sparse arrays. In fact, it's not even a requirement that most of the elements in a sparse array be zero -- they just need to be the same. The essential idea is to keep the non-default values in a structure like a list; any values not found in the list are assumed to be the default value.
I want to allocate memory for a 10^9 * 10^9 two-dimensional array but this is not possible. Is there any way out?
That's way beyond current hardware capabilities, and an array this big is unsuitable for any practical purpose (you're free to calculate how many thousands of years it would take to walk through every element).
You need to create a "sparse" array. Store only non-zero elements in memory, provide an array-like interface to access them, but internally store them in something like std::map<std::pair<xcoord, ycoord>, value>, and return zero for all elements not in the map. As long as you don't do something reckless like trying to set every element to a non-zero value, this should be a sufficient array replacement.
so....
What do you need that much memory for?
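A minimal sketch of such a sparse-array wrapper (the value type, coordinate type, and class name are assumptions):
#include <map>
#include <utility>

// Sketch of a sparse 2D "array": only non-zero cells are stored,
// everything else reads back as the default value 0.
class SparseMatrix {
    std::map<std::pair<long long, long long>, double> cells;

public:
    void set(long long x, long long y, double value) {
        if (value == 0.0)
            cells.erase({x, y});      // keep the map limited to non-zero cells
        else
            cells[{x, y}] = value;
    }

    double get(long long x, long long y) const {
        auto it = cells.find({x, y});
        return it == cells.end() ? 0.0 : it->second;
    }
};

int main() {
    SparseMatrix m;
    m.set(123456789LL, 987654321LL, 3.14);
    double v = m.get(123456789LL, 987654321LL); // 3.14
    double z = m.get(0, 0);                     // 0.0, never stored
    return (v > z) ? 0 : 1;
}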
I've created a DLL for GameMaker. GameMaker's arrays were really slow, so after asking around a bit I learned I could use maps in C++ and make a DLL.
Anyway, I'll represent what I need to store as a 3D array:
information[id][number][number]
The id corresponds to an object's id. The first number field ranges from 0 to 3, and each number represents a different setting. The second number field represents the value for the setting in the first number field.
so..
information[101][1][4];
information[101][2][4];
information[101][3][4];
This would translate to "object with id 101 has a value of 4 for settings 1, 2 and 3".
I did this to try to replicate it with maps:
//declared as a class member
map<double, map<int, double>*> objIdMap;
///// lower down the page, in some function
map<int, double> objSettingsMap;
objSettingsMap[1] = 4;
objSettingsMap[2] = 4;
objSettingsMap[3] = 4;
map<int, double>* temp = &objSettingsMap;
objIdMap[id] = temp;
So the first map, objIdMap, stores the id as the key and a pointer to another map, which stores the number representing the setting as the key and the value of the setting as the value.
However, this is for a game, so new objects with their own ids and settings might need to be stored (sometimes a hundred or so new ones every few seconds), and the existing ones constantly need their values retrieved for every step of the game. Are maps not able to handle this? I had a very similar thing going with GameMaker's arrays and it worked fine.
Do not use doubles as the key of a map.
If you need to compare two doubles, use a floating-point comparison function.
1) Your code is buggy: you store a pointer to a local object, objSettingsMap, which will be destroyed as soon as it goes out of scope. You must store a map object, not a pointer to it, so that the local map is copied into the outer map.
2) Maps can become arbitrarily large (I have maps with millions of entries). If you need speed, try hash maps (unordered_map in C++0x, but also available from other sources), which are considerably faster. Adding a few hundred entries each second shouldn't be a problem, though. Before worrying about execution speed you should always use a profiler.
3) I am not really sure that your nested structures must be maps. Depending on how many settings you have and what values they may take, a struct, a bitfield, or a vector might be more appropriate.
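For instance, since each object appears to have a small fixed number of settings (0-3), one alternative is a single map from id to a fixed-size array of setting values; the following is just a sketch under that assumption:
#include <array>
#include <map>

int main() {
    // id -> four setting values; the inner std::array is stored by value,
    // so there is no dangling pointer to a local object.
    std::map<int, std::array<double, 4>> objSettings;

    int id = 101;
    objSettings[id][1] = 4;   // setting 1 = 4
    objSettings[id][2] = 4;   // setting 2 = 4
    objSettings[id][3] = 4;   // setting 3 = 4

    double value = objSettings[id][2]; // retrieve a setting each game step
    return static_cast<int>(value);
}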
If you need really fast associative containers, try to learn about hashes. Maps are 'fast enough' but not brilliant for some cases.
Try to analyze the structure of the objects you need to store. If the fields are fixed, I'd recommend not using nested maps at all. Maps are usually intended for an 'average' number of indexes; for a small number, simple lists are more effective because insert/erase operations have lower complexity, and for a very large number of indexes you really need to think about hashing.
Don't forget about memory. std::map is a highly dynamic template, so when you store small objects you lose tons of memory to dynamic allocation. Is that what you really expect? I was once involved in removing std::map usage, and it roughly halved the memory requirements.
If you only need to fill the map at startup and then only search for elements (the structure never changes), I'd recommend a simple std::vector with a sort applied after all the elements are inserted. Then you can just use binary search (as you have a sorted vector). Why? std::vector is a much more predictable thing, and its biggest advantage is the contiguous memory area.
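A minimal sketch of that sorted-vector approach (the key/value types and the sample data are assumptions):
#include <algorithm>
#include <utility>
#include <vector>

int main() {
    // Fill once at startup...
    std::vector<std::pair<int, int>> table = {
        {42, 7}, {5, 1}, {17, 3}
    };
    // ...sort once...
    std::sort(table.begin(), table.end());

    // ...then look up with binary search over contiguous memory.
    int key = 17;
    auto it = std::lower_bound(table.begin(), table.end(), key,
                               [](const std::pair<int, int>& p, int k) {
                                   return p.first < k;
                               });
    int value = (it != table.end() && it->first == key) ? it->second : -1;
    return value; // 3
}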